Constraint Generation and the Commitment to Evidence
1. Context
The Semantic Web's promise was machine-readable knowledge: ontologies that computers could reason over, inferring new facts from existing ones. By the mid-2000s, a large body of ontologies written in OWL (the Web Ontology Language) had been published. Researchers at UMBC's Swoogle project had cataloged hundreds of thousands of ontology documents.
A systematic survey of those documents revealed a structural problem: approximately 75% of OWL object properties lacked domain and range constraints. A domain constraint tells a reasoner what type of thing can be the subject of a relationship. A range constraint tells it what type of thing can be the object. Without them, automated inference was severely limited — the reasoner could not confidently draw conclusions that depended on knowing what kinds of things a relationship could connect.
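To make the role of domain and range constraints concrete, here is a minimal sketch of RDFS-style type inference over a handful of triples. The property and class names (teaches, advises, Professor, Course) and the infer_types helper are hypothetical illustrations, not taken from the Swoogle corpus or the dissertation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def infer_types(
    triples: List[Triple],
    domains: Dict[str, str],
    ranges: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Apply domain/range entailment: a property's domain types its
    subjects, and its range types its objects."""
    inferred: Dict[str, Set[str]] = {}
    for subject, prop, obj in triples:
        if prop in domains:
            inferred.setdefault(subject, set()).add(domains[prop])
        if prop in ranges:
            inferred.setdefault(obj, set()).add(ranges[prop])
    return inferred


# Hypothetical data: "teaches" has constraints, "advises" has none.
triples = [("alice", "teaches", "cs101"), ("bob", "advises", "carol")]
domains = {"teaches": "Professor"}
ranges = {"teaches": "Course"}

result = infer_types(triples, domains, ranges)
print(result)
```

In this toy run the reasoner can type alice and cs101, but it learns nothing about bob or carol, because advises carries no constraints. That is the situation the survey found for most object properties.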
The gap between what the Semantic Web promised and what it could actually deliver was largely a data quality problem at scale.
2. The Problem
The easy response to missing constraints is to assume them: if a property appears frequently with instances of a particular class, infer that class as the domain. This is pragmatic. It is also epistemically dishonest — it introduces assumptions as if they were facts, with no mechanism for tracking whether they are correct, no way to retract them if new evidence contradicts them, and no way to distinguish confident inferences from speculative ones.
The research question was harder: can we generate constraints from evidence, in a way that is both useful for reasoning and honest about the limits of what the evidence supports?
3. Why Existing Thinking Failed
The standard approaches to ontology development treated constraint authorship as a human task: experts wrote constraints because they understood the domain. This scaled poorly. The number of OWL ontologies in practical use vastly exceeded what could be manually curated.
Automated approaches existed but tended toward one of two failure modes. The first was over-commitment: generate the most specific plausible constraint and assert it as definite. The second was under-commitment: generate nothing because certainty was unavailable. Neither was useful for a reasoner that needed to act on incomplete knowledge without being misled by false confidence.
The deeper problem was monotonicity. Classical logical reasoning is monotonic: adding new facts never invalidates a conclusion that has already been derived. Real-world ontologies do not behave this way; new evidence changes what was previously believed. Any constraint generation system that did not account for revision under new evidence would produce a reasoner whose confidence outran its evidence base.
4. My Approach
My approach was to build constraint generation algorithms that produce useful constraints from empirical evidence while remaining honest about their epistemic status.
Three algorithms emerged from the work:
Disjunction-based generation derived constraints from the observed types of relationship instances: if a property's subjects consistently belonged to a small set of classes, it generated a disjunctive domain constraint from that set.
Least Common Named Subsumer (LCNS) generalized across instance observations to find the most specific named class that covered the observed domain members — producing constraints that were as tight as the evidence supported without overclaiming.
Vivification balanced specificity against reasoner performance: highly disjunctive constraints are accurate but computationally expensive to reason over; vivification identified simpler approximations that retained most of the inferential value with lower overhead.
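The three ideas above can be sketched over a toy class hierarchy. This is an illustration under stated assumptions, not the dissertation's implementation: the PARENT hierarchy is hypothetical and single-inheritance (real OWL hierarchies allow multiple parents, where a unique least common subsumer need not exist), and the max_disjuncts threshold used by vivify is an invented stand-in for whatever cost criterion the actual algorithm applied.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

# Hypothetical single-inheritance hierarchy: class -> parent ("Thing" is root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Organization": "Agent",
    "Student": "Person",
    "Professor": "Person",
    "Document": "Thing",
}


def ancestors(cls: str) -> List[str]:
    """Return cls followed by its superclasses, most specific first."""
    chain: List[str] = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def disjunctive_domain(subject_types: Iterable[str]) -> FrozenSet[str]:
    """Disjunction-based generation: the set of observed subject classes."""
    return frozenset(subject_types)


def lcns(classes: Iterable[str]) -> str:
    """Most specific named class that subsumes every class given."""
    class_list = list(classes)
    if not class_list:
        raise ValueError("need at least one class")
    common = set(ancestors(class_list[0]))
    for cls in class_list[1:]:
        common &= set(ancestors(cls))
    # Walk one chain upward; the first shared class is the most specific.
    for candidate in ancestors(class_list[0]):
        if candidate in common:
            return candidate
    raise ValueError("no common subsumer")


def vivify(disjunction: FrozenSet[str], max_disjuncts: int = 2) -> FrozenSet[str]:
    """Assumed policy: keep small disjunctions, collapse large ones to LCNS."""
    if len(disjunction) <= max_disjuncts:
        return disjunction
    return frozenset({lcns(disjunction)})


observed = ["Student", "Professor", "Student"]
domain = disjunctive_domain(observed)
print(sorted(domain), lcns(domain))
print(sorted(vivify(frozenset({"Student", "Professor", "Organization"}))))
```

The trade-off is visible even at this scale: the disjunction {Student, Professor} is exactly what the evidence shows, Person is a single named class that covers it, and collapsing a three-way disjunction to Agent gives up some precision in exchange for a simpler constraint.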
All three operated on empirical data: the Swoogle corpus of 200,000+ real OWL ontologies, not toy examples. The evaluation was grounded in what actually existed.
Limited default reasoning with tracking and contraction addressed non-monotonicity directly: generated constraints were tagged as defaults — defeasible, revisable conclusions rather than permanent assertions. When new evidence contradicted a default constraint, the reasoner could retract it without cascading inconsistencies.
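A minimal sketch of tracking and contraction, assuming a deliberately simplified model: a default domain constraint is contradicted whenever an explicitly typed subject falls outside its classes. Real OWL reasoning would need class disjointness and subsumption checks to establish a contradiction, and the DefaultConstraintStore class and its method names are hypothetical, not the dissertation's framework.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass
class DefaultConstraintStore:
    """Holds defeasible domain constraints and tracks what depends on them."""

    domains: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # property -> individuals whose type was inferred from that default
    derived: Dict[str, List[str]] = field(default_factory=dict)

    def add_default(self, prop: str, classes: Iterable[str]) -> None:
        """Record a generated constraint as a default, not a hard assertion."""
        self.domains[prop] = frozenset(classes)
        self.derived[prop] = []

    def infer_subject_type(self, prop: str, individual: str) -> Optional[FrozenSet[str]]:
        """Type an individual via the default and remember the dependency."""
        if prop not in self.domains:
            return None
        self.derived[prop].append(individual)
        return self.domains[prop]

    def observe(self, prop: str, asserted_type: str) -> List[str]:
        """Process a subject with an explicit type. If it falls outside the
        default, retract the default and return the dependent individuals
        whose inferred types must be withdrawn."""
        current = self.domains.get(prop)
        if current is None or asserted_type in current:
            return []
        del self.domains[prop]  # contraction: the default no longer holds
        return self.derived.pop(prop)


store = DefaultConstraintStore()
store.add_default("teaches", {"Professor"})
store.infer_subject_type("teaches", "alice")
retracted = store.observe("teaches", "Robot")  # evidence against the default
print(retracted, "teaches" in store.domains)
```

Because each conclusion is recorded against the default that produced it, retraction touches only the affected individuals rather than forcing the whole knowledge base to be rechecked.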
5. Organizational and Academic Challenges
Working at the intersection of Semantic Web research and practical knowledge engineering required operating across communities with different standards, vocabularies, and success criteria. The knowledge representation community cared about decidability and formal properties. The Semantic Web engineering community cared about practicality and scalability. The machine learning community cared about empirical benchmarks.
The dissertation had to be defensible in all three registers simultaneously — formally sound, practically motivated, and empirically evaluated.
Dr. Yun Peng's mentorship at UMBC shaped not just the dissertation but the model of what good research supervision looked like — a debt explicitly acknowledged in the dissertation and consciously repaid through the supervision of undergraduate researchers in the following decade.
6. Outcome
The dissertation contributed three constraint generation algorithms with documented performance on real-world OWL corpora, a limited default reasoning framework that preserved decidability while handling non-monotonic revision, and a formal treatment of constraint quality that gave practitioners a principled basis for choosing among generated constraints.
Published output included a PACISE 2015 paper on OWL property constraints. The open-source MEX-SVM tools released concurrently built on the same principle: make the computational work inspectable and transferable, not just functional.
The more lasting outcome was methodological: a way of approaching knowledge representation problems that took epistemic honesty as a design constraint, not an afterthought.
7. Lessons That Generalize
Evidence before claims. Asserting a constraint without empirical grounding is not reasoning — it is wishful thinking encoded as fact. The same principle applies to software architecture claims, system performance assertions, and organizational strategy. The discipline of grounding beliefs in observable evidence is not just good research practice; it is the foundation of trustworthy engineering.
Honesty about limitations is a system property, not an admission of failure. A reasoner that knows when it is speculating and can retract conclusions when evidence changes is more reliable than one that asserts everything with uniform confidence. Engineering systems that cannot acknowledge their own uncertainty are dangerous. Systems that can are adaptable.
Completeness is often the wrong goal. The temptation in knowledge engineering — and in many engineering domains — is to fill every gap, resolve every ambiguity, and produce a system with no unknown unknowns. This is epistemically overconfident and practically expensive. The right goal is to be accurate where accuracy matters and honest about gaps where they exist.
The investment in intellectual honesty compounds. The six-year, self-funded Ph.D. was expensive in the short run. The method of thinking it produced (empirically grounded, formally sound, non-dogmatic) has paid off across every subsequent research, teaching, and engineering decision. Intellectual shortcuts produce intellectual debt. The interest rate is high.
Related: Principle 8 — Truth Matters · Mental Models — Evidence Before Claims · Lessons Learned — The Cost of Intellectual Shortcuts Compounds