Artificial Intelligence
AI is changing engineering economics more than engineering itself. The judgment required to use it well is the same judgment that was always required. The consequences of using it badly are higher.
The Working Thesis
AI does not change what good engineering is.
It changes the rate at which ideas can be explored, the cost of generating artifacts, and, critically, where the limiting constraint sits.
For most of software history, the constraint was generation. Writing code was slow. Writing correct code was slower. Writing documented, tested, maintainable code was slower still. AI is compressing that constraint dramatically. Generation is becoming cheap.
What does not compress as easily: judgment. Understanding whether what was generated is correct. Evaluating whether it is appropriate for the context. Determining whether it is safe. Verifying that it satisfies the requirements it was built to satisfy.
The constraint shifts from generation to verification.
This is the central observation that drives how I think about AI in engineering contexts. It has implications for tooling, governance, organizational design, and what skills matter most.
→ See also: Engineering — Why Verification Is Becoming the Bottleneck
What AI Changes
Economics, before architecture. AI tools change the cost structure of software development before they change what software development looks like. Generation is cheap. The same governance, review, and validation processes that were designed for expensive generation are now applied to cheap generation — which means they are the bottleneck.
The rate of exploration. Ideas that would have required weeks to prototype can be explored in hours. This is genuinely valuable. The organizations that benefit most are those with the judgment to evaluate what they are seeing — to distinguish a promising result from a plausible-looking failure.
The surface area of responsibility. When AI generates code, designs, or analysis, the person who reviews and accepts that output is responsible for its consequences. The surface area of engineering responsibility does not shrink because a model generated the artifact. It may expand, because the volume of artifacts increases.
The signal-to-noise ratio of generated output. AI systems produce confident output across a wide range of quality. Human output is usually calibrated: engineers tend to signal doubt when they are less sure. AI output does not carry that calibration signal. Evaluating it requires an active skepticism that is rarely needed when evaluating the work of a colleague who has demonstrated their own judgment.
Interpretability as a Design Requirement
AI systems that produce accurate outputs but resist explanation are less valuable than systems that are slightly less accurate but whose behavior practitioners can understand, challenge, and improve.
This is not a philosophical preference. It is an engineering requirement in any domain where:
- Practitioners must make decisions based on model output
- Regulators must evaluate whether the system behaves appropriately
- Failures must be diagnosed and corrected
- Accountability must be assigned when something goes wrong
An opaque model that is 95% accurate is less useful than a transparent model that is 91% accurate if the four-point accuracy gain is worth less than the cost of operating without understanding.
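One way to make that comparison concrete is to put both models in the same units. The sketch below is illustrative only. The per-error costs are hypothetical, and so is the premise that an error from an opaque model costs more to diagnose and correct than one from a model practitioners can inspect. Neither comes from the composite kernel work or from any real deployment.

```python
def expected_cost_per_decision(accuracy: float, cost_per_error: float) -> float:
    """Expected cost of one decision: error rate times the cost of an error."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * cost_per_error


# Hypothetical numbers. The assumption being illustrated: an undiagnosable
# error is more expensive to handle than a diagnosable one.
opaque = expected_cost_per_decision(accuracy=0.95, cost_per_error=100.0)
transparent = expected_cost_per_decision(accuracy=0.91, cost_per_error=40.0)

print(f"opaque: {opaque:.2f}, transparent: {transparent:.2f}")
```

Under these assumed costs the less accurate model is cheaper to operate. With equal per-error costs the more accurate model wins. What matters is where the break-even sits, not the specific values.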
The composite kernel research (AAAI-05) was motivated by exactly this problem: practitioners were choosing SVM kernels by trial and error because the models were opaque. The evolutionary search produced not just better kernels but human-readable formulas that practitioners could inspect, interpret, and apply to adjacent problems. The accuracy improvement was secondary. The interpretability was the primary contribution.
In IEC 62304 and FDA-regulated environments, interpretability is not optional. AI systems in software as a medical device must produce evidence of their behavior, their training, their validation, and their limitations. A system whose outputs cannot be explained cannot be regulated. A system that cannot be regulated cannot be deployed in clinical contexts.
→ See also: Interpretable ML and Composite Kernels · Principle 5 — AI Amplifies Judgment, It Does Not Replace It
Governance Is Not a Constraint on AI — It Is What Makes AI Trustworthy
The reflex in technology adoption is to treat governance as drag.
This reflex is understandable and usually wrong.
Governance designed around AI is not a brake on AI adoption. It is the mechanism by which AI becomes trustworthy — which is the condition for sustainable AI adoption at scale.
An organization that deploys AI without governance may move faster in the short term. It will pay for that speed in:
- Defects that are hard to diagnose because the system's behavior was never characterized
- Regulatory findings that require remediation of work already done
- Loss of trust from stakeholders who encounter failures that were not anticipated
- Inability to reproduce results, which makes improvement impossible
The right frame is not "governance versus speed." It is "episodic governance versus continuous governance."
Episodic governance — document at the end, review before release — becomes increasingly expensive as generation accelerates. The artifact volume grows. The review burden grows. The distance between when decisions were made and when they are reviewed grows.
Continuous governance — evidence generated during work, traceability built into the workflow, reviews triggered by defined conditions — scales with AI-assisted development in a way that episodic governance does not.
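A minimal sketch of what "evidence generated during work" and "reviews triggered by defined conditions" could look like in code. Everything here is an assumption made for illustration: the `EvidenceRecord` fields, the trigger rules, and the example path and requirement ID are hypothetical, not a prescribed schema or any standard's required format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    """One piece of evidence captured at the moment the work happens."""
    artifact: str              # e.g. a file path or change identifier
    requirement_id: str        # the requirement this change traces to
    ai_generated: bool         # whether a model produced the artifact
    tests_passed: bool
    touches_safety_code: bool
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def review_triggers(record: EvidenceRecord) -> list[str]:
    """Return the reasons (if any) this change needs human review."""
    reasons = []
    if not record.requirement_id:
        reasons.append("no requirement traced")
    if not record.tests_passed:
        reasons.append("tests failing")
    if record.ai_generated and record.touches_safety_code:
        reasons.append("AI-generated change to safety-related code")
    return reasons


# Hypothetical example values.
record = EvidenceRecord(
    artifact="src/dose_calc.py",
    requirement_id="REQ-112",
    ai_generated=True,
    tests_passed=True,
    touches_safety_code=True,
)
print(review_triggers(record))
```

The design choice worth noting is that the record is created as a side effect of doing the work, and review is requested only when a defined condition fires, rather than for every artifact at release time.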
→ See also: Platform Strategy in Regulated Environments · Principle 4 — Governance Should Accelerate Engineering
AI in Regulated Environments
Deploying AI in IEC 62304, FDA 510(k), and similar regulatory environments is not primarily a technical problem. It is an evidence problem.
Regulators do not need to know how the model works at the mathematical level. They need to know:
- What was the training data? Was it representative of the deployment context?
- What was the validation methodology? Were test sets properly isolated from training?
- What are the failure modes? How were they characterized?
- What is the performance envelope? Under what conditions does the system degrade?
- How is the system monitored post-deployment? What triggers revalidation?
These questions require documentation disciplines that most AI development practices do not naturally produce. They require treating the model's development lifecycle as a design process — with requirements, design controls, verification, and validation — rather than as an empirical search process.
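As one small example of such a discipline, the question about test-set isolation can be answered with a recorded check rather than an assertion from memory. The sketch below assumes each sample has a stable identifier. The function names and the sample IDs are hypothetical.

```python
import hashlib


def fingerprint(sample_ids: list[str]) -> str:
    """Order-independent SHA-256 fingerprint of a set of sample identifiers."""
    digest = hashlib.sha256()
    for sample_id in sorted(set(sample_ids)):
        digest.update(sample_id.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def isolation_evidence(train_ids: list[str], test_ids: list[str]) -> dict:
    """Summarize whether the test set is disjoint from the training set."""
    overlap = sorted(set(train_ids) & set(test_ids))
    return {
        "train_fingerprint": fingerprint(train_ids),
        "test_fingerprint": fingerprint(test_ids),
        "overlap": overlap,
        "isolated": not overlap,
    }


# Hypothetical sample identifiers; "p003" appears in both sets.
evidence = isolation_evidence(["p001", "p002", "p003"], ["p003", "p004"])
print(evidence["isolated"], evidence["overlap"])
```

An identifier-level check like this is necessary but not sufficient. It would not catch the same patient appearing under two IDs, or near-duplicate samples, so it would be one item of evidence among several.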
The organizations that will successfully deploy AI in regulated environments are not the ones with the most sophisticated models. They are the ones that can produce complete, credible, auditable evidence of how those models were built, tested, and validated.
This is an organizational capability, not a technical one. It requires the same cross-functional coordination — software, quality, regulatory, clinical — that any other regulated development activity requires. AI is not exempt from the rigor that regulated environments demand. If anything, the opacity of many AI systems makes that rigor more important, not less.
The Human Judgment That Remains
AI raises the rate at which ideas can be explored. It does not raise the rate at which good ideas can be distinguished from bad ones.
The judgment that remains distinctively human:
Contextual appropriateness. Does this solution fit this situation? The model generates something that worked before. The engineer determines whether it applies here.
Risk calibration. How wrong could this be, and what are the consequences? The model cannot answer this question. The engineer must.
Epistemic honesty. Where is the analysis confident, and where is it speculating? The model presents output with uniform confidence. The engineer must recognize where the confidence is unwarranted.
Verification strategy. What would it look like if this were wrong? How would we know? Designing a test that can actually falsify a claim is a human capability that AI tools assist but do not replace.
The decision to act. Ultimately, someone must commit to a course of action and accept responsibility for the outcome. AI informs that decision. It does not make it.
The composite kernel work illustrated this: the evolutionary algorithm explored a vast space of kernel combinations and reliably converged on structures that a practitioner could inspect. The algorithm did not eliminate the practitioner's role. It gave the practitioner something concrete to reason about. That is the correct relationship between AI and engineering judgment.
→ See also: Interpretable ML and Composite Kernels · Principle 5 — AI Amplifies Judgment, It Does Not Replace It
Agentic Engineering
Agentic AI — systems that plan and execute sequences of actions to accomplish goals — changes the unit of AI interaction from request-response to extended autonomous operation.
This is a genuine capability shift. Tasks that previously required continuous human direction can be delegated. An agent can explore a codebase, identify a defect, propose a fix, run tests, and report results without step-by-step human instruction.
This shift creates new engineering responsibilities:
Specifying intent, not procedure. Human direction of agentic systems requires articulating what success looks like and what constraints apply — not a step-by-step procedure. This is a harder cognitive task than procedure specification, not an easier one.
Scope management. Agents that operate autonomously can take actions with consequences that extend beyond the intended scope. The boundaries of agentic authority — what the agent can do, what it must ask before doing, what it cannot do — must be explicitly defined.
Output verification at scale. An agent that produces output faster than a human can verify it creates a backlog of unverified artifacts. The verification challenge identified above becomes acute in agentic contexts.
Auditability. What did the agent do, and why? In regulated environments, this question has a mandatory answer. Agentic systems must maintain auditable records of their reasoning and actions — not just their outputs.
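The scope and auditability points can be sketched together. In the example below, the action names, the three-way allow/ask/deny policy, and the hash-chained log are assumptions chosen for illustration. They are not taken from any particular agent framework or regulatory requirement.

```python
import hashlib
import json
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # pause and request human approval
    DENY = "deny"


# Hypothetical policy; action names are illustrative only.
POLICY = {
    "read_file": Decision.ALLOW,
    "run_tests": Decision.ALLOW,
    "write_file": Decision.ASK,
    "push_to_main": Decision.DENY,
}


def authorize(action: str) -> Decision:
    """Unknown actions are denied by default rather than allowed."""
    return POLICY.get(action, Decision.DENY)


class AuditLog:
    """Append-only log where each entry includes the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, action: str, rationale: str) -> dict:
        previous_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {
            "action": action,
            "rationale": rationale,
            "decision": authorize(action).value,
            "previous_hash": previous_hash,
        }
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        entry = {**body, "hash": hashlib.sha256(payload).hexdigest()}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; editing an entry, or deleting one before
        the last, breaks the chain."""
        previous_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["previous_hash"] != previous_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            previous_hash = entry["hash"]
        return True


log = AuditLog()
log.record("read_file", "locate the failing test")
log.record("push_to_main", "apply the fix")
print([e["decision"] for e in log.entries], log.verify())
```

The log stores the stated rationale alongside the action and the authorization decision, so the record answers "why" as well as "what". A chain like this detects after-the-fact edits but not truncation of the most recent entries, so a real system would likely also anchor the latest hash somewhere the agent cannot write.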
The early adopter dynamic applies here as well. Organizations that build the governance infrastructure for agentic engineering before they deploy it will adapt. Organizations that wait until governance is demanded will scramble.
AI as Organizational Capability
The question "what can we do with AI" is less useful than "what kind of organization do we need to be to use AI well."
AI capability is not primarily a software purchase. It is an organizational capability built from:
- Engineers who can evaluate generated output critically
- Governance processes designed for continuous evidence generation
- Leadership that understands AI well enough to make investment decisions
- Quality systems that treat AI-generated artifacts with appropriate scrutiny
- Regulatory pathways that have been navigated before they are urgently needed
The organizations that will lead in AI adoption are not the ones that move fastest. They are the ones that move most deliberately — that build the human infrastructure for AI use at the same pace they build the technical infrastructure, and that learn faster from their AI deployment experience than their competitors do.
AI does not eliminate the need for engineering judgment. It increases the return on engineering judgment that already exists.
→ See also: Pattern 15 — The Future Belongs to Organizations That Learn Faster
→ Related: Engineering · Platform · Patterns