Artificial Intelligence

AI is changing engineering economics more than engineering itself. The judgment required to use it well is the same judgment that was always required. The cost of using it badly is higher.

A Long View

I have been working with AI for nearly thirty-five years. I did the research. I taught the courses. I built the systems and briefed the people who had to act on them.

AI methods research tends to concentrate on a familiar set of things: model worlds (Bayesian networks, knowledge representation, probabilistic reasoning); state search (the classical algorithms from Russell and Norvig); and accuracy shootouts on UCI benchmark datasets, with one paper edging past the last by a point or two. I have worked in all of these traditions and taught them for years. They are real contributions. They are not, by themselves, what gets us to adoption.

Adoption requires acceptance and understanding. It requires social utility — AI that helps people make better decisions in contexts they actually inhabit. Research optimizes for novelty and benchmark performance. Deployment requires trust, legibility, and the ability to interact with a model and reason about what it is telling you. Progress in the first does not automatically transfer to the second.

We are living the consequences of that gap. We are in a FUD cycle — fear, uncertainty, and doubt — and people are reaching for pitchforks and torches. That reaction is not irrational. When AI is deployed as a black box with benchmark slides and no explanation, stakeholders have no basis for confidence. When deployment outruns understanding, backlash follows.

Three traditions have shaped how I think about this divide:

Model worlds — Bayesian belief networks, probabilistic reasoning, knowledge representation. Systems that encode domain structure explicitly and compute consequences from evidence. My master's thesis work was in this tradition: building tools to manage multiply sectioned belief networks.

State search — heuristic search, game trees, constraint satisfaction, logic. I taught these for years using Russell and Norvig's framework. They teach you how to think about problems formally — how to search a well-defined space for an optimal or satisficing answer.

Practical application — taking any of the above and making it useful to someone who needs to decide something today, in an organization with real constraints and stakeholders who have never read a paper on your method.

The first two produce capability. The third produces value. Translation between them is a different skill from invention, and it is the skill the field has consistently underinvested in.

The Working Thesis

AI does not change what good engineering is.

It changes the rate at which ideas can be explored, the cost of generating artifacts, and, critically, where the limiting constraint sits.

For most of software history, the constraint was generation. Writing code was slow. Writing correct code was slower. Writing documented, tested, maintainable code was slower still. AI is compressing that constraint dramatically. Generation is becoming cheap.

What does not compress as easily: judgment. Understanding whether what was generated is correct. Evaluating whether it is appropriate for the context. Determining whether it is safe. Verifying that it satisfies the requirements it was built to satisfy.

The constraint shifts from generation to verification.

This is the central observation that drives how I think about AI in engineering contexts. It has implications for tooling, governance, organizational design, and what skills matter most.

What AI Changes

Economics, before architecture. AI tools change the cost structure of software development before they change what software development looks like. Generation is cheap. The same governance, review, and validation processes that were designed for expensive generation are now applied to cheap generation — which means they are the bottleneck.

The rate of exploration. Ideas that would have required weeks to prototype can be explored in hours. This is genuinely valuable. The organizations that benefit most are those with the judgment to evaluate what they are seeing — to distinguish a promising result from a plausible-looking failure.

The surface area of responsibility. When AI generates code, designs, or analysis, the person who reviews and accepts that output is responsible for its consequences. The surface area of engineering responsibility does not shrink because a model generated the artifact. It may expand, because the volume of artifacts increases.

The signal-to-noise ratio of generated output. AI systems produce confident output across a wide range of quality. Human output usually carries a calibration signal: engineers tend to hedge, flag a concern, or ask for review when they are less sure. AI output does not carry that signal. Evaluating it requires an active skepticism that is not needed when evaluating the work of a colleague who has demonstrated their own judgment.

Interpretability as a Design Requirement

AI systems that produce accurate outputs but resist explanation are less valuable than systems that are slightly less accurate but whose behavior practitioners can understand, challenge, and improve.

This is not a philosophical preference. It is an engineering requirement in any domain where:

Practitioners must make decisions based on model output
Regulators must evaluate whether the system behaves appropriately
Failures must be diagnosed and corrected
Accountability must be assigned when something goes wrong

An opaque model that is 95% accurate is less useful than a transparent model that is 91% accurate if the value of those four percentage points is smaller than the cost of operating without understanding.
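One way to make that comparison concrete is to put both sides in the same units. The sketch below is a hypothetical cost model, not a measured result: the decision volume, the cost of a wrong decision, and the per-decision overhead of working around an opaque model are all invented for illustration.

```python
def expected_cost(accuracy, decisions, cost_per_error, overhead_per_decision=0.0):
    """Expected total cost of acting on a model's output.

    Every parameter here is an illustrative assumption, not a measured value.
    """
    error_cost = (1.0 - accuracy) * decisions * cost_per_error
    overhead = decisions * overhead_per_decision
    return error_cost + overhead


DECISIONS = 10_000       # hypothetical decisions per year
COST_PER_ERROR = 200.0   # hypothetical cost of one wrong decision

# Opaque model: more accurate, but each decision carries extra cost
# (manual double-checks, slower diagnosis, audit effort).
opaque = expected_cost(0.95, DECISIONS, COST_PER_ERROR, overhead_per_decision=10.0)

# Transparent model: four points less accurate, no opacity overhead.
transparent = expected_cost(0.91, DECISIONS, COST_PER_ERROR)

print(f"opaque:      {opaque:,.0f}")
print(f"transparent: {transparent:,.0f}")
```

With these made-up numbers the break-even overhead is 8 per decision (0.04 × 200). Above that, the transparent model is the cheaper one to operate despite its lower accuracy.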

The Provost and the Bayesian Network

In 1999, I briefed the Provost on an enrollment management model we had built for Academic Affairs. She had never seen machine learning before. Her reference frame for institutional data was the bar graphs and tables produced by our Institutional Research and Planning group from SPSS — static summaries of what had already happened.

I was young and, in retrospect, probably bolder than experience warranted. Instead of presenting slides with accuracy metrics, I brought my laptop with Netica loaded. I opened the Bayesian network and invited her to ask questions — not in statistical vocabulary, but by entering observations and watching the belief network recompute in real time. What happens to retention risk if this applicant profile enrolls? What if we change this assumption?

The explanatory power is what built her confidence. When I talked about accuracy and error rates, she had no reference frame for those numbers. When she could see the model, interact with it, and trace how evidence propagated through the structure, it made sense to her. The model became a reasoning partner, not a black box producing a number she was asked to trust.
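The interaction described here, entering an observation and watching beliefs recompute, can be illustrated with a two-node network. This is a minimal sketch in plain Python, not the Netica model from the briefing; the variable names and every probability are hypothetical.

```python
# Hypothetical two-node network: AtRisk -> LowFirstTermGPA.
# All probabilities are invented for illustration.
P_AT_RISK = 0.20                  # prior P(at_risk)
P_LOW_GPA_GIVEN = {
    True: 0.70,                   # P(low_gpa | at_risk)
    False: 0.15,                  # P(low_gpa | not at_risk)
}


def posterior_at_risk(low_gpa_observed):
    """Return P(at_risk | evidence) by Bayes' rule."""
    def likelihood(at_risk):
        p_low = P_LOW_GPA_GIVEN[at_risk]
        return p_low if low_gpa_observed else 1.0 - p_low

    joint_risk = P_AT_RISK * likelihood(True)
    joint_safe = (1.0 - P_AT_RISK) * likelihood(False)
    return joint_risk / (joint_risk + joint_safe)


print(f"prior:                 {P_AT_RISK:.3f}")
print(f"after low GPA seen:    {posterior_at_risk(True):.3f}")
print(f"after normal GPA seen: {posterior_at_risk(False):.3f}")
```

Entering the low-GPA observation moves the belief from the 0.20 prior to roughly 0.54, and the opposite observation moves it down to roughly 0.08. A real network has many more nodes, but the experience for the person asking questions is the same: change the evidence, see the belief move, ask why.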

That lesson has not aged. With large language models, the same dynamic applies at greater scale and higher stakes. Stakeholders who cannot interact with, challenge, or trace the reasoning behind an AI system's output will not adopt it — regardless of benchmark performance. The organizations that deploy AI successfully are not always the ones with the best models. They are the ones that make the models legible to the people who must act on them.

The composite kernel research (AAAI-05) was motivated by exactly this problem: practitioners were choosing SVM kernels by trial and error because the models were opaque. The evolutionary search produced not just better kernels but human-readable formulas that practitioners could inspect, interpret, and apply to adjacent problems. The selection method was theoretically important, but the most valuable artifact may have been a two-dimensional PCA visualization of the error-rate space — the shape that the search induced across kernel combinations. Being able to see the geometry of that space created intuition about where good kernels lived and why. Intuition creates trust. The accuracy improvement was secondary. The interpretability was the primary contribution.
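To show what a human-readable composite kernel can look like, the sketch below combines two standard base kernels with nonnegative weights and carries a printable formula alongside the function. It is a generic illustration, not the search method or the kernels from the AAAI-05 work; the class name, weights, and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly(x, y, degree):
    """Polynomial kernel with a constant offset of 1."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


@dataclass
class CompositeKernel:
    """Weighted sum of an RBF and a polynomial kernel.

    A nonnegative weighted sum of valid kernels is itself a valid kernel,
    so the weights are assumed to be >= 0.
    """
    w_rbf: float
    gamma: float
    w_poly: float
    degree: int

    def __call__(self, x, y):
        return (self.w_rbf * rbf(x, y, self.gamma)
                + self.w_poly * poly(x, y, self.degree))

    def formula(self):
        # The inspectable artifact: a formula a practitioner can read.
        return (f"{self.w_rbf}*exp(-{self.gamma}*||x-y||^2) "
                f"+ {self.w_poly}*(<x,y> + 1)^{self.degree}")


k = CompositeKernel(w_rbf=0.7, gamma=0.5, w_poly=0.3, degree=2)
print(k.formula())
print(k([1.0, 0.0], [0.0, 1.0]))
```

The point of the `formula` method is that whatever procedure picks the weights, the result is something a practitioner can read, question, and reuse on an adjacent problem.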

In IEC 62304 and FDA-regulated environments, interpretability is not optional. AI systems in software as a medical device must produce evidence of their behavior, their training, their validation, and their limitations. A system whose outputs cannot be explained cannot be regulated. A system that cannot be regulated cannot be deployed in clinical contexts.

Governance Is Not a Constraint on AI — It Is What Makes AI Trustworthy

The reflex in technology adoption is to treat governance as drag.

This reflex is understandable and usually wrong.

Governance designed around AI is not a brake on AI adoption. It is the mechanism by which AI becomes trustworthy — which is the condition for sustainable AI adoption at scale.

An organization that deploys AI without governance may move faster in the short term. It will pay for that speed in:

Defects that are hard to diagnose because the system's behavior was never characterized
Regulatory findings that require remediation of work already done
Loss of trust from stakeholders who encounter failures that were not anticipated
Inability to reproduce results, which makes improvement impossible

The right frame is not "governance versus speed." It is "episodic governance versus continuous governance."

Episodic governance — document at the end, review before release — becomes increasingly expensive as generation accelerates. The artifact volume grows. The review burden grows. The distance between when decisions were made and when they are reviewed grows.

Continuous governance — evidence generated during work, traceability built into the workflow, reviews triggered by defined conditions — scales with AI-assisted development in a way that episodic governance does not.
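Evidence generated during work can be as simple as a structured record written at the moment an artifact is produced. The sketch below shows one hypothetical shape for such a record and for a rule-based review trigger. The field names, the requirement IDs, and the trigger rule are assumptions for illustration, not a prescribed schema.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical set of requirement IDs that always require human review.
SAFETY_RELATED = {"REQ-001", "REQ-017"}


def evidence_record(artifact_bytes, requirement_id, author, generated_by_ai):
    """Build a traceability record for one artifact at creation time."""
    return {
        "sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "requirement_id": requirement_id,
        "author": author,
        "generated_by_ai": generated_by_ai,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def needs_review(record, safety_related):
    """Hypothetical trigger: review AI-generated or safety-related work."""
    return record["generated_by_ai"] or record["requirement_id"] in safety_related


record = evidence_record(
    b"def area(r):\n    return 3.14159 * r * r\n",
    requirement_id="REQ-042",
    author="jdoe",
    generated_by_ai=True,
)
print(json.dumps(record, indent=2))
print("review required:", needs_review(record, SAFETY_RELATED))
```

Because the record is produced as a side effect of the work, the review step starts from existing evidence instead of reconstructing it at release time.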

AI in Regulated Environments

Deploying AI in IEC 62304, FDA 510(k), and similar regulatory environments is not primarily a technical problem. It is an evidence problem.

Regulators do not need to know how the model works at the mathematical level. They need to know:

What was the training data? Was it representative of the deployment context?
What was the validation methodology? Were test sets properly isolated from training?
What are the failure modes? How were they characterized?
What is the performance envelope? Under what conditions does the system degrade?
How is the system monitored post-deployment? What triggers revalidation?
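The test-set isolation question is one of the few on this list that can be partly answered by a mechanical check. The sketch below fingerprints each record and reports any test record that also appears in training. It assumes records are flat dictionaries, and the sample data is invented. It catches exact duplicates only; near-duplicates, or the same subject appearing in both splits, need a stronger check.

```python
import hashlib


def fingerprint(record):
    """Stable hash of a flat record, independent of key order."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leaked_records(train, test):
    """Return the test records whose fingerprint also occurs in train."""
    train_prints = {fingerprint(r) for r in train}
    return [r for r in test if fingerprint(r) in train_prints]


# Invented sample data for illustration.
train = [{"id": 1, "x": 0.5}, {"id": 2, "x": 0.7}]
test = [{"x": 0.7, "id": 2}, {"id": 3, "x": 0.1}]

leaks = leaked_records(train, test)
print(f"{len(leaks)} leaked record(s): {leaks}")
```

Run as part of the training pipeline, a check like this turns "were test sets properly isolated" from an assertion into a logged result that can be shown to a reviewer.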

These questions require documentation disciplines that most AI development practices do not naturally produce. They require treating the model's development lifecycle as a design process — with requirements, design controls, verification, and validation — rather than as an empirical search process.

The organizations that will successfully deploy AI in regulated environments are not the ones with the most sophisticated models. They are the ones that can produce complete, credible, auditable evidence of how those models were built, tested, and validated.

This is an organizational capability, not a technical one. It requires the same cross-functional coordination — software, quality, regulatory, clinical — that any other regulated development activity requires. AI is not exempt from the rigor that regulated environments demand. If anything, the opacity of many AI systems makes that rigor more important, not less.

The Human Judgment That Remains

AI raises the rate at which ideas can be explored. It does not raise the rate at which good ideas can be distinguished from bad ones.

The judgment that remains distinctively human:

Contextual appropriateness. Does this solution fit this situation? The model generates something that worked before. The engineer determines whether it applies here.

Risk calibration. How wrong could this be, and what are the consequences? The model cannot answer this question. The engineer must.

Epistemic honesty. Where is the analysis confident, and where is it speculating? The model presents output with uniform confidence. The engineer must recognize where the confidence is unwarranted.

Verification strategy. What would it look like if this were wrong? How would we know? Designing a test that can actually falsify a claim is a human capability that AI tools assist but do not replace.

The decision to act. Ultimately, someone must commit to a course of action and accept responsibility for the outcome. AI informs that decision. It does not make it.

The composite kernel work illustrated this: the evolutionary algorithm explored a vast space of kernel combinations and reliably converged on structures that a practitioner could inspect. The algorithm did not eliminate the practitioner's role. It gave the practitioner something concrete to reason about. That is the correct relationship between AI and engineering judgment.

Agentic Engineering

Agentic AI — systems that plan and execute sequences of actions to accomplish goals — changes the unit of AI interaction from request-response to extended autonomous operation.

This is a genuine capability shift. Tasks that previously required continuous human direction can be delegated. An agent can explore a codebase, identify a defect, propose a fix, run tests, and report results without step-by-step human instruction.

This shift creates new engineering responsibilities:

Specifying intent, not procedure. Human direction of agentic systems requires articulating what success looks like and what constraints apply — not a step-by-step procedure. This is a harder cognitive task than procedure specification, not an easier one.

Scope management. Agents that operate autonomously can take actions with consequences that extend beyond the intended scope. The boundaries of agentic authority — what the agent can do, what it must ask before doing, what it cannot do — must be explicitly defined.

Output verification at scale. An agent that produces output faster than a human can verify creates a backlog of unverified artifacts. The verification challenge identified above becomes acute in agentic contexts.

Auditability. What did the agent do, and why? In regulated environments, this question has a mandatory answer. Agentic systems must maintain auditable records of their reasoning and actions — not just their outputs.
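Scope management and auditability can share one mechanism: every action the agent proposes passes through a policy check, and every check is logged with its rationale. The sketch below is a minimal illustration under stated assumptions. The action names, the three-way allow/ask/deny policy, and the log fields are hypothetical and are not taken from any particular agent framework.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"   # agent may proceed on its own
    ASK = "ask"       # agent must get human approval first
    DENY = "deny"     # agent may never do this


# Hypothetical policy. Anything not listed is denied by default.
POLICY = {
    "read_file": Decision.ALLOW,
    "run_tests": Decision.ALLOW,
    "write_file": Decision.ASK,
    "push_to_main": Decision.DENY,
}

AUDIT_LOG = []


def authorize(action, target, rationale):
    """Look up the policy decision and record it, whatever the outcome."""
    decision = POLICY.get(action, Decision.DENY)
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": target,
        "rationale": rationale,
        "decision": decision.value,
    })
    return decision


authorize("read_file", "src/parser.py", "locate the failing branch")
authorize("write_file", "src/parser.py", "apply proposed fix")
authorize("delete_branch", "feature/x", "clean up after fix")
print(json.dumps(AUDIT_LOG, indent=2))
```

Denying unknown actions by default keeps the boundary explicit, and logging denied and deferred requests alongside allowed ones records what the agent tried to do, not only what it did.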

The early-adopter dynamic applies here. Organizations that build the governance infrastructure for agentic engineering before they deploy it will adapt. Organizations that wait until governance is demanded will scramble.

AI as Organizational Capability

The question "what can we do with AI" is less useful than "what kind of organization do we need to be to use AI well."

AI capability is not primarily a software purchase. It is an organizational capability built from:

Engineers who can evaluate generated output critically
Governance processes designed for continuous evidence generation
Leadership that understands AI well enough to make investment decisions
Quality systems that treat AI-generated artifacts with appropriate scrutiny
Regulatory pathways that have been navigated before they are urgently needed

The organizations that will lead in AI adoption are not the ones that move fastest. They are the ones that move most deliberately — that build the human infrastructure for AI use at the same pace they build the technical infrastructure, and that learn faster from their AI deployment experience than their competitors do.

AI does not eliminate the need for engineering judgment. It increases the return on engineering judgment that already exists.
