The Entanglement Ceiling: Why Multi-Agent AI Will Stall at the Work That Matters Most
Blogger’s Note: This is a synthesis of research on how AI will affect workers. I couldn't find an existing framework for categorizing work by its exposure to AI, so I built one here: a five-level scale of multi-domain entanglement, drawing on prior work in labor economics, multi-agent systems, and tacit knowledge. The post also revisits two earlier workforce transitions for what they can teach us, and asks what higher education's role should be in the transition now beginning. Research sources are linked and included at the end of the article. A practical application of the framework (copy and paste a position description and see its AI exposure by level) is available here. Feedback is welcome, as this is as much an exploring and orienting post as anything else.
A framework for predicting which knowledge work gets automated next — and which doesn't — and what higher education can do to meet the moment.
The question hiding inside the productivity debate
The conversation about AI and white-collar work keeps circling the same question (which jobs go first?) and keeps producing unsatisfying answers. The occupational "exposure" indices disagree with each other by enormous margins. Peer-reviewed analysis has found correlations between leading exposure indices as low as 0.068, and in some cases the measures are anti-correlated across occupations. Something is wrong with the unit of analysis. The unit isn't the job. It isn't even the task. It's the entanglement: how many distinct domains of knowledge, judgment, and context have to be held in mind simultaneously to produce the output.
This piece proposes a five-level scale for multi-domain entanglement, grounds it in the labor economics and multi-agent systems literature, and argues that the current architectural direction of agentic AI — orchestrator plus specialists, handoffs, tool calls — is structurally mismatched with the highest-value remaining human work. Not because the underlying models can't reason across domains. Because the productization is pulling the wrong direction. It then turns to the institutional question of who carries the workforce across this transition, and argues that higher education is being handed not an existential crisis but its most important era.
What "entanglement" actually means
Borrow a metaphor from physics, but only loosely. In quantum systems, entangled particles can't be described independently — measuring one tells you something about the other, and the state of the pair is irreducible to the states of the parts. Knowledge work has the same property at certain altitudes. Some tasks decompose cleanly. Others don't, because the integration is the work, and pulling them apart destroys what made them valuable.
Four dimensions distinguish the levels:
Simultaneity — are the domains active in working memory at the same time, or can they be handled in sequence with clean handoffs? A literature review followed by a draft is sequential. A board memo whose content depends on knowing what three different audiences need to hear is simultaneous.
Context depth — how much of the relevant knowledge is codified versus tacit, written down somewhere or living in the body of someone who has been doing this for fifteen years? Michael Polanyi's old line — "we know more than we can tell" — describes the ceiling that even very capable language models still hit.
Integration locus — is integration the deliverable, or is it just glue between domain outputs? A consulting engagement's deliverable is the integration. A monthly financial report's deliverable is mostly the domain output, with integration as packaging.
Tolerance for handoff loss — what happens when you split the work between specialists and information gets lost between them? Sometimes nothing — the specialists pick it up on the next pass. Sometimes the lost information was the value.
When all four dimensions point toward "high," you have entangled work. When all four point toward "low," you have work that decomposes into a pipeline. Most knowledge work sits somewhere in the middle, which is exactly why exposure indices disagree.
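One way to make the four dimensions concrete is to score a task on each and check whether the scores agree. The sketch below is only an illustration: the 0-to-1 scale, the thresholds, and the names EntanglementProfile and describe are assumptions of mine, not part of any published instrument. The fourth field is named as sensitivity to handoff loss (the inverse of tolerance) so that "high" points the same direction on all four.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EntanglementProfile:
    """Hypothetical scores from 0.0 (low) to 1.0 (high) on the four dimensions."""
    simultaneity: float               # domains active at the same time
    context_depth: float              # share of relevant knowledge that is tacit
    integration_locus: float          # how much the integration is the deliverable
    handoff_loss_sensitivity: float   # how costly information lost in handoffs is

    def scores(self) -> list[float]:
        return [getattr(self, f.name) for f in fields(self)]


def describe(profile: EntanglementProfile, high: float = 0.7, low: float = 0.3) -> str:
    """Label a task by whether its four scores agree (thresholds are assumed)."""
    scores = profile.scores()
    if all(s >= high for s in scores):
        return "entangled"
    if all(s <= low for s in scores):
        return "pipeline"
    return "mixed"


board_memo = EntanglementProfile(0.9, 0.8, 0.9, 0.8)
citation_formatting = EntanglementProfile(0.1, 0.1, 0.0, 0.1)
monthly_report = EntanglementProfile(0.3, 0.6, 0.2, 0.4)

for name, task in [("board memo", board_memo),
                   ("citation formatting", citation_formatting),
                   ("monthly report", monthly_report)]:
    print(name, "->", describe(task))
```

The "mixed" bucket is the interesting one: two raters can score the same job differently on any one dimension and land in different places, which is one plausible reading of why exposure indices diverge.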
The five levels
Level 1 — Single-domain, well-bounded. One domain. Codified knowledge. Clear inputs and outputs. Writing a SQL query, summarizing a PDF, formatting a citation, generating boilerplate code, transcribing audio. The Dallas Fed's analysis of AI's labor market effects captures the dynamic: AI automates codified textbook knowledge cleanly, and this is the bucket where it happens fastest. Current single-purpose agents already cover most of L1.
Level 2 — Multi-domain, sequential, low context. Two or three domains, but the work decomposes cleanly into a pipeline. Research a topic, outline it, draft it, cite it. Pull data from a source, transform it, load it into a dashboard. Triage a ticket, look up the resolution, close the loop. This is what most "agentic" solutions showcase, and it's the architecture the industry has been investing in heavily. Orchestrator-plus-specialist patterns work well here because the work was already a sequence of bounded steps before AI showed up.
Level 3 — Multi-domain, sequential, high context. Same pipeline shape as L2, but each step requires institutional, relational, or domain-specific tacit knowledge that isn't in any document. Mid-level financial analysis at a specific company. Customer success on an enterprise account. Performance review writeups. The technical work is L2; the which-rules-apply-to-whom layer is the L3 markup. AI agents can produce the surface form but routinely miss the context that makes the output usable, which is why people end up re-prompting and patching rather than shipping the first draft. As context windows, memory systems, and connectors mature, this is the bucket where AI will gain the most ground over the next few years.
Level 4 — Multi-domain, simultaneous, integration-as-deliverable. Three or more domains held together at once because the value is the integration. Strategic critiques, organizational change plans, portfolio consolidation recommendations, architecture decisions on brownfield systems, real consulting engagements, founder-stage strategy. Decomposing into specialists destroys what makes the work worth doing. The MIT Sloan group's recent reframing of work as "sequences of interdependent tasks" rather than independent ones captures part of this — but L4 is the case where the interdependence is so dense the sequence collapses into a single integrative act.
This is where current multi-agent architectures actively underperform. Benchmark work on long-horizon planning has produced a striking and somewhat embarrassing finding for the orchestration-everything school: single-agent systems with strong base models sometimes beat orchestrated multi-agent setups, because fewer handoffs mean less information loss and more streamlined reasoning. Complexity is not the same as competence.
Level 5 — Multi-domain, simultaneous, with embodied or relational stakes. L4 plus: the work is partly constituted by who is doing it, the trust relationships, the live read of a room, judgments that aren't fully articulable even after the fact. Therapy. Trial law in front of a jury. Hospice care. Negotiation where the surface topic and the real topic are different. Diplomacy. The actual delivery of a leadership presentation, as distinct from designing it.
L5 isn't just "harder L4." It's a different category. The information needed to do the work doesn't fully exist outside the interaction itself — it's generated in the moment by an embodied participant whose presence is part of the signal.
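The five level definitions can be read as a small decision rule. The following is a minimal sketch under simplifying assumptions: each property is treated as a yes/no flag, any single-domain task is treated as L1, and the Task fields and entanglement_level function are hypothetical names introduced for illustration. Real work is messier than four flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    domains: int                  # number of distinct knowledge domains involved
    simultaneous: bool            # domains must be held in mind at once
    high_context: bool            # depends on tacit or institutional knowledge
    embodied_or_relational: bool  # who does the work is part of the work


def entanglement_level(task: Task) -> int:
    """Map a task to L1-L5 using the level definitions (simplified)."""
    if task.domains <= 1:
        return 1                  # single-domain, well-bounded
    if task.simultaneous:
        # Integration is the deliverable; L5 adds embodied or relational stakes.
        return 5 if task.embodied_or_relational else 4
    # Sequential pipeline: context depth separates L3 from L2.
    return 3 if task.high_context else 2


examples = [
    Task("write a SQL query", 1, False, False, False),
    Task("pull, transform, and load data into a dashboard", 3, False, False, False),
    Task("customer success on an enterprise account", 3, False, True, False),
    Task("architecture decision on a brownfield system", 4, True, True, False),
    Task("trial law in front of a jury", 4, True, True, True),
]

for t in examples:
    print(f"L{entanglement_level(t)}: {t.name}")
```

The ordering of the checks encodes a design choice: simultaneity is tested before context depth, because in this framework a simultaneous task is L4 or above regardless of how much of its knowledge is codified.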
Why current AI architectures struggle above L3
The industry has converged on a pattern. A user-facing orchestrator interprets the request, decomposes it into subtasks, dispatches the subtasks to specialist agents (each tuned for a domain or tool), collects the results, and synthesizes a response. Variants of this pattern power most production agentic systems. This is a beautiful architecture for L1 and L2 work. It's a fine architecture for L3. And it is structurally bad for L4 and above, for three reasons.
The decomposition step is the failure point. L4 work resists decomposition by definition. The orchestrator must either decompose it badly (producing specialist outputs that don't compose back into anything coherent) or refuse to decompose it (in which case the multi-agent architecture adds no value over a single capable model).
Handoffs flatten simultaneity into sequence. When a finance specialist hands off to a change-management specialist who hands off to a technical-feasibility specialist, the resulting sequence cannot reconstruct the simultaneous holding-in-mind that produced the original L4 insight. Each handoff is a lossy compression. Stack enough of them and the original problem disappears.
Coordination overhead compounds. As more specialists are added, the surface area for miscommunication grows. Production case studies have documented degraded performance on complex tasks as orchestration complexity increases — the opposite of what the architecture is supposed to deliver. The most rigorous academic treatment of this, the ICLR 2025 paper by Cemri, Pan, Yang et al., analyzed 150+ tasks across five popular multi-agent systems and identified 14 distinct failure modes, with their core finding being that the failures are architectural, not capability-driven.
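The second and third reasons can be illustrated with a toy model. This sketch assumes that each handoff independently retains a fixed fraction of the relevant context, and that every pair of agents is a potential communication channel. Both are simplifying assumptions, and the 0.9 retention figure is purely illustrative, not a measured value from any of the studies cited here.

```python
def retained_after_handoffs(retention_per_handoff: float, handoffs: int) -> float:
    """Fraction of original context surviving a chain of lossy handoffs.

    Assumes losses are independent and multiplicative (a toy model).
    """
    return retention_per_handoff ** handoffs


def pairwise_channels(agents: int) -> int:
    """Number of distinct agent pairs, i.e. potential miscommunication channels."""
    return agents * (agents - 1) // 2


for n in (1, 3, 5, 10):
    kept = retained_after_handoffs(0.9, n)
    print(f"{n:>2} handoffs at 90% retention each -> {kept:.2f} of context kept")

for agents in (2, 4, 8):
    print(f"{agents} agents -> {pairwise_channels(agents)} pairwise channels")
```

Under these assumptions the retained context decays geometrically with the number of handoffs, while the number of channels grows roughly with the square of the number of agents. Neither number proves anything about real systems, but together they show why adding specialists is not free.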
The wry part of all this is that frontier single-model reasoning is already good at L4 work when given the chance. A capable generalist model with long context and good memory can hold the financial, political, technical, and human dimensions of a problem together in a way that a router-plus-specialists configuration mechanically cannot. The bottleneck is architectural choice, not model capability.
The labor economics literature is converging on the same shape
The economics of automation has been quietly developing the vocabulary this framework needs. A 2025 review of AI and labor that synthesized dozens of studies converged on four dimensions for distinguishing simple from complex tasks: knowledge requirements, clarity of goals, task interdependence, and context requirements. Tasks scoring high on at least three are where AI exposure breaks down — almost a direct mapping onto the entanglement framework above.
The NBER working paper Chaining Tasks, Redefining Work by Demirer, Horton, Immorlica, and Lucier models AI as an autonomous problem solver embedded in a Garicano-style knowledge hierarchy, with coordination frictions arising specifically from tacit and social knowledge that AI can't easily execute. The authors describe how AI restructures the division of labor by pulling adjacent steps "under the hood" of a single AI-executed task — which is exactly what happens at L1 and L2 — while leaving the cross-boundary coordination problems harder than ever.
There is also a generational story embedded in this. A separate 2025 working paper by Enrique Ide on the intergenerational transmission of tacit knowledge argues that automating entry-level work (largely L1 and L2) may sever the apprenticeship channel through which workers develop the judgment needed for L4 work. Novices acquired tacit expertise by doing the low-end work alongside experts; if the low-end work disappears, the pipeline to expertise narrows. Ide's back-of-envelope estimate is that AI-driven entry-level automation could reduce U.S. long-run growth by 0.05 to 0.35 percentage points per year. This is a non-obvious and somewhat alarming implication of the entanglement framework: even if L4 work itself stays human, the training ground for L4 work erodes.
The Dallas Fed reads the wage data and arrives at a related conclusion: AI-exposed occupations that place a high value on tacit knowledge and experience are seeing wages rise, while younger workers in the same occupations are seeing employment pressure. The empirical signal matches the theoretical prediction — AI substitutes at the codified end and complements at the tacit end, which is just another way of saying it replaces L1 and L2 while leaving L4 alone, for now.
The debate worth having
The framework is not without counterarguments.
Counterargument 1: "A capable enough single model will replace everything anyway, multi-agent debates aside." This is plausible. The entanglement ceiling is a ceiling for a particular LLM architectural choice, not for AI in general. A future model with sufficient context, memory, and reasoning may handle L4 work directly — without orchestrators, without specialists, just one mind holding all the domains. If that happens, the entanglement framework still predicts what gets automated first (L1 → L2 → L3) but the ceiling moves up the stack faster than the multi-agent literature would suggest. The question becomes empirical: does single-model capability scale faster than coordination overhead grows?
Counterargument 2: "L4 work is rarer than this framework implies; most L4 knowledge work is L2/L3 in disguise." Also plausible, and worth taking seriously. A lot of work that feels integrative on the inside is, on inspection, a habitual sequence the worker no longer consciously decomposes. Tacit pipeline beats explicit pipeline, but it's still a pipeline. The honest test is the one a manager does instinctively: if I split this across three specialists who don't talk to each other, does the result still work? If yes, it was L2 or L3 wearing L4 clothing.
Counterargument 3: "The framework underrates relational/L5 work by lumping it in at the top." L5 is not just harder L4; it's a category change, and treating it as the top of the same scale flattens something important. A future version of this framework probably needs to split the simultaneity dimension from the embodiment dimension and let them vary independently.
Counterargument 4: "This is just Polanyi's Paradox restated." Partly. Polanyi's "we know more than we can tell" is the deep root, and the tacit-knowledge frame in the economics literature is its direct descendant. What's new here is the architectural observation — that even capable models can be hobbled by the way they're deployed — and the practical implication that the choice between "one strong generalist" and "many specialists in a workflow" is not a neutral engineering decision but a bet on what work is worth doing.
What this predicts and what to do about it
AI agents will absorb L1 and L2 work fastest. They are doing so now. L3 work will follow as memory and context infrastructure matures — probably faster than most workers expect, slower than vendors claim. L4 work will remain structurally hard for longer than the current hype suggests, because the architecture is pulling against the grain of the task.
For workers, the practical question isn't "when will agents replace this?" It's "which of my tasks am I incorrectly splitting into a pipeline when I should be holding them together?" The L4 work is where human authorship still matters — and where, paradoxically, working with a single capable model in a long, unbroken conversation may be more valuable than orchestrating a fleet of specialists.
For organizations, the question is whether the apprenticeship pipeline survives the automation of the lower levels. If novices never do the L1/L2 work that used to teach them judgment, where do the L4 practitioners of the next decade come from?
For tool builders, the question is harder still. The architectural defaults of the agentic-AI moment — decomposition, handoff, specialization — are the right answer for the work that's already being automated. They may be the wrong answer for the work that will remain. The next generation of useful tools may look less like an orchestra and more like a single deep conversation that doesn't end.
A historical analogy worth sitting with
The argument so far has a problem. If AI absorbs L1 and L2 work and the L3 work follows behind it, what happens to all the people who were doing that work? The implicit answer in the framework above is that they move up the entanglement stack — into L3 and L4 roles where human judgment still matters. And on its face that answer is laughable. There aren't enough L4 roles in the world to absorb the L1 and L2 workforce. Most people don't want to be management consultants. The math doesn't math. This is the same objection that was made — correctly, on the numbers available at the time — about manufacturing in the mid-twentieth century.
In May 1953, manufacturing employment in the United States peaked at 32 percent of all nonfarm jobs. Roughly one in three working Americans went to a factory every morning. The service industries that we now think of as the post-industrial economy — health care, education, professional services, hospitality, legal work, social services — collectively employed about 10 percent of the workforce. If you had stood in 1953 and asked "what happens when factory automation comes for these jobs?", the honest answer would have been: there is no realistic destination for thirty million displaced workers. The service economy was a third the size of the manufacturing industry. The transition would be impossible.
By 2019, manufacturing employment had fallen to roughly 9 percent of nonfarm jobs, and the broader service-providing sector — including the categories above plus trade, transportation, finance, and government — employed about 86 percent. The transition that looked impossible in 1953 happened. It happened slowly, painfully, unevenly, and at significant cost to specific people and specific towns. The aggregate move was nonetheless real and it absorbed an entire workforce.
The point of the analogy is not that the AI transition will be painless, or even that it will work out cleanly. The point is that the apparent impossibility of absorption is not evidence that absorption won't happen. The post-war observer staring at a 32-to-10 ratio and concluding that displaced manufacturing workers had nowhere to go was making a category error. They were holding the destination service economy constant while imagining the origin manufacturing economy shrinking; the destination economy didn't stay constant. New categories of work emerged. Old categories grew enormously. The very shape of "where people work" reorganized around the productivity gains the automation produced.
The same category error is available to us now. If you hold the L3 and L4 economy constant and imagine the L1 and L2 economy shrinking, you get a math problem with no solution. But the L3 and L4 economy will not stay constant. The productivity gains from automating the lower levels will create demand for more integrative work, not less. The shape of L4 work will itself change — what counted as L4 in 2025 may be ambient by 2045, and entirely new categories of integration we can't currently name will exist above it. The destination economy is a moving target, and it moves in the direction of the automation pressure, not against it.
How that transition actually went for the people inside it
The manufacturing-to-services analogy is useful only to the degree that we are honest about how that transition actually worked. The popular memory of it splits into two narratives that talk past each other, and neither narrative on its own captures what the data shows.
The first narrative — call it the optimist story — points to the long arc of improvement. Real median family income roughly doubled between 1950 and the late 1970s. Life expectancy rose from 68.2 years in 1950 to 79.25 years by 2024 — about eleven additional years of life for the average American. Household goods got cheaper relative to wages, often dramatically so. Vehicles per household, square footage of housing, consumer technology — almost every material measure of standard of living rose substantially. By aggregate measures, the average American in 2020 was meaningfully better off than the average American in 1953.
The second narrative — call it the pessimist story — points to the part of the data the optimist version skips. For roughly the first twenty-five years of the manufacturing-to-services transition, from 1948 to 1973, worker pay and productivity grew in lockstep. Hourly compensation for typical workers rose about 91 percent in that period, almost exactly matching productivity growth of 97 percent. The post-war prosperity that gets remembered as "the good old days" was, on the data, a real thing: broad-based gains, rising wages tracking rising output, a labor market in which the median worker actually shared in the productivity dividend.
Then, starting around 1973, the two lines decoupled. Productivity kept rising. The median worker's pay did not. The Economic Policy Institute, which has tracked this divergence for two decades, finds that from 1979 to 2025 productivity rose roughly 90 percent while typical hourly compensation rose only about 33 percent. Across the full 1973-to-2014 stretch, productivity grew 74.4 percent against compensation growth of 9.2 percent for the median worker. The gains the economy generated kept accruing — they simply stopped reaching most of the people producing them. Wage inequality rose. The wealth gap between the top decile and the median widened. The "good old days" gave way to a long period in which aggregate prosperity rose, individual standard of living for the median worker rose more slowly, and the political and economic distance between winners and losers grew.
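The size of the decoupling is easier to see as a single ratio. The sketch below takes the growth figures quoted in the text and computes how much of the productivity gain the typical worker's pay kept pace with. The ratio itself is an illustrative summary of mine, not a statistic EPI publishes, and the function name is hypothetical.

```python
def pay_to_productivity_ratio(productivity_growth_pct: float,
                              compensation_growth_pct: float) -> float:
    """End-of-period pay index divided by end-of-period productivity index.

    1.0 means pay kept pace with productivity; lower means pay fell behind.
    """
    pay_index = 1 + compensation_growth_pct / 100
    productivity_index = 1 + productivity_growth_pct / 100
    return pay_index / productivity_index


# (productivity growth %, compensation growth %) as quoted in the text
periods = {
    "1948-1973": (97.0, 91.0),
    "1973-2014": (74.4, 9.2),
    "1979-2025": (90.0, 33.0),
}

for label, (productivity, compensation) in periods.items():
    ratio = pay_to_productivity_ratio(productivity, compensation)
    print(f"{label}: pay reached {ratio:.2f} of where productivity went")
```

On the quoted figures, pay ends the 1948-1973 period at about 0.97 of the productivity index, and the later periods at roughly 0.63 and 0.70.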
Both narratives are right about what they describe. The full picture requires holding them together: the manufacturing-to-services transition produced enormous aggregate gains and failed to distribute them broadly after about 1973. Median Americans today are materially better off than median Americans in 1953 by almost every measure. They are also further behind the top of the distribution than their grandparents were, and the share of national productivity growth that reaches median workers has been declining for half a century. The transition worked. It also, in a specific and important sense, didn't.
The reason this matters for the AI-and-entanglement argument is that the second half of that story is not inevitable. The decoupling of productivity and median pay after 1973 was not a force of nature. It had causes — the erosion of collective bargaining is the most-cited single factor in the labor-economics literature, though globalization, monetary policy, technology, and the decline of industry-wide wage-setting all contributed. Different choices, applied at different moments, could have produced a different distributional outcome on the same aggregate trajectory.
This is the honest version of the historical analogy. The manufacturing-to-services transition shows that an apparently impossible workforce absorption is in fact possible — the destination economy did absorb the displaced workers, and aggregate prosperity rose substantially in the process. It also shows that aggregate absorption is not the same as broad-based gains. The transition can happen, the economy can grow, total wealth can rise, and yet the median worker can experience that growth as decades of stagnation while the gains pool at the top.
If we run the AI transition the way we ran the manufacturing transition, the optimist case is that the destination economy will exist and absorb the displaced workers. The pessimist case is that the destination economy will exist and absorb the displaced workers, and the gains will once again concentrate at the top while the median worker watches productivity rise for forty years without participating in it. Both can be true. Both have been true before.
The framework I have been building (the move toward an L3- and L4-capable workforce, the redesign of education to produce that capability, the institutional response that will or will not happen) is in part a structural answer to the distributional problem. If the L3 economy is wide and deep, the gains it produces have somewhere to be distributed. If the L3 economy is narrow and gate-kept by elite institutions, the distributional pattern of the last fifty years repeats itself, only more sharply, because the entanglement gap between AI-augmented L4 workers and everyone else may turn out to be wider than the manufacturing-services gap ever was.
The aggregate transition will probably happen either way. Who participates in its gains is the open question, and it is the question higher education's response in the next decade will go a long way toward answering.
A second example: NAFTA and the China shock
The manufacturing-to-services transition was not the last time the American economy went through a structural shift of this magnitude. There was a second one, beginning in 1994 with NAFTA and accelerating sharply after China's accession to the World Trade Organization in 2001. The same logic should have applied: jobs displaced in one part of the economy, opportunities created in another, retrain the workers, grow the destination economy, mediate the transition. The same opportunity to do it well. The labor economics evidence on this is unsparing, and worth sitting with before we make claims about what the AI transition will look like.
The headline numbers first. The China shock alone, meaning the surge of Chinese manufacturing imports following WTO accession, eliminated roughly 2 million U.S. jobs between 1999 and 2011, including approximately 985,000 in manufacturing. The job losses were concentrated geographically in the South Atlantic, the Deep South, and the industrial Midwest. NAFTA contributed an additional displacement that EPI estimates at approximately 683,000 jobs, roughly 60 percent of them in manufacturing. The aggregate economy benefited modestly from cheaper consumer goods. Specific places and specific workers paid the bill.
What is striking, and what the economics literature has spent the last fifteen years documenting, is how little the textbook adjustment story actually happened. The expectation — the implicit assumption underlying both the trade agreements and the political case for them — was that displaced workers would move to other regions, retrain into other industries, and the destination economy would absorb them the way the post-war services economy had absorbed the displaced manufacturing workers of the 1950s and 1960s. The reality was nothing like that. Autor, Dorn, and Hanson found that displaced manufacturing workers did not move, did not retrain at scale, and did not transition into other sectors. They became unemployed or left the labor force, particularly workers without a college education and lower-earning workers. Adjustment in local labor markets was "remarkably slow," with wages and labor-force participation rates depressed and unemployment elevated for at least a full decade after the shock began.
The 2021 follow-up paper found the damage persisting for two full decades, through 2019, twenty-five years after NAFTA took effect. The economy adjusted at the aggregate level. The workers, individually, did not.
At the human level, this looks like communities where unemployment spiked into double digits for a decade. It looks like former manufacturing workers taking minimum-wage service jobs, retiring early, or simply leaving the labor force entirely. It looks, in the best-documented and most-cited literature in this space, like the "deaths of despair" pattern Anne Case and Angus Deaton traced through these same communities: suicide, opioid overdose, alcoholic liver disease, all elevated in places hit hardest by the trade-driven manufacturing decline. The Autor-Dorn-Hanson research has subsequently extended into the social consequences as well: declining marriage rates, rising single parenthood, increased mortality from drug and alcohol abuse, greater uptake of government transfers.
Why didn't the retraining work?
The retraining infrastructure existed. The Trade Adjustment Assistance program had been on the books since 1962 and was specifically expanded to cover NAFTA dislocations in 1993. It provided up to two years of extended unemployment insurance, paid retraining, relocation allowances, and wage-insurance benefits for older workers. The policy mechanism was real.
The evidence on its effectiveness is more nuanced than either the "retraining failed" or "retraining worked" narratives suggest, and the nuance is what matters for the AI argument. The Mathematica Policy Research evaluation of TAA found that participation had small or non-significant impacts on aggregate earnings — and some studies, like Reynolds and Palatucci (2008), found participants experiencing wage losses up to 10 percentage points greater than non-participants. The Heritage Foundation and other critics pointed to these findings as evidence the program should be eliminated.
But the more recent and more rigorous causal-identification work tells a different story. Hyman's 2018 paper using quasi-random assignment to TAA — comparing approved versus denied applicants — found that TAA recipients earned roughly $50,000 more in cumulative earnings over ten years than non-recipients. The program worked when it was applied. The earlier studies had been comparing apples to oranges, conflating selection effects with program effects.
There is a sharper diagnosis hidden inside the contradictory findings, and it is the one most relevant to the AI moment. TAA's failures were not failures of pedagogy or design. They were failures of three specific things:
Scale. The program was always too small. The eligibility criteria required workers to be certified as having lost their jobs specifically because of trade — a determination that took months and was frequently denied. The displaced workforce was vastly larger than the institutional capacity allocated to serving it. Even the U.S. Government Accountability Office, in its 2001 review, noted that training participation rates were low and that 75 percent of workers who left the programs found jobs but "many earned far less than their prior salaries." The program reached a small fraction of the workers it was supposed to serve.
Targeting. TAA was structured as a special program for a narrow category of workers. This created perverse incentives — workers had to prove trade causation specifically, while workers displaced by automation or other forces received no equivalent support. As a recent Searchlight Institute synthesis put it in the context of AI-driven displacement: "Congress can help the U.S. workforce retool without repeating the mistakes of TAA" — specifically by building robust universal workforce systems rather than special-case programs that benefit only workers who can prove their jobs were eliminated by a specific named cause.
Political stability. The program was repeatedly reauthorized at the last minute, expanded and contracted across administrations, and never accumulated the institutional weight that a serious retraining infrastructure requires. When public opinion soured on trade in the mid-2010s, the program's political coalition fractured. The result was a workforce-development apparatus that was simultaneously too small to handle the actual displacement and too politically marginal to grow into the role it needed to fill.
What the China Shock authors themselves recommend
The most useful voices on this are the authors of the China Shock papers themselves. In interviews following the 2021 follow-up paper, Gordon Hanson, co-author of the entire China Shock series, was direct: "Nothing in our paper challenges the idea that free trade raises gross domestic product. The question is what public policies do we want in place to ensure freer trade doesn't generate concentrated pockets of hardship." The recommended responses are place-based: regionally tailored training programs, expanded unemployment insurance and earned income tax credits in affected labor markets, and direct assistance to employers in declining regions. The diagnosis was never anti-trade. It was anti-laissez-faire-about-the-transition.
The CSIS Big Data China review of the literature reaches the same synthesis: the economists who documented the China Shock most rigorously do not recommend tariffs as the cure. They recommend higher education, worker retraining, and government transfers, deployed at a scale commensurate with the disruption. The program designs were never broken. The political will to build them at scale was.
Five lessons for the AI transition
If we apply this to the AI moment, five lessons emerge.
First: aggregate absorption can coexist with individual devastation. The 1950s-to-1970s lesson was that even when the destination economy is much smaller than the origin economy, absorption can happen. The NAFTA-and-China-shock lesson is that absorption at the aggregate level does not mean the displaced workers themselves transition. They can exit the labor force, take lower-paying service jobs, or simply not be counted in the statistics, while the economy as a whole "absorbs" the shock through other channels. The aggregate number masks individual outcomes. Any honest version of "the AI transition will be absorbed" has to be explicit about whether it means the economy adjusts or whether it means the displaced workers do.
Second: the speed of the shock matters more than its size. The post-war manufacturing-to-services shift took roughly seventy years and moved across three generations. The China shock concentrated most of its damage into a single decade after 2001. Workers had no time to retrain, communities had no time to diversify, and the institutions of mediation had no time to scale. The AI transition, if Amodei and others are right about the timeline, will be faster still: perhaps a single generation, perhaps less. Speed compounds the adjustment problem. The faster the shock, the more inadequate any retraining infrastructure will be relative to the displacement it has to mediate.
Third: place matters in ways the aggregate doesn't capture. The Rust Belt's decline was geographically concentrated in ways that made the national absorption story misleading at the local level. Workers cannot simply move to where the jobs are; family, housing, community, and the practical economics of selling a home in a declining market all create stickiness that economic models tend to under-weight. The AI transition will have its own geography. Some places will be hit harder than others. The L3 economy will be unevenly distributed. If the institutions of mediation are not geographically distributed to match — if the L3 educational capacity is concentrated in the same places where the L4 economy is already concentrated — the geographic inequality of the manufacturing transition will be reproduced in the AI transition, perhaps more sharply.
Fourth: working retraining models can fail at scale. TAA was a working program by the evidence of its participants' outcomes. It also reached only a fraction of the workers it was supposed to serve. The lesson is not that retraining doesn't work; the lesson is that under-deployed retraining produces a small group of participants who do well and a large group of non-participants whose data feeds the political narrative that "retraining doesn't work." This is the most dangerous version of failure for the AI transition: a high-quality higher-education response that reaches the top quintile of the workforce and leaves the rest in the same position as the displaced manufacturing workers of 2005, while the data on the participants is used to argue that the system is functioning. The reach of the response matters as much as its quality.
Fifth: political legitimacy follows distributional outcomes. The political backlash to the China shock — populism, anti-trade sentiment, the conditions that produced the 2016 election and have continued shaping American politics ever since — was a direct consequence of the perception, accurate on the data, that the gains of trade had been distributed to one set of people and the costs to another. The AI transition is poised to produce a similar political dynamic on a faster timeline. If the L3 economy is gate-kept by elite institutions and the L1/L2 workforce is left to absorb the displacement without commensurate institutional support, the political consequences will not be confined to economic underperformance. They will reshape the political coalitions that determine whether any coordinated response is possible.
These lessons are not arguments against the AI transition. They are arguments about what mediating the transition would require, and they are warnings about what happens when the mediating institutions fail to rise to the scale of the moment. The higher-education response I have been describing in this essay is not a sufficient answer on its own. It is, however, the necessary first move — and the NAFTA-and-China-shock evidence makes clear what happens when even the necessary moves are not made at sufficient scale.
We had this opportunity once before, twenty-five years ago, with workers we could see and a problem we could measure. We mostly failed them — not because the policy ideas were wrong, but because we never built the institutions at the scale the moment required. The same opportunity is in front of us now, with a workforce we can see and a problem we can measure. The question is whether we have learned anything.
Higher education's existential moment
The entanglement framework lands hardest on the institution most responsible for producing the workers who do entanglement-rated work: colleges and universities.
The data is no longer ambiguous. Only 30 percent of 2025 college graduates secured a full-time job in their field of study, down from 41 percent in 2024. Entry-level and internship postings dropped 15 percent year-over-year while applications per posting rose 26 percent. Unemployment for young bachelor's-degree holders climbed to 5.7 percent — more than a full percentage point above the national average and, for the first time in recent memory, higher than what high-school graduates face. The CEO of Anthropic has publicly warned that AI could eliminate up to half of entry-level white-collar jobs within five years.
This is the empirical reality colleges are now sitting inside. Ohio State's Office of Academic Affairs used the phrase "existential issue" in a January 2026 piece on the topic. Columbia's provost convened a forum titled "Reimagining Teaching and Learning in the Age of AI." Fortune published a piece asking how colleges can fill the gap left by vanishing entry-level work. The conversation has moved from "should we worry about this?" to "what do we do?" in roughly eighteen months.
The answer is not "less college." It's "different college." And the entanglement framework tells us what different should mean.
The traditional contract is broken
The implicit deal between higher education and the labor market for the past half-century was straightforward. Colleges produced graduates with codified knowledge — the textbook learning Polanyi distinguished from tacit knowledge — and employers provided the apprenticeship that turned that codified knowledge into expert judgment. Students learned the L1 and L2 substrate in the classroom. They learned L3 and L4 on the job, slowly, through the mentorship of senior people who had time to teach because the firm had time to wait.
That contract assumed entry-level work existed in sufficient quantity to absorb new graduates and pay them while they learned the unwritten parts of their profession. The entry-level jobs are now disappearing precisely because AI is competent at the codified-knowledge tasks that defined them. The traditional deal — trading rote labor for mentorship — is, as one recent analysis put it, "dead, and if not quite dead yet, it really is dying."
The dying is asymmetric. AI absorbed the L1 work first. The L2 work followed. The L3 work is partially exposed and partially protected, depending on how much institutional context each step requires. The L4 work is mostly intact. But the training ground — the entry-level positions where a new graduate used to learn judgment by doing the easier work alongside someone who had judgment — has collapsed faster than the senior work has. We've automated the bottom of the ladder and left the top of the ladder hanging in the air.
This is the apprenticeship-pipeline problem the Ide working paper formalized, now visible in the BLS data. And the institution best positioned to address it — to fill the gap left by the dying training-ground — is the one currently being declared obsolete.
The reframe: from L1 production to L3 launch
Higher education's traditional outcome goal was to produce graduates capable of doing entry-level work — to get students to an L1 or, in the more professional disciplines, an L2 level of competence. The implicit assumption was that L3 and L4 capability would develop later, on someone else's time, with someone else's investment.
That assumption no longer holds. If entry-level work is shrinking, the firm-side investment in turning L2 graduates into L3 practitioners is shrinking with it. The economics of taking a chance on a recent graduate are different when the AI can already do most of what the graduate would have done in their first eighteen months.
The strategic move for higher education is to redefine the outcome goal: stop graduating students at the L1/L2 level and start graduating them at the threshold of L3. The institution that used to deliver "ready to do the work" must now deliver "ready to do the work that AI can't do."
What does L3-ready look like in practice? It means a graduate who has not just learned the codified content of their field but has been put repeatedly in situations that require:
Judgment under uncertainty. The L3 layer is where rules stop being sufficient and context starts mattering. Students need extensive practice making calls where no answer is obviously right and the cost of getting it wrong is real enough to remember. Case-based learning, simulations, internships with actual stakes, capstone projects with external clients — the apparatus for this exists; it just needs to become the spine of the degree rather than its embellishment.
Institutional and relational fluency. Tacit knowledge can't be transferred through textbooks, but it can be cultivated through repeated immersion in environments that have norms, hierarchies, and unwritten rules. Practicum requirements, embedded fieldwork, longitudinal mentorship, multi-semester team projects with consistent stakeholders — anything that forces students to learn the contextual dimension that distinguishes L3 from L2.
Integration across domains. The defining feature of L4 work is holding multiple domains in mind at once. The defining feature of an L3-ready graduate is having practiced the habit of multi-domain reasoning, even on smaller problems. Interdisciplinary capstones, dual-track majors that actually require synthesis (not just credit-hour totals), problem-based curricula where the assignments span fields — the formats exist. Most are currently treated as enrichment rather than core.
Critical evaluation of AI output. The graduate of 2030 will spend most of their working life reviewing AI work product, not producing the equivalent work themselves. The capacity to read AI output critically — to notice when it's confidently wrong, when it has missed context, when it has produced something polished but useless — is itself a domain that needs to be taught explicitly. Some of the more thoughtful pedagogical work in 2025 has been focused on exactly this: treating AI not as a shortcut but as an object of critical inquiry, what one EDUCAUSE Review synthesis called the cultivation of "evaluative judgment."
Communication and disagreement. L3 and L4 work happens in conversation with other humans whose models of the world are different. The ability to be specific, to push back productively, to bring an idea into contact with disagreement and have it survive — these are old liberal-arts virtues that turn out to be exactly the capacities the L4 economy will pay for. The cruelest irony of the current moment is that the humanities departments under the most budgetary pressure are the ones teaching the skills the labor market is about to start valuing most.
Why this is harder than it sounds
The argument has to acknowledge that graduating every undergraduate at L3-readiness is not a marginal adjustment to current practice. It is a fundamental redesign of what a degree does.
This is where the higher-ed conversation gets uncomfortable. The institutions best positioned to make this transition (small, well-resourced, selective) are not the institutions that educate the majority of American undergraduates. The institutions that educate the majority (large publics, regional comprehensives, community colleges, broad-access universities) are precisely the institutions where the per-student investment required to produce L3-ready graduates is hardest to find. If the L1/L2-to-L3 reframe is the right strategic move, the resources required to make it real are not evenly distributed.
This is not a reason to abandon the reframe. It is a reason to take seriously the redistribution of resources, the redesign of credentials, the rethinking of what counts as a college, and the rebuilding of the apprenticeship infrastructure that the labor market has stopped providing. Some of this will happen inside existing colleges and universities. Some will happen through new institutional forms — apprenticeship networks, project-based credential programs, employer-funded fellowship pipelines, certifications that mean something. The pieces are being built; they're just not yet legible as "college."
The case for higher education is stronger when AI performs the L1/L2 layer, not weaker, because someone has to develop the L3 judgment that AI can't, and the institution best positioned to do that work is the one that has been doing it for centuries.
Course structure: remove and replace, not append
The strategic reframe from L1/L2 production to L3 launch is the easy part. The harder question is what curriculum redesign actually looks like in practice — and the honest answer is unsettling for most institutions.
The instinct of most faculty, faced with the L1/L2-to-L3 reframe, will be to add L3 instruction on top of what they already teach. A capstone seminar. A practicum. An "applied" elective. A required AI-ethics module. These are real and useful, but they are not the redesign the moment requires.
The honest version of the redesign is harder, and it asks something difficult of higher-education faculty: teach less, in service of teaching deeper. The L1 and L2 content that currently fills the credit hours of most undergraduate degrees cannot all stay. Some of it has to come out so that the L3 work can occupy the curricular real estate it needs. This is removal and replacement, not addition.
The reason this is the right move — and not just an easier-sounding move — has thirty years of evidence behind it.
What the medical-education experiment showed
Medical education ran the natural experiment for this question starting in 1969, when McMaster University introduced problem-based learning (PBL) and other medical schools watched to see what happened. By the early 1990s, over 70 percent of U.S. medical schools had incorporated PBL into their curricula. By the time Harvard launched its Pathways curriculum in the 2010s, the medical-education world had decades of data on what happens when you replace lecture-based basic-science instruction with case-based, integrated, problem-driven learning.
The results were not subtle. A 2022 scoping review of 124 studies published in BMC Medical Education found that PBL students performed at least as well as lecture-taught students on knowledge acquisition — and substantially better on problem-solving, self-directed learning, and integration of basic and clinical sciences. A meta-analysis of physicians who graduated from PBL versus traditional curricula found that PBL graduates had greater competencies "especially in communication skills, coping with uncertainty, and self-directed continuing learning." The same source documents that one year after taking basic-science courses, knowledge loss in traditional curricula ranged from 13 to 47 percent. PBL students lost less because the knowledge was acquired in the context of working problems, not as decontextualized facts.
The pattern is consistent across thirty years and six continents: when basic-science content is taught in service of clinical reasoning rather than as a prerequisite to it, students learn the basic-science content at least as well, retain it longer, and develop the judgment skills that the prerequisite-first approach was supposed to enable but rarely did.
The implication for the L1/L2-to-L3 reframe is direct. The fear that "we can't teach L3 without first teaching all the L1/L2 content" is the same fear medical educators had in 1969 about teaching clinical reasoning without first teaching all of anatomy. It turned out to be wrong. The L1/L2 content gets learned anyway, more durably, when it's taught in service of L3-grade problems rather than as a separate phase.
The "twin sins" of traditional course design
Grant Wiggins and Jay McTighe, in their foundational Understanding by Design framework, named the failure modes of traditional curriculum design with unusual precision. They called them the "twin sins."
The first sin is activity-oriented design — what they described as "hands-on without being minds-on." Curricula that consist of engaging activities and experiences that "do not lead anywhere intellectually." Every faculty member has seen this and most have produced it. A clever assignment, a popular module, a memorable demonstration that students enjoyed and learned nothing meaningful from.
The second sin, more directly relevant to the L1/L2 problem, is coverage — "an approach in which students march through textbook content with little priority among the topics, often with little real understanding." Coverage is what happens when a faculty member's primary planning question is "how do I get through everything in the syllabus?" rather than "what do students need to be able to do at the end of this course?" Coverage is the institutional default. It is also incompatible with developing L3 capability, because L3 capability is what gets squeezed out when coverage gets prioritized.
Wiggins's most-quoted line, and the one most relevant here, is: "Facts and skills are means, not ends." The standard L1/L2 curriculum treats facts and skills as ends. Get the students to recall the facts, perform the skills, pass the exam. The L3 reframe requires treating facts and skills as means — as the things students develop on the way to doing something integrative with them.
This is why the redesign cannot be additive. If you keep coverage as the curricular spine and add L3 instruction at the edges, the coverage will crowd out the L3 work. Coverage always wins the time fight because it has the inertia of the existing syllabus, the existing assessments, the existing faculty expertise, and the existing assumption that "we have to teach all of this." The only way L3 instruction gets the time it needs is by removing some of what currently fills that time.
Backward design and the practical question of what to remove
Wiggins and McTighe's backward-design framework offers the operational answer to "what should come out?" The framework is a three-stage planning sequence:
Identify the desired learning outcomes — what students should be able to do, understand, and judge at the end of the course.
Determine acceptable evidence — how students will demonstrate they can do, understand, and judge those things.
Plan the learning experiences and instruction that produce that evidence.
The discipline this imposes is brutal in a useful way. Every piece of content in the current syllabus has to justify itself against the outcomes. The questions are: does this content, taught this way, contribute to a learning outcome that matters? Is there evidence the student will actually need this in the integrative work the course is designed to prepare them for? If the answer is no — and it will frequently be no — the content comes out.
What survives a backward-design audit of an existing L1/L2-heavy course is typically a much shorter list of foundational concepts taught in much greater depth, integrated into problem contexts that require their use. The content that doesn't survive is the long tail of "we cover this because we've always covered this," which is precisely the content AI is most capable of providing on demand when a student needs it during an integrative task.
This is the operational form of remove-and-replace. It is not a vague pedagogical preference; it is a specific planning sequence with thirty years of evidence behind it and a published framework that thousands of faculty have already used. The barriers to applying it are institutional and political, not pedagogical.
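For readers who think in code, the audit step can be sketched in a few lines of Python. This is a minimal illustration and not part of the Wiggins and McTighe framework itself: the `Topic` structure, the `audit` function, the outcome names, and the sample syllabus are all hypothetical, and a real audit would rest on faculty judgment about evidence rather than a set lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One item from an existing syllabus (hypothetical structure)."""
    name: str
    hours: float
    # Outcomes this topic demonstrably supports; empty means "covered by habit".
    supports: set[str] = field(default_factory=set)


def audit(topics: list[Topic], outcomes: set[str]) -> tuple[list[Topic], list[Topic]]:
    """Split topics into (keep, remove) by whether they serve a stated outcome."""
    keep = [t for t in topics if t.supports & outcomes]
    remove = [t for t in topics if not (t.supports & outcomes)]
    return keep, remove


# Illustrative data only; not drawn from any real course.
outcomes = {"diagnose a failing process", "defend a recommendation"}
syllabus = [
    Topic("control charts", 6, {"diagnose a failing process"}),
    Topic("history of quality movements", 4),
    Topic("memo writing under critique", 5, {"defend a recommendation"}),
    Topic("glossary memorization", 3, {"recall terminology"}),
]

keep, remove = audit(syllabus, outcomes)
freed_hours = sum(t.hours for t in remove)

print("keep:", [t.name for t in keep])
print("remove:", [t.name for t in remove])
print("hours freed for integrative work:", freed_hours)
```

The point of the sketch is the last number: the hours attached to topics that serve no stated outcome are the curricular real estate that removal frees up for integrative, problem-based work.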
Why this is institutionally difficult
Removing content is harder than adding it because removal makes someone lose. Faculty members whose courses get restructured lose some of what they've spent years developing. Departments whose required courses get consolidated lose enrollment-driven budget allocations. Accreditors whose competency frameworks include long lists of required topics resist trimming. Students whose parents paid for a degree expect to receive more education, not less, even when "less" would in fact produce better learning. Each of these is a real constraint, and together they create an institutional gravity that pulls toward the additive approach even when the additive approach is known to be worse.
The institutions that will make this transition successfully are the ones willing to take losses inside the institution — discontinued courses, restructured majors, renegotiated faculty workloads, recalibrated accreditation conversations — in service of producing graduates who actually have the capabilities they're being credentialed for. This is, in plain terms, a leadership problem rather than a pedagogical one. The pedagogy has been worked out. The will to apply it is what's scarce.
There is a smaller pattern worth naming inside this larger one. The faculty members who most need to do this work — the ones whose courses are most L1/L2-heavy — are typically the ones with the least incentive to do it. The faculty teaching at the L3/L4 layer already (capstone instructors, advanced seminar leaders, clinical preceptors, studio teachers) are not the bottleneck. The bottleneck is the intro-and-survey course faculty whose entire teaching identity is built around coverage, and whose departmental position depends on the current structure of the curriculum. Asking these faculty to redesign their courses around backward design is, for many of them, asking them to redesign their professional identity. Some will. Most won't. The institutions that make the transition may have to do it through hiring and incentive realignment rather than through retraining the existing faculty.
A working model
The clearest existing model for what L3-focused undergraduate education looks like is the medical-school PBL curriculum, but the model is not limited to medicine. It has analogues across higher education that are worth pointing to as proof of concept:
The case method at business and law schools, which teaches doctrine and analysis through cases rather than through systematic lecture. Decades of refinement, mature pedagogy, transferable to other disciplines.
Studio-based education in architecture, art, and design, where critique of work-in-progress is the dominant pedagogy and content is acquired in service of producing artifacts.
Capstone-driven engineering programs where multi-semester project work occupies the curricular spine and traditional coursework is reorganized around it.
Clinical placements in nursing, social work, and education, which integrate L1/L2 content acquisition with L3-grade real-world judgment.
None of these are radical. All of them have decades of evidence behind them. What they share is the architectural commitment Wiggins and McTighe described: outcomes first, evidence second, content third. The L1 and L2 material is taught — but it is taught as part of doing the L3 work, not as a prerequisite to it.
The institutions that make the transition the entanglement framework calls for will look, structurally, more like these models than like the lecture-and-recitation default that most undergraduate education currently inherits from the nineteenth century. That's the curricular shift the AI moment is asking higher education to make. It is hard and it is overdue and the playbook for it has existed for thirty years.
Why this responsibility falls on higher education
Here’s what we have been circling around for the past several thousand words.
The framing of higher education's AI moment as "existential" is half right and half wrong. The half that's right: the traditional value proposition of higher education — credentialing L1 and L2 capabilities — is being replaced by AI, fast, and an institution that does only what AI now does will not survive. The half that's wrong, and that the moment requires us to say directly: this is not the end of higher education's importance. It is the beginning of higher education's most important era.
The L1/L2-to-L3 transition the economy is being pushed into needs a delivery institution. Something has to take millions of workers — current students, displaced employees, career switchers, mid-career professionals whose roles are being hollowed out — and bring them across the entanglement threshold from "doing tasks AI can do" to "doing the integrative work AI cannot." This is not a marginal training problem. This is the largest workforce-development requirement of the next twenty years.
So who delivers it? The answer involves ruling things out, not asserting a preference.
Corporate training is not the answer. The numbers on this are bracing. A 2025 industry survey found that 91 percent of L&D professionals say continuous learning is essential, while only 36 percent of organizations qualify as "career development champions" with robust programs. Sixty-three percent of employers cite skills gaps as their biggest transformation barrier. Sixty-four percent of employees say their company provides AI tools, but only 25 percent strongly agree their employer has a clear vision for how to use them. The dirty secret of enterprise learning, articulated by practitioners willing to be honest, is that employees finish learning modules on Friday and return to outdated workflows on Monday. Corporate training was never designed to produce L3-capable practitioners. It was designed to fill specific skill gaps for specific roles, on the assumption that the L3 judgment was already there. When the L3 judgment is what's missing, corporate training can't manufacture it. The infrastructure is wrong, the timescale is wrong, and the incentives are wrong: companies optimize for their own near-term skill needs, not for the long-arc development of integrative judgment that workers carry across employers.
Alternative credentials are not the answer. The coding bootcamp story of 2024-2025 is the canonical case, and it's worth dwelling on because it's the most direct natural experiment we have. Bootcamps were the great non-college credentialing experiment of the 2010s: three-month intensive programs that took adults with no technical background and produced employable software engineers, often with outcomes that matched or exceeded computer-science bachelor's degrees on initial job placement. They worked. And in December 2024, 2U shut down its entire $750-million bootcamp business with the CEO saying out loud: "the long-form, intensive training that boot camps provide no longer aligns with what the market wants and needs." Southern New Hampshire University shut down its bootcamp in 2023. Epicodus closed in early 2024. Momentum Learning closed shortly after.
The reason the bootcamp model collapsed is precisely the reason this essay exists. A private-equity investor in educational technology, Daniel Pianko, named it directly in the same Inside Higher Ed piece: "Ten years ago, employers wanted people who could convert business practices into programming languages. But in 2025, AI-powered machines can do much of that programming, which has elevated demand for higher-level workers who have an understanding of the specific business problems you want to solve rather than specific coding skills." That is the entanglement framework, articulated by someone with money in the game. Bootcamps were L1/L2 producers. The L1/L2 economy is shrinking. The bootcamps that could not pivot to L3 production are gone, and the structural reasons they couldn't pivot — short program length, narrow scope, lack of disciplinary depth, no apparatus for cultivating judgment — are intrinsic to the bootcamp model. The alternative-credential sector cannot scale to produce L3-capable graduates because its own architecture is designed for the L1/L2 problem that no longer exists.
Government workforce programs are not, by themselves, the answer. The U.S. federal workforce-development apparatus is real and has been doing useful work for decades. It is also, by widespread acknowledgment within the field, structurally inadequate to the scale of the moment. Total federal workforce funding is spread across more than 43 programs in nine agencies, with chronic coordination problems and political vulnerability. The bipartisan NSF AI Education Act and the 2025 expansion of Workforce Pell to cover short-term training are useful steps. They are not, individually or collectively, of the order of magnitude required to retrain a workforce of millions across a generation. Government can fund and coordinate the work. It cannot deliver it.
Higher education — including its community-college backbone — is the answer, by elimination and by infrastructure. The argument is not that universities and community colleges are uniquely suited to this task because of some pedagogical genius or institutional virtue. The argument is that they are the only institution with the existing scale to deliver it. Over 1,000 community colleges educate 10.5 million students across the United States. Together with four-year institutions, the U.S. higher-education system serves roughly 18 million students at any given moment and credentials about three million graduates per year. No other institutional category — not corporate training, not bootcamps, not government — has that throughput, that geographic distribution, that employer-relationship density, or that public-interest mission.
There is also something the alternatives don't have, which is the pedagogical tradition the previous sections of this essay drew on. The thirty-year medical-education problem-based-learning literature, the Wiggins-McTighe backward-design framework, the case-method tradition, the studio-pedagogy tradition, the clinical-placement tradition, the capstone-engineering tradition — these are all artifacts of higher education. They are the existing playbook for producing L3-capable practitioners at scale. Corporate training doesn't have an analogue. Bootcamps don't have an analogue. Government workforce programs don't have an analogue. The institution that has spent the last fifty years quietly building the infrastructure for L3 education — even when it didn't have a name for what it was doing — is higher education, and the infrastructure is real and transferable.
The community-college story in particular is worth pulling forward. Over the past two years, the National Applied AI Consortium has trained more than 1,900 faculty across 337 colleges in 49 states and two U.S. territories, reaching more than 50,000 students. Houston Community College graduated its first cohort of AI bachelor's-degree holders in May 2025. Chandler-Gilbert Community College launched the nation's first associate degree in AI and machine learning. Miami Dade College, Maricopa County, and dozens of others have been doing this work for several years, in partnership with NSF, with Big Tech, and with regional employers. The infrastructure for the L3 transition is being built, in places that don't make the front page, by people who don't think of themselves as elite educators. This is the unglamorous heart of the answer.
What happens if higher education doesn't rise to it
I want to be plain about the stakes in this section because they are not abstract.
If higher education does not make the L1/L2-to-L3 transition — if it continues to credential L1 and L2 capabilities, continues to treat coverage as the spine of its curriculum, continues to push L3 work to the edges as "enrichment," continues to graduate students at the threshold of an entry-level economy that no longer exists — then the workers who were supposed to be served by the institution will not be served by it. They will not develop L3 capabilities through some other channel, because the other channels do not exist at scale. They will enter a labor market in which the L1 and L2 work has been absorbed and the L3 work is reserved for people who got the training they didn't.
The aggregate consequence of this failure, if it occurs, is straightforward. Millions of working-age adults will be sorted into a labor market in which the work they were trained for does not pay a livable wage and the work that pays a livable wage requires capabilities they were not given the opportunity to develop. The Dallas Fed wage data is already showing the bifurcation — experienced workers in tacit-knowledge-heavy occupations seeing wage gains, younger workers and entry-level workers seeing employment pressure. That bifurcation will widen, not narrow, if the institution responsible for producing the next generation of L3-capable workers fails to retool.
Behind that aggregate are individual people. It is the graduate in 2030 who took on debt for a degree that prepared them for entry-level jobs that no longer exist. It is the 45-year-old whose mid-career role got hollowed out and who has no realistic path to the L3 work that survived. It is the entire cohort of young adults entering the workforce in the late 2020s and early 2030s who, through no fault of their own, were credentialed for an economy that vanished while they were studying for it. The cost of higher education's failure to rise to the moment is not borne by the institution. It is borne by these people, and the bill for it is paid in lower lifetime earnings, narrower career options, foregone savings, delayed family formation, and the slow grind of underemployment that hollows out individual lives one quiet decade at a time.
This is not a hypothetical risk. The bootcamp shutdowns and the entry-level employment numbers cited earlier in this essay are early indicators of what the bifurcation looks like in real time. The question is not whether the shift is happening. The question is whether the institution best positioned to mediate the shift will do its job, or whether it will preside over the displacement of millions and tell itself a story about how that wasn't really its responsibility.
Recall the two historical lessons earlier in this essay: aggregate absorption is not the same as broad-based gains, and retraining programs that work for the people they reach can still fail in aggregate if they aren't deployed at scale. The destination economy of the 1953-to-2020 manufacturing-to-services transition did absorb the displaced workers, but starting in 1973 it stopped distributing the gains to the median worker. The NAFTA-and-China-shock transition had a policy mechanism available (TAA worked when applied) and failed to build it to anything close to the scale required. We are at the analogous decision point for the L3 transition, with both lessons available to us. Two things hang on the structural choice: whether the L3 economy is wide enough to distribute the gains broadly, and whether the institutional response reaches enough of the displaced workforce to actually mediate the transition. That choice is being made now, in the institutions that decide what they teach and to whom.
The actual call to action
Here is what this moment is asking of higher education.
Stop framing the AI shift as an existential threat to the institution. It is not. It is the opposite. The L3 economy needs an institution that can produce L3-capable workers at scale, with the pedagogical depth that only a tradition of teaching for understanding can deliver, with the credentialing authority that lets graduates carry the capability across employers and across decades, and with the public mission that lets the institution take seriously the workers no other system will train. There is exactly one institutional category that fits this description, and it is the one currently being told it is in crisis.
The institutions that read this moment correctly will rise. The ones that don't will decline. The community colleges quietly building applied-AI programs in Houston and Charlotte and Miami have already started. The medical schools that did the PBL transition thirty years ago wrote the playbook. The Wiggins-McTighe framework was published in 1998. The pieces are on the table. The question is whether the rest of higher education will pick them up.
If it does, the AI moment will look, in retrospect, like the period when higher education stopped being the institution that prepared workers for the jobs employers used to want and started being the institution that prepared workers for the work that is still worth doing. That is not a smaller mission than what colleges and universities have historically claimed. It is the same mission, restated for a moment that requires it.
If higher education does not rise to it, the bill comes due in millions of lives unnecessarily narrowed by an economic transition that the institutions of mediation failed to mediate. That is the stake. It is large enough to be worth saying directly.
The framework I have offered in this essay — the five levels of entanglement, the architectural mismatch of multi-agent systems, the labor-economics evidence, the manufacturing-to-services analogy, the distributional warning embedded in that analogy, the NAFTA-and-China-shock lesson about under-deployed retraining, the L1/L2-to-L3 reframe, the remove-and-replace curriculum redesign, the institutional case for higher education — is in the end a single argument with a single implication. The economy is moving up the entanglement stack. The institution responsible for moving workers up the entanglement stack with it is higher education. The work is hard, the timescale is shorter than the institution is used to, and the cost of failure is borne by people who have done nothing to deserve it.
Bibliography
Multi-agent AI architectures and failure modes
Cemri, M., Pan, M. Z., Yang, S., et al. (2025). "Why Do Multi-Agent LLM Systems Fail?" ICLR 2025. arXiv:2503.13657. https://arxiv.org/pdf/2503.13657
orq.ai. "Why Do Multi-Agent LLM Systems Fail?" https://orq.ai/blog/why-do-multi-agent-llm-systems-fail
Galileo AI. "Why Multi-Agent Systems Fail." https://galileo.ai/blog/why-multi-agent-systems-fail
Labor economics and AI exposure
Davis, A. M. (Feb 2026). "AI's Wage Effects in the Texas and U.S. Labor Markets." Federal Reserve Bank of Dallas. https://www.dallasfed.org/research/economics/2026/0224
Demirer, M., Horton, J., Immorlica, N., & Lucier, B. (2025). "Chaining Tasks, Redefining Work: How AI Restructures the Division of Labor." Summary at MIT Sloan: https://mitsloan.mit.edu/ideas-made-to-matter/how-ai-reshaping-workflows-and-redefining-jobs
Ide, E. (2025). "Apprenticeship and the Pipeline to Expertise." arXiv:2507.16078. https://arxiv.org/abs/2507.16078
Anghel, R., et al. "Why occupational AI exposure indices disagree." PNAS Nexus. https://academic.oup.com/pnasnexus/article/4/4/pgaf107/8104152
Brynjolfsson, E., Li, D., & Raymond, L. (2023). "Generative AI at Work." NBER Working Paper 31161. https://www.nber.org/papers/w31161
Manufacturing-to-services transition (BLS data)
Bureau of Labor Statistics (June 2019). "Forty Years of Falling Manufacturing Employment." Beyond the Numbers. https://www.bls.gov/opub/btn/volume-9/forty-years-of-falling-manufacturing-employment.htm
Bureau of Labor Statistics (June 2024). The Economics Daily. https://www.bls.gov/opub/ted/2024/a-look-at-a-long-term-trend-for-the-bureaus-birthday.htm
Meisenheimer, J. R. (BLS analysis, summarized). "The American Workplace — The Shift to a Service Economy." https://jobs.stateuniversity.com/pages/16/American-Workplace-SHIFT-SERVICE-ECONOMY.html
Standard of living and the productivity-pay gap
Economic Policy Institute (March 2026 update). "The Productivity-Pay Gap." https://www.epi.org/productivity-pay-gap/
Mishel, L. & Bivens, J. (2015). "Understanding the Historic Divergence Between Productivity and a Typical Worker's Pay." Economic Policy Institute. https://www.epi.org/publication/understanding-the-historic-divergence-between-productivity-and-a-typical-workers-pay-why-it-matters-and-why-its-real/
Economic Policy Institute (Jan 2015). "The erosion of collective bargaining has widened the gap between productivity and pay." https://www.epi.org/publication/collective-bargainings-erosion-expanded-the-productivity-pay-gap/
Federal Reserve Economic Data (FRED). Real Median Household Income in the United States. https://fred.stlouisfed.org/series/MEHOINUSA672N
EH.net Encyclopedia. "A History of the Standard of Living in the United States." https://eh.net/encyclopedia/a-history-of-the-standard-of-living-in-the-united-states/
NCHStats (March 2026). "US Life Expectancy 1950-2025." https://nchstats.com/us-life-expectancy-trends/
Strain, M. (May 2022). "The Productivity-Pay 'Gap': A Pernicious Economic Myth." American Enterprise Institute. https://www.aei.org/articles/the-productivity-pay-gap-a-pernicious-economic-myth/
NAFTA, the China Shock, and Trade Adjustment Assistance
Autor, D., Dorn, D., & Hanson, G. (2016). "The China Shock: Learning from Labor-Market Adjustment to Large Changes in Trade." Annual Review of Economics, 8: 205-240. https://www.annualreviews.org/content/journals/10.1146/annurev-economics-080315-015041
Autor, D., Dorn, D., & Hanson, G. (2016). "The China Shock: Learning from Labor-Market Adjustment to Large Changes in Trade." NBER Working Paper 21906. https://www.nber.org/papers/w21906
Autor, D., Dorn, D., & Hanson, G. (2021). "On the Persistence of the China Shock." Brookings Papers on Economic Activity. https://www.brookings.edu/articles/on-the-persistence-of-the-china-shock/
MIT News (Dec 2021). "Q&A: David Autor on the long afterlife of the 'China shock.'" https://news.mit.edu/2021/david-autor-china-shock-persists-1206
Cato Institute (Dec 2023). "The 'China Shock' Demystified: Its Origins, Effects, and Lessons for Today." https://www.cato.org/publications/china-shock
CSIS Big Data China (April 2024). "The China Shock: Reevaluating the Debate." https://bigdatachina.csis.org/the-china-shock-reevaluating-the-debate/
Hyman, B. (2018). "Can Displaced Labor Be Retrained? Evidence from Quasi-Random Assignment to Trade Adjustment Assistance." Summarized in Searchlight Institute (May 2026), "Lost in Transition: How Trade Adjustment Assistance came up short (and where it succeeded)." https://www.searchlightinstitute.org/research/lost-in-transition-how-trade-adjustment-assistance-came-up-short-and-where-it-succeeded/
Mathematica Policy Research. "Trade Adjustment Assistance Evaluation." https://www.mathematica.org/projects/trade-adjustment-assistance-evaluation
Wikipedia. "Trade Adjustment Assistance" (summary of multiple evaluation studies). https://en.wikipedia.org/wiki/Trade_Adjustment_Assistance
U.S. Government Accountability Office (2001). "Trade Adjustment Assistance: Trends, Outcomes, and Management Issues in Dislocated Worker Programs." https://www.gao.gov/products/gao-01-59
Higher education's AI moment
CNBC (Nov 2025). "AI puts the squeeze on new grads looking for work." https://www.cnbc.com/2025/11/15/ai-puts-the-squeeze-on-new-grads-looking-for-work.html
Ohio State Office of Academic Affairs (Jan 2026). "What College in the Age of AI: Young Graduates Can't Find Jobs, Colleges Know They Have To Do Something." https://oaa.osu.edu/news/2026/01/20/what-college-age-ai-young-graduates-cant-find-jobs-colleges-know-they-have-do
Columbia Center for Teaching and Learning. "2026: Reimagining Teaching and Learning in the Age of AI." https://ctl.columbia.edu/about/2026-reimagining-teaching-learning/
Hansen, M. (May 2026). "AI Entry-Level Jobs and Higher Education's Experience Gap." Fortune. https://fortune.com/2026/05/15/ai-entry-level-jobs-higher-education-experience-gap/
Rezi.ai (2026). "Entry-Level Jobs and AI: 2026 Report." https://www.rezi.ai/posts/entry-level-jobs-and-ai-2026-report
EDUCAUSE Review (Sept 2024). "Must-Have Competencies and Skills in Our New AI World: A Synthesis for Educational Reform." https://er.educause.edu/articles/2024/9/must-have-competencies-and-skills-in-our-new-ai-world-a-synthesis-for-educational-reform
Curriculum design and problem-based learning
Trullàs, J. C., et al. (2022). "Effectiveness of problem-based learning methodology in undergraduate medical education: a scoping review." BMC Medical Education, 22: 104. https://link.springer.com/article/10.1186/s12909-022-03154-8
Servant-Miklos, V. (2024). "Problem-based learning and the development of physician competencies: a meta-analytic review." PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC10758192/
Khalili, M., et al. (2025). "Implementation and outcomes of problem-based learning in U.S. medical schools." Frontiers in Education. https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1631337/full
Wiggins, G., & McTighe, J. (2012). "Understanding by Design: A Framework for Effecting Curricular Development and Assessment." ASCD White Paper. https://files.ascd.org/staticfiles/ascd/pdf/siteASCD/publications/UbD_WhitePaper0312.pdf
McTighe, J. (n.d.). "The Fundamentals of Backward Planning." ASCD Educational Leadership. https://www.ascd.org/el/articles/the-fundamentals-of-backward-planning
Bowen, R. S. (2017). "Understanding by Design: A Framework for Effecting Curricular Development and Assessment." CBE—Life Sciences Education. https://pmc.ncbi.nlm.nih.gov/articles/PMC1885909/
Corporate training and alternative credentials
D2L (Nov 2025). "Employee Training Statistics and Trends to Know in 2026." https://www.d2l.com/blog/employee-training-statistics/
Upside Learning (May 2026). "Workforce Upskilling & Reskilling: Enterprise Roadmap." https://blog.upsidelearning.com/workforce-upskilling-reskilling-enterprise-roadmap/
Inside Higher Ed (Jan 2025). "Have Coding Boot Camps Lost Their Appeal?" https://www.insidehighered.com/news/tech-innovation/teaching-learning/2025/01/09/changes-boot-camp-marks-signal-shifts-workforce
Course Report (Jan 2026). "2025 Year in Review: Coding Bootcamp News." https://www.coursereport.com/blog/2025-year-in-review-coding-bootcamp-news
Community colleges and AI workforce development
Jyotishi, S. (Aug 2025). "How Community Colleges Can Realize the Promise of AI Action Plan." DC Journal / InsideSources. https://dcjournal.com/how-community-colleges-can-realize-the-promise-of-ai-action-plan/
The EDU Ledger (April 2026). "The Quiet Revolution: Community Colleges Are Training America's AI Workforce." https://www.theeduledger.com/institutions/community-colleges/article/15823676/the-quiet-revolution-community-colleges-are-training-americas-ai-workforce
New America (July 2025). "What We've Heard About AI Education at Community Colleges." https://www.newamerica.org/education-policy/edcentral/what-weve-heard-about-ai-education-at-community-colleges/
Foundational theoretical works
Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press.
Autor, D., Levy, F., & Murnane, R. (2003). "The Skill Content of Recent Technological Change: An Empirical Exploration." Quarterly Journal of Economics.
Garicano, L. (2000). "Hierarchies and the Organization of Knowledge in Production." Journal of Political Economy.
Garicano, L. & Rossi-Hansberg, E. (2006). "Organization and Inequality in a Knowledge Economy." Quarterly Journal of Economics.
Deming, D. (2017). "The Growing Importance of Social Skills in the Labor Market." Quarterly Journal of Economics.
Case, A. & Deaton, A. (2020). Deaths of Despair and the Future of Capitalism. Princeton University Press.
Frameworks and cognitive context
Artificiality Institute. https://www.artificialityinstitute.org/

