
AI Horizon Map

A living map of what has happened, what is changing now, and what credible futures may come next.

21 past turning points · 6 now signals · 15 next forecasts

Where are we now?

6 patterns tracked across the field
1 confirmed, 5 emerging
2 inflections, 4 trends
Top themes: infrastructure, models, agents

Now

what is changing

Open-weight reasoning models match the frontier and undercut the price

confirmed
models · infrastructure · enterprise

GLM-5.1 (Z.AI) ships as a 754B open-weight agentic model that hits SOTA on SWE-Bench Pro and sustains 8-hour autonomous execution at roughly a third of frontier API cost. Three distinct moats collapse into one release: open weights, frontier capability, agentic competence. Direct continuation of the DeepSeek-R1 thread from January.

If your stack still routes reasoning through a paid API by default, the math has changed. Self-hosted frontier reasoning is no longer a research curiosity.
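The claim that "the math has changed" can be made concrete with a toy break-even comparison. Every number below is a hypothetical assumption for illustration — invented token volumes, API prices and GPU rates, not quoted figures for GLM-5.1 or any provider:

```python
# Hypothetical break-even sketch: metered API vs self-hosted open weights.
# All prices and volumes are illustrative assumptions, not real quotes.

def monthly_cost_api(tokens_per_month: float, usd_per_mtok: float) -> float:
    """Cost of routing all reasoning through a paid API, billed per million tokens."""
    return tokens_per_month / 1e6 * usd_per_mtok

def monthly_cost_self_hosted(gpu_hours: float, usd_per_gpu_hour: float,
                             fixed_ops_usd: float) -> float:
    """Cost of serving an open-weight model on rented GPUs, plus ops overhead."""
    return gpu_hours * usd_per_gpu_hour + fixed_ops_usd

api = monthly_cost_api(tokens_per_month=2e9, usd_per_mtok=15.0)       # $30,000
hosted = monthly_cost_self_hosted(gpu_hours=720 * 8,                  # 8 GPUs all month
                                  usd_per_gpu_hour=2.0,
                                  fixed_ops_usd=3_000)                # $14,520
print(f"API: ${api:,.0f}/mo  self-hosted: ${hosted:,.0f}/mo")
```

The crossover point depends entirely on sustained volume: below some threshold the fixed GPU and ops cost dominates and the API stays cheaper.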

10 Apr 2026 · 2 sources

The agent production-ops layer is crystallising

emerging
agents · infrastructure · enterprise

Three independent launches in three days, each targeting a different layer of the agent production stack: BotCTL (process management, billed as systemd for agents), OnCell.ai (per-user isolation), Relvy (automated on-call runbooks, YC F24). These are problems that only matter once agents are actually being deployed in production, which means agents are actually being deployed in production.

Building on agents in production now means picking from a real toolkit, not duct-taping cron jobs and shell scripts. The stack is shipping faster than the frameworks.
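What "systemd for agents" means in practice is ordinary process supervision applied to agent runtimes. A minimal sketch, assuming nothing about BotCTL's actual design — restart a crashed agent with exponential backoff, then escalate to a human:

```python
import subprocess
import time

# Hypothetical "systemd for agents" core loop: keep an agent process alive,
# restarting with exponential backoff so a crash-looping agent cannot spin hot.
def supervise(cmd: list[str], max_restarts: int = 5) -> None:
    backoff = 1.0
    for _ in range(max_restarts):
        proc = subprocess.run(cmd)
        if proc.returncode == 0:
            return                       # clean exit: the agent finished its work
        print(f"agent exited {proc.returncode}, restarting in {backoff:.0f}s")
        time.sleep(backoff)
        backoff = min(backoff * 2, 60)   # cap the backoff at one minute
    raise RuntimeError(f"agent failed {max_restarts} times; escalate to on-call")
```

Real products presumably add health checks, per-agent isolation and log shipping on top; the point is that the primitive itself is decades-old ops practice.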

10 Apr 2026 · 4 sources

Tool-calling and agent-messaging protocols are hardening into infrastructure

emerging
agents · infrastructure

AgentDM (agent-to-agent over MCP and A2A), QVeris (10k capabilities discoverable via one protocol), Postagent (Postman-style CLI for agents), and ZeroID (OIDF-based agent identity) all landed within two days. The pattern repeats the shape MCP set in late 2024. Agent infrastructure stops being framework wars and starts being shared plumbing.

If you are picking an agent framework today, prefer the ones that compose over the protocol layer (MCP, A2A) rather than locking you into a single vendor's tool surface.
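The protocol layer this advice points at is deliberately small. MCP tool invocation, for instance, is a JSON-RPC 2.0 message with a `tools/call` method. The hand-built envelope below is a sketch of the wire shape only; real clients negotiate capabilities and transport (stdio or HTTP) through an MCP SDK rather than constructing messages by hand:

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP-style tools/call request (JSON-RPC 2.0 envelope).

    Illustrative only: shows why protocol-layer tooling composes —
    every tool on every server is reachable through this one shape.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool name and arguments, for illustration.
msg = mcp_tool_call(1, "search_docs", {"query": "agent identity"})
```

A framework that emits this shape can swap vendors underneath; one that binds you to a proprietary tool surface cannot.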

10 Apr 2026 · 5 sources

OS-as-environment is becoming the standard for agent training

emerging
agents · infrastructure · code

OSGym ships infrastructure to manage 1,000+ OS replicas at $0.23 per day for computer-use agent research. That is not a product launch, it is a capability. Parallel rollouts on real operating systems become economically viable. Astropad Workbench and TUI-use round out the pattern: agents need bodies, and the bodies are computers.

Computer-use agents are about to be cheap to train. If your product touches workflows that humans currently do in a browser or terminal, expect competitive pressure within six months.

10 Apr 2026 · 4 sources

Local-first inference moves from edge case to default for new launches

emerging
models · infrastructure · interfaces

Four launches in five days defaulting to local-first inference rather than cloud: Google's offline AI dictation app on iOS using Gemma, Imbue's Bouncer (on-device LLM for Twitter feed control), QVAC SDK (universal JS for local AI), and Meta's EUPE (sub-100M-parameter vision encoder family). Notable that Google itself is shipping offline-first using its own models. That is the shift, not any single launch.

The economics are flipping. Default to local for new launches unless you have a specific reason to go cloud-first. Users increasingly notice the difference.
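Defaulting to local does not mean abandoning cloud; it means inverting the routing decision so that escalation, not local execution, is the exception. A hypothetical router — the `Task` fields and threshold are invented for illustration, and a real system would use a cheap classifier for the difficulty estimate:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    needs_fresh_web: bool = False
    est_difficulty: float = 0.0   # 0..1, from a cheap heuristic or classifier

# Hypothetical local-first router: run on-device by default, escalate to
# cloud only when the task clearly needs frontier capability or live data.
def route(task: Task, cloud_threshold: float = 0.8) -> str:
    if task.needs_fresh_web:
        return "cloud"
    if task.est_difficulty >= cloud_threshold:
        return "cloud"
    return "local"

assert route(Task("summarise this note")) == "local"
assert route(Task("plan a multi-leg trip", needs_fresh_web=True)) == "cloud"
```

The cloud-first version of the same product runs the branches in the opposite order, which is exactly the design relic the launches above are abandoning.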

10 Apr 2026 · 5 sources

Multimodal reasoning becomes a frontier-race table-stakes capability

emerging
models · interfaces · creativity

Meta Superintelligence Lab released Muse Spark, a multimodal reasoning model with thought compression and parallel agents. Frontend-VisualQA (Yutori AI) gave coding agents visual verification of their own UI work. EUPE shows compact vision encoders can rival specialists. Multimodal reasoning is no longer a frontier-lab talking point. It is becoming a baseline capability across the stack.

Vision is no longer a separate capability you bolt onto a model. Plan for it as a baseline assumption when designing agent workflows.

10 Apr 2026 · 3 sources

Next

what credible futures may come

Agent security becomes a platform layer

emerging
agents · security

Within 18 months, sandboxing, approval gates and audit logs will be standard features in serious agent platforms. Tool-connected agents turn prompt injection from a content problem into an operational-security problem.
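The three features named in the forecast — sandboxing, approval gates and audit logs — compose naturally as a wrapper around every tool call. A minimal sketch with an invented risk policy (`HIGH_RISK`) and callback-based approval; real platforms would persist the audit trail and enforce the policy outside the agent's own process:

```python
import time
from typing import Callable

AUDIT_LOG: list[dict] = []                                    # append-only trail
HIGH_RISK = {"delete_file", "send_email", "transfer_funds"}   # hypothetical policy

def guarded_call(tool: str, args: dict, run: Callable[[dict], object],
                 approve: Callable[[str, dict], bool]) -> object:
    """Run a tool call behind an approval gate, recording every outcome."""
    entry = {"ts": time.time(), "tool": tool, "args": args}
    if tool in HIGH_RISK and not approve(tool, args):
        entry["status"] = "denied"
        AUDIT_LOG.append(entry)
        raise PermissionError(f"approval denied for {tool}")
    result = run(args)
    entry["status"] = "executed"
    AUDIT_LOG.append(entry)
    return result
```

The operational-security framing follows directly: once a prompt injection can only act through `guarded_call`, the blast radius is whatever the policy permits, not whatever the model emits.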

forecast reviewed 11 Apr 2026

Coding agents own the easy middle

emerging
code · agents

Within 12 months, AI agents will produce most greenfield boilerplate, test scaffolding and straightforward refactors in fast-moving software teams. The capability is already good enough for bounded software work, while the hard part remains review, architecture and production judgement.

forecast reviewed 11 Apr 2026

Labelling outruns licensing

emerging
regulation · creativity

Within 24 months, more jurisdictions and platforms will enforce synthetic-media labelling or provenance rules than will settle training-data licensing rules. Transparency is easier to operationalise than copyright economics.

forecast reviewed 11 Apr 2026

Open-weight reasoning reaches practical frontier parity

emerging
models

Within 18 months, at least one open-weight reasoning model will sit within striking distance of the best closed APIs on public reasoning and coding evals at materially lower cost. The DeepSeek-R1 to Gemma 4 arc suggests the open curve is steepening faster than frontier labs can preserve a clean capability moat.

forecast reviewed 11 Apr 2026

Protocols beat bespoke agent plumbing

emerging
agents · infrastructure

By April 2027, most serious enterprise agent stacks will expose tools and data through an MCP/A2A-style protocol layer rather than mostly bespoke connectors. Once agents need many tools, shared plumbing becomes cheaper and more governable than custom glue.

forecast reviewed 11 Apr 2026

Computer-use agents find narrow fit

contested
agents · interfaces

Within 12 months, browser and desktop agents will stick in a small set of repetitive back-office workflows but remain too brittle for broad consumer autopilot. The capability curve is real, but permissions, exception handling and reliability are still improving slower than the demos.

forecast reviewed 11 Apr 2026

Enterprise AI stays multi-model

contested
enterprise · models

Within 24 months, most large enterprises will run at least three model families in production rather than standardise on one provider. Bedrock, Vertex and Azure are all training buyers to purchase optionality instead of allegiance.

forecast reviewed 11 Apr 2026

Memory beats model delta

contested
models · interfaces

Within 12 months, assistant retention will depend more on memory, projects and connectors than on small differences in base-model quality. Once models are all good enough, continuity starts to matter more than eloquence.

forecast reviewed 11 Apr 2026

Regulation stays regionally forked

contested
regulation · society

By 2028, AI regulation will still be regionally split, with the EU more rule-heavy, the UK lighter-touch, and no clean global settlement on copyright or frontier-model duties. The compliance machinery is diverging faster than international consensus is forming.

forecast reviewed 11 Apr 2026

Robotics pays off at work before home

contested
robotics · work

By 2028, AI-driven robots will create clearer economic value in warehouses, factories and structured commercial settings than in ordinary homes. Manipulation is improving quickly, but the home is still the messiest possible deployment environment.

forecast reviewed 11 Apr 2026

Search becomes an assistant surface

contested
search · interfaces

By late 2027, power users' planning, shopping and research queries will more often begin in assistant-style surfaces than in a blank search box. Search is already absorbing reasoning, follow-up dialogue, personal context and agentic steps.

forecast reviewed 11 Apr 2026

AI video becomes normal pre-production

speculative
creativity · work

Within 18 months, AI video will be routine for pitches, previs, explainers and ad variants but not a clean substitute for premium live-action production. The tools are improving fast, but taste, control, rights and production workflows still matter.

forecast reviewed 11 Apr 2026

Education adapts assessment before credentialing

speculative
education · society

Within 24 months, schools and employers will rely more on supervised, oral and process-based assessment because AI tutoring is easier to absorb than AI credentialing. Teaching is easier to displace than trusted signalling.

forecast reviewed 11 Apr 2026

Local inference becomes the private default

speculative
infrastructure · models

Within 24 months, many privacy-sensitive or offline AI features in mainstream apps will default to local inference, with cloud models reserved for heavier reasoning. The model-size curve is falling fast enough that latency, privacy and cost can outweigh raw frontier quality.

forecast reviewed 11 Apr 2026

Vibe coding goes corporate

speculative
code · work · enterprise

Within 18 months, non-engineers in medium and large firms will ship useful internal tools through conversational builders and agentic IDEs. The interface to software is getting easier faster than organisations can redesign review, integration and security.

forecast reviewed 11 Apr 2026

Compare View

three credible futures, side by side

AGI

models · society

Systems matching or exceeding human performance across most cognitive tasks, including ones outside their training distribution, without task-specific retraining. See debate →

optimistic by end of 2028
Assumptions
Reasoning, memory, tool use and self-improvement keep compounding until frontier systems can transfer robustly across most cognitive domains at roughly human level.
Blockers
The last 20% of reliability, autonomy and world-model grounding proves far harder than scaling advocates expect.
Implication
Build for an agent-native world now: outcome-based products, tiny oversight teams, and businesses that assume intelligence becomes abundant before trust does.
pragmatic 2032-2035
Assumptions
Systems become extraordinary co-workers first, and only later cross the threshold into genuinely general autonomous competence across domains.
Blockers
Deployment friction, safety constraints and evaluation gaps keep real-world capability behind benchmark capability for years.
Implication
Design around hybrid intelligence: let machines dominate bounded cognition while humans keep accountability, cross-functional judgement and final authority.
sceptical not before the 2040s
Assumptions
Human-level general intelligence depends on embodiment, durable goals, social learning and world models that current architectures do not naturally supply.
Blockers
Digital-only systems start generalising across messy real-world domains with minimal scaffolding and without brittle failure modes.
Implication
Optimise for augmentation, governance and competitive advantage from AI tools, not for AGI-timing theatre.

Agentic Work

agents · enterprise · work

AI systems that autonomously execute multi-step knowledge work across tools, queues and approval boundaries, owning outcomes end-to-end rather than assisting a human operator.

optimistic by end of 2027
Assumptions
Tool use, memory, permissions, evaluation and error recovery improve fast enough that agents can own large volumes of queue-based knowledge work without human approval loops.
Blockers
Identity, auditability, exception handling and liability stay unresolved long enough to stop organisations trusting unattended execution.
Implication
Build narrow, high-volume domain agents now and wrap them in approval tiers, rollback paths and outcome-level monitoring.
pragmatic 2029-2031
Assumptions
Agents become dependable in structured workflows first, while open-ended office work remains mostly supervised because tacit context is still hard to encode.
Blockers
Enterprise data stays fragmented and process owners fail to redesign workflows around machine delegation.
Implication
Sell partial autonomy, not full replacement: hand-offs, triage, drafting, reconciliation and escalation will land before lights-out execution.
sceptical mid-2030s or later
Assumptions
Most knowledge work hides politics, ambiguity, negotiation and accountability that cannot be cleanly reduced to tools plus prompts.
Blockers
Agents prove they can recover from ambiguity, manage cross-system state and own consequences in messy live environments.
Implication
Treat agents as force multipliers for people and focus on better interfaces, memory and review rather than labour-substitution bets.

Robotics

robotics · infrastructure · work

General-purpose physical robots (humanoid or otherwise) that are commercially routine, not demo-quality, with fleet-deployable reliability and workable unit economics.

optimistic by 2030
Assumptions
Vision-language-action models, dexterity, battery performance and manufacturing scale improve together quickly enough to make general-purpose robots commercially routine and cheap.
Blockers
Reliability in cluttered environments, safety certification and service economics fail to move from demo quality to fleet quality.
Implication
Start designing robot-ready workflows, facilities and software now, because the integration layer will matter as much as the hardware.
pragmatic 2033-2036
Assumptions
General-purpose robotics lands first in warehouses, factories, logistics and other structured commercial environments before the home catches up.
Blockers
Unit economics stay weak because teleoperation, maintenance and failure recovery remain too expensive.
Implication
Build for structured environments and mixed fleets, where robot coordination, observability and process redesign create the first durable value.
sceptical not before the late 2030s
Assumptions
True general-purpose robotics is a full-stack systems problem, and manipulation, safety and upkeep are much harder than the current curve implies.
Blockers
On-device robot models plus mass manufacturing crack reliability and cost at the same time.
Implication
Treat humanoids as long-duration options and keep investing in fixed automation, sensors and workflow software that pays off sooner.

Software Automation

code · enterprise · work

Coding agents reliably owning the loop from ticket to monitored production deploy across large codebases, leaving humans mostly specifying, reviewing and steering.

optimistic by end of 2028
Assumptions
Long-horizon coding agents become strong enough at planning, editing, testing, migration and repository memory that humans mostly specify, review and steer.
Blockers
Security, reproducibility, architecture drift and repo-specific context keep agents from owning production changes at scale.
Implication
Rebuild engineering around evaluation, review policy and product judgement, because typing and boilerplate stop being the scarce skill.
pragmatic 2030-2032
Assumptions
AI takes over most routine implementation and maintenance, but humans still dominate architecture, incident response, stakeholder translation and high-risk decisions.
Blockers
Firms fail to trust generated changes in production and never build the testing and governance needed for deeper automation.
Implication
Prepare for smaller engineering teams with stronger QA, clearer specs and codebases designed to be legible to agents.
sceptical not before the mid-2030s
Assumptions
Production software is mainly about ambiguous requirements, coordination, risk and long-tail maintenance rather than writing lines of code.
Blockers
Agents start reliably running the full loop from ticket to monitored deploy across large, messy codebases.
Implication
Invest in developer leverage and system clarity, not in simple headcount-reduction stories.

Education Disruption

education · work · society

AI tutoring and assessment displacing institutional course and credential delivery as the primary structure through which people learn and signal mastery.

optimistic by 2030
Assumptions
AI tutoring becomes dramatically better and cheaper than conventional content delivery, and assessment adapts fast enough to preserve trust in learning outcomes.
Blockers
Credentialing inertia, safeguarding, procurement cycles and political resistance keep institutions tied to legacy delivery models.
Implication
Build assessment, coaching, learning-record and teacher-orchestration products rather than more static content libraries.
pragmatic 2032-2035
Assumptions
AI transforms tutoring, practice and feedback quickly, but schools, universities and employers retain the institutional shell of courses, cohorts and credentials.
Blockers
Demonstrated learning gains become so overwhelming that institutions are forced to redesign faster than expected.
Implication
Plug into existing institutions instead of trying to replace them; the winning tools will fit classrooms, campuses and compliance.
sceptical not this generation
Assumptions
Education is not primarily content delivery; it is socialisation, signalling, childcare, norm formation and supervised practice, so AI enhances learning without replacing the institutions that structure it.
Blockers
Employers stop trusting conventional credentials and start trusting AI-mediated mastery records instead.
Implication
Focus on teacher augmentation, administrative relief and better evidence of skill, not on betting against the institution itself.

Debate Zone

where smart people genuinely disagree

Will open-weight models catch the frontier permanently, or only in specific niches?

models · infrastructure
For

The pattern is no longer “open follows years later”; it is “open absorbs frontier ideas fast enough to matter commercially”. Stable Diffusion proved that closed advantages can leak into public ecosystems, LLaMA broke the sense that only a few labs could play, and DeepSeek-R1 showed that open reasoning can arrive with real teeth. If open gets to 90–95% of frontier capability at a fraction of the cost, permanent practical parity is enough to change who wins distribution.

Supporting evidence
  • Stable Diffusion public release
  • LLaMA and the open-weight era
  • DeepSeek-R1
  • Open-weight reasoning reaches practical frontier parity
Against

The frontier does not stand still while open catches up. Closed labs still control the best compute, the richest feedback loops, the strongest product distribution and the hardest-to-copy post-training tricks, which lets them keep moving the goalposts from text to multimodal, from answers to agents, and from fluency to reasoning. Open will be formidable in cost-sensitive niches, but the top layer of capability will remain structurally concentrated.

Supporting evidence
  • GPT-4 release
  • Claude becomes a credible alternative
  • o1 reasoning shift

Is reasoning a real paradigm shift, or expensive pre-processing wearing a new hat?

models · interfaces
For

o1 made it plausible that test-time compute is not just extra verbosity but a new way to buy capability on hard tasks. Once models can spend inference budget deliberately, the product question changes from “how fluent is it?” to “how much thinking is this worth?”. That creates a genuinely new optimisation space for models, interfaces and pricing, and it is why reasoning now feels like a separate frontier rather than a mere feature.

Supporting evidence
  • o1 reasoning shift
  • DeepSeek-R1
  • Open-weight reasoning reaches practical frontier parity
Against

A lot of the current reasoning story may be packaging rather than paradigm. Bigger inference budgets, hidden chain-of-thought and better scaffolding can lift benchmark scores without solving the messy problems users actually care about, such as reliability, context management and action in the real world. If the gains are costly, slow and narrow, “reasoning” could turn out to be an expensive wrapper around familiar techniques.

Supporting evidence
  • InstructGPT and RLHF
  • GPT-4 release
  • Memory beats model delta
  • Computer-use agents find narrow fit

Will local inference replace cloud inference for the majority of routine AI interactions within 24 months?

infrastructure · enterprise
For

Routine, high-frequency AI use wants low latency, predictable cost and privacy by default. Once a model is good enough to summarise, transcribe, draft, organise and personalise locally, shipping every interaction to the cloud starts to look like a design relic rather than a necessity. Cloud will still do the heavy lifting, but local could become the default surface most people touch first.

Supporting evidence
  • Stable Diffusion public release
  • Local inference becomes the private default
  • Regulation stays regionally forked
Against

The average user will not choose local over cloud on principle; they will choose whatever works best. The strongest models, freshest world knowledge, richest tool access and fastest product-improvement cycles are still cloud-shaped advantages, and platform owners have economic reasons to keep the smart layer centralised. Local will grow quickly, but mostly as a complement to cloud rather than its replacement.

Supporting evidence
  • GPT-4 release
  • Claude becomes a credible alternative
  • Enterprise AI stays multi-model

Is agent infrastructure becoming a real software category, or are we prematurely standardising a messy phase?

agents · security
For

When a field starts inventing shared protocols, permission layers, eval harnesses and operational controls, that usually means the pattern is becoming real. Agents that touch tools, data and workflows need governable plumbing, and that need does not disappear even if today’s demos are messy. The category may still be early, but the infrastructure demand looks structural rather than faddish.

Supporting evidence
  • Model Context Protocol
  • Protocols beat bespoke agent plumbing
  • Agent security becomes a platform layer
Against

Every platform shift produces a premature standards rush before anyone knows what the durable primitive actually is. Many so-called agent frameworks are still thin wrappers around brittle model calls, and the winning abstractions may end up bundled into the major model platforms instead of living as an independent layer. What looks like category formation could still be the noisy middle stage before consolidation.

Supporting evidence
  • Memory beats model delta
  • Computer-use agents find narrow fit

Will vibe coding broaden software creation, or mainly create a larger QA and governance burden?

code · work
For

Vibe coding lowers the hardest barrier for many would-be builders: getting from intent to working software. That means domain experts can finally ship internal tools without having to become full-time engineers, which is how real capability usually spreads inside organisations. If guardrails improve, the organisational effect could look less like chaos and more like the spreadsheet moment for software creation.

Supporting evidence
  • GitHub Copilot preview
  • Vibe coding
  • Coding agents own the easy middle
  • Vibe coding goes corporate
Against

Most software cost arrives after the prototype: security, integrations, ownership, maintenance and change control. Vibe coding widens the front door, but it also makes it easier to generate brittle systems that create hidden review and governance debt. In companies, that may mean more shadow IT and more QA burden rather than a clean productivity windfall.

Supporting evidence
  • Vibe coding
  • Agent security becomes a platform layer
  • Computer-use agents find narrow fit

What actually counts as AGI — economic substitution, cognitive breadth, or something else entirely?

modelssociety
For

The useful definition is economic: a system is AGI when it can reliably perform most paid knowledge work at human level or better, across domains, without task-specific retraining. This framing is measurable, matters commercially and treats “general” as a threshold of breadth rather than a philosophical claim about understanding. Reasoning models plus tool use already point at this definition, even if the threshold is years away.

Supporting evidence
  • o1 reasoning shift
  • DeepSeek-R1
  • GPT-4 release
Against

The economic definition smuggles in a philosophical claim and hides the hard part. Genuine general intelligence arguably requires durable goals, embodied experience, robust world-modelling and cross-domain transfer that current systems do not have even when they score well on benchmarks. A system that passes every exam and still cannot run a small business on its own is not general; it is a very capable text engine. The label matters because it shapes policy, investment and safety expectations.

Supporting evidence
  • AlphaGo defeats Lee Sedol
  • AI safety breaks into public view
  • Gemini 2 frames the agentic era

Past

what mattered, and when

2025

Vibe coding

1 Feb 2025

Andrej Karpathy named a new way of building software by talking to AI and steering outputs conversationally. It marked the moment coding started to feel like directing and taste-making, not just writing syntax.

code · work

DeepSeek-R1

1 Jan 2025

DeepSeek released an open reasoning model it said was on par with o1, with open weights and MIT licensing. It reset expectations on open models, reasoning economics and who could shape the frontier.

models · infrastructure

2024

Gemini 2 frames the agentic era

11 Dec 2024

Google's Gemini 2.0 launch foregrounded agentic capability — multimodal reasoning, tool use, and long-running task execution — as the explicit framing for what the next generation of frontier models would compete on.

agents · models

Model Context Protocol

1 Nov 2024

Anthropic open-sourced MCP, a standard for connecting AI assistants to tools and data. It gave the agent era shared plumbing instead of endless bespoke integrations.

infrastructure · agents

o1 reasoning shift

1 Sept 2024

Models began visibly spending more effort on reasoning before answering. It marked the move from fluent prediction to deliberate reasoning as the next frontier.

models

GPT-4o brings real-time multimodal AI

13 May 2024

OpenAI's GPT-4o unified text, audio and vision in a single model with conversational latency. It moved the assistant interface from turn-by-turn typing toward real-time spoken interaction across modalities.

models · interfaces

Claude becomes a credible alternative

1 Mar 2024

Anthropic's Claude 3 made the frontier model race feel genuinely multi-player. It ended the idea that one lab would automatically own the assistant layer.

models

Sora reveal

1 Feb 2024

OpenAI previewed a video model with unusually coherent scene generation. It made world-simulation feel plausible rather than gimmicky.

models · creativity

2023

AI safety breaks into public view

29 Mar 2023

Eliezer Yudkowsky's TIME op-ed calling for a halt to frontier training pushed existential-risk arguments into mainstream debate. It split the story of AI into two tracks: capability acceleration and safety confrontation.

regulation · society

GPT-4 release

14 Mar 2023

OpenAI released a much stronger model with a visible jump in capability. It moved AI from novelty to serious knowledge work, coding and professional use.

models · work

LLaMA and the open-weight era

1 Feb 2023

Meta's LLaMA escaped into the wider world and sparked rapid open model work. It broke the feeling that only a few labs could seriously participate.

models

2022

ChatGPT launch

30 Nov 2022

A conversational interface put frontier AI in front of ordinary people. It turned AI from a sector story into a civilisation story almost overnight.

models · interfaces · society

Whisper releases as open-weight speech recognition

21 Sept 2022

OpenAI released Whisper as open-weight automatic speech recognition matching commercial systems on many languages. It seeded a wave of locally-runnable transcription that did not need to call a cloud API.

models · infrastructure

Stable Diffusion public release

1 Aug 2022

Powerful image generation became openly available on ordinary hardware. It started the open model era in public consciousness.

models · creativity

InstructGPT and RLHF

1 Mar 2022

OpenAI showed that human feedback could make models more useful and aligned. It changed the winning formula from just training bigger models to training and then aligning them.

models · society

2021

AlphaFold 2 lands

1 Jul 2021

DeepMind solved protein structure prediction at near-experimental quality. It showed AI was not just a media tool but a scientific instrument.

models · data

GitHub Copilot preview

1 Jun 2021

AI coding help moved into everyday developer workflow. It turned software creation into one of the first mass human-plus-AI production loops.

code · work

DALL-E first reveal

1 Jan 2021

OpenAI showed text-to-image generation as a general capability. It opened the public imagination to generative AI beyond text.

models · creativity · interfaces

2020

GPT-3 paper

1 May 2020

OpenAI showed a giant language model that could do many tasks from prompting alone. It shifted the field towards general-purpose foundation models instead of narrow task systems.

models

2017

Transformer paper

1 Jun 2017

"Attention Is All You Need" introduced the transformer architecture. It became the core design behind modern language models and much of generative AI.

models · infrastructure

2016

AlphaGo defeats Lee Sedol

1 Mar 2016

DeepMind's system beat one of the world's best Go players. It broke the assumption that intuition-heavy human domains were still safe from machines.

models · society

Weekly Shift Log

what changed, last 7 days

Track a theme

Filter the map by what you actually care about. Picked themes sync to the URL, so you can bookmark or share a slice.

Explore the machinery

The horizon map is built and maintained by the same six bots that run the rest of the site. See how it works.