AI Dev 26 Field Card

28-29 Apr

At registration (ask first)

  1. Are sessions being recorded today? (AI Dev 25 sessions did appear on the DeepLearning.AI YouTube channel afterward, so the answer is likely yes, but worth confirming for this event.)
  2. When will recordings go live for paid attendees?
  3. Is there an attendee-only feed, or is the only access via the public YouTube drop?
  4. Are speaker slides distributed separately, and where?
  5. Is there an attendee directory or contact-exchange tool you can opt into for follow-up with speakers and panelists?
If recordings exist on a reasonable turnaround, you can be more selective about which talks need verbatim live capture and lean harder on the room (Q&A, hallway track, booths) for the ones that will be on tape later.

Intro line (deliver cold)

Default
"I'm a Cambridge PhD researcher here for fieldwork. I spent 25 years on developer tools at Microsoft and Google (Visual Studio, Android Studio, Ads Platform), and the puzzle I'm chasing now is what changes about how engineering teams evaluate their own work as AI moves deeper into the loop. Curious what's been surprising for your team."
Vendor booth (swap last sentence)
"Curious what your customers are running into that you didn't expect."
Recording consent
"Mind if I record this for my research notes? I'm a Cambridge PhD studying how teams evaluate AI systems."
Around 50 words, 3 breaths, 20 seconds. The Microsoft and Google line does more work than Cambridge at a developer conference, so lead with industry. "Puzzle" not "framework." Stay mechanism-quiet on the topic itself, which is what makes practitioners lean in.

Day 1 Targets (Tue 28 Apr)

Morning (confirmed times)
9:10 – 9:25
Anush Elangovan (AMD, VP Software)
Impact of AI on Software
Treat as foil capture. Get verbatim on "software becomes tokens, advantage shifts to execution velocity" and "writing code to steering intent." This is velocity stacking with no correctness variable in the frame, a cleaner Yegge-shape than the Yegge exhibit and one level deeper in the stack at the kernel layer.
9:25 – 9:40
Marc Brooker (AWS)
Keynote: The Sorcerer's Apprentice Problem (Why Agent Safety Lives Outside the Agent)
He is PSF-adjacent rather than foil material. He concedes that alignment-from-inside is losing and displaces the problem to architecture. Listen for who writes the Cedar policies and on what evidence, since the "dumber box" solution presupposes the very evaluative capacity PSF says is eroding. Quote-of-the-day candidate: "mathematically verified, not probabilistically hoped for." Tag for the collaborator file next to Celia Moore.
9:40 – 10:35
Panel: Future of Software Engineering
Catasta (Replit) · Maloney (LandingAI) · Alake (Oracle) · Reis (Practical Data Media) · Mogilko (mod.)
This is the quote-mining hour. Catasta and Maloney will run on the augmentation rail. Watch Reis specifically, since he is the closest thing to a contrarian on stage given his recent posts. Track whether the panel metabolizes Brooker's architectural framing or proceeds on Elangovan's velocity rail. If the panel ignores Brooker, the displacement is itself data.
10:35 – 10:50
Emma McGrattan (Actian)
Engineering the Context Layer (Vector Databases Across Cloud, Edge, On-Prem)
Lower PSF yield than the rest of the morning. Use this slot as a buffer to write up panel notes while the room is fresh, or skim for the deployment-topology framing if you want it on tape.
10:50 – 11:05
Keynote (Google), speaker unspecified
[Title TBA, Google product or research framing expected]
Correction from earlier: Paige Bailey is now confirmed at 4:15 PM, so this 10:50 slot is a different Google speaker, likely a Google Cloud or product team voice given the slot pattern. Listen for product-rollout language, capability claims validated by demos, and unstated counterfactuals in the productivity narrative.
11:05 – 11:25
Andrew Ng (DeepLearning.AI)
Keynote: The future of software development
The structural high-water mark of the morning's augmentation rail. Listen for productivity claims, whether internalization is addressed, and what counterfactual the gains are measured against. Pair with Yegge and Elangovan in writeup. Brooker at 9:25 is the only structural pushback in the morning, so the live question is whether Ng acknowledges architectural constraints or proceeds as if Brooker did not speak.
11:30 – 12:00
Marc Manara (OpenAI)
Fireside chat
Conversational format means more unguarded rhetoric than a keynote, so this is high foil-capture potential. Listen for product-capability claims tied to specific announcements, internal adoption metrics if cited (parallel to your Anthropic self-report anchor), "AI engineer" identity language, and shipping or velocity framing. The interviewer's steering matters as much as Manara's answers.
Afternoon (confirmed times) · Stage 2 is the camp room 1:00–3:15
1:00 – 1:40 (conflict)
Harrison Chase (LangChain, Stage 2, default pick)
The Observability Flywheel: From Traces to Continuously Improving Agents
This is the talk you came for. Continuously improving against which evaluator? The flywheel metaphor accumulates momentum through proxies that get institutionalized as quality. Chase is the most influential observability voice in the field, and what he says here shapes how thousands of teams describe their own dashboards. Capture this verbatim. Conflict: Nyah Macklin (Neo4j, Stage 3) is opposite Chase with "'The AI Said So?' How to Build Auditable AI Agents Using Context Graphs." The title points more directly at the PSF mechanism, and the audit framing is meatier than the typical session blurb. Default is Chase. If registration confirms recordings on a fast turnaround, flip to Macklin live and watch Chase on tape, since auditability framing is harder to recover from a recording without the room.
1:45 – 2:25
Anupam Datta (Snowflake, Stage 2)
Optimize Your Agent's GPA with Coding Agents
This is on-the-nose PSF territory. The title is literally about using coding agents to optimize a quality metric (GPA), which is recursive proxy optimization. Datta is more substantive than the title suggests, since he was formerly at CMU and founded TruEra (now part of Snowflake) with serious work on AI accountability. He may turn out to be PSF-adjacent rather than foil material, so tag accordingly and consider as a potential collaborator candidate.
2:30 – 2:50
Jean-Marie John-Mathews (Giskard, Stage 2)
Red Teaming LLM Applications: Systematically Finding Failures in Agents, RAG, and Chatbots
Red teaming is the externalization of evaluation, often a substitute for direct judgment when internal evaluative capacity has eroded. Listen for who decides what counts as a failure and on what evidence. The slot is short at 20 minutes, so the cost of attending is low.
2:55 – 3:15
Pratik Verma (Okahu AI, Stage 2, optional)
Observability Agent to Find & Fix Issues in AI Agents
This is recursive observability: an agent watching agents, the proxy-watching-proxy pattern in pure form. If you stayed on Stage 2 from 1:00, attend. If you need a break before coffee, skip the talk and use the time to write up.
3:15 – 3:30
Coffee break
Hallway track
Catch Chase, Datta, or John-Mathews near coffee if any of them stayed. Brooker too if he is still around. Have cards ready and be in follow-up mode.
3:30 – 4:10
Buffer slot, or Melissa Herrera (Temporal, Stage 3)
Your Agents Should Be Durable
Lower PSF yield, but "durability" is reliability rhetoric and worth a sample if energy permits. Otherwise use the slot to consolidate notes from the heavy 1:00–3:15 block before Bailey at 4:15.
4:15 – 4:55
Paige Bailey (Google DeepMind, Stage 1)
What's New and What's Next in AI
Listen for the stable-evaluator assumption, how capability claims get validated on stage, and alignment with or divergence from the Sziebert "18-Month Wall" frame. Capability talks are where unstated counterfactuals are most visible. Cross-reference with whatever the 10:50 Google speaker said in the morning.

Trap categories to listen for

A · Inflation
  • Significance framing the evidence cannot bear
  • False comparative ranges (X to Y productivity gain)
  • Rule-of-threes patterns in claims
  • Unearned negative parallelisms ("not just X, but Y")
B · Substitution
  • Volume as quality proxy (commits, PRs, tickets, lines)
  • Benchmark conflated with evaluation (Bean et al.)
  • Engagement metric standing in for outcome
  • Self-report taken as ground truth (METR perception gap)
C · Continuity
  • Augmentation / centaur framing (Brynjolfsson, Mollick)
  • Evolution / staircase framing of discontinuity (Yegge)
  • "Just a tool" framing
  • Stable-evaluator assumption (cross-cutting)
D · Concealment
  • Premature arrest (declaring done before judgment forms)
  • Differential burden absent from the account
  • "AI mindset" or posture talk without mechanism
  • Counterfactual not specified, only outcomes named
These four categories mirror the operational structure of the 21-trap repo. Exact trap labels live in the toolkit. Use the category letter when tagging in real time.

Live tag system

  • T-A/B/C/D · trap (with category letter)
  • F · foil candidate
  • E · evidence (corroborates PSF)
  • Q · direct quote (capture verbatim)
  • B · boundary activity reference
  • ? · follow up later
Pattern in notes: TIME · SPEAKER · TAG · 6–10 words. Reconstruct in the logbook tonight, not live.
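For the nightly logbook pass, a minimal Python sketch of the note pattern above; the function name, record shape, and example line are illustrative assumptions, not part of the card:

  CATEGORIES = {"A": "Inflation", "B": "Substitution",
                "C": "Continuity", "D": "Concealment"}
  TAGS = {"F", "E", "Q", "B", "?"}  # standalone tags from the live tag system

  def parse_note(line):
      # One live note: TIME · SPEAKER · TAG · 6-10 words of gist.
      time, speaker, tag, gist = (p.strip() for p in line.split("·", 3))
      record = {"time": time, "speaker": speaker, "tag": tag, "gist": gist}
      if tag.startswith("T-"):
          # Trap tags carry a category letter, e.g. T-B for a substitution trap.
          record["category"] = CATEGORIES.get(tag[2:], "unknown")
      elif tag not in TAGS:
          record["flag"] = "unrecognized tag"  # surface for manual review tonight
      return record

  # parse_note("9:12 · Elangovan · T-C · velocity framing, no correctness variable")
  # -> {'time': '9:12', 'speaker': 'Elangovan', 'tag': 'T-C',
  #     'gist': 'velocity framing, no correctness variable', 'category': 'Continuity'}

Note that B does double duty in the system as written: a category letter inside a trap tag (T-B) and a standalone boundary-activity tag. The T- prefix is what keeps the two distinguishable in notes.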

Foil-recognition prompts

  1. Is the speaker assuming the evaluator stays the same across the transformation?
  2. Is engagement framed as merely informative rather than constitutive of the metric?
  3. Is the gap framed as measurement timing rather than capacity erosion?
  4. Is the productivity claim measured against any specified counterfactual?
  5. Whose judgment validated this metric? On what evidence?
  6. Does the framing presuppose evaluative capacity is preserved?
  7. Is engagement substituted for outcome, or kept distinct?
  8. What is being lost in the account, even by implication?

Booth questions by strand

Agent platforms / dev tools (LangChain, Replit, LandingAI)
  • How do your customers know when the agent is wrong about something they could not have caught themselves?
  • Have you seen teams whose ability to evaluate output got better, or worse, over time on your platform?
  • What is the best diagnostic you have seen for whether a team is internalizing the agent's work or just shipping it?
  • When something goes wrong in production, where does the audit start, and who can read the trace?
Enterprise infra / data (Oracle, AMD, Actian, Neo4j, Arm)
  • When a customer reports the system is performing well, how do they know that?
  • Have you encountered customers who could no longer distinguish a working system from a failing one?
  • What does evaluation look like for the people who used to do this work manually?
  • Which proxy do your customers most often mistake for outcome?
Startup track
  • What does success look like that is not measurable in your first 18 months?
  • Which of your current metrics would you sacrifice if you could keep the rest?
  • Which boundary activities (developer relations, customer success, design review) became harder, not easier?
  • What is your customers' most confident wrong belief about your product?
Ask consent before recording. Field-note shorthand right after each booth, not later. One vendor at a time, no double-booth without a reset walk.

Anchor studies (one-line cite)

METR · perception gap, 39pp
Cruces et al. 2026 NBER 34851 · scaffolded, not internalized
Bean et al. 2026 Nature RCT · benchmark proxy failed
Liu et al. 2026 arXiv 2604.04721 · 3 RCTs, N=1,222, ~10 min of AI exposure reduced persistence and independent performance, causal
Kim & Kang 2025 OS · predictions degrade reasoning
Acemoglu et al. 2026 NBER · knowledge collapse
Koren et al. 2026 · Tailwind: downloads up, revenue −80%
Anthropic, OpenAI, Uplevel, DORA · supporting self-report and engagement-metric evidence