"𝐃𝐨𝐜𝐭𝐨𝐫 + 𝐀𝐈 𝐢𝐬 𝐰𝐨𝐫𝐬𝐞 𝐭𝐡𝐚𝐧 𝐀𝐈 𝐚𝐥𝐨𝐧𝐞" — Prof. Jonathan Chen, Stanford Physician-Scientist, on the Future of Medicine and Human Judgment
Distilled wisdom from a Stanford MD/PhD who has spent two decades at the intersection of computing and medicine—and performs magic tricks at keynotes.
Last week, we sat down with Stanford’s Director of AI Education, Prof. Jonathan Chen, for an interview that completely blew up everything we thought we knew about the “human-in-the-loop.”
P.S. Interested in building AI-native startups? Level up your network and exchange best practice with fellow builders in person at our next meetup:
Forget the mantra of “human-in-the-loop.”
A landmark Stanford study revealed a highly counterintuitive truth: AI alone outperformed doctors who had access to the exact same AI.
The result was so shocking that lead author Prof. Jonathan Chen made his biostatistician check the data three times.
This finding challenges our most fundamental assumptions about human-AI collaboration, not just in medicine, but across every industry. To truly grasp the implications of his work, it helps to understand the unconventional mind behind it.
Prof. Chen is not your typical academic. An MD/PhD in computer science, he is a Stanford professor, Director of AI Education for Stanford Medicine, and co-creator of ChatEHR — an AI system directly integrated with Stanford's Electronic Medical Record infrastructure. His JAMA Network Open paper ranks among the most-cited AI-in-medicine studies of 2024.
He also performs magic tricks at scientific keynotes. On purpose.
After watching this interview, you’ll walk away with a framework for where AI genuinely helps in medicine, where it genuinely fails, and what it means to be a useful human in a world where the calculator is smarter than the calculator operator.
What We Discussed
The JAMA study that broke a cherished belief: human + AI was no better than AI alone—and what it really means.
The “From Tool to Teammate” research: what it takes to redesign AI workflows so that human-computer collaboration actually works.
Why frontier AI models still give harmful medical recommendations 10–20% of the time—and why the harm usually isn’t hallucination, but silence.
The Competence / Communication / Character framework: what doctors (and humans generally) are actually for in the age of AI.
Why AI regulation is probably hopeless as written—and why that might be fine.
How magic became Jonathan’s signature—and what sleight of hand taught him about teaching at Stanford.
Full interview (YouTube):
Key Takeaways
1. Human + AI is not automatically better than AI alone.
This is the uncomfortable finding from his most-cited paper. Doctors with access to GPT-4 performed at roughly the same level as doctors with access to the regular internet (72% vs. 74%). But GPT-4 asked directly got over 90%. The reason? A third of the doctors in the study had never touched an AI chatbot before. Many were treating it like a Google search. They didn’t know how to copy-paste a full patient case. They didn’t know they could ask follow-up questions. The interface was a tool, not a teammate—and the tool went unused.
“I looked straight at the biostatistician. Wait, you have to double confirm. That is really true? That is an actual result? Because if it is—that is clearly the headline.”
2. The fix isn’t better AI. It’s better workflow design.
The follow-up study—”From Tool to Teammate”—tested a custom AI that explained itself in real time, acted more like a co-pilot than a blank prompt box, and taught doctors how to use it while they used it. Result: human-computer collaboration became effective again, roughly matching AI-alone performance. The lesson generalizes far beyond medicine. Don’t just hand someone a powerful tool with no onboarding and be surprised when they can’t fly the plane.
3. The danger isn’t that AI invents wrong answers. It’s that AI says nothing.
His “No Harm” study (currently in pre-print) found that even the best frontier models give harmful recommendations 10–20% of the time on medical consult cases. But the majority of those harms aren’t hallucinated drugs or invented surgeries. They are errors of omission. A patient describes suicidal ideation. The AI says: “Thanks for telling me.” A patient has symptoms strongly suggesting cancer. The AI reassures them and does nothing. Silence, in medicine, is not neutral.
“Doing nothing is not the same as doing no harm.”
4. Competence, Communication, Character—what humans are actually for.
Jonathan has a perspective piece coming soon on this question: what is the point of a doctor when AI outperforms doctors on medical reasoning tasks? His answer is a three-part framework. Competence: not raw knowledge (you will lose that competition) but judgment—experiential understanding of what is good, what is acceptable, what risks are worth taking, which stats and evidence alone cannot answer. Communication: not just empathy (a robot can simulate empathy) but influence—the ability to direct someone toward something actually good for them. Character: the willingness to tell a patient what they don’t want to hear, which AI systems are specifically not designed to do. These systems are designed to be engaging, not correct. To make you come back, not to be honest.
“These systems aren’t designed to be useful—they’re designed to be engaging. They’re designed to make you want to come back. That is what a business optimizes for, not to be useful.”
5. AI education cannot be banned. It can only be reframed.
When Stanford Medicine tried drafting its first AI policy, the first draft read: “Unless your instructor specifically says you can use AI, assume you cannot.” Jonathan fought that policy hard. It’s unenforceable—you cannot stop students from using AI at home to study. Worse, it puts students in an honor-code trap. The policy they landed on: foundational curriculum that will still be true regardless of what tool is invented (patient privacy, sensitivity vs. specificity, risk-benefit reasoning), paired with explicit acknowledgment that students will be tested in closed-book environments where they can’t depend on AI. Because one day they’ll be in front of a patient and need to think on their feet.
“AI could be the best tutor and trainer you’ve ever had. But if you misuse it, it will be the worst thing that undermines your learning and compromises your ability to be a competent professional.”
6. Agents will be disruptive. But “AI” is not a magic wand.
Jonathan’s working definition of an agent vs. a chatbot: a chatbot says things, an agent does things. But he’s also clear that agents are currently overhyped in the broadest sense—people say “can you use AI to solve my problem?” without understanding what AI actually does or doesn’t do. The internet bubble burst, Pets.com went to zero, and yet the internet did change the world. He thinks agentic frameworks will do the same in healthcare. But the hype cycle will have to pass first.
References
Prof. Jonathan Chen (Guest): https://profiles.stanford.edu/jonc101
Stanford ARISE Network: https://med.stanford.edu/hospitalmedicine/research/ARISENetwork.html
AI in Med-Ed Resources: https://med.stanford.edu/ai-in-meded/resources-and-tools.html
Jing Conan Wang (Host): https://www.linkedin.com/in/jingconan/
FounderCoHo: https://www.foundercoho.com/
Timestamps
00:00 - Highlight
02:38 - Skipping High School at 13
05:26 - Choosing a Major & Pre-Med Path
06:57 - Deciding to Pursue MD + PhD
18:23 - Building an AI for Organic Chemistry
23:44 - The JAMA Study: Doctors + AI vs. AI Alone
29:06 - Redesigning AI Workflows in Medicine
30:39 - AI in Mental Health: Promise and Risk
40:41 - AI Safety & Benchmarking in Healthcare
53:53 - Innovations in Electronic Health Records
57:46 - Transforming Medical Education with AI
01:12:41 - Magic Tricks as a Teaching Tool
01:25:25 - Advice for Interdisciplinary Builders
01:26:50 - Resources & Closing Thoughts
01:27:00 - Outro
Never miss a FounderCoHo community event, podcast, or newsletter. Join us:



This is expected, but what isn’t being talked about enough is the concept of nostalgic jobs, the idea that even though Ai would be better at something, doesnt mean to say humans will accept it.