The Business of Health with Chip Kahn
What AI Can Do — And What It Can’t
May 5, 2026
Audio
About this Episode
Episode 2, AI Series: The data is good enough, the technology is getting better, the computing is becoming more available, and the use cases are getting clearer—but is AI truly a revolutionary technological advancement yet for health care? With a 30-year perspective on what digital technology has done and failed to do in health care, Dr. John Halamka, President of the Mayo Clinic Platform, joins Chip in discussing whether AI is actually disruptive or another wave of incremental change.
The Host
Charles N. Kahn III is a senior visiting fellow at KFF. He is also a visiting senior fellow at the American Enterprise Institute and a nonresident senior scholar at the University of Southern California’s Schaeffer Center for Health Policy & Economics. He serves as co-chair of the international Future of Health collaborative.
Guest
Dr. Halamka is an emergency medicine physician, medical informatics expert and president of Mayo Clinic Platform, a digital initiative that brings together solution developers, data partners and healthcare service providers to transform healthcare. Dr. Halamka has been developing and implementing healthcare information strategy and policy for more than 40 years. Previously, he was executive director of the Health Technology Exploration Center for Beth Israel Lahey Health, chief information officer at Beth Israel Deaconess Medical Center, and International Healthcare Innovation Professor at Harvard Medical School. He is a member of the National Academy of Medicine.
Transcript
AI Usage Disclosure: This transcript was created with assistance from AI tools. It was reviewed and edited by KFF Staff.
Chip Kahn: Last week Eric Larson gave us the strategic landscape, the case that AI is a general-purpose technology on the order of the steam engine and the Internet, and that American health care is uniquely exposed to its disruption and largely unprepared for it. This week we go from the roadmap to the road. Our guest is John Halamka. There may be no one in American medicine who has had a longer or closer view of what digital technology has done and failed to do in health care over the last 30 years. He ran IT at Beth Israel Deaconess Medical Center for more than two decades, advised the Bush and Obama administrations on national health IT, and lived through HITECH, meaningful use, and the rise of telehealth. He has watched every wave of digital innovation promise to transform American medicine and deliver something less. Today, he is president of Mayo Clinic Platform. So, when John says, as he did recently, that the data’s good enough, the technology’s getting good enough, the compute is getting available enough, and the use cases are getting clearer, that this time really may be different, it carries the weight of three decades of having heard this time is different before. The mission-critical question for this conversation is whether AI is genuinely a disruptive revolution that has to be navigated or another wave of incremental change. And if it is a revolution, what makes it categorically different, and what does it take to navigate it? There is no better person to take us from the big picture to the operational reality. John Halamka, welcome to KFF’s Business of Health.
John Halamka: It is extraordinary to be with you because we’ve worked together for 25 years. This is going to be fun.
Chip Kahn: Great. John, one thing striking me, and at least for our YouTube viewers, you’ve got two grandfather clocks in the back of the room that I can see, and that’s an actual room, it’s not a virtual site. What’s that all about?
John Halamka: So, Eric Schmidt, who is on the board of Mayo, would tell you the following, that there are certain technological principles you should follow because the technology is going to change very fast, but the principles will not. And so, the two clocks behind me illustrate a bit of guidance as we start to talk about AI. So, in 1905 a guy in Armonk, New York had this idea. Could he take modular componentry knowing that technology would change? But you know, for the moment, take the technology you had, put it together in a novel way and create something of value. And so, the clock that you see over there was assembled to wind itself in 1905 using a Singer sewing machine motor and mercury switches. But of course, you could change it out as technology changed. The notion of modular replaceable technology was rolled into a company that the inventor called the International Time Recording Company. But then he said, “I wonder if I could work on other business machines.” So, he renamed the company IBM. So, the clock on the left is the pitch deck for the startup company of Thomas J. Watson, IBM. The clock on the right is exactly the opposite. And every component is hand-tooled, not replaceable, not maintainable, and locked in the technology of the mid-1700s. That’s Paul Revere’s clock. And the only way to maintain it is to be a silversmith in Boston. So, there you go. I think Thomas J. Watson had the right idea.
Chip Kahn: What a way to start. Let’s look at you for a moment. I mean, you’ve spent decades on it. Whether it was your years at Beth Israel, whether it’s all the advice and guidance you’ve given to people here in Washington over that time, you’ve been there for it all. The advent of the EHR, the advent of HITECH, meaningful use, the beginning of telehealth. Let’s look at it and just keep AI out of it for a moment. Why did all of that have an effect but then so many unintended consequences? On the one hand it got us somewhat advancing, and we can call it a transformation, but we surely can’t call it a revolution in terms of health care.
John Halamka: Chip, you were there. I mean again, this is the great thing, isn’t it, a line from Hamilton, that we were there when these events happened in the room.
Chip Kahn: Yes.
John Halamka: And so, what happened in the room? Take us back to 2009. And as you remember, I was the chair of the Healthcare Information Technology Standards Panel. We had policy and we had standards, and I was charged by the Obama administration with figuring out all of these questions of how technology would revolutionize medicine. So, I said, well Mr. Obama, what should I do? He said well go talk to the FDA, see what they want. The FDA said oh, post-market surveillance of devices and implantables. We need universal device identifiers. So just simply have every doctor at every visit type in every device in the patient’s body and be able to track it for quality and safety and recalls. Wow, who could argue with that? And then I said well, what should I do next? He said go talk to CMS. And CMS said, we care about quality. In fact, 40 different quality measures. So what we just simply want is every doctor at every visit to record 40 numerators and denominators. So we can measure quality. Are we going to argue against quality? And then I talked to the CDC and they said, we want to look at epidemiology, and we want to look at emerging diseases or trends in violence. So all we have to do is ensure that every doctor and every nurse at every visit gather a complete understanding of every infectious agent that could be entering the community. By the time we were done, every doctor and every nurse had to enter 140 data elements while seeing the patient, being empathetic, never committing malpractice, in 11 minutes. It’s impossible. So, as you suggest, all of the best people with all of the best intent created a burden for our clinicians that unfortunately has had these unintended consequences of burnout and less working at the top of your license.
Chip Kahn: John, I think last year you said, and I’ll quote you here, the data is good enough, the technology is getting good enough, the compute is getting available enough, and the use cases are getting clearer. Is it really different this time?
John Halamka: It really is. And so, I am just turning 64. And so, I know this sounds a bit odd, but I have been working on these issues for 50 years. And 50 years ago, what did I do? Oh, well, we didn’t have compute, so I actually wire-wrapped something called an Altair 8800. I actually built a computer 50 years ago. I was the very first student at Stanford University to have a computer because I built it. Well, today you could go get teraflops for pennies in an instant. Mayo wanted to do an algorithm that required 20,000 GPUs running for two weeks. No problem. You could order it like you order a Happy Meal, right? I mean, it was very easy in 2025 and 2026 to get the compute, the storage that you need that wasn’t there 50 years ago. And I actually don’t have to write a lot of code to do these things. Many of the tools are low-code or no-code kinds of tools. And data. Think about it. I mean, we’ve both been on this journey for decades to reduce the friction for interoperability and data standards and aggregation of information to turn it into wisdom. And today at Mayo Clinic, as we’ll talk about, I work with eight countries on sovereign AI looking at hundreds of millions of birth-to-death multimodal records so that we can create the models for the patients of the future. So yes, 2026 is the perfect storm for innovation.
Chip Kahn: So, it’s categorically different.
John Halamka: It is categorically different. How about this, sometimes I’m asked, what is the best era that you would want to relive? Oh, did you like the 60s, the 70s, the 80s? I’ll tell you the answer to the question is today. Today is the best era to be alive.
Chip Kahn: So, let’s go to Mayo and walk us through the AI applications that are generally operational right now that you have strong feelings about and that are not piloting, that are actually affecting patients at the bedside.
John Halamka: Well, sure. So, Bob Wachter, who I’m sure you know very well and you’ll be chatting with, visited Mayo for a week. And I actually took him to the bedside and said, I’m going to show you how a patient, how a doctor uses this stuff day to day, and how it materially changes the way we are delivering a service.
John Halamka: So, for example, in cardiology. And again, I’m just going to give you some real examples. And you know, I have no privacy of any kind. And it’s all okay. So I have a supraventricular tachycardia. And that means my heart rate, which is about 50 or so at rest, sometimes goes to 170. It’s irritating. It is not life threatening. Mayo Clinic said, wow, John, maybe you have a cardiomyopathy, maybe you have pulmonary hypertension, maybe you have valvular disease. I mean, we’re not sure. So you have two choices. You know, you could come to Mayo. We could spend four days doing expensive invasive procedures, or we could just run 14 algorithms on the one-lead ECG you gather from a consumer device in your living room. Your choice. What did I do? Again, I’m not, of course, endorsing any product or service here, but I literally bought like a $50 device on Amazon that was able to gather either a one-lead or a six-lead ECG. And then I sent it to Mayo, which ran all the algorithms, and they came back and said, John, your heart is that of a 17-year-old. It is amazing. But you have a conduction defect. Take 25 cents of diltiazem every day and your SVT will disappear forever. I did all that literally from my living room. And I am cured. And I didn’t have a single invasive procedure. And this is what Mayo does. Take every specialty. Radiology, radiation oncology, early detection of breast cancer, prostate cancer, all of these things in production today, augmenting the workflow of our clinicians, so that those clinicians can see more patients with greater quality and safety than ever before.
Chip Kahn: Boy, that’s really significant. You know, obviously Ambient AI in doing charting is one of the big areas of progress. And I know that you do a lot, both in Arizona and Florida, with the nurses. Actually, almost all their charting is done by voice. How does that all work? And how are the nurses working with that? And then, what are the efficiencies that come from that?
John Halamka: You may remember in Bob Wachter’s first book, The Digital Doctor, the first page is a crayon drawing done by a 7-year-old called “A Visit to the Doctor,” where the doctor and the nurse are staring at a computer at one side of the room and the patient and the family are on the other side of the room. And unfortunately, as we have moved from an analog to a digital world, we’ve lost the hearts and the minds of our doctors and nurses by turning them into administrative typists. And so, what ambient listening can do is several things. Well, first I mentioned those 140 data elements that need to be gathered. Those 140 data elements can actually be automagically—I know that’s not a word, right—
Chip Kahn: but it sounds good, though.
John Halamka: Yeah, yeah. Populated from the doctor and the patient having a dialogue. So, you know, have you been sleeping okay? How’s your weight, how’s your mood? How’s your family? Right? You start to populate all of that and then the clinician just goes back and edits or signs off on the result. It is a substantial reduction in burden, with nurses especially, right? You’ve got nursing care plans and you’ve got progress notes, and the nurse and the patient have a dialogue. And that inpatient record is, in effect, created automatically, so that the goal that our Chief Nursing Officer has is that a nurse will not touch a keyboard during a shift. And what a noble goal that is. I’m going to give you an analogy to ambient listening that you’re going to find kind of funny. Take you back to 2011. There was a product called Google Glass. And you remember, you put on the glasses, they had a camera, they had the capacity to run software. Beth Israel at the time was the pilot site for that product. And what did we do? Well, we displayed the patient’s chart and their vital signs and their problem list on the glasses. So, we said, hey, patient, how did you like that experience? They said, the doctor was looking at me instead of a computer the whole time. Well, of course, the reality is the doctor was just reading the computer on the glasses in front of them. But the patient experience was better. And that’s the goal of ambient listening: compliance and accuracy with a patient focus.
Chip Kahn: I guess it also affects literally the nurse’s time because she or he is not stuck at a desk anymore…
John Halamka: I mean, you’ve talked to our clinicians. Approximately 50% of nursing days are spent at a keyboard. And so now, as you say, reduce that from 50% to 5%. It means that the reason they went into nursing was active listening, empathy, contact with patients, service. They can now work at the top of their license.
Chip Kahn: So, the issue of whether to go with an app or a technology that’s AI driven, you’ve said that they ought to be evaluated similarly to a pharmaceutical. What do you do at Mayo? What’s the process that you have and how much rigor do you want in evidence before you’ll pilot or experiment even with a new technology?
John Halamka: Sure. So you, of course, know Micky Tripathi, and Micky served as the ASTP/ONC lead. When he retired from the Biden administration, he actually came to Mayo and is now the Chief AI Implementation Officer. You say, “Wow, that’s a weird title.” Well, Micky obviously had spent a career in safety and quality and data, and is charged with making sure that when we deploy AI, we do it rapidly. Right. We don’t want to constrain innovation, but we also understand its implications, you know, its safety and consequences. So here’s what we do for every algorithm, and I’m going to start with predictive AI, because predictive, generative, and agentic AI all have slightly different characteristics. Predictive AI. What data set did you use to develop it? So suppose. And of course, Chip, I’m making this up.
Chip Kahn: Sure.
John Halamka: I create an algorithm from the 10 million patients that Mayo has in Minnesota, lots of Scandinavian Lutherans, and then I run that algorithm in rural Georgia. Fewer Scandinavian Lutherans. Will it be good? Will it be bad? Do you know? So a data card tells you whose phenotype, genotype, and exposome went into the training set for the algorithm. So every algorithm at Mayo has a data card. Then a model card tells you how it actually runs in practice. So here’s a fun one. I don’t know if you spent time with Eric Horvitz, chief scientist at Microsoft, but back in the day, Microsoft bought Amalga, I think it was. Craig Feied, Mark Smith and MedStar created this thing, I don’t know, 20 years ago. MedStar, Washington D.C., typically insured patients. The folks at Microsoft took the algorithms developed at MedStar and moved them six blocks away to a largely Medicaid population clinic. It didn’t work at all. Right, because your insured population in Georgetown has maybe a different diet or a different set of medication adherence than does a Medicaid clinic. So a model card tells you a bit about how the model actually works on each patient, given stratifications of race, ethnicity, zip code, age, gender, et cetera. So Mayo does that. But then here’s the biggest issue. I am going to develop an algorithm that is going to tell Chip whether you should eat more vegetables for dinner and whether you should walk 10,000 steps a day. Suppose that algorithm is wrong. Maybe you eat too many vegetables, and you walk too much. The likelihood of harm to you is probably zero. Right? So, you have to do what we call qualification. If the algorithm is wrong, what is the consequence? Suppose I have an algorithm that is going to, back to device integration here, automatically inject insulin into your bloodstream. Aha. If that algorithm’s wrong, you could be in a hypoglycemic coma.
So what you see is, for every algorithm, not only a data card and a model card, but a stratification of six different ranks of risk if the algorithm goes bad. And once we do that, then in a—don’t worry, this is a relatively quick process, I mean, a week or two turnaround time—we then get the approval to put it in production.
Chip Kahn: This is causing big changes. And what this podcast is all about is how we get to good patient outcomes with the notion that, at the end of the day, the business model is what’s going to be right to get there. And so how are the economics of running a health system affected by all the kinds of apps and adaptations of these new technologies that you’re bringing into place in your health system and recommending for other health systems?
John Halamka: Well, and of course you ask the best question, but also a complicated question. And, sometimes I say with a bit of levity, the United States is actually five countries. You know, the East coast, the West coast, the Midwest, the South, and Texas, which is its own country. I say this because the reimbursement models and the incentives in each region of the United States are different. I mean, again, just knowing your career, would you say that in the Midwest in general, of course, heads in beds is a good idea, but let’s take the East coast or the West coast. Heads in beds. Oh my God, no. You don’t want that. You want wellness, you want home care, you want value-based purchasing, et cetera. So here’s the question for you, right? Depending on your reimbursement model, what is it that you’re going to do with AI that is going to ensure the best patient care? That’s of course what you want to do first. But also, reimbursement is going to cover some kinds of costs. Here’s why it’s hard. I think we probably all listened to Dr. Oz say, let’s move from sickness to wellness. Let’s move from hospital tertiary, quaternary referral to community and home, and let’s move from analog to digital. But ask yourself this question. What’s the reimbursement today for chemotherapy delivered in a hospital facility versus the home? Right. So, the incentives are slightly misaligned to do that delivery in a non-traditional setting. So anyway, I say all this because your question is so complicated. I mean, right, with AI, I can deliver right care, right patients, right time, right setting. But you know, hospital systems have to keep the lights on, and so they will also have to reflect, “is there reimbursement for the activity they use AI to automate?” Don Berwick, our mutual friend, said, if you automate a bad process, you just achieve a bad result faster. So, imagine you and I design a system that is great and unreimbursed. We’ll go bankrupt quicker.
So again, this is not about letting revenue drive what it is we do. We have to be realistic when we deploy these things. We’re not building a CPT code for every use of AI. We’re trying to achieve efficiencies that are aligned with the reimbursement we get from delivering the service.
Chip Kahn: One of the issues, to me, with pay-for-performance as an area is that, if it’s been successful at all in the areas around the country you talked about up to this point, it’s given the payers an edge so that maybe they can get a cheaper price, maybe it’s used effectively, sometimes maybe inappropriately in terms of controlling volume, but it doesn’t really have any kind of outcomes measurement. It has all these measurement requirements that really don’t tell you much other than that the hospital or the physician followed the right process or the right structure was in place. Can AI be a game changer here to begin to reform the structure you just described, which is sort of hostile to appropriate evolution because it’s so complicated?
John Halamka: Well, sure. So let me ask you an interesting question again. You’ve done this for decades. How easy is it for you to order and get an echocardiogram on a patient? Well, here’s a problem. We don’t have a lot of echo techs, and the supply and the demand have a mismatch. So, you’re going to wait six weeks to get an echo? I mean, unless you’re in some sort of life-threatening situation. Well, and again, I’m not endorsing any product or service here. I’m just telling you my experience. There are companies that are now creating AI-driven devices so that a person who’s never done an echo in their life, with a minimal amount of training, as in a couple of hours, can produce an echo with the same quality as an echocardiographer with 30 years of experience. Wow. That means I’m actually able to see more patients and deliver more services with more quality in more regions than ever before. Okay. Again, it’s going to always be in the interest of the patient and doing the right thing, and it’s appropriate. But I will now be able to increase volume. But there’s another aspect of all this, which is that a primary caregiver who’s utterly overwhelmed may say, ah, I am not really sure if I should refer this patient to a cardiologist or not. And how about this? I have doubt. So let me just refer them to a cardiologist, which, as you know, specialty referrals result in increased expense and obviously increased testing. What if the AI says actually the person in front of you right now has an ejection fraction of 70% based on their Apple Watch? (Not endorsing Apple.) Oh, you don’t actually need to refer this person to a cardiologist. Well, and of course what I’m referring to is the Mayo EAGLE and BEAGLE studies, right, which actually took 125,000 ECGs from consumer devices and actually had primary caregivers be able to now decide who to refer and not to refer based on AI interpretation of patient device data. And it had two interesting implications.
First, those who needed cardiology referral got it 30% faster. And a whole lot of patients were actually not in need of a cardiology referral and were fully managed by the PCP, resulting in a substantial increase in job satisfaction for the PCP. So again, you can hear this. We have in the United States a limited number of specialists. And if I can ensure that the right care is delivered by the right person in the right setting, and AI helps us figure that out, everybody wins.
Chip Kahn: There has been a lot of discussion about AI hallucinations and other issues that are raised by the complexity and the mystery, in some ways, of the technology. How do you deal with that? One of my interviewees the other day mentioned something, I think he used the word wobble or some word like that, to say that even though the technology’s approved and validated and works, over time there’s an evolving of the way it works, so they’ve got to constantly recalibrate it. How do you make sure all of that is appropriately in place so that the AI results you’re talking about will be as assured a month from now as today?
John Halamka: Right. So what you’re talking about is data drift or data shift. And I’ll give you a real example. Think back. January 2020. Mayo was asking, how do we start delivering care in the home? How do we do telemedicine? So in January of 2020 we said, we are going to create an algorithm based on every person who is seeking remote care in January of 2020, and it will help us figure out who will benefit from remote care. And then we deployed it in March of 2020. So again, think back. How many of your patients were seeing their doctors through telemedicine in 2019 or January of 2020 versus, say, March or April of 2020? We literally went from 3% of the population to 93% of the population. And so, the algorithm developed back when it was 3% is completely useless when you get to the 93% because of this thing we call COVID. Right. And so, it requires, I would argue, like a pharmaceutical, post-market surveillance on every use of the technology to say, did it work? Did it not work? Was there benefit? Was there harm? And then constant fine-tuning. And so, here’s again a sort of interesting challenge. And again, I’m just going to be realistic because I get to live this every day. I went to medical school in the 1980s, and so I recently had the opportunity to speak with one of my colleagues who is the director of the National Library of Medicine’s Lister Hill Center. And I said, I’m curious, if you look back at the literature that I mastered in the 1980s, how much of it is wrong? And she did do an analysis. 60% of what I learned in medical school is wrong. I just don’t know which 60%. Right. So isn’t it interesting? Although AI, as you say, has hallucination, and the AUCs aren’t perfect, it’s probably a whole lot better than somebody who trained in medical school in the 1980s. So where does society draw the line? If my AUC is 0.6 and the algorithm’s AUC is 0.8, I’m betting you probably want the algorithm over me, even though it’s imperfect.
Chip Kahn: So, what you’re describing in some ways is the Waymo problem. When they hit a cat, it’s a big scandal. But if you compare them to all of us driving, they have a lot fewer accidents, if they have any at all. And we are a big risk. But the public doesn’t look at it that way. So, this is something that is an issue for technology generally.
John Halamka: So let me just give you another dark side to this. A few months ago, I was in Dublin and I met with all of the world’s radiology chairs. And they told me they hate AI. I said, well, why is that? And they said, well, let’s imagine that it has a positive predictive value of 95%. I mean, wow, that’s wildly better than any human. But here’s the problem. If I’m going to argue against the AI, right, there’s 5% false positives. The amount of time it takes me to document that I disagree with the AI and am going to go a different direction, from a medical-legal perspective, outweighs the benefit of the 95% of good advice that it offered me. And as you suggest, this is a cultural issue: we are not allowing AI to have any margin of error, despite the fact that our human doctors and nurses have an amazing level of error.
Chip Kahn: If we look at FDA, I think they’ve authorized roughly 950 AI-enabled medical devices. How many of them actually clear your bar for deployment at Mayo? And what does the ratio tell us about the gap between authorization and real clinical readiness?
John Halamka: So isn’t it interesting, as you look at adoption of AI across health care systems, the radiologists and the cardiologists tend to adopt it first. And so as you look at the FDA approvals, the vast majority of these are in the field of radiology, cardiology devices, and that kind of thing. So then you start to ask the question, where is there a human nearby? Right? That is, is it an autonomous decision where the AI looks at something and takes an action? Or is it that smart consultant that’s telling the human, hey, you know, I saw this fracture here. You may want to recheck that. So I would tell you where Mayo has been an early adopter of this stuff is especially in the field of imaging, right? So radiology or digital pathology, radiation oncology, where it’s augmenting human behavior by helping them focus their attention. And at the moment I don’t think there’s a single case, I mean, maybe we could find one in supply chain that orders Band-Aids or something, but a single case where the AI itself is running autonomously without a human nearby.
Chip Kahn: I think that’s important. And if AI is generally disruptive, the question is whether health care’s decision-making structures are designed for incremental change. And here I’m speaking generally, not of Mayo specifically: can they actually navigate this revolution well? I mean, you’ve got a very contained shop, you know, you’ve got your implementer staff. Everyone’s not going to have the facility or the knowledge that you’re bringing. How is the average health care system, or the individual or small-group physician, going to deal with the kinds of issues we’re talking about in terms of navigating this?
John Halamka: It’s a brilliant question, right? And there are several ways you could look at it. I mean, when you talk to Marty Makary, FDA has said it’s going to take a bit of a light touch on regulation. So you’re probably not going to see this rigorous premarket testing and such. So what that means is it’s probably going to be up to the marketplace, the innovators and provider organizations, to figure out what to use and how to use it. So here’s what Mayo’s done. Although we have three destination medical centers, Minnesota, Florida, Arizona, we actually have, around the world, about 50 affiliates that are typically community hospitals, some critical access hospitals. And they’ve said, hey Mayo, help us figure out what AI to deploy. So what Mayo will do is look at all these products and services built by Mayo, built by third parties, qualify them, and once we think they’re good enough, then we will go out to the community hospitals and say, oh, we’ve actually found this particular solution to be reasonable in terms of its positive predictive value, its risk, its post-market surveillance and that kind of thing. So maybe I would argue those who have the sophistication to develop and test these things have a societal responsibility to spread them to those who don’t. And certainly that’s the work that I do at Mayo Clinic Platform. I’ve been given an interesting KPI: Gianrico Farrugia said, John, I want you by 2030 to have touched the lives of 4 billion people by ensuring these algorithms that are qualified are disseminated globally to every Android phone, every EHR, and every country on the planet.
Chip Kahn: Well, I guess along those lines, you’re also a chair of the Coalition for Health AI. What should AI governance look like inside a health system? And does that exist today? I mean, do the institutions even have the kind of structure to get the information from you and those who can provide guidance to make the kind of decisions they need to make?
John Halamka: And so, one of the challenges was that there was not a community standard, which is why about four years ago we put this coalition together. As you and I know, malpractice isn’t a good or bad outcome. It’s, did you deviate from the community standard of care? So our thought was if we could put 4,000 organizations together across government, academia and industry and define what’s good enough, what are the best practices, what are the right safeguards, what is a standard data card or model card or qualification schema, then others would say, oh, I don’t have to define a data card myself. I can adopt the community standard of care for the evaluation of a given AI algorithm. And that’s really the purpose of the Coalition for Health AI. It’s a nonprofit organization bringing people together just to decide what will we, as a society, accept and not. I mean, it was very funny. Eric Schmidt, who I mentioned, you know, he’s a board member of Mayo. When he created Waymo, he said, we actually did a cultural analysis and we found that the public wanted self-driving cars to be 10,000 times better than a human driver. Wow. Okay. The community standard of care is that a Waymo will have one accident in a million miles. Fascinating. I mean, somebody had to decide what’s good enough. So as you pointed out, if suddenly, I mean, there’s one tiny accident, it’s front-page news. Despite the fact that we all agreed it’s wildly safer than any other transportation alternative. That’s why we need the standard for how you test, how you govern, and what is good enough.
Chip Kahn: One of the things that all of us face and see whenever we have any kind of interaction with the health care system is the health care workforce shortage. Sort of hits us in the face. You’ve said that AI is essential to closing that gap from a policy standpoint and then from a practical, real-world standpoint, how is AI going to do that and how fast can it do it?
John Halamka: So, let me give you a couple of statistics. I don’t know if you have spent time in Davos at the World Economic Forum, but in 2025, the theme was AI, and all the government leaders in Davos said, we have a problem. The birth rates in many industrialized countries are less than replacement. And in some places, like Japan and South Korea, I mean, we’re looking at birth rates of 0.6, 0.7, but yet we have societies that are living into their 80s. So, lifespan may be in the 80s, but health span is probably in the 60s. So, what that means is we have a 20-year period where we’re going to need more care. Oh, but wait, our birth rates are so low there’s no one to deliver that care. And so, what I have heard from societal leader after societal leader, and I just flew in from three days in Rochester where I hosted 28 international companies, including many government officials from Japan, what they said is, we have decided that unless we deploy AI as part of extending the license of our mid-levels, helping our specialists see the right patient, delivering care in the home in an autonomous fashion, robotics, all these other things, we’re never going to meet the demand of an aging society with a birth rate that’s 0.6 or 0.7. So this becomes kind of a John Carter problem, right? That is, there’s an urgency to change, and it’s up to us to figure out how. And so I think the vision is this: that we assess, as we’ve been talking about, levels of risk, and if we can build AI that will help a mid-level, a nurse practitioner, a PA deliver a higher quality of care to more patients with more serious disease, suddenly we are going to have a healthier society. And I’m seeing this sovereign AI notion, that is, country-scale adoption of this stuff to meet the societal problems of supply-demand mismatch. And I don’t see a lot of other arrows in our quiver, because we’re not going to graduate enough nurses and doctors over the next 20 years to solve this problem for us without AI.
Chip Kahn: Well, I think your echo example is a good one in that respect. And I’m sure there are other areas, with techs and other aspects of the workforce, where literally, you know, having AI can change the whole aspect of that kind of care. And I assume that’s coming. And that gets to my sort of overall question. From a regulatory and policy framework, what do you think we need to assure a public that is risk averse, and I’ll use that word in a particular sense, and I think it comes from our discussion, not risk illiterate, but has a notion of risk that frankly reflects a nervousness that doesn’t reflect the reality of everyday life? I mean, when you cross the street, you’re at risk. And people don’t compare that to other aspects of risk. What kind of structure do we need for AI deployment so that it can achieve, on the positive side, all the kinds of outcomes you’re describing?
John Halamka: Of course, you’ve asked the $64,000 question, so let me frame it in an unusual way. I was speaking with a prominent industry leader the other day about a paradox, and here’s the paradox: say there are AI products, and I’m sure you use several of them, that go out and summarize the literature or summarize clinical data for you, and let’s say they’re 80% good. There are products that use different technologies and actually get higher levels of accuracy. And this leader told me the paradox is, people trust the output of an LLM because it’s so compelling. Right? It makes you happy. And even if it’s telling you information that’s completely false, you feel good about it. So I asked the question, how many people do you know that are uploading their medical record or their wearable device information to ChatGPT or to Claude and then asking questions of it, and actually feel really good about the advice they are getting back, not only because it’s compelling and it’s well phrased, but because they’re instantaneously getting information that could take a couple of days for their PCP to synthesize and respond to? So, wow, there’s a cultural question that you ask. If people say, I am actually more interested in immediacy and comfort than I am in complete accuracy with delay, we have to decide as a society if that’s okay or not. We’re replacing Dr. Google with Dr. Claude. And so, I guess here would be my dream. We’re not there yet. What if, for every generative AI product, we could figure out an accuracy score? And remember, every time you give a prompt to a generative AI product, you get a different answer. So you actually need a score on every single answer that you get. Again, I’m going to hypothesize here.
Let’s say that the National Library of Medicine, working with industry innovators, creates a knowledge graph, and every generative AI response is error-checked against a knowledge graph of the world’s literature or clinical observations, then gets a score. You can say, wow, Claude just synthesized your medical record and gave you advice, and the confidence level is 0.9 as opposed to 0.1. Then you’ll decide as a human. Yes, it’s comforting and compelling, but do you want to believe it? So here’s where I’m getting a little bit speculative. I have no question that people are going to use these products to guide their care journeys, because they feel like it’s democratized access to knowledge and it’s reduced the burden of navigation. Data cards, model cards, qualification, that all helps. But the individual patient, I’m betting, is not going to go look at a data card or model card. They’re going to need something that says, oh, you know, this is believable or not believable. And we haven’t, as a society, built that yet, but we must.
Chip Kahn: Clearly, five years from now, ten years from now, the world is going to be different because of this technology. What keeps you up at night when you think about the prospects of that? Obviously, we’ve talked about the positive, but just in terms of how it could affect health care, is there an aspect that keeps you up at night when you think about how this is all going to work out?
John Halamka: Well, a couple of things keep me up at night. I’m sure you talk to many medical school deans. Generally, for the last 30 years or so, I’ve been able to lecture to conferences of the medical school deans of not only the U.S. but internationally. They are not preparing the next generation of students to be AI interpreters. Unfortunately, our medical schools tend to be fairly conservative and haven’t changed the curriculum to move away from memorization to data science or to tool assessment. And we need to. So, you know, I do not want the next generation of doctors to blindly accept the advice of an AI without having the training to decide if that AI is credible or not. That’s certainly one issue. I also worry about agentic AI. We didn’t talk a lot about it, but I have recently had discussions with some of the chief information security officers of the largest hyperscalers in the world. They are really concerned that as we use agentic AI, in effect saying, AI, you can now take action on your own, that if a bad actor takes over that agentic AI, they could literally shut down your company in a few seconds. So we’re going to have to be really careful about cybersecurity and the potential for some of these tools that we create to actually have an effect that was never intended, and one that could be extraordinarily harmful to our care delivery system. Again, I told you I will never mention a product or service, but there is an open-source stack called OpenClaw, which you may have looked at.
Chip Kahn: Yes, I’m familiar with it.
John Halamka: It’s a lovely open-source software system with no security of any kind. So if you say, hey OpenClaw, you can now go answer all of Chip’s emails, or order all of Chip’s groceries, and operate the entire home ecosystem of locks and lights and heating, just think about what happens when a bad actor has taken control of everything in your life. We just need to ensure that doesn’t happen.
Chip Kahn: John, this has been terrific, and I just appreciate you spending the time with us. I think our audience will clearly have learned a lot today from our conversation.
John Halamka: Well, and I live this every day. And as I said, I’m approaching 64, but I’ve been a vegan for 25 years, so I’ve got like 30 more years of working through this. So, you and I, 30 years from now, we’ll say, here’s what we said in 2026, and here’s what came to pass.
Chip Kahn: John, I want to be with you 30 years from now. Thanks.
SERIES

This weekly podcast features insightful conversations between host Chip Kahn and his guests, who discuss the business of health care, connecting the dots between the health care business, policy, and patients.
The podcast’s first series on AI in health care illuminates how AI is changing health care, and features guests who are deploying this technology, managing its consequences, and designing policy around it.

