International Finance

Is Microsoft’s AI better than your doctor?

Microsoft’s MAI-DxO marks a significant milestone in the integration of AI into clinical diagnostics, showcasing higher accuracy and lower costs than human doctors in complex cases

Recently, Microsoft’s AI Diagnostic Orchestrator (MAI-DxO), the tech giant’s AI-based medical programme, hit the headlines by accurately diagnosing 85% of cases reported in the New England Journal of Medicine. According to a study published on the preprint website arXiv, this was roughly four times the accuracy rate of human doctors, who made the correct diagnosis about 20% of the time.

The cases are from the journal’s weekly series that aims to baffle physicians by presenting difficult, complex situations in which the diagnosis isn’t immediately apparent. Using roughly 300 of these cases, Microsoft compared the performance of its MAI-DxO to 21 general-practice physicians. Microsoft’s team first developed a mechanism to simulate the iterative process that physicians usually follow when handling these cases: gathering data, evaluating it, ordering tests, and making decisions based on the findings.

“A collection of commercial AI models, including Claude, DeepSeek, Gemini, GPT, Grok, and Llama, were compared to the 21 physicians. The Microsoft team also created an Orchestrator, a virtual representation of the sounding board of colleagues and consultations that doctors frequently seek out in complex cases, to further mimic how human doctors handle such difficult cases,” the study stated.

Microsoft monitored the tests that the AI system and human doctors ordered to determine which approach could complete the work more affordably, as ordering medical tests in the real world is expensive. In addition to performing significantly better than physicians in determining the right diagnosis, MAI-DxO was able to do so at an average cost that was 20% lower.
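
The loop described above (gather data, evaluate it, order tests, decide) and the running tally of test costs can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the test names, prices, policy rule and function names are hypothetical and are not taken from Microsoft’s study or system.

```python
from dataclasses import dataclass, field

# Hypothetical test prices in arbitrary units; not figures from the study.
TEST_PRICES = {"blood_panel": 50, "chest_xray": 120, "ct_scan": 900}


@dataclass
class Encounter:
    """Tracks what a diagnostician (human or AI) has learned and spent."""
    findings: dict = field(default_factory=dict)
    total_cost: int = 0

    def order_test(self, name, hidden_results):
        # Each ordered test adds its price and reveals one hidden result.
        self.total_cost += TEST_PRICES[name]
        self.findings[name] = hidden_results[name]


def run_case(policy, hidden_results, max_steps=5):
    """Loop: evaluate the findings so far, then order a test or commit."""
    encounter = Encounter()
    for _ in range(max_steps):
        action, value = policy(encounter.findings)
        if action == "diagnose":
            return value, encounter.total_cost
        encounter.order_test(value, hidden_results)
    # No diagnosis reached within the step budget.
    return None, encounter.total_cost


def frugal_policy(findings):
    # Toy rule: start with the cheap test, escalate only if it is abnormal.
    if "blood_panel" not in findings:
        return "order", "blood_panel"
    if findings["blood_panel"] == "abnormal" and "chest_xray" not in findings:
        return "order", "chest_xray"
    if findings.get("chest_xray") == "infiltrate":
        return "diagnose", "pneumonia"
    return "diagnose", "no acute disease"


# A made-up case: the results stay hidden until a test is ordered.
case = {"blood_panel": "abnormal", "chest_xray": "infiltrate", "ct_scan": "infiltrate"}
diagnosis, cost = run_case(frugal_policy, case)
print(diagnosis, cost)  # pneumonia 170
```

In this toy setup, two diagnosticians can be compared on both axes the study measured: whether the final diagnosis is right, and how much was spent on tests to reach it.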

“The four-fold increase in accuracy was more than previous studies have shown. Most of the time, there is a 10% absolute percentage difference, so this is a really big jump,” Dr. Eric Topol, chair of translational medicine and director and founder of the Scripps Research Translational Institute, told TIME magazine. What really got his attention, however, was the cost: not only was the AI more accurate, it was also much less expensive.

Since MAI-DxO is still in development, it is not yet usable for purposes other than research. However, implementing such a model could potentially improve patient outcomes by reducing medical errors, which contribute significantly to healthcare costs, and by increasing the effectiveness of human physicians.

“This is a startling result. I think it gives us a clear line of sight to making the very best expert diagnostics available to everybody in the world at an unbelievably affordable price point. We are nearing AI models that are not just a little bit better, but dramatically better, than human performance: faster, cheaper and four times more accurate,” said Mustafa Suleyman, CEO of Microsoft AI, during an interview with the Financial Times, while describing the trial as a step toward “medical superintelligence.”

According to Suleyman, when AI algorithms were first used in medicine ten years ago, they were primarily applied to binary tasks such as detecting tumours in image scans. Microsoft’s research also suggests that AI diagnostic tools could reduce unnecessary healthcare expenditure while improving accuracy. Health spending in the United States is approaching 20% of GDP, and an estimated 25% of that spending has minimal impact on patient outcomes.

“Important challenges remain before Gen AI can be safely and responsibly deployed across healthcare. We need evidence drawn from real clinical environments, alongside appropriate governance and regulatory frameworks to ensure reliability, safety and efficacy,” said Microsoft’s research team.

Suleyman claims that these models can now hold very high-quality, fluid conversations in which they ask the appropriate questions, probe in the right ways, and recommend the appropriate tests and interventions at the appropriate times. An AI system may also benefit from not having many of the biases that come with being human.

“Across Microsoft’s AI consumer products like Bing and Copilot, we see over 50 million health-related sessions every day. From a first-time knee-pain query to a late-night search for an urgent-care clinic, search engines and AI companions are quickly becoming the new front line in healthcare,” the company said in a blog post.

“As demand for healthcare continues to grow, costs are rising at an unsustainable pace, and billions of people face multiple barriers to better health. We want to do more to help and believe generative AI can be transformational. That’s why, at the end of 2024, we launched a dedicated consumer health effort at Microsoft AI, led by clinicians, designers, engineers, and AI scientists,” the tech giant stated further.

According to Dominic King, vice president of Microsoft AI, “Confirmation bias affects everyone. Clinicians occasionally think, ‘I’m sure this is just like the patient I saw recently,’ after observing something. However, AI has a slightly different way of thinking.”

MAI-DxO does not simply spit out an answer. It works in a way that lets doctors study and examine its reasoning. However, some medical and AI experts point out that Microsoft’s method isn’t wholly original because its diagnoses relied on the combined output of several AI models.

“In my mind, they are not testing any individual model that is optimised for healthcare. They are testing the concept of testing all of the models out there today and combining their decision-making. That part, to me, is not surprising,” Keith Dreyer, chief data science officer at Massachusetts General Hospital and Brigham and Women’s Hospital Centre for Clinical Data Science, said.
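
Dreyer’s point about combining the decision-making of several models can be illustrated with a minimal Python sketch. It assumes a plain majority vote, which is a stand-in chosen for illustration and not a description of how MAI-DxO actually merges model outputs; the model names, diagnoses and the function `combine_diagnoses` are hypothetical.

```python
from collections import Counter


def combine_diagnoses(model_outputs):
    """Return the most common diagnosis and the share of models backing it."""
    # Normalise case and whitespace so "Sarcoidosis" and "sarcoidosis" match.
    counts = Counter(d.strip().lower() for d in model_outputs.values())
    diagnosis, votes = counts.most_common(1)[0]
    return diagnosis, votes / len(model_outputs)


# Hypothetical answers from three unnamed models for one case.
outputs = {
    "model_a": "Sarcoidosis",
    "model_b": "sarcoidosis",
    "model_c": "Tuberculosis",
}
diagnosis, agreement = combine_diagnoses(outputs)
print(diagnosis, round(agreement, 2))  # sarcoidosis 0.67
```

When votes are tied, this sketch returns whichever diagnosis it encountered first, so a real system would need an explicit tie-breaking or escalation rule.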

Additionally, Dreyer notes that the findings do not necessarily mean that regulatory bodies like the US Food and Drug Administration, which has yet to make a determination on whether or not such systems qualify as medical devices, will approve them.

Microsoft is not alone in its pursuit of an AI-powered medical diagnostic programme. Google is creating a dialogue-based system to simulate the back-and-forth between a doctor and patient, replicating how real doctors gather data from patients and analyse those symptoms to arrive at a diagnosis. In preliminary testing, the system performed better than physicians in correctly diagnosing case studies of simulated patients. The previous iteration of Google’s system correctly identified 59% of cases in a 2024 test that was comparable to the one Microsoft conducted using case studies, while human doctors only did so in 33% of cases.

The true test will be how well these AI systems function in real-world healthcare settings. Understanding how AI could enhance or support a physician’s role in disease diagnosis is the next step. Topol remarks, “What they accomplished is impressive.” However, until such systems are implemented in actual medical environments, they won’t alter medical practice. Topol hopes the AI systems will be tested in various health systems so that physicians and the AI platform can be compared on a variety of more common and diverse cases.

A comprehensive clinical trial and regulatory agency approval are necessary to ensure that patients won’t suffer any harm from a greater reliance on AI-based decision-making in the provision of care. Dominic King states, “We are very much on that journey to create the evidence base required to support doctors and patients to make a difference in their health.”

Microsoft’s MAI-DxO marks a significant milestone in the integration of AI into clinical diagnostics, showcasing higher accuracy and lower costs than human doctors in complex cases. While still in development, the system’s potential to enhance medical decision-making, reduce diagnostic errors, and ease healthcare costs is compelling. However, the road to real-world adoption calls for caution. Experts stress the need for extensive clinical trials, transparency, and regulatory approval to ensure safety and reliability. Though not a replacement for physicians, AI systems like MAI-DxO could become powerful collaborators in modern medicine, offering second opinions and data-driven insights.

As tech giants like Microsoft and Google race to refine these tools, the healthcare industry stands on the brink of transformation. This shift must be approached with scientific rigour, ethical foresight, and a strong commitment to patient safety. Microsoft has already built other healthcare AI tools like RAD-DINO (for radiology workflows) and Dragon Copilot (a voice assistant for clinicians). It will now be partnering with hospitals, clinicians, and health organisations to further validate the technology in real-world settings. That’s where the real test will start for the tech giant.
