ChatGPT’s medical diagnoses are accurate less than half of the time, a new study reveals.
Scientists asked the artificial intelligence (AI) chatbot to assess 150 case studies from the medical website Medscape and found that GPT-3.5 (which powered ChatGPT when it launched in 2022) gave a correct diagnosis only 49% of the time.
Previous research showed that the chatbot could scrape a pass in the United States Medical Licensing Exam (USMLE) — a finding hailed by its authors as “a notable milestone in AI maturation.”
But in the new study, published Jul. 31 in the journal PLOS ONE, scientists cautioned against relying on the chatbot for complex medical cases that require human discernment.
“If people are scared, confused, or just unable to access care, they may be reliant on a tool that seems to deliver medical advice that’s ‘tailor-made’ for them,” senior study author Dr. Amrit Kirpalani, a doctor in pediatric nephrology at the Schulich School of Medicine and Dentistry at Western University, Ontario, told Live Science. “I think as a medical community (and among the larger scientific community) we need to be proactive about educating the general population about the limitations of these tools in this respect. They should not replace your doctor yet.”
ChatGPT’s ability to dispense information is based on its training data. Scraped from the repository Common Crawl, the 570 gigabytes of text data fed into the 2022 model amounts to roughly 300 billion words, which were taken from books, online articles, Wikipedia and other web pages.
AI systems spot patterns in the words they were trained on to predict what may follow them, enabling them to provide an answer to a prompt or question. In theory, this makes them helpful for both medical students and patients seeking simplified answers to complex medical questions, but the bots’ tendency to “hallucinate,” or make up responses entirely, limits their usefulness in medical diagnoses.
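To make the idea of predicting "what may follow" concrete, here is a deliberately tiny sketch of next-word prediction. It is not how ChatGPT works internally (that model is a large neural network, not a word-count table), and the training sentence and the function names `train_bigram_model` and `predict_next` are invented for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy training text, made up for this example.
corpus = "the patient has a fever . the patient has a cough . the doctor has a plan"
model = train_bigram_model(corpus)

print(predict_next(model, "patient"))  # has
print(predict_next(model, "kidney"))   # None
```

The toy model can only repeat patterns it has counted, and it returns nothing for a word it has never seen. A chatbot, by contrast, always produces some fluent continuation, which offers one rough intuition for how confident but made-up answers can arise.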
To assess the accuracy of ChatGPT’s medical advice, the researchers presented the model with 150 varied case studies — including patient history, physical exam findings and images taken from the lab — that were intended to challenge the diagnostic abilities of trainee doctors. The chatbot chose one of four multiple-choice outcomes before responding with its diagnosis and a treatment plan…