A study has found that ChatGPT provided better answers than human doctors to medical questions. The study used questions posted on Reddit’s r/AskDocs community that had already been answered by human doctors. The same questions were posed to ChatGPT, and medical professionals judged both sets of responses on several criteria. The independent evaluators rated ChatGPT’s responses higher in both quality and empathy than those provided by the human doctors.
“In this cross-sectional study of 195 randomly drawn patient questions from a social media forum, a team of licensed health care professionals compared physician’s and chatbot’s responses to patient’s questions asked publicly on a public social media forum. The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy,” the study noted.
The evaluators were medical professionals themselves, and weren’t told which responses were from ChatGPT and which were from human doctors.
“The original question, physician response, and chatbot response were reviewed by 3 members of a team of licensed health care professionals working in pediatrics, geriatrics, internal medicine, oncology, infectious disease, and preventive medicine (J.B.K., D.J.F., A.M.G., M.H., D.M.S.). The evaluators were shown the entire patient’s question, the physician’s response, and chatbot response. Responses were randomly ordered, stripped of revealing information (eg, statements such as “I’m an artificial intelligence”), and labeled response 1 or response 2 to blind evaluators to the identity of the author. The evaluators were instructed to read the entire patient question and both responses before answering questions about the interaction. First, evaluators were asked “which response [was] better” (ie, response 1 or response 2). Then, using Likert scales, evaluators judged both “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic) of responses. Response options were translated into a 1 to 5 scale, where higher values indicated greater quality or empathy,” the study said.
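The translation of Likert labels into a 1 to 5 scale can be sketched in a few lines of Python. This is only an illustration of the scoring step described in the quote: the labels come from the study's text, but the function names, the example ratings, and the choice to average the three evaluators' scores are assumptions made here, not details taken from the article.

```python
# Likert labels as quoted from the study, mapped to the 1-5 scale it describes.
# The dictionary and function names are hypothetical, chosen for illustration.
QUALITY_SCALE = {
    "very poor": 1,
    "poor": 2,
    "acceptable": 3,
    "good": 4,
    "very good": 5,
}

EMPATHY_SCALE = {
    "not empathetic": 1,
    "slightly empathetic": 2,
    "moderately empathetic": 3,
    "empathetic": 4,
    "very empathetic": 5,
}


def to_score(label, scale):
    """Convert one Likert label to its numeric value (case-insensitive)."""
    return scale[label.strip().lower()]


def mean_score(labels, scale):
    """Average the numeric scores of several evaluators (an assumed aggregation)."""
    scores = [to_score(label, scale) for label in labels]
    return sum(scores) / len(scores)


# Invented example: three evaluators rate the quality of one response.
example_ratings = ["good", "very good", "acceptable"]
example_mean = mean_score(example_ratings, QUALITY_SCALE)
print(example_mean)
```

Under this scheme, a response counts as "good or very good" when its score is 4 or higher, which is the threshold the study's results refer to.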
The evaluators preferred ChatGPT’s responses to the physicians’ responses in most cases. “Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Mean (IQR) physician responses were significantly shorter than chatbot responses (52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥ 4), for instance, was higher for chatbot than physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot,” the study said.
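The "3.6 times" and "9.8 times" figures follow directly from the quoted percentages, as does the count of 585 evaluations (195 questions, each reviewed by 3 evaluators). The short Python check below reproduces that arithmetic; the function and variable names are hypothetical, and the inputs are the rounded percentages quoted above rather than the study's raw data.

```python
def prevalence_ratio(chatbot_pct, physician_pct):
    """Ratio of two proportions, both given in percent."""
    return chatbot_pct / physician_pct


# Share of responses rated good or very good (score of 4 or higher).
quality_ratio = prevalence_ratio(78.5, 22.1)

# Share of responses rated empathetic or very empathetic (score of 4 or higher).
empathy_ratio = prevalence_ratio(45.1, 4.6)

# 195 questions, each judged by 3 evaluators.
total_evaluations = 195 * 3

print(round(quality_ratio, 1))   # 3.6
print(round(empathy_ratio, 1))   # 9.8
print(total_evaluations)         # 585
```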
It’s a remarkable result, even more so because the study tested ChatGPT in December 2022, before the release of the more advanced GPT-4. Medicine would seem to be a use case well suited to LLMs: the same cases and symptoms recur again and again, and AI could help reduce the load on doctors for most initial assessments. AI has reportedly helped in other cases as well: one man managed to save his dog’s life after he entered its test results into ChatGPT, which gave a diagnosis that the vet had missed. And with ChatGPT’s responses rated higher in quality and empathy than those of human doctors, LLMs could end up making a big dent in the healthcare business overall.