ChatGPT can answer patient questions about radiation protection for medical imaging exams comparably to websites of radiology institutions, according to research published on 25 June in Radiology.
A team led by Sofyan Jankowski, MD, of Lausanne University Hospital in Switzerland found no statistically significant difference between ChatGPT’s answers and those posted on radiology institutional websites. Although ChatGPT’s answers were noticeably wordier, the results highlight the chatbot’s overall performance in this application.
“Implementing ChatGPT may address the need for clear and accessible information about radiation protection,” the authors wrote.
To assess ChatGPT’s responses on radiation protection, the researchers first gathered 12 patient questions on the topic from radiology institutional websites. They posed the same questions to ChatGPT and then recruited 12 experts (four radiologists, four medical physicists, and four radiographers) from the U.S. and Europe to evaluate both sets of answers, blinded to the source.
These readers analyzed the answers for scientific adequacy, general public comprehension, and overall satisfaction on a Likert scale of 1 (No) to 7 (Yes). In addition, they rated on the same scale whether they believed the text had been generated by artificial intelligence (AI).
Median scores of answers to patient questions on radiation protection in radiology

| | Radiology institutional websites | ChatGPT |
| --- | --- | --- |
| Scientific adequacy | 5.4 | 5.6 |
| General public comprehension | 5.6 | 5.1 |
| Overall satisfaction | 5.1 | 4.7 |
None of these differences reached statistical significance.
However, the researchers did find that scores differed significantly regarding the perception of whether AI had generated the response (p = 0.02). They reported that raters correctly identified with high confidence 88 of 144 responses (61%) as being generated by ChatGPT, compared with 62 of 144 (43%) correctly identified as being produced by humans (p < 0.001).
In other findings, responses provided by ChatGPT were longer, with a median word count of 268 words compared with 173 words for the human-generated responses (p = 0.08), according to the team, which included David Rotzinger, MD, PhD, and Chiara Pozzessere, MD -- both also of Lausanne University Hospital -- and co-senior author Francesco Ria, PhD, of Duke University Health System.
Some of the raters also criticized the ChatGPT-generated texts for issues with response format and language, relevance, and misleading or missing critical information.
Nonetheless, the researchers pointed out that although communication about radiation protection is indispensable, “it often remains shrouded in complex language that health care professionals cannot easily simplify, thus leading to inadequate information.”
“A practical application could involve making ChatGPT or other AI chatbots available in radiology waiting rooms to allow patients access to information while waiting for their examinations,” they wrote. “It is important to note that this should complement, not replace, the communication between health care providers and patients.”
The complete study can be found here.