DHRN Seminar: I Hope This Helps: Comparing Emotional Content of Physician & Chatbot Responses to Health Queries

Danny Burns (University of Wyoming)

Monday 16 February, 2-3 pm
James Logie 613

Recent surveys suggest that a large portion of the population is open to using generative artificial intelligence (AI) for health-related questions. While prior research has focused primarily on the factual accuracy of AI-generated responses, some studies indicate that both physicians and patients often prefer chatbot-generated text over physician-written responses. This study aimed to compare physician responses with those generated by two AI chatbots, ChatGPT and Gemini, focusing on emotional content, readability, length, and use of medical disclaimers.

A dataset of 100 patient questions from a public, deidentified telehealth website was used. Each response was analyzed sentence by sentence for emotional content using a predefined codebook, with emotions ranked as primary, secondary, or tertiary based on frequency. Two coders independently classified the data and resolved discrepancies through review. Statistical comparisons were conducted for emotional content, word count, Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and disclaimer usage.
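
For readers unfamiliar with the readability metrics: both are computed from average sentence length and average syllables per word. A minimal Python sketch of the standard formulas follows (the syllable counter is a rough heuristic, and the study presumably used established tooling, so treat this as illustrative only):

    import re

    def count_syllables(word):
        # Rough heuristic: count vowel groups; dictionary-based counters
        # are more accurate, so results are approximate.
        groups = re.findall(r"[aeiouy]+", word.lower())
        n = len(groups)
        if word.lower().endswith("e") and n > 1:
            n -= 1  # drop a silent final 'e'
        return max(n, 1)

    def readability(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / len(sentences)   # words per sentence
        spw = syllables / len(words)        # syllables per word
        fre = 206.835 - 1.015 * wps - 84.6 * spw    # Flesch Reading Ease
        fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level
        return fre, fkgl

    fre, fkgl = readability("Take one tablet twice daily with food. "
                            "Call us if symptoms persist.")
    print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")

Lower FRE and higher FKGL both indicate harder-to-read text, which is the direction reported for the chatbot responses below.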

Across all response types, primary emotions were overwhelmingly neutral. Differences emerged in secondary and tertiary emotions: ChatGPT responses were significantly less likely to convey hope, while Gemini responses were more likely to express fear as a secondary emotion. Gemini also showed higher odds of compassion as a tertiary emotion and was less likely to lack a tertiary emotion altogether. In terms of length, Gemini responses were the longest, followed by ChatGPT, with physician responses being substantially shorter. Chatbot responses were also more difficult to read, as reflected by lower FRE scores and higher FKGL scores, particularly for Gemini. Gemini was significantly more likely than ChatGPT to include disclaimers stating that its responses did not constitute medical advice.
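
The "higher odds" comparisons are the kind produced by a 2x2 contingency analysis of the coded emotions. A hedged sketch with made-up counts (the study's actual numbers and model, possibly logistic regression, are not given here):

    # Hypothetical counts of responses where compassion was coded as a
    # tertiary emotion (illustrative only, not the study's data).
    gemini_yes, gemini_no = 30, 70
    physician_yes, physician_no = 12, 88

    # Odds ratio: odds of compassion in Gemini vs. physician responses.
    odds_ratio = (gemini_yes / gemini_no) / (physician_yes / physician_no)
    print(f"odds ratio = {odds_ratio:.2f}")  # about 3.14 with these counts

In practice a significance test such as Fisher's exact test (scipy.stats.fisher_exact) would typically accompany the ratio.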

Overall, chatbot responses were longer, more complex, and emotionally richer than physician responses. Qualitatively, they also displayed greater variation in presentation and emotional range. These findings suggest that insights from AI-generated responses could help physicians craft more emotionally connective replies to patient inquiries.

Danny is a PhD student in the Department of Mathematics and Statistics at the University of Wyoming, studying data science in the Biomedical Sciences Program. He moved to Wyoming from New Jersey to study neuroscience before switching to statistics and data science. His research focuses on interpretable machine learning, uncertainty quantification, and AI use in healthcare and biomedical research, and he loves teaching statistics and research methods to a wide range of audiences. Outside of work, he enjoys spending time with his fiancée and their pets, hiking, camping, skiing, traveling, and reading.