April 18, 2024

A Mystery in the E.R.? Ask Dr. Chatbot for a Diagnosis.

The patient was a 39-year-old woman who presented to the emergency department at Beth Israel Deaconess Medical Center in Boston. Her left knee had been hurting for several days. The day before, she had a fever of 102 degrees. It was gone now, but she still had chills. And her knee was red and swollen.

What was the diagnosis?

On a recent Friday, Dr. Megan Landon, a medical resident, presented this real case to a room full of medical students and residents. They were gathered to learn a skill that can be devilishly hard to teach — how to think like a doctor.

“Doctors are terrible at teaching other doctors how we think,” said Dr. Adam Rodman, an internist, medical historian and organizer of the event at Beth Israel Deaconess.

But this time, they could call on an expert for help in finding a diagnosis – GPT-4, the latest version of a chatbot released by the company OpenAI.

Artificial intelligence is changing many aspects of the practice of medicine, and some medical professionals are using these tools to help them diagnose. Doctors at Beth Israel Deaconess, a teaching hospital affiliated with Harvard Medical School, decided to explore how chatbots could be used – and misused – to train doctors in the future.

Instructors like Dr. Rodman hope that medical students can turn to GPT-4 and other chatbots for something similar to what doctors call a curbside consult — when they pull a colleague aside and ask for an opinion on a difficult case. The idea is to use a chatbot in the same way that doctors turn to each other for suggestions and insights.

For over a century, doctors have been portrayed as detectives who gather clues and use them to find the culprit. But experienced doctors use a different method – pattern recognition – to figure out what’s wrong. In medicine, it’s called an illness script: signs, symptoms and test results that doctors put together to tell a coherent story based on similar situations they know or have seen themselves.

If the illness script does not help, said Dr. Rodman, doctors turn to other strategies, such as assigning probabilities to different diagnoses that may be appropriate.

Researchers have tried for more than half a century to design computer programs to make medical diagnoses, but nothing has really succeeded.

Physicians say GPT-4 is different. “It will create something that is very much like an illness script,” Dr. Rodman said. In that way, he said, “it’s fundamentally different than a search engine.”

Dr. Rodman and other doctors at Beth Israel Deaconess have asked GPT-4 for possible diagnoses in difficult cases. In a study released last month in the medical journal JAMA, they found that it did better than most doctors on weekly diagnostic challenges published in The New England Journal of Medicine.

But, they have learned, there is an art to using the program, and there are pitfalls.

Dr. Christopher Smith, director of the internal medicine residency program at the medical center, said that medical students and residents “are definitely using it.” But, he said, “whether they’re learning anything is an open question.”

The concern is that they may rely on AI to make diagnoses in the same way they rely on a calculator on their phone to do a math problem. That, said Dr. Smith, is dangerous.

Learning, he said, is about trying to figure things out: “That’s how we retain things. Struggle is part of learning. If you outsource learning to GPT, that struggle is gone.”

At the meeting, students and residents broke up into groups and tried to figure out what was wrong with the patient with the swollen knee. Then they turned to GPT-4.

The groups tried different approaches.

One used GPT-4 to do an internet search, similar to the way one would use Google. The chatbot produced a list of possible diagnoses, including trauma. But when the group members asked it to explain its reasoning, the bot was disappointing, justifying its choice by saying, “Trauma is a common cause of knee injury.”

Another group considered possible hypotheses and asked GPT-4 to check them. The chatbot’s list matched the group’s list: infections, including Lyme disease; arthritis, including gout, a type of arthritis associated with crystals in the joints; and trauma.

GPT-4 added rheumatoid arthritis to the top possibilities, though it was not high on the group’s list. Gout, instructors later told the group, was unlikely for this patient because she was young and female. And rheumatoid arthritis could probably be ruled out because only one joint was inflamed, and for only a few days.

As a curbside consult, GPT-4 seemed to pass the test or, at the very least, to agree with the students and residents. But in this exercise, it offered no insights, and no illness script.

One reason could be that the students and residents used the bot more like a search engine than like a curbside consult.

To use the bot properly, the instructors said, they would need to start by telling GPT-4 something like, “You are a doctor seeing a 39-year-old woman with knee pain.” Then they would need to list her symptoms before asking for a diagnosis and following up with questions about the bot’s reasoning, the way they would with a medical colleague.
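For readers who want to try that kind of role-framed, follow-up-driven prompting themselves, here is a minimal sketch using OpenAI’s Python client. The model name, wording and setup are illustrative assumptions rather than details from the session, and, as the instructors stressed, any answer still has to be checked against clinical judgment.

    # A minimal sketch of the role-framed prompt described above.
    # Assumptions: the openai Python package (v1.x) is installed and
    # OPENAI_API_KEY is set; the model name and wording are illustrative.
    from openai import OpenAI

    client = OpenAI()

    messages = [
        {"role": "system",
         "content": "You are a doctor seeing a 39-year-old woman with knee pain."},
        {"role": "user",
         "content": "Her left knee has hurt for several days. Yesterday she had a "
                    "fever of 102 degrees; it is gone now, but she still has chills, "
                    "and the knee is red and swollen. What is your differential diagnosis?"},
    ]

    # Ask for a diagnosis first, then probe the reasoning,
    # the way one would with a medical colleague.
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    print(reply.choices[0].message.content)

    messages.append({"role": "assistant", "content": reply.choices[0].message.content})
    messages.append({"role": "user",
                     "content": "Walk me through your reasoning for the top diagnosis."})
    follow_up = client.chat.completions.create(model="gpt-4", messages=messages)
    print(follow_up.choices[0].message.content)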

That, the instructors said, is a way to harness the power of GPT-4. But it is also crucial to recognize that chatbots can make mistakes and “hallucinate” – provide answers with no basis in fact. Using the bot requires knowing when it is wrong.

“It’s not wrong to use these tools,” said Dr. Byron Crowe, an internal medicine physician at the hospital. “You just have to use them the right way.”

He gave an analogy to the group.

“Pilots use GPS,” Dr. Crowe said. But, he added, airlines “have a very high standard for reliability.” In medicine, he said, using chatbots is “very tempting,” but the same high standards should apply.

“It’s a great thought partner, but it’s no substitute for deep mental expertise,” he said.

As the session drew to a close, the instructors revealed the true cause of the patient’s swollen knee.

It was a possibility that each group had considered, and that GPT-4 had proposed.

She had Lyme disease.

Olivia Allison contributed reporting.
