A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians.