When It Comes to Health Care, AI Has a Long Way to Go

Medical information is more complex and less available than the web data that many algorithms were trained on, so results can be misleading.

THE CORONAVIRUS PANDEMIC has prompted countless acts of individual heroism and some astounding collective feats of science. Pharmaceutical companies used new technology to develop highly effective vaccines in record time. A new type of clinical trial has remade our understanding of what works, and doesn’t work, against Covid-19. But when the UK’s Alan Turing Institute looked for evidence of how artificial intelligence had helped with the crisis, it didn’t find much to celebrate.

The institute’s report, published last year, said that AI had made little impact on the pandemic and experts faced widespread problems accessing the health data needed to use the technology without bias. It followed two surveys that reviewed hundreds of studies and found that nearly all AI tools for detecting Covid-19 symptoms were flawed. “We wanted to highlight the shining stars that show how this very exciting technology has delivered,” says Bilal Mateen, a physician and researcher who was an editor of the Turing report. “Unfortunately we couldn’t find those shining stars; we found a lot of problems.”

It’s understandable that a relatively new tool in health care, like AI, couldn’t save the day in a pandemic, but Mateen and other researchers say the failings of Covid-19 AI projects reflect a broader pattern. Despite great hopes, it’s proving difficult to improve health care by marrying data with algorithms.

Many studies using samples of past medical data have reported that algorithms can be highly accurate at specific tasks, such as finding skin cancers or predicting patient outcomes. Some are now incorporated into approved products that doctors use to watch for signs of stroke or eye disease.

But many more ideas for AI health care have not progressed beyond initial proofs of concept. Researchers warn that, for now, many studies don’t use data of adequate quantity or quality to properly test AI applications. That raises the risk of real harms from untrustworthy technology let loose in health systems. Some health care algorithms in use have proved unreliable, or biased against certain demographic groups.

That data-crunching might improve health care is not a new notion. One of the founding moments of epidemiology came in 1855, when London physician John Snow marked cholera cases on a map to show that it was a water-borne disease. More recently, doctors, researchers, and technologists have become excited about tapping machine learning techniques honed in tech industry projects like sorting photos or transcribing speech.

Yet conditions in tech are very different from those inside research hospitals. Companies such as Facebook can access billions of photos posted by users to improve image-recognition algorithms. Accessing health data is harder because of privacy concerns and creaky IT systems. And deploying an algorithm that will shape someone’s medical care carries higher stakes than filtering spam or targeting ads.

“We can’t take paradigms for developing AI tools that have worked in the consumer space and just port them over to the clinical space,” says Visar Berisha, an associate professor at Arizona State University. He recently published a journal article with colleagues from engineering and health departments at Arizona State warning that many health AI studies make algorithms appear more accurate than they really are because they use powerful algorithms on data sets that are too small.

That’s because health data such as medical imaging, vital signs, and data from wearable devices can vary for reasons unrelated to a particular health condition, such as lifestyle or background noise. The machine learning algorithms popularized by the tech industry are so good at finding patterns that they can discover shortcuts to “correct” answers that won’t work out in the real world. Smaller data sets make it easier for algorithms to cheat that way and create blind spots that cause poor results in the clinic. “The community fools [itself] into thinking we’re developing models that work much better than they actually do,” Berisha says. “It furthers the AI hype.”

Berisha says that problem has led to a striking and concerning pattern in some areas of AI health care research. In studies using algorithms to detect signs of Alzheimer’s or cognitive impairment in recordings of speech, Berisha and his colleagues found that larger studies reported worse accuracy than smaller ones, the opposite of what big data is supposed to deliver. Two other reviews, one of studies attempting to identify brain disorders from medical scans and another of studies trying to detect autism with machine learning, reported a similar pattern.
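
The effect Berisha describes can be reproduced with a toy simulation. The Python sketch below is illustrative only and is not taken from his paper: it assumes a made-up setting in which 200 candidate "biomarkers" are pure coin flips with no connection to the diagnosis, then reports how accurate the best-looking one appears when it is chosen and scored on the same patients. The function name, cohort sizes, and feature count are all hypothetical.

```python
import random


def best_apparent_accuracy(n_patients, n_features, seed=0):
    """Return the accuracy of the best-looking noise feature.

    Hypothetical setup: labels and features are independent coin flips,
    so the true accuracy of any feature on new patients is 50 percent.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.randint(0, 1) for _ in range(n_patients)]
        matches = sum(f == y for f, y in zip(feature, labels))
        acc = matches / n_patients
        # A "model" may use the feature or its inverse, whichever fits better.
        best = max(best, acc, 1 - acc)
    return best


# Same 200 meaningless features, scored on a small and a large cohort.
small = best_apparent_accuracy(n_patients=20, n_features=200)
large = best_apparent_accuracy(n_patients=2000, n_features=200)
print(f"apparent accuracy with 20 patients:   {small:.2f}")
print(f"apparent accuracy with 2000 patients: {large:.2f}")
```

Any accuracy above 50 percent here is an artifact of chance, because no feature carries real signal. In this sketch the small cohort typically yields a far more flattering number than the large one, which is the direction of the pattern the reviews reported.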

The dangers of algorithms that work well in preliminary studies but behave differently on real patient data are not hypothetical. A 2019 study found that a system used on millions of patients to prioritize access to extra care for people with complex health problems put white patients ahead of Black patients.

Avoiding biased systems like that requires large, balanced data sets and careful testing, but skewed data sets are the norm in health AI research, due to historical and ongoing health inequalities. A 2020 study by Stanford researchers found that 71 percent of data used in studies that applied deep learning to US medical data came from California, Massachusetts, or New York, with little or no representation from the other 47 states. Low-income countries are barely represented at all in AI health care studies. A review published last year of more than 150 studies using machine learning to predict diagnoses or courses of disease concluded that most “show poor methodological quality and are at high risk of bias.”

Two researchers concerned about these shortcomings recently launched a nonprofit called Nightingale Open Science to try to improve the quality and scale of data sets available to researchers. It works with health systems to curate collections of medical images and associated data from patient records, anonymize them, and make them available for nonprofit research.

Ziad Obermeyer, a Nightingale cofounder and associate professor at the University of California, Berkeley, hopes providing access to that data will encourage competition that leads to better results, similar to how large, open collections of images helped spur advances in machine learning. “The core of the problem is that a researcher can do and say whatever they want in health data because no one can ever check their results,” he says. “The data [is] locked up.”

Nightingale joins other projects attempting to improve health care AI by boosting data access and quality. The Lacuna Fund supports the creation of machine learning data sets representing low- and middle-income countries and is working on health care; a new project at University Hospitals Birmingham in the UK with support from the National Health Service and MIT is developing standards to assess whether AI systems are anchored in unbiased data.

Mateen, editor of the UK report on pandemic algorithms, is a fan of AI-specific projects like those but says the prospects for AI in health care also depend on health systems modernizing their often creaky IT infrastructure. “You’ve got to invest there at the root of the problem to see benefits,” Mateen says.

 
Original article from WIRED.