Democratized Diagnostics: Why Medical Artificial Intelligence Needs Vetting

Originally published on September 22, 2017, on the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics Bill of Health blog.

Pancreatic cancer is one of the deadliest illnesses out there.  The five-year survival rate of patients with the disease is only about 7%.  This is, in part, because few observable symptoms appear early enough for effective treatment.  As a result, by the time many patients are diagnosed the prognosis is poor.  There is an app, however, that is attempting to change that.  BiliScreen was developed by researchers at the University of Washington, and it is designed to help users identify pancreatic cancer early with an algorithm that analyzes selfies.  Users take photos of themselves, and the app’s artificially intelligent algorithm detects slight discolorations in the skin and eyes associated with early pancreatic cancer.

Diagnostic apps like BiliScreen represent a huge step forward for preventive health care.  Imagine a world in which the vast majority of chronic diseases are caught early because each of us has the power to screen ourselves on a regular basis.  One of the big challenges for the modern primary care physician is convincing patients to get screened regularly for diseases that have relatively good prognoses when caught early.

I’ve written before about the possible impacts of artificial intelligence and algorithmic medicine, arguing that both medicine and law will have to adapt as machine-learning algorithms surpass physicians in their ability to diagnose and treat disease.  These pieces, however, primarily consider artificially intelligent algorithms licensed to and used by medical professionals in hospital or outpatient settings.  They are about the relationship between a doctor and the sophisticated tools in her diagnostic toolbox — and about how relying on algorithms could decrease the pressure physicians feel to order unnecessary tests and procedures to avoid malpractice liability.  There was an underlying assumption that these algorithms had already been evaluated and approved for use by the physician’s institution, and that the physician had experience using them.  BiliScreen does not fit this mold — the algorithm is not a piece of medical equipment used by hospitals, but rather part of an app that could be downloaded and used by anyone with a smartphone.  Accordingly, apps like BiliScreen fall into a category of “democratized” diagnostic algorithms. While this democratization has the potential to drastically improve preventive care, it also has the potential to undermine the financial sustainability of the U.S. health care system.

Democratized diagnostic algorithms should be a source of financial concern for the health care system because of the malpractice risks they will create for physicians who disagree with them. A study published in JAMA Oncology in 2015 found that patients demand specific medical interventions in 8.7% of encounters.  As democratized diagnostic apps proliferate in app stores, patients with “diagnoses” may begin to appear in doctors’ offices — smartphones in hand — demanding tests and procedures more frequently. These apps will likely be wrapped in legal language disclaiming all diagnostic accuracy — telling users to consult a physician for an actual evaluation — but a jury may find a “diagnosis” from such an app sufficient to establish that a physician who failed to pursue further testing or treatment was negligent.  Thus, if a patient presents with a cancer warning from an app, a physician may feel obligated to run a barrage of tests to confirm or refute the algorithm’s determination — even if the patient exhibits none of the symptoms typically associated with the diagnosis.

The potential influx of patients will be exacerbated by the liability concerns faced by the apps themselves.  It’s a common joke that patient information websites like WebMD will identify any set of symptoms as a potential indicator of a terminal illness.  But this makes sense given the liability landscape.  WebMD does not want to risk discouraging patients from seeking medical attention, or from pursuing more traditional preventive screenings, because of assurances received on the website.  Diagnostic apps will want to avoid similar liability, and as a result they may be designed either to over-diagnose or to tell patients to consult a physician regardless of the algorithm’s analysis.

The systemic harm of this perfect storm will be its impact on defensive medicine.  As alluded to above, defensive medicine occurs when a physician orders more diagnostic tests and procedures than a patient’s condition warrants, and it puts an immense burden on the U.S. health care system.  In a 2013 survey of private-sector physicians, 75 percent admitted to practicing defensive medicine to avoid lawsuits, resulting in an estimated $650 billion spent annually on unnecessary care.  As democratized diagnostic apps proliferate, it is easy to see how defensive medicine will increase commensurately.  Regardless of a patient’s symptoms, no physician will want to disregard an app diagnosis only to find herself in front of a jury trying to explain why she ignored the algorithm’s warning.

The underlying issue here is fundamentally one of quality control.  If these artificially intelligent algorithms were universally superior to physicians in their diagnostic capabilities, they could decrease defensive medicine by limiting expensive testing to cases where the likelihood of a positive finding is very high.  However, when any coder can spin up a “diagnostic” algorithm and throw it on an app store, it will be difficult for physicians to determine which apps to trust and which to ignore.  Some of these apps, including BiliScreen, may be able to diagnose diseases earlier and more accurately than physicians with decades of experience, but it’s unlikely that all medical apps available from an app store for $0.99 will be so good.  Indeed, a 2014 study published in Translational Behavioral Medicine reported an “enormous range of quality among [mobile health] apps.”  While accuracy rate could be a helpful metric, physicians will still face the difficulty of determining how low the accuracy rate must be for an app’s diagnosis to be safely ignored.  Would a jury find that disregarding an algorithm with 75% accuracy constitutes negligence?  What about 60% or 45% accuracy?  This will be almost impossible for any individual physician to predict.  As a result, medical providers may feel pressure to order confirmatory tests in cases involving all but the least accurate apps.
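
A rough back-of-the-envelope sketch may help show why a single accuracy figure is so hard to interpret.  Every number below is a hypothetical assumption chosen purely for illustration (the user count, the disease prevalence, and the app’s sensitivity and specificity); none comes from BiliScreen or from any study cited here.  The function name is likewise made up for this example.

```python
def expected_screening_outcomes(users, prevalence, sensitivity, specificity):
    """Estimate what happens when `users` people screen themselves with an app.

    Returns (true_positives, false_positives, ppv), where ppv is the share
    of flagged users who actually have the disease.
    """
    sick = users * prevalence
    healthy = users - sick
    true_positives = sick * sensitivity            # sick users correctly flagged
    false_positives = healthy * (1 - specificity)  # healthy users wrongly flagged
    flagged = true_positives + false_positives
    ppv = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, ppv


# Hypothetical inputs: 100,000 users, a disease affecting 1 in 1,000 of them,
# and an app that is right 90% of the time for sick and healthy users alike.
tp, fp, ppv = expected_screening_outcomes(
    users=100_000, prevalence=0.001, sensitivity=0.90, specificity=0.90
)
print(f"True alarms:  {tp:,.0f}")
print(f"False alarms: {fp:,.0f}")
print(f"Share of alarms that are real: {ppv:.1%}")
```

Under these assumed inputs, false alarms outnumber true detections by more than a hundred to one, even though the app is “90% accurate.”  For a rare disease, then, an impressive-sounding accuracy rate can still send a large number of healthy patients to their doctors with an app “diagnosis” in hand.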

One possible solution is vetting of diagnostic algorithms by a trusted third party. Having a curated set of accurate democratized diagnostic apps would serve two functions critical for reaping the benefits of these innovative technologies while avoiding their pitfalls.  First, it would give physicians — and juries — guidelines for determining which algorithms warrant diagnostic deference.  A physician could point to a poor rating by a vetting organization as a justification for her decision not to pursue additional testing for a patient with no clinically relevant symptoms.  Second, a curated set of reliable apps would help guide patients to the most accurate diagnostic algorithms available, decreasing the likelihood of erroneous diagnoses and expensive physician-ordered tests to refute them.  This quality control would maximize the ability of democratized diagnostic apps to serve both individual patients and the health care system as a whole — accurately identifying diseases early and decreasing the prevalence of defensive medicine by taking some diagnostic liability away from health care providers.

There are many ways this vetting could be accomplished.  It could be done by a non-profit or university that reviews the evidence supporting diagnostic algorithms and rates them according to their accuracy, or it could be taken on by the platforms purveying these apps to users, such as the iOS App Store or the Google Play store.  These tech companies already serve as gatekeepers, vetting both code and content before an app will be offered for download, so it would not be outlandish for them to require a certain degree of diagnostic accuracy prior to allowing a medical app on their platforms.

The promise of democratized diagnostic algorithms is immense, but as with all technological advancements, where there is promise there is also peril.  If not vetted and implemented properly, each $0.99 app has the potential to burden the health care system with additional unnecessary diagnostic testing.  Empowering patients to conduct their own preventive screening could save countless lives, so long as it doesn’t cause the already bloated health care system to collapse under its own weight.
