Covid19, the limitations of machine learning, and the importance of data

The Covid-19 crisis is showing us the limitations of many things that we took for granted. Medicine’s ability to cure, for instance. For the time being, there is no cure for Covid19: the best that our fantastic health professionals can do is support our bodies while they fight the infection. Or the sharing economy’s potential to top up one’s income. Those who had upgraded their cars to make some extra income via ride-sharing, or who had bought property to rent out to tourists, still face the costs of purchasing or maintaining those assets, but have seen their revenues dry up overnight. Another example: AI’s ability to solve, well, everything.

AI’s capacity to process and analyse data has been touted as the solution to many problems, including in health. So it was no surprise to see many turn to this technology to solve the problem of diagnosing and treating those infected with the SARS-CoV-2 virus. For instance, back in February, a smart healthcare company announced the launch of a new AI-based system which could read CT scans in 15 seconds, a vertiginous improvement on the 15 minutes that it takes an average radiologist to complete the same task. And yet, such systems have failed to take over the world.

Photo by CDC on Unsplash

The analysis of why AI has failed to revolutionise disease diagnosis in the Covid19 case is yet another example of the need to think about AI as a system in order to understand its limitations. The discussion around the potential of AI focuses on the ability of machine learning algorithms to analyse very large datasets and to detect patterns in the data. And, indeed, algorithms do have that potential. The AI-based diagnosis solutions being trialled use deep learning algorithms to analyse lung images and detect variations that might suggest a Covid19-related infection (vs., say, lung cancer).

However, for those powerful algorithms to work, they need data. Lots of it. And good quality data, too. In this case, that means large datasets of lung images which have been duly labelled as healthy, Covid19, lung cancer, etc. But, as discussed in this really interesting article published on ZDNet, with this being a new illness, that high-quality database is simply not there.

First, there is only a limited number of CT scans of the lungs of confirmed Covid19 cases. Many Covid19 patients never go to hospital, and not all patients who do go into hospital get a lung scan. As explained in the ZDNet article:

“An X-ray or a CT scan will show formations in the lung that are associated with a number of respiratory conditions including pneumonia. The feature in an image most often linked to a COVID-19 case, although not exclusive to COVID-19, is what’s called “ground-glass opacity,” a kind of haze hovering in an area of the lung, caused by a build-up of fluid. (…) (N)eural networks have to be tuned to pick out opacities in the pixels of a high-resolution image, and that takes data.”

Second, experts need to label those images, in order to guide the algorithm as to what to look for in the scans. Again, from the ZDNet article:

“AI systems require lots of labels… Labels are the annotations of images created by human radiologists that tune the settings of the neural net to properly summarize the pixels of data… And labelling requires physician time”.

As both data and the expertise to label such data are in short supply, AI’s potential to help with this crisis is severely limited.

Reading this article reminded me of that fantastic quote in the paper by Constantiou and Kallinikos: “Algorithms without data are just a mathematical fiction”. It’s a paper very much worth reading, now more than ever.
