AI to learn from COVID-19 scans in massive virtual library
Tens of thousands of chest images are needed to train artificial-intelligence algorithms; the portal enables widespread hospital access.
Aiming to improve the precision of COVID-19 diagnostics, U.S. radiologists, medical physicists, and computer scientists have created a massive virtual library. Its collection of x-rays and CT scans is intended to mobilize artificial intelligence against the novel coronavirus.
The hope is that early achievements of AI in medicine – for instance, in detecting cancers from mammograms and from pictures of skin lesions – will similarly be realized in limiting future spread of the pandemic.
To do this, the three major professional radiology associations have jointly developed an online portal that enables hospitals to upload chest scans of COVID-19 patients to a centralized database. Its creation removes a longtime obstacle.
“The challenge has been that there are not enough images in one place; they’re scattered throughout hospitals around the world. And not only are they scattered, but there are also privacy laws that constrain patient images. They aren’t like pictures of your cat that can be easily shared,” said Paul Kinahan.
He is vice chair of radiology research at UW Medicine, and he represented the American Association of Physicists in Medicine in its collaboration with the American College of Radiology and the Radiological Society of North America.
The groups had been separately creating their own databases with scans of patients who have COVID-19.
“Now we’ll have a network built with open-source software. Through one portal, any practitioner or researcher can access all the images distributed across other domains,” Kinahan said.
Medical imaging helps radiologists detect and monitor disease. Computer engineers are actively developing algorithms that use x-rays and CT scans to generate a probability that someone has COVID-19. So far, these algorithms have been built from a small number of images, but “our best estimates suggest we should be testing these algorithms on something like 10,000 images,” Kinahan said. “We want to avoid time spent developing algorithms whose logic is based on too few patient samples, because ultimately, the algorithm’s projections will fail to hold up.”
To test, train, and validate machine-learning algorithms, hundreds of thousands of scans must first be collected, de-identified, and curated so they are free of artifacts – visual “noise” created by the scanning technology – and then uploaded to the database.
Several dozen U.S. hospitals and universities are currently participating, and the project’s leaders expect to ultimately have several hundred hospitals and medical systems contributing images.
“If you have a system that’s able to detect COVID-19 early from an image, it could then be used for surveillance of the patient. We think there is almost certainly another wave coming, so this network of images could give rise to an algorithm that helps to monitor the presence of COVID-19,” Kinahan said.
The database, called the Medical Imaging and Data Resource Center, is supported by a $20 million contract from the National Institute of Biomedical Imaging and Bioengineering, part of the National Institutes of Health. The site is due to open in September.