Most mole biopsies are benign, says text analysis of EMRs
Natural language processing scan of 80,000 skin biopsies paints population-level picture.
Brian Donohue - 206.543.7856, email@example.com
The great majority of biopsied moles pose no danger, according to an analysis of 80,000 skin samples that employed natural language processing (NLP) software to glean patient data and generate population-level estimates of diagnoses.
The novel approach demonstrates that text-analysis tools can, with high accuracy and far faster than human annotators, harvest and interpret physicians’ reports to yield context about the prevalence and incidence of medical conditions.
The findings were published in JAMA Dermatology. The multi-site investigation involved Drs. Michael Piepkorn and Joann Elmore, faculty physicians at the University of Washington School of Medicine. Elmore is also an epidemiologist with the UW School of Public Health.
“There’s a wealth of information in electronic medical records that has been untapped for population-level studies like this,” Elmore said. “Text-analysis software is especially helpful to decipher pathologists’ reports, because they have a wide array of terms for the same type of lesion.”
The investigators wanted to identify the proportion of biopsied skin lesions that were “melanocytic,” the type that can develop into malignant melanoma.
Working from 289 original pathology reports, the researchers designed search strings that recognize and differentiate diagnostic terms, phrases and their derivations, along with the neighboring context, so that the pathologists’ intended meanings would be construed accurately in the subsequent scan of electronic medical records.
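To illustrate the general idea of a search string that weighs a diagnostic term against its neighboring context, here is a minimal Python sketch. It is not the study's software: the term lists, the negation cues, the 30-character context window and the function name `is_melanocytic` are all hypothetical, chosen only to show how a term such as “melanoma” can be ignored when it follows a phrase like “no evidence of.”

```python
import re

# Hypothetical term lists, for illustration only. The study's actual
# search strings are not given in this article.
MELANOCYTIC_TERMS = re.compile(
    r"\b(nev(?:us|i)|melanoma|melanocytic|lentigo)\b", re.IGNORECASE
)
NEGATION_CUES = re.compile(
    r"\b(no|not|negative for|without|rule out)\b", re.IGNORECASE
)


def is_melanocytic(diagnosis: str, window: int = 30) -> bool:
    """Return True if a melanocytic term appears with no negation cue
    in the `window` characters immediately before it."""
    for match in MELANOCYTIC_TERMS.finditer(diagnosis):
        # Look only at the text just before the matched term.
        context = diagnosis[max(0, match.start() - window):match.start()]
        if not NEGATION_CUES.search(context):
            return True
    return False


if __name__ == "__main__":
    for text in (
        "Compound nevus, benign",
        "Seborrheic keratosis; no evidence of melanoma",
        "Basal cell carcinoma, nodular type",
    ):
        print(f"{text!r}: melanocytic={is_melanocytic(text)}")
```

A production system would need far richer vocabularies and context rules than this, which is why the researchers built theirs from hundreds of real reports.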
That search comprised 80,368 biopsy diagnoses from more than 47,000 patients in the Group Health Cooperative healthcare system (now Kaiser Permanente) in Washington state between January 2007 and December 2012.
Melanocytic lesions accounted for 23 percent of all diagnoses, a proportion that surprised Elmore.
“I was shocked. I thought it’d be only 5 or 10 percent,” she said, quickly adding that most melanocytic lesions were characterized as benign.
The melanocytic diagnoses were stratified into four risk classes:
- Class I: Nevi and other benign proliferations (83 percent)
- Class II: Moderately dysplastic and other low-risk lesions (8.3 percent)
- Class III: Melanoma in-situ and other higher-risk lesions (4.5 percent)
- Class IV/V: Invasive melanoma (4.1 percent)
The other 77 percent of biopsies were diagnosed as non-melanocytic – an umbrella term for lesions caused by light damage, infection, viruses, and the two most common skin cancers that are far less likely to be life-threatening: basal cell carcinoma and squamous cell carcinoma. The study did not further classify non-melanocytic diagnoses.
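The class percentages above are shares of the melanocytic diagnoses, not of all biopsies. The short Python sketch below multiplies each one by the 23 percent melanocytic share to estimate what fraction of all biopsy diagnoses each class represents. The variable and function names are illustrative, and because the published percentages are rounded, the results are approximations rather than figures reported by the study.

```python
# Shares quoted in the article (rounded values).
MELANOCYTIC_SHARE = 0.23  # melanocytic lesions as a share of all diagnoses

# Each class as a share of melanocytic diagnoses.
CLASS_SHARES = {
    "Class I": 0.83,
    "Class II": 0.083,
    "Class III": 0.045,
    "Class IV/V": 0.041,
}


def share_of_all_biopsies(class_share, melanocytic_share=MELANOCYTIC_SHARE):
    """Convert a share of melanocytic diagnoses into a share of all diagnoses."""
    return class_share * melanocytic_share


if __name__ == "__main__":
    for name, share in CLASS_SHARES.items():
        overall = share_of_all_biopsies(share)
        print(f"{name}: about {overall:.1%} of all biopsy diagnoses")
```

By this rough arithmetic, benign Class I lesions make up about 19 percent of all biopsy diagnoses, while invasive melanoma accounts for a little under 1 percent.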
Pathologists also use a “striking array of subjective terms” when interpreting the same melanocytic lesion, the investigators noted, despite repeated calls for national guidelines on precise, consistent wording of the gray-area diagnoses between benign and malignant.
Natural language processing software is not flawless, but it has been used to gather details reliably from electronic medical records and has been found to perform as well as, or better than, manual record review by human annotators.
The study also involved researchers from Pennsylvania, Connecticut, Rhode Island, New Hampshire, and Paris, France. It was supported by funding from the National Cancer Institute (R01 CA151306, K05 CA104699, CRN14008).