TOPMed shows vital role of large-scale genome sequencing
Analysis of data on 53,581 individuals from diverse backgrounds offers insights into population health
A flagship paper co-authored by scores of researchers from across the United States, including scientists from the University of Washington Schools of Public Health and Medicine and the Brotman Baty Institute for Precision Medicine, shows how valuable whole-genome sequencing data collections, and the variants they reveal, are.
The paper, published online Feb. 10 in Nature, shares insights from an analysis of sequencing data representing the entire genomes of more than 53,000 individuals from diverse backgrounds.
The analysis is part of a multi-institution effort called the Trans-Omics for Precision Medicine (TOPMed) program, funded by the National Heart, Lung and Blood Institute, part of the National Institutes of Health. The national Data Coordinating Center for TOPMed, one of the largest whole-genome sequencing projects in the world, is housed in the UW School of Public Health’s Department of Biostatistics and coordinated by its Genetic Analysis Center.
“TOPMed is generating whole genome sequence data on a large and diverse set of study participants with detailed, and in some cases decades-long, records of health and disease,” said study co-author Sarah Nelson, a project manager for the Data Coordinating Center at the UW. “This allows researchers to discover genetic variation that impacts disease risk, progression or treatment options, which can help realize precision medicine and prevention approaches.”
The volume and diversity of TOPMed data have also enabled the detection of a huge number of genetic variations not reported in prior research efforts, according to Nelson, who is also a research scientist in the biostatistics department. In this early analysis, researchers identified more than 400 million genetic variations, which can refer to differences between individuals or populations. Among these are extremely rare variants occurring in less than 1% of the population. Roughly half of these 400 million variants were seen in just one individual. These “singletons” were more likely to disrupt genes known to be either associated with human disease or essential to basic cell function.
TOPMed data have already enabled researchers to discover information important to a range of conditions, including atherosclerosis, sickle cell disease, chronic obstructive pulmonary disease, blood pressure and asthma. A recent study, led by researchers in Colorado in collaboration with UW biostatisticians and others, defined airway responses to common coronavirus infections in children and revealed reasons why some people are more prone to infection than others.
Over the past six years, the TOPMed program has grown to include the genomic data of more than 180,000 participants from over 80 studies. The Data Coordinating Center provides logistical support to more than 1,300 investigators affiliated with the TOPMed program.
“We have pioneered innovative ways to share data, enabling investigators from different studies to efficiently pool their data and improve power to make new discoveries,” said study co-author Kenneth Rice, who co-led the Data Coordinating Center with corresponding author Cathy Laurie until her retirement in 2020. Susanne May, a professor of biostatistics in the School of Public Health, serves as the center’s new director.
The center’s researchers have developed new statistical methods and software to provide investigators with novel ways to analyze the huge datasets. “Our data-cleaning and quality-control work have been invaluable,” said Rice, also a professor of biostatistics.
“With such a large and diverse set of studies, it is inevitable that mistakes happen, mixing up data samples and labels, for example,” he said. “Through our rigorous evaluation of trait and genetic data across studies – which no other part of TOPMed can do – we have been able to catch and fix problems that would otherwise have invalidated analyses.”
Scientists in the center have also identified and addressed issues at the intersection of genetics research and ethics. They continue to draft guidelines on the use of race, ancestry and genetics in TOPMed and lead TOPMed’s Ethical, Legal, and Social Issues Committee.
Written by Ashlie Chandler, UW School of Public Health