
Diverse human genomes reveal complex genetic variation
Near-complete genome sequences of 65 individuals from five continents and 28 population groups advance discovery of human DNA code alterations
Media Contact: Leila Gray, 206-475-9809, leilag@uw.edu

Genome assemblies from 65 individuals, representing a variety of the world’s populations, are advancing the scientific exploration of complex genetic structural variation.
Structural variations are genetic code alterations that span more than 50 base pairs, the rungs on the DNA ladder. These changes were hard to detect until the recent advent of newer sequencing technologies and analytical algorithms, as well as larger collections of more complete, diverse genomes.
Results from the latest work in this area, conducted by the Human Genome Structural Variation Consortium with participants from the international 1000 Genomes Project, are reported today, July 23, in the scientific journal Nature.
Evan E. Eichler, professor of genome sciences at the University of Washington School of Medicine in Seattle and a Howard Hughes Medical Institute investigator, is one of six joint senior authors on the paper. His recent postdoctoral scholar, Glennis A. Logsdon, now an assistant professor of genetics at the University of Pennsylvania’s Perelman School of Medicine, is a joint first author along with three other researchers.
“These complete genomes from diverse genetic backgrounds are providing us new insights into how genomes have changed over time,” Eichler said. “It’s like geneticists have just been presented with a brand new microscope to see the true complexity of human genetic variation for the first time.”
In addition to expanding the catalog of structural variations, the Human Genome Structural Variation Consortium also obtained new insights into centromere differences among people. A centromere – appearing as a constricted area on a chromosome – is a control center for separating genetic materials before cell division. Genome areas involved in centromere form and function are some of the most diverse, quickly evolving genetic regions in humans.
“The level of diversity within human centromeres is just remarkable,” said Logsdon. “We see differences in their sequence, structure, and organization that suggest these regions are evolving more quickly than we ever thought before. This rapid evolution may be important for how centromeres function and adapt over time.”
Although complex structural variations have been difficult to spot and analyze, they are important finds because they are much more likely to alter the expression of genes. With such variation now identified between and within populations, it is easier to determine whether the differences contribute to disease or to other traits, such as those that helped our ancestors adapt to their environments.
Structural variations in our genomic code can occur in several ways: deletions, inversions, duplications, transpositions, mobile element insertions, or more intricate rearrangements. Scientists study these variations to see whether they significantly affect gene function or gene expression.
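For readers who work with variant data, the definition used in this release can be illustrated with a minimal sketch. It assumes only what is stated here: a structural variation spans more than 50 base pairs and falls into one of the listed classes. The `Variant` record, the class labels, and the function names are hypothetical, chosen for illustration, and do not represent the Consortium's analysis pipeline.

```python
from dataclasses import dataclass

# Threshold taken from this release: structural variations span
# more than 50 base pairs.
SV_MIN_SPAN_BP = 50

# Variant classes named in this release; the labels are illustrative.
SV_KINDS = {
    "deletion",
    "inversion",
    "duplication",
    "transposition",
    "mobile_element_insertion",
    "complex_rearrangement",
}


@dataclass(frozen=True)
class Variant:
    """Hypothetical record: a variant class and the span it affects."""
    kind: str
    span_bp: int


def is_structural(variant: Variant) -> bool:
    """Return True when the variant spans more than 50 base pairs."""
    return variant.span_bp > SV_MIN_SPAN_BP


def count_structural_by_kind(variants):
    """Tally variants that meet the size threshold, grouped by class."""
    counts = {}
    for variant in variants:
        if variant.kind not in SV_KINDS:
            raise ValueError(f"unknown variant class: {variant.kind}")
        if is_structural(variant):
            counts[variant.kind] = counts.get(variant.kind, 0) + 1
    return counts


# Made-up example values, not data from the study.
example = [
    Variant("deletion", 12),       # too short to count as structural
    Variant("deletion", 51),
    Variant("inversion", 50),      # exactly 50 bp does not exceed the threshold
    Variant("duplication", 3000),
]
tally = count_structural_by_kind(example)
```

In this made-up example, only the 51-base-pair deletion and the 3,000-base-pair duplication are counted as structural variations.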
Genome sequencing by the Consortium closed 92% of all the gaps in previous assemblies, most of which corresponded to these complex variants. In analyzing this set of diverse human genomes, the international collaboration of scientists uncovered up to 26,115 structural variants per individual for a total of more than 175,000 sequence-resolved events that were seen at least once.
A few other highlights from the research include:
Improved assemblies of several Y chromosomes. Y chromosomes are difficult to assemble because they contain many highly repetitive sequences. With their several new Y chromosome assemblies, the researchers investigated one of the most extensive densely packed regions of the human genome, known as Yq12. Tight packing limits gene activity by making the DNA code less reachable by the mechanisms that copy the information contained in genes. While acknowledging that the Yq12 region remains challenging to probe, the researchers have begun making inroads in determining variation. Their findings suggest that it is among some of the most variable portions of a human’s Y chromosome.

New look at the major histocompatibility complex. This complicated region, highly relevant to disease research, is associated with immune function and autoimmunity dysfunction. Among the several locations in this complex examined for variations was an area important to vaccine response and to autoimmune diseases. Other studies of this complex region looked at variations in areas responsible for coding cell surface receptors that sense and signal the presence of invaders like viruses.
Centromere variations. Genome regions associated with centromeres are among the most highly prone to mutations. The lengths of more than a fifth of centromeres vary by more than 1.5-fold, and about a third vary in structure. Not surprisingly, the researchers found a large number of new variants – more than 4,000 based on their complete sequence of 1,246 centromeres. The researchers also noticed indications suggesting that sometimes two sites, rather than one, exist for the kinetochore – a structure for the attachment and control of the microscopic ropes that pull apart chromosomes during cell division. The researchers pointed out that additional research would need to confirm the functional consequence of these di-kinetochores.
Survival motor neuron genes (SMN1/SMN2). These genes are in a structurally complex region of biomedical interest. Mutations in or lack of the SMN1 gene are linked to spinal muscular atrophy (caused by the lack of a protein needed for muscle movement). SMN2 is a less powerful backup gene but a target of one of the most successful gene therapies. These genes are embedded in a region of long, repeated DNA sequences. This has made full sequencing nearly impossible until now. Through their assemblies of this region, researchers obtained the structure and copy number of these and a few other genes among several of the individuals in their study. They distinguished functional copies of SMN1 and SMN2. Their analysis also suggested potential disease-risk sites in a few of the genomes analyzed.
Senior scientists and institutions heading the “Complex genetic variations in nearly complete human genomes” project, in addition to Eichler at the UW, include Miriam Konkel at Clemson University in South Carolina, Jan Korbel at EMBL (the European Molecular Biology Laboratory) in Germany, Tobias Marschall at Heinrich Heine University in Germany, and Charles Lee and Christine R. Beck at the Jackson Laboratory for Genomic Medicine in Connecticut. Peter Ebert of the Marschall Lab at Heine University, and Peter A. Audano and Mark Loftus, both at the Jackson Laboratory for Genomic Medicine, along with Logsdon, were joint first authors.
Funding was provided by National Institutes of Health (NIH) grants U24HG007497, R00GM147352, R01HG002385, R01HG010169, R01HG011649, K99HG012798 and U01HG013748; NIH National Institute of General Medical Sciences R35GM133600, 1P20GM139769, 1R35GM138212; NIH National Institute of Allergy and Infectious Diseases (NIAID) U01AI090905; NIH National Cancer Institute (NCI) R01CA261934, R21CA259309, P30CA034196; National Science Foundation (NSF) Career 2046753, the Ministry of Culture and Science of North Rhine-Westphalia (PROFILNRW-2020–107-A), and the German Research Foundation (DFG) 496874193. This work was also supported, in part, by the Intramural Research Program of the National Human Genome Research Institute, the Jürgen Manchot Foundation, Howard Hughes Medical Institute, and the Düsseldorf School of Oncology (SPATIAL).
The researchers also thank the individuals who provided their samples for sequencing and analysis to the 1000 Genomes Project.
For details about UW Medicine, please visit http://uwmedicine.org/about.