Twenty years after the Human Genome Project was declared complete, the Y chromosome has been fully sequenced for the first time.
Most people have 22 pairs of chromosomes plus two sex chromosomes – either a pair of X chromosomes or one X and one Y chromosome. Having a Y usually – but not always – results in an embryo developing male characteristics.
The Y is one of the smallest chromosomes and has the fewest genes coding for proteins. Because it normally has no paired chromosome to swap pieces with prior to sexual reproduction, it is especially likely to accumulate bits of repetitive DNA.
All early methods of DNA sequencing involved breaking DNA up into small pieces, reading their genetic code and then reassembling the pieces by looking for overlaps. This technique doesn’t work well with repetitive DNA, where many of the pieces are identical and so can’t be placed unambiguously.
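The problem can be illustrated with a toy example. This is a minimal sketch, not the method any sequencing team used: the sequences, the `fragments` and `with_repeats` helpers and the fragment lengths are all invented for illustration, and real assemblers also use read counts and error models. It shows two made-up sequences that differ only in how many times a short block repeats. Cut into short fragments, they yield exactly the same set of pieces. Cut into fragments long enough to span the repeat, they can be told apart.

```python
def fragments(sequence, length):
    """Return the set of all overlapping fragments of a given length."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def with_repeats(copies):
    """Build a toy sequence: unique start, a repeated 'TAG' block, unique end."""
    return "ACGT" + "TAG" * copies + "GGCA"


# Two hypothetical sequences that differ only in repeat copy number.
five_copies = with_repeats(5)
eight_copies = with_repeats(8)

# Short fragments (5 letters) never span the whole repeat, so both
# sequences produce an identical set of pieces.
short_same = fragments(five_copies, 5) == fragments(eight_copies, 5)

# Long fragments (20 letters) reach past the 5-copy repeat into the
# unique flanks, so the two sets differ.
long_same = fragments(five_copies, 20) == fragments(eight_copies, 20)

print("Short fragments identical:", short_same)
print("Long fragments identical:", long_same)
```

With only the short fragments in hand, there is no way to tell whether the original had five copies of the repeat or eight, which is roughly why repeat-rich regions were left as gaps.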
Because of this, the “completed” human reference genome announced in 2003 was actually far from complete. “The Y chromosome just kept being pushed aside,” says Charles Lee at the Jackson Laboratory for Genomic Medicine in Connecticut. “It’s a hard chromosome to complete because of all the repetitive sequences.”
Only in 2021 did a team including Karen Miga at the University of California, Santa Cruz, finally fill in almost all the gaps, and again declare the human genome complete.
What made this possible is a technique, developed by a company called Oxford Nanopore, that reads the sequence of a single DNA molecule as it goes through a tiny hole, producing pieces that are millions of DNA letters long rather than a few hundred.
But the “complete” genome sequenced by Miga and her colleagues was a female one, consisting of the 22 non-sex chromosomes and the X chromosome. Only now has Miga’s team completed the Y chromosome as well, sequencing one from a person of European descent.
“The Y chromosome is riddled with complicated structures and includes huge areas where the same blocks of code repeat over and over with minor variations, making its assembly quite challenging,” says Sergey Nurk, who worked on the project before getting a job at Oxford Nanopore. “[The] ability to sequence any-length fragments of DNA was absolutely instrumental for this project.”
This complete Y chromosome has 106 protein-coding genes, which is 41 more than in the reference genome. But almost all these extra genes are just copies of one gene called TSPY.
At the same time, Lee’s team has sequenced the Y chromosomes of 43 diverse men, including 21 of African origin. The teams were independent but did collaborate, says Lee.
However, only three of his team’s Y sequences are gapless, he says. The rest still have between one and five gaps.
The 43 Y chromosomes show considerable diversity, says Lee. For instance, the number of copies of the TSPY gene ranges from 23 to 39.
Whether the repetitive DNA in the Y does anything important remains unclear. “I believe there’s a lot to learn about repetitive DNA and we just don’t understand it yet, and so we’ve still dismissed it as junk,” says Lee.
But most biologists and clinicians have little interest in the repetitive DNA, says David Page at the Massachusetts Institute of Technology, who studies the Y chromosome. The sequencing has also revealed very little that is new about the “euchromatic” parts of the Y that do include genes, he says.
“The present study [by Miga’s team] represents an incremental advance in our understanding of the euchromatic portions, which were nearly complete 20 years ago,” says Page.