Applications of low-pass sequencing in animal genetics and beyond

What happened to the promised genomics revolution?

Sequencing the human genome took 13 years and cost $2.7B [1]. Almost two decades after the Human Genome Project (HGP) was completed, genomics is gradually becoming a greater part of our lives, but its promised transformational impact on healthcare and our way of life is still to come. The cost of DNA sequencing has been declining steadily; however, high-coverage (15X-30X) whole genome sequencing (WGS), the gold standard in genomics, still costs around $1000 per human sample.

This price point has significantly slowed the progress of genomics and delayed its arrival as a key player in health and medicine for humans and animals. While high-coverage WGS is not the only available method for obtaining genetic information from organisms, it is the most comprehensive and data-rich approach. Alternative methods can deliver genetic information at a lower cost but suffer from drawbacks such as limited throughput and scope, high operational complexity, and high initial capital costs.

This article discusses the potential of low-pass WGS (<1X genome coverage), in combination with imputation analysis, to address the high unmet need for an affordable substitute for high-coverage WGS in genomics. We present a case study of using low-pass WGS and imputation to solve a previously unsolved problem in feline breed analysis and discuss other potential applications of the method.

Can high-coverage genome sequencing be de-throned?

DNA microarrays

Genotyping microarrays reign over the direct-to-consumer genetics market and are also an essential part of clinical diagnostics. They have been around for decades and offer some indisputable advantages over high-coverage WGS. These include:

  • Significantly lower cost
  • Robust and standardized results

However, DNA microarrays also suffer from numerous limitations:

  • While DNA microarrays cost only a fraction of what performing high-coverage WGS would cost, they are still relatively expensive to purchase ($30-$100 per chip) and run ($10-$20 per chip)
  • Designing a finalized DNA microarray chip involves substantial expenses: designing the initial large screening array, testing it, and ultimately designing a smaller array chip for commercial or routine diagnostic use
  • Designing a DNA microarray is dependent on having sufficient pre-existing genomic data to identify the most promising genetic targets to include on the microarray chip; this makes microarrays unsuitable for organisms under-represented in genomics research
  • Large amounts of DNA microarray-based results can be used for Genome-Wide Association Studies (GWAS), but because the genetic targets on the chip are pre-selected, novel gene variants cannot be discovered

 

Hybridization-based target enrichment

Sequencing only selected genomic regions through hybridization-based target enrichment (hybridization capture) can be extremely useful in genomics. The method can be performed on pre-made sample libraries already prepared for Next Generation Sequencing (NGS).

The technology relies on molecularly tagged, specifically designed probes (complementary to the genomic regions of interest) hybridizing to the sample DNA. Through the pull-down of the tagged probes already hybridized to the sample DNA, genomic regions of interest can be separated from the rest of the DNA and sequenced. The method provides:

  • An attractive opportunity for cost-savings when the user needs information only on pre-determined genomic regions
  • The complete DNA sequence of the pre-selected genomic regions of interest, which allows for some level of novel variant discovery

Despite these positives, hybridization-based target enrichment sequencing has multiple drawbacks:

  • Since hybridization-based capture efficiency is low, the starting amount of DNA needed per sample is at least 500 ng, which makes some sequencing library preparation methods incompatible with this technology and increases the cost of library preparation
  • Ideally, hybridization capture would be performed on multi-library pools to increase cost-effectiveness; however, multiplexing capabilities with this method are reduced to processing 12-16 samples at a time, which is not scalable
  • Designing and synthesizing oligo probes for hybridization capture is involved and expensive, and favorable pricing can be achieved only by ordering probes in bulk
  • Hybridization capture follows a laborious multi-step protocol that requires prior experience on the part of the user and increases the potential for mistakes

 

Low-pass sequencing + imputation

As sequencing costs drop, low-pass WGS (typically defined as <1X coverage of the genome) presents an attractive substitute for DNA microarrays. For comparison, 0.4X coverage translates to around one read covering each of ~30 million genetic variants of the human genome, while microarrays provide information on orders of magnitude fewer variants [2].
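
As a rough illustration of the coverage arithmetic, the sketch below uses the standard Poisson approximation, under which a site sequenced at mean depth c carries at least one read with probability 1 - e^(-c). The genome size, read length, and read count are illustrative assumptions, not figures from this article.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

def fraction_covered(depth):
    """Poisson approximation: probability that a site has at least one read."""
    return 1.0 - math.exp(-depth)

# Illustrative assumptions: ~3.1 Gb genome, 150 bp reads, ~8.27 million reads.
genome_size = 3_100_000_000
depth = mean_coverage(n_reads=8_270_000, read_length=150, genome_size=genome_size)

print(f"mean depth: {depth:.2f}X")
print(f"sites with at least one read: {fraction_covered(depth):.1%}")
print(f"sites with at least one read at 30X: {fraction_covered(30):.1%}")
```

Under this approximation, roughly a third of sites are expected to carry at least one read at 0.4X, which is why the remaining sites have to be filled in by imputation.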

Low-pass sequencing is particularly useful when combined with imputation analysis, which fills in sequencing data gaps by inferring missing data from known gene variant co-inheritance patterns. Obtaining information on a few different variants in a block of DNA allows the remaining known variants within the same block to be imputed. A 0.4X genome coverage, combined with imputation analysis, was found to be 98.2% concordant with a DNA microarray-based analysis, while 1X coverage showed 99.2% concordance with microarray-based results [2]. Low-pass sequencing, in combination with imputation analysis, can therefore provide a level of accuracy comparable to that of DNA microarrays.
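
The idea of imputing a block from a few observed variants can be sketched with a toy example. This is a deliberately simplified stand-in: production imputation tools use statistical models over large reference panels, whereas the hypothetical `impute` function below simply copies the best-matching reference haplotype. The haplotypes, the observed calls, and the function names are all made up for illustration.

```python
def impute(observed, reference_haplotypes):
    """Fill missing alleles (None) by copying the best-matching reference haplotype."""
    def mismatches(haplotype):
        # Count disagreements at the sites that were actually observed.
        return sum(1 for o, h in zip(observed, haplotype) if o is not None and o != h)

    best = min(reference_haplotypes, key=mismatches)
    return [h if o is None else o for o, h in zip(observed, best)]

def concordance(calls_a, calls_b):
    """Fraction of sites at which two sets of calls agree."""
    same = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return same / len(calls_a)

# Toy reference panel: three known haplotypes across six variant sites (0 = ref, 1 = alt).
reference_haplotypes = [
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]

# Low-pass reads happened to cover only two of the six sites.
observed = [1, None, None, 0, None, None]
truth = [1, 1, 0, 0, 1, 0]  # what an array or deep sequencing would report (assumed)

imputed = impute(observed, reference_haplotypes)
print(imputed, concordance(imputed, truth))
```

The concordance figure computed here is the same kind of quantity as the 98.2% and 99.2% values quoted above: the share of sites at which imputed calls agree with an independent measurement.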

A potential drawback of using low-pass WGS + imputation is that there has to be prior knowledge available on gene variant co-inheritance patterns in the species of interest, i.e. a robust multi-generational haplotype map has to already exist or be built prior to analysis (discussed in more detail later in this text). If this prerequisite is in place, low-pass WGS can be an operationally straightforward and cost-effective way to obtain large amounts of genomic information when following a scale-optimized laboratory process. In addition, with the accumulation of a large low-pass WGS sample database, the potential for the discovery of novel variants increases.

Table 1. Comparison of genotyping methods.

 

The internet’s favorite animal and low-pass sequencing 

The notorious problem of cat breed analysis

Unlike dog breeds, cat breeds are extremely difficult to identify with precision. The reasons for this are associated with the domestic cat’s short history of selective breeding and its under-representation in genomics research.

Domestication versus selective breeding

Cat domestication started around 10,000 years ago and was related to the emergence of agriculture, since cats provided the perfect solution for rodent pest control [3,4,5]. During the gradual cat domestication process, very limited selective breeding occurred because freely breeding cats were still remarkably capable pest controllers [6,7,8]. Selective cat breeding only appeared in modern times, more specifically, over the past 50 years [8]. In evolutionary terms, this is an extremely short period of time for robust, genetically distinct sub-populations to form within any species.

In addition, selective cat breeding has historically been focused on aesthetic features (coat color, coat texture, and other typically monogenic traits) rather than genetically complex body structure or functional/behavioral traits. This has resulted in cat breeds often being defined by a single gene variant while sharing the majority of variants associated with life history and geographic origin. Conversely, it also happens that cats with diverse genotypes are classified as the same breed due to similar phenotypic presentation. 

These factors make cats an unusual case of domesticated animals, especially when compared to dogs, where domestication started ~14,000 years ago and followed a rigid set of selective breeding rules focusing on traits defined by complex gene interactions [9]. The vast differences between cats’ and dogs’ evolutionary histories mean that breed analyses based on genotype will yield different conclusions for the two species.

 

Under-representation of cats in genomics research

There is a substantial disparity in the resources and research efforts dedicated to feline genomics compared to canine genomics. There is also a stark contrast in the genome sequencing goals set for the two fields. Researchers from the 99 Lives cat genome project celebrated when they sequenced the genomes of 200 domestic cats (double their initial goal) [10]. The Dog10K Consortium, by comparison, is aiming to sequence the genomes of 10,000 dogs and wild canids [11], and to sequence dog breeds at high depth so that different breeds can have their own high-quality genome assemblies. Until the 99 Lives project, there had been very little systematic effort to understand genome-wide differences between cat breeds.

Sequencing and imputing our way to cat breed analysis

The theory behind (good) cat breed analysis

The evolution of cat breeds is inextricably linked to the species’ ancestral and geographic history [8]. Therefore, cat breed analysis bears a high degree of similarity to human ethnic ancestry analysis. Both types of analysis are based on assessing the sample of interest’s genomic similarity to chunks of DNA (haplotype blocks), rather than to the small individual units comprising the genome (nucleotides). Gene variants (alleles) are usually inherited together in discrete haplotype block units that show a very low amount of ‘genetic shuffling’ across generations [12].

Because every species has its own haplotype inheritance pattern (multi-generational haplotype map, also known as linkage disequilibrium map), haplotype blocks can be used to assess a cat’s similarity to a particular breed from a limited amount of data, with the un-sequenced alleles filled in by imputation.

As Figure 1 shows, different breeds have a characteristic combination of alleles inherited together within each haplotype block. A comprehensive breed analysis has to take into account the sample’s genetic similarity to all known feline haplotype blocks before judging the cat’s overall genetic proximity to a particular breed. Once a high-quality multi-generational haplotype map is available, low-pass WGS in conjunction with bioinformatic imputation can be used for cat breed analysis. The better the quality of the feline haplotype map, the more accurate the imputation-based breed analysis.

Building a thorough high-resolution haplotype map relies on having a reference panel comprised of genome sequencing data from thousands of cats representing different breeds and geographic locations. If the reference panel has a small sample size or an obvious bias in population sampling, the allele frequency and allele co-segregation estimates on which the haplotype map is based will be inaccurate.
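
To make "allele frequency and allele co-segregation estimates" concrete, the sketch below computes two textbook quantities from a toy panel of phased haplotypes: the alternate allele frequency at a site and the pairwise linkage disequilibrium statistic r². The eight-haplotype panel is invented for illustration; a real reference panel would contain far more samples and sites.

```python
def allele_frequency(haplotypes, site):
    """Frequency of the alternate allele (coded 1) at one site."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)

def r_squared(haplotypes, site_a, site_b):
    """Pairwise linkage disequilibrium (r^2) between two biallelic sites."""
    p_a = allele_frequency(haplotypes, site_a)
    p_b = allele_frequency(haplotypes, site_b)
    # Frequency of haplotypes carrying the alternate allele at both sites.
    p_ab = sum(1 for h in haplotypes if h[site_a] == 1 and h[site_b] == 1) / len(haplotypes)
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

# Toy panel: sites 0 and 1 are always inherited together; site 2 varies independently.
panel = [
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1],
]

print(allele_frequency(panel, 0))   # frequency of the alternate allele at site 0
print(r_squared(panel, 0, 1))       # tightly linked pair
print(r_squared(panel, 0, 2))       # unlinked pair
```

With only a handful of haplotypes, a single mis-sampled cat can shift these estimates substantially, which is the small-panel problem described above.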

In addition, as mentioned previously, genetic differences between feline breeds are minor and difficult to detect unless a large cat genome repository exists. Given the already discussed limitations of the feline genomics effort, cats are disadvantaged when it comes to having sufficient publicly available genomic data for the creation of a highly detailed haplotype map (and therefore breed identification).

 


 

Basepaws’ approach to cat breed analysis

Building the tool kit

We first aimed to build the largest available reference panel of WGS data from purebred and mixed breed cats from across the world. Our reference panel is continuously enriched and updated with quality-controlled new cat DNA samples. This process of updating the reference panel takes full advantage of the screening potential of low-pass WGS.

We start by performing low-pass sequencing on candidate purebred cats. We then perform a population stratification analysis to see how well these samples cluster with existing high-coverage samples (>15X coverage). This inexpensive and computationally lean approach allows us to select the best candidates for subsequent high-coverage sequencing. To avoid biasing our computationally defined populations toward the founder samples, we also supplement the analysis with occasional non-screened purebred samples.

Using our sample reference panel, we are able to: (1) perform a Principal Component Analysis (PCA) to observe breed clusters based on genetic similarity (Eastern, Western, Exotic, Persian, and Polycat breed groups); and (2) generate a high-resolution multi-generational haplotype map and use it in our imputation and downstream machine learning classification pipeline for breed analysis.
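
The clustering step can be illustrated with a minimal PCA sketch. It is not the pipeline described here: the genotype matrix is invented, and the first principal component is found with a plain power iteration on the sample-by-sample Gram matrix so that the example needs only the standard library. Genotypes are coded as the number of alternate alleles (0, 1, or 2).

```python
import math

def first_principal_component(genotypes, iterations=100):
    """Return one PC1 score per sample (unit-length vector, arbitrary sign)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each variant (column) on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample Gram matrix of the centered data.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]
    # Power iteration converges toward the dominant eigenvector of the Gram matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # all samples identical: no variance to explain
            break
        v = [x / norm for x in w]
    return v

# Made-up genotypes: the first three cats resemble each other, as do the last three.
genotypes = [
    [0, 0, 2, 2, 0],
    [0, 1, 2, 2, 0],
    [0, 0, 2, 1, 0],
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 2],
    [2, 2, 0, 1, 1],
]

scores = first_principal_component(genotypes)
for i, s in enumerate(scores):
    print(f"cat {i}: PC1 = {s:+.3f}")
```

In this toy data the two groups fall on opposite sides of zero along the first component, which is the kind of separation a candidate sample is checked against before it is selected for high-coverage sequencing.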

Customer sample analysis – demystifying breeds

Every Basepaws customer sample undergoes low-pass WGS (average coverage of 0.44X), and the sequencing reads are mapped to the latest version of the domestic cat’s genome assembly (Felis_catus_9.0). Variant calling is then performed, followed by an imputation analysis of un-genotyped alleles with the help of our high-depth reference panel and our multi-generational haplotype map.

Next, we use our haplotype map to segment the cat sample’s genome into haplotype blocks, which are then compared against the haplotype blocks in our reference panel using a machine learning classification algorithm.

We use this analysis to deliver two types of insights regarding a cat’s breed:

  • How similar is a cat’s DNA to different cat breed groups and individual breeds (percentage similarity)?
  • Which parts of a cat’s DNA are most similar to different cat breed groups and individual breeds?
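
The two insights above can be sketched with a toy block-wise comparison. The machine learning classifier itself is not described in this article, so a simple nearest-match rule stands in for it here; the fixed block size, the per-group consensus haplotypes, and the function names are all assumptions made for illustration.

```python
from collections import Counter

def split_into_blocks(alleles, block_size):
    """Cut a sequence of alleles into consecutive fixed-size blocks."""
    return [tuple(alleles[i:i + block_size]) for i in range(0, len(alleles), block_size)]

def closest_breed(block, index, reference_panel, block_size):
    """Pick the breed group whose reference block differs at the fewest sites."""
    def distance(breed):
        ref_block = reference_panel[breed][index * block_size:(index + 1) * block_size]
        return sum(a != b for a, b in zip(block, ref_block))
    return min(reference_panel, key=distance)

def breed_similarity(sample, reference_panel, block_size):
    """Return the per-block assignments and the percentage of blocks per breed group."""
    blocks = split_into_blocks(sample, block_size)
    assignments = [closest_breed(b, i, reference_panel, block_size)
                   for i, b in enumerate(blocks)]
    counts = Counter(assignments)
    percentages = {breed: 100.0 * counts[breed] / len(blocks) for breed in reference_panel}
    return assignments, percentages

# Made-up consensus haplotypes for three breed groups over 12 sites (3 blocks of 4).
reference_panel = {
    "Western": [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1],
    "Eastern": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    "Persian": [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0],
}
sample = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]  # imputed alleles for one cat (invented)

assignments, percentages = breed_similarity(sample, reference_panel, block_size=4)
print(assignments)   # which part of the genome looks most like which group
print(percentages)   # overall percentage similarity per group
```

The per-block assignments correspond to the second insight (which parts of the genome resemble which group), and the percentages correspond to the first.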

Using low-pass WGS, combined with bioinformatic imputation, has allowed us to perform a high-accuracy cat breed analysis for a fraction of the cost (and time) that an alternative approach, such as DNA microarrays, would have required.

 

What else can we do with low-pass sequencing and imputation?

Low-pass WGS’s potential is becoming widely recognized, and the technique is already being applied in multiple novel and diverse contexts. One of these is human population genetics. Because the majority of the available genomic data comes from Caucasian populations, geneticists currently see the world through a Caucasian-centric lens.

While performing high-coverage WGS on thousands of people from diverse populations is a costly endeavor, our understanding of the human genome is advanced enough to allow the use of low-pass WGS + imputation on under-represented populations to quickly (and cheaply) diversify our genetic outlook. For example, a study that performed low-pass WGS on 10,640 Chinese women identified two new loci associated with major depressive disorder [13].

Another example is the Broad Institute/Harvard partnership on the Neuropsychiatric Genetics in African Populations (NeuroGAP) initiative, which aims to study psychiatric genetics in 35,000 African people [14,15]. Low-pass WGS is among the key methods being considered for this study.

Low-pass WGS also has a high potential for advancing our understanding of cancer and cancer care. One valuable use of this technique in cancer care is quality control of tumor biopsies prior to high-coverage sequencing and more in-depth genomic sample characterization. Low-pass WGS can be used to (1) confirm that isolated blood cells are indeed cancerous cells; (2) confirm that sequencing libraries generated from single cells are a uniform representation of the genome; and (3) confirm that sequencing libraries from cell-free DNA (cfDNA) contain tumor DNA [15].

Apart from being used as a supplementary method for quality control, low-pass WGS can also be used as the primary method in cancer research and diagnostics. Researchers have used low-pass WGS of cfDNA, combined with imputation, to identify somatic copy number variations (SCNVs) in tumor biopsies of patients suffering from metastatic breast or prostate cancer [15]. Alternative methods for assessing SCNVs, such as microarrays, would typically require higher starting DNA amounts and provide lower resolution.
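
One common way to look for copy number changes in low-coverage data is to count reads in fixed genomic bins and compare each bin to the sample's typical depth. The sketch below shows only that read-depth idea, not the method used in the cited work: the bin counts and the gain/loss thresholds are invented, and real analyses also correct for GC content and mappability and segment neighbouring bins.

```python
import math
from statistics import median

def log2_ratios(bin_counts):
    """Log2 of each bin's read count relative to the sample-wide median count."""
    baseline = median(bin_counts)
    return [math.log2(c / baseline) if c > 0 else float("-inf") for c in bin_counts]

def call_bins(ratios, gain=0.3, loss=-0.3):
    """Label each bin using illustrative (assumed) log2-ratio thresholds."""
    return ["gain" if r > gain else "loss" if r < loss else "neutral" for r in ratios]

# Made-up read counts for ten consecutive genomic bins from one low-pass library.
bin_counts = [100, 98, 103, 150, 152, 149, 101, 50, 52, 99]

ratios = log2_ratios(bin_counts)
calls = call_bins(ratios)
for count, ratio, call in zip(bin_counts, ratios, calls):
    print(f"{count:4d} reads  log2 ratio {ratio:+.2f}  {call}")
```

Because only relative depth per bin matters, this kind of analysis does not need every base to be covered, which is what makes low-pass data usable for it.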

Conclusion

Low-pass WGS, combined with imputation analysis, is becoming increasingly widely adopted across different genomics applications due to its low cost, operational convenience, and information density. As NGS prices continue to decrease and bioinformatic methods continue to evolve, this technology will become more commonly used.

 

References
1) genome.gov/human-genome-project/Completion-FAQ
2) Wasik, K., Berisa, T., Pickrell, J.K., Li, J.H., Fraser, D.J., King, K. and Cox, C., 2019. Comparing low-pass sequencing and genotyping for trait mapping in pharmacogenetics. bioRxiv, p.632141.
3) Vigne, J.D., Guilaine, J., Debue, K., Haye, L. and Gérard, P., 2004. Early taming of the cat in Cyprus. Science, 304(5668), pp.259-259.
4) Gupta, A.K., 2004. Origin of agriculture and domestication of plants and animals linked to early Holocene climate amelioration. Current Science, 87, pp.54-59.
5) Zohary, D. and Hopf, M., 2000. Domestication of plants in the Old World: the origin and spread of cultivated plants in West Asia, Europe and the Nile Valley (No. Ed. 3). Oxford University Press.
6) Dobney, K. and Larson, G., 2006. Genetics and animal domestication: new windows on an elusive process. Journal of Zoology, 269(2), pp.261-271.
7) Randi, E., Pierpaoli, M., Beaumont, M., Ragni, B. and Sforzi, A., 2001. Genetic identification of wild and domestic cats (Felis silvestris) and their hybrids using Bayesian clustering methods. Molecular Biology and Evolution, 18(9), pp.1679-1693.
8) Lipinski, M.J., Froenicke, L., Baysac, K.C., Billings, N.C., Leutenegger, C.M., Levy, A.M., Longeri, M., Niini, T., Ozpinar, H., Slater, M.R. and Pedersen, N.C., 2008. The ascent of cat breeds: genetic evaluations of breeds and worldwide random-bred populations. Genomics, 91(1), pp.12-21.
9) Adams, J., 2008. Genetics of Dog Breeding. Nature Education 1(1):144
10) missouri.edu/99lives
11) Wang, G.D., Larson, G., Kidd, J.M., vonHoldt, B.M., Ostrander, E.A. and Zhang, Y.P., 2019. Dog10K: The International Consortium of Canine Genome Sequencing. National Science Review.
12) broadinstitute.org/international-haplotype-map-project/haplotype-map
13) Cai, N., Bigdeli, T.B., Kretzschmar, W., Li, Y., Liang, J., Song, L., Hu, J., Li, Q., Jin, W., Hu, Z. and Wang, G., 2015. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature, 523(7562), p.588.
14) broadinstitute.org/stanley-center-psychiatric-research
15) cancer.gov/about-nci/organization/ccg/blog/2019/low-coverage-seq

Basepaws

Basepaws is a pet health company specializing in genetics. In 2018, we launched the world's first at-home consumer DNA test for cats focused on delivering health and breed-related actionable insights.
Our feline breed analysis product is the only one on the market and we pride ourselves on being the DNA sequencing pioneers of D2C animal genetics. Our efficient lab operations have allowed us to make the process of low-pass sequencing extremely cost-effective and easily transferable to other organisms.

Our mission is to improve the health and well-being of every pet through genomics. We are always looking for partners to help us achieve and expand on our mission. Learn more at Basepaws.com.
