How can we find the sequencing depth for the WGS data?


Sequencing depth is available for both methods of whole genome sequencing that we provide: short read whole genome sequencing (srWGS) and long read whole genome sequencing (lrWGS). Sequencing depth refers to the number of times a given base in a genome is sequenced. The average sequencing depth across the genome is referred to as the genome coverage.
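As a small illustration of the definition above, the sketch below averages per-base depths over a handful of positions. The depth values are invented for the example and the helper name mean_depth is hypothetical; real coverage values should be taken from the metrics files rather than recomputed this way.

```python
def mean_depth(per_base_depths):
    """Return the average depth over a sequence of per-base read counts."""
    if not per_base_depths:
        raise ValueError("need at least one position")
    return sum(per_base_depths) / len(per_base_depths)


# Hypothetical depths at five consecutive positions (one has no coverage).
depths = [30, 28, 35, 0, 32]
print(mean_depth(depths))  # 25.0
```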

For the srWGS data, the sequencing coverage is available for each sample as mean_coverage in the genomic metrics file. For the lrWGS data, the sequencing coverage is available for each sample as mosdepth_cov in the lrWGS variant metrics file. Both metrics files are described in the article How the All of Us Genomic data are organized.
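A minimal sketch of pulling per-sample coverage out of a metrics table is shown below. It assumes the file is tab-separated with one row per sample and that the sample identifier column is named research_id. Both of these are assumptions for illustration, and the rows are invented. Only the mean_coverage and mosdepth_cov names come from the text above.

```python
import csv
import io


def read_coverage(tsv_text, id_column, coverage_column):
    """Map each sample ID to its coverage value from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: float(row[coverage_column]) for row in reader}


# Invented example rows; "research_id" is an assumed column name.
example_tsv = (
    "research_id\tmean_coverage\n"
    "1000001\t32.5\n"
    "1000002\t41.0\n"
)

coverage = read_coverage(example_tsv, "research_id", "mean_coverage")
print(coverage)
```

For the lrWGS metrics file, the same helper could be called with "mosdepth_cov" as the coverage column, again assuming a tab-separated layout.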

Another metric to describe the read evidence supporting a genomic location is allele depth (AD), which is the number of reads supporting an allele. This information is available for srWGS SNP & Indel variants, srWGS structural variants (SVs), and lrWGS variants. In the VCF format, the AD is available in the FORMAT field for each genotype. In the Hail MatrixTable (MT), the AD is available in the entry field AD. In the srWGS SNP & Indel VariantDataset (VDS), AD is available for each allele as the local allele depth (LAD). These fields are described in the article How the All of Us Genomic data are organized.
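To show where AD sits in a VCF record, the sketch below parses a single data line with plain Python. The record is invented and the helper name allele_depths is hypothetical. It relies only on the standard VCF layout, where the ninth column (FORMAT) lists the keys and each later column holds one sample's colon-separated values. For real callsets a dedicated VCF library or Hail is likely a better fit.

```python
def allele_depths(vcf_line):
    """Return one AD list per sample (None when AD is absent or missing)."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    samples = fields[9:]
    if "AD" not in format_keys:
        return [None] * len(samples)
    ad_index = format_keys.index("AD")

    result = []
    for sample in samples:
        values = sample.split(":")
        # Trailing values may be omitted, and "." marks a missing value.
        if ad_index >= len(values) or values[ad_index] == ".":
            result.append(None)
            continue
        result.append(
            [None if v == "." else int(v) for v in values[ad_index].split(",")]
        )
    return result


# Invented record with three samples; AD lists reads for REF, then ALT.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:12,9:21\t1/1:0,18:18\t./.:.:."
print(allele_depths(line))  # [[12, 9], [0, 18], None]
```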

Sequencing depth is available only for the v6 and v7 datasets.
