Filtering the VDS can be expensive, especially when the variants of interest are spread across the entire genome. Even for a small number of variants, you may need a larger cluster and more time if those variants span the whole genome. For reference, generating a dense MatrixTable containing 73,545,961 variants and 245,394 samples from the v7 VDS took approximately 5.5 hours with 50 workers and 500 preemptible workers, at a cost of around $330 (~$60 per hour). Please ensure that preemptible workers constitute less than 50% of all workers (primary plus secondary) in your cluster; exceeding this limit may lead to job failures.
It is important to note that costs do not scale linearly with the number of variants or samples; they depend largely on how scattered the filtered regions are. If your variants are limited to specific chromosomes rather than spanning the entire genome, filter the VDS by chromosome first (filter_chromosomes) and then apply a filter_intervals step using a BED file. This approach can significantly reduce processing time. When dealing with large interval sets, filter_rows is faster than filter_intervals and less likely to encounter memory issues. Details about the All of Us VDS can be found in the support article: The new VariantDataset (VDS) format for All of Us short read WGS data, and examples of analyzing the VDS can be found in the tutorial notebook: 03_Manipulate Hail VariantDataset (Researcher Workbench login required).
We recommend first checking whether the variants of interest are present in one of the smaller callsets before using the VDS. You can find more details about the smaller callsets in the support article: Smaller Callsets for Analyzing Short Read WGS SNP & Indel Data with Hail MT, VCF, and PLINK.