Using the Genomic Extraction Tool

To access the short-read whole genome sequencing (srWGS) data, you can use the point-and-click Genomic Extraction tool.

The Genomic Extraction tool extracts variant data from the genomic dataset and saves it as Variant Call Format (VCF) callset files. These files are exported to a Jupyter Notebook environment, where you can analyze them with tools such as Hail or PLINK.

VCF, Hail, PLINK, and BGEN files are also available for different callsets, as is a Hail VariantDataset (VDS) file for the entire genomic dataset.

Note: Use the Genomic Extraction tool only when you want to analyze whole genome sequencing (WGS) data in a smaller subset of participants (fewer than 5,000 participants). This process will not work for array data.

For larger cohorts, you will need to pull the genomic data directly into Jupyter Notebook using files in the Controlled CDR Directory.
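The guidance above can be written down as a small check. This is only an illustrative sketch: the function name `choose_access_path`, the data-type labels, and the returned strings are hypothetical and are not part of the Researcher Workbench. Only the 5,000-participant threshold and the array-data restriction come from the note above.

```python
# Illustrative only: every name here is hypothetical, not a Workbench API.
MAX_EXTRACTION_PARTICIPANTS = 5_000  # threshold taken from the note above


def choose_access_path(n_participants: int, data_type: str) -> str:
    """Suggest how to access genomic data for a cohort.

    data_type is a hypothetical label: "srWGS" or "array".
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if data_type == "array":
        # The extraction process does not work for array data.
        return "extraction_not_supported"
    if data_type != "srWGS":
        raise ValueError(f"unknown data type: {data_type!r}")
    if n_participants < MAX_EXTRACTION_PARTICIPANTS:
        # Smaller cohorts: the point-and-click tool is appropriate.
        return "genomic_extraction_tool"
    # Larger cohorts: read files from the Controlled CDR Directory instead.
    return "controlled_cdr_directory"
```

For example, under these assumptions a 1,200-participant srWGS cohort maps to the extraction tool, while a cohort of exactly 5,000 participants does not, because the note specifies fewer than 5,000.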

To use the Genomic Extraction tool

Note: With the release of Curated Data Repository v8, the genomic extraction process may take longer. Extractions may take 5 hours or more, depending on your sample size. Please plan accordingly. If you have any questions, email us at support@researchallofus.org.

Follow the steps for creating a cohort with the Cohort Builder.
Click “” to the right of “Datasets.”
Select your cohort under the “Select Cohorts (Participants)” column on the left by clicking the checkbox.
Select “Short-read whole genome sequencing data” under the “Select Concept Sets (Rows)” column in the middle by clicking the checkbox.
Select “VCF Files” under the “Select Values (Columns)” column on the right by clicking the checkbox.
Click “Create Dataset.”
Name your dataset and add a description.
Click “SAVE.”
Click “ANALYZE” in the bottom right to begin creating your Jupyter Notebook environment. A pop-up will appear, asking if you would like to run the extraction process.
Note: The extraction process uses cloud compute credits to generate the code and files from the genomic dataset. The cost can be significant, depending on the amount of data you are analyzing.
Decide if you want to start or skip genomic extraction.
Note: Genomic data extraction runs in the background and incurs compute costs. You can also skip and save your dataset without beginning the extraction process or incurring any compute costs.

If you choose to start genomic extraction

Click “Extract & Continue.”
Note: Genomic data extraction runs in the background and will notify you when the files are ready for analysis.
Select Python as your programming language.
Note: The genomic tools currently available in the Researcher Workbench do not support R or SAS as a programming language.
Select the notebook or create a new notebook.
If creating a new notebook, name your notebook.
Select your preferred analysis tool: Hail, PLINK, or other VCF-compatible tool.
Click “EXPORT” to launch the Jupyter Notebook environment.

To check the status of the genomic extraction

If you chose to run the extraction in the background while you were saving your dataset, you can check the status.

Click “.”
View the status, date started, cost, and duration of the genomic extraction.
Open the notebook you created under the “If you choose to start genomic extraction” steps.
Note: You do not need to create a new notebook or copy the file path anywhere, because this was already set up for you when you clicked “EXPORT” at the start of your extraction.
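Once the extracted files are ready, a quick sanity check on a VCF can confirm that the sample and record counts look plausible before you start a longer analysis in Hail or PLINK. The sketch below is a minimal, standard-library-only example and is not the code the Workbench generates. The function name `summarize_vcf` and the tiny example VCF are made up for illustration; the logic relies only on the standard VCF layout (meta lines starting with `##`, then a `#CHROM` header whose first nine columns are fixed).

```python
import io
from typing import Iterable, Tuple


def summarize_vcf(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of samples, number of variant records) for VCF text lines."""
    n_samples = 0
    n_records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            columns = line.split("\t")
            # The first 9 columns are fixed (CHROM through FORMAT);
            # any remaining columns are sample IDs.
            n_samples = max(len(columns) - 9, 0)
            continue
        n_records += 1  # every other non-empty line is one variant record
    return n_samples, n_records


# Tiny made-up VCF, used only to exercise the function.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1",
])

print(summarize_vcf(io.StringIO(EXAMPLE_VCF)))  # prints (2, 2)
```

For a compressed `.vcf.gz` file, the lines could instead come from `gzip.open(path, "rt")` in the Python standard library, where `path` stands for wherever your extracted file is located. For real analyses on large callsets, the dedicated tools named in this article remain the better choice.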
