To access the short-read whole genome sequencing (srWGS) data, you can use the point-and-click Genomic Extraction tool.
The Genomic Extraction tool extracts variant data from the genomic dataset and saves it as Variant Call Format (VCF) callset files for exporting to a Jupyter Notebook environment, where you can perform analysis using Hail, PLINK, etc.
Note: Use the Genomic Extraction tool only when you want to analyze whole genome sequencing (WGS) data for a smaller subset of participants (fewer than 5,000). This process does not work for array data.
For larger cohorts, you will need to pull the genomic data directly into Jupyter Notebook using files in the Controlled CDR Directory.
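The rule of thumb above can be written down as a small check. This is only an illustrative sketch: the function name, the data-type labels, and the returned strings are hypothetical and are not part of the Researcher Workbench. Only the 5,000-participant threshold and the array-data exclusion come from the notes above.

```python
# Hypothetical helper: encodes the guidance in this article, nothing more.
# The threshold comes from the note above; all names are illustrative.
MAX_EXTRACTION_COHORT = 5000


def choose_access_route(n_participants: int, data_type: str = "srWGS") -> str:
    """Suggest how to access genomic data for a cohort of a given size."""
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if data_type != "srWGS":
        # The Genomic Extraction tool does not work for array data.
        return "unsupported_by_genomic_extraction"
    if n_participants < MAX_EXTRACTION_COHORT:
        return "genomic_extraction"
    # Larger cohorts: read the files in the Controlled CDR Directory directly.
    return "controlled_cdr_directory"


print(choose_access_route(1200))
print(choose_access_route(25000))
```

A cohort of exactly 5,000 participants is treated as a larger cohort here, because the note says "fewer than 5,000."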
To use the Genomic Extraction tool
- Follow the steps for creating a cohort with the Cohort Builder.
- Click “” to the right of “Datasets.”
- Select your cohort under the “Select Cohorts (Participants)” column on the left by clicking the checkbox.
- Select the “Short-read whole genome sequencing data” under the “Select Concept Sets (Rows)” column in the middle by clicking the checkbox.
- Select “VCF Files” under the “Select Values (Columns)” column on the right by clicking the checkbox.
- Click “Create Dataset.”
- Name your dataset and add a description.
- Click “SAVE.”
- Click “ANALYZE” in the bottom right to begin creating your Jupyter Notebook environment. A pop-up will appear, asking if you would like to run the extraction process.
Note: The extraction process uses cloud compute credits to generate the code and files from the genomic dataset. The cost can be significant, depending on the amount of data you are analyzing.
- Decide if you want to start or skip genomic extraction.
Note: Genomic data extraction runs in the background and incurs compute costs. You can also skip extraction and save your dataset without incurring any compute costs.
If you choose to start genomic extraction
- Click “Extract & Continue.”
Note: Genomic data extraction runs in the background, and you will be notified when the files are ready for analysis.
- Select Python as your programming language.
Note: The genomic tools currently available in the Researcher Workbench do not support R or SAS.
- Select the notebook or create a new notebook.
- If creating a new notebook, name your notebook.
- Select your preferred analysis tool: Hail, PLINK, or other VCF-compatible tool.
- Click “EXPORT” to launch the Jupyter Notebook environment.
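Once the notebook is open and the extracted VCF files are ready, a quick sanity check can help before you start a full analysis in Hail or PLINK. The sketch below uses only the Python standard library to list the sample IDs and count the variant records in a VCF. The function names and the toy VCF text are hypothetical illustrations; the real file locations are provided by the notebook that the export generates and are not shown here.

```python
import gzip
from typing import Iterable, List, Tuple


def summarize_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (sample_ids, number_of_variant_records) from VCF text lines."""
    samples: List[str] = []
    n_records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            # Columns after the 9 fixed VCF columns are sample IDs.
            samples = line.split("\t")[9:]
            continue
        n_records += 1
    return samples, n_records


def summarize_vcf_file(path: str) -> Tuple[List[str], int]:
    """Same summary for a plain or gzip/bgzip-compressed VCF on disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_vcf_lines(handle)


# Toy VCF content for illustration only (not real participant data).
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_a\tsample_b",
    "chr1\t10177\t.\tA\tAC\t.\tPASS\t.\tGT\t0/1\t0/0",
    "chr1\t10352\t.\tT\tTA\t.\tPASS\t.\tGT\t0/0\t1/1",
])

sample_ids, record_count = summarize_vcf_lines(EXAMPLE_VCF.splitlines())
print(sample_ids, record_count)
```

If the sample count does not match the size of the cohort you selected, that is a sign to recheck the dataset before spending compute credits on downstream analysis.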
To check the status of the genomic extraction
If you chose to run the extraction in the background while you were saving your dataset, you can check the status.
- Click “.”
- View the status, date started, cost, and duration of the genomic extraction.
- Open the notebook you created under the “If you choose to start genomic extraction” steps.
Note: You do not need to create a new notebook or copy the file path anywhere; both were set up for you when you clicked “EXPORT” at the start of your extraction.
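If you selected PLINK as your analysis tool, one common first step is converting a VCF to PLINK binary format. The sketch below only builds and prints the command so that you can review it before running it. It assumes a PLINK 1.9 executable named `plink` is available in the notebook environment, and both paths are hypothetical placeholders rather than the locations your notebook actually uses.

```python
import shlex
from typing import List


def build_plink_command(vcf_path: str, out_prefix: str) -> List[str]:
    """Build a PLINK 1.9 command that converts a VCF to .bed/.bim/.fam files."""
    return ["plink", "--vcf", vcf_path, "--make-bed", "--out", out_prefix]


# Hypothetical placeholder paths; substitute the paths shown in your notebook.
cmd = build_plink_command("extracted/example.vcf.gz", "plink_out/example")

# Print the command for review; nothing is executed here.
print(shlex.join(cmd))
```

Keeping the command as a list of arguments means it can later be passed to `subprocess.run(cmd, check=True)` without shell quoting issues.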