The All by All tables were created by leveraging the extensive genomic and phenotypic data available through the All of Us Researcher Workbench.
Using the available short-read whole genome sequence and phenotypic data, genome-wide association studies (GWAS) and rare variant association studies (RVAS) were conducted across a wide range of complex human traits. GWAS and RVAS are used to identify genetic variants or genes that are associated with phenotypes, such as height, or with health conditions, such as diabetes.
Overview
The All by All tables, which are available to Controlled Tier users, include the GWAS and RVAS results for thousands of phenotypes from the ~250,000 participants who have whole genome sequence data available.
Researcher Workbench users do not need prior experience conducting GWAS to benefit from the All by All tables; they can use the data directly to explore genes or genetic variants that contribute to their phenotype of interest. Users also save time and cost, since they can query the results directly rather than performing the analysis themselves.
All by All is a large database of genetic association results spanning diverse ancestries and a wide range of important human traits and diseases. While similar analyses have been performed using other genomic databases, the All of Us Research Program has specific advantages.
The All of Us Research Program enrolls and collects data from participants of diverse ancestries, including groups that have previously been underrepresented in biomedical research, which will allow novel health discoveries in those groups. Additionally, the large sample size of the All of Us dataset provides the statistical power needed to identify associations involving rare genetic variants.
Data Details and Quality Control
All by All uses the genomic and phenotypic data of the nearly 250,000 participants who have short-read whole genome sequencing data available in the All of Us dataset.
All by All includes the results of association testing for ~3,400 phenotypes in six categories: physical measurements, lab measurements, phecodes, phecodeX, personal and family health history (PFHH), and electronic health record (EHR) sourced drugs and medications.
Sample and variant quality control was performed so that only high-quality samples, genotypes, and variants were included in downstream analyses. To ensure sufficient statistical power for downstream analyses, only phenotypes with more than 200 cases within each genetic ancestry group were included.
In total, 8,895 high-quality ancestry group and phenotype pairs were included in the final data. Find more information under Genetic Ancestry.
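The case-count rule described above can be sketched in plain Python. This is only an illustration of the stated threshold (more than 200 cases per ancestry group and phenotype pair); the phenotype names and counts below are hypothetical placeholders, not All of Us data, and the sketch does not reproduce the actual quality-control pipeline.

```python
# Threshold stated in the text: strictly more than 200 cases.
MIN_CASES = 200

# Hypothetical (ancestry, phenotype) -> case count values, for illustration only.
case_counts = {
    ("EUR", "pheno_A"): 5200,
    ("AFR", "pheno_A"): 1300,
    ("MID", "pheno_A"): 150,
    ("EAS", "pheno_B"): 200,
    ("SAS", "pheno_B"): 201,
}


def eligible_pairs(counts, min_cases=MIN_CASES):
    """Return the (ancestry, phenotype) pairs with more than `min_cases` cases."""
    return sorted(pair for pair, n in counts.items() if n > min_cases)


kept = eligible_pairs(case_counts)
print(kept)
```

Note that a pair with exactly 200 cases is excluded under this reading, because the rule requires more than 200.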
For each of the 8,895 phenotype-ancestry combinations (Figure 1), three types of association tests were performed:
- GWAS results for common variants from the Allele Count / Allele Frequency (ACAF) callset
- GWAS results for exonic variants from the exome callset
- RVAS gene-level results for variants from the exome callset
Figure 1 | Phenotypes included in the All by All tables.
The results of each association test by ancestry or meta-analysis are available as Hail Matrix Tables. Find more information under Data Format. The pipeline is available at https://github.com/atgu/aou_gwas.
Genetic Ancestry
The genetic ancestry categories correspond to definitions used within gnomAD, the Human Genome Diversity Project, and 1000 Genomes. Read about how All of Us computes genetic ancestry.
To help researchers understand the genetic basis of human traits and health conditions across diverse backgrounds, All by All includes data from participants in six major ancestry groups: AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), MID (Middle Eastern), and SAS (South Asian).
Data Format
The All by All data is available as 21 Hail Matrix Tables with the summary statistics of ACAF, exome, and gene-based association testing by individual ancestry or following meta-analysis (Figure 2). In addition, the results of association testing by phenotype are available as 36,936 individual Hail Tables (Figure 2).
Figure 2 | Total number of association test results produced by All by All.
The result Hail Matrix Tables all share the same key structure (Figure 3).
- For gene-based results, the rows are the gene groups keyed by gene_id, gene_symbol, annotation, and max_MAF.
- For single-variant results (ACAF & exome), the rows are keyed by locus and alleles.
- The columns are the phenotype meta information fields keyed by phenoname and include participant case and control counts.
- The entries are the association summary statistics for each phenotype-genotype combination, including Pvalue and Beta.
Figure 3 | All by All results are available as Hail Matrix Tables. Note that the table in Figure 3 is only an example: the full results table is too large to present as a figure, so the actual fields in the data will differ.
The individual phenotype Hail Tables all have the same structure (Figure 4).
- For gene-based results, the rows are the gene groups keyed by gene_id, gene_symbol, annotation, and max_MAF.
- For single-variant results (ACAF & exome), the rows are keyed by locus and alleles.
- The global fields contain case and control counts and lambda GC (the genomic inflation factor) for the phenotype.
Figure 4 | All by All results are available as per-phenotype Hail Tables.
The All by All files are available for Controlled Tier users in the Researcher Workbench. The files are located in gs://fc-aou-datasets-controlled/AllxAll/v1/ with the naming scheme below:
- Hail Matrix Tables: /mt/${POP}_${TYPE}_results.mt
- Hail Tables: /ht/${TYPE}/${POP}/phenotype_${PHENONAME}_${TYPE}_results.ht
- POP specifies either single ancestry results or meta-analysis results (AFR, AMR, EAS, EUR, MID, SAS, META).
- TYPE specifies the type of association test (ACAF, exome, gene).
- PHENONAME specifies the phenotype of interest by concept ID.
To see a full list of phenotypes used in the All by All tables, please see Appendix A.
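The naming scheme above can be turned into paths with a small helper. The sketch below is a minimal illustration: the function names (`matrix_table_path`, `phenotype_table_path`) are hypothetical and not part of any All of Us or Hail API, while the base location, POP values, and TYPE values are taken from the scheme described in this article.

```python
BASE = "gs://fc-aou-datasets-controlled/AllxAll/v1"
POPS = ("AFR", "AMR", "EAS", "EUR", "MID", "SAS", "META")
TYPES = ("ACAF", "exome", "gene")


def _check(pop, test_type):
    """Reject POP or TYPE values that are not part of the naming scheme."""
    if pop not in POPS:
        raise ValueError(f"Unknown POP {pop!r}; expected one of {POPS}")
    if test_type not in TYPES:
        raise ValueError(f"Unknown TYPE {test_type!r}; expected one of {TYPES}")


def matrix_table_path(pop, test_type):
    """Path of the Hail Matrix Table for one POP and TYPE."""
    _check(pop, test_type)
    return f"{BASE}/mt/{pop}_{test_type}_results.mt"


def phenotype_table_path(pop, test_type, phenoname):
    """Path of the per-phenotype Hail Table for one POP, TYPE, and PHENONAME."""
    _check(pop, test_type)
    return f"{BASE}/ht/{test_type}/{pop}/phenotype_{phenoname}_{test_type}_results.ht"


# Seven POP values times three TYPE values gives the 21 Matrix Tables.
all_mt_paths = [matrix_table_path(p, t) for p in POPS for t in TYPES]
print(len(all_mt_paths))
print(phenotype_table_path("META", "ACAF", "401"))
```

Validating POP and TYPE before building the string catches a mistyped ancestry label early, rather than at read time as a missing-file error.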
How to query All by All tables
The All by All data is available as Hail Tables and Hail Matrix Tables. To query the data effectively, researchers load it into a Jupyter Notebook and analyze it using Python.
Here are some examples of how to query the All by All tables.
Full association test results (Hail Matrix Table):
The All by All result Hail Matrix Tables are located in gs://fc-aou-datasets-controlled/AllxAll/v1/mt/ with the naming scheme ${POP}_${TYPE}_results.mt.
POP specifies either single ancestry results or meta-analysis results (AFR, AMR, EAS, EUR, MID, SAS, META).
TYPE specifies the type of association test (ACAF, exome, gene).
# Hail is the Python library used to read the All by All tables:
import hail as hl
# The results table used for the example will be the GWAS results for ACAF variants for SAS genetic ancestry:
mt = hl.read_matrix_table("gs://fc-aou-datasets-controlled/AllxAll/v1/mt/SAS_ACAF_results.mt")
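Once a Matrix Table is loaded, a common next step is to narrow it to one phenotype and keep only the strongest associations. The sketch below is one possible way to do this, under stated assumptions: it relies on the column key phenoname and the entry field Pvalue described under Data Format, the helper name `significant_entries` is hypothetical, and the 5e-8 cutoff is a conventional genome-wide threshold chosen here for illustration rather than one prescribed by All by All.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # Hail is needed only for the type hints in this sketch
    import hail as hl

# Conventional genome-wide significance threshold (an assumption of this sketch).
GENOME_WIDE_P = 5e-8


def significant_entries(mt: "hl.MatrixTable", phenoname: str,
                        p_threshold: float = GENOME_WIDE_P) -> "hl.Table":
    """Return the entries for one phenotype whose Pvalue is below the threshold."""
    # Columns are keyed by phenoname, so this keeps a single phenotype column.
    one_pheno = mt.filter_cols(mt.phenoname == phenoname)
    # entries() flattens the matrix into a Table with one row per
    # (row key, phenotype) pair, carrying entry fields such as Pvalue and Beta.
    entries = one_pheno.entries()
    return entries.filter(entries.Pvalue < p_threshold)
```

In a notebook, with `mt` loaded as shown above, `significant_entries(mt, "401").show(5)` would display the first few matching rows; the phenoname value "401" is used here only as a placeholder. Calling `mt.describe()` first is a reasonable way to confirm the actual field names before filtering.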
Individual phenotypes (Hail Table):
The All by All result Hail Tables for individual phenotypes are located in gs://fc-aou-datasets-controlled/AllxAll/v1/ht/ with the naming scheme ${TYPE}/${POP}/phenotype_${PHENONAME}_${TYPE}_results.ht.
POP specifies either single ancestry results or meta-analysis results (AFR, AMR, EAS, EUR, MID, SAS, META).
TYPE specifies the type of association test (ACAF, exome, gene).
PHENONAME specifies the phenotype of interest by concept ID.
# The results table used for the example will be the meta-analysis GWAS results for ACAF variants for "essential hypertension":
ht = hl.read_table("gs://fc-aou-datasets-controlled/AllxAll/v1/ht/ACAF/META/phenotype_401_ACAF_results.ht")
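For a per-phenotype Hail Table, a typical first query is to list the top associations. The following is a minimal sketch, assuming the table carries a Pvalue field like the Matrix Table entries described under Data Format; the helper name `top_hits` and the default 5e-8 threshold are choices made for this illustration, not part of the All by All release.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # Hail is needed only for the type hints in this sketch
    import hail as hl


def top_hits(ht: "hl.Table", n: int = 10, p_threshold: float = 5e-8) -> "hl.Table":
    """Return up to `n` rows with Pvalue below the threshold, smallest first."""
    # Keep only rows that pass the significance threshold.
    hits = ht.filter(ht.Pvalue < p_threshold)
    # Sort ascending by p-value and keep the first n rows.
    return hits.order_by(hits.Pvalue).head(n)
```

With `ht` loaded as shown above, `top_hits(ht).show()` would print the strongest associations, and `hl.eval(ht.globals)` returns the global fields, which per the Data Format section hold the case and control counts and lambda GC for the phenotype.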
Appendix A - All by All Phenotypes Index
The All by All Phenotype Featured Workspace Index can be found here. This index lists the phenotypes included in All by All, which are available in 6 Featured Workspaces grouped by phenotype category (physical measurements, drugs, labs, personal and family health history, Phecode, and PhecodeX). In each workspace, a graphical summary of each phenotype is available as its own notebook. For more details about how to use the Featured Workspaces, please see the ReadMe notebook for each Featured Workspace.
The All by All phenotypes Google Sheet can be found here. For a downloadable Excel sheet, please see the link at the bottom of this article.