Protocol: GWAS and the PheGenI Database

Premise of Report 2 work

Complex disorders are quantitative traits. Quantitative traits are impacted by many genes, each with small effect (contribution).

How to study genetics of complex phenotypes.

We need techniques to identify variation in these genes that contributes to the disorder. GWAS studies can do this. “A genome-wide association study (GWAS) is an approach used in genetics research to associate specific genetic variations with particular diseases” (genome.gov). GWAS studies correlate genetic variation (presence/absence of a SNP within or near genes) with differences in phenotype. For example, individuals with cancer and individuals without cancer are scored for many SNPs. Statistical tests are then applied to see if one or more SNP variants are associated with having the condition (e.g., SNP rs81002885 in BRCA2, associated with increased risk of breast cancer, ClinVar). Hundreds of these kinds of studies have been done, and the results are shared in various databases at NCBI.
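To make the idea of a “statistical test” concrete, here is a minimal Python sketch of one common approach: a chi-square test on allele counts in cases versus controls for a single SNP. The counts are made up for illustration, and the function name allele_association is my own. Real GWAS software also corrects for the very large number of SNPs tested, which this sketch does not do.

```python
import math

def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of
    allele counts. Assumes none of the four counts is zero."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio

# Made-up allele counts: 100 alleles scored in cases, 100 in controls
chi2, p_value, odds_ratio = allele_association(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, odds ratio = {odds_ratio:.2f}")
```

A GWAS repeats a test like this for every SNP scored, which is why a small p-value for one SNP is not convincing until the number of tests is taken into account.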

Use of PheGenI database

The PheGenI database links to these GWAS studies and lets you search (query) for SNPs correlated with various disorders. This BI308L page outlines how to use the database.

Screenshot of PheGenI homepage

PheGenI_01.png

Protocol

1. Choose a phenotype that you are interested in. Hundreds of phenotypes are available, listed from A (1-Alkyl-2-acetylglycerophosphocholine esterase) to Z (ziprasidone). You may choose any phenotype that interests you, but I strongly recommend that you confirm that there is documentation about the phenotype (e.g., check for a good Wikipedia page).

To select a phenotype, find the “Phenotype selection” box and click on the “Browse” button. This will bring up a popup selection box.

phegeni02.png

2. Scroll through the Available traits or select a category (e.g., neoplasms), then select a trait (e.g., Wilms tumor, for which I confirmed that a decent Wikipedia page is available). Click the “Apply” button to proceed.

screenshot phegenI select phenotype

3. Return to the homepage for PheGenI. You should see that your phenotype is selected (screenshot, red arrow). Click the “Search” button (screenshot, blue arrow) to start searching for associated SNPs.

screenshot phegenI, phenotype selected

4. Screenshot of results. From this screen you will find the number of associations (i.e., the number of SNPs correlated with the disorder); in this example, the total number of associations is 7.

screenshot phegeni results

5. Choose an SNP. Pay attention to the “context” of the SNP you choose (green arrow), that is, where the SNP is located in relation to a gene. For our work I want you to select an SNP that is found within a gene. In this example, 4 out of 7 SNPs are located in introns. The others are outside of our definition of a gene (intergenic, neargene). What other contexts may you see? Here’s a list of context terms you may find (source: https://gvs.gs.washington.edu/GVSBatch138/HelpSNPSummary.jsp)

Function: If the SNP has been given a function by dbSNP, that classification is used and “(dbSNP)” is added to the text:

stop-gained or stop-lost (within an exon and translated, non-stop codon changed to stop codon or stop codon changed to non-stop codon)
frameshift-variant (within an exon and translated, insertion or deletion interrupts the reading frame)
cds-indel (within an exon and translated, insertion or deletion keeps the reading frame)
missense (within an exon and translated, protein amino acid change, but not nonsense or frameshift)
splice-donor-variant (two locations at the 5′ end of an intron)
splice-acceptor-variant (two locations at the 3′ end of an intron)
synonymous-codon (within an exon and translated, no protein amino acid change)
utr-variant-5-prime (within an exon, but not translated, 5′ end of the gene)
utr-variant-3-prime (within an exon, but not translated, 3′ end of the gene)
upstream-variant-2KB (upstream of the gene)
downstream-variant-500B (downstream of the gene)
intron-variant (between exons)
nc-transcript-variant (transcript variant of a non-coding RNA gene)

If the SNP has not been given a function by dbSNP, the SNP is classified according to the location of the gene and its transcription and coding boundaries, and “(GVS)” is added to the text:

stop-gained or stop-lost (within an exon and translated, codon change to or from a stop codon)
coding-indel (within an exon and translated, variation is an indel, and no attempt is made to choose frameshift or not)
missense (within an exon and translated, protein amino acid change)
splice-5 or splice-3 (in first two bases or last two bases of an intron)
coding-synonymous (within an exon and translated, no protein amino acid change)
coding-notMod3 (within an exon and translated, number of coding bases is not a multiple of 3, and no attempt is made to rate as synonymous or not)
coding-monomorphic (within an exon and translated, all genotypes in the database are the same, and no attempt is made to rate as synonymous or not)
utr-5 or utr-3 (within an exon, but not translated)
near-gene-5 or near-gene-3 (within 2000 bases of an exon, upstream or downstream of a gene)
intron (between exons)
intergenic (between genes)
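If you copy the contexts from your results table into your notebook, the sorting into “within a gene” and “outside a gene” can be sketched in Python. This is only an illustration under stated assumptions: the rs identifiers are placeholders, not real SNPs; the function names are made up; and treating nc-transcript-variant as “within a gene” is my assumption. The spelling of context terms shown in PheGenI may differ slightly from the list above, so the sketch ignores case, hyphens, and the “(dbSNP)” or “(GVS)” tags.

```python
import re

def normalize(term):
    """Lowercase a context term and drop tags like (dbSNP) and punctuation."""
    term = re.sub(r"\(.*?\)", "", term)
    return re.sub(r"[^a-z0-9]", "", term.lower())

# Terms from the list above that fall inside a gene (exons, introns, UTRs)
WITHIN_GENE = {normalize(t) for t in [
    "stop-gained", "stop-lost", "frameshift-variant", "cds-indel",
    "missense", "splice-donor-variant", "splice-acceptor-variant",
    "synonymous-codon", "utr-variant-5-prime", "utr-variant-3-prime",
    "intron-variant", "nc-transcript-variant",  # assumption: counts as within
    "coding-indel", "splice-5", "splice-3", "coding-synonymous",
    "coding-notMod3", "coding-monomorphic", "utr-5", "utr-3", "intron",
]}

# Terms that fall outside our definition of a gene
OUTSIDE_GENE = {normalize(t) for t in [
    "upstream-variant-2KB", "downstream-variant-500B",
    "near-gene-5", "near-gene-3", "neargene", "intergenic",
]}

def classify_context(context):
    """Return 'within gene', 'outside gene', or 'unknown' for a context term."""
    key = normalize(context)
    if key in WITHIN_GENE:
        return "within gene"
    if key in OUTSIDE_GENE:
        return "outside gene"
    return "unknown"

# Placeholder results table: (SNP identifier, context); not real data
results = [
    ("rs_placeholder_1", "intron"),
    ("rs_placeholder_2", "intron"),
    ("rs_placeholder_3", "intergenic"),
    ("rs_placeholder_4", "intron"),
    ("rs_placeholder_5", "nearGene-5"),
    ("rs_placeholder_6", "intron"),
    ("rs_placeholder_7", "intergenic"),
]

within = [rs for rs, ctx in results if classify_context(ctx) == "within gene"]
print(len(within), "of", len(results), "SNPs are within a gene")
```

Any term that comes back as “unknown” is worth looking up by hand before you decide whether the SNP meets our definition.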

6. You should now have all of the information you need to submit your work. Again, make sure you have documented your work in your notebook.

Submit results in three places

Submit PheGenI results: Phenotype of interest

Submit PheGenI results: SNP associations

Submit PheGenI results: Gene of interest