Identify phenotype and genotype of interest from PheGenI

About this page: Nine images on this page. Depending on your available internet speed, the page may take a while to load. If it appears that images are missing, click your browser’s “refresh” button.

Objective of PheGenI Brief:

To demonstrate an understanding of how Top → Down genetics hypotheses are tested.

To demonstrate use of PheGenI (Phenotype-Genotype Integrator) and related databases (dbSNP, Gene) to investigate known associations between variation in genes and other DNA elements and variation in phenotype.

See Protocol: GWAS and the PheGenI Database

What you turn in from this lab by start of Week 3

Overview of this lab work

These are our first steps toward completing Report 2: Bioinformatics project — Overview

You will select a(n)

  1. phenotype of interest (POI)
  2. associated SNP from a GWAS database by use of PheGenI query.
  3. gene of interest (GOI) also known as a candidate gene.

Background for Report 2

To evaluate variation in human sequences against its evolutionary context, we introduce the idea that some genes are more important than others. For example, some gene products, and therefore their genes, are essential to an organism’s development, survival, or reproduction; others are less so. Essential genes, like those operating during early development (HOX genes), are expected to vary less than nonessential genes. In other words, essential genes are expected to have lower rates of evolution than nonessential genes (Kimura & Ohta 1974). All genes are, after all, descendants of successful copies: permitted (or simply neutral) variation has already passed the filter of selection, whereas deleterious variation has been removed from the population. Moreover, genes not strongly selected against are expected to evolve predominantly by genetic drift, with genetic distances proportional to the amount of time since two species last shared a common ancestor. In other words, some proteins are expected to have evolved at a more or less constant rate: the longer the time, the greater the genetic distance. This is the essence of the Molecular Clock hypothesis. Human housekeeping genes, constitutively expressed genes essential for cell function, appear to have evolved at lower rates than tissue-specific genes (in comparisons with mouse; Zhang and Li 2004). By comparing homologous genes in different species, we expect to find different rates of evolution for essential genes compared to nonessential genes. The prediction of rate differences, while a simple and intuitive explanation, is not necessarily complete (see Zhang and Yang 2015). However, the objective is to get you thinking about how to evaluate genetic variation observed in humans against the evolutionary history in which that variation acts.

In Report 2 your class will collect sequence information on a set of species, including humans, and calculate evolutionary rates of change for sets of proteins from genes associated with phenotype differences in humans.
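As a rough illustration of the molecular clock idea described above, the sketch below computes the uncorrected proportion of differing sites (the p-distance) between two aligned protein sequences, then divides by twice the divergence time, because differences accumulate along both lineages. The two sequences and the 90-million-year divergence time are made-up placeholders, not data from this lab, and the function names are hypothetical. The distance measure actually used in Report 2 may differ (for example, a corrected distance).

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def substitution_rate(distance, divergence_time):
    """Rate per site per unit time along one lineage: r = K / (2T)."""
    if divergence_time <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2 * divergence_time)


# Hypothetical aligned fragments (placeholders, not real gene products).
species_1 = "MKTAYIAKQR"
species_2 = "MKTAYLAKQK"

distance = p_distance(species_1, species_2)
# Hypothetical divergence time, in millions of years.
rate = substitution_rate(distance, 90.0)
print(f"p-distance = {distance:.3f}; rate = {rate:.5f} per site per million years")
```

Under a strict clock, the same protein compared across species pairs with longer divergence times should show proportionally larger distances and a roughly constant rate.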

Background on genetic analysis of complex phenotypes

While many phenotypic characteristics vary discretely (blue eyes or brown eyes; diabetes, yes or no), other traits show quantitative differences among individuals. Classic examples include height and weight. For example, individuals who say they are “five feet eight inches” tall may differ by a few millimeters. In studying the genetic basis of differences in phenotypes, the possibilities range from classical Mendelian genetics, in which one or a few genes account for differences in phenotype (e.g., one gene with a handful of alleles that show high penetrance and therefore account for most of the differences in phenotype), to classical quantitative genetics, in which hundreds of genes each contribute to phenotype differences. This range is described as running from genes with major effects to genes with minor effects. Building on decades of genetic research, new technologies and statistical approaches allow researchers to look for associations between genetic differences across the genomes of individuals and phenotype differences. We call these studies Genome Wide Association Studies (GWAS) (Norrgard 2008, Visscher et al 2017). While there are a number of problems with GWAS approaches (see nice article in Wired), these studies can be useful to help identify candidate genes.

In short, GWAS looks at single nucleotide polymorphism (SNP) variation among individuals who differ for a particular phenotype (Norrgard and Schultz 2008). Genotype imputation, the statistical inference of unobserved genotypes from known haplotypes, allows a test of association between a phenotype (trait) of interest and an untyped genetic variant (Marchini & Howie 2010). Identifying such links between genetic variation and phenotypic variation, then, is used to reveal potential targets of intervention, either from the perspective of reducing risk (don’t forget, Environment!), or to inform therapy strategies.
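To make the idea of a statistical association concrete, here is a minimal sketch of a test at a single SNP: a chi-square test on a 2x2 table of allele counts in cases and controls. The counts are invented for illustration and the function name is hypothetical. Real GWAS analyses typically use regression models with covariates and correct for the very large number of SNPs tested, so treat this only as the basic intuition.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_counts and control_counts are (allele_1_count, allele_2_count).
    Returns (chi2, p_value).
    """
    observed = (tuple(case_counts), tuple(control_counts))
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a nonzero total")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: (T alleles, C alleles) in each group.
chi2, p_value = allelic_chi_square(case_counts=(30, 70), control_counts=(20, 80))
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```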

Work to do

It is to this kind of project that we now turn. Your task is to query a database to identify genes with SNP variants that have been statistically associated with variation in a particular phenotype. We will come up with the list of phenotypes in class.

When you’ve selected your phenotype, report the total number of associations and the total number of SNPs from the Search Summary page (Fig. 1).

Figure 1. Screenshot PheGenI Summary

For our example, Blood pressure, a total of 4662 associations, 17 genes, and 29 SNPs were reported. You are expected to spend some time here: record what you see, explore the links, and be able to explain what the results page offers. For a phenotype of interest like Toxoplasmosis, that’s easy: there’s only one association. But for Blood pressure, there are over 4600 associations to go through! Clearly, you need a better way to work with the results than going through them manually. PheGenI includes a Download link for the Associations table results. This places a tab-delimited text file, called PheGenI_Association.tab, into your Downloads folder. Import it into your spreadsheet program. You can run a Pivot table to get the information you need.
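If you prefer a script to a spreadsheet, the pivot-table step can be sketched in Python using only the standard library. The column names "Gene" and "SNP rs" are assumptions about the header row of the downloaded file, so check them against your own copy and change the arguments if they differ. The sample rows are invented placeholders, not PheGenI results.

```python
import csv
import io
from collections import defaultdict


def summarize_associations(handle, gene_col="Gene", snp_col="SNP rs"):
    """Count associations and distinct SNPs per gene in a tab-delimited table.

    gene_col and snp_col are assumed column names; adjust them to match
    the header of your downloaded file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    association_counts = defaultdict(int)
    snps_by_gene = defaultdict(set)
    for row in reader:
        gene = (row.get(gene_col) or "").strip()
        snp = (row.get(snp_col) or "").strip()
        if not gene:
            continue  # skip rows with no gene listed
        association_counts[gene] += 1
        if snp:
            snps_by_gene[gene].add(snp)
    return {
        gene: (association_counts[gene], len(snps_by_gene[gene]))
        for gene in association_counts
    }


# Invented placeholder rows, used only to demonstrate the function.
SAMPLE = "\n".join([
    "Trait\tSNP rs\tGene\tP-Value",
    "Blood Pressure\trs0000001\tGENE_A\t1e-8",
    "Blood Pressure\trs0000001\tGENE_A\t3e-9",
    "Blood Pressure\trs0000002\tGENE_A\t2e-7",
    "Blood Pressure\trs0000003\tGENE_B\t5e-8",
])

summary = summarize_associations(io.StringIO(SAMPLE))
for gene, (n_associations, n_snps) in sorted(summary.items()):
    print(f"{gene}: {n_associations} associations, {n_snps} distinct SNPs")
```

To run it on your download, open the file with `open("PheGenI_Association.tab", newline="")` and pass the handle to `summarize_associations` in place of the sample.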

When you’ve selected your SNP, gather information about the SNP, including its location (context) and whether anything is known about its clinical significance. You get this information from dbSNP. For our example, the SNP was rs1957757; select its link on the PheGenI results page (Fig. 2).

Figure 2. Screenshot PheGenI association results.

Results for rs1957757 are shown in Fig. 3.

Figure 3. Screenshot dbSNP

From the dbSNP results page we can see that nothing has been reported for clinical significance for this SNP (“Not reported in ClinVar”). We can also record information about the SNP. For example, the SNP is a change from T (the typical allele) to C. The frequency f(T) in the populations studied ranges from 0.11 to 0.28 (Fig. 3).
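The same dbSNP record can also be requested programmatically through NCBI's E-utilities. The sketch below only builds the ESummary request URL for an rs number; it does not contact NCBI, and it makes no claims about which fields the response contains. The helper name is hypothetical, and for this lab the dbSNP web page remains the expected source.

```python
from urllib.parse import urlencode

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def dbsnp_summary_url(rs_id):
    """Build an E-utilities ESummary URL for a dbSNP record (hypothetical helper)."""
    text = rs_id.strip().lower()
    if text.startswith("rs"):
        text = text[2:]  # the snp database is queried by the numeric part
    if not text.isdigit():
        raise ValueError(f"not a valid rs number: {rs_id!r}")
    query = urlencode({"db": "snp", "id": text, "retmode": "json"})
    return f"{EUTILS_ESUMMARY}?{query}"


print(dbsnp_summary_url("rs1957757"))
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return a JSON summary of the record. Compare what you find there with the web page in Fig. 3.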

Additionally, your task is to identify one gene, your gene of interest or GOI, and learn what you can about it:

  • sequence length
  • exons
  • notable protein domains
  • location in the genome
  • its function
  • other?

You will want to include a visit to Wikipedia for your gene, as discussed in class (also, see below for Wikipedia Pageviews). The official source of information about your gene is NCBI Gene. Click on the gene’s link in the PheGenI results (Fig. 2). Note that the amount of information in NCBI Gene for your gene is extensive. Spend some time moving your cursor around. For example, hover over the cursor

In following weeks you will expand these findings by additional bioinformatics approaches.

For more information about GWAS, start at the Wikipedia site, then check out Manolio (2010) and additional references listed at the end of this page.

Our Report 2 will take you from phenotype to sequence variation in genes of interest to an evolutionary perspective to evaluate whether or not the variation seems unique. After selecting phenotypes and genes of interest, we’ll look at population level variation in the context of functional DNA elements. We’ll complete our study by incorporating a phylogenetic perspective towards variation.

 

Lab Requirements — What is expected of you

Evidence that you did the work. Evidence that goes beyond simply the end product (slides from a PowerPoint). Therefore, I expect to find, in your notebooks, evidence! Make sure you cover the basics: the purpose of the lab, the hypotheses and how you tested them, what kind of data you collected, and how you analyzed or interpreted the data against the hypotheses. Include a summary of your findings. Questions from this worksheet will be available in online quiz form for you. Answer the questions and make sure you document your sources of information.

Resources to do this lab

1. Get Phenotype of interest (POI), SNP, and Gene of interest (GOI) from https://www.ncbi.nlm.nih.gov/gap/PheGenI

Brief protocol: Start typing your selected phenotype (or Browse, scroll to select a phenotype) into the “Phenotype Selection” box. As you type, a drop-down list should show up; select the term that most closely represents your phenotype. After the phenotype has been entered, press the “Search” button. A list of results will follow. In addition to meeting the requirements of the lab, you are encouraged to explore the options on the query page of PheGenI and the many links from the results pages. For example, what happens when you click on the “Browse” button instead of the “Search” button on the first page of PheGenI? In your notes, simply identify the available sections, describe what you see, and follow some of the links. When you scroll down the results page, what are the major sections? For a more detailed protocol, please see Protocol: GWAS and the PheGenI Database.

2. For your citation numbers, report for your POI the HuGE disease index and Wikipedia page views, and for your GOI the HuGE results (Filtered by GENE) and Wikipedia page views.
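Wikipedia page views can also be retrieved from the Wikimedia pageviews REST API rather than read off a web tool. The sketch below only constructs the request URL for one article. The article title and date range are placeholders, the helper name is hypothetical, and it is an assumption that this route matches the page-view source discussed in class.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article_title, start, end,
                  project="en.wikipedia", granularity="monthly"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    for stamp in (start, end):
        if len(stamp) != 8 or not stamp.isdigit():
            raise ValueError(f"expected a YYYYMMDD date, got {stamp!r}")
    # Article titles use underscores for spaces and must be URL-encoded.
    title = quote(article_title.strip().replace(" ", "_"), safe="")
    parts = [PAGEVIEWS_BASE, project, "all-access", "all-agents",
             title, granularity, start, end]
    return "/".join(parts)


# Placeholder article and date range, for illustration only.
print(pageviews_url("Blood pressure", "20240101", "20241231"))
```

If you fetch the URL from a script, send a descriptive User-Agent header with the request. Record the date range you used in your notebook so the counts can be reproduced.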

Submit results

After completing work for your

  • Phenotype of interest (POI), plus HuGE Literature Finder and Wikipedia page view results
  • SNP and SNP associations
  • Gene of interest (GOI), plus HuGE Literature Finder and Wikipedia page view results

submit your work at the appropriate link:

Submit PheGenI results: Phenotype of interest

Submit PheGenI results: SNP associations

Submit PheGenI results: Gene of interest

References

Kimura, M., & Ohta, T. (1974). On some principles governing molecular evolution. Proceedings of the National Academy of Sciences, 71(7), 2848-2852.

Manolio, T. A. (2010). Genomewide association studies and assessment of the risk of disease. New England Journal of Medicine, 363(2), 166-176. DOI: 10.1056/NEJMra0905980

Marchini, J., & Howie, B. (2010). Genotype imputation for genome-wide association studies. Nature Reviews Genetics, 11(7), 499.

Norrgard, K. (2008). Genetic variation and disease: GWAS. Nature Education, 1(1), 87.

Norrgard, K., & Schultz, J. (2008). Using SNP data to examine human phenotypic differences. Nature Education, 1(1), 85.

Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., & Yang, J. (2017). 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1), 5-22. DOI: 10.1016/j.ajhg.2017.06.005

Zhang, L., & Li, W. H. (2004). Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Molecular Biology and Evolution, 21(2), 236-239. https://doi.org/10.1093/molbev/msh010

Zhang, J., & Yang, J. R. (2015). Determinants of the rate of protein sequence evolution. Nature Reviews Genetics, 16(7), 409-420. DOI: 10.1038/nrg3950
