Make a phylogenetic tree

 

to add: itol, phyloT, timetree.org, phylopic.org, R, ggtree, ggimage

What to do

  1. Use NCBI TreeViewer to create a phylogeny for your 13 plus 1 species (Table 1)
  2. Save your tree as an image
  3. Save your tree in Newick form
  4. Add your 13 plus 1 tree (Newick form) to your UGENE project

Table 1. The fourteen species (13 plus 1) we have are

Alligator
Cat
Cattle
Chicken
Chimpanzee
Dog
Human
Macaque
Mouse
Lizard*
Opossum
Pig
Rabbit
Rat

where * is a reminder that the lizard in our data set is Anolis carolinensis (other lizard species are possible).

Table 2. Species names for our 14 taxa

Alligator mississippiensis, Felis catus, Bos taurus, Gallus gallus, Pan troglodytes, Canis lupus, Homo sapiens, Macaca mulatta, Mus musculus, Anolis carolinensis, Didelphis virginiana, Sus scrofa, Oryctolagus cuniculus, Rattus norvegicus

 

I have a list of species, how do I get a phylogeny?

We will collect sequences of proteins for different species, then create a gene tree in UGENE (Lab 3: Thirteen + 1 species). However, gene trees (i.e., a “phylogeny” of species based on variation at just one gene) may differ from the true phylogeny for a number of reasons (Swenson 2012), including

  • genetic polymorphism (i.e., sequence in the database does not capture variation in the species)
  • the homologous genes are not actually orthologs (genes that diverged through speciation), but rather are paralogs (genes that diverged through gene duplication)
  • rates of evolution may differ among the lineages

Reconciling gene trees with the “true” phylogeny would allow us to study which of the three conditions listed above may be at play. This is of interest to us in determining which of the genes of interest (GOI) that the class gathered from the GWAS database are likely to be functionally important causes of the phenotype differences.

To make a phylogeny, we would want to incorporate all available information, not just from a single gene. While in the vast majority of cases we can’t know what the true phylogeny is, we can construct a consensus tree. Consensus trees display the tree supported by all/most of the evidence.

The purpose of this exercise is for you to take advantage of the resources available by calling species from iTOL via phyloT or other interfaces.

Note: An alternative site for making a consensus tree is timetree.org, although you must upload a text file with your list of species names (scientific names, not common names), so it is a little less straightforward than the NCBI Taxonomy Browser.

NCBI Taxonomy can be used for constructing a consensus tree. Once at the NCBI taxonomy browser site, you have two choices:

  • Add each species one at a time or
  • Provide a text file with all of the species listed, one species per line.

Given that you have just 14 species to work with (listed for your convenience in Table 1 and Table 2), adding one at a time might be better to start with, because there can be differences between what we call a species and what the Taxonomy Browser has in its records. (Note: I have verified that all of our species are available in the Taxonomy Browser, but as a hint, you may have to play around with what they are called in order to complete the exercise!)
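
If you go the text-file route, the list can be written from R rather than typed by hand. The following is a minimal sketch in base R, assuming the Table 2 names are what you want to submit. The object names spp14 and out_file are only for illustration, and the file is written to a temporary folder so the example runs anywhere.

```r
# Scientific names from Table 2, one element per species
spp14 <- c("Alligator mississippiensis", "Felis catus", "Bos taurus",
           "Gallus gallus", "Pan troglodytes", "Canis lupus",
           "Homo sapiens", "Macaca mulatta", "Mus musculus",
           "Anolis carolinensis", "Didelphis virginiana", "Sus scrofa",
           "Oryctolagus cuniculus", "Rattus norvegicus")

# Write the names to a plain text file, one species per line.
# A temporary folder is used here; swap in your own working folder.
out_file <- file.path(tempdir(), "spp14names.txt")
writeLines(spp14, out_file)

# Read the file back to confirm the number of lines and check for duplicates
check <- readLines(out_file)
length(check)
anyDuplicated(check)
```

If a name is rejected by the site you upload to, edit that one element of spp14 and write the file again.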

SOP R and Open Tree of Life

[draft, needs updating]

The following R code takes a list of taxa, queries the Open Tree of Life through the rotl package, and retrieves a phylogenetic tree. The code assumes that R is installed locally, along with the ape and rotl packages. The retrieved tree is mid-rooted but has no branch lengths, so we use Grafen’s method (Grafen 1989) to provide arbitrary, scaled branch lengths. (For more about this function, call the help page in R by typing ?compute.brlen at the prompt.) The easycsv package is not required, but it provides an OS-independent choose-directory function.

#modified from https://cran.r-project.org/web/packages/rotl/vignettes/rotl.html

library(ape)
library(rotl)
library(easycsv)

#Check working directory
#getwd()
#setwd(easycsv::choose_dir())
myTaxa <- c("Alligator mississippiensis", "Bos taurus", "Canis lupus", "Didelphis marsupialis", "Felis catus", "Gallus gallus", "Homo sapiens", "Lepus timidus", "Macaca sylvanus", "Mus musculus", "Pan troglodytes", "Rattus rattus", "Sus scrofa", "Zootoca vivipara")
resolved_names <- tnrs_match_names(myTaxa)
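my_tree <- tol_induced_subtree(ott_ids = resolved_names$ott_id)
plot(my_tree)

#replace tip labels with common names; new_tiplabels is in the same order as myTaxa
new_tiplabels <- c("Alligator", "Cattle", "Dog", "Opossum", "Cat", "Chicken", "Human", "Rabbit", "Macaque", "Mouse", "Chimpanzee", "Rat", "Pig", "Lizard")
#tips are ordered by the tree, not by myTaxa, so match labels by name instead of position
names(new_tiplabels) <- resolved_names$unique_name
my_tree$tip.label <- unname(new_tiplabels[strip_ott_ids(my_tree$tip.label, remove_underscores = TRUE)])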
plot(my_tree)

#set branch lengths
my_tree <- compute.brlen(my_tree, method = "Grafen", power = 1)
plot(my_tree)

#print tree code to view
write.tree(my_tree, file="")

#save tree to file
write.tree(my_tree, file = "my_tree.nwk")

Original tree from Open Tree of Life

Figure 3. Tree retrieved from the Open Tree of Life by R package rotl.

Updated tree with common names

Figure 4. Open Tree of Life tree after tip name and branch (edge) length updates.

 

SOP Timetree

Our consensus phylogeny was generated from timetree.org, with the added bonus that calibrated divergence times are included. That is, instead of representing expected change, the branch lengths correspond to time in millions of years.

To make a tree, create and save a text file (e.g., with Notepad or TextEdit) containing a list of species, one species per line (e.g., from Table 2). Save the file as “timetree_14spp_list.txt” (without the quotes). Next, go to timetree.org; a portion of the homepage is shown in Figure 5.

Screenshot timetree.org homepage

Figure 5. Screenshot of portion of timetree.org (May 2022).

Scroll down the homepage to find Load a List of Species (Fig. 6).

Screenshot timetree.org

Figure 6. Screenshot from timetree.org

Click the Choose File button and select the text file you created with the list of species. Click the Upload button once the file is selected. Figure 7 shows a portion of the results page.

Screenshot timetree.org results

Figure 7. Screenshot results from timetree.org

We want the Newick file, the code for the tree shown in Figure 7. Scroll down the results page, locate Export Tree, and click on To Newick File.

Screenshot timetree.org

Figure 8. Screenshot timetree.org, Export Tree.

Timetree.org will save the Newick file through your operating system’s file dialog, so that you can save the file to a location on your computer (e.g., your working folder). The default name will be the same as the text file you used, plus the .nwk file extension (e.g., timetree_14spp_list.nwk).

Once we have the Newick code in hand, we can use any tree viewer app. The phylogenetic tree shown in Fig. 9A (with its Newick string in Fig. 9B) was drawn in R with the ape package.

Consensus phylogeny from timetree.org

Figure 9A. Consensus phylogeny for our 14 species, compiled from data available at timetree.org. Branch lengths not displayed. Newick text in Figure 9B. 

((Lizard:279.65697667,(Chicken:236.50266286,Alligator:236.50266286):43.15431381):32.24694470,(Opossum,(((Cat:54.32144118,Dog:54.32144118):23.43351523,(Cattle:61.96598852,Pig:61.96598852):15.78896789):18.70743276,((Rabbit:82.14079889,(Rat:20.88741740,Mouse:20.88741740):61.25338149):7.68238853,(Macaque:29.44154682,(Chimpanzee:6.65090500,Human:6.65090500):22.79064182):60.38164060):6.63920175):62.13519841):153.30633379);

Figure 9B. Newick code for tree in Figure 9A. 
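
Because the branch lengths are divergence times, every living species should sit at the same distance from the root. The following is a small arithmetic check in base R, offered as a sketch. The branch lengths are copied by hand from Figure 9B, and the object names (human, mouse, lizard, root_to_tip, paths_agree) are only for illustration.

```r
# Branch lengths (millions of years) copied from Figure 9B,
# listed from each tip back toward the root
human  <- c(6.65090500, 22.79064182, 60.38164060, 6.63920175,
            62.13519841, 153.30633379)
mouse  <- c(20.88741740, 61.25338149, 7.68238853, 6.63920175,
            62.13519841, 153.30633379)
lizard <- c(279.65697667, 32.24694470)

# Total path length from the root to each tip
root_to_tip <- c(Human = sum(human), Mouse = sum(mouse), Lizard = sum(lizard))
print(root_to_tip)

# TRUE if the longest and shortest paths agree within floating-point tolerance
paths_agree <- isTRUE(all.equal(max(root_to_tip), min(root_to_tip)))
print(paths_agree)
```

All three sums come to about 311.9 million years, which is what we expect from a time-calibrated tree. Gene-tree branch lengths, in contrast, need not add up to the same total for every tip.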

Update Spring 2022, changed Cow to Cattle in the Newick file, but did not update the figures.

The branch lengths in the consensus tree (Fig. 9A and 9B) are divergence times, in millions of years ago (mya), and are not comparable to the gene tree branch lengths, which are genetic distances. You’ll need to remove the branch lengths before trying the reconciliation steps; this was done for you (Fig. 6).

((Lizard,(Chicken,Alligator)),(Opossum,(((Cat,Dog),(Cattle,Pig)),((Rabbit,(Rat,Mouse)),(Macaque,(Chimpanzee,Human))))));

Figure 6. Newick code for consensus tree without divergence time branch lengths
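
Removing branch lengths by hand is error-prone, so here is a minimal sketch of how it could be done in base R with a regular expression. It assumes the branch lengths are plain decimal or scientific-notation numbers that follow a colon, as in Figure 9B. The function name strip_branch_lengths is made up for this example.

```r
# Remove every ":<number>" from a Newick string, leaving only the topology.
# Assumes branch lengths look like 12, 12.5, or 1.2e-3 (a leading digit is required).
strip_branch_lengths <- function(newick) {
  gsub(":[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?", "", newick)
}

# A small clade taken from Figure 9B
glires <- "(Rabbit:82.14079889,(Rat:20.88741740,Mouse:20.88741740):61.25338149);"
glires_topology <- strip_branch_lengths(glires)
cat(glires_topology, "\n")
```

The same function can be applied to the full string from Figure 9B. If the tree is already loaded in R as an ape object, setting its edge.length element to NULL before calling write.tree() should have the same effect.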

SOP NCBI Taxonomy

This generates a phylogeny with many unresolved polytomies and is not our best choice.

Step 1. Go to NCBI Taxonomy and then select “Common Tree” (red arrow, Fig. 3).

Figure 3. Homepage of NCBI Taxonomy

Step 2a. After clicking on “Common Tree,” you will land on a new page titled “Taxonomy Browser” (Fig. 4). Enter the first species name into the empty text box labeled “Enter name or id” (Fig. 4 shows “Alligator”), then click the Add button. Figure 5 shows the results: Alligator is added to the Taxonomy Browser listing.

Figure 4. Partial screen shot of Taxonomy Browser page, with our first species entered into the text box, before clicking on Add button.

Step 2b. For each of the next 13 species, repeat Step 2a. Species are added to the Taxonomy Browser window (Fig. 5).

Figure 5. After adding “Alligator,” we see the selection in the Taxonomy Browser list window, along with an option to remove the taxon from the list (lower portion of the screen).

Step 2c. Alternatively, instead of one at a time, you can try a text file with all 14 species named, one on each row. After creating the text file (Notepad or TextEdit, save as text only), from the homepage of Taxonomy Browser (Fig. 4), click on the “Choose file” button, and select from your computer the text file (e.g., Fig. 6).

Figure 6. Screen shot of my Finder window; you can see the file I want “spp14names.txt.” Highlight your file, then click “Open” button.

After adding all of your species, whether one at a time or all at once from a text file, you’ll have a filled browser list like the one shown in Fig. 7. Once you are at this step, you may need to add or modify the listings to match our Table 1 listings. After editing is done, you are ready to complete the exercise.

Figure 7. A completed listing of species, now ready for tree building.

Step 3. What we aim to do now is to download a file that we can view in another program called TreeViewer. From the Taxonomy Browser menu, uncheck the box “include ranked taxa,” then select “phylip tree,” and click on the “Save as” button (see Fig. 8).

Figure 8. Screen shot of Taxonomy browser, shows location of buttons and options needed to complete Step 3.

Step 5. Find the downloaded file on your computer. It will be called “phyliptree.phy”. It’s just a text file, so if you want, you can view it in your text editor. A portion of the file is shown in Fig 9.

phylip.phy

(
(
(
(
(
'Bos taurus':4,

Figure 9. Portion of a phyliptree.phy text file as seen in a text editor.

Step 6. To view the file, there are several options. Many tree viewing programs could work with this file. Unfortunately, UGENE needs a different format (Newick), so we will make a simple choice: use NCBI TreeViewer (new to NCBI as of Fall 2015) to view our tree, then convert the file into the Newick file format that UGENE prefers.

Figure 10 shows a screenshot of the home page. Note that the homepage is confusing. We want to find a button called “Upload”, about halfway down the page (red arrow, Fig. 10).

Figure 10. Homepage of TreeViewer; look for the “upload” button to start.

Step 7. Load your phyliptree.phy in the popup menu (Fig. 11); Browse your computer to locate the file, then click “Upload” button when ready. Note that I selected Data file in the browser.

Figure 11. Popup menu in TreeViewer used to select and upload a phylip tree data file.

Step 8. Once you have the tree loaded, explore the options in the program to change the view to a Rectangle Cladogram. You should also explore how to swap branches to improve the readability of your tree (e.g., partial menus shown in Fig. 12 and Fig. 13).

Figure 12. Partial view of the Tool menu, used to change how the tree looks.

Figure 13. Partial view of the Tool menu, used to change how the tree looks; try with and without distance.

 

Step 9. Download an image of your tree as a pdf file.

Step 10. Download your tree as a Newick file.

Figure 14. Partial view of the Tool menu, showing the download options.

 

Update: We’re not quite done! Although you have a Newick-styled tree file on your computer, TreeViewer adds names to the internal tree nodes, which UGENE rejects. So, you’ll have to do some modest editing of the tree.nwk file with your text editor. Instructions are provided at Working with Newick and other phylogeny file formats, which is located in the Bioinformatics Resources and Help section of this website.
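
The editing can also be scripted. The sketch below, in base R, removes any label that directly follows a closing parenthesis, which is where Newick places internal node names. Treat it as an illustration rather than the procedure from the help page: the function name strip_node_labels and the toy tree are invented here, and labels that contain parentheses, commas, colons, or semicolons (even inside quotes) would need more careful handling.

```r
# Drop internal node labels: any run of characters between a ")" and the
# next Newick delimiter. Tip labels follow "(" or "," and are left alone.
strip_node_labels <- function(newick) {
  gsub("\\)[^(),:;]+", ")", newick)
}

# Toy tree with two named internal nodes (hypothetical example)
toy <- "((Rat:1,Mouse:1)Murinae:2,Rabbit:3)Glires;"
cleaned <- strip_node_labels(toy)
cat(cleaned, "\n")
```

To apply this to your own file, read it with readLines(), paste the lines into one string, run the function, and write the result with writeLines(). Note that support values written after a closing parenthesis are also removed by this pattern.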

Step 11. Once you have a standard Newick file, start UGENE, open your Project, and use Open as… to open your newly created phylogeny from NCBI. UGENE should provide a context menu to confirm the file format (Fig. 15); load the file, and save to your project. View the tree in UGENE.

Figure 15. File format menu in UGENE

You will need this tree later in the Project when you compare your gene tree against the known phylogeny (reconciliation).

Questions

Quiz: Trees and Newick files

References

Efron, B., Halloran, E., & Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences, 93(23), 13429-13434.

Grafen, A. (1989). The phylogenetic regression. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 326(1233), 119-157.

Lemey, P., Salemi, M., & Vandamme, A. M. (Eds.). (2009). The phylogenetic handbook: a practical approach to phylogenetic analysis and hypothesis testing. Cambridge University Press. (hint: a little snooping on your part and you’ll find a pdf of the entire book)

Pamilo, P., & Nei, M. (1988). Relationships between gene trees and species trees. Molecular Biology and Evolution, 5(5), 568-583.

Swenson, K. M., & El-Mabrouk, N. (2012, December). Gene trees and species trees: irreconcilable differences. In BMC Bioinformatics (Vol. 13, No. 19, p. S15).

https://en.wikipedia.org/wiki/Evidence_of_common_descent

Yang, Z., & Rannala, B. (2012). Molecular phylogenetics: principles and practice. Nature Reviews Genetics, 13(5), 303.

Yu, Guangchuang. Data Integration, Manipulation and Visualization of Phylogenetic Trees at https://yulab-smu.top/treedata-book/index.html

 
