Change root for gene tree
This post explains how to change your gene tree from an undirected (unrooted) tree to a directed (rooted) tree by setting its root. You must set the root before working with the molecular clock hypothesis, rate tests, or tree reconciliation.
What to do
Root your gene tree by selecting the outgroup, then export the new Newick file for use in subsequent exercises (e.g., Tree reconciliation).
Background
“Rooting” a gene or phylogenetic tree applies a hypothesis of evolution to the taxa displayed. In a rooted (directed) tree, each node with descendants represents the inferred most recent common ancestor of those descendants. Gene and phylogenetic trees in Newick format are not rooted in any biologically meaningful way by default; they should be treated as undirected trees. (Technically, a Newick string always has a top-level node, and a tree viewer may draw the tree from that node or from a midpoint root, but either placement is arbitrary.) While unrooted trees show the relationships among the taxa, they make no assumptions or claims about the direction of change, i.e., one cannot look at the tree and determine which character state is ancestral and which is derived. For example, if two species differ at one position in a sequence, one with A and the other with G, and the tree is not rooted, then neither nucleotide can be said to be ancestral. A directed tree allows us to assert which state is ancestral, in this case the A or the G. We make a tree directed by setting an outgroup. An outgroup is a taxon, or group of taxa, distantly related to the other taxa in the study.
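As a rough illustration of the point about the top-level node, the sketch below counts how many subtrees hang directly off the outermost node of a Newick string. It relies on a common convention, not a rule of the format: many programs write an unrooted tree with three (or more) branches at the top level and a rooted tree with exactly two. The function name and the example strings are made up for this illustration, and quoted labels that contain commas or parentheses are not handled.

```python
def top_level_children(newick: str) -> int:
    """Count the subtrees attached directly to the outermost node."""
    text = newick.strip().rstrip(";")
    depth = 0      # how many parentheses we are currently inside
    children = 1   # a tree with no top-level comma has one child
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            # a comma directly inside the outermost parentheses
            # separates two top-level subtrees
            children += 1
    return children


# Hypothetical examples (branch lengths omitted for readability)
print(top_level_children("(A,B,(C,D));"))    # 3: usually drawn as unrooted
print(top_level_children("((A,B),(C,D));"))  # 2: usually drawn as rooted
```

A count of two does not prove that the root is biologically meaningful; it only tells you how the file was written.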
In our study, we have 14 species represented; the consensus tree is shown in Figure 1.
Figure 1. Consensus tree for our 14 species.
Our taxa included 11 mammals and three reptiles. The outgroup is the reptile group. This evolutionary hypothesis is supported by much evidence (Carroll 1982), all independent of the protein sequences included in our study.
Note that the reptiles form what we call a paraphyletic group: the group includes a common ancestor but leaves out some of that ancestor's descendants. Judging by appearance alone, you would be more inclined to group alligators with lizards than with birds (Aves). However, alligators share a more recent common ancestor with birds than with lizards. In our study, the reptiles form the outgroup to the mammals. In contrast to the paraphyletic reptile group, Mammals is a monophyletic group: it includes a common ancestor and all of that ancestor's descendants.
Once an outgroup is established for a gene tree (or a phylogenetic tree), the direction of change for character states can be established. In our example, if the outgroup has A at the position, then A is considered the ancestral state and G would be the derived state.
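The outgroup reasoning above can be sketched in a few lines of Python. This is only an illustration of the simple outgroup criterion for a single site; the taxon names, the nucleotides, and the function name `polarize` are all hypothetical and are not taken from our data set.

```python
def polarize(states, outgroup):
    """Return (ancestral_state, derived_states) for one site.

    states   -- dict mapping taxon name to its nucleotide at the site
    outgroup -- list of taxon names that make up the outgroup
    Returns None when the outgroup taxa disagree, because the simple
    outgroup criterion cannot decide in that case.
    """
    outgroup_states = {states[taxon] for taxon in outgroup}
    if len(outgroup_states) != 1:
        return None
    ancestral = outgroup_states.pop()
    derived = sorted(set(states.values()) - {ancestral})
    return ancestral, derived


# Hypothetical site: the outgroup taxa carry A, one ingroup taxon carries G
site = {"lizard": "A", "alligator": "A", "mouse": "A", "human": "G"}
print(polarize(site, ["lizard", "alligator"]))  # ('A', ['G'])
```

Without the `outgroup` argument there is nothing in `site` that says whether A or G came first, which is the situation an unrooted tree leaves us in.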
How to set outgroup in UGENE
You submitted an unrooted Newick tree in the previous lab. While there are better apps than UGENE for editing trees, you can make the needed changes in UGENE. Consider the gene tree for HIF1A (Fig. 2).
Figure 2. Screenshot of Bayesian gene tree, HIF1A, from UGENE. Tree is arbitrarily rooted at midpoint.
With the tree window active in UGENE, move the cursor to the ancestral node for the taxa you wish to set as the outgroup. Figure 3 shows just the nodes with the reptiles. The red arrow points to the node we will select to indicate the outgroup.
Figure 3. Screenshot of subset of tree from Fig. 2. Red arrow points to the node we will select to set the root.
Click on the node to select the outgroup. Next, right-click to bring up a context menu (Fig. 4). Select “Reroot tree”. The tree will display in rooted form (Fig. 5).
Figure 4. After right-click, the context menu is displayed. Select Reroot tree from the options.
The directed tree is shown in Figure 5.
Figure 5. The now directed tree after setting the root.
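To see what rerooting does in principle, here is a minimal sketch that treats an unrooted tree as an undirected graph and then places the root on the branch leading to the outgroup. It is not how UGENE implements “Reroot tree”; it only illustrates the idea. The four taxa, the internal node names `x` and `y`, and the function names are hypothetical, and branch lengths are left out to keep the example short.

```python
# Hypothetical unrooted tree: four tips joined by two internal nodes.
# Each key is a node; each value lists the nodes it is connected to.
UNROOTED = {
    "x": ["human", "mouse", "y"],
    "y": ["x", "lizard", "alligator"],
    "human": ["x"],
    "mouse": ["x"],
    "lizard": ["y"],
    "alligator": ["y"],
}


def subtree(adj, node, parent):
    """Write the part of the tree on the far side of `node` from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:          # a tip: just its name
        return node
    return "(" + ",".join(subtree(adj, c, node) for c in children) + ")"


def root_on_edge(adj, ingroup_side, outgroup_side):
    """Place the root on the branch between the two given nodes."""
    return ("(" + subtree(adj, ingroup_side, outgroup_side) + ","
            + subtree(adj, outgroup_side, ingroup_side) + ");")


# Root on the branch that separates the mammals (x) from the reptiles (y)
print(root_on_edge(UNROOTED, "x", "y"))
# ((human,mouse),(lizard,alligator));
```

The graph itself never changes; choosing the branch simply gives every other branch a direction, away from the root.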
Note that negative branch lengths may appear. Negative branch lengths are biologically impossible; however, they will not be a problem for our next analyses. The edge lengths represent the statistical distance among the groups given a rooted tree. (One solution would be to set any negative branch lengths to zero. Please do not do this for your project; it introduces other complications that are beyond the scope of our project.)
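If you want to know whether your rerooted tree contains negative branch lengths, a small script can report them without changing anything in the file. This is a sketch that assumes branch lengths are written in the usual `:number` form after a node; the regular expression, the function name, and the example string are made up for illustration.

```python
import re

# A colon followed by a negative number, e.g. ":-0.05" or ":-1e-05"
NEGATIVE_LENGTH = re.compile(r":\s*(-\d*\.?\d+(?:[eE][-+]?\d+)?)")


def negative_branch_lengths(newick: str) -> list:
    """Return every negative branch length found in a Newick string."""
    return [float(value) for value in NEGATIVE_LENGTH.findall(newick)]


# Hypothetical tree with one negative internal branch
example = "((A:0.10,B:0.20):-0.05,(C:0.30,D:0.40):0.02);"
print(negative_branch_lengths(example))  # [-0.05]
```

The function only reports the values; in keeping with the note above, it deliberately does not reset them to zero.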
Save the revised Newick file for your gene tree
Finally, export the new Newick file. From the Object window in UGENE select the Newick file, then right-click to bring up the context menu (Fig. 6). Select Export document.
Change the filename to indicate the tree is rooted (Fig. 7). If necessary, update the location on your computer where the file will be stored. The default is to Add to project, but this is not necessary; all we want is the file.
You now have the file you will need for the next steps in the project, e.g., Tree Reconciliation and Rate tests.
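Before moving on, you may want a quick sanity check that the exported file still looks like a Newick tree. The sketch below only tests two surface features (balanced parentheses and a closing semicolon), so passing it does not guarantee that the tree is correct. The function names and the filename `HIF1A_rooted.nwk` are assumptions for illustration; use whatever name you chose when exporting.

```python
from pathlib import Path


def looks_like_newick(text: str) -> bool:
    """Very rough check: parentheses balance and the tree ends with ';'."""
    text = text.strip()
    has_groups = text.count("(") > 0
    balanced = text.count("(") == text.count(")")
    return has_groups and balanced and text.endswith(";")


def check_newick_file(path: str) -> bool:
    """Apply the rough check to a file on disk."""
    return looks_like_newick(Path(path).read_text())


# Example call, assuming a file with this hypothetical name exists:
# check_newick_file("HIF1A_rooted.nwk")
print(looks_like_newick("((A:0.1,B:0.2):0.05,C:0.3);"))  # True
```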
References
Carroll, R. L. (1982). Early evolution of reptiles. Annual Review of Ecology and Systematics, 13(1), 87-109.
Old material — Do not read below this text
How can we properly root this tree? UGENE does not provide a way to root the tree. Thus, we need some other software to help. In theory, you can simply open the Newick file in your favorite text editor (it is just a text file despite the *.nwk extension!), then make the changes you need to the Newick code; this amounts to moving the parentheses around. It is relatively straightforward for fewer than four OTUs, but quickly becomes quite difficult above that unless you are an expert in such things. Fortunately, many good folks have worked hard to provide a variety of applications that make this work rapid and straightforward. There are many options, but two are easy to use, freely available, and run on macOS as well as Windows PCs: Njplot and Treegraph2.
The rest of this handout covers use of Njplot to change the rooting of the tree.
Njplot
On the NSM Macbooks, Njplot is installed in a folder (njplot) located in the Applications folder.
Here, I’ve loaded the tree from UGENE shown above; look for the file with a “*.nwk” extension. Here is a screenshot with the same default tree loaded.
We want to “reroot” the tree to make the Macaques, our OUTGROUP, the basal member of this tree. Click on the bubble next to “New outgroup.” Here’s the result; note that a “#” has been added to the tip names.
To make Macaques the OUTGROUP, click on the # next to the Macaque tip name. Here’s the result.
The tree has now been properly rooted. And, to return to our original question, we can see that humans and chimps were clearly clustered together.
You can proceed to format the tree (e.g., change font, size, etc.), add branch lengths, and swap nodes to improve the look of the tree.
Get Njplot from this site: http://doua.prabi.fr/software/njplot
Treegraph
Treegraph2 is a relatively new Java implementation, with many nice features. It is available for both Macs and Windows PCs at http://treegraph.bioinfweb.info/.
In brief, to re-root a tree in Treegraph, load the file, click on the branch of the OUTGROUP, then press and hold the key combination Command+R (Macs) or Ctrl+R (Windows). Many additional features are available.