Use of ORF Finder: Practice and Worked example
This assignment supports Bioinformatics II
Purpose
Learn how to use ORF Finder to identify candidate genes in a DNA sequence.
Definition: an open reading frame (ORF) is a nominal gene, defined by features it shares with known genes: a start codon, an in-frame stop codon, and enough nucleotides (nt) between them to encode a plausible protein sequence.
What to do
Read through the worked example and try to duplicate its results using the example sequence.
Then try it with your own sequence; see “What to do, what to turn in” at the bottom of this page.
Resources
- link to ORF Finder, https://www.ncbi.nlm.nih.gov/orffinder/
- Use of UGENE.
1. Example, finding an ORF
Do we have a gene?
Copy/paste sequence in FASTA format to ORF Finder
Look for START (ATG) and STOP (TAG, TAA, or TGA)
>random sequence 1
acagacctctattaccgttcccgatccacagggtggcggcattataaaccaacacgtacagccccccctctttactgcccacgaagcgcaccgtccctcgaatcgcggcgtctgcggaattgctatatttacagggcgtcaacaagcacacgcttcagacacgcttaaggtgtttccggcgtaccgaaaacgactgtttcctaggcgtgccgttcctagtaccttcgcttactgcaggtcccgatggcttgggagcggtagagatattctatcttgcttgcggttcgtcaacctatacataccgacatccgctgtcataccatcctcaaggaaagcctacattagttctatgcagccgcgtgagtaaaggggggagacgagagccctccggaagggatagcattacagagctgggggtacattaaacagctctcacagccactcacccatcgtcgaacggtggtcgtcctttccgtgaatcccgcaggctgttacggtctagagattcggtggagtagcactagggggtgtattgttaggtgagaggtacaaacgcttcaagtccctaatggattctcgctcgagagccggtcttcaccccgggctacaccgacggaactctcagcttctccaccatgatagcacccttgagcattcatctaatcaccgttttggtgggggataaggcttaaccgtccaagtcgccacagtccagtgcagtcggttaggatttggcgcaaattgggaagaaggcattccgatggttatacgagatatcagcaagtagatcgaagcggaatacgggtggtgtgtggaatgactacgcggaaagaacacgtcgcccatctgtagaaaaacacttttatatccctattaactgtgacattggattgtcgaggtcgtcgctcaagtcgtcaggtcattaatctaataacctggcgaataaccaggcgagcggataggcgatgcacaggacgtccttaccatgatatgatgccgt
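The START/STOP search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ORF Finder's actual algorithm: it assumes ATG is the only start codon, scans the three forward reading frames only (the reverse strand is ignored), and uses the hypothetical names `find_orfs` and `min_len`, which do not come from ORF Finder or UGENE.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(seq, min_len=30):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start and end are 0-based, end-exclusive positions; the reported span
    includes the stop codon. frame is 1, 2, or 3.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame + 1))
                start = None  # resume the search after this stop codon
    return orfs

# Toy sequence (made up for this sketch): ATG AAA TTT GGG TAA, offset by two bases.
print(find_orfs("CCATGAAATTTGGGTAACC", min_len=15))  # [(2, 17, 3)]
```

Raising `min_len` above 15 makes the toy ORF disappear, which mirrors the effect of changing the minimal ORF length setting.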
Figure 1. Screenshot of homepage of ORF Finder
There are not many options. Try different minimal ORF lengths; the example shown in Fig. 2 sets the ORF length to 30 nucleotides (nt). You should explore genetic codes besides the standard one. You should also consider whether your sequence should be evaluated for nested genes, that is, gene sequences contained entirely within another gene. For example, microRNA genes are sometimes located in the introns of protein-coding genes.
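To see why the choice of genetic code matters, the sketch below compares where an ORF ends under two NCBI translation tables. The stop codon sets are hard-coded for illustration (table 1, the standard code, and table 2, the vertebrate mitochondrial code); the function name `first_stop` and the toy sequence are assumptions made for this example, not part of ORF Finder.

```python
# Stop codons for two NCBI translation tables, hard-coded for illustration.
STOP_CODONS_BY_TABLE = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial code (TGA encodes Trp)
}

def first_stop(seq, table_id):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    stops = STOP_CODONS_BY_TABLE[table_id]
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    return None

orf = "ATGGCTTGACCAAGATAA"  # ATG GCT TGA CCA AGA TAA (toy sequence)
print(first_stop(orf, 1))  # 6: TGA is a stop in the standard code
print(first_stop(orf, 2))  # 12: TGA is read through; AGA is the stop
```

The same DNA therefore gives ORFs of different lengths depending on the genetic code selected.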
Figure 2 shows a screenshot of the results from ORF Finder.
Figure 2. Screenshot of results from ORF Finder on our random sequence 1.
The second possibility is that ORF Finder does find ORFs.
Figure 3. Screenshot, ORFfinder found ORFs.
2. If ORF Finder Does Not Find ORFs
Then check for conserved domains using a different website, the NCBI Conserved Domain Search, at
https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
Screen shot
Figure 4.
Figure 5.
3. UGENE ORF finder
Import sequence from file, or enter directly via File > New document from text…
Figure 4.
Figure 5.
Actions > Analyze > Find ORFs…
Figure 6.
Figure 7.
Figure 8.
Figure 9.
We summarize the results: eleven ORFs were identified, ranging in length from 42 to 183 nt. Putative proteins ranged in size from 13 to 60 amino acids.
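The relationship between ORF length and protein size in this summary is simple arithmetic: divide the nucleotide length by 3 to get codons, then subtract one for the stop codon, which is not translated. The sketch below assumes the reported ORF lengths include the stop codon, which is consistent with the numbers above; the function name `protein_length` is made up for this example.

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # The stop codon terminates translation and adds no amino acid.
    return codons - 1 if includes_stop else codons

print(protein_length(42))   # 13
print(protein_length(183))  # 60
```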
Next steps could be to run BLAST on one or more of the ORFs. We can select them one at a time by clicking on the subset table at lower right of the screen (Fig. 2).
What to do, what to turn in
- Generate your own random 1000 nt long DNA sequence at https://www.bioinformatics.org/sms2/random_dna.html
- You can use this sequence for the Virtual Ribosome exercise, too, or generate a new one for each exercise
- Paste your sequence in FASTA format into ORF Finder, set ORF length to 30, then investigate for the presence of ORFs.
- Copy and paste this same random sequence into the text submit box as text, not as a screenshot, so I can recreate your work.
- Capture a screenshot like the one in Figure 2. If no ORFs are found, report that.
- Select one ORF and run a BLAST search on it. Report your results.
- Your responses to other questions in this handout should be placed in your notebook.
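For reference, the first step of the list above (a random 1000 nt sequence in FASTA format) can also be reproduced locally. The sketch below is not the method used by the website linked above; it assumes equal probabilities for A, C, G, and T, and the function names `random_dna` and `to_fasta` are made up for this example. Use the website for the assignment unless told otherwise.

```python
import random

def random_dna(length=1000, seed=None):
    """Return a random DNA string with equal base probabilities."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    return "".join(rng.choice("ACGT") for _ in range(length))

def to_fasta(name, seq, width=70):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

seq = random_dna(1000, seed=1)
print(to_fasta("random sequence 1", seq))
```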