Use of ORF Finder: Practice and Worked example
This assignment supports Bioinformatics II
Purpose
Learn how to use ORF Finder to identify candidate genes in a DNA sequence
What to do
Read through the worked example and try to reproduce its results from the example sequence
Then try it with your own sequence; see “What to do, what to turn in” at the bottom of this page
Resources
Link to ORF Finder: https://www.ncbi.nlm.nih.gov/orffinder/
Definition: ORF (open reading frame), the nominal gene, defined by features shared with other known genes: a START codon, an in-frame STOP codon, and enough nucleotides (nt) between them to form a likely protein sequence
Example, finding an ORF
Do we have a gene?
Copy/paste sequence in FASTA format to ORF Finder
Look for START (ATG) and STOP (TAG, TAA, or TGA)
>random sequence 1
acagacctctattaccgttcccgatccacagggtggcggcattataaaccaacacgtacagccccccctctttactgcccacgaagcgcaccgtccctcgaatcgcggcgtctgcggaattgctatatttacagggcgtcaacaagcacacgcttcagacacgcttaaggtgtttccggcgtaccgaaaacgactgtttcctaggcgtgccgttcctagtaccttcgcttactgcaggtcccgatggcttgggagcggtagagatattctatcttgcttgcggttcgtcaacctatacataccgacatccgctgtcataccatcctcaaggaaagcctacattagttctatgcagccgcgtgagtaaaggggggagacgagagccctccggaagggatagcattacagagctgggggtacattaaacagctctcacagccactcacccatcgtcgaacggtggtcgtcctttccgtgaatcccgcaggctgttacggtctagagattcggtggagtagcactagggggtgtattgttaggtgagaggtacaaacgcttcaagtccctaatggattctcgctcgagagccggtcttcaccccgggctacaccgacggaactctcagcttctccaccatgatagcacccttgagcattcatctaatcaccgttttggtgggggataaggcttaaccgtccaagtcgccacagtccagtgcagtcggttaggatttggcgcaaattgggaagaaggcattccgatggttatacgagatatcagcaagtagatcgaagcggaatacgggtggtgtgtggaatgactacgcggaaagaacacgtcgcccatctgtagaaaaacacttttatatccctattaactgtgacattggattgtcgaggtcgtcgctcaagtcgtcaggtcattaatctaataacctggcgaataaccaggcgagcggataggcgatgcacaggacgtccttaccatgatatgatgccgt
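To make the START/STOP search concrete, here is a minimal Python sketch of a six-frame ORF scan. It assumes ATG is the only start codon, uses the standard stop codons, and counts the stop codon in the ORF length. The function names (`find_orfs`, `reverse_complement`) are made up for illustration, and the sketch is not expected to reproduce ORF Finder's output exactly.

```python
STOPS = {"TAA", "TAG", "TGA"}               # standard-code stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # base-pairing table


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Scan all six reading frames for ATG...stop ORFs.

    Returns tuples (strand, frame, start, end, length). Coordinates are
    1-based and refer to the strand being scanned, so minus-strand
    positions are positions on the reverse complement. Length is in nt
    and includes the stop codon (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG of the currently open ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((strand, frame + 1, start + 1, i + 3, length))
                    start = None  # close the ORF and look for the next ATG
    return orfs


# Toy example: ATG AAA TTT GGG TAA sits in frame 3 of the plus strand.
print(find_orfs("CCATGAAATTTGGGTAACC", min_len=15))
```

To try it on the example, paste the sequence above (without the `>random sequence 1` header line) into a string and call `find_orfs(sequence, min_len=30)`.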
Figure 1. Screenshot of homepage of ORF Finder
ORF Finder offers only a few options. Try different minimal ORF lengths; the example shown in Fig. 2 used a minimal ORF length of 30 nucleotides (nt). You should also explore genetic codes besides the standard code, and consider whether your sequence should be evaluated for nested genes, that is, gene sequences contained entirely within another gene. For example, microRNA genes are sometimes located in the introns of protein-coding genes.
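Why does the choice of genetic code matter? The set of stop codons differs between codes, so the same reading frame can end at different positions. The Python sketch below compares three NCBI translation tables (1, 2, and 4); the dictionary and the function name `first_stop` are illustrative choices, not part of ORF Finder.

```python
# Stop codons for three NCBI translation tables.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial (TGA codes Trp)
    4: {"TAA", "TAG"},                # mold/protozoan mitochondrial, Mycoplasma (TGA codes Trp)
}


def first_stop(seq, table=1):
    """Return the 0-based index of the first stop codon in frame 1, or None."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS[table]:
            return i
    return None


# Codons: ATG TGA AGA TAA
demo = "ATGTGAAGATAA"
for table in (1, 2, 4):
    print(table, first_stop(demo, table))
```

In this toy sequence the frame stops at TGA under the standard code, at AGA under table 2, and only at TAA under table 4, so the predicted ORF length changes with the code selected.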
Figure 2 shows a screenshot of the results from ORF Finder.
Figure 2. Screenshot of results from ORF Finder on our random sequence 1.
We summarize the results: eleven ORFs were identified, ranging in length from 42 to 183 nt. Putative proteins ranged in size from 13 to 60 amino acids.
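The nucleotide and amino acid ranges above are linked by simple arithmetic: each codon is 3 nt, and the stop codon does not add an amino acid. A small Python sketch, assuming the reported ORF length includes the stop codon (the helper name `protein_length` is made up for illustration):

```python
def protein_length(orf_nt):
    """Amino acids encoded by an ORF whose length (nt) includes the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1  # total codons minus the stop codon


print(protein_length(42))   # shortest ORF in the example
print(protein_length(183))  # longest ORF in the example
```

Under that assumption, 42 nt is 14 codons and therefore 13 amino acids, and 183 nt is 61 codons and therefore 60 amino acids, matching the summary.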
Next steps could be to run BLAST on one or more of the ORFs. We can select them one at a time by clicking on the subset table at lower right of the screen (Fig. 2).
What to do, what to turn in
- Generate your own random 1000 nt long DNA sequence at https://www.bioinformatics.org/sms2/random_dna.html
- You can use this sequence for the Virtual Ribosome exercise, too, or generate a new one for each exercise
- Paste your sequence in FASTA format into ORF Finder, set the minimal ORF length to 30, then check for the presence of ORFs.
- Copy and paste this same random sequence as text into the submission box, not as a screenshot, so I can recreate your work.
- Capture a screenshot like the one in Figure 2. If no ORFs are found, report that.
- Select one ORF and run a BLAST search on it. Report your results.
- Your responses to other questions in this handout should be placed in your notebook.
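If you are curious how a random sequence like the one requested above can be produced and put into FASTA format, the following Python sketch shows one way. It assumes equal probability for each base; the function names `random_dna` and `to_fasta` and the 60-character line width are illustrative choices, and the web tool linked above remains the one to use for the assignment.

```python
import random


def random_dna(length=1000, seed=None):
    """Return a random lowercase DNA string with equal base probabilities."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    return "".join(rng.choice("acgt") for _ in range(length))


def to_fasta(name, seq, width=60):
    """Format a sequence as FASTA: one header line, then fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


seq = random_dna(1000, seed=1)
print(to_fasta("my random sequence", seq))
```

The header line begins with `>` and the sequence follows on the next lines, which is the layout ORF Finder expects when you paste FASTA text.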