Using ORF Finder: Practice and Worked Example

This assignment supports Bioinformatics II

Purpose

Learn how to use ORF Finder to identify candidate genes in a DNA sequence

What to do

Read through the worked example and try to reproduce its results from the example sequence

Then try it with your own sequence; see “What to do, what to turn in” at the bottom of this page

Resources

link to ORF Finder, https://www.ncbi.nlm.nih.gov/orffinder/

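Definition: An ORF (open reading frame) is a nominal gene: a stretch of sequence that begins with a START codon, ends with an in-frame STOP codon, and spans enough nucleotides (nt) to encode a plausible protein sequence. Whether an ORF corresponds to a real gene is then judged by its similarity to other known genes.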


Example, finding an ORF

Do we have a gene?

Copy and paste the sequence, in FASTA format, into ORF Finder

Look for START (ATG) and STOP (TAG, TAA, or TGA) codons
>random sequence 1
acagacctctattaccgttcccgatccacagggtggcggcattataaaccaacacgtacagccccccctctttactgcccacgaagcgcaccgtccctcgaatcgcggcgtctgcggaattgctatatttacagggcgtcaacaagcacacgcttcagacacgcttaaggtgtttccggcgtaccgaaaacgactgtttcctaggcgtgccgttcctagtaccttcgcttactgcaggtcccgatggcttgggagcggtagagatattctatcttgcttgcggttcgtcaacctatacataccgacatccgctgtcataccatcctcaaggaaagcctacattagttctatgcagccgcgtgagtaaaggggggagacgagagccctccggaagggatagcattacagagctgggggtacattaaacagctctcacagccactcacccatcgtcgaacggtggtcgtcctttccgtgaatcccgcaggctgttacggtctagagattcggtggagtagcactagggggtgtattgttaggtgagaggtacaaacgcttcaagtccctaatggattctcgctcgagagccggtcttcaccccgggctacaccgacggaactctcagcttctccaccatgatagcacccttgagcattcatctaatcaccgttttggtgggggataaggcttaaccgtccaagtcgccacagtccagtgcagtcggttaggatttggcgcaaattgggaagaaggcattccgatggttatacgagatatcagcaagtagatcgaagcggaatacgggtggtgtgtggaatgactacgcggaaagaacacgtcgcccatctgtagaaaaacacttttatatccctattaactgtgacattggattgtcgaggtcgtcgctcaagtcgtcaggtcattaatctaataacctggcgaataaccaggcgagcggataggcgatgcacaggacgtccttaccatgatatgatgccgt
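
The START/STOP search described above can also be sketched by hand in Python. This is a minimal illustration, not ORF Finder's algorithm: it assumes ATG is the only start codon, uses the standard stop codons, and reports only the first ATG in each stop-to-stop interval. The function names and the toy sequence are hypothetical, and ORF Finder's own options and edge-case handling may give different counts or coordinates.

```python
# Minimal six-frame ORF scan: ATG start, standard stop codons only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len=30):
    """Yield (frame, start, end) for ATG...stop ORFs on one strand.

    start and end are 0-based and end-exclusive; the stop codon is included.
    Only the first ATG after the previous stop is used, so ATGs nested
    in the same frame are not reported separately.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield frame, start, i + 3
                start = None


def find_orfs(seq, min_len=30):
    """Scan both strands; return (strand, frame, start, end, orf_sequence)."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame, start, end in orfs_in_strand(s, min_len):
            results.append((strand, frame, start, end, s[start:end]))
    return results


toy = "CCATGAAATTTGGGTAACC"  # hypothetical toy sequence, not from the assignment
print(find_orfs(toy, min_len=15))  # [('+', 2, 2, 17, 'ATGAAATTTGGGTAA')]
```

Coordinates on the "-" strand refer to positions in the reverse-complemented sequence, which is a simplification chosen to keep the sketch short.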

ORF Finder screenshot

orfFinder01.png

Figure 1. Screenshot of homepage of ORF Finder

ORF Finder offers only a few options. Try different minimum ORF lengths; the example shown in Fig. 2 set the ORF length to 30 nucleotides (nt). You should also explore genetic codes besides the standard one. Finally, consider whether your sequence should be evaluated for nested genes, that is, gene sequences contained entirely within another gene. For example, microRNA genes are sometimes located in the introns of protein-coding genes.
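
To see why the choice of genetic code matters, consider how the set of stop codons changes. The sketch below compares the standard code (NCBI translation table 1) with the vertebrate mitochondrial code (table 2), in which TGA encodes tryptophan and AGA and AGG act as stops. The helper `orf_length` and the toy sequence are hypothetical and serve only as an illustration.

```python
# Stop codons under two NCBI genetic codes (translation tables 1 and 2).
STOPS_STANDARD = {"TAA", "TAG", "TGA"}          # table 1, standard
STOPS_VERT_MITO = {"TAA", "TAG", "AGA", "AGG"}  # table 2, vertebrate mitochondrial


def orf_length(seq, stops):
    """Length in nt from the first codon through the first in-frame stop.

    Assumes seq begins at the start codon; returns None if no stop is found.
    """
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i + 3
    return None


toy = "ATGAAATGACCCTAA"  # hypothetical sequence, not from the assignment
print(orf_length(toy, STOPS_STANDARD))   # 9
print(orf_length(toy, STOPS_VERT_MITO))  # 15
```

The same DNA yields a 9 nt ORF under the standard code but a 15 nt ORF under the vertebrate mitochondrial code, because TGA is read through in the latter.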

Figure 2 shows a screenshot of the results from ORF Finder

orfFinder02.png

Figure 2. Screenshot of results from ORF Finder on our random sequence 1.

To summarize the results: eleven ORFs were identified, ranging in length from 42 to 183 nt. The putative proteins ranged in size from 13 to 60 amino acids.
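
The nucleotide and amino acid ranges are related by simple arithmetic: three nucleotides per codon, with the stop codon contributing no amino acid. The sketch below assumes the reported ORF length includes the stop codon, which is consistent with the numbers above; `protein_length` is a hypothetical helper name.

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # The stop codon terminates translation and adds no amino acid.
    return codons - 1 if includes_stop else codons


print(protein_length(42))   # 13
print(protein_length(183))  # 60
```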

Next steps could be to run BLAST on one or more of the ORFs. We can select them one at a time by clicking on the subset table at lower right of the screen (Fig. 2).


What to do, what to turn in

  1. Generate your own random 1000 nt long DNA sequence at https://www.bioinformatics.org/sms2/random_dna.html
  2. Paste your sequence, in FASTA format, into ORF Finder, set the ORF length to 30, then look for ORFs.
    • Also copy and paste this same random sequence into the text submission box as text, not as a screenshot, so I can recreate your work.
  3. Capture a screenshot like the one in Figure 2. If no ORFs are found, report that.
  4. Select one ORF and run a BLAST search on it. Report your results.
  5. Place your responses to other questions in this handout in your notebook.
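
For reference, FASTA format is a header line beginning with ">" followed by the sequence, usually wrapped to a fixed line width. The sketch below is an optional offline illustration of steps 1 and 2, not a replacement for the random sequence site named above; it assumes equal base frequencies, and the function names, the record name, and the 70-character line width are illustrative choices.

```python
import random


def random_dna(length=1000, seed=None):
    """Return a random DNA string with equal base frequencies (an assumption)."""
    rng = random.Random(seed)  # a seed makes the sequence reproducible
    return "".join(rng.choice("acgt") for _ in range(length))


def to_fasta(name, seq, width=70):
    """Format a sequence as a FASTA record: '>' header, then wrapped lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


record = to_fasta("my random sequence", random_dna(1000, seed=1))
print(record)
```

The printed record can be pasted directly into a sequence input box that accepts FASTA.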