Using BLAST to predict PCR product length

Biological information is encoded in the nucleotide sequence of DNA. Bioinformatics is the field that identifies biological information in DNA using computer-based tools. Some bioinformatics algorithms aid the identification of genes, promoters, and other functional elements of DNA. We will use bioinformatics tools to investigate the location of several human short tandem repeat (STR) loci (assigned in class). In subsequent weeks we will amplify these STR loci from samples of human DNA by PCR.

Upon completion of this exercise you will be able to

  1. Identify the location in the genome of the STR locus
  2. Predict the outcome of your PCR experiment, that is, the size in base pairs of the PCR product (aka amplicon)

Background — CODIS, the FBI’s DNA Databank

(this section needs updating — adapted from Arizona Biology Project)

The Federal Bureau of Investigation (FBI) of the US has been a leader in developing DNA typing technology for use in the identification of perpetrators of violent crime. In 1997, the FBI announced the selection of 13 STR loci to constitute the core of the United States national database, CODIS (Fig 1). All CODIS STRs are tetrameric repeat sequences. All forensic laboratories that use the CODIS system can contribute to a national database. DNA analysts like Bob Blackett can also attempt to match the DNA profile of crime scene evidence to DNA profiles already in the database.

There are many advantages to the CODIS STR system:

  • The CODIS system has been widely adopted by forensic DNA analysts
  • STR alleles can be rapidly determined using commercially available kits.
  • STR alleles are discrete, and behave according to known principles of population genetics
  • The data are digital, and therefore ideally suited for computer databases
  • Laboratories worldwide are contributing to the analysis of STR allele frequency in different human populations
  • STR profiles can be determined with very small amounts of DNA

Figure 1. CODIS STR

A DNA Profile: The 13 CODIS STR loci

As part of his training and proficiency testing for DNA Profile analysis of STR (Short Tandem Repeat) Polymorphisms, Forensic Scientist and DNA Analyst Bob Blackett created a DNA profile on his own DNA. Here is Bob’s DNA Profile for the 13 core Genetic Loci of the United States national database, CODIS (Combined DNA Index System):

Locus      D3S1358  vWA      FGA      D8S1179  D21S11   D18S51   D5S818
Genotype   15, 18   16, 16   19, 24   12, 13   29, 31   12, 13   11, 13
Frequency  8.2%     4.4%     1.7%     9.9%     2.3%     4.3%     13%

Locus      D13S317  D7S820   D16S539  TH01     TPOX     CSF1PO   AMEL
Genotype   11, 11   10, 10   11, 11   9, 9.3   8, 8     11, 11   X Y
Frequency  1.2%     6.3%     9.5%     9.6%     3.52%    7.2%     (Male)
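The allele numbers in a profile relate directly to PCR product length. Because the CODIS STRs are tetrameric, each whole repeat adds 4 bp, and by convention a microvariant such as 9.3 means nine complete repeats plus three extra bases. The sketch below illustrates that arithmetic only; the flanking length (the primers plus the non-repeat sequence between them) is a hypothetical placeholder, not a measured value for any locus or primer set in this handout.

```python
def allele_to_repeat_bases(allele: str, repeat_unit: int = 4) -> int:
    """Bases contributed by the repeat region for an allele such as '8' or '9.3'."""
    whole, _, partial = allele.partition(".")
    extra_bases = int(partial) if partial else 0
    return int(whole) * repeat_unit + extra_bases


def predicted_amplicon_length(allele: str, flank_bp: int, repeat_unit: int = 4) -> int:
    """Flanking sequence (including both primers) plus the repeat region."""
    return flank_bp + allele_to_repeat_bases(allele, repeat_unit)


# Hypothetical flank length, chosen only to make the example concrete.
FLANK_BP = 100

for allele in ("9", "9.3"):
    print(allele, predicted_amplicon_length(allele, FLANK_BP), "bp")
```

With these assumed numbers, a heterozygote for alleles 9 and 9.3 would give two products that differ by only 3 bp, which is why the method used to separate the products needs good resolution.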

The complete set of STR primers we have at Chaminade is listed below.

CUH primer name  Locus designation  Primers (5′ – 3′)             Used for (page: Protocol for STR amplification by PCR)
CFamelS1         Amelogenin         ACCTCATCCTGGGCACCCTGG
CRamelS1         Amelogenin         AGGCTTGAGGCCAACCATCAG
CFcsfS1          CSF1PO             AACCTGAGTCTGCCAAGGACTAGC
CFcsfS2          CSF1PO             CCGGAGGTAAAGGTGTCTTAAAGT
CRcsfS1          CSF1PO             TTCCACACACCACTGGCCATCTTC
CRcsfS2          CSF1PO             ATTTCCTGTGTCAGACCCTGTT
CFd18sS1         D18S51             CAAACCCGACTACCAGCAAC
CFd18sS2         D18S51             TTCTTGAGCCCAGAAGGTTA          Primer 3 F
CRd18sS1         D18S51             GAGCCATGTTCATGCCACTG
CRd18sS2         D18S51             ATTCTACCAGCAACAACACAAATAAAC   Primer 3 R
CFd3sS1          D3S1358            ACTGCAGTCCAATCTGGGT
CFd3sS2          D3S1358            ACTGCAGTCCAATCTGGGT           Primer 2 F
CRd3sS1          D3S1358            ATGAAATCAACAGAGGCTTG
CRd3sS2          D3S1358            ATGAAATCAACAGAGGCTTGC         Primer 2 R
CFd7sS1          D7S820             ATGTTGGTCAGGCTGACTATG
CFd7sS2          D7S820             TGTCATAGTTTAGAACGAACTAACG
CRd7sS1          D7S820             GATTCCACATTTATCCTCATTGAC
CRd7sS2          D7S820             CTGAGGTATCAAAAACTCAGAGG
CFdysS1          DYS391             CTATTCATTCAATCATACACCCA
CFdysS2          DYS391             TTCATCATACACCCATATCTGTC
CRdysS1          DYS391             GATTCTTTGTGGTGGGTCTG
CRdysS2          DYS391             GATAGAGGGATAGGTAGGCAGGC
CFfgaS1          FGA                ATTATCCAAAAGTCAAATGCCCCATAGG
CFfgaS2          FGA                GGCTGCAGGGCATAACATTA
CRfgaS1          FGA                ATCGAAAATATGGTTATTGAAGTAGCTG
CRfgaS2          FGA                ATTCTATGACTTTGCGCTTCAGGA
CFthoS1          TH01               GTGGGCTGAAAAGCTCCCGATTAT
CFthoS2          TH01               GCTTCCGAGTGCAGGTCACA
CRthoS1          TH01               ATTCAAAGGGTATCTGGGCTCTGG
CRthoS2          TH01               CAGCTGCCCTAGTCAGCAC
CFtpS1           TPOX               ACTGGCACAGAACAGGCACTTAGG
CFtpS2           TPOX               CACTAGCACCCAGAACCGTC          Primer 1 F
CRtpS1           TPOX               GGAGGAACTGGGAACCACACAGGT
CRtpS2           TPOX               CCTTGTCAGCGTTTATTTGCC         Primer 1 R

The CUH primer name reads as follows.

  • C stands for Chaminade
  • the F or R following the C stands for a forward (F) or reverse (R) primer sequence
  • the next lowercase letters abbreviate the locus name (for example, dys for DYS391 and d18s for D18S51)
  • S1 refers to primer set 1; S2 refers to primer set 2

To begin the bioinformatics exercise, locate your primer set in the Chaminade table: the forward primer name begins with “CF” and the reverse primer name begins with “CR”. For example, if the CUH name is CFdysS1, then this is the forward primer for DYS391, and the matching reverse primer is CRdysS1. Once you have the two primers for the STR locus, proceed to the next steps in the handout.
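The naming convention is regular enough to check by script. The following is a minimal sketch, assuming every CUH name has the form C, then F or R, then a lowercase locus code, then S and a set number; the function names parse_cuh_name and partner_name are made up for this illustration.

```python
import re

# C + direction (F/R) + lowercase locus code + S + primer set number
CUH_NAME = re.compile(r"^C(?P<direction>[FR])(?P<locus>[a-z0-9]+)S(?P<set>\d+)$")


def parse_cuh_name(name: str) -> dict:
    """Split a CUH primer name such as 'CFdysS1' into its parts."""
    match = CUH_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CUH primer name: {name!r}")
    return {
        "direction": "forward" if match["direction"] == "F" else "reverse",
        "locus_code": match["locus"],
        "primer_set": int(match["set"]),
    }


def partner_name(name: str) -> str:
    """Return the name of the matching primer in the same set."""
    parse_cuh_name(name)  # raises ValueError if the name is malformed
    other = "R" if name[1] == "F" else "F"
    return name[0] + other + name[2:]


print(parse_cuh_name("CFdysS1"))
print(partner_name("CFdysS1"))
```

Always pair primers from the same set: a forward primer from set 1 goes with the reverse primer from set 1, not set 2.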

Conduct a BLAST search with your primer set

1. Go to https://www.ncbi.nlm.nih.gov/tools/primer-blast/ and copy your primer sequences exactly into the “Primer Parameters” area (look for “Use my own forward primer” and “Use my own reverse primer”). One at a time, copy and paste only the sequences, not the leading “5′-” or trailing “-3′”. See the red arrow in Fig 2.
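If you want to double-check a sequence before pasting it, a short script can strip the end labels and flag stray characters. This is only a sketch of that clean-up step; clean_primer is a hypothetical helper, and it assumes primers contain only the four unambiguous bases A, C, G, and T.

```python
import re

# Characters commonly used as the "prime" mark: ', the prime sign, right quote
PRIME_CHARS = "'\u2032\u2019"


def clean_primer(raw: str) -> str:
    """Remove 5'-/-3' labels and whitespace; return an uppercase A/C/G/T string."""
    seq = raw.strip()
    seq = re.sub(rf"^5\s*[{PRIME_CHARS}]?\s*-\s*", "", seq)   # leading 5'-
    seq = re.sub(rf"\s*-\s*3\s*[{PRIME_CHARS}]?$", "", seq)   # trailing -3'
    seq = re.sub(r"\s+", "", seq).upper()
    unexpected = set(seq) - set("ACGT")
    if not seq or unexpected:
        raise ValueError(f"unexpected characters in primer: {sorted(unexpected)}")
    return seq


print(clean_primer("5\u2032-CTATTCATTCAATCATACACCCA-3\u2032"))
```

The example sequence is CFdysS1 from the primer table; substitute your own assigned primers.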

Figure 2.

2. While on the same page, scroll down to Database under “Primer Pair Specificity Checking Parameters” and change the default entry from “Refseq mRNA” to “Genome (reference sequence from selected organisms)”. Confirm that the entry for Organism is “Homo sapiens” or “9606”, the “taxid” for humans in the NCBI databases. See the blue arrow in the next screenshot (Fig 3).

Figure 3.

3. Scroll down to find the “Get Primers” button (Fig 3). Click on the button – you should see a screen like this next one (Fig 4), which indicates the database search is proceeding.

Figure 4.

Question. In what time zone is the database server located? Hint: I conducted the search at 9:33 AM HST.
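One way to approach this question is to convert the search time into other time zones and compare the results with the timestamp shown on the status page. The sketch below uses fixed UTC offsets so that it needs no time zone database; Hawaii Standard Time is UTC−10 all year. The calendar date is arbitrary, and convert_from_hst is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

HST = timezone(timedelta(hours=-10), "HST")  # Hawaii does not observe daylight saving


def convert_from_hst(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Return the clock time (HH:MM) at the given UTC offset for a time in HST."""
    # The date is arbitrary; fixed offsets make the result date-independent.
    local = datetime(2024, 1, 15, hour, minute, tzinfo=HST)
    target = timezone(timedelta(hours=utc_offset_hours))
    return local.astimezone(target).strftime("%H:%M")


# Try a few candidate offsets and compare with the time on the results page.
for offset in (0, -8, -7, -6, -5, -4):
    print(f"UTC{offset:+d}: {convert_from_hst(9, 33, offset)}")
```

Remember that most of the continental US shifts its offset by one hour during daylight saving time, so the date of the search matters when you match an offset to a named time zone.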

4. After the search is complete (and successful!), you’ll get a screen like the one below (Fig 5).

Figure 5.

Question. What is the predicted amplicon size?
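To see where a predicted amplicon size comes from, it can help to run a tiny "in silico PCR" by hand. The sketch below is not how Primer-BLAST works internally (that tool also tolerates mismatches and searches a whole genome); it only finds exact matches in a short template. The primers are CFdysS1 and CRdysS1 from the table, but the template is a made-up toy sequence, not real genomic DNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length in bp from the 5' end of the forward primer to the 5' end of the
    reverse primer, or None if either primer lacks an exact match."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand's complement, so on the
    # top strand we look for its reverse complement downstream of the forward site.
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


forward = "CTATTCATTCAATCATACACCCA"   # CFdysS1
reverse = "GATTCTTTGTGGTGGGTCTG"      # CRdysS1
# Toy template (hypothetical): primer sites flanking ten 4-bp repeats.
template = "GG" + forward + "TCTA" * 10 + reverse_complement(reverse) + "CC"

print(amplicon_length(template, forward, reverse), "bp")
```

The product includes both primers, so the length counts every base from the first base of the forward primer through the last base of the reverse primer site.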

This was an example primer search for ACTB; you’ll get a similar page for your search. Click on the link under “Products on intended target” (e.g., NC_000007.14 in this example) to bring up the sequence page (Fig 6).

Figure 6.

Question. Where in the genome is your target STR sequence? Note not only the chromosome number, but the region in the genome, i.e., from base number: ____ to base number: _____.
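When you record the region, it is worth checking that the coordinates agree with the predicted amplicon size. A minimal sketch, assuming 1-based inclusive coordinates (so both end bases are counted): the accession is the one from the ACTB example above, while the two positions are hypothetical numbers chosen only for illustration, and describe_region is a made-up helper name.

```python
def describe_region(accession: str, pos_a: int, pos_b: int) -> str:
    """Format a region and its length from two 1-based inclusive coordinates.

    The positions may be given in either order, since a primer on the
    minus strand can be reported with its larger coordinate first.
    """
    start, end = sorted((pos_a, pos_b))
    length = end - start + 1  # inclusive: both end bases are counted
    return f"{accession}:{start}-{end} ({length} bp)"


# Hypothetical positions, not taken from a real search result.
print(describe_region("NC_000007.14", 1249, 1000))
```

If the length computed this way does not match the product length reported by the search, recheck which coordinates you copied.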

/MD