The blueprint for all living organisms is contained in their DNA. Deoxyribonucleic acid (DNA) is made from four different nucleotides. Each nucleotide is composed of three parts: a sugar (deoxyribose), a phosphate group, and a nitrogenous base, which is Adenine (A), Cytosine (C), Guanine (G), or Thymine (T). DNA is a double stranded macromolecule and each of the four bases has a single complementary base: A pairs with T and C pairs with G. So if we know the sequence of one strand of the DNA we can deduce the second strand very simply. A single strand of DNA is built by joining the phosphate group (attached to the 5' carbon atom of the sugar) of a new nucleotide to the 3' carbon of the last nucleotide in the existing strand. This gives each strand polarity, and DNA sequences are conventionally written 5' to 3'. In the DNA macromolecule the two strands run antiparallel to one another. One might visualize a ladder where the sugar and phosphate portions of the nucleotides are strung together to make the sides of the ladder and the complementary bases form the rungs. The whole structure is then twisted to form the helical shape that characterizes DNA. For a thorough explanation of the history and development of the double helical structure model of DNA, watch the animation available from DNA from the Beginning.
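The pairing and polarity rules above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the example sequence is the short primer used later in this lab.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand written in the conventional 5' to 3' direction."""
    # The strands are antiparallel, so the complement must be reversed
    # before it reads 5' to 3'.
    return complement(strand_5_to_3)[::-1]


if __name__ == "__main__":
    top = "TACGTGC"  # written 5' to 3'
    print("partner, read 3' to 5':", complement(top))
    print("partner, read 5' to 3':", reverse_complement(top))
```

Because the two strands are antiparallel, the partner of a strand written 5' to 3' comes out 3' to 5' unless it is reversed, which is why two functions are shown.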
The basic double helix that everyone thinks of when they think of DNA is just a piece of the picture. The DNA double helix is organized into circular (prokaryotic) or linear (eukaryotic) chromosomes. It is these chromosomes that are very carefully replicated and passed on to new cells. The basic units of heredity, contained within the chromosomes, are genes. A gene can also be thought of as a specific sequence within the DNA that codes for information. Since the molecule of DNA is the same regardless of the source, it can be very informative to compare the sequences of homologous genes (genes that code for the same product) from different sources.
DNA is considered an informational molecule because it contains nucleotides in a sequence that can be utilized by the cell. Several methods have been employed to determine the sequence of a gene. The most common is the Sanger or dideoxy method. This method takes advantage of the principles of DNA replication to make many new copies of a DNA sequence.
The Sanger method requires a primer, a short piece of DNA that is complementary to the DNA to be sequenced. Therefore it is necessary to know at least a little of the sequence to gain more sequence information. Genes for sequencing are often identified from a DNA library (a collection of clones, or DNA fragments all within the same cloning vector). Primers that complement the vector can then be extended to sequence the inserted DNA. NOVA Online (part of PBS) has a good sequencing demo, Sequence for Yourself.
DNA sequencing utilizes an enzyme, DNA polymerase, to add nucleotides to a growing polynucleotide. This enzyme requires a free 3' OH group to add onto. The dideoxy sequencing method takes advantage of this requirement by making dideoxy nucleotides (nucleotides without the free 3' OH) available to the reaction. Each sequencing reaction requires single stranded template DNA (the DNA to be sequenced), DNA polymerase, all four nucleotides (A, C, G, and T) in equal proportion, and the dideoxy version of one of the nucleotides (A, C, G, or T) at 1/100 of the concentration of the normal nucleotide. The DNA polymerase elongates the primer by adding nucleotides which are complementary to the template strand, one by one at the 3' OH, until a dideoxy nucleotide gets added, terminating the chain.
The results are single stranded chains of polynucleotides of various lengths, corresponding to the template DNA, which terminate in a specific nucleotide.
In order to determine the positions of all four nucleotides (A, C, G, and T), four sequencing reactions are performed, one for each nucleotide.
Why is it important not to add equal proportions of the target nucleotide and its dideoxy version?
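One way to explore this question is with a small calculation. The sketch below assumes, purely for illustration, that the polymerase picks the normal and dideoxy forms of a nucleotide in proportion to their concentrations; real enzymes do not treat the two forms identically, so the numbers are indicative only. The function names are our own.

```python
def termination_probability(dd_to_normal_ratio: float) -> float:
    """Chance that a chain terminates at any one target site.

    Assumes (hypothetically) that incorporation is proportional to
    concentration, so a ratio r of dideoxy to normal gives r / (1 + r).
    """
    return dd_to_normal_ratio / (1.0 + dd_to_normal_ratio)


def fraction_still_growing(sites_passed: int, dd_to_normal_ratio: float) -> float:
    """Fraction of chains that have extended past the given number of target sites."""
    p = termination_probability(dd_to_normal_ratio)
    return (1.0 - p) ** sites_passed


if __name__ == "__main__":
    for label, ratio in [("equal proportions", 1.0), ("1/100 dideoxy", 0.01)]:
        for sites in (1, 10, 50):
            remaining = fraction_still_growing(sites, ratio)
            print(f"{label}: {remaining:.4f} of chains extend past target site {sites}")
```

Under this assumption, equal proportions leave only about one chain in a thousand growing past the tenth target site, so long fragments would be too rare to see on a gel.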
For example, if we were to look at the results of the reaction with dideoxy thymine (ddTTP), we would get chains of various lengths, terminating where the ddTTP is added (complementary to adenine in the template strand) in place of a normal T.
We start with the template and a complementary primer:
primer:   5' TACGTGC 3'
template: 3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'
Here are a few examples of the resulting complementary chains:
primer+nucleotides: 5' TACGTGCAAT 3'
template:           3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'

primer+nucleotides: 5' TACGTGCAATTGCTGACTGCAT 3'
template:           3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'

primer+nucleotides: 5' TACGTGCAATTGCT 3'
template:           3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'

primer+nucleotides: 5' TACGTGCAATT 3'
template:           3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'

primer+nucleotides: 5' TACGTGCAATTGCTGACT 3'
template:           3' ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA 5'
Note that each chain ends with the dideoxy version of T (ddTTP).
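To see where every possible chain in this reaction would end, the extension can be walked through in code. The sketch below uses the primer and template from this example; the function name and its arguments are our own, and it ignores how often each chain actually forms.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_chains(primer: str, template_3_to_5: str, dideoxy_base: str) -> list[str]:
    """List every chain (5' to 3') that could end in the given dideoxy base.

    The template is read left to right in its 3' to 5' direction, so the
    growing chain is simply its complement, built up one base at a time.
    """
    annealed = "".join(COMPLEMENT[b] for b in template_3_to_5[: len(primer)])
    if annealed != primer:
        raise ValueError("primer is not complementary to the start of the template")

    chains = []
    chain = primer
    for template_base in template_3_to_5[len(primer):]:
        chain += COMPLEMENT[template_base]
        # A chain can stop here only if the base just added is the one
        # supplied in dideoxy form.
        if chain[-1] == dideoxy_base:
            chains.append(chain)
    return chains


if __name__ == "__main__":
    primer = "TACGTGC"
    template = "ATGCACGTTAACGACTGACGTACTCGGACAGTCAGTCA"
    for chain in terminated_chains(primer, template, "T"):
        print(f"{len(chain):2d}  5' {chain} 3'")
```

The five chains shown above are among the results; the remaining ones correspond to adenines further along the template.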
Once the four sequencing reactions have been run, the resulting fragments are separated (according to their length) by electrophoresis on polyacrylamide gels that can resolve strands that are only one base different in length. The results are read from the gel, starting at the bottom.
This (very small) example gel would be read TCAGGACAGGAAAG.
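Reading a gel from the bottom amounts to sorting the bands from all four lanes by fragment length and noting which lane each band is in. The sketch below illustrates this. The band positions are not taken from the gel image; they were worked backwards from the read given above and are counted as the number of bases added beyond the primer. The function name is our own.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the sequence (5' to 3') from band positions in four lanes.

    `lanes` maps each dideoxy base to the fragment lengths seen in its lane.
    Shorter fragments run further, so sorting by length reads bottom to top.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes show a band at length {length}")
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


if __name__ == "__main__":
    # Illustrative band positions only (bases added beyond the primer).
    example_lanes = {
        "A": [3, 6, 8, 11, 12, 13],
        "C": [2, 7],
        "G": [4, 5, 9, 10, 14],
        "T": [1],
    }
    print(read_gel(example_lanes))
```

Note that the sequence read this way is that of the newly synthesized strand, not the template.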
It is important to remember that all DNA is created equal whether it codes for a gene or not. Therefore we must keep in mind the structure of a gene. Most scientists define a gene as not only the sequence of nucleotides that codes for a protein (from the start codon to the stop codon, known as the open reading frame, or ORF) but also all the sequence involved in regulation of gene expression. Many programs are available to search for open reading frames. The Institute for Cellular and Molecular Biology at the University of Texas maintains a list.
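To give a feel for what an ORF search does, here is a minimal sketch. It is not any of the programs on that list: it scans only the three reading frames of the strand it is given, assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA), and uses a function name and a `min_codons` parameter of our own choosing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq_5_to_3: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Find open reading frames in the three forward frames of a DNA sequence.

    Returns (start, end, orf) tuples using 0-based, end-exclusive positions.
    `min_codons` counts the start and stop codons as part of the ORF.
    """
    seq = seq_5_to_3.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    for start, end, orf in find_orfs("CCATGAAATAGGG"):
        print(start, end, orf)
```

A complete search would also scan the reverse complement, since a gene may lie on either strand.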
Keep in mind that only one strand of DNA serves as the template for RNA transcription for a given gene, although either strand can serve as the template for different genes. It is the complementary (non-template) strand that shares the same sequence as the mRNA (with the thymines replaced by uracils) when both are read 5' to 3'. It is this sequence that is read as containing the genetic code (three nucleotide "words" or codons that each specify an amino acid). Therefore, when considering the sequence of a gene, the first thing to think about is the polarity, or direction, of the sequence being examined, then whether the sequence is for the template strand or the complementary strand.
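The relationship between the template strand, the mRNA, and the codons can be sketched as follows. The short template is a made-up example, the function names are our own, and the amino acid string is the standard genetic code written in the usual T, C, A, G order.

```python
from itertools import product

DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') made from a template written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def translate(mrna_5_to_3: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna_5_to_3) - 2, 3):
        amino_acid = CODON_TABLE[mrna_5_to_3[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    template = "TACTTTATC"  # hypothetical template, written 3' to 5'
    mrna = transcribe(template)
    print("mRNA 5' to 3':", mrna)
    print("protein:", translate(mrna))
```

If the template were supplied 5' to 3' instead, it would have to be reversed first, which is why the polarity of a sequence is the first thing to check.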
There are many reasons to sequence a gene. Once a sequence has been obtained, a comparison of that sequence to other sequences, to establish relations or to identify novel genes, is usually in order. The US government, through the National Institutes of Health, maintains a database of sequence information known as GenBank. This database is accessed through the National Center for Biotechnology Information (NCBI) website. It is possible to search the database by looking for gene names, genetic diseases, or sequence information.
You are a research scientist interested in genes involved in DNA repair and meiosis. You know that the rad51 gene from yeast is involved in both of these processes. This gene has been identified and sequenced in several organisms and you want to identify a clone (see part 2 of RE lab) and sequence this gene in your organism, Coprinus cinereus. In order to accomplish these goals, an alignment of known sequences of the rad51 gene is made and conserved regions of sequence similarity are identified. These conserved sequences are used to design primers for the polymerase chain reaction (PCR). PCR is used to amplify a section of the rad51 gene of C. cinereus from genomic DNA. The PCR product is sequenced by the dideoxy termination method. Remember that the dideoxy method generates many copies of the sequence of interest terminating at different nucleotides; thus the products are many chains of single stranded DNA of various lengths. You've separated these fragments by electrophoresis on polyacrylamide gels, and are ready to read the results.
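The idea of picking out conserved regions from an alignment can be illustrated with a short sketch. The three aligned sequences below are invented for the example and are not rad51 sequences; the function name and the `min_length` parameter are also our own. Real primer design weighs further factors, such as melting temperature, that are not modeled here.

```python
def conserved_runs(aligned: list[str], min_length: int) -> list[tuple[int, int]]:
    """Find stretches where every aligned sequence has the same base.

    Sequences must already be aligned to equal length, with "-" for gaps.
    Returns (start, end) column ranges, 0-based and end-exclusive.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    runs = []
    run_start = None
    for col, column in enumerate(zip(*aligned)):
        conserved = len(set(column)) == 1 and column[0] != "-"
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_length:
                runs.append((run_start, col))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned[0]) - run_start >= min_length:
        runs.append((run_start, len(aligned[0])))
    return runs


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    alignment = [
        "GGTGAAGGGAAGTGTCT",
        "GGTGAAGGTAAATGTCT",
        "GGTGAAGGCAAGTGTCT",
    ]
    for start, end in conserved_runs(alignment, min_length=5):
        print(start, end, alignment[0][start:end])
```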
NCBI maintains a variety of databases relating to biotechnology, including Entrez, a retrieval system for searching these linked databases. The databases and tools include, but are not limited to, PubMed, which allows you to search for primary literature; BLAST, which allows you to search and compare nucleotide or protein sequences; and OMIM, which is a database of human genes and genetic disorders. This site also contains a Primer of concepts and techniques of molecular biology.
For this lab, we concentrate on the sequence databases.
gatagccaactgaagtatgctataggtgggattgaaactggagccattac
tgaactctttggcgagttcaggacaggaaagtcccagatatgtcatacgc
ttgctgtgacatgccaactgccagtcagcatgggaggtggtgaagggaag
tgtctatacattgacaccgaagggacgttccgccctgtgcgattgctggc
attgctgatgcttagtcatgactgatcgtagcttgactgagtcgtagtcagtcgtaatgcgtaagagtcgg
gttaacctcgaagttaaggtctgatgcccaggtacccgattgaaagtccgttagactgactcgaaaggtcg
agtcgatcgatgatgagccgtagaaggtacttgagtcgagtccgtttagactccttgatcgatcggagatc
As you can see, BLAST is a very useful tool, but does it have limitations? Consider the following: