  Articles

  by bert hubert / [email protected]

How 2019-nCoV is diagnosed: bacterium assisted DNA searching

  Posted on Feb 10 2020, 12 mins read

    This post is dedicated to lab technicians everywhere doing the
    difficult work institutes and hospitals rely on to investigate
    disease and keep us healthy. Lab work requires high precision and
    deep understanding, is physically demanding, and can even be dangerous.
    Although our healthcare systems & universities would come to a
    grinding halt without lab technicians, they are often almost
    literally invisible somewhere far away. Thank you all for your hard
    work!

  When 2019-nCoV was discovered, the world quickly determined the DNA (or
  actually RNA) of this virus. Within a few days reliable laboratory
  tests became available that were able to detect an infection in only a
  few hours. In this post I attempt to explain the magnificent technology
  used, which is called [5]Reverse Transcription Real-Time Quantitative
  Polymerase Chain Reaction or qRT-PCR.
  [ncov-dna-feature.png]

  qRT-PCR is used to detect the presence of a specific bit of 2019-nCoV
  genetic material, and if we see enough of it, we then determine that
  the patient is infected. In essence this virus test is actually a DNA
  test.

    NOTE: If you are a professional, you’ll note I take some shortcuts
    in the story. Please read on to the end, where some of these
    shortcuts are patched up. Feedback if I got it wrong is VERY welcome
    on [email protected] or [6]@PowerDNS_Bert.

  We’ve known about genes and DNA for longer than you might think. The
  first DNA sequences were painstakingly determined in the early 1970s,
  one DNA letter at a time. In 1977, the (tiny) genome of
  [7]bacteriophage φX174 was determined and published. Only around the
  year 2000 did it become possible to read “whole genomes”, and then only
  with international efforts costing billions.

  Because DNA sequencing technology was so limited, in the 1980s a lot of
  thought went into the question of how to detect specific bits of DNA
  without performing the (then) unimaginable act of reading gigabytes of
  DNA.

    Note: Thanks are due to Dr Mamnun Kahn and Erwin van Rijn for
    suggestions & feedback. Please note that all mistakes remain mine!

  Enter the Polymerase Chain Reaction (PCR).

DNA

  [8]DNA is fully digital, consisting of strings of nucleotides, which
  are small molecules. We call these small molecules A, C, G and T. Human
  DNA consists of around 3 billion nucleotides, which comprise around
  750MB of data.
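
  The 750MB figure can be checked with a few lines of Python. This is
  only a back-of-the-envelope sketch: it assumes each of the four letters
  is stored in exactly 2 bits and that a megabyte is 1,000,000 bytes.
  The function name is made up for this illustration.

```python
# Back-of-the-envelope check of the genome size quoted in the text.
NUCLEOTIDES = 3_000_000_000   # "around 3 billion", from the text
BITS_PER_NUCLEOTIDE = 2       # assumption: 4 letters (A, C, G, T) fit in 2 bits


def genome_megabytes(nucleotides: int,
                     bits_per_nt: int = BITS_PER_NUCLEOTIDE) -> float:
    """Storage needed in megabytes, with 1 MB = 1,000,000 bytes."""
    return nucleotides * bits_per_nt / 8 / 1_000_000


print(genome_megabytes(NUCLEOTIDES))  # 750.0
```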

  Because our DNA is, so to say, important, nature stores it redundantly.
  For example, the sequence ACGTTCA is actually stored like this:
<-------
ACGTTCA
|||||||
TGCAAGT
------->

  This shows the two strands of DNA, where each A finds itself opposite a
  T and every C is attached to a G. If damage occurs, it can easily be
  repaired, because the opposite side forms a template to attract
  replacement nucleotides. A/T and C/G pairs can each be compared to
  north/south pole magnets: they attract each other strongly.

  As an example, here is how a missing T and a missing G are repaired via
  their opposite sides:
<-------               <-------
ACG TCA                ACGTTCA
|||||||      -->       |||||||
T CAAGT                TGCAAGT
------->               ------->

        repair process
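
  The same repair can be sketched in Python. This is a toy model, not
  real biochemistry: strands are plain strings, a space stands for a
  missing nucleotide, and the names PAIR, complement and repair are
  invented for this example. It also assumes that the two strands are
  never damaged at the same position.

```python
# Toy model of complementary base pairing and repair.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the opposite strand, letter by letter."""
    return "".join(PAIR[n] for n in strand)


def repair(damaged: str, opposite: str) -> str:
    """Fill every gap (a space) using the nucleotide on the opposite side.

    Assumes the opposite strand is intact wherever this one has a gap.
    """
    return "".join(PAIR[o] if d == " " else d
                   for d, o in zip(damaged, opposite))


top, bottom = "ACG TCA", "T CAAGT"   # the damaged strands from the diagram
print(repair(top, bottom))           # ACGTTCA
print(repair(bottom, top))           # TGCAAGT
```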

  This redundancy also enables copying. First the two DNA strands are
  separated, leaving all the nucleotides “waiting to be repaired”.
  Repairs are then initiated, and the DNA molecule is “zipped up again”,
  leading to two functional copies of the DNA fragment.

  This ‘zipping up’ process is called the polymerase reaction, and it
  will turn out to be vital for this story. All of life utilizes the
  polymerase reaction to duplicate DNA. DNA is fully compatible, all the
  way from the lowliest virus to the mightiest tree.

  Of specific note are the little arrows drawn in the DNA diagrams in
  this post: our genetic material has a preferred direction, and it can
  only be processed in that direction. This direction is reversed on
  opposite sides of the DNA.
          1                              2      &      3

                <-------          <-------             <-------
                 ACGTTCA           ACGTTCA              ACGTTCA
             ^   |||||||    -->    |||||||     -->      |||||||
            /                      TGCA                 TGCAAGT
<-------    /                       --->                 ------->
ACGTTCA   /                                              Done!
|||||||
TGCAAGT                                              two copies now
------->  \                           <---             <-------
           \                          TTCA              ACGTTCA
            \    |||||||    -->    |||||||     -->      |||||||
             v   TGCAAGT           TGCAAGT              TGCAAGT
                 ------->          ------->             ------->

  Summarising, to copy DNA:
   1. the strands are separated (‘denatured’)
   2. new nucleotides attach themselves
   3. the two new DNA molecules get zipped up (‘polymerised’)

  The astute reader will have noted that this ‘doubling’ of DNA can be
  used to cause a chain reaction, where we first get one copy, then 2,
  then 4, then 8, 16, 32 etc.

  If we could do, say, 40 rounds of duplication we could turn a single
  stretch of DNA into a trillion copies. This would then generate enough
  DNA to detect it “by eye” if necessary!
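
  The doubling is easy to verify. The sketch below assumes a perfect
  doubling in every round, which real PCR does not achieve; the function
  name is made up for illustration.

```python
def copies_after(rounds: int, start: int = 1) -> int:
    """Number of copies after a given number of perfect doublings."""
    return start * 2 ** rounds


for rounds in (1, 2, 3, 4, 5, 40):
    print(rounds, copies_after(rounds))
# The last line printed is: 40 1099511627776 (just over a trillion)
```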

    Note: to learn more about DNA, RNA, proteins & life, the briefest of
    summaries can be found in my post [9]DNA: The Code of Life, which
    also includes links to >2 hours of video with q&a.

Adding some complication

  Nature has, as far as we know, had over 4 billion years to work on
  “[10]the architecture of life”. So it turns out that nothing is really
  simple.

  In reality, life does not randomly go about copying bits of DNA. The
  ‘polymerase’ reaction as described above does not operate on fully
  denatured (separated) strands. Polymerase is instead used to complete a
  DNA copy, starting from a bit of existing dual stranded DNA. So
  polymerases can do this very well:
<-------           <-------
ACGTTCA            ACGTTCA
|||||||    ->      |||||||
TGCA               TGCAAGT
--->               ------->

  But they can’t do this:
<-------           <-------
ACGTTCA            ACGTTCA
|||||||    ->      |||||||
                   TGCAAGT
                   ------->

  In other words, polymerase can’t operate on a single strand alone - it
  needs to start with a short double stranded bit, and then polymerase
  can continue the work. This “starter bit” of dual stranded DNA can be
  created with a [11]primer.

  In the diagram above, we can make polymerase do its work by adding some
  “TGCA” single stranded DNA as a primer. It will bind solidly to the
  ‘ACGT’ part because it is the exact ‘complement’ (remember the A/T, C/G
  “magnetism”):
<-------                      <-------            <-------
ACGTTCA                       ACGTTCA             ACGTTCA
|||||||    + TGCA     -->     |||||||    -->      |||||||
             ---->            TGCA                TGCAAGT
                              ---->               ------->
            primer                     polymerase
                                       reaction

  Herein lies the key insight - by adding a primer, we can selectively
  make denatured DNA suitable for the polymerase reaction. Because fully
  single stranded DNA can’t be copied, copies will only be made where a
  primer has attached, and the primer will only attach to the DNA we care
  about.

  Because no copying (‘polymerasing’) happens without a matching primer,
  we can use the primer as our “search term” in DNA. Ordered online &
  delivered as a bit of fluid, the primer is the selector of the DNA we
  are interested in.
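
  Seen as software, the primer really does behave like a search term.
  The sketch below follows the simplified diagrams in this post: strands
  are plain strings, direction is ignored, and the polymerase simply
  extends to the right of the primer. The function names are
  hypothetical and the model leaves out all real chemistry.

```python
from typing import List, Optional

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    return "".join(PAIR[n] for n in strand)


def binding_sites(template: str, primer: str) -> List[int]:
    """Positions in the template where the primer's complement occurs."""
    target = complement(primer)
    sites = []
    start = template.find(target)
    while start != -1:
        sites.append(start)
        start = template.find(target, start + 1)
    return sites


def polymerase(template: str, primer: str) -> Optional[str]:
    """Complete the opposite strand from the first primer site onwards.

    Returns None when the primer finds nothing to bind to, mirroring
    the rule that no copying happens without a matching primer.
    """
    sites = binding_sites(template, primer)
    if not sites:
        return None
    rest = template[sites[0] + len(primer):]
    return primer + complement(rest)


print(polymerase("ACGTTCA", "TGCA"))  # TGCAAGT
print(polymerase("ACGTTCA", "GGGG"))  # None
```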

Primers

  To detect a specific virus, primers are ordered that match up to bits
  of the viral genome. We also need to make sure we pick bits of DNA that
  are not present in other organisms as well.

  A typical primer is 20 nucleotides long, and it can be extremely
  specific. Primers can be designed that don’t just match specific
  viruses, but also specific strains.

  It may be somewhat surprising that “only” 20 letters suffice for such a
  strong match but this is due to statistics. 20 nucleotides represent 4
  to the power of 20 possibilities, which is around 1 trillion. Most
  detections rely on two primers (see below), which multiplies the
  specificity by another factor of 1 trillion.
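
  These statistics can be reproduced in a few lines. The estimate of
  chance matches assumes a genome of uniformly random, independent
  letters, which real genomes are not, so treat it as a rough
  illustration only. The function names are made up for this sketch.

```python
def possible_primers(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


def expected_chance_matches(genome_length: int, primer_length: int) -> float:
    """Expected number of exact matches of one fixed primer in a random genome.

    Assumption: every letter is independent and equally likely.
    """
    positions = genome_length - primer_length + 1
    return positions / possible_primers(primer_length)


print(possible_primers(20))                         # 1099511627776
print(expected_chance_matches(3_000_000_000, 20))   # well below 1
```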

  For robustness, multiple primers (or primer pairs) can be used so that
  a virus (or gene) can be detected even if one part of it has mutated.

Copying selected DNA in the lab

  Nature surely is clever about this copying, but can we replicate it in
  the lab? It turns out that by borrowing a bit from the bacterial
  kingdom, we can.

  The ingredients required:
   1. The DNA we are interested in (from a patient perhaps)
   2. Primer material ([12]order online)
   3. New nucleotides ([13]off the shelf)
   4. Polymerase from a [14]high-temperature bacterium ([15]off the
      shelf)
   5. Food & nutrients for the polymerase (off the shelf)

  Conveniently, we can mix all these together in a single vial. In this
  way, PCR is like a single pan recipe. If you are in a hurry, premixed
  vials containing ingredients 2, 3, 4 and 5 are available.

  First we need to separate the strands. DNA does this automatically at
  around 95°C. So we heat up a test-tube (with all the ingredients) to
  this temperature, and wait two minutes.

  Now our tube is full of single stranded DNA (and all the other
  ingredients). We then cool things down again, typically to 50-60°C.
  This makes the primer DNA bind to the single stranded DNA we are
  interested in. Because remember, the primer is the ‘search term’.

         [ncov-labels.png] 2019-nCoV labels. Photo: Sam Nicholls /
                              [16]@samstudio8

  After only 15 seconds, the primers will have bound to the right pieces
  of single stranded DNA. These bits of DNA are now ready to be copied.

  We then raise the temperature to 68°C. Why 68°C? It turns out that the
  high-temperature bacterium ([17]Thermus aquaticus aka Taq) we got the
  polymerase from does its best work at that temperature. After only 15
  seconds, ‘Taq polymerase’ will have done its copying work on the
  single-stranded DNA bits that have bound to a primer (if they are short
  bits - longer stretches require more copying time).

  With some luck many of the interesting single stranded bits of DNA have
  now been duplicated. In reality, we don’t exactly get two copies of
  every strand, but as long as we got more than one copy it is good.

  We then heat up the tube to 95°C again and restart the process. This
  cycle is repeated dozens of times. Even if we only gained 30% material
  per cycle, some 27 cycles would ‘amplify’ the relevant bit of DNA by
  three orders of magnitude.
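
  The arithmetic behind that estimate can be sketched as follows. It
  treats the 30% as the gain per temperature cycle and assumes the gain
  stays constant, which is a simplification. The function names are
  invented for this example.

```python
import math


def amplification(cycles: int, gain: float = 0.30) -> float:
    """Growth factor after a number of cycles with a fixed gain per cycle."""
    return (1 + gain) ** cycles


def cycles_needed(factor: float, gain: float = 0.30) -> int:
    """Smallest number of cycles that reaches the requested growth factor."""
    return math.ceil(math.log(factor) / math.log(1 + gain))


print(cycles_needed(1000))             # 27 cycles at 30% gain
print(cycles_needed(1000, gain=1.0))   # 10 cycles with perfect doubling
```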

Ok, then what?

  If we have done our work right, the test tube is now teeming with
  copies of the bit of DNA we are interested in. But how could we tell?
  Various mechanisms are used. Let’s say the (viral) DNA we care about
  was not present. In that case all these temperature cycles have
  achieved almost nothing: the primer material didn’t latch on to
  anything, and the polymerase had nothing to do.

  This means that we could simply measure how much DNA there now is in
  the tube, and if this is significantly more than there was before the
  copying cycles, apparently we scored a hit.

  Such detection is possible with “[18]DNA staining” which uses molecules
  that become fluorescent once they are attached to (any) double stranded
  DNA. By observing (with a suitable camera) whether the amount of light
  emitted increases over the cycles, we can tell if the PCR is actually
  multiplying DNA.
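
  A toy simulation shows why measuring the total amount of DNA works.
  All numbers here are hypothetical: the background amount, the starting
  number of target copies, the 90% gain per cycle and the "twice as
  much" decision rule are assumptions chosen for illustration, not
  values from any real test.

```python
from typing import List


def total_dna(cycles: int, target: float, background: float,
              gain: float = 0.9) -> List[float]:
    """Total DNA after each cycle; only the target DNA gets amplified."""
    amounts = []
    for _ in range(cycles):
        target *= 1 + gain
        amounts.append(background + target)
    return amounts


def scored_a_hit(before: float, after: float, factor: float = 2.0) -> bool:
    """Hypothetical rule: a hit means at least `factor` times more DNA."""
    return after > factor * before


background = 1_000_000.0   # unrelated DNA in the sample (made-up number)
infected = total_dna(40, target=10.0, background=background)
healthy = total_dna(40, target=0.0, background=background)

print(scored_a_hit(background + 10.0, infected[-1]))  # True
print(scored_a_hit(background, healthy[-1]))          # False
```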

More precision

  If we look at the [19]US CDC PCR information for 2019-nCoV, we find
  that it lists primer information, but also a “probe”:
  [ncov-primers.png]

  We’ll get to why there are two primers later, but the third line is the
  interesting one. A probe is again a bit of single-stranded DNA, like
  the primers, but it comes with a light generating molecule attached. If
  it manages to bind to a piece of (complementary) DNA, it becomes
  fluorescent.

  If we again, like with the DNA staining, measure how the amount of
  light increases during PCR, we get a very precise confirmation that 1)
  the PCR process is amplifying something and 2) it is actually the DNA
  we were expecting.

  (Note that probes are actually slightly more complicated than this -
  they bind to DNA, but during polymerasing get dislodged & then split
  up. It is this splitting that causes the fluorescence).

But why are there two primers?

  As noted, DNA has two strands, and it can only be copied in one
  direction. Here is some actual 2019-nCoV DNA:
   ------------------------------------------------------------------------>
A:  GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B:  CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
  <------------------------------------------------------------------------

  If we heat this bit of DNA up so it denatures, we end up with two
  single strands:
    ------------------------------------------------------------------------>
A:   GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B:   CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------

  The first primer for 2019-nCoV is: GACCCCAAAATCAGCGAAAT, and we can
  indeed see that this stretch of the 2019-nCoV genome starts with that
  string. This means it would bind here, and initiate the polymerase
  reaction.
    ------------------------------------------------------------------------>
    GACCCCAAAATCAGCGAAAT
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B:   CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT

  We would then have taken our original bit of DNA, denatured it into two
  single stranded stretches, and copied one of these into a
  double-stranded whole again. This is not amplification: we started with
  one double-stranded bit of DNA, and we also ended up with one!

  So to actually make things work, we need a second primer for the other
  single-stranded stretch we produced.
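
  Counting strands makes the difference visible. In this idealised
  sketch every strand that has a matching primer gets copied perfectly
  in each cycle, which real PCR does not manage. The names cycle and run
  are made up for this example.

```python
from typing import Tuple


def cycle(a: int, b: int, forward: bool, reverse: bool) -> Tuple[int, int]:
    """One idealised PCR cycle on a pool of single A and B strands.

    The forward primer binds B strands, so each B yields a new A.
    The reverse primer binds A strands, so each A yields a new B.
    """
    new_a = b if forward else 0
    new_b = a if reverse else 0
    return a + new_a, b + new_b


def run(cycles: int, forward: bool, reverse: bool) -> int:
    """Total number of strands after starting from one A and one B."""
    a = b = 1
    for _ in range(cycles):
        a, b = cycle(a, b, forward, reverse)
    return a + b


print(run(10, forward=True, reverse=False))  # 12: one extra strand per cycle
print(run(10, forward=True, reverse=True))   # 2048: doubling every cycle
```

  With a single primer the growth is only linear, so in practice nothing
  detectable builds up. With both primers every new strand becomes a
  template in the next cycle, which is what makes it a chain reaction.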

  The second 2019-nCoV primer is: TCTGGTTACTGCCAGTTGAATCTG, and lo, it
  matches the other single strand (once we “reverse complement” it):
A:   GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                    GTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------

  With this primer, polymerase can create the second double-stranded copy
  of DNA.

  As a bonus, the CDC page also lists the DNA for the ‘probe’:
  ACCCCGCATTACGTTTGGTGGACC, which we indeed find in the middle:
                          ACCCCGCATTACGTTTGGTGGACC
    ------------------------------------------------------------------------>
A:   GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B:   CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------
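
  All of this can be checked mechanically. The sketch below uses only
  the sequences quoted in this post (strand A, both primers and the
  probe) and plain string searching. The variable and function names are
  my own.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

STRAND_A = ("GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACC"
            "CTCAGATTCAACTGGCAGTAACCAGA")
FORWARD_PRIMER = "GACCCCAAAATCAGCGAAAT"
REVERSE_PRIMER = "TCTGGTTACTGCCAGTTGAATCTG"
PROBE = "ACCCCGCATTACGTTTGGTGGACC"


def reverse_complement(seq: str) -> str:
    """Complement every letter and read the result backwards."""
    return "".join(PAIR[n] for n in reversed(seq))


print(len(STRAND_A))                                       # 72
print(STRAND_A.find(FORWARD_PRIMER))                       # 0
print(STRAND_A.find(PROBE))                                # 22
print(STRAND_A.find(reverse_complement(REVERSE_PRIMER)))   # 48
```

  The forward primer sits at the very start, the probe in the middle,
  and the reverse complement of the second primer covers the last 24
  letters, so together they bracket this 72 letter stretch.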

One final twist

  At the very beginning of this article we noted the impressive name of
  this technique: [20]Reverse Transcription Real-Time Quantitative
  Polymerase Chain Reaction or qRT-PCR.

  It turns out that 2019-nCoV is actually an RNA virus. DNA and RNA both
  carry genetic material. The techniques described above only work on DNA
  and not on RNA. Luckily nature has supplied us with an enzyme called
  ‘reverse transcriptase’. Once added to RNA, this produces the DNA
  variant of the same genetic material. This is the ‘Reverse
  transcription’ part of the name.
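
  As a rough sketch, reverse transcription can be modelled as one more
  string operation. Two things here go beyond what is explained above
  and are stated as assumptions of the model: RNA uses the letter U
  where DNA uses T, and the enzyme builds the complementary DNA strand
  (often called cDNA), written here in its own reading direction. The
  function name is made up.

```python
# Pairing rules from an RNA template to the DNA strand built on it.
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA, read in its own direction."""
    return "".join(RNA_TO_DNA[n] for n in reversed(rna))


# The first primer region from this post, written as RNA (T replaced by U).
rna = "GACCCCAAAAUCAGCGAAAU"
print(reverse_transcribe(rna))  # ATTTCGCTGATTTTGGGGTC
```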

That doesn’t sound so hard!

  Well.. yes and no. We glossed over many important details. For example,
  how do you actually gain access to the DNA? This requires mechanical
  preprocessing. In addition, once the DNA has been released, we need to
  make sure that we extract it, and only it, and add it to the PCR vial.
  You can’t just put some snot in there and expect it to work (although
  it might, which might not be what you want).

  In addition, for this to be useful and reliable, great care must be
  taken not to (cross-)contaminate samples. DNA is everywhere and can
  easily end up in places where it should not be. From my own DNA
  research, I fondly recall my system detecting ‘human or monkey DNA’ in
  every sample we tried, even though these were supposed to be bacterial
  samples.

  We also sort of glossed over how we pick primers to detect specific
  diseases or organisms. It turns out that primer design is also somewhat
  of an art, and just picking some DNA will not end well.

  So in short, although in this page I may have explained the basics of
  qRT-PCR, realize that people take multi-year courses to learn how to do
  this well.

  I do hope that you found this entertaining, and as ever, feedback is
  very welcome on [email protected] or [21]@PowerDNS_Bert.

  [22]Previous post: Amateur SARS/2019-NCoV RNA Comparison


  © 2014-2020 bert hubert

References

  Visible links
  1. https://berthub.eu/articles/posts/dna-grep-2019-ncov/#main-menu
  2. https://berthub.eu/articles/
  3. https://berthub.eu/
  4. https://berthub.eu/articles/posts/dna-grep-2019-ncov/#content
  5. https://en.wikipedia.org/wiki/Reverse_transcription_polymerase_chain_reaction
  6. https://twitter.com/PowerDNS_Bert
  7. https://en.wikipedia.org/wiki/Phi_X_174
  8. https://ds9a.nl/amazing-dna
  9. https://berthub.eu/articles/posts/dna-the-code-of-life/
 10. https://berthub.eu/articles/posts/what-is-life/
 11. https://en.wikipedia.org/wiki/Primer_(molecular_biology)
 12. https://www.eurofinsgenomics.eu/en/dna-rna-oligonucleotides/optimised-application-oligos/q-pcr-primer/
 13. https://www.sigmaaldrich.com/catalog/product/roche/dntpmro
 14. https://en.wikipedia.org/wiki/Taq_polymerase
 15. https://www.sigmaaldrich.com/catalog/product/roche/aptatro
 16. https://twitter.com/samstudio8
 17. https://en.wikipedia.org/wiki/Thermus_aquaticus
 18. https://www.geneon.net/products/fluorescent-dyes/evagreen-fluorescent-dna-stain-50x/
 19. https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html
 20. https://en.wikipedia.org/wiki/Reverse_transcription_polymerase_chain_reaction
 21. https://twitter.com/PowerDNS_Bert
 22. https://berthub.eu/articles/posts/sars-ncov-comparison/
 23. https://github.com/ahuPowerDNS
 24. https://twitter.com/@PowerDNS_Bert
 25. mailto:[email protected]
 26. https://linkedin.com/in/bert-hubert-b05452
