by bert hubert / [email protected]

How 2019-nCoV is diagnosed: bacterium assisted DNA searching

Posted on Feb 10 2020, 12 mins read

This post is dedicated to lab technicians everywhere doing the
difficult work institutes and hospitals rely on to investigate
disease and keep us healthy. Lab work requires high precision and
deep understanding, is physically demanding, and can even be
dangerous. Although our healthcare systems & universities would come
to a grinding halt without lab technicians, they are often almost
literally invisible, working somewhere far away. Thank you all for
your hard work!

When 2019-nCoV was discovered, the world quickly determined the DNA (or
actually RNA) of this virus. Within a few days reliable laboratory
tests became available that were able to detect an infection in only a
few hours. In this post I attempt to explain the magnificent technology
used, which is called [5]Reverse Transcription Real-Time Quantitative
Polymerase Chain Reaction or qRT-PCR.
[ncov-dna-feature.png]

qRT-PCR is used to detect the presence of a specific bit of 2019-nCoV
genetic material, and if we see enough of it, we then determine that
the patient is infected. In essence this virus test is actually a DNA
test.

NOTE: If you are a professional, you’ll note I take some shortcuts
in the story. Please read on to the end, where some of these
shortcuts are patched up. Feedback if I got it wrong is VERY welcome
on [email protected] or [6]@PowerDNS_Bert.

We’ve known about genes and DNA for longer than you might think. The
first DNA sequences were painstakingly determined in the early 1970s,
one DNA letter at a time. In 1977, the (tiny) genome of
[7]bacteriophage φX174 was determined and published. Only around the
year 2000 did it become possible to read “whole genomes”, and then only
with international efforts costing billions.

Because DNA sequencing technology was so limited, in the 1980s a lot of
thought was put into the question of how to detect specific bits of DNA
without performing the (then) unimaginable act of reading gigabytes of
DNA.

Note: Thanks are due to Dr Mamnun Kahn and Erwin van Rijn for
suggestions & feedback. Please note that all mistakes remain mine!

Enter the Polymerase Chain Reaction (PCR).

DNA

[8]DNA is fully digital, consisting of strings of nucleotides, which
are small molecules. We call these small molecules A, C, G and T. Human
DNA consists of around 3 billion nucleotides, which comprise around
750MB of data.
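
The 750MB figure follows from the four-letter alphabet: each nucleotide can be stored in two bits. A quick check in Python, assuming the round figure of 3 billion nucleotides and decimal megabytes (both simplifications):

```python
import math

NUCLEOTIDES = 3_000_000_000   # approximate size of the human genome
ALPHABET = "ACGT"

# Four possible letters means log2(4) = 2 bits per nucleotide.
bits_per_nucleotide = math.log2(len(ALPHABET))
total_bytes = NUCLEOTIDES * bits_per_nucleotide / 8
megabytes = total_bytes / 1_000_000   # decimal megabytes (an assumption)

print(f"{bits_per_nucleotide:.0f} bits per nucleotide, {megabytes:.0f} MB in total")
```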

Because our DNA is, so to say, important, nature stores it redundantly.
For example, the sequence ACGTTCA is actually stored like this:
<-------
ACGTTCA
|||||||
TGCAAGT
------->

This shows the two strands of DNA, where each A finds itself opposite a
T and every C is attached to a G. If damage occurs, it can easily be
repaired, because the opposite side forms a template to attract
replacement nucleotides. A/T and C/G pairs can each be compared to
north/south pole magnets: they attract each other strongly.
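
The pairing rule is simple enough to write down as a lookup table. This is a minimal sketch; the names `PAIRS` and `complement` are made up for illustration, and the strands are written aligned letter by letter as in the diagram above:

```python
# Each nucleotide and the partner it sits opposite to.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the opposite strand, letter by letter (A<->T, C<->G)."""
    return "".join(PAIRS[n] for n in strand)

print(complement("ACGTTCA"))  # TGCAAGT
```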

As an example, here a missing T and a missing G are repaired via their
opposite sides:
<-------           <-------
ACG TCA            ACGTTCA
|||||||    -->     |||||||
T CAAGT            TGCAAGT
------->           ------->

repair process
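
As a rough sketch of the repair idea (not of the real biochemistry), the gaps from the diagram can be filled in from the opposite strand. The `repair` helper and the use of a space to mark a missing nucleotide are illustrative choices, and the sketch assumes that both strands are never damaged at the same position:

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def repair(damaged: str, opposite: str) -> str:
    """Fill gaps (spaces) in `damaged` with the partner of the letter opposite."""
    return "".join(
        PAIRS[o] if d == " " else d
        for d, o in zip(damaged, opposite)
    )

top, bottom = "ACG TCA", "T CAAGT"
fixed_top = repair(top, bottom)      # the missing T is restored
fixed_bottom = repair(bottom, top)   # the missing G is restored
print(fixed_top)
print(fixed_bottom)
```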

This redundancy also enables copying. First the two DNA strands are
separated, leaving all the nucleotides “waiting to be repaired”.
Repairs are then initiated, and the DNA molecule is “zipped up again”,
leading to two functional copies of the DNA fragment.

This ‘zipping up’ process is called the polymerase reaction, and it
will turn out to be vital for this story. All of life utilizes the
polymerase reaction to duplicate DNA. DNA is fully compatible all the
way from the lowliest virus to the mightiest tree.

Of specific note are the little arrows drawn in the DNA diagrams in
this post: our genetic material has a preferred direction, and it can
only be processed in that direction. This direction is reversed on
opposite sides of the DNA.
                   1                     2 & 3

                <-------          <-------          <-------
                ACGTTCA           ACGTTCA           ACGTTCA
             ^  |||||||    -->    |||||||    -->    |||||||
            /                     TGCA              TGCAAGT
<-------   /                      --->              ------->
ACGTTCA   /                                                    Done!
|||||||
TGCAAGT                                                        two copies now
------->  \                          <---           <-------
           \                         TTCA           ACGTTCA
            \   |||||||    -->    |||||||    -->    |||||||
             v  TGCAAGT           TGCAAGT           TGCAAGT
                ------->          ------->          ------->

Summarising, to copy DNA:
1. the strands are separated (‘denatured’)
2. new nucleotides attach themselves
3. the two new DNA molecules get zipped up (‘polymerised’)

The astute reader will have noted that this ‘doubling’ of DNA can be
used to cause a chain reaction, where we first get one copy, then 2,
then 4, then 8, 16, 32 etc.

If we could do, say, 40 rounds of duplication we could turn a single
stretch of DNA into a trillion copies. This would then generate enough
DNA to detect it “by eye” if necessary!
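
The arithmetic of the chain reaction is easy to check. A small sketch, assuming every round doubles the material perfectly (which, as discussed later, real reactions do not):

```python
copies = 1
for cycle in range(1, 41):
    copies *= 2   # every molecule is split and both halves are completed
    if cycle in (1, 2, 3, 10, 20, 30, 40):
        print(f"after {cycle:2d} rounds: {copies:,} copies")
```

The last line printed shows a bit over a trillion copies after 40 rounds.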

Note: to learn more about DNA, RNA, proteins & life, the briefest of
summaries can be found in my post [9]DNA: The Code of Life, which
also includes links to >2 hours of video with q&a.

Adding some complication

Nature has, as far as we know, had over 4 billion years to work on
“[10]the architecture of life”. So it turns out that nothing is really
simple.

In reality, life does not randomly go about copying bits of DNA. The
‘polymerase’ reaction as described above does not operate on fully
denatured (separated) strands. Polymerase is instead used to complete a
DNA copy, starting from a bit of existing dual stranded DNA. So
polymerases can do this very well:
<-------        <-------
ACGTTCA         ACGTTCA
|||||||   ->    |||||||
TGCA            TGCAAGT
--->            ------->

But they can’t do this:
<-------        <-------
ACGTTCA         ACGTTCA
|||||||   ->    |||||||
                TGCAAGT
                ------->

In other words, polymerase can’t operate on a single strand alone - it
needs to start with a short double stranded bit, and then polymerase
can continue the work. This “starter bit” of dual stranded DNA can be
created with a [11]primer.

In the diagram above, we can make polymerase do its work by adding some
“TGCA” single stranded DNA as a primer. It will bind solidly to the
‘ACGT’ part because it is the exact ‘complement’ (remember the A/T, C/G
“magnetism”):
<-------                   <-------          <-------
ACGTTCA                    ACGTTCA           ACGTTCA
|||||||   +   TGCA   -->   |||||||    -->    |||||||
              ---->        TGCA              TGCAAGT
                           ---->             ------->
             primer                          polymerase
                                             reaction

Herein lies the key insight - by adding a primer, we can selectively
make denatured DNA suitable for the polymerase reaction. Because fully
single stranded DNA can’t be copied, copies will only be made where a
primer has attached, and it will only attach to the DNA we care about.

Because no copying (‘polymerasing’) happens without a matching primer,
we can use the primer as our “search term” in DNA. Ordered online &
delivered as a bit of fluid, the primer is the selector of the DNA we
are interested in.
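
To make the “search term” idea concrete, here is a toy model of primer binding as a string search. The function name `binding_sites` is hypothetical, and, like the small diagrams in this post, the sketch ignores strand direction and only asks where the primer pairs up letter for letter:

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def binding_sites(template: str, primer: str) -> list[int]:
    """Positions where `primer` pairs letter-for-letter with `template`."""
    target = "".join(PAIRS[n] for n in primer)   # the stretch the primer sticks to
    sites = []
    pos = template.find(target)
    while pos != -1:
        sites.append(pos)
        pos = template.find(target, pos + 1)
    return sites

print(binding_sites("ACGTTCA", "TGCA"))   # [0]: binds to the 'ACGT' part
print(binding_sites("ACGTTCA", "GGGG"))   # []: no match, so nothing gets copied
```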

Primers

To detect a specific virus, primers are ordered that match up to bits
of the viral genome. We need to make sure, though, that we pick DNA
that is not also present in other organisms.

A typical primer is 20 nucleotides long, and it can be extremely
specific. Primers can be designed that don’t just match specific
viruses, but also specific strains.

It may be somewhat surprising that “only” 20 letters suffice for such a
strong match but this is due to statistics. 20 nucleotides represent 4
to the power of 20 possibilities, which is around 1 trillion. Most
detections rely on two primers (see below), which multiplies the
specificity by another factor of 1 trillion.
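
These numbers can be reproduced in a few lines. The estimate of accidental matches below is an added back-of-the-envelope figure that assumes a genome of uniformly random letters, which real DNA is not:

```python
PRIMER_LENGTH = 20
GENOME_LENGTH = 3_000_000_000   # the human genome size quoted earlier

possibilities = 4 ** PRIMER_LENGTH        # distinct 20-letter sequences
pair_possibilities = possibilities ** 2   # two independent 20-letter primers

# Expected number of purely accidental matches for one primer
# in a random genome of that size (a simplifying assumption).
chance_hits = GENOME_LENGTH / possibilities

print(f"{possibilities:,} possible primers")
print(f"{pair_possibilities:.2e} possible primer pairs")
print(f"about {chance_hits:.4f} accidental matches expected")
```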

For robustness, multiple primers (or primer pairs) can be used so that
a virus (or gene) can be detected even if one part of it has mutated.

Copying selected DNA in the lab

Nature surely is clever about this copying, but can we replicate it in
the lab? It turns out that by borrowing a bit from the bacterial
kingdom, we can.

The ingredients required:
1. The DNA we are interested in (from a patient perhaps)
2. Primer material ([12]order online)
3. New nucleotides ([13]off the shelf)
4. Polymerase from a [14]high-temperature bacterium ([15]off the
shelf)
5. Food & nutrients for the polymerase (off the shelf)

Conveniently, we can mix all these together in a single vial. In this
way, PCR is like a single pan recipe. If you are in a hurry, premixed
vials containing 2, 3, 4 and 5 are available.

First we need to separate the strands. DNA does this automatically at
around 95°C. So we heat up a test-tube (with all the ingredients) to
this temperature, and wait two minutes.

Now our tube is full of single stranded DNA (and all the other
ingredients). We then cool things down again, typically to 50-60°C.
This makes the primer DNA bind to the single stranded DNA we are
interested in. Because remember, the primer is the ‘search term’.

[ncov-labels.png] 2019-nCoV labels. Photo: Sam Nicholls /
[16]@samstudio8

After only 15 seconds, the primers will have bound to the right pieces
of single stranded DNA. These bits of DNA are now ready to be copied.

We then raise the temperature to 68°C. Why 68°C? It turns out that the
high-temperature bacterium ([17]Thermus aquaticus aka Taq) we got the
polymerase from does its best work at that temperature. After only 15
seconds, ‘Taq polymerase’ will have done its copying work on the
single-stranded DNA bits that have bound to a primer (if they are short
bits - longer stretches require more copying time).

With some luck many of the interesting single stranded bits of DNA have
now been duplicated. In reality, we don’t exactly get two copies of
every strand, but as long as we got more than one copy it is good.

We then heat up the tube to 95°C again and restart the process. This
cycle is repeated dozens of times. Even if we only gained 30% material
per cycle, 27 cycles would already have ‘amplified’ the relevant bit of
DNA by three orders of magnitude.
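
The effect of imperfect copying can be worked out with simple compound growth. This is a sketch under the assumption that every cycle adds the same fraction of new material; the function names are illustrative:

```python
import math

def amplification(gain_per_cycle: float, cycles: int) -> float:
    """Factor by which the target DNA has grown after `cycles` rounds."""
    return (1 + gain_per_cycle) ** cycles

def cycles_needed(gain_per_cycle: float, factor: float) -> int:
    """Smallest number of cycles that reaches at least `factor`."""
    return math.ceil(math.log(factor) / math.log(1 + gain_per_cycle))

print(cycles_needed(0.30, 1_000))   # 27 cycles at a 30% gain per cycle
print(cycles_needed(1.00, 1_000))   # 10 cycles with perfect doubling
print(f"{amplification(0.30, 40):,.0f}x after 40 cycles at 30%")
```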

Ok, then what?

If we have done our work right, the test tube is now teeming with
copies of the bit of DNA we are interested in. But how could we tell?
Various mechanisms are used. Let’s say the (viral) DNA we care about
was not present. In that case all these temperature cycles have
achieved almost nothing - the primer material didn’t latch on to
anything, and the polymerase had nothing to do.

This means that we could simply measure how much DNA there now is in
the tube, and if this is significantly more than there was before the
copying cycles, apparently we scored a hit.

Such detection is possible with “[18]DNA staining”, which uses molecules
that become fluorescent once they are attached to (any) double stranded
DNA. By observing (with a suitable camera) whether the amount of light
emitted increases over the cycles, we can tell if the PCR is actually
multiplying DNA.

More precision

If we look at the [19]US CDC PCR information for 2019-nCoV, we find
that it lists primer information, but also a “probe”:
[ncov-primers.png]

We’ll get to why there are two primers later, but the third line is the
interesting one. A probe is again a bit of single-stranded DNA, like
the primers, but it comes with a light generating molecule attached. If
it manages to bind to a piece of (complementary) DNA, it becomes
fluorescent.

If we again, like with the DNA staining, measure how the amount of
light increases during PCR, we get a very precise confirmation that the
PCR process is 1) amplifying something and 2) amplifying the DNA we
were expecting.

(Note that probes are actually slightly more complicated than this -
they bind to DNA, but during polymerasing get dislodged & then split
up. It is this splitting that causes the fluorescence).

But why are there two primers?

As noted, DNA has two strands, and it can only be copied in one
direction. Here is some actual 2019-nCoV DNA:
   ------------------------------------------------------------------------>
A: GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B: CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------

If we heat this bit of DNA up so it denatures, we end up with two
single strands:
   ------------------------------------------------------------------------>
A: GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B: CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------

The first primer for 2019-nCoV is: GACCCCAAAATCAGCGAAAT, and we can
indeed see that this stretch of the 2019-nCoV genome starts with that
string. This means it would bind here, and initiate the polymerase
reaction.
   ------------------------------------------------------------------------>
   GACCCCAAAATCAGCGAAAT
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B: CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT

We would then have taken our original bit of DNA, denatured it into two
single stranded stretches, and copied one of these into a
double-stranded whole again. This is not amplification: we started with
one double-stranded bit of DNA, and we also ended up with one!

So to actually make things work, we need a second primer for the other
single-stranded stretch we produced.

The second 2019-nCoV primer is: TCTGGTTACTGCCAGTTGAATCTG, and lo, it
matches the other single strand (once we “reverse complement” it):
A: GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                   GTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------

With this primer, polymerase can create the second double-stranded copy
of DNA.

As a bonus, the CDC page also lists the DNA for the ‘probe’. It is
ACCCCGCATTACGTTTGGTGGACC, which we indeed find in the middle:
                         ACCCCGCATTACGTTTGGTGGACC
   ------------------------------------------------------------------------>
A: GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGCAGTAACCAGA
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B: CTGGGGTTTTAGTCGCTTTACGTGGGGCGTAATGCAAACCACCTGGGAGTCTAAGTTGACCGTCATTGGTCT
   <------------------------------------------------------------------------
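
The matches shown in this section can be verified with a plain string search. The sketch below uses only the sequences quoted above; `reverse_complement` is a small helper written for this example, not part of any lab software:

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Complement every letter and read the result backwards."""
    return "".join(PAIRS[n] for n in reversed(seq))

STRAND_A = ("GACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGG"
            "TGGACCCTCAGATTCAACTGGCAGTAACCAGA")
FORWARD = "GACCCCAAAATCAGCGAAAT"       # first primer
REVERSE = "TCTGGTTACTGCCAGTTGAATCTG"   # second primer
PROBE = "ACCCCGCATTACGTTTGGTGGACC"

forward_at = STRAND_A.find(FORWARD)
reverse_at = STRAND_A.find(reverse_complement(REVERSE))
probe_at = STRAND_A.find(PROBE)

print(len(STRAND_A))   # 72 letters in this stretch
print(forward_at)      # 0: the very start of strand A
print(reverse_at)      # 48: the last 24 letters of strand A
print(probe_at)        # 22: in between the two primers
```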

One final twist

At the very beginning of this article we noted the impressive name of
this technique: [20]Reverse Transcription Real-Time Quantitative
Polymerase Chain Reaction or qRT-PCR.

It turns out that 2019-nCoV is actually an RNA virus. DNA and RNA both
carry genetic material. The techniques described above only work on DNA
and not on RNA. Luckily nature has supplied us with an enzyme called
‘reverse transcriptase’. Once added to RNA, this produces the DNA
variant of the same genetic material. This is the ‘Reverse
transcription’ part of the name.
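
As a toy illustration of what reverse transcription produces: RNA uses the letter U where DNA uses T, and the enzyme builds the complementary DNA strand. The sketch below writes the result aligned letter by letter, as in the diagrams of this post; the function name is made up for the example:

```python
# RNA letter -> the DNA letter that pairs with it.
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}

def reverse_transcribe(rna: str) -> str:
    """Build the complementary DNA strand for an RNA template, letter by letter."""
    return "".join(RNA_TO_DNA[n] for n in rna)

rna = "GACCCCAAAAUCAGCGAAAU"   # the start of strand A, written as RNA
cdna = reverse_transcribe(rna)
print(cdna)                    # CTGGGGTTTTAGTCGCTTTA, the start of strand B
```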

That doesn’t sound so hard!

Well.. yes and no. We glossed over many important details. For example,
how do you actually gain access to the DNA? This requires mechanical
preprocessing. In addition, once the DNA has been released, we need to
make sure that we extract it, and only it, and add it to the PCR vial.
You can’t just put some snot in there and expect it to work (although
it might, which might not be what you want).

In addition, for this to be useful and reliable, great care must be
taken not to (cross-)contaminate samples. DNA is everywhere and can
easily end up in places where it should not be. From my own DNA
research, I fondly recall my system detecting ‘human or monkey DNA’ in
every sample we tried, even though these were supposed to be bacterial
samples.

We also sort of glossed over how we pick primers to detect specific
diseases or organisms. It turns out that primer design is also somewhat
of an art, and just picking some DNA will not end well.

So in short, although in this page I may have explained the basics of
qRT-PCR, realize that people take multi-year courses to learn how to do
this well.

I do hope that you found this entertaining, and as ever, feedback is
very welcome on [email protected] or [21]@PowerDNS_Bert.

[22]Previous post: Amateur SARS/2019-NCoV RNA Comparison


© 2014-2020 bert hubert

References

Visible links
1. https://berthub.eu/articles/posts/dna-grep-2019-ncov/#main-menu
2. https://berthub.eu/articles/
3. https://berthub.eu/
4. https://berthub.eu/articles/posts/dna-grep-2019-ncov/#content
5. https://en.wikipedia.org/wiki/Reverse_transcription_polymerase_chain_reaction
6. https://twitter.com/PowerDNS_Bert
7. https://en.wikipedia.org/wiki/Phi_X_174
8. https://ds9a.nl/amazing-dna
9. https://berthub.eu/articles/posts/dna-the-code-of-life/
10. https://berthub.eu/articles/posts/what-is-life/
11. https://en.wikipedia.org/wiki/Primer_(molecular_biology)
12. https://www.eurofinsgenomics.eu/en/dna-rna-oligonucleotides/optimised-application-oligos/q-pcr-primer/
13. https://www.sigmaaldrich.com/catalog/product/roche/dntpmro
14. https://en.wikipedia.org/wiki/Taq_polymerase
15. https://www.sigmaaldrich.com/catalog/product/roche/aptatro
16. https://twitter.com/samstudio8
17. https://en.wikipedia.org/wiki/Thermus_aquaticus
18. https://www.geneon.net/products/fluorescent-dyes/evagreen-fluorescent-dna-stain-50x/
19. https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html
20. https://en.wikipedia.org/wiki/Reverse_transcription_polymerase_chain_reaction
21. https://twitter.com/PowerDNS_Bert
22. https://berthub.eu/articles/posts/sars-ncov-comparison/
23. https://github.com/ahuPowerDNS
24. https://twitter.com/@PowerDNS_Bert
25. mailto:[email protected]
26. https://linkedin.com/in/bert-hubert-b05452
