# 2024-08-12 - How To Spot The Truth
## 1 INTRODUCTION
'Truth' is under attack, more so now than ever before, and for many
reasons, one of which is social media. We hear and read remarkable,
often preposterous claims from many sources. These may come from political
debate, the presentation of new products, or new health-enhancing
exercises ranging from hot-water pools to cold-water swimming. They are
frequently presented as 'scientific findings', often in 'new studies
have shown' stories underpinned by 'expert' opinion, and they are
amplified in the media until the next fad comes along.
This pervasive form of persuasion is a war of beliefs, which in many
cases contradicts accepted knowledge. It is possible, indeed likely,
that the people behind some of the more absurd claims are not properly
aware of current scientific understanding; their claims may then be
logical, but based on incorrect assumptions. Flat earthers have a
consistent world view, which is probably logical to them; it just is
not compatible with other known facts. But truth is the first casualty
of war, and now more than ever we must equip ourselves and others with
the skills needed to judge the validity of the information we are
presented with.
This is not as simple as it might appear. The context is
all-important. Interestingly, there are far fewer exact rules, firm
guidelines and cut-off levels for establishing the truth than people
might imagine. Scientific knowledge is rarely expressed in terms of
absolute validity; rather, a finding 'fits' or 'is not inconsistent
with' what we know already, or is 'suitable for predicting
performance'. For example, we now know from Einstein's general
relativity that gravity curves space and time, and even bends light;
yet Newton's simpler approximation, which ignores this, has taken
astronauts to the moon and back (sorry, flat earthers). In addition,
although statisticians use words consistently and exactly, they do not
use words such as 'population' and 'sample' in the way they are used in
general parlance. Nor is the logic of statistics straightforward. For
example, the most commonly used significance tests start from the
assumption 'if, and only if, these random samples were drawn from a
single population, then…' and ask how likely the observed results would
be under that assumption. Logical and consistent, yes, but not well
understood, even by some scientists. In one study, trainee doctors, who
should be reading this sort of stuff all the time, were given a simple
statement of such a test result. When asked to choose the correct
conclusion out of four possibilities, almost half made the wrong choice
(Windish et al., 2007).
https://jamanetwork.com/journals/jama/fullarticle/208638
## 2 WHY IS GETTING AS CLOSE AS POSSIBLE TO THE TRUTH IMPORTANT?
The truth helps you make 'adequately correct' decisions and act
accordingly. Such decisions depend on the situation, and on the
consequences of getting them right or wrong. Uncertainty doesn't mean
we know nothing, or that anything could be true: it just means you
don't bet your house on an outsider.
Some years ago, a district court decided that a particular vaccine
was responsible for an adverse outcome (which was scientifically
doubtful). This triggered a disastrous decrease in child vaccinations
against a whole range of diseases. A later study showed convincingly
that the spread of the faulty conclusion was related to broadband
internet access: the more broadband, the greater the decrease in
vaccinations (Carrieri et al., 2019).
https://onlinelibrary.wiley.com/doi/10.1002/hec.3937
In another case, however, the US Supreme Court rejected a
manufacturer's defence that, because there were insufficient data to
meet the usual scientific criteria (statistical significance) for a
causal link between its drug and a serious, but rare, adverse event, it
did not need to disclose reports of that event. The court was unwilling
to treat this statistical threshold as decisive, preferring to heed the
reports of infrequent, but important, adverse events after use of the
drug, and so allowed the claim for damages against the manufacturer to
proceed (Matrixx Initiatives, Inc. et al. v. Siracusano et al., 2011).
https://supreme.justia.com/cases/federal/us/563/27/
Here, we shall try to show the reader the processes applied in
scientific evaluation, in the hope that you can apply them in your
day-to-day decision-making. Facts don't speak for themselves: context
is vital. An experienced scientist who 'knows the ropes' is better
placed to use their knowledge, experience and judgement to tease out
the full story. The central question is not 'can we be certain?', but
rather 'can we process this information and adjust our ideas?'
Uncertainty is always present, but we may be able to be 'confidently
uncertain'.
## 3 A CHECKLIST FOR TRUTH
(ELEMENTS OF THE CONTEXT AND QUESTIONS THAT SHOULD BE ASKED OF ANY
CLAIM)
* Who is making the statement, and what is their qualification for
making it?
* What was the original question? Has it been correctly framed?
* What is the underpinning evidence for the statement? What is the
provenance of the supporting data? Where has it been published? Are
there alternative explanations? Have these been explored, and how
plausible are they?
* Has the best measure been used? The best way to express 'typical'
is usually the median value, as used by the Office for National
Statistics. However, many reports use the average, which can be
far from the same thing and can make, for example, the 'typical'
person appear better off. If we put incomes in order of size, from
the least to the greatest, the median is the middle one, at the
halfway point in this order. Many more incomes are small and only a
few are whopping, so the median lies closer to the bottom. The
average, or mean, is the total of all the incomes [lots of paltry
ones, some whopping ones] divided by the number of incomes in the
sample. For example, median UK household disposable income in the
financial year ending 2022 was about £32K, whereas the average was
£40K.
* Have basic scientific principles been used? For example, how was
the sample of people who were tested obtained? The idea of a
'random' sample, scientifically, is that it will include people of
all the walks of life, ages and states of health found in the target
population, so that the results can be applied to that population.
If we study healthy students, then the answer may only apply to
healthy students.
* Were sufficient people tested to reliably and confidently find an
effect? The most reliable and most frequently used (but rather clumsy)
study design is the 'randomised controlled trial', often used to test
new drugs against old ones. Such studies often need hundreds of
participants if the drugs aren't that different in effect. Smaller
studies may not reliably find an effect; if they do find one, partly
by chance, the size of the benefit tends to be exaggerated (this is
known as the 'winner's curse' [Sidebotham & Barlow, 2024]: attempts to
verify or replicate the first observed effect often fail!).
https://associationofanaesthetists-publications.onlinelibrary.wiley.com/doi/10.…
* It is not easy to prove that something does not exist, and a large
study is needed to reach valid conclusions. This is important if
you are investigating a rare but serious complication or a new
technique. For example, if a new surgical procedure is carried out
20 times without a death, it is not necessarily safe: the 95%
confidence interval for its true death rate still runs from 0 to
about 17%. In other words, if the procedure were carried out 100
times on similar patients, anywhere from 0 to 16 or 17 deaths would
be consistent with the first 20 results (and it is likely that
fitter patients were selected first in the original study: see
'bias' below).
* Was there a 'control group'? If an intervention is being assessed
(e.g., the health benefits of cold-water swimming), then a control
group is needed that will carry out the same activities but without
the hypothesised 'active ingredient' (e.g., cold). The control
group should be matched for all the other factors that could be at
work, such as similar locations, similar companions, the same food,
the same exercise, the same bedtime and sleep profile, etc.
* Humans vary a great deal, so experiments comparing human
participants are difficult. This is particularly obvious in
responses to medication, and can lead to unexpectedly different
results. An elegant way of getting around this is to 'cross-over' a
treatment and compare the same individuals, each given both the
'control' and the 'active' treatment. However, without care this
can also lead to complexities. Ideally half the participants should
start with the active treatment, and half with a 'neutral'
(control) treatment, but how can we be sure that the active
treatment has worn off ('washed out') before testing the control
treatment? For example, hormones may have effects that last long
after the actual drug has left the body, and some
psychophysiological changes can be long-lasting. Indeed, some would
argue that, in some studies, with some people, wash out may never
fully occur (Tipton & Mekjavic, 2000).
https://link.springer.com/article/10.1007/s004210000255
* What measurements are made? Are they objective measurements, such
as blood pressure or blood levels of hormones? Or questionnaires?
What questions get asked? It is very easy to ask leading questions,
particularly if the person taking part believes something is doing
them good. A far better (but far less common) outcome measure would
be a health assessment a year after an intervention! Do the
scientists making the measurements know which treatment each
participant received, and what do they expect to find? In one study,
when a pain-killer was tested, the testers (who were kept unaware of
the drug being tested) found different effects if the tester had
different expectations of the drug's effects (Gracely et al., 1985).
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(85)90984-5/full…
* Are tests being used as 'proxy' or 'surrogate' measurements for
something that is more important but not as easy to measure?
Examples include using exam scores as an index of ability, or body
mass index (BMI) for health assessments. How reliable, and exact,
are such surrogate assessments?
* Does the proponent have any conflict of interest? Does what they
argue benefit them?
* Is there any 'bias'? Bias can creep in at lots of stages in the
process of getting information and presenting it. Scientific
publications are very varied: papers in highly regarded journals
have met demanding acceptance standards, with stringent peer
assessment, compared with some 'open access' journals, where papers
are also assessed but the author pays, or 'vanity' journals, where
the author only has to pay to get published! However, all journals
are looking to attract readers and citations, and there is nothing
better than controversy to boost readership and citations.
Additionally, presentations at conferences often turn up as
'publications' but have had virtually no peer assessment, and such
conferences can be international, national or local.
* The funding of research affects what gets published. Published
research papers funded by companies and dealing with available
products are more likely to give a "positive" result than studies
independently funded (Bourgeois et al., 2010). Product evaluations
can be designed to flatter: through the choice of variables
assessed, by stopping before later adverse effects can be observed,
and through the selection of those tested (age, sex, race). Clinical
studies must now be registered before they start, but many studies
funded by drug companies are still not published. Even trivial
effects can be 'statistically significant' if the study is large
enough. Regulatory oversight of large-scale, urgent studies can be
limited and poor practice can be concealed (Powell-Smith & Goldacre, 2016).
https://www.acpjournals.org/doi/10.7326/0003-4819-153-3-201008030-00006
https://f1000research.com/articles/5-2629/v1
* Is there survival bias? In other words, have the data already been
selected? A salutary example is the analysis of damage found on
aircraft returning to base after combat. Clearly, a returning
aircraft could take damage in the areas where hits were seen and
still fly well enough to return safely to base. Hits in the areas
that were never seen damaged on returning aircraft were presumably
more crippling, because aircraft hit there did not come back. Thus,
it would be best, if possible, to protect those undamaged areas
(Mangel & Samaniego, 1984).
https://www.tandfonline.com/doi/abs/10.1080/01621459.1984.10478038
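To see why the median and the mean can tell different stories, here is a minimal Python sketch using a dozen made-up household incomes (hypothetical figures in £ thousands, not ONS data), chosen to be skewed in the way real incomes tend to be: many modest values and a couple of very large ones.

```python
from statistics import mean, median

# Hypothetical household incomes in £ thousands (made-up figures, not ONS data):
# mostly modest incomes plus a couple of very large ones, i.e. right-skewed.
incomes = [18, 22, 25, 28, 30, 32, 35, 38, 45, 60, 150, 400]

typical_median = median(incomes)  # middle of the sorted list (mean of the two middle values here)
typical_mean = mean(incomes)      # total of all incomes divided by the number of incomes
below_mean = sum(1 for x in incomes if x < typical_mean)  # households below the 'average'

print(f"median: £{typical_median}K")
print(f"mean:   £{typical_mean:.1f}K")
print(f"{below_mean} of {len(incomes)} households are below the mean")
```

With these numbers the median is £33.5K, while the two largest incomes pull the mean up to roughly £73.6K, so ten of the twelve households sit below the 'average'.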
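The 'winner's curse' described in the checklist can be illustrated with a small simulation. The Python sketch below assumes a hypothetical true benefit of 0.2 standard deviations and 20 people per group, and, for simplicity, uses a z-test with a known standard deviation of 1 rather than the t-test a real analysis would more likely use; all of these settings are illustrative assumptions. It repeats the same small 'study' many times and then looks only at the runs that found a statistically significant benefit.

```python
import random

random.seed(1)  # fixed seed so the simulation is repeatable

TRUE_EFFECT = 0.2   # hypothetical true benefit, in standard-deviation units
N_PER_GROUP = 20    # a small study
N_STUDIES = 5000    # number of simulated repeats of the same study
Z_CRITICAL = 1.96   # two-sided 5% significance threshold


def run_study():
    """Simulate one two-group study; return (estimated effect, significant benefit?)."""
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    estimate = sum(treated) / N_PER_GROUP - sum(control) / N_PER_GROUP
    # Standard error of a difference in means when each group has SD 1 (z-test).
    std_error = (2 / N_PER_GROUP) ** 0.5
    return estimate, estimate / std_error > Z_CRITICAL


results = [run_study() for _ in range(N_STUDIES)]
# Keep only the 'winners': studies reporting a significant benefit.
winners = [estimate for estimate, significant in results if significant]

share_significant = len(winners) / N_STUDIES
exaggeration = (sum(winners) / len(winners)) / TRUE_EFFECT

print(f"studies finding a significant benefit: {share_significant:.0%}")
print(f"average effect they report: {exaggeration:.1f} times the true effect")
```

With these settings only around one simulated study in ten finds a significant benefit, and those that do report an effect several times larger than the true one, which is why a first, small, positive study is often not replicated.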
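The arithmetic behind the surgical example in the checklist can be checked directly. If no deaths are seen in n independent operations, the exact (Clopper-Pearson) two-sided 95% confidence interval for the true death rate runs from 0 up to the value p that solves (1 - p)^n = 0.025. The Python sketch below computes that limit, alongside the well-known 'rule of three' shortcut (3/n), which approximates the one-sided 95% limit. It assumes every operation carries the same, independent risk, which real case series rarely guarantee.

```python
def upper_95_zero_events(n: int) -> float:
    """Exact (Clopper-Pearson) two-sided 95% upper limit for a risk
    when no events are seen in n independent trials.

    Solves (1 - p) ** n = 0.025 for p.
    """
    return 1 - 0.025 ** (1 / n)


def rule_of_three(n: int) -> float:
    """Quick approximation (3 / n) to the one-sided 95% upper limit."""
    return 3 / n


for n in (20, 100, 1000):
    print(f"0 deaths in {n} operations: true risk could be up to "
          f"{upper_95_zero_events(n):.1%} (rule of three: {rule_of_three(n):.1%})")
```

For 20 operations the upper limit is about 16.8%, so up to roughly 17 deaths per 100 would still be consistent with a clean first series; even 100 uneventful operations leave a plausible risk of up to about 3.6%.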
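The checklist's warning that even trivial effects become 'statistically significant' in a large enough study follows from how the test statistic grows with sample size. The Python sketch below assumes a two-group comparison analysed with a z-test, a true difference of one-hundredth of a standard deviation (0.01 SD, an illustrative value that would usually be negligible in practice), and, for simplicity, an observed difference exactly equal to the true one.

```python
import math


def two_sided_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided p-value from a z-test comparing two group means.

    Assumes each group has standard deviation 1, n_per_group people, and an
    observed difference exactly equal to effect_sd (an illustrative simplification).
    """
    z = effect_sd / math.sqrt(2 / n_per_group)  # difference / standard error
    return math.erfc(z / math.sqrt(2))          # equals 2 * (1 - Phi(z)) for z >= 0


for n in (100, 10_000, 100_000):
    print(f"0.01 SD difference, {n} per group: p = {two_sided_p(0.01, n):.3f}")
```

The same tiny difference gives p ≈ 0.94 with 100 people per group but p ≈ 0.025, below the conventional 0.05, with 100,000 per group: the effect has not become more important, only more precisely measured.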
Overall, as a result of failure to meet some of the requirements
listed above, about half of published medical papers are unlikely to
be true (Ioannidis, 2005). In 2023, the number of retractions for
research articles internationally reached a new record of over 10,000
(Van Noorden, 2023), owing to an increase in sham papers and peer-review
fraud. Furthermore, despite a requirement for disclosure, a lot of
government research is never released, or is delayed until interest
in the topic has declined.
https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
https://www.nature.com/articles/d41586-023-03974-8
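Ioannidis's (2005) argument rests on a simple calculation. If R is the pre-study odds that a tested relationship is real, 1 - β is the power of the study and α is the significance threshold, then, ignoring bias, the probability that a 'statistically significant' finding is true (its positive predictive value) is PPV = (1 - β)R / (R - βR + α). The Python sketch below evaluates that formula; the values of R and power are hypothetical, chosen only to contrast a well-powered test of a plausible idea with a small study of a long shot. Ioannidis's full model also includes a bias term, which is left out here.

```python
def ppv(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Positive predictive value of a 'significant' finding, ignoring bias.

    prior_odds: R, the pre-study odds that the tested relationship is real
    power:      1 - beta, the chance a study detects a real relationship
    alpha:      the significance threshold (false-positive rate when there is no effect)
    """
    beta = 1 - power
    return (1 - beta) * prior_odds / (prior_odds - beta * prior_odds + alpha)


# Hypothetical scenarios (illustrative values, not taken from the paper)
scenarios = {
    "well-powered test of a plausible idea (R = 1, power 0.8)": (1.0, 0.8),
    "small study of a long-shot idea (R = 0.1, power 0.2)": (0.1, 0.2),
}
for label, (r, pw) in scenarios.items():
    print(f"{label}: PPV = {ppv(r, pw):.0%}")
```

In the first scenario about 94% of significant findings would be true; in the second only about 29% would be, so roughly seven in ten 'positive' results would be false even before any bias is added.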
A recent study (Briganti et al., 2023) reviewed the papers published
on the health and recovery benefits of cold-water exposure. They
found 931 articles, and then carefully weeded out irrelevant studies.
The authors were left with 24 papers, and in these the risk of bias
was 'high' in 15 and 'gave concern' in four. Thus, only five papers
had a 'low' risk of bias: three of these looked at cold water
immersion after exercise and two at cognitive function. So only a
very small percentage of the studies examined (five of the 931
articles found, roughly 0.5%) had anything really useful to say.
https://onlinelibrary.wiley.com/doi/10.1111/apha.14056
## 4 WHAT ABOUT THE 'FINDINGS' YOU ARE PRESENTED WITH?
Watch out for percentages (Bolton, 2023). A simple change is easily
understood as a percentage, but 'scientific' studies involving
comparisons between groups can require more careful consideration.
These comparisons should always trigger the question 'percentage of
what, exactly?' The headline, 'New drug/product/intervention cuts
mortality by 50%' sounds impressive, and attracts attention, but the
reality could be less spectacular. Perhaps, with the old drug, the
death rate was 20 per 1000 patients, and when the new drug was first
used the rate became 10 per 1000 patients: a 50% (relative)
reduction. But the absolute risk reduction in death rate was 10 per
1000, or 1 percentage point (from 2% to 1%), a less impressive headline.
https://commonslibrary.parliament.uk/research-briefings/sn04446/
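The drug example above can be run through a short Python sketch that reports the relative risk reduction, the absolute risk reduction and, as an extra, the 'number needed to treat' (NNT, the reciprocal of the absolute risk reduction: how many patients must receive the new drug for one fewer death). The death rates are the hypothetical 20 and 10 per 1000 from the example; the function name is illustrative.

```python
def compare_risks(deaths_old: int, deaths_new: int, patients: int = 1000) -> dict:
    """Express the change from an old to a new treatment in several ways."""
    risk_old = deaths_old / patients
    risk_new = deaths_new / patients
    absolute_reduction = risk_old - risk_new            # difference in risks
    relative_reduction = absolute_reduction / risk_old  # share of the old risk removed
    return {
        "relative risk reduction": relative_reduction,
        "absolute risk reduction": absolute_reduction,
        "number needed to treat": 1 / absolute_reduction,
    }


summary = compare_risks(deaths_old=20, deaths_new=10)
print(f"Headline: deaths cut by {summary['relative risk reduction']:.0%}")
print(f"Absolute change: {summary['absolute risk reduction']:.1%} of patients")
print(f"Patients treated per death avoided: {summary['number needed to treat']:.0f}")
```

Both the 50% and the 1% describe the same data; only the absolute figure reveals that about 100 patients would need the new drug to prevent one death.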
Also, beware of correlations. Just because two things relate to each
other, for example, a diet and a sense of well-being, does not mean
that one causes the other. The world is full of accidental (spurious)
correlations (Van Cauwenberge, 2016). One of our favourites is the
high correlation between the divorce rate in Maine, USA and the per
capita consumption of margarine! Also, ask the question 'how many
false positives and negatives will I get if I use this correlation to
make a decision?' (Tipton et al., 2012).
https://www.datasciencecentral.com/spurious-correlations-15-examples/
https://link.springer.com/article/10.1007/s004210000255
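The question about false positives and negatives can be made concrete with a little arithmetic. Suppose a measure that correlates with a condition is used as a yes/no test. The Python sketch below uses hypothetical figures (10,000 people, 1% of whom have the condition, and a test that correctly classifies 90% of those with it and 90% of those without it) to count the expected number of each kind of outcome.

```python
def outcome_counts(population, prevalence, sensitivity, specificity):
    """Expected numbers of true/false positives and negatives for a yes/no test."""
    affected = population * prevalence
    unaffected = population - affected
    true_pos = sensitivity * affected     # affected people correctly flagged
    false_neg = affected - true_pos       # affected people missed
    true_neg = specificity * unaffected   # unaffected people correctly cleared
    false_pos = unaffected - true_neg     # unaffected people wrongly flagged
    return true_pos, false_pos, false_neg, true_neg


tp, fp, fn, tn = outcome_counts(10_000, prevalence=0.01, sensitivity=0.9, specificity=0.9)
print(f"true positives {tp:.0f}, false positives {fp:.0f}")
print(f"false negatives {fn:.0f}, true negatives {tn:.0f}")
print(f"share of positive results that are correct: {tp / (tp + fp):.0%}")
```

Even with a test that sounds '90% accurate', only about 8% of positive results (90 of 1080) would be correct here, because the condition is rare.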
For the moment at least, artificial intelligence cannot quantify
uncertainty very well. Generally, AI uses stuff from 'out there' as
if it were true. Thus, a high proportion of garbage in will give you
garbage out (which increases the proportion of garbage that AI uses
next time round)!
We hope that, armed with the above checklist, you can challenge and
interrogate the polarising information, from 'spin' to the outright
falsehoods presented to you on a daily basis. We are at risk of being
overwhelmed by an increasing number of dubious, unregulated and
disparate sources. The next time you hear phrases like 'they say this
is great' or 'this is scientifically proven' start by asking 'who are
they?' and 'which scientists, using which methods?' Be cautious and
questioning: snake oil and its vendors still exist, and they come in
many guises.
From: https://physoc.onlinelibrary.wiley.com/doi/10.1113/EP092160
tags: article,science