(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.

Licensed under Creative Commons Attribution (CC BY) license.
url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------

Combining simple blood tests to identify primary care patients with unexpected weight loss for cancer investigation: Clinical risk score development, internal validation, and net benefit analysis

Authors: Brian D. Nicholson, Paul Aveyard, Constantinos Koshiaris, Rafael Perera, Willie Hamilton
Affiliations: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom; Medical School

Date: 2021-09

Our findings suggest that combinations of simple blood test abnormalities could be used to identify patients with UWL who warrant referral for investigation, while people with combinations of normal results could be exempted from referral.

We used data from 63,973 adults (age: mean 59 years, standard deviation 21 years; 42% male) to predict cancer in patients with UWL recorded in a large, representative United Kingdom primary care electronic health record database between January 1, 2000 and December 31, 2012. We derived 3 clinical prediction models using logistic regression and backwards stepwise covariate selection: Sm, symptoms-only model; STm, symptoms and tests model; Tm, tests-only model. Missing data were replaced using 50 imputed datasets. Estimates of discrimination and calibration were derived using 10-fold internal cross-validation. Simple clinical risk scores are presented for the models with the greatest clinical utility in decision curve analysis. The STm and Tm showed improved discrimination (area under the curve ≥ 0.91), calibration, and greater clinical utility than the Sm. The Tm was the simplest, including age-group, sex, albumin, alkaline phosphatase, liver enzymes, C-reactive protein, haemoglobin, platelets, and total white cell count. A Tm score of 5 balanced ruling-in (sensitivity 84.0%, positive likelihood ratio 5.36) and ruling-out (specificity 84.3%, negative likelihood ratio 0.19) further cancer investigation. A Tm score of 1 prioritised ruling-out (sensitivity 97.5%). At this threshold, 35 people presenting with UWL in primary care would be referred for investigation for each person with cancer referred, and 1,730 people would be spared referral for each person with cancer not referred. Study limitations include the use of a retrospective routinely collected dataset, reliance on coding to identify UWL, and missing data for some predictors.

Unexpected weight loss (UWL) is a presenting feature of cancer in primary care. Existing research proposes simple combinations of clinical features (risk factors, symptoms, signs, and blood test data) that, when present, warrant cancer investigation. More complex combinations may modify cancer risk sufficiently to rule out the need for investigation. We aimed to identify which clinical features can be used together to stratify patients with UWL based on their risk of cancer.

Funding: BDN was supported by National Institute for Health Research (NIHR) Doctoral Research Fellowship number (DRF-2015-08-18). FDRH and RP acknowledge part-funding from the NIHR Oxford Medtech and In-Vitro Diagnostics Co-operative (MIC). FDRH, RP, and PA acknowledge part-funding from the NIHR Oxford and Thames Valley Applied Research Collaboration (ARC). FDRH, JO, RP, and PA acknowledge part-funding from the NIHR Oxford Biomedical Research Centre (BRC). PA is an NIHR senior investigator. RP acknowledges part-funding from the National Institute for Health Research (NIHR Programme Grant for Applied Research) and the Oxford Martin School. WH is co-Principal Investigator of the multi-institutional CanTest Research Collaborative funded by a Cancer Research UK Population Research Catalyst award (C8640/A23385). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2021 Nicholson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Prediction models have been developed to identify the most helpful combinations of clinical features for use in clinical practice [9,10]. However, these studies were based on small cohorts from secondary care, recommend conflicting approaches, and include some investigations that are uncommon in primary care. Research using primary care data is therefore needed to investigate whether the absence of risk factors and co-occurring clinical findings, in the context of normal test results, could reduce the risk of cancer sufficiently to rule out invasive cancer investigation for patients with UWL.

As most patients presenting to primary care with UWL will not have cancer, diagnostic strategies that avoid the harms of unnecessary invasive and costly investigation are also required for patients at low risk of cancer [1]. Our previous work has shown that the presence of individual co-occurring clinical features increases the likelihood of cancer sufficiently to rule in cancer investigation [5]. However, the absence of individual co-occurring clinical features, including pairs of normal inflammatory markers, does not reduce the likelihood of cancer sufficiently to rule out further cancer investigation [5,6]. Primary care clinicians commonly request multiple blood tests when patients present with UWL [5,7]. There is little guidance on how clinicians should interpret these blood tests in combination or which are most relevant for use in clinical practice [1,6]. When baseline investigations are normal, a watchful waiting approach may be preferable to invasive testing [8].

Unexpected weight loss (UWL) is a presenting feature of cancer for which there remains no consensus on the most appropriate investigation strategy in primary care [1]. Patients with UWL recorded by their primary care clinician are more likely to be diagnosed with the following cancers within 3 months: pancreatic, cancer of unknown primary, gastro-oesophageal, lymphoma, hepatobiliary, lung, bowel, and renal tract [2]. This association is greatest in males aged 60 years or older and in females aged 80 years or older [2,3]. Current investigation guidelines focus on selecting patients for single-site cancer investigation based on simple combinations of clinical features (individual risk factors, signs, symptoms, and blood test abnormalities) [3–5].

We refitted the models using the missing indicator method to assess the sensitivity of our results to the multiple imputation approach. We also refitted the models using continuous blood test results to explore the impact on discrimination and calibration statistics, using Stata’s mfpmi command to select the most appropriate functional form for each continuous covariate in relation to the outcome [29].
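The missing indicator method mentioned above can be illustrated with a minimal Python sketch. It assumes one common way of applying the method: each blood test is dichotomised against a reference range, a missing result is set to the reference (normal) category, and a separate "missing" flag is added so that both columns can enter the model. The function name and the albumin reference limits in the usage line are hypothetical and are not taken from the study's S1 Table.

```python
from typing import List, Optional, Tuple


def missing_indicator_encode(
    values: List[Optional[float]],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Return (abnormal, missing) indicator pairs for one blood test.

    A result is abnormal if it falls below `lower` or above `upper`;
    either limit may be None when only one direction is of interest.
    A missing result is coded as normal (abnormal = 0) and flagged with
    missing = 1 (an assumption about how the method was applied).
    """
    encoded = []
    for value in values:
        if value is None:
            encoded.append((0, 1))
            continue
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        encoded.append((int(below or above), 0))
    return encoded


# Hypothetical albumin reference range of 35-50 g/L.
print(missing_indicator_encode([40.0, 30.0, None, 55.0], lower=35.0, upper=50.0))
# [(0, 0), (1, 0), (0, 1), (1, 0)]
```

Unlike multiple imputation, this approach does not attempt to predict the missing value; it lets the "missing" flag absorb any association between missingness and the outcome, which is why it serves as a useful comparison.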

Finally, to demonstrate how these models could be used in clinical practice, we followed established methods to develop 2 simple clinical risk scores, one for the STm and one for the Tm [28]. The points for each variable were derived by multiplying its coefficient by a common conversion factor and rounding the result to the nearest whole number. We calculated the mean point score for each patient across the imputed datasets and constructed a 2 × 2 table using each total score as the cutoff. We calculated the sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), and negative predictive value (NPV) for each score.
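The score construction and the 2 × 2 statistics described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It assumes that "rounding to the nearest whole number" means rounding half up, and that a patient counts as test-positive when their score is at or above the cutoff. The conversion factor of 2 used in the check below is hypothetical. The two odds ratios are those reported for raised C-reactive protein (4.50) and raised liver enzymes (2.02) in the final Tm.

```python
import math
from typing import Dict, List, Sequence


def to_points(coefficients: Dict[str, float], conversion_factor: float) -> Dict[str, int]:
    """Scale each log-odds coefficient by a common factor and round half up."""
    return {name: math.floor(beta * conversion_factor + 0.5)
            for name, beta in coefficients.items()}


def mean_scores(scores_by_imputation: Sequence[Sequence[float]]) -> List[float]:
    """Average each patient's total score across imputed datasets.

    Each inner sequence holds the scores of all patients in one imputed dataset.
    """
    m = len(scores_by_imputation)
    return [sum(patient) / m for patient in zip(*scores_by_imputation)]


def accuracy_at_cutoff(scores: Sequence[float], cancer: Sequence[bool],
                       cutoff: float) -> Dict[str, float]:
    """Build the 2 x 2 table for 'score >= cutoff' and derive accuracy statistics."""
    tp = sum(1 for s, y in zip(scores, cancer) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, cancer) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, cancer) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, cancer) if s < cutoff and not y)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PLR": sens / (1 - spec) if spec < 1 else math.inf,
        "NLR": (1 - sens) / spec if spec > 0 else math.inf,
        "PPV": tp / (tp + fp) if tp + fp else math.nan,
        "NPV": tn / (tn + fn) if tn + fn else math.nan,
    }


# Hypothetical conversion factor of 2 applied to two reported Tm odds ratios.
print(to_points({"raised_crp": math.log(4.50),
                 "raised_liver_enzymes": math.log(2.02)}, 2.0))
# {'raised_crp': 3, 'raised_liver_enzymes': 1}
```

As a check on the likelihood ratio formulas, a sensitivity of 84.0% and a specificity of 84.3% give PLR = 0.840/0.157 ≈ 5.35 and NLR = 0.160/0.843 ≈ 0.19. These values are consistent with the Tm score of 5 reported in the abstract, allowing for rounding of the published percentages.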

We then used decision curve analysis (DCA) to compare the standardised net benefit (SNB) and proportion of investigations avoided by the Sm, STm, and Tm with scenarios where no prediction model was used (i.e., treat everyone or treat nobody) across a range of risk thresholds (threshold probabilities) using the dca command in Stata [24]. In general, the strategy with the highest net benefit (the highest plotted curve) is considered to have the greatest clinical utility at a given risk threshold [25]. Net benefit is the proportion of the studied population with true positive results minus the proportion with false positive results, with the false positives weighted by the odds of the risk threshold (threshold probability / (1 − threshold probability)). To ease interpretation, we calculated the SNB, which gives the proportion of the maximum achievable utility attained by each model (SNB = NB / prevalence of cancer) [26]. An alternative presentation of DCA is the proportion of patients who would avoid further investigation without missing a cancer diagnosis at each risk threshold [27].
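The decision curve quantities described above can be written as a short Python sketch. The net benefit and SNB formulas follow the definitions in the text. The "investigations avoided" function uses the net-reduction-in-interventions formulation that is commonly used in decision curve analysis; we assume it matches the alternative presentation cited, but this is not confirmed by the source. The function names are hypothetical, and this is not the Stata dca command. The "treat nobody" strategy has a net benefit of zero by definition.

```python
from typing import Sequence


def threshold_odds(pt: float) -> float:
    """Odds at the risk threshold, pt / (1 - pt): the weight given to false positives."""
    return pt / (1 - pt)


def net_benefit(predicted: Sequence[float], cancer: Sequence[int], pt: float) -> float:
    """NB = TP/n - FP/n * pt/(1 - pt), treating predicted risk >= pt as 'investigate'."""
    n = len(cancer)
    tp = sum(1 for p, y in zip(predicted, cancer) if p >= pt and y)
    fp = sum(1 for p, y in zip(predicted, cancer) if p >= pt and not y)
    return tp / n - fp / n * threshold_odds(pt)


def net_benefit_investigate_all(prevalence: float, pt: float) -> float:
    """Net benefit of the 'treat everyone' strategy (every patient investigated)."""
    return prevalence - (1 - prevalence) * threshold_odds(pt)


def standardised_net_benefit(nb: float, prevalence: float) -> float:
    """SNB = NB / prevalence: the share of the maximum achievable net benefit."""
    return nb / prevalence


def investigations_avoided(nb_model: float, nb_all: float, pt: float) -> float:
    """Net proportion of patients spared investigation versus investigating everyone."""
    return (nb_model - nb_all) / threshold_odds(pt)


# Toy data: 4 patients, 2 with cancer, evaluated at a 20% risk threshold.
pred, outcome, pt = [0.9, 0.6, 0.2, 0.05], [1, 0, 1, 0], 0.2
nb = net_benefit(pred, outcome, pt)
nb_all = net_benefit_investigate_all(prevalence=0.5, pt=pt)
print(nb, nb_all, investigations_avoided(nb, nb_all, pt))
```

In the toy example, the model investigates 3 of the 4 patients and detects both cancers. It therefore spares 1 in 4 patients (0.25) relative to investigating everyone, and its net benefit is 0.4375 against 0.375 for the treat-everyone strategy.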

Ten-fold internal cross-validation was used to assess overall model performance using the mean predicted probability for each patient across all 50 imputed datasets with the cvauroc command in Stata [22]. Model performance was assessed using discrimination and calibration statistics. Discrimination was quantified using the area under the curve (AUC), with 95% confidence intervals calculated by bootstrap resampling. Calibration plots were generated using Stata’s pmcalplot command to assess how closely the predicted probabilities derived by each model correspond to the observed proportion of patients diagnosed with cancer [23].
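The validation steps described above can be illustrated with a minimal Python sketch: a random 10-fold partition, the AUC, and a percentile bootstrap confidence interval. This is not the cvauroc command. Model fitting within each fold is omitted, and the functions assume you already have out-of-fold predicted probabilities, such as each patient's mean prediction across imputed datasets. The pairwise AUC calculation is quadratic in sample size and is used here only for clarity; for a cohort of about 64,000 patients a rank-based computation would be preferable.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 1) -> List[List[int]]:
    """Randomly partition patient indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def auc(predicted: Sequence[float], cancer: Sequence[int]) -> float:
    """AUC: probability that a random case outranks a random non-case (ties count 0.5)."""
    cases = [p for p, y in zip(predicted, cancer) if y]
    non_cases = [p for p, y in zip(predicted, cancer) if not y]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    score = 0.0
    for p in cases:
        for q in non_cases:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(cases) * len(non_cases))


def bootstrap_auc_ci(predicted: Sequence[float], cancer: Sequence[int],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(cancer)
    estimates: List[float] = []
    while len(estimates) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [cancer[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples without both cases and non-cases
            estimates.append(auc([predicted[i] for i in sample], ys))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


pred = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
outcome = [1, 1, 0, 1, 0, 0]
print(auc(pred, outcome), bootstrap_auc_ci(pred, outcome, n_boot=200))
```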

Three prediction models were derived in the complete dataset: a symptoms-only model (Sm), a symptoms and tests model (STm), and a simple tests-only model (Tm). The mim Stata command was used to select variables for each model in the imputed data using backwards stepwise logistic regression, with a p-value threshold of <0.01 for inclusion. Candidate variables for the Sm included age-group, sex, smoking status, and clinical features found to be associated with a cancer diagnosis within 6 months in males and females (Table 2) [5]. Candidate variables for the STm also included the blood tests most commonly requested by GPs in patients with UWL and tests used in prognostic scores for patients with cancer (Table 3) [5,19,20]. For the Tm, candidate variables included age-group, sex, smoking status, and the blood tests. As we intended to derive a parsimonious model, we chose total (summary) tests over their component tests; for example, the total white cell count was included rather than the white cell subtypes (Table 4). The most complex model (STm) had at least 15 events per variable [21].
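The backwards stepwise selection described above can be sketched as a generic elimination loop in Python. This is not the mim command. The model refit is abstracted into a hypothetical `fit_pvalues` callback, which in practice would refit a logistic regression pooled across the imputed datasets and return one p-value per variable. The p < 0.01 retention threshold is taken from the text. The toy p-values below are made up and stay fixed, whereas a real refit would update them after each removal.

```python
from typing import Callable, Dict, List, Sequence


def backward_stepwise(candidates: Sequence[str],
                      fit_pvalues: Callable[[List[str]], Dict[str, float]],
                      threshold: float = 0.01) -> List[str]:
    """Backwards elimination: refit, drop the least significant variable, repeat.

    fit_pvalues is a hypothetical callback that refits the model on the given
    variables and returns a p-value for each of them.
    """
    selected = list(candidates)
    while selected:
        pvalues = fit_pvalues(selected)
        weakest = max(selected, key=lambda v: pvalues[v])
        if pvalues[weakest] < threshold:
            break  # every remaining variable meets the retention criterion
        selected.remove(weakest)
    return selected


# Toy callback with fixed, made-up p-values.
toy = {"age_group": 0.0001, "sex": 0.004, "smoking": 0.21, "back_pain": 0.03}
print(backward_stepwise(list(toy), lambda vs: {v: toy[v] for v in vs}))
# ['age_group', 'sex']
```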

Multiple imputation was used to replace missing values for smoking status, alcohol intake, body mass index, and blood tests using the mi suite of commands in Stata [16,17]. Fifty imputed datasets were created. Multinomial logistic regression was used to impute categorical variables, and predictive mean matching (PMM) with 5 donors was used to impute continuous variables [18]. The imputation model included the outcome, all candidate variables for the final prediction models, and auxiliary variables to increase the likelihood that the missing at random assumption was satisfied. These auxiliary variables were a combination of variables found to predict missingness, personal characteristics, comorbidities, risk factors, other markers of inflammation, or full blood count components (S1 Text). For the primary analysis, continuous test results were dichotomised as abnormal/normal in each imputed dataset using standard laboratory ranges (S1 Table). Rubin’s rules were used to combine results across the imputed datasets [16].
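Rubin's rules, which are used to combine results across imputed datasets, can be sketched in a few lines of Python. This shows the standard pooling of a single point estimate and its variance, for example one log odds ratio. The degrees-of-freedom adjustment for confidence intervals is omitted for brevity, and this is not Stata's mi estimate implementation. The three-imputation input in the usage line is made up for illustration; the study used 50 imputed datasets.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient (e.g. a log odds ratio) from each imputed dataset
    variances: its squared standard error from each imputed dataset
    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least 2 imputed datasets")
    pooled = mean(estimates)
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Made-up log odds ratios and variances from 3 imputed datasets.
print(rubins_rules([1.0, 1.2, 1.4], [0.04, 0.04, 0.04]))
```

The total variance exceeds the average within-imputation variance whenever the imputed datasets disagree. This extra term is how the pooled standard error reflects uncertainty about the missing values.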

Sociodemographic features recorded on or before the index date were extracted for each patient (Table 1). Preexisting comorbidities were identified using a previously described approach [13]. Clinical features previously shown to be associated with cancer were identified if recorded between 3 months before and 1 month after the UWL date [5]. Continuous results of blood tests commonly requested within the same period were identified using entity codes in CPRD, and outliers and erroneous results were dropped [5] (Table 2).

The outcome was any cancer diagnosed within 6 months of the index date identified in the CPRD or NCRAS, using an existing library of codes [2]. Patients were followed up until the date of the first cancer diagnosis or for 6 months, whichever occurred first. Six months was chosen because previous research has shown that this is the period associated with an increased risk of cancer diagnosis following a presentation of UWL to primary care [2]. Cancers classified as nonmelanoma skin cancer, in situ, benign, ill-defined, or uncertain were excluded.
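The two time windows described above can be expressed as a minimal Python sketch: clinical features recorded from 3 months before to 1 month after the index date, and the outcome of cancer diagnosed within 6 months of it. The source does not say how months were counted. This sketch assumes calendar months (clamped to the last day of the month) and inclusive boundaries, and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def feature_in_window(index_date: date, feature_date: date) -> bool:
    """Clinical feature recorded from 3 months before to 1 month after the index date."""
    return add_months(index_date, -3) <= feature_date <= add_months(index_date, 1)


def cancer_within_6_months(index_date: date, cancer_date: Optional[date]) -> bool:
    """Outcome: first cancer diagnosed within 6 months of the index date (None = no cancer)."""
    if cancer_date is None:
        return False
    return index_date <= cancer_date <= add_months(index_date, 6)


index = date(2010, 5, 1)
print(feature_in_window(index, date(2010, 2, 1)),       # True: exactly 3 months before
      cancer_within_6_months(index, date(2010, 11, 2)))  # False: just after 6 months
```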

We selected a cohort of patients with UWL, identified by a UWL code previously shown to be linked to measured weight loss [13,14]. Patients were selected for the derivation cohort if UWL was first coded between January 1, 2000 and December 31, 2012 in the Clinical Practice Research Datalink (CPRD). The CPRD is an anonymised database of primary care records covering a representative 6.9% of the United Kingdom population [15]. Patients were included if they were ≥18 years of age, were registered with a CPRD general practice, were eligible for linkage to NCRAS and Office for National Statistics (ONS) data, and had at least 12 months of data before their first UWL code (the “index date”). These UWL Read codes equated to a mean weight loss of ≥5% within a 6-month period in our previous internal validation study of weight-related coding in CPRD [13]. UWL may be coded following a range of clinical scenarios, including UWL reported as the patient’s presenting complaint, after targeted history taking, following weight measurement as part of the clinical examination, or as part of a routine health check or chronic disease review [5]. Patients were excluded if they had a prescription of weight-reducing medication (orlistat) or a code for bariatric surgery in the previous 6 months, or if they had previously been diagnosed with cancer.
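The inclusion and exclusion criteria above can be expressed as a single eligibility check, shown here as a minimal Python sketch. The record layout and field names are hypothetical; actual CPRD extracts use their own variables. Registration with a CPRD practice is assumed to be implied by the patient appearing in the extract. The 12-month look-back is assumed to mean the same calendar date one year earlier.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START = date(2000, 1, 1)
STUDY_END = date(2012, 12, 31)


@dataclass
class Patient:
    # Hypothetical field names for illustration only.
    age_at_index: int
    index_date: date                # date of the first UWL code
    data_start: date                # start of the patient's usable CPRD record
    linkage_eligible: bool          # eligible for NCRAS and ONS linkage
    orlistat_or_bariatric_6m: bool  # orlistat or bariatric surgery code in prior 6 months
    prior_cancer: bool


def one_year_before(d: date) -> date:
    """Same calendar date a year earlier (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def is_eligible(p: Patient) -> bool:
    """Apply the derivation-cohort inclusion and exclusion criteria."""
    return (
        STUDY_START <= p.index_date <= STUDY_END
        and p.age_at_index >= 18
        and p.linkage_eligible
        and p.data_start <= one_year_before(p.index_date)
        and not p.orlistat_or_bariatric_6m
        and not p.prior_cancer
    )


print(is_eligible(Patient(60, date(2005, 6, 1), date(2003, 1, 1), True, False, False)))
```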

The protocol was approved by the Independent Scientific Advisory Committee (ISAC) of the MHRA (protocol number 16_164A2A) [11]. Ethics approval for observational research using the CPRD with approval from ISAC was granted by a National Research Ethics Service committee (Trent Multiresearch Ethics Committee, REC reference number 05/MRE04/87). We followed the TRIPOD (S1 TRIPOD Checklist) reporting guidelines [12]. Stata (version 15) was used for all analyses.

An STm score of 2 gives the PPV closest to 3%, the threshold chosen by NICE to warrant further investigation: sensitivity 97.9%, specificity 50.9%. At this threshold, 35 people would be referred for each person with cancer referred, and 1,730 patients would be spared investigation for each person with cancer not referred. Per 100,000 patients with UWL, 1,390 people with cancer would be referred, 48,403 patients would be unnecessarily referred, 50,178 correctly spared referral, and 29 people with cancer not referred.

An STm score of 6 would balance ruling-in (PLR 5.09) and ruling-out (NLR 0.16) the need for referral: sensitivity 86.3%, specificity 83.0%. At this threshold, there would be 14 people referred for investigation for each person with cancer referred, and 422 would be spared investigation for each person with cancer not referred. Per 100,000 people with UWL, 1,225 people with cancer would be referred, 16,759 patients would be unnecessarily referred, 81,822 correctly spared referral, and 194 people with cancer not referred.

An STm score of 1 prioritises ruling-out cancer by maximising sensitivity: sensitivity 98.6%, specificity 43.4%. At this threshold, 40 people with UWL would be referred for each person with cancer referred, and 2,139 people with UWL would be spared referral for each person with cancer not referred. Per 100,000 people with UWL, 1,399 people with cancer would be referred, 55,797 people would be unnecessarily referred, 42,784 correctly spared referral, and 20 people with cancer not referred.

A 52-year-old woman with UWL, no other clinical features, a low albumin, high alkaline phosphatase, and a raised C-reactive protein corresponds to an STm score of 4: sensitivity 93.9%, specificity 68.4%. At this threshold, 23 people would be referred for each person with cancer referred, and 784 people would be spared investigation for each person with cancer not referred. Per 100,000 people with UWL, 1,333 with cancer would be referred, 31,152 would be referred unnecessarily, 67,429 correctly spared referral, and 86 people with cancer would not be referred.
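The per-100,000 figures above follow from each threshold's sensitivity and specificity together with the cohort's cancer prevalence (908 of 63,973 patients with UWL). This can be sketched in Python as shown below. Based on this arithmetic, the "referred for each person with cancer referred" ratios appear to correspond to people without cancer referred per person with cancer referred (FP/TP). The "spared for each person with cancer not referred" ratios appear to correspond to TN/FN. Both readings are our interpretation rather than something the source states. The function and key names are hypothetical.

```python
from typing import Dict

# Cancer prevalence in the derivation cohort: 908 of 63,973 patients with UWL.
PREVALENCE = 908 / 63_973


def per_100000(sensitivity: float, specificity: float,
               prevalence: float = PREVALENCE, n: int = 100_000) -> Dict[str, float]:
    """Expected 2 x 2 counts per n patients with UWL at a given score threshold."""
    cancers = prevalence * n
    non_cancers = n - cancers
    tp = sensitivity * cancers        # people with cancer referred
    fn = cancers - tp                 # people with cancer not referred
    tn = specificity * non_cancers    # people correctly spared referral
    fp = non_cancers - tn             # people referred unnecessarily
    return {
        "cancer_referred": round(tp),
        "referred_unnecessarily": round(fp),
        "correctly_spared": round(tn),
        "cancer_not_referred": round(fn),
        "referred_without_cancer_per_cancer_referred": fp / tp,
        "spared_per_cancer_not_referred": tn / fn,
    }


print(per_100000(0.863, 0.830))  # STm score of 6
```

For the STm score of 6, this reproduces the Box 1 counts (1,225; 16,759; 81,822; 194). The unrounded ratios come out at about 13.7 and 421, whereas dividing the rounded counts (16,759/1,225 and 81,822/194) gives the reported 14 and 422. The small differences at other thresholds presumably reflect rounding of the published percentages.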

Figs 3 and 4 show the diagnostic accuracy statistics corresponding to each possible point score for the STm and the Tm, respectively. S4 and S5 Tables apply these statistics to 100,000 patients with UWL for the STm and the Tm risk score thresholds, respectively. Box 1 illustrates how the STm score might be used in clinical practice, for example, by choosing the optimal threshold to rule out further cancer investigation.

The STm had the greatest clinical utility (Fig 2A). The STm had a higher SNB than the Sm for risk thresholds of 0.4% to 18%, and the Tm had a higher net benefit than the Sm for thresholds of 0.5% to 15% (Fig 2A, S3 Table). At a cancer risk threshold of 1%, these differences translate into a 55% reduction in further cancer testing with the STm compared with investigating all patients, a 19% reduction compared with using the Sm, and a 2% reduction compared with the Tm (Fig 2B, S3 Table).

Fig 1. Calibration plots for the Sm (1a and 1b), the STm (1c and 1d), and the Tm (1e and 1f). Green points are deciles of predicted probability with error bars. The right-hand panels (1b, 1d, and 1f) are zoomed in to show in detail the first 0.1% of predicted probability and observed frequency. AUC, area under the curve; CITL, calibration in the large; E:O, ratio of expected (predicted) probability vs observed frequency of the outcome; Sm, symptoms-only model; STm, symptoms and tests model; Tm, tests-only model.
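Two of the calibration summaries named in the caption above can be computed with a minimal Python sketch: the E:O ratio and the decile table plotted as green points. This is not Stata's pmcalplot. Calibration in the large is not computed here, because it requires fitting a logistic intercept with the model's linear predictor as an offset. An E:O ratio above 1 indicates overprediction and a ratio below 1 indicates underprediction. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_observed_ratio(predicted: Sequence[float], cancer: Sequence[int]) -> float:
    """E:O ratio: mean predicted risk divided by the observed proportion with cancer."""
    expected = sum(predicted) / len(predicted)
    observed = sum(cancer) / len(cancer)
    return expected / observed


def decile_calibration(predicted: Sequence[float], cancer: Sequence[int],
                       groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed proportion) per risk group, lowest risk first."""
    pairs = sorted(zip(predicted, cancer))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            table.append((mean_pred, observed))
    return table


# Toy data: 20 patients with evenly spread predicted risks.
preds = [i / 20 for i in range(20)]
outcome = [0] * 10 + [1] * 10
print(expected_observed_ratio(preds, outcome))
print(decile_calibration(preds, outcome)[-1])  # highest-risk decile
```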

Both the STm (AUC 0.92 (0.91 to 0.93)) and the Tm (0.91 (0.90 to 0.92)) showed high discrimination, superior to that of the Sm (0.79 (0.78 to 0.81)) (Table 5). However, the calibration statistics showed that the Sm was better calibrated than the STm and Tm. The calibration plots showed that the difference in calibration statistics was mainly due to underprediction in the highest decile of risk for the STm and Tm, which was not seen for the Sm (Fig 1). Refitting the STm and Tm with continuous instead of dichotomised blood test results made a negligible difference to model performance (Table 5, S2 Fig, S2 Table).

In the final Sm, 12 of 17 candidate variables were associated with cancer (Table 2), of which concurrent jaundice (adjusted odds ratio 6.62 (95% CI 3.77 to 11.63)) and lymphadenopathy (4.67 (2.08 to 10.47)) were most predictive. Of 29 candidate variables, 17 were retained in the final STm (Table 3), of which a raised C-reactive protein (4.09 (3.24 to 5.16)) and concurrent jaundice (2.33 (1.27 to 4.28)) were most predictive. Of 12 candidate predictor variables, 9 were retained in the final Tm (Table 4), of which raised C-reactive protein (4.50 (3.60 to 5.64)) and raised liver enzymes (2.02 (1.51 to 2.71)) were most predictive.

A total of 32,723 (51.2%) patients had missing data on smoking status, 36,356 (56.8%) on alcohol consumption, and 7,235 (11.3%) had no body mass index recorded (Table 1). The most commonly missing blood tests were liver enzymes (53,062, 82.9%), C-reactive protein (49,270, 77.0%), and erythrocyte sedimentation rate (41,042, 64.1%). Imputation diagnostics were deemed satisfactory for all imputed variables (S1 Fig). When the variables included in the final imputed and missing indicator models were compared, the estimates had the same direction and overlapping confidence intervals.

In the derivation cohort of 63,973 adults aged ≥18 years with UWL recorded, 908 (1.4%) were diagnosed with cancer within 6 months of the index date, of whom 902 (99.3%) were aged ≥40 years. Table 1 summarises the baseline characteristics of the study population. Patients with UWL were more commonly female (58.2%), aged ≥60 years (51.8%), and of normal body mass index (52.9%). The most commonly recorded clinical features were abdominal pain (5.9%), back pain (5.1%), noncardiac chest pain (2.9%), and dyspepsia (2.8%) (Table 1). The most commonly recorded tests were haemoglobin (72.1%), platelets (70.7%), and total white cell count (69.9%).

Discussion

Summary of findings

Combinations of multiple simple test results discriminated between patients with and without cancer, were well calibrated at the levels of risk at which decisions to investigate are made in primary care, and showed superior clinical utility compared with symptoms and signs. We present stand-alone risk scores that could be used by GPs to guide test selection and interpretation. The simplest includes age-group, sex, and 7 primary care blood tests (alkaline phosphatase, liver enzymes, albumin, C-reactive protein, haemoglobin, platelets, and total white cell count). These scores could be used to identify patients with UWL who do not warrant further cancer investigation, as well as those who do.

Strengths and limitations

Our study design aimed to minimise bias. We excluded patients with objective evidence of intentional weight loss, restricted co-occurring clinical features to the time of the UWL presentation [5], and included only the first UWL code [2,30]. We relied on electronic health record (EHR) codes to define UWL because weight is not recorded frequently enough [13]. Recording bias occurs when GPs preferentially code clinical features that they associate with cancer, and it can inflate estimates of association for these features [31]; it is unclear how this bias affects coding for UWL. We excluded patients with a past history of cancer and only included cancers coded within 6 months of the UWL date to ensure that we investigated a first diagnosis of a cancer associated with UWL. By using multiple imputation to replace missing risk factor and continuous test result data, we could produce precise estimates for combinations of multiple covariates. Previous studies have not done this and have had to focus on single blood test abnormalities or pairs of them [3]. We included auxiliary variables to increase the likelihood that missing values could be accurately predicted by the observed data (that is, that they are missing at random) [16]. However, there is no established method to formally evaluate whether this was successful. Imputation allowed us to combine multiple blood test results and to show that once blood tests are modelled with sex and age, there appears to be no need to include additional risk factors and clinical features. For the primary analysis, we dichotomised each blood test at the upper or lower boundary of the normal reference range to derive a simple risk score for use in clinical practice. This approach has limitations. Firstly, dichotomising a continuous variable loses information by grouping slightly and extremely abnormal results together.
Secondly, choosing raised values to define abnormal might be unhelpful for cancer sites associated with low values (and vice versa). Thirdly, for blood tests with no consensus on how to define abnormal, we used the upper limit of the normal range. Refitting the models to include continuous linear and fractional polynomial terms made no meaningful difference to model performance. We required a testing strategy appropriate for a composite of all cancer types. The literature reports that the direction of blood test abnormalities is similar for most cancers and that a pro-inflammatory state underpins many cancers and cancer cachexia [19,20,32–39]. While this supports our approach, the composite cancer outcome is likely to be partly responsible for the underprediction observed in the highest decile of risk. This is unlikely to have adverse clinical consequences because GPs’ decisions to refer for invasive testing are likely to be triggered at lower thresholds. We used 10-fold internal cross-validation to derive estimates of predictive performance [40]. However, internal validation may produce overoptimistic estimates, so external validation remains necessary to assess the generalisability of our findings in settings, populations, and subgroups of interest [41,42]. External validation will use primary care data from alternative clinical systems for the same time period, or from the same clinical systems for a later time period, to account for variation in UK practice; data from international settings with alternative approaches to weight measurement and weight loss recording, and where a different spectrum of patients consults primary care; and data from systems where similar blood tests are used with differing degrees of missingness.

Findings in context

One previous study developed a risk score to predict cancer in a cohort of 256 patients referred for the investigation of UWL (AUC 0.90 (95% CI 0.88 to 0.92)), including the following: age ≥80 years, white blood cells, albumin, alkaline phosphatase, and lactate dehydrogenase [10]. The AUC was notably lower when the score was externally validated (0.70 (0.61 to 0.78)) in a cohort of 290 consecutive patients referred to hospital with UWL [9]. This study also developed a simpler 3-variable score that included age, alkaline phosphatase, and albumin and gave an AUC of 0.74 (0.66 to 0.81) [9]. The models we developed here produced higher AUCs than these models and, more importantly, were derived in a cohort of all patients presenting to primary care with UWL, rather than only those referred to hospital for further investigation of UWL. Two existing primary care prediction models incorporate multiple symptoms and risk factors to estimate 2-year cancer risk; they were derived in 1.24 million females (AUC 0.85 (95% CI 0.84 to 0.85)) and 1.26 million males (0.87 (0.88 to 0.89)) aged 25 to 89 years [43,44]. They also demonstrate good calibration at the lower deciles of risk and miscalibration at the highest decile of risk. The relative timing of symptoms was not reported. Blood tests were not included, except that haemoglobin results from 12 months before to 2 months after study entry were used to define anaemia as a baseline risk factor. Consequently, the design and reporting of these models make it impossible to understand the diagnostic value of symptoms and blood tests co-occurring with UWL.


[1] Url: https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003728
