STATISTICS AND ME

Malcolm Hooper

September 2011

Statistics and the conclusions drawn from them are nowhere more crucial than in the delivery of medical care. Drawing appropriate conclusions from correctly processed and interpreted data is vital. Where this doesn’t happen, the consequences can be devastating.

Randomised controlled trials (RCTs) are seen as the gold standard, so how is it possible for one to be translated into clinical practice in a way that presents a nightmare to some of the sickest people in the country? This is what happened with the PACE Trial, in which it was possible for a participant to deteriorate physically over the course of the trial yet still be reported as having “recovered”.

PACE is the acronym for “Pacing, Activity, and Cognitive behavioural therapy, a randomised Evaluation”; it cost over £5 million and was described as a Government-funded RCT of rehabilitation strategies for patients with Chronic Fatigue Syndrome (CFS)/Myalgic Encephalomyelitis (ME).

Introduction

Imagine the following fictitious scenario: a high-impact medical journal publishes the results of a clinical trial described as a large RCT of “Eutensia”, a new drug for refractory hypertension. The results are impressive, with 30% of those given “Eutensia” reported as having normal blood pressure at the end of the trial. However, the trial investigators had redefined “the normal range” of blood pressure such that it was possible for a person receiving “Eutensia” to leave the trial with higher blood pressure than before treatment but still be counted as having blood pressure “within the normal range”.

Statistics and how they are reported are currently a hot topic for people with ME following the publication in The Lancet earlier this year of the PACE trial report (1), in which a similar scenario actually occurred.

Prior to publication, the Principal Investigators deviated from the statistical analysis described in the Trial Protocol (2) with the result that a participant could deteriorate on both primary outcome measures following treatment and still fall within the redefined “normal range” (interpreted as “normal” health).

Even worse, an accompanying Comment (3) in The Lancet described around one third of participants as “recovered”, an error that The Lancet’s senior editor acknowledged in writing but which has still not been corrected, so it remains on the record to be cited uncritically.

The SF-36 physical function score of 60 used by the Investigators to define the threshold of the “normal range” specifically for the PACE Trial (discussed below) contradicts how the authors themselves previously defined the markers of recovery in the same disorder using the same measure. In 2007 they stated: “A patient had to score 80 or higher to be considered as recovered” (4), and in 2009 they asserted: “A cut-off of less than or equal to 65 was considered to reflect severe problems with physical functioning” (5). Moreover, “recovery” is described in the Protocol as a physical function score of 85 or above.

In a post-publication letter to The Lancet, the Investigators acknowledge that: “Being within a ‘normal range’ is not necessarily the same as being ‘recovered’”, yet they have failed to correct this widely reported misperception in the media and the medical press. Indeed, one of the PACE Principal Investigators added to it at the press conference convened to launch the paper: “twice as many people on graded exercise therapy and cognitive behaviour therapy got back to normal”; this was reported verbatim the following day in The Guardian, whose health correspondent stated: “More people recover if they are helped to try to do more than they think they can” (6). To date, no “recovery” data have been published.

What is ME?

The World Health Organisation has classified ME as a neurological disorder since 1969 and codes the term “chronic fatigue syndrome” (CFS) only to ME and explicitly not to other syndromes of chronic fatigue such as those seen in psychiatric disorders (7). Taxonomically and clinically, chronic fatigue and CFS are not the same, as confirmed in 1990 by the American Medical Association (8), by the WHO in 2001, 2004 and 2009 (9) and by the US Centers for Disease Control in 2011 (10).

The International Consensus Criteria for ME (11) produced by 26 world experts from 13 countries point to widespread inflammation and multisystemic neuropathology, of which the cardinal symptom is post-exertional neuroimmune exhaustion: “Myalgic encephalomyelitis (ME), also referred to in the literature as chronic fatigue syndrome, is a complex disease involving profound dysregulation of the central nervous system and immune system, dysfunction of cellular energy metabolism and ion transport and cardiovascular abnormalities. The underlying pathophysiology produces measureable abnormalities in physical and cognitive function and provides a basis for understanding the symptomatology”.

PACE – what’s it all about?

From its inception, PACE was controversial because it was based on the Investigators’ belief that ME/CFS is a psychogenic illness that is reversible with cognitive behavioural therapy (CBT) to “change the behavioural and cognitive factors assumed to be responsible for perpetuation of the participants’ symptoms and disability” (1), together with graded exercise therapy (GET) designed to reverse their “deconditioning”.

The Investigators’ beliefs are undermined by substantial clinical and biomedical evidence, including that of international experts such as Professor Paul Cheney from the US, who has reported that “We see cardiac diastolic dysfunction in almost every case” and that some ME patients’ heart function “is so poor they would fit well into a cardiac ward awaiting transplant”. On graded exercise he is unequivocal: “The whole idea that you can take a disease like this and exercise your way to health is foolishness. It is insane” (12).

The PACE participants’ leaflet stated: “Medical authorities are not certain that CFS is exactly the same illness as ME, but until scientific evidence shows that they are different they have decided to treat CFS and ME as if they are one illness”. However, the Investigators’ standpoint on CFS bears no relationship to the WHO classification or to what biomedical experts mean by the same term. This has created confusion amongst clinicians and unnecessary suffering for ME patients.

People with ME have long been saying that conflating ME/CFS with psychogenic fatigue is at the root of public and medical misperception and mistreatment.

Despite many submissions of concern, and whilst insisting that they were studying ME, the Investigators used entry criteria for chronic “fatigue” known as the Oxford criteria (13) which have neither an appropriate degree of sensitivity to identify those with ME, nor the specificity to separate them from the wider “fatigued” population.

Writing to The Lancet’s editor-in-chief following publication, the Investigators implicitly acknowledge this: “The PACE trial paper…does not purport to be studying CFS/ME but CFS defined simply as a principal complaint of fatigue that is disabling, having lasted six months, with no alternative medical explanation (Oxford criteria)”.

The Trial Protocol, however, clearly refers to patients with “CFS/ME”.

Despite their letter to The Lancet confirming that they were not studying ME, the Investigators assert that the results of the PACE trial are generalisable to those who meet either the Oxford or alternative criteria for ME “but only if fatigue is their main symptom”. This has been interpreted as meaning that CBT and GET are effective no matter how the disorder is defined, an illogical assertion. There is a direct link between such conceptual confusion and the likelihood of iatrogenic harm.

In professionally analysed surveys conducted by various ME charities, a large proportion of respondents reported that CBT and GET were harmful, resulting in substantial deterioration. Certainly it has been demonstrated that incremental exercise induces prolonged and accentuated oxidative stress in people with ME, compounding the existing cellular damage.

THE PACE TRIAL FINDINGS

Scrutiny of the chosen definition of the “normal range” and the entry criteria reveals a manifest contradiction, which the table below illustrates.

It also shows that the benchmarks used differed considerably from those to which the Investigators had committed themselves in the Protocol.

PACE Trial Benchmarks

                         SF-36 Physical Function         Chalder Fatigue Questionnaire
                         sub-scale (lower scores mean    (higher scores mean more
                         poorer physical functioning)    fatigue)

Entry criteria           60 or below when recruiting     ‘Bimodal’ score of at least 6
                         began; subsequently raised to   (this can translate to a score
                         65 “to increase recruitment”    as low as 12 on the ‘Likert’
                                                         (scale) rating method)

Analysis conducted:      60 and above                    ‘Likert’ score of 18 or less
threshold of “the                                        (this can translate to a score
normal range”                                            as high as 9 on the ‘bimodal’
                                                         (binary) rating method)

Analysis proposed        75 and above (or a 50%          ‘Bimodal’ score of 3 or less
(Trial Protocol):        improvement over baseline)      (or a 50% improvement over
“a positive outcome”                                     baseline)

The illogical situation whereby participants could score worse on completion than on entry but still be deemed to have achieved “the normal range” has arisen because of the numerous changes in the relevant benchmarks.

Once the SF-36 physical function entry criterion had been raised to 65, a PACE participant could be enrolled with a score of 65, deteriorate to a score of 60, and the interventions still be declared a success, as a score of 60 counted as being within the recalculated “normal range”.

Equally, a Likert fatigue score of 18 can correspond to a bimodal score anywhere from 4 to 9, which would allow a PACE participant to enter the trial with a bimodal score of 6 and exit with a score of 7, 8 or 9 (ie. with greater fatigue) yet still fall within the designated “normal range”.
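Both overlaps can be checked mechanically. The Python sketch below is illustrative only: it uses the thresholds quoted above and assumes the standard scoring of each questionnaire (SF-36 physical function moving in 5-point steps; the 11 Chalder items scored 0, 1, 2, 3 under Likert scoring and 0, 0, 1, 1 under bimodal scoring). The function names are hypothetical.

```python
from itertools import product

# --- SF-36 physical function (0-100, lower = poorer function) ---
SF36_ENTRY_MAX = 65   # entry criterion after the amendment (originally 60)
SF36_NORMAL_MIN = 60  # post-hoc "normal range": 60 and above
# Assumption: scores move in 5-point steps (10 items, each scored 1-3).
SF36_SCORES = range(0, 101, 5)

# Scores that qualify both for trial entry and for the "normal range".
sf36_overlap = [s for s in SF36_SCORES if SF36_NORMAL_MIN <= s <= SF36_ENTRY_MAX]

# --- Chalder Fatigue Questionnaire (higher = more fatigue) ---
# Assumption: 11 items, four response options each; Likert total 0-33,
# bimodal total 0-11.
N_ITEMS = 11


def chalder_profiles():
    """Yield (likert_total, bimodal_total) for every possible set of answers.

    Only the number of items answered at each option matters, so this loops
    over those counts rather than all 4**11 answer patterns.
    """
    for n1, n2, n3 in product(range(N_ITEMS + 1), repeat=3):
        if n1 + n2 + n3 <= N_ITEMS:  # remaining items answered at option 0
            yield n1 + 2 * n2 + 3 * n3, n2 + n3


def bimodal_range(likert_total):
    """Smallest and largest bimodal totals compatible with a Likert total."""
    values = [b for lik, b in chalder_profiles() if lik == likert_total]
    return min(values), max(values)


def likert_range(bimodal_total):
    """Smallest and largest Likert totals compatible with a bimodal total."""
    values = [lik for lik, b in chalder_profiles() if b == bimodal_total]
    return min(values), max(values)


print(sf36_overlap)       # [60, 65]
print(bimodal_range(18))  # (4, 9)
print(likert_range(6))    # (12, 23)
```

Under these assumptions, a Likert score of 18 is compatible with any bimodal score from 4 to 9, and an entry-level bimodal score of 6 can correspond to a Likert score as low as 12, matching the figures in the table.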

Meaningful Benchmarks?

The Protocol set specific benchmarks against which findings were to be judged; additionally, in 2006 the Chief Principal Investigator assured the Multi-centre Research Ethics Committee that “a categorical positive outcome” would be an SF-36 score of at least 75, saying that this would “[reassert] a ten-point score gap between entry criterion and positive outcome”, and that it “would bring the PACE trial into line with the FINE trial, an MRC funded trial for CFS/ME and the sister study to PACE” (14).

When the FINE (Fatigue Intervention by Nurses Evaluation) Trial reported in April 2010, the difference between intervention and comparison groups at the primary outcome point was not statistically significant, so it is notable that when the PACE report was published in February 2011, these same benchmarks of “a positive outcome” had been dropped.

Remarkably, in view of the complexity of much of the analysis presented in The Lancet article, the Investigators offered this explanation: “Changes to the original published protocol were made to improve…interpretability” (15).

What is “Normal”?

“The normal range” and the lay term “normal” are not the same. “The normal range” is a statistical concept with a technical definition, here based on the mean plus or minus one standard deviation of a reference population. For the Investigators to imply that “within the normal range” equates to normal health is misleading, because “normal” in lay terms means high physical function with little or no impairment.

Around 90% of the general population fall within the “normal range” according to the benchmark used to gauge PACE participants’ outcomes (ie. 60 and above for SF-36 physical function), with only 10% functioning at a lower level. In stark contrast, around 70% of PACE participants who underwent CBT/GET failed to reach the Investigators’ redefined “normal range” and so remained in the poorest-functioning 10% of the population. It is the remaining 30% that has been repeatedly quoted as evidence that around one third of participants “recovered” with CBT and GET.

However, what the Investigators failed to clarify was that this 30% figure related to participants who received CBT in addition to Specialist Medical Care (SMC). As 15% of the SMC-alone group were also in the “normal range”, CBT in reality added only 15 percentage points to that figure (GET added 13), so to allow the media to believe that the 30% figure reflects the effectiveness of CBT/GET is misleading.
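The gap between the headline percentage and the increment attributable to the added therapy is simple arithmetic. A minimal Python sketch using only the figures in this paragraph; the GET figure of 28% is inferred from the 13-point increment rather than quoted directly, and the function names are illustrative.

```python
# Proportions (%) within the "normal range" at 52 weeks, as quoted in the text.
SMC_ALONE = 15
CBT_PLUS_SMC = 30
GET_PLUS_SMC = SMC_ALONE + 13  # inferred from "GET added 13%"


def points_added(arm_pct: float, smc_pct: float) -> float:
    """Percentage points above the SMC-alone group."""
    return arm_pct - smc_pct


def share_also_seen_with_smc(arm_pct: float, smc_pct: float) -> float:
    """Fraction of the headline figure that the SMC-alone group also reached."""
    return smc_pct / arm_pct


print(points_added(CBT_PLUS_SMC, SMC_ALONE))              # 15
print(points_added(GET_PLUS_SMC, SMC_ALONE))              # 13
print(share_also_seen_with_smc(CBT_PLUS_SMC, SMC_ALONE))  # 0.5
```

On these figures, half of the 30% headline was also reached by participants who received SMC alone.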

Moreover, the Investigators present an uncommonly low threshold of “the normal range” on physical function:

“In another post-hoc analysis, we compared the proportions of participants who had scores of both primary outcomes within the normal range at 52 weeks. This range was defined as less than the mean plus 1 SD scores of adult attendees to UK general practice of 14.2 (+4.6) for fatigue (score of 18 or less) and equal to or above the mean minus 1 SD scores of the UK working age population of 84 (–24) for physical function (score of 60 or more)”.
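The arithmetic behind the two thresholds in this quotation can be reproduced in a few lines. This is a sketch, not the Investigators’ own code; it assumes whole-number Likert scores for fatigue and 5-point steps for SF-36 physical function, and uses `Decimal` only to avoid floating-point rounding noise. The helper name is hypothetical.

```python
import math
from decimal import Decimal


def one_sd_limits(mean: str, sd: str) -> tuple[Decimal, Decimal]:
    """Return (mean - 1 SD, mean + 1 SD) using exact decimal arithmetic."""
    m, s = Decimal(mean), Decimal(sd)
    return m - s, m + s


# Chalder fatigue (Likert 0-33, higher = more fatigue): "normal" means a score
# strictly below mean + 1 SD of the general-practice attendee sample.
_, fatigue_limit = one_sd_limits("14.2", "4.6")
max_normal_fatigue = math.ceil(fatigue_limit) - 1  # largest whole score below it

# SF-36 physical function (0-100, lower = worse): "normal" means a score at or
# above mean - 1 SD of the reference population.
function_limit, _ = one_sd_limits("84", "24")
min_normal_function = math.ceil(function_limit / 5) * 5  # 5-point steps

print(fatigue_limit, max_normal_fatigue)    # 18.8 18
print(function_limit, min_normal_function)  # 60 60
```

This recovers the cut-offs stated by the Investigators: Likert fatigue scores of 18 or less and SF-36 physical function scores of 60 or more.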

This is curious because the paper cited in support of this figure reviews normative data from various sources, none of which appears to provide a mean of 84.

Contrary to what is stated in The Lancet, the reference group included elderly people, a fact which the Investigators had no option but to acknowledge:

“We did however make a descriptive error in referring to the sample…as a ‘UK working age population’, whereas it should have read ‘English adult population’” (16).

The “English adult population” includes not only the elderly but also sick people: the appropriate comparator should have been data from an age-matched healthy population.

In a radio interview, one of the Investigators stated candidly:
“What this trial isn’t able to answer is how better are these treatments than really not having very much treatment at all” (17).

For a trial that screened over 3,000 patients, lasted 9 years and cost £5 million, that is an extraordinary statement.

Evidence of Efficacy? Mean Improvements

The Investigators conclude that CBT and GET “moderately improve outcomes for chronic fatigue syndrome”.

This claim rests on relatively better average outcomes on measures of fatigue and physical function among those who received CBT or GET alongside “Specialist Medical Care” compared with the group who received SMC alone (SMC consisted of advice on balancing activity and rest, and also help with sleep and pain control). A fourth arm of the trial – the Investigators’ own version of “pacing” – did not emerge favourably.

Two primary outcome measures were used to assess the Trial: fatigue was assessed using the Chalder Fatigue Questionnaire (18) and physical function was assessed using the Short-Form (SF-36) physical function subscale (19).

The Investigators determined a “clinically useful difference” for the two primary outcomes to be an improvement of 2 points on the Chalder Fatigue scale (Likert scoring 0 – 33) and 8 points on the SF-36 physical function scale (0 – 100).

CBT and physical function: the CBT group failed to achieve a “clinically useful” mean improvement, as its mean difference from SMC was only 7.1 points, below the 8-point threshold.

This was mentioned only indirectly by the Investigators: “Mean differences between groups on primary outcomes almost always exceeded predefined clinically useful differences for CBT and GET when compared with APT and SMC” (where the words “almost always” refer to the failure of CBT to achieve a clinically useful difference).

CBT and fatigue: the mean difference from SMC was -3.4, which was a marginal 1.4 points better than the clinically useful threshold of 2.

GET and physical function: the mean difference for GET was trivially better than for CBT at 9.4, a marginal 1.4 points above the clinically useful difference of 8 points on a scale of 0 – 100.

GET and fatigue: the mean difference was -3.2 (ie. 1.2 points better than the clinically useful threshold of 2 points on a scale of 0 – 33).
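The four comparisons above can be laid side by side. A minimal Python sketch using the mean differences and thresholds quoted in this section; the sign convention (lower Chalder scores and higher SF-36 scores are better) follows the scales’ definitions, and the names are illustrative.

```python
# Mean differences from SMC alone, as quoted above.
MEAN_DIFF = {
    ("CBT", "fatigue"): -3.4,
    ("CBT", "physical function"): 7.1,
    ("GET", "fatigue"): -3.2,
    ("GET", "physical function"): 9.4,
}

# The Investigators' "clinically useful differences".
CLINICALLY_USEFUL = {"fatigue": 2.0, "physical function": 8.0}

# Lower Chalder scores are better; higher SF-36 scores are better.
BETTER_DIRECTION = {"fatigue": -1, "physical function": 1}


def margin(therapy: str, outcome: str) -> float:
    """Points by which the improvement exceeds (+) or misses (-) the threshold."""
    improvement = BETTER_DIRECTION[outcome] * MEAN_DIFF[(therapy, outcome)]
    return round(improvement - CLINICALLY_USEFUL[outcome], 1)


for therapy, outcome in MEAN_DIFF:
    print(f"{therapy}, {outcome}: {margin(therapy, outcome):+.1f}")
```

This prints +1.4, -0.9, +1.2 and +1.4 respectively: only CBT on physical function falls short, and the other three clear their thresholds by between 1.2 and 1.4 points.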

These results challenge the Investigators’ assertions that psychological interventions should be the primary management strategy for patients with ME/CFS.

Both primary outcomes (physical function and fatigue) were self-reported, but studies of graded exercise for ME/CFS patients by other investigators have demonstrated that self-report questionnaires do not relate well to actual activity (20). Indeed, one US study found that when objective actigraphy measures were used, activity showed a numerical decrease from the pre-treatment baseline (21).

Secondary outcome measures

A secondary outcome measure was the six minute walking test. The mean distance recorded by those who had undergone CBT was 354 metres; for those who had undergone GET it was 379 metres, a 67-metre increase from baseline.

These distances were lower than those documented in several other serious diseases: for patients being assessed for lung transplantation, a six minute walking distance of less than 400 metres is regarded as a marker for placing a patient on the transplant list (22), and the mean distance for patients with class III heart failure is 402 metres (23). PACE Trial participants did not even achieve a mean six minute walking distance of 518 metres, a distance that would itself be considered abnormally low for healthy people aged 50-85 years (24).
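The comparison is easier to see as shortfalls in metres. A minimal Python sketch using only the distances quoted above; the implied GET baseline is derived from the stated 67-metre increase, and the labels and function name are illustrative.

```python
# Mean six minute walking distances (metres) at the end of the trial, as quoted.
PACE_MEAN_6MWD = {"CBT": 354, "GET": 379}
GET_INCREASE = 67  # GET group's increase from baseline

# Comparison distances (metres) quoted in the text, references 22 to 24.
COMPARATORS = {
    "lung transplant listing marker": 400,
    "class III heart failure mean": 402,
    "healthy adults aged 50-85": 518,
}


def shortfall(distance: int) -> dict[str, int]:
    """Metres by which a mean distance falls below each comparison figure."""
    return {label: ref - distance for label, ref in COMPARATORS.items()}


print(PACE_MEAN_6MWD["GET"] - GET_INCREASE)  # 312, the implied GET baseline
for arm, metres in PACE_MEAN_6MWD.items():
    print(arm, shortfall(metres))
```

Even the better-performing GET group ends 21 metres below the lung transplant marker and 139 metres below the 518-metre figure.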

Moreover, data on the six minute walking test were available for only 69% to 76% of participants, a completion rate roughly 20% lower than for the other secondary outcome measures, a discrepancy for which the Investigators offer no explanation.

Significantly, the CBT group managed less of an average increase in walking distance than those in the SMC alone group.

Results on other measures were similarly underwhelming: for example, among the participant-rated CGI (clinical global impression) reports of change in overall health at the end of the trial, 60% of those in the GET group and 58% of those in the CBT group reported negative or minimal change.

The Distortion Continues

Following the press conference, both the lay and medical press reported the PACE Trial as a resounding success with no caveats whatsoever.

On 18th February 2011 The Independent proclaimed: “Got ME? Just get out and exercise”; the Daily Mail reported that “scientists have found encouraging people with ME to push themselves to their limits gives the best hope of recovery” and on-line medical sources such as NHS Choices and NHS Evidence exaggerated reports of a successful outcome.

A nightmare for ME patients

Given

(i) the inability of the recruitment criteria to distinguish between ME/CFS and psychogenic fatigue,

(ii) the illogical overlap of the entry criteria with “the normal range”,

(iii) the failure of CBT to achieve a clinically useful difference for one of the primary outcomes and the trivial improvement produced by GET,

(iv) the failure to recognise that an “averaged” improvement often masks very different responses to an intervention, and

(v) the fact that around two thirds of participants who received CBT/GET remained in the lowest functioning 10% of the general population,

the international ME community wonders why the PACE Trial is being hailed as a “gold standard” study which demonstrated the efficacy of CBT and GET for ME/CFS patients (although the Protocol refers to it as an RCT, The Lancet paper at no point describes PACE as a controlled trial, yet it was described in the press release as “the highest grade of clinical evidence” and as “extremely rigorous (and) carefully conducted”).

CBT and GET are being actively and inappropriately applied to people with ME or CFS; the PACE press release states that the results suggest: “everyone with the condition should be offered the treatment” and that every patient “who wishes to be helped” should be willing to take part in such regimes. Non-compliance (for example, if a person has already found that exercise exacerbates their condition) is deemed to demonstrate lack of desire to recover, which in some instances has already led to the withdrawal of state and/or insurance benefits.

The PACE Trial is a travesty of science and a tragedy for patients with ME.

References

1.	Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. PD White et al. The Lancet 5th March 2011:377:823-836; published online 18th February 2011: DOI:10.1016/S0140-6736(11)60096-2 (FAST TRACKED)
2.	PACE Trial Protocol http://www.biomedcentral.com/1471-2377/7/6
3.	Chronic fatigue syndrome: where to PACE from here? G Bleijenberg and H Knoop. The Lancet: published online February 18, 2011 DOI:10.1016/S0140-6736(11)60172-4
4.	Is Full Recovery Possible after Cognitive Behavioural Therapy for Chronic Fatigue Syndrome? Hans Knoop, Gijs Bleijenberg, Marieke FM Gielissen, Jos van der Meer, Peter D White. Psychotherapy and Psychosomatics 2007:76:171-176
5.	Fatigue and chronic fatigue syndrome-like complaints in the general population. Marjolein van’t Leven, Gerhard A Zielhuis, Jos van der Meer, Andre L Verbeek, Gijs Bleijenberg. European Journal of Public Health 2009:20:3:251-257
6.	Study finds therapy and exercise best for ME. Sarah Boseley. The Guardian, 18th February 2011
7.	WHO International Classification of Diseases (ICD-10 G93.3)
8.	Following an erroneous News Release in 1990 about this point, the American Medical Association issued a correction which said: “ A news release in the July 4 packet confused chronic fatigue with chronic fatigue syndrome; the two are not the same. We regret the error and any confusion it may have caused”. JAMA issues correction (referring to the article entitled Chronic fatigue: A prospective clinical and virologic study by Deborah Gold et al: JAMA 1990:264:1:48-53).
9.	On 16th October 2001 the WHO provided written clarification: “I wish to clarify the situation regarding the classification of neurasthenia, fatigue syndrome, post-viral fatigue syndrome and benign myalgic encephalomyelitis. Let me state clearly that the World Health Organisation (WHO) has not changed its position on these disorders since the publication of (ICD-10) in 1992 and versions of it during later years. Post viral fatigue syndrome remains under the diseases of the nervous system as G93.3. Benign myalgic encephalomyelitis is included within this category. Neurasthenia remains under mental and behavioural disorders as F48.0 and fatigue syndrome (note: not The Chronic Fatigue Syndrome) is included within this category. However, post viral fatigue syndrome is explicitly excluded from F48.0”. On 23rd January 2004 the WHO provided further written clarification: “This is to confirm that according to the taxonomic principles governing the Tenth Revision of the World Health Organisation’s International Classification of Diseases and Related Health problems (ICD-10) it is not permitted for the same condition to be classified to more than one rubric as this would mean that the individual categories and subcategories were no longer mutually exclusive”. On 30th January 2009 the WHO re-confirmed the position: “I confirm that the WHO has not changed its position regarding benign myalgic encephalomyelitis. Statements made in the past…regarding coding and classification of the aforementioned condition are still valid. There is no evidence that any change should be made to this in ICD-11”.
10.	“CFS is different than fatigue. CFS is a long-lasting debilitating illness with impact similar to heart disease, multiple sclerosis and AIDS”: US Centers for Disease Control; Emergency Preparedness: Consideration in CFS; PowerPoint for physicians, 18th August 2011
11.	Myalgic Encephalomyelitis: International Consensus Criteria. Carruthers BM, van de Sande MI, de Meirleir KL, Klimas NG, Broderick G, Mitchell T, Staines D, Powles ACP, Speight N, Vallings R, Bateman L, Baumgarten-Austrheim B, Bell DS, Carlo-Stella N, Chia J, Darragh A, Jo D, Lewis D, Light AR, Marshall-Gradisnik S, Mena I, Mikovits JA, Miwa J, Murovska M, Pall ML, Stevens S. J. Intern Med. 2011 doi:10.1111/j.1365-2796.2011.02428.x
12.	DVD of “Invest in ME” CPD (Continuing Professional Development-accredited) Conference, 2010 http://www.investinme.org/IiME%20Conference%202011/IiME%20International%20ME%20Conference%202011%20DVD%20Orders.htm
13.	A report - Chronic Fatigue Syndrome: Guidelines for Research. M Sharpe et al. JRSM: 1991: 84:118-121
14.	Letter dated 9th February 2006 sent by Professor Peter White to Mrs Anne McCullough, Administrator, West Midlands Multi-centre Research Ethics Committee
15.	The PACE trial in chronic fatigue syndrome – Authors’ reply. The Lancet: doi:10.1016/S0140-6736(11)60651-X)
16.	Undated letter from Professor Peter White on behalf of the PACE Trial team to the editor-in-chief of The Lancet
17.	Comparison of treatments for chronic fatigue syndrome – the PACE trial. ABC National Radio: The Health Report. http://tinyurl.com/84a9vf3
18.	Development of a Fatigue Scale. Trudie Chalder, Simon Wessely et al. J Psychosom Res 1993:37:2:147-153
19.	The MOS 36 item short form health survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs McHorney CA, Ware JE, Raczek AE; Med Care 1993. 31: 247–63
20.	Physical activity in chronic fatigue syndrome: assessment and its role in fatigue. Vercoulen JH et al. J Psychiat Res 1997: 31(6):661-673
21.	Cognitive behaviour therapy in chronic fatigue syndrome: is improvement related to increased physical activity? Friedberg F et al. J Clin Psychol: 2009:65(4):423-442
22.	The six minute walk test: a guide to assessment for lung transplantation. Kadikar A et al; J Heart Lung Transplant 1997:16(3):313-319
23.	Six minute walking test for assessing exercise capacity in chronic heart failure. DP Lipkin et al; BMJ 1986:292:653
24.	Six minute walking distance in healthy elderly subjects. T Troosters et al; Eur Respir J 1999:14:270-274.