Published in The British Medical Journal - 4th May 2021


In clinical trials (CT), a placebo is commonly used as a control therapy to evaluate the clinical effectiveness of the treatments tested.1 Placebo has been defined as ‘an inert substance or sham procedure that is provided to research participants with the aim of making it impossible for them, and usually the researchers themselves, to know who is receiving an active or inactive intervention.’2 Placebo interventions are methodological tools used to treat participants in the study arm and the control arm in exactly the same way, except that the study group receives an active substance and the control group does not.

In Europe, its use in pharmacological CT has been regulated by CT Regulation No. 536/2014. According to this regulation, placebo must be treated as an investigational medicinal product and as such it has to meet certain standards in order to ensure quality, patient safety and the reliability of the study results.3

The regulatory aspects of trials involving manual therapies (MT) are very different. Although such studies might be influenced by the type of placebo provided, no clear guidelines or regulations have been developed to ensure the credibility of trial results and patient safety.

MT is a clinical approach used by different physical therapists and involves hands-on techniques to manipulate, mobilise and massage the body tissues. This type of therapy can help relieve pain and stiffness, promote relaxation of soft-tissues, enhance blood supply to tissues and increase mobility of joint structures.4

In MT trials, placebo treatment is often provided in different modalities from trial to trial although the manual techniques or treatments tested are the same. A true placebo does not exist for MT and testing the effectiveness of MT requires a sham intervention. For instance, sham treatment (ST) is commonly administered as a light touch at the site of pain or as an active treatment at a different site,5 with no clear criterion. Such light touch might in fact have a health effect and there is no evidence that it is ineffective. Touch itself could have a positive effect on health6 and active treatments could have an analgesic reflex effect on pain even if administered elsewhere in the body.7

Placebo effect, also called placebo response, is the reported improvement in symptoms among patients that occurs as a result of the placebo administration. Since a placebo has no inherent therapeutic power, it cannot cure the disease but it may contribute to the relief of patients’ symptoms such as pain.8 Additionally, placebo might be related to an adverse effect (AE) called nocebo. It has been estimated that up to 26% of patients in randomised controlled trials (RCTs) discontinue placebo due to AEs.9

It is thought that these psychobiological phenomena may be related to the overall therapeutic context, such as treatment environment, individual patient and clinician factors (eg, beliefs, desire for symptom changes), as well as the patient’s expectations of improvement and prior experiences of the treatment.10–13

In pharmacological trials, this overall therapeutic context and its influence on placebo response have been widely studied.11 Less evidence is available for MT trials, where the tactile interaction could be considered an important characteristic of this therapeutic context.14 15 Pharmacological trials avoid the influence of clinicians’ beliefs by using a placebo that ensures both patient and clinician blinding to treatment allocation, but, in MT trials, the blinding of clinicians is impossible to achieve. The best alternative in this type of trial is the use of an ST that mimics the active treatment and aims at blinding of participants.

Another important factor that has to be taken into account is that RCTs involving MT usually use patient-reported outcomes (PROs), such as pain, as primary outcomes. Studies have suggested that physical placebo treatments might have a greater effect on these types of outcome than pharmacological placebo and that this effect might be a consequence of physical contact.1 16 17

Moreover, especially when subjective PROs are used, the absence of clinician blinding could also increase the possibility of performance bias.14

Therefore, a better understanding of sham procedures in manual treatment is fundamental to defining the real difference in efficacy between MT and ST, with a better knowledge of the effect of manual contact on PROs such as pain relief and on drop-outs.

The role of placebo (referred to as sham therapy in this review) in MT trials is still poorly defined and the lack of guidelines allows huge discrepancies in its use in RCTs. Additionally, the reliability of sham procedures in MT trials has rarely been evaluated.

A clear definition of placebo effect could improve trial design, implementing studies with a proper power and sample size, defining clinical relevance of MT and giving more reliability to study results.

The aim of this systematic review with pairwise meta-analyses is to evaluate the use of ST in MT trials in order to analyse the effects, possible harm and the reliability of different kinds of sham procedures provided in RCTs involving MT. A systematic review could help to define ST standards to be applied in CT in order to guarantee methodological quality and patient safety.


To assess the benefits, potential harm and reliability of ST in MT RCTs in the treatment of back pain—both cervical and lumbar—in order to provide methodological guidance for CT development.


This systematic review and meta-analysis was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).18

Criteria for considering studies for this review

Only RCTs were included in this review. Quasi-randomised trials in which allocation was not strictly random (eg, date of birth or toss of a coin) were excluded. No restrictions were applied to language or setting.

Studies were considered eligible if they included adult participants with acute or chronic back pain including coccyx, lumbar, dorsal and cervical. Trials where pain was related to muscular conditions, articular disorders (such as osteoarthritis) or spinal disc herniation were included.

Trials where musculoskeletal diseases were secondary to other pathologies (eg, amyotrophic lateral sclerosis, fibromyalgia, etc) were excluded.

Trials where pain was related to fracture, surgery, dysmenorrhoea, post partum or pregnancy, headache or dizziness were excluded.

This review involved all types of ST that include hand contact provided by all kinds of physical therapists. Studies where ST was provided by machines (such as inactive ultrasound) were excluded. Many MT trials used detuned ultrasound as control, but this type of sham was not considered adequate for MT trials where active treatment is provided by hand contact.

All trials that involved hand contact ST as light touch or a manual treatment in a different site were included.

ST was compared with other MT provided by any type of healthcare provider such as: physiotherapist, chiropractor, osteopath, massage therapist, kinesiologist and reflexologist.

To assess if touch itself could have a positive health effect, ST was also compared with no treatment. Physiotherapeutic exercises were included in the analysis only if associated with manual treatment.

The use of active cointerventions such as oral non-steroidal anti-inflammatory drugs (NSAIDs) or other active treatments was accepted if used in all trial arms. Trials with more than two arms of intervention were included, but only data from the arms of interest were extracted.


Primary outcomes were pain intensity on a validated scale, success in the blinding of participants and AEs. The secondary outcome was the number of drop-outs.

Whenever the meta-analysis could not be performed, a narrative summary of the outcomes was provided. Outcomes were divided into short (≤2 months), medium (≤4 months) and long term (≥6 months). Data were extracted and analysed based on the time closest to these intervals.

Information sources

Search strategy (online supplemental appendix 1) was adapted to the different databases by an experienced information specialist.


RCTs were identified in different databases (up to 20 August 2020): MEDLINE, Embase, CINAHL, SPORTDiscus, PEDro, WHO Clinical Trials Registration Platform, Index to Chiropractic Literature, Cochrane central register of controlled trials, Clinical trials registry and metaRegister of Controlled Trials.

Researchers of trials that were completed and registered but unpublished were contacted by CL to obtain data.

PROSPERO, the Cochrane Library and PubMed (clinical queries) were searched for ongoing or recently completed systematic reviews. Guidelines from different organisations (eg, National Council for Osteopathic Research, etc) were reviewed and references from relevant publications were analysed.

Data collection and analysis

Search results were screened by two independent reviewers who identified all the potentially eligible trials based on title and abstract. Full texts of all the selected articles were screened first for inclusion. If full text was not available, or the trial was completed but not published, CL contacted the authors in order to obtain the information needed or used the document delivery service of the 3Bi Biella library.

Uncertainty about the inclusion of a study was discussed by the two reviewers. If no agreement was reached by the two reviewers a third reviewer (AM) was asked for their opinion.

The selection process was recorded and reported through a PRISMA flow diagram.

Data extraction and management

Data extraction was performed by two reviewers with a tested predefined form. Data extracted were related to settings, type of study, participant characteristics (such as localisation and duration of pain, pain score at baseline, previous similar treatment), interventions, outcomes used in the meta-analysis and other relevant data such as differences between ST and active treatment or funding (online supplemental appendix 2).

Risk of bias in individual studies

Bias risk was assessed by CL and agreed by MG using the Cochrane Risk of bias (CRB) tool.19 This tool was used to assess selection bias, performance bias, attrition bias, reporting bias and other biases.

Each possible risk was evaluated as ‘high’, ‘medium’ or ‘low’ by CL and a revision of the judgements was performed by MG. RevMan V.5.3.5 was used for the graphic representation of each risk. The CRB tool results were then converted to Agency for Healthcare Research and Quality (AHRQ) Standards to assess the quality of the study (good, fair and poor). Trials were judged as good quality when bias risk was judged as low, as fair quality when at least one criterion was at high risk, and as poor quality when two or more criteria were at high or unclear risk.
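The conversion rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the review authors ran: the function name `ahrq_quality` and the domain names are hypothetical, ‘medium’ is treated like ‘unclear’, and a single criterion that is not at low risk is classed as fair because the text does not spell out that case.

```python
from typing import Mapping

# Ratings accepted by this sketch; "medium" is treated like "unclear" (assumption).
ALLOWED = {"low", "unclear", "medium", "high"}


def ahrq_quality(judgements: Mapping[str, str]) -> str:
    """Map per-domain risk-of-bias judgements to 'good', 'fair' or 'poor'."""
    levels = [rating.lower() for rating in judgements.values()]
    unknown = set(levels) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")

    # Count every criterion that is not judged as low risk.
    not_low = sum(1 for rating in levels if rating != "low")
    if not_low >= 2:   # two or more criteria at high or unclear risk
        return "poor"
    if not_low == 1:   # a single criterion not at low risk (assumed 'fair')
        return "fair"
    return "good"      # every criterion at low risk
```

For example, a hypothetical trial rated low on every domain except participant blinding would be classed as fair, while one with both unclear allocation concealment and high-risk blinding would be classed as poor.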

Assessment of reporting biases

Funnel plots were created to explore reporting bias, whenever more than 10 studies were included in the meta-analysis. Furthermore, for each study, an analysis of possible conflicts of interest and funding sources was performed.

Summary measures

Dichotomous outcomes, such as AE (occurred or not), were analysed using risk ratio (RR) with 95% CIs.
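As a minimal sketch of how a risk ratio and its CI are commonly obtained for a single trial, the snippet below uses the standard log-scale (Wald) interval. The function name and the event counts in the usage note are hypothetical, and the review itself relied on RevMan rather than on code like this.

```python
import math
from statistics import NormalDist


def risk_ratio(events_a, total_a, events_b, total_b, level=0.95):
    """Return (RR, lower, upper) for arm A versus arm B using a log-scale CI."""
    if min(events_a, events_b) <= 0:
        # Zero cells need a continuity correction, which this sketch omits.
        raise ValueError("this sketch needs at least one event in each arm")
    if events_a > total_a or events_b > total_b:
        raise ValueError("events cannot exceed the number of participants")

    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR).
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper
```

With hypothetical counts of 10 AEs among 100 participants in each arm, `risk_ratio(10, 100, 10, 100)` gives an RR of 1 with an interval that is symmetric on the log scale.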

Continuous outcomes, such as back pain on Visual Analogue Scale (VAS), were evaluated using mean difference (MD) between ST and the MT/no treatment group with 95% CI and the SD.

The minimal clinically important difference (MCID) between pretreatment and post-treatment was taken as a 30 mm change on a 100 mm pain scale.20–22 This value was used for the interpretation of the clinical significance of the findings.
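The two steps above, estimating an MD with its 95% CI and reading it against the 30 mm threshold, can be illustrated as follows. This is a sketch assuming a normal approximation for a single two-arm comparison; the function names and the input values in the usage note are hypothetical, and only the 30 mm threshold comes from the text.

```python
import math
from statistics import NormalDist

MCID_MM = 30.0  # threshold quoted in the text, on a 100 mm pain scale


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Return (MD, lower, upper) for arm A minus arm B (normal approximation)."""
    md = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return md, md - z * se, md + z * se


def reaches_mcid(md, mcid=MCID_MM):
    """True when the size of the difference is at least the MCID."""
    return abs(md) >= mcid
```

For two hypothetical arms of 50 participants with mean VAS scores of 40 mm and 50 mm (SD 20 mm each), the MD is −10 mm, which is statistically distinguishable from zero but well short of the 30 mm threshold.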

Success of blinding was reported as the percentage of patients who correctly guessed their treatment allocation.

In this review, the unit of analysis was the participant.

Assessment of heterogeneity

The presence of heterogeneity was assessed with a visual inspection of the forest plots and through an inconsistency level test (I2).

Cochrane Handbook was used for threshold interpretation: heterogeneity was considered as unimportant for values of I2 between 0% and 40%, as moderate for values between 30% and 60%, as substantial for values between 50% and 90% and considerable for values between 75% and 100%.23
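Because the bands quoted above overlap, a given I2 value can fall into two categories at once. The sketch below makes that explicit; it computes I2 from Cochran's Q with the usual formula and returns every band that contains the value. The function names are hypothetical and the labels follow the wording of this section.

```python
# (label, lower bound %, upper bound %) as quoted in the text; bands overlap.
BANDS = [
    ("unimportant", 0, 40),
    ("moderate", 30, 60),
    ("substantial", 50, 90),
    ("considerable", 75, 100),
]


def i_squared(q, df):
    """I2 (%) from Cochran's Q and its degrees of freedom, floored at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def interpret_i2(i2):
    """Return every band label whose range contains the I2 value."""
    if not 0 <= i2 <= 100:
        raise ValueError("I2 must lie between 0 and 100")
    return [label for label, low, high in BANDS if low <= i2 <= high]
```

An I2 of 35%, for instance, sits in both the unimportant and moderate bands, which is why the Handbook asks for the value to be read alongside the size and direction of effects rather than mechanically.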

Synthesis of results

Meta-analyses of pain score, AE and drop-out rates were performed using RevMan V.5.3.5 whenever possible. The meta-analyses compared all kinds of ST with all types of MT and with no treatment. A random-effects model was used when substantial inconsistency was present (I2=50%–90%).20 When considerable heterogeneity was present (I2 >75%) and could not be explained by clinical or methodological diversity, the results were presented narratively.
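To show what switching to a random-effects model does to the pooled estimate, here is a minimal inverse-variance sketch. It assumes the DerSimonian and Laird estimator of between-study variance, which is a common default but is not named in the text, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def pool_effects(effects, std_errors, random_effects=True, level=0.95):
    """Inverse-variance pooling; optional DerSimonian-Laird random effects."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")

    # Fixed-effect weights and Cochran's Q.
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird estimate of the between-study variance (tau squared).
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)

    if random_effects:
        weights = [1.0 / (se ** 2 + tau2) for se in std_errors]

    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "estimate": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "tau2": tau2,
        "q": q,
    }
```

When studies disagree, the random-effects interval is wider than the fixed-effect one because the between-study variance is added to every study's own variance before weighting.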

The statistical significance of measured effects was determined evaluating the p value and 95% CI.

Additional analyses

Several subgroup analyses were planned in the protocol, for example by type of ST provided (applied locally or at a different site from the pain), type of manual technique tested (single or multiple techniques) and localisation of back pain. However, due to the small number of studies included in this review, only a few subgroup analyses were conducted on follow-up periods.

Sensitivity analysis was conducted for the primary outcomes to assess the effects of skewed and imputed data on the effect measure. These analyses are reported as online supplemental appendices.

Summarising results and assessing the quality of the evidence

The quality of evidence for each key outcome was evaluated with the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach by two independent authors and any disagreement was discussed. The quality for each effect measure was judged as high, moderate, low or very low.19 The software GRADEpro was used to import data from RevMan V.5.3.5 and to create ‘summary of findings tables’.

The following outcomes were chosen to be presented: pain scores at short term, AE and drop-outs.

Patient and public involvement

There was no involvement of patients or public during the outline of this project. The differences noted between therapies tested on primary pain outcome were those clinically meaningful to patients.


Included studies

Table 1 shows a summary of the main characteristics of included studies. Twenty-four studies were included in this review (figure 1); one study had a 2×2 factorial design24 and eight studies had multiple arms.25–32 Most of the studies were conducted in physical therapy clinics, in 13 different countries. Three trials did not report in which clinical setting they were conducted.29 33 34

Figure 1. PRISMA flow diagram. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 1. Summary of main characteristics of included studies

Eight trials were conducted in Europe,27 28 30 35–39 five in the USA,24 25 31 40 41 three in Brazil42–44 and one each in the UK,26 Egypt,32 Japan45 and Australia.46

No ongoing or unpublished trials were found.


The included trials randomised a total of 2019 participants. The majority of studies (N=18) were small, with a median of 50 participants and a range from 15 to 455.

Most trials included middle-aged patients (mean age 39.9 years, range 18 to 73) with a mean BMI of 21.7 kg/m².

The majority of studies included both genders, with the percentage of male participants ranging from 19% to 80%. Two trials included only male participants,38 44 and one study included only female participants.42

Sixteen trials enrolled participants with low back pain (LBP) and eight included participants with cervical pain.26 33 35–37 40–42

The majority of trials (N=18) included participants with an unspecified cause of back pain. Disc herniation was considered in three trials.27 30 44

Duration of symptoms was not assessed in eight trials; nine studies included participants with chronic pain, and some included participants with both acute and chronic pain.

Participants with experience of the tested treatment were included in eight trials24 29 31 32 35 37 42 43 and excluded in four.26 36 39 41 The remaining studies did not provide this information.


Interventions differed in the number of sessions and the number of techniques applied. Eleven trials used a single therapy session, with a single technique performed in eight of those trials. In trials with multiple therapy sessions, the number ranged from five sessions25 26 30 to 20 sessions,27 delivered once a week.

Sham treatment

ST was provided by a hand contact on the area of pain in 19 studies, and five studies provided ST in a different area from where the pain was located.27 35 43 45 46

In trials providing spinal manipulation (SM), most authors used as the inactive treatment a similar placement of hands on participants without applying any force.40–42 44 Two trials used an ST with similar forces applied in different directions.25 32 One trial did not specify the inactive manipulation applied.29

In trials that provided multiple techniques in the same treatment session (such as osteopathic treatment, spinal mobilisation and physiotherapy), the ST was administered with different techniques that mimicked active treatments using light touch or light traction.

Only one trial compared one single sham technique with both single active technique and multiple treatment techniques. In this case only data of the first arm were extracted.37

Manual and control treatments

Different manual treatments were provided:

  • SM/chiropractic (7 studies, 567 participants).

  • Osteopathy (5 trials, 645 participants).

  • Kinesiology (1 trial, 58 participants).

  • Articular mobilisations (6 trials, 445 participants).

  • Muscular release (5 trials, 304 participants).

Four trials with multiple arms compared ST with no intervention (379 participants)25–27 31 and one with a muscle relaxant group (156 participants).29

The manual treatment was generally applied in the area of pain; some trials additionally used techniques in other areas. Just one trial, which used reflexology, provided both MT and sham in a different zone.39

Characteristics of the practitioners who administered treatments were provided by 16 trials. Trials involved physiotherapists (N=8), physical therapists (N=4), osteopaths (N=3) and osteopathic students (N=1). Only seven studies provided information on the years of practice experience of the practitioners involved, which ranged from 6 to 17 years.30 33 35–37 40 42 44 The gender of practitioners was indicated in only three trials.26 30 37

Risk of bias in included studies

Figure 2 shows risks of bias.

Figure 2. Risk of bias summary. Review authors’ judgements about each risk of bias item for each included study.

Blinding of participants and assessors is described in more detail given the nature of this review.

According to AHRQ standards of CRB tool,19 the majority of trials were judged as poor quality (N=22). Good quality was conferred on only two studies.36 45

Random sequence generation and allocation concealment were adequately reported in 71% and 63% of trials, respectively.

The lack of blinding of participants was the most common bias and was judged as high risk in 38% of studies, while 38% were considered as unclear risk.

The reasons for this judgement were mainly related to trials involving SMs. These studies used a technique which can be easily recognised by patients as active treatment for the popping sound emitted by joints. Additionally, these trials involved participants who could have already received this type of treatment, making the masking of technique almost impossible.

Blinding of outcome assessment was judged as unclear risk in 46% of trials. Only two trials reported the strategies adopted to guarantee assessor blinding.28 32

Incomplete outcome data were the least common bias risk with 80% of trials judged as low risk. Reporting bias was evaluated as unclear in 55% of trials where registration number and trial protocol were not reported or found.

Other bias was judged as high risk in 30% of trials, generally because of baseline differences between the populations.


Results show a small, not clinically meaningful effect in favour of MT for short-term pain relief compared with ST. However, the quality of evidence is very low, suggesting that the true effect may be different from the estimated effect. The four studies comparing ST with no treatment, which had substantial levels of heterogeneity, showed no difference between the two in pain reduction.

Success of blinding was reported in four trials that compared ST to MT, with a high percentage of correct detection of treatment allocations by participants.

AEs were generally under-reported, with a similar rate of occurrence between sham and MT and low levels of heterogeneity. Only one study reported AEs in its no treatment group, with no significant difference from ST.

SM techniques were the most frequently evaluated treatment (N=7). These techniques are highly recognisable by patients because of the popping sound emitted by the spinal column during their performance.47 The fact that participants enrolled in these trials were eligible despite having already received SM threatens the validity of blinding. This view is strengthened by the high percentage of participants who recognised treatment allocation in this kind of trial (from 63.5% to 83.5%).25 29 Additionally, five trials applied ST at a different site from the pain and the active treatment. This might have had an important influence on sham therapy reliability and consequently on study results.

Lack of blinding did not seem to be related to drop-out rate, although both were reported in only two trials. Bialosky et al and Hoiriis et al showed high percentages of correct treatment allocation detection by participants, but the drop-out rate did not differ between the sham and MT groups.25 29 These results seem to conflict; nevertheless, participants could have wanted to remain in the trial for several other reasons, such as the setting or the appeal of being evaluated by an expert clinician free of charge. This possibility is reinforced by the fact that a similar drop-out rate was found in the comparison of sham versus no treatment. These data suggest that drop-out rate might not be a dependable outcome for assessing the reliability of ST.

Another factor that seemed to put blinding validity at risk was the use of a single technique. Single techniques were generally more difficult to mask, negatively affecting the validity of blinding of participants. The majority of trials judged as at high or unclear risk of performance bias used a single technique evaluating its effects on pain soon after its performance, or its effect after different sessions.

When compared with no intervention, ST showed no effect. Only one study of the four included in the meta-analysis showed a statistically significant effect in favour of ST. This study was the only one judged at low risk of performance bias because the researchers tried to mask ST by performing techniques very similar to MT and by excluding participants who had already received the treatment tested.26 This trial was the one that showed a marked effect on pain (MD −21.7, 95% CI −33.5 to −9.9, 42 participants) (online supplemental appendix 5). Other studies included in this comparison, judged at high risk of performance bias, showed no effect of ST. These results suggest that lack of blinding could have had an impact on this comparison.

This review included generally small trials. Only 14 of 24 studies performed a sample size calculation but just two of these considered MCID in this computation. The MCID is the measure of smallest change of PROs that patients perceive as important, beneficial or harmful. MCID is useful for clinicians to interpret the findings of trials and apply them in clinical practice and to their decision making.48 An adequate sample size calculation, using MCID especially in trials with PROs, is fundamental to assess the number of participants needed to detect clinically relevant treatment effects. Oversized trials, which expose too many people to unnecessary therapies, or underpowered trials, which may not achieve significant results, should be avoided.49–51
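A sample size calculation anchored on the MCID can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ², where Δ is the MCID. The function name, the default α and power, and the SD values in the usage note are assumptions for illustration; none of them is taken from the included trials.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, mcid, alpha=0.05, power=0.80):
    """Participants per arm needed to detect a difference equal to the MCID.

    Two-sided test, equal allocation, normal approximation (no t correction).
    """
    if sd <= 0 or mcid <= 0:
        raise ValueError("sd and mcid must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / mcid) ** 2
    return math.ceil(n)
```

With a hypothetical SD of 30 mm and the 30 mm MCID used in this review, the formula gives 16 participants per arm; halving the target difference roughly quadruples the requirement, which is why the choice of MCID drives trial size.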

Comparison with other studies

Similar findings were reported in other reviews conducted on LBP. Ruddock et al included studies where SM was compared with what the authors called ‘an effective ST’, namely a credible sham manipulation that physically mimics the SM. Pooled data from four trials showed a very small and not clinically meaningful effect in favour of MT.52

Rubinstein et al53 compared SM and mobilisation techniques with recommended therapies, non-recommended therapies and ST. Their findings showed that 5 of the 47 included studies attempted to blind patients to the assigned intervention by providing an ST. Of these five trials, two were judged at unclear risk of bias for participant blinding. The authors also questioned the need for additional studies on this topic, as during the update of their review they found recent small pragmatic studies with high risk of bias. We agree with Rubinstein et al that recent studies included in this review did not show a higher quality of evidence. The development of RCTs with similar characteristics will probably not add any evidence on MT and ST effectiveness.53


This review aimed to compare different kinds of sham therapy with different kinds of MT and no intervention. The nature of this comparison called for a network meta-analysis (NMA), but this analysis could not be performed due to the small number of trials using hand contact ST. The decision to include only this kind of sham therapy was mainly due to the intention of analysing the effect of manual interaction between practitioner and patients, which is suspected of leading to an amplified placebo effect.54 Additionally, including machine placebo trials in the same meta-analysis could have increased diversity within included trials because of a possibly greater presence of biases such as performance bias and, consequently, detection bias.

Although the population differed—some trials analysed cervical, others lumbar pain with different aetiologies and different symptoms duration—this factor did not affect the meta-analysis performed, as highlighted by the low heterogeneity found in the primary outcome.

As already suggested by other authors,1 placebo effect might be influenced by chronic pain. Nevertheless, in this review, this analysis could not be performed due to the range of pain duration in the included trials (from acute to chronic in the same trial).

Data concerning settings and operators were insufficient to evaluate the influence of these two factors on sham therapy response. Experience of practitioners was considered in data extraction but insufficient information was provided by authors to draw any hypothesis.

Another limitation was that objective outcomes were not considered as primary outcomes for meta-analysis. Nevertheless, most of the trials included did not evaluate an objective outcome and the few studies which analysed this type of outcome used different kinds of scales not easily comparable in a meta-analysis.

Pairwise comparison on pain outcome between sham and MT showed slightly higher effects of MT in trials where blinding was ensured. This trend follows what has already been suggested by other studies.55 A linear regression analysis was planned to assess the impact of blinding on meta-analysis results but, due to the small number of trials, it could not be performed. Trials with larger sample sizes are needed to assess a real correlation between these two factors.

Another limitation of this study is that risk of bias was assessed by one author (CL) and agreed by another (MG). This aspect could have been improved if both authors had worked independently on bias risk assessment and then discussed any discrepancy.

Implications for practitioners

In some clinical contexts, MT could be difficult to apply; for example, some patients may present hyperalgesia to tactile stimuli. Defrin et al suggested that tactile allodynia might be present in 60% of patients with chronic LBP associated with radicular pain.56

In this kind of patient the use of MT could be excessively painful, and any MT that triggers pain should be avoided.57 ST—and therefore a possible placebo effect—could represent a valid alternative to MT in the multidisciplinary approach to back pain, promoting pain relief without increasing the possibility of AE occurrence.

This view is strengthened by our findings: ST was found to be as safe as MT, without increasing the risk of AE occurrence when compared with no intervention. Furthermore, when blinding was guaranteed, ST showed a statistically significant effect on pain reduction in chronic LBP patients compared with no treatment.

ST could be seen as an ‘affective touch’, which has been suggested to create a pleasant therapeutic experience promoting affiliative behaviours and pain improvement.58 59

Nevertheless, due to the low quality of the studies included in this review, further studies are needed to verify the possible role of ST among patients where MT is not well tolerated.

Implications for research

In MT trials, a true placebo is impossible to achieve, so trials should implement strategies to guarantee patient and assessor blinding, for example, avoiding the inclusion of participants who have already received the active treatment and avoiding the use of single techniques, which are more difficult to mask. Plans to avoid performance bias, such as giving similar treatment with similar localisation, have to be implemented.

Moreover, the evaluation of the success of blinding should be considered as at least a secondary outcome.

Researchers should pay particular attention to sample size calculation using the MCID. This difference is fundamental both for research and patients. MCID indicates patients’ values and preferences and can help clinicians improve interpretation and promote the understanding of the importance of intervention effects in RCTs.

National Institute for Health and Care Excellence guidelines for LBP suggest the use of MT only as ‘a part of a treatment package including exercise, with or without psychological therapy’.60 Therefore, the development of future CT should imitate the real multidisciplinary clinical context to assess the external validity of future findings.

Future research should also evaluate the real effects of ST by comparing it with both active treatment and no intervention groups. Only with this kind of design could the real placebo effect in MT be defined.