ICEMAN for Meta-analyses

1 — Is the analysis of effect modification based on comparison within rather than between trials?

Effect modification suggested by a comparison between studies (subgroups of studies) is usually much less credible than effect modification suggested by a comparison within studies (subgroups of individuals).

An important concern with between-study comparisons is study-level confounding: an association observed between a study-level variable and an outcome may be confounded by other study-level variables.^{1, 2, 3, 4, 5, 6, 7, 8, 9, 10} The power to identify a true within-trial effect modification can be very low and an apparent effect modification might be largely driven by study-level confounding.^{9, 10, 11}

Most common are aggregate-data meta-analyses in which analyses of effect modification are based completely on between-study comparisons, e.g. using meta-regression. Those analyses are at high risk of study-level confounding and consequently have lower credibility.

Sometimes, investigators combine within- and between-trial information using one of the following approaches^{2, 12}: (1) estimate within- and between-trial effect modification separately, then combine both; (2) include a simple interaction term in a one-stage IPD meta-analysis; (3) first combine trials within subgroups, then compare summary effects between subgroups.

An analysis of effect modification is definitely free of study-level confounding if it is based completely on within-trial information. This is possible if all trials provide (or allow estimation of) within-trial effect modification and the estimates are then combined across trials in a separate step.^{2, 12, 13} Alternatively, more complex methods are available for individual-participant data meta-analyses.^{2, 12, 14}
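
The two-step approach described above (estimate the effect modification within each trial, then combine the estimates) can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the function name and the example numbers are hypothetical, the inputs are assumed to be per-trial log ratios of effect measures with their standard errors, and common-effect inverse-variance weights are used only for brevity (Item 7 explains why a random effects model is usually preferable).

```python
import math


def pool_within_trial_interactions(log_ratios, std_errors):
    """Inverse-variance pooling of per-trial interaction estimates.

    log_ratios: per-trial interaction estimates on the log scale
                (e.g. log ratio of risk ratios between two subgroups).
    std_errors: matching standard errors.
    Returns (pooled log ratio, its standard error, two-sided p-value).
    """
    if not log_ratios or len(log_ratios) != len(std_errors):
        raise ValueError("need one standard error per interaction estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical within-trial interaction estimates from three trials.
pooled, pooled_se, p_value = pool_within_trial_interactions(
    [-0.20, -0.40, -0.30], [0.15, 0.20, 0.25]
)
print(f"pooled ratio of effects: {math.exp(pooled):.2f}, p = {p_value:.4f}")
```

Because only within-trial contrasts enter the pooling step, differences between trials in overall effect cannot drive the result.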

A survey of published IPD meta-analyses suggested that only a small proportion of analyses of effect modification separate within- from between-trial information; instead, most analyses seem to combine within- and between-trial information.² Therefore, unless there is a statement to the contrary, analyses of effect modification in an IPD meta-analysis likely combine within- and between-trial information and might not be free of study-level confounding.

Response options	Description
Completely between	Subgroup analysis or meta-regression comparing overall effects of each individual trial (typical for aggregate data meta-analyses)
Mostly between or unclear	Most information from overall effects; some trials providing within-trial subgroup information
Mostly within	Most trials providing within-trial subgroup information; or IPD analysis combining within and between trial information
Completely within	IPD analysis that separates within from between trial information (e.g. meta-analysis of interactions)

Completely between — Example 1: A meta-analysis assessing the effect of inpatient rehabilitation versus usual care found that patients undergoing orthopaedic-focused rehabilitation had a substantially larger functional benefit than patients undergoing geriatric-focused rehabilitation (interaction p=0.01).¹⁵ The analysis was based on between-study comparisons only and was therefore at high risk of confounding.

Completely between — Example 2: An IPD meta-analysis based on three RCTs suggested that mobile phone text messages can improve adherence to antiretroviral therapy. Because the type of text message varied only between but not within studies, the significant interaction (p=0.01) reflects a between-study comparison at high risk of study-level confounding — even though individual participant data were used.

Mostly between — Example: A meta-analysis assessing the effect of preoperative chemotherapy for gastroesophageal adenocarcinoma on survival combined individual patient and aggregate data.¹⁶ The analysis suggested a potentially larger treatment effect in tumours of the gastroesophageal junction (interaction p=0.08). The apparent effect modification might be explained by study-level confounders, e.g. risk of bias.

Mostly within — Example: An IPD meta-analysis combined 13 trials comparing radiochemotherapy versus radiotherapy alone in patients with cervical cancer.¹⁷ The authors first pooled subgroup-specific effects of each trial, then applied a chi-square test for trend (p=0.017). This method combines within- and between-trial information and is therefore potentially affected by study-level confounding.²

Completely within — Example: A meta-analysis of individual patient data from 16 trials compared low intensity interventions for depression with usual care.¹⁸ The investigators chose a model that estimated the effect modification within each trial and separated out between-trial comparisons, including a forest plot illustrating the heterogeneity of effect modifications across trials.

2 — For within-trial comparisons, is the effect modification similar from trial to trial?

Credibility of effect modification increases if the effect modification has been replicated across independent studies. Replication provides the strongest protection against random error and decreases the likelihood of confounding.

If the item applies, it is helpful to quantify the magnitude of effect modification for each trial, e.g. by calculating a ratio of risk ratios.¹³
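
As a minimal sketch of that calculation, the following Python computes a ratio of risk ratios with an approximate 95% confidence interval for a single trial from subgroup-level 2×2 counts. The function names and the example counts are hypothetical; the sketch assumes no zero cells and a normal approximation on the log scale.

```python
import math


def log_risk_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log risk ratio and its approximate variance from 2x2 counts."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    variance = 1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    return log_rr, variance


def ratio_of_risk_ratios(subgroup_a, subgroup_b, z_crit=1.96):
    """Ratio of risk ratios (subgroup A vs B) with an approximate 95% CI.

    Each subgroup is a tuple (events_trt, n_trt, events_ctl, n_ctl).
    """
    log_a, var_a = log_risk_ratio(*subgroup_a)
    log_b, var_b = log_risk_ratio(*subgroup_b)
    diff = log_a - log_b
    # Subgroups within a trial contain different patients, so variances add.
    se = math.sqrt(var_a + var_b)
    return (
        math.exp(diff),
        math.exp(diff - z_crit * se),
        math.exp(diff + z_crit * se),
    )


# Hypothetical trial: risk ratio 0.5 in subgroup A, 1.0 in subgroup B.
rrr, lower, upper = ratio_of_risk_ratios((10, 100, 20, 100), (20, 100, 20, 100))
print(f"ratio of risk ratios {rrr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Tabulating this quantity for every trial makes it easier to judge whether direction and magnitude are similar from trial to trial.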

Note that this credibility consideration is different from assessing consistency (or heterogeneity) of treatment effects across studies (e.g. expressed by the I²-measure¹⁹).

Response options	Description
Not applicable	No or only one within-RCT comparison available
Definitely not similar	Effect modification reported for ≥2 trials with clearly different directions
Probably not similar or unclear	Not reported for individual trials, or too imprecise to tell
Mostly similar	Reported for ≥2 trials, mostly similar direction but considerable differences in magnitude
Definitely similar	Reported for ≥2 trials, similar in direction, only some differences in magnitude

Probably not similar — Example: An IPD meta-analysis combined 13 trials comparing radiochemotherapy versus radiotherapy alone in patients with cervical cancer.¹⁷ The authors reported the effect modification only for the combined dataset, not for individual trials. It was therefore not possible to assess consistency across trials.

Mostly similar — Example: A meta-analysis of individual patient data from 16 trials of low intensity interventions for depression.¹⁸ Considering the point estimates within the 16 trials, 12 suggested a direction consistent with the overall finding, 1 suggested no effect modification, and 3 were in the opposite direction but with wide confidence intervals.

Definitely similar — Example: An IPD meta-analysis of fixed-dose aspirin for primary prevention of cardiovascular events found a significant interaction with body weight.²⁰ All six trials showed the same direction (more effective in lighter patients) with ratios of hazard ratios ranging between 0.5 and 0.9.

3 — For between-trial comparisons, is the number of trials large?

For analysis of effect modification based on between-study comparisons, the credibility increases with the number of studies (analogous to number of observations in a regression analysis). A large number of studies also increases the power of the analysis and improves modelling of between-study dispersion in a random effects model.^{21, 22, 23, 24}

Response options	Subgroup analysis	Continuous meta-regression
Very small	1–2 in smallest subgroup	≤5 studies total
Rather small or unclear	3–4 in smallest subgroup	6–10 studies
Rather large	5–9 in smallest subgroup	11–15 studies
Large	≥10 in smallest subgroup	>15 studies

Very small — Example: A meta-analysis comparing transcatheter versus surgical aortic valve replacement found a qualitative interaction (interaction p=0.01 using a random effects model). The smallest subgroup included only two studies.²⁵

Rather small — Example: In a meta-analysis investigating the effect of low-intensity pulsed ultrasound on bone healing, the subgroup of 3 studies at low risk of bias suggested no benefit (interaction p<0.001).²⁶

Rather large — Example: In a meta-analysis assessing the effect of inpatient rehabilitation versus usual care, each of the two subgroups included 6 studies.¹⁵

Large — Example: A meta-analysis comparing interventions for preventing hospital readmission performed a subgroup analysis by number of activities. The smaller subgroup included 16 studies and the larger subgroup 26 studies.²⁷

4 — Was the direction of effect modification correctly hypothesized a priori?

Credibility is higher if investigators correctly anticipated the direction of the effect modification, lower if they failed to anticipate a direction, and lowest if they anticipated the opposite direction.

Correct anticipation of an effect modification implies that investigators had a specific hypothesis in mind, usually based on a biologic or other causal rationale, or sometimes based on prior evidence. An explanation stated a priori is much more credible than a post hoc explanation. If post hoc, investigators had likely considered many possible explanations, thereby creating a multiplicity problem.^{28, 29, 30, 31, 32, 33, 34}

Because meta-analyses are retrospective, investigators may already know the key trials and most promising effect modifiers when they plan the analysis.³ If so, this item loses some of its value if it suggests increased credibility: correct anticipation of direction may essentially be data-driven. The item is more relevant if none of the key trials has tested the effect modifier of interest, and if the analysis is completely based on between-trial comparisons.

Response options	Description
Definitely no	Clearly post-hoc, results inconsistent with hypothesized direction, or biologically very implausible
Probably no or unclear	Vague hypothesis or hypothesized direction unclear
Probably yes	No protocol available but unequivocal statement of a priori hypothesis with correct direction
Definitely yes	Prior protocol available and includes hypothesis with correct specification of direction, e.g. based on biologic rationale

Probably no — Example: An IPD meta-analysis of fixed-dose aspirin for primary prevention of cardiovascular events found a significant interaction with body weight.²⁰ The paper does not clarify whether the effect modification was hypothesized a priori.

Probably yes — Example: An IPD meta-analysis combined three trials comparing high versus low PEEP in ventilated patients. A subgroup analysis suggested that higher pressure was associated with longer survival in patients with ARDS (interaction p=0.02). The authors explicitly stated that they correctly anticipated the effect modification in their protocol which, however, was not published.³⁵

Definitely yes — Example: A meta-analysis comparing transcatheter versus surgical aortic valve replacement. The investigators had anticipated this interaction with correct direction in a published protocol.²⁵

5 — Does a test for interaction suggest that chance is an unlikely explanation?

Credibility is higher if a statistical test for interaction or meta-regression suggests that chance is an unlikely explanation for the apparent effect modification. Credibility is lower if the test is compatible with chance, or if no test is reported and none can be computed.

Warning

Important: Showing that an effect is significant in one subgroup and not in another is of little use: it provides no information whether chance might explain differences in effects across subgroups.^{21, 29, 36, 37, 38}

If no interaction or meta-regression p-value is reported, it can sometimes be calculated from the reported data. We anchored the response options around typical p-value thresholds of 0.05, 0.01, and 0.005, with a p-value of 0.005 or smaller representing the most credible category.

Response options	Threshold
Chance a very likely explanation	Interaction or meta-regression p-value > 0.05
Chance a likely explanation or unclear	p ≤ 0.05 and > 0.01, or no test reported and not computable
Chance may not explain	p ≤ 0.01 and > 0.005
Chance an unlikely explanation	p ≤ 0.005
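
When only subgroup-specific estimates are reported, an interaction p-value for two subgroups can be approximated and then mapped onto the response options above. The sketch below is one possible way to do this, not a prescribed procedure: the function names are hypothetical, and it assumes ratio measures analysed on the log scale, independent subgroups, reported 95% confidence intervals, and a normal approximation.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log effect from a 95% CI on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def interaction_p_value(log_effect_a, se_a, log_effect_b, se_b):
    """Two-sided p-value for the difference between two subgroup effects."""
    z = (log_effect_a - log_effect_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


def classify_interaction_p(p_value):
    """Map a p-value (or None if not computable) to the Item 5 response."""
    if p_value is None:
        return "Chance a likely explanation or unclear"
    if p_value > 0.05:
        return "Chance a very likely explanation"
    if p_value > 0.01:
        return "Chance a likely explanation or unclear"
    if p_value > 0.005:
        return "Chance may not explain"
    return "Chance an unlikely explanation"


# Hypothetical report: RR 0.60 (0.45 to 0.80) vs RR 0.95 (0.75 to 1.20).
p = interaction_p_value(
    math.log(0.60), se_from_ci(0.45, 0.80),
    math.log(0.95), se_from_ci(0.75, 1.20),
)
print(f"interaction p = {p:.3f}: {classify_interaction_p(p)}")
```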

6 — Did the authors test only a small number of effect modifiers, or consider multiplicity?

Performing multiple tests is a major concern in the context of effect modification analysis. Because multiple tests increase the risk of a chance finding, credibility is higher if investigators tested only a small number of effect modifiers or statistically considered multiplicity.

Multiplicity issues can arise through multiple candidate effect modifiers, multiple time points, multiple scales, multiple outcomes, or multiple methods for testing the interaction. Assessment of multiplicity depends heavily on reporting, and retrospective statements about the number of pre-specified subgroup analyses are not always reliable.³⁹

A potential limitation in meta-analyses is that investigators might have scanned key trials for promising effect modifiers before planning the meta-analysis. If so, a small number of tested effect modifiers might obscure potential multiplicity issues introduced in earlier selection processes in the individual trials.

Response options	Description
Definitely no	Explicitly exploratory analysis, or large number of analyses (>10), and multiplicity not considered
Probably no or unclear	No mention of number, or 4–10 effect modifiers tested and not considered
Probably yes	No protocol but unequivocal statement of ≤3 effect modifiers tested
Definitely yes	Protocol available and ≤3 effect modifiers tested, or number considered in analysis
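
To illustrate why the number of tested effect modifiers matters, the sketch below computes the probability of at least one false-positive interaction among several tests. It is a simplified illustration under stated assumptions: the tests are treated as independent, each at a significance level of 0.05, and the function names are hypothetical. The Bonferroni threshold is shown as one common way of accounting for multiplicity, not as a method the guidance above prescribes.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


for n_tests in (3, 10, 12):
    print(
        f"{n_tests:>2} tests: chance of >=1 false positive "
        f"{familywise_error_rate(n_tests):.0%}, "
        f"Bonferroni threshold {bonferroni_threshold(n_tests):.4f}"
    )
```

Under these assumptions, three tests give roughly a 14% chance of at least one spurious finding, and ten tests roughly 40%.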

Definitely no — Example: A meta-analysis investigating interventions to reduce early hospital readmissions reported results for 12 effect modifiers.²⁷ The authors correctly highlighted the possibility of chance findings due to multiplicity.

Probably no — Example: In a meta-analysis assessing inpatient rehabilitation versus usual care, all reported meta-regression analyses were pre-specified in an analysis plan. Nevertheless, 9 effect modifiers were tested for 3 outcomes at 2 time points.¹⁵

Probably yes — Example: An IPD meta-analysis assessed the effect of adding whole brain radiation therapy to stereotactic radiosurgery in patients with brain metastases. The report includes an explicit statement that age was one of three pre-planned effect modifiers.⁴⁰

Definitely yes — Example: A meta-analysis comparing the effect of low-intensity pulsed ultrasound versus sham ultrasound on bone healing. The investigators had pre-specified the analysis in the published protocol⁴¹ together with two other subgroup hypotheses. The low number of tested effect modifiers and the pre-specified definition make multiplicity issues less likely.²⁶

7 — Did the authors use a random effects model?

The credibility of claimed effect modification is higher if investigators used a random effects model within subgroups, allowing true effects to differ among studies within subgroups and allowing generalisation of results beyond the included studies; this is almost always the model that should be used.^{42, 43}

The credibility is lower if investigators used: (a) a common effect (fixed effect, singular) model — implying all studies within subgroups are based on the same population;^{42, 43} or (b) a fixed effects model — implying results will only apply to the studies included in the subgroup but cannot be generalised beyond them.^{42, 43}

Simulation studies have shown that failure to assume random effects increases the risk of false positive claims for both study-level and individual participant-level meta-analysis.^{14, 22, 24} A random effects model strengthens a test of interaction because a significant result is usually harder to achieve.^{3, 6, 22, 42, 44}
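
The following sketch illustrates, with made-up numbers, why a significant interaction is usually harder to achieve under a random effects model. It pools each subgroup with inverse-variance weights, optionally adding a DerSimonian-Laird estimate of the between-study variance, and then tests the difference between the two subgroup estimates. The function names, the data, the choice of the DerSimonian-Laird estimator, and the use of a separate between-study variance per subgroup are all assumptions made for illustration.

```python
import math


def pool(effects, std_errors, random_effects=True):
    """Inverse-variance pooled estimate and standard error for one subgroup."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    common_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    tau2 = 0.0
    if random_effects and len(effects) > 1:
        # DerSimonian-Laird moment estimator of between-study variance.
        q = sum(w * (y - common_mean) ** 2 for w, y in zip(weights, effects))
        c = sum_w - sum(w ** 2 for w in weights) / sum_w
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    adjusted = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(adjusted, effects)) / sum(adjusted)
    return pooled, math.sqrt(1.0 / sum(adjusted))


def subgroup_difference_p(subgroup_a, subgroup_b, random_effects=True):
    """Two-sided p-value for the difference between two pooled subgroups."""
    est_a, se_a = pool(*subgroup_a, random_effects=random_effects)
    est_b, se_b = pool(*subgroup_b, random_effects=random_effects)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical log effects and standard errors, heterogeneous within subgroups.
subgroup_a = ([-0.8, -0.2, -0.5], [0.1, 0.1, 0.1])
subgroup_b = ([0.1, -0.3, -0.1], [0.1, 0.1, 0.1])
p_common = subgroup_difference_p(subgroup_a, subgroup_b, random_effects=False)
p_random = subgroup_difference_p(subgroup_a, subgroup_b, random_effects=True)
print(f"common effect p = {p_common:.2g}, random effects p = {p_random:.2g}")
```

With these illustrative data the common effect analysis gives a p-value far below 0.001, whereas the random effects analysis gives a p-value slightly above 0.05, because the within-subgroup heterogeneity widens the standard errors.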

If investigators state that they used a mixed effects model without further specification, it usually implies they used a random effects model for between-study differences within subgroups (appropriate) and a fixed effects model for between-subgroup differences (also appropriate^{6, 42, 43}). Therefore, the appropriate answer is usually definitely yes.

The question also applies to individual-participant data meta-analysis, for which an empirical study has shown that most do not apply a random effects model.⁴⁵

Response options	Description
Definitely no	Fixed (or common) effect model explicitly stated
Probably no or unclear	Probably no random effects model, or unclear
Probably yes	Probably random (or mixed) effects model
Definitely yes	Random (or mixed) effects model explicitly stated

Definitely no — Example: An IPD meta-analysis of aspirin for primary prevention of cardiovascular events. The authors explicitly state that they used a fixed effects model.²⁰

Probably no — Example: An IPD meta-analysis combined 13 studies comparing radiochemotherapy versus radiotherapy alone in patients with cervical cancer.¹⁷ The authors did not explicitly report how they modelled between-study differences. Because they used a fixed effect model for the overall analysis, it is most likely that they also used a fixed effect model within subgroups.

Definitely yes — Example: In a meta-analysis assessing the effect of inpatient rehabilitation versus usual care, the authors explicitly specified a random effects model for between-study differences in the methods section.¹⁵

8 — Were arbitrary cut points avoided? (continuous variables only)

Categorising continuous effect modifiers is common² but associated with problems.^{46, 47} In the context of meta-analysis, cut points can cause additional problems: if two studies assessed the same continuous effect modifier but used different cut points, it may be impossible to combine the within-study results in a meaningful way unless individual patient data are available. Credibility is low if investigators selected the best-fitting data-driven cut point.^{46, 48}

Provided individual participant data is available, it is also possible to average functions across several studies and base conclusions on the resulting mean function (i.e. a meta-analysis of interactions^{49, 50}).

See RCT Item 5 for full response option descriptions.

Probably no — Example: A meta-analysis investigating interventions to reduce early hospital readmissions reported a potential effect modification by the number of intervention components.²⁷ The published protocol did not specify cut points and the investigators explicitly highlighted the exploratory character of the analysis.

Probably yes — Example: In a meta-analysis on inpatient rehabilitation versus usual care, the intervention was better in preventing nursing home admissions in patients younger than 80 than in patients older than 80 (p=0.045).¹⁵ According to the authors, the threshold was pre-specified.

Definitely yes — Example: An IPD meta-analysis investigated whether patients with ARDS benefit from higher PEEP ventilation strategies.⁵⁰ A continuous analysis suggested a non-linear effect modification by degree of hypoxaemia. A previous analysis dichotomised the effect modifier and could not reveal the potential non-linear relationship.³⁵

9 — Are there additional considerations that may increase or decrease credibility?

Similar to RCT Item 6, with additional meta-analysis-specific considerations.

Sensitivity analysis suggesting robustness^{51, 52, 53}:

Example: A meta-analysis comparing the effect of low-intensity pulsed ultrasound versus sham on bone healing.²⁶ In a sensitivity analysis requested by the editors, the investigators applied a stricter threshold for missing data (≥10%). Although different criteria led to reclassification of one trial, the effect modification remained significant (p=0.004).

Effect modification supported by external evidence:

Example: A meta-analysis comparing transcatheter versus surgical aortic valve replacement.²⁵ A prior cohort study of 501 patients using propensity score matching had suggested that the transapical approach was associated with more adverse events and higher mortality.⁵⁴

Dose-response effect across levels of the effect modifier:

Example: An IPD meta-analysis combined 13 trials comparing radiochemotherapy versus radiotherapy alone in patients with cervical cancer.¹⁷ A subgroup analysis based on tumour stage suggested that the relative benefit decreased with increasing tumour stage across three stages, suggesting a possible “dose-response” effect (chi-square test for trend, p=0.017).

Risk of bias of the main effects of the individual RCTs or the meta-analysis: Commonly used instruments to formally assess the overall risk of bias are the Cochrane risk of bias tool for individual trials⁵⁵ and the ROBIS tool for systematic reviews.⁵⁶ Note that reporting bias can be introduced if some studies report an effect modifier and others do not.⁵⁷ Also, industry-funded trials are at higher risk of spurious claims of effect modification.^{58, 59, 60}

Example: An IPD meta-analysis combined three trials comparing high versus low PEEP in ventilated patients with lung injury or ARDS.³⁵ A subgroup analysis suggested that higher pressure was associated with longer survival in patients with but not in patients without ARDS (interaction p=0.02). Although the p-value provides only modest support against chance, the high methodological quality of all three trials is reassuring.

Exceptionally high power.^{23, 61}

Persistence after adjustment for other potential effect modifiers⁶²:

Example: An IPD meta-analysis of fixed-dose aspirin for primary prevention of cardiovascular events.²⁰ The effect modification by weight remained when the investigators stratified their analysis by both weight and age.

Consistency across related outcomes:

Example: A meta-analysis comparing transcatheter versus surgical aortic valve replacement.²⁵ The qualitative interaction was consistent across the outcomes of mortality, stroke, acute kidney injury, and bleeding.

10 — Overall credibility rating

The overall rating is a continuous visual analogue scale spanning four credibility areas, corresponding roughly to <25%, 25–50%, 50–75%, and >75% confidence that the apparent effect modification is true and not the result of chance or bias. The overall rating should be driven by the items that decrease credibility.

Recommended strategy:

All responses definitely or probably reduced credibility or unclear → very low credibility
Two or more responses definitely reduced credibility → maximum usually low credibility
One response definitely reduced credibility → maximum usually moderate credibility
Two responses probably reduced credibility → maximum usually moderate credibility
No responses definitely or probably reduced credibility → high credibility very likely
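
The recommended strategy can be read as a set of ceilings on the overall rating. The sketch below encodes one such reading and is not a substitute for judgement: the function name and the three response labels are hypothetical, "unclear" responses are counted as probably reducing credibility, not-applicable items are assumed to be left out, "two responses" is interpreted as "two or more", and a single response that probably reduces credibility is assumed to impose no ceiling because the strategy does not address that case.

```python
ALLOWED = {"definitely_reduced", "probably_reduced", "not_reduced"}


def credibility_ceiling(responses):
    """Suggested maximum overall credibility for a list of item responses.

    responses: one label per applicable item, each taken from ALLOWED.
    Raters may still choose a lower rating than the returned ceiling.
    """
    if not responses or any(r not in ALLOWED for r in responses):
        raise ValueError("responses must be a non-empty list of allowed labels")
    definitely = responses.count("definitely_reduced")
    probably = responses.count("probably_reduced")
    if definitely + probably == len(responses):
        return "very low"
    if definitely >= 2:
        return "low"
    if definitely == 1 or probably >= 2:
        return "moderate"
    return "high"


# Hypothetical assessment with two items that probably reduce credibility.
print(credibility_ceiling(
    ["not_reduced", "probably_reduced", "not_reduced", "probably_reduced"]
))
```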

Credibility	Interpretation	Implication
Very low	Very likely no effect modification	Use overall estimate for each subgroup
Low	Likely no effect modification	Use overall estimate; note remaining uncertainty
Moderate	Likely effect modification	Use separate estimates; note remaining uncertainty
High	Very likely effect modification	Use separate estimates for each subgroup

"How to use ICEMAN" provides more suggestions for using and presenting ICEMAN in context; "Concept and scope of ICEMAN" provides a more detailed justification of why the scale is continuous and why low credibility suggests likely no effect modification.

Worked example — Meta-analysis (cervical cancer)

An individual patient data meta-analysis of 13 trials compared radiochemotherapy versus radiotherapy alone in women with cervical cancer.¹⁷ The authors report “a suggestion of a difference in the size of the survival benefit with tumour stage.” The credibility assessment suggested low credibility.

Item	Response	Comment
1. Within vs between	Mostly within	All trials provided IPD; authors likely combined within and between; mostly driven by within-study information
2. Similarity across trials	Probably not similar	Effect modification within individual trials not reported
3. Number of studies	Rather large	13 trials; reduces risk of trial-level confounding
4. Direction a priori	Probably no	No information provided
5. Interaction test	Chance likely	p=0.017 for chi-square test of trend
6. Number of modifiers	Probably no	≥8 subgroup analyses; no published protocol; potential multiplicity
7. Random effects model	Probably no	Not explicitly stated; fixed effect used for overall analysis
8. Cut points	Not applicable	Effect modifier is not continuous
9. Additional (optional)	Probably increases	Dose-response pattern across tumour stages; consistent across different outcomes
10. Overall	Low	Consistency across studies unclear; p-value not very small, possibly inflated by multiple analyses and use of fixed effect model

References

1. Sun X, Ioannidis JPA, Agoritsas T, Alba AC, Guyatt G. How to use a subgroup analysis: Users’ guide to the medical literature: Users’ guide to the medical literature [Internet]. JAMA. 2014 ;311(4):405–411.Available from: http://dx.doi.org/10.1001/jama.2013.285063

2. Fisher DJ, Carpenter JR, Morris TP, Freeman SC, Tierney JF. Meta-analytical methods to identify who benefits most from treatments: Daft, deluded, or deft approach? [Internet]. BMJ. 2017 ;356j573.Available from: http://dx.doi.org/10.1136/bmj.j573

3. Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? [Internet]. Stat. Med. 2002 ;21(11):1559–1573.Available from: https://doi.org/10.1002/sim.1187

4. Davey Smith G, Egger M, Phillips AN. Meta-analysis. Beyond the grand mean? [Internet]. BMJ. 1997 ;315(7122):1610–1614.Available from: https://doi.org/10.1136/bmj.315.7122.1610

5. Berlin JA. Invited commentary: Benefits of heterogeneity in meta-analysis of data from epidemiologic studies [Internet]. Am. J. Epidemiol. 1995 ;142(4):383–387.Available from: http://dx.doi.org/10.1093/oxfordjournals.aje.a117645

6. Borenstein M, Hedges L, Higgins JP, Rothstein H. Introduction to meta-analysis. 1 ed Chichester: John Wiley & Sons. 2009 ;

7. Smith GD, Egger M. Going beyond the grand mean: Subgroup analysis in meta-analysis of randomised trials [Internet]. In: Egger M, Davey Smith G, Altman DG, editor(s). Systematic reviews in health care. London, UK: BMJ Publishing Group; 2008. p. 143–156.Available from: https://doi.org/10.1002/9780470693926.ch8

8. Hingorani AD, Windt DA van der, Riley RD, Abrams K, Moons KGM, Steyerberg EW, Schroter S, Sauerbrei W, Altman DG, Hemingway H, PROGRESS Group. Prognosis research strategy (PROGRESS) 4: Stratified medicine research [Internet]. BMJ. 2013 ;346(feb05 1):e5793.Available from: http://dx.doi.org/10.1136/bmj.e5793

9. Simmonds MC, Higgins JPT. Covariate heterogeneity in meta-analysis: Criteria for deciding between meta-regression and individual patient data [Internet]. Stat. Med. 2007 ;26(15):2982–2999.Available from: http://dx.doi.org/10.1002/sim.2768

10. Lambert PC, Sutton AJ, Abrams KR, Jones DR. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis [Internet]. J. Clin. Epidemiol. 2002 Jan. ;55(1):86–94.Available from: http://dx.doi.org/10.1016/s0895-4356(01)00414-0

11. Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman HI, Anti-Lymphocyte Antibody Induction Therapy Study Group. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: Ecological bias rears its ugly head [Internet]. Stat. Med. 2002 ;21(3):371–387.Available from: http://dx.doi.org/10.1002/sim.1023

12. Fisher DJ, Copas AJ, Tierney JF, Parmar MKB. A critical review of methods for the assessment of patient-level interactions in individual participant data meta-analysis of randomized trials, and guidance for practitioners [Internet]. J. Clin. Epidemiol. 2011 Sept. ;64(9):949–967.Available from: http://dx.doi.org/10.1016/j.jclinepi.2010.11.016

13. Song F, Bachmann MO. Cumulative subgroup analysis to reduce waste in clinical research for individualised medicine [Internet]. BMC Med. 2016 ;14(1):197.Available from: http://dx.doi.org/10.1186/s12916-016-0744-x

14. Hua H, Burke DL, Crowther MJ, Ensor J, Tudur Smith C, Riley RD. One-stage individual participant data meta-analysis models: Estimation of treatment-covariate interactions must avoid ecological bias by separating out within-trial and across-trial information: One-stage IPD meta-analysis models must avoid ecological bias [Internet]. Stat. Med. 2017 ;36(5):772–789.Available from: http://dx.doi.org/10.1002/sim.7171

15. Bachmann S, Finger C, Huss A, Egger M, Stuck AE, Clough-Gorr KM. Inpatient rehabilitation specifically designed for geriatric patients: Systematic review and meta-analysis of randomised controlled trials [Internet]. BMJ. 2010 ;340(jun30 2):c3484.Available from: http://dx.doi.org/10.1136/bmj.c3484

16. Ronellenfitsch U, Schwarzbach M, Hofheinz R, Kienle P, Kieser M, Slanger TE, Burmeister B, Kelsen D, Niedzwiecki D, Schuhmacher C, Urba S, Velde C van de, Walsh TN, Ychou M, Jensen K. Preoperative chemo(radio)therapy versus primary surgery for gastroesophageal adenocarcinoma: Systematic review with meta-analysis combining individual patient and aggregate data [Internet]. Eur. J. Cancer. 2013 Oct. ;49(15):3149–3158.Available from: http://dx.doi.org/10.1016/j.ejca.2013.05.029

17. Chemoradiotherapy for Cervical Cancer Meta-Analysis C. Reducing uncertainties about the effects of chemoradiotherapy for cervical cancer: A systematic review and meta-analysis of individual patient data from 18 randomized trials. J. Clin. Oncol. 2008 ;26(35):5802–5812.

18. Bower P, Kontopantelis E, Sutton A, Kendrick T, Richards DA, Gilbody S, Knowles S, Cuijpers P, Andersson G, Christensen H, Meyer B, Huibers M, Smit F, Straten A van, Warmerdam L, Barkham M, Bilich L, Lovell K, Liu ET-H. Influence of initial severity of depression on effectiveness of low intensity interventions: Meta-analysis of individual patient data [Internet]. BMJ. 2013 ;346(feb26 2):f540.Available from: http://dx.doi.org/10.1136/bmj.f540

19. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses [Internet]. BMJ. 2003 ;327(7414):557–560.Available from: http://dx.doi.org/10.1136/bmj.327.7414.557

20. Rothwell PM, Cook NR, Gaziano JM, Price JF, Belch JFF, Roncaglioni MC, Morimoto T, Mehta Z. Effects of aspirin on risks of vascular events and cancer according to bodyweight and dose: Analysis of individual patient data from randomised trials [Internet]. Lancet. 2018 ;392(10145):387–399.Available from: http://dx.doi.org/10.1016/S0140-6736(18)31133-4

21. Brookes ST, Whitley E, Peters TJ, Mulheran PA, Egger M, Davey Smith G. Subgroup analyses in randomised controlled trials: Quantifying the risks of false-positives and false-negatives [Internet]. Health Technol. Assess. 2001 ;5(33):1–56.Available from: http://dx.doi.org/10.3310/hta5330

22. Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression [Internet]. Stat. Med. 2004 ;23(11):1663–1682.Available from: http://dx.doi.org/10.1002/sim.1752

23. Alosh M, Huque MF, Bretz F, D’Agostino RB Sr. Tutorial on statistical considerations on subgroup analysis in confirmatory clinical trials [Internet]. Stat. Med. 2017 ;36(8):1334–1360.Available from: http://dx.doi.org/10.1002/sim.7167

24. Rubio-Aparicio M, Sánchez-Meca J, López-López JA, Botella J, Marín-Martínez F. Analysis of categorical moderators in mixed-effects meta-analysis: Consequences of using pooled versus separate estimates of the residual between-studies variances [Internet]. Br. J. Math. Stat. Psychol. 2017 Nov. ;70(3):439–456.Available from: http://dx.doi.org/10.1111/bmsp.12092

25. Siemieniuk RA, Agoritsas T, Manja V, Devji T, Chang Y, Bala MM, Thabane L, Guyatt GH. Transcatheter versus surgical aortic valve replacement in patients with severe aortic stenosis at low and intermediate risk: Systematic review and meta-analysis [Internet]. BMJ. 2016 ;354i5130.Available from: http://dx.doi.org/10.1136/bmj.i5130

26. Schandelmaier S, Kaushal A, Lytvyn L, Heels-Ansdell D, Siemieniuk RAC, Agoritsas T, Guyatt GH, Vandvik PO, Couban R, Mollon B, Busse JW. Low intensity pulsed ultrasound for bone healing: Systematic review of randomized controlled trials [Internet]. BMJ. 2017 ;356j656.Available from: http://dx.doi.org/10.1136/bmj.j656

27. Leppin AL, Gionfriddo MR, Kessler M, Brito JP, Mair FS, Gallacher K, Wang Z, Erwin PJ, Sylvester T, Boehmer K, Ting HH, Murad MH, Shippee ND, Montori VM. Preventing 30-day hospital readmissions: A systematic review and meta-analysis of randomized trials [Internet]. JAMA Intern. Med. 2014 July ;174(7):1095–1107.Available from: http://dx.doi.org/10.1001/jamainternmed.2014.1608

28. Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials [Internet]. JAMA. 1991 ;266(1):93–98.Available from: http://dx.doi.org/10.1001/jama.1991.03470010097038

29. Oxman AD, Guyatt GH. A consumer’s guide to subgroup analyses [Internet]. Ann. Intern. Med. 1992 ;116(1):78–84.Available from: http://dx.doi.org/10.7326/0003-4819-116-1-78

30. Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated [Internet]. BMJ. 1994 ;309(6965):1351–1355.Available from: http://dx.doi.org/10.1136/bmj.309.6965.1351

31. Fletcher J. Subgroup analyses: How to avoid being misled [Internet]. BMJ. 2007 ;335(7610):96–97.Available from: http://dx.doi.org/10.1136/bmj.39265.596262.AD

32. Dijkman B, Kooistra B, Bhandari M, Evidence-Based Surgery Working Group. How to work with a subgroup analysis. Can. J. Surg. 2009 ;52(6):515–522.

33. Gagnier JJ, Morgenstern H, Altman DG, Berlin J, Chang S, McCulloch P, Sun X, Moher D, Ann Arbor Clinical Heterogeneity Consensus Group. Consensus-based recommendations for investigating clinical heterogeneity in systematic reviews [Internet]. BMC Med. Res. Methodol. 2013 ;13(1):106.Available from: http://dx.doi.org/10.1186/1471-2288-13-106

34. Varadhan R, Stuart EA, Louis TA, Segal JB, Weiss CO. Review of guidance documents for selected methods in patient centered outcomes research: Standards in addressing heterogeneity of treatment effectiveness in observational and experimental patient centered outcomes research. pcori.org. 2012.

35. Briel M, Meade M, Mercat A, Brower RG, Talmor D, Walter SD, Slutsky AS, Pullenayegum E, Zhou Q, Cook D, Brochard L, Richard J-CM, Lamontagne F, Bhatnagar N, Stewart TE, Guyatt G. Higher vs lower positive end-expiratory pressure in patients with acute lung injury and acute respiratory distress syndrome: Systematic review and meta-analysis [Internet]. JAMA. 2010 ;303(9):865–873.Available from: http://dx.doi.org/10.1001/jama.2010.218

36. Simon R. Patient subsets and variation in therapeutic efficacy [Internet]. Br. J. Clin. Pharmacol. 1982 Oct. ;14(4):473–482.Available from: http://dx.doi.org/10.1111/j.1365-2125.1982.tb02015.x

37. Tanniou J, Tweel I van der, Teerenstra S, Roes KCB. Estimates of subgroup treatment effects in overall nonsignificant trials: To what extent should we believe in them? [Internet]. Pharm. Stat. 2017 July ;16(4):280–295.Available from: https://doi.org/10.1002/pst.1810

38. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: Current practice and problems [Internet]. Stat. Med. 2002 ;21(19):2917–2930.Available from: http://dx.doi.org/10.1002/sim.1296

39. Kasenda B, Schandelmaier S, Sun X, Elm E von, You J, Blümle A, Tomonaga Y, Saccilotto R, Amstutz A, Bengough T, Meerpohl JJ, Stegert M, Olu KK, Tikkinen KAO, Neumann I, Carrasco-Labra A, Faulhaber M, Mulla SM, Mertz D, Akl EA, Bassler D, Busse JW, Ferreira-González I, Lamontagne F, Nordmann A, Gloy V, Raatz H, Moja L, Rosenthal R, Ebrahim S, Vandvik PO, Johnston BC, Walter MA, Burnand B, Schwenkglenks M, Hemkens LG, Bucher HC, Guyatt GH, Briel M, DISCO Study Group. Subgroup analyses in randomised controlled trials: Cohort study on trial protocols and journal publications [Internet]. BMJ. 2014 ;349(jul16 1):g4539.Available from: http://dx.doi.org/10.1136/bmj.g4539

40. Sahgal A, Aoyama H, Kocher M, Neupane B, Collette S, Tago M, Shaw P, Beyene J, Chang EL. Phase 3 trials of stereotactic radiosurgery with or without whole-brain radiation therapy for 1 to 4 brain metastases: Individual patient data meta-analysis [Internet]. Int. J. Radiat. Oncol. Biol. Phys. 2015 ;91(4):710–717.Available from: http://dx.doi.org/10.1016/j.ijrobp.2014.10.024

41. Schandelmaier S, Busse JW, Lytvyn L, Kaushal A, Agoritsas T, Mollon B. PROSPERO. 2016 ;

42. Borenstein M, Higgins JPT. Meta-analysis and subgroups [Internet]. Prev. Sci. 2013 Apr. ;14(2):134–143.Available from: http://dx.doi.org/10.1007/s11121-013-0377-7

43. Borenstein M. Common mistakes in meta-analysis: And how to avoid them. Biostat, Inc.; 2019.

44. Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: A comparison of methods [Internet]. Stat. Med. 1999 ;18(20):2693–2708.Available from: http://dx.doi.org/10.1002/(sici)1097-0258(19991030)18:20<2693::aid-sim235>3.0.co;2-v

45. Simmonds M, Stewart G, Stewart L. A decade of individual participant data meta-analyses: A review of current practice [Internet]. Contemp. Clin. Trials. 2015 Nov. ;45(Pt A):76–83.Available from: http://dx.doi.org/10.1016/j.cct.2015.06.012

46. Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: A bad idea [Internet]. Stat. Med. 2006 ;25(1):127–141.Available from: http://dx.doi.org/10.1002/sim.2331

47. Altman DG, Royston P. The cost of dichotomising continuous variables [Internet]. BMJ. 2006 ;332(7549):1080.Available from: http://dx.doi.org/10.1136/bmj.332.7549.1080

48. Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cutpoints in the evaluation of prognostic factors [Internet]. J. Natl. Cancer Inst. 1994 ;86(11):829–835.Available from: http://dx.doi.org/10.1093/jnci/86.11.829

49. Wang XV, Cole B, Bonetti M, Gelber RD. Meta-STEPP: Subpopulation treatment effect pattern plot for individual patient data meta-analysis [Internet]. Stat. Med. 2016 ;35(21):3704–3716.Available from: http://dx.doi.org/10.1002/sim.6958

50. Kasenda B, Sauerbrei W, Royston P, Mercat A, Slutsky AS, Cook D, Guyatt GH, Brochard L, Richard J-CM, Stewart TE, Meade M, Briel M. Multivariable fractional polynomial interaction to investigate continuous effect modifiers in a meta-analysis on higher versus lower PEEP for patients with ARDS [Internet]. BMJ Open. 2016 ;6(9):e011148.Available from: http://dx.doi.org/10.1136/bmjopen-2016-011148

51. Desai M, Pieper KS, Mahaffey K. Challenges and solutions to pre- and post-randomization subgroup analyses [Internet]. Curr. Cardiol. Rep. 2014 ;16(10):531.Available from: http://dx.doi.org/10.1007/s11886-014-0531-2

52. VanderWeele T. Explanation in causal inference: Methods for mediation and interaction. 1st ed. New York, NY: Oxford University Press; 2015.

53. Pearce N, Greenland S. Confounding and interaction [Internet]. In: Ahrens W, Pigeot I, editor(s). Handbook of epidemiology. New York, NY: Springer New York; 2014. p. 659–684.Available from: https://doi.org/10.1007/978-1-4614-6625-3_10

54. Blackstone EH, Suri RM, Rajeswaran J, Babaliaros V, Douglas PS, Fearon WF, Miller DC, Hahn RT, Kapadia S, Kirtane AJ, Kodali SK, Mack M, Szeto WY, Thourani VH, Tuzcu EM, Williams MR, Akin JJ, Leon MB, Svensson LG. Propensity-matched comparisons of clinical outcomes after transapical or transfemoral transcatheter aortic valve replacement: A placement of aortic transcatheter valves (PARTNER)-I trial substudy [Internet]. Circulation. 2015 ;131(22):1989–2000.Available from: http://dx.doi.org/10.1161/CIRCULATIONAHA.114.012525

55. Higgins J, Sterne J, Savović J, Page M, Hróbjartsson A, Boutron I, Reeves B, Eldridge S. A revised tool for assessing risk of bias in randomized trials. Cochrane Database of Systematic Reviews. 2016 ;10(Suppl 1):29–31.

56. Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, Davies P, Kleijnen J, Churchill R, ROBIS group. ROBIS: A new tool to assess risk of bias in systematic reviews was developed [Internet]. J. Clin. Epidemiol. 2016 Jan. ;69:225–234.Available from: http://dx.doi.org/10.1016/j.jclinepi.2015.06.005

57. Hahn S, Williamson PR, Hutton JL, Garner P, Flynn EV. Assessing the potential for bias in meta-analysis due to selective reporting of subgroup analyses within studies [Internet]. Stat. Med. 2000 ;19(24):3325–3336.Available from: http://dx.doi.org/10.1002/1097-0258(20001230)19:24<3325::aid-sim827>3.0.co;2-d

58. Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, Bala MM, Bassler D, Mertz D, Diaz-Granados N, Vandvik PO, Malaga G, Srinathan SK, Dahm P, Johnston BC, Alonso-Coello P, Hassouneh B, Truong J, Dattani ND, Walter SD, Heels-Ansdell D, Bhatnagar N, Altman DG, Guyatt GH. The influence of study characteristics on reporting of subgroup analyses in randomised controlled trials: Systematic review [Internet]. BMJ. 2011 ;342(mar28 1):d1569.Available from: https://doi.org/10.1136/bmj.d1569

59. Barton S, Peckitt C, Sclafani F, Cunningham D, Chau I. The influence of industry sponsorship on the reporting of subgroup analyses within phase III randomised controlled trials in gastrointestinal oncology [Internet]. Eur. J. Cancer. 2015 Dec. ;51(18):2732–2739.Available from: http://dx.doi.org/10.1016/j.ejca.2015.08.030

60. Gabler NB, Duan N, Raneses E, Suttner L, Ciarametaro M, Cooney E, Dubois RW, Halpern SD, Kravitz RL. No improvement in the reporting of clinical trial subgroup effects in high-impact general medical journals [Internet]. Trials. 2016 ;17(1):320.Available from: http://dx.doi.org/10.1186/s13063-016-1447-5

61. Burke JF, Sussman JB, Kent DM, Hayward RA. Three simple rules to ensure reasonably credible subgroup analyses [Internet]. BMJ. 2015 ;351:h5651.Available from: http://dx.doi.org/10.1136/bmj.h5651

62. Varadhan R, Wang S-J. Standardization for subgroup analysis in randomized controlled trials [Internet]. J. Biopharm. Stat. 2014 ;24(1):154–167.Available from: http://dx.doi.org/10.1080/10543406.2013.856023