How to use ICEMAN

Users and applications

Users of ICEMAN include:

Trial investigators who are planning or considering the results of a subgroup analysis;
Meta-analysts who are planning or considering the results of a subgroup analysis;
Authors of systematic reviews and clinical practice guidelines who assess subgroup claims made in published reports of trials or meta-analyses;
Journal editors, referees, methods consultants, and others concerned with the quality of subgroup analyses in trials or meta-analyses.

Assessment in duplicate

Confidence in the assessment increases if two investigators independently apply ICEMAN, discuss discrepancies, and present a consensus version.
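The comparison step of a duplicate assessment can be sketched in a few lines. This is only an illustration, not part of ICEMAN itself: the function name `find_discrepancies`, the dictionary layout (item number mapped to response), and the example responses are assumptions made for this sketch.

```python
def find_discrepancies(rater_a, rater_b):
    """Return the items on which two independent assessments differ.

    Each argument maps an ICEMAN item number to the response chosen
    by one investigator (hypothetical data layout).
    """
    items = sorted(set(rater_a) | set(rater_b))
    return {
        item: (rater_a.get(item), rater_b.get(item))
        for item in items
        if rater_a.get(item) != rater_b.get(item)
    }


# Hypothetical responses from two investigators for the same subgroup claim.
rater_a = {
    1: "Probably yes",
    2: "Some support",
    3: "Chance may not explain",
    4: "Probably no or unclear",
}
rater_b = {
    1: "Probably yes",
    2: "Little or no support",
    3: "Chance may not explain",
    4: "Probably no or unclear",
}

# Items listed here are the ones to discuss before agreeing on a consensus.
discrepancies = find_discrepancies(rater_a, rater_b)
for item, (a, b) in discrepancies.items():
    print(f"Item {item}: investigator A = {a!r}, investigator B = {b!r}")
```

The sketch only flags disagreements; resolving them and recording the consensus response remains a matter of discussion between the investigators.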

Reporting

We recommend specifying use of ICEMAN in the study protocol, and in the methods, results, and interpretation sections of the final publication:

Protocol: “We will assess the credibility of potentially relevant effect modification using ICEMAN.”
Methods: “We used ICEMAN to assess the credibility of potentially relevant effect modification.”
Results: “We judged the credibility of the potential effect modification as low, with uncertainty arising from lack of prior evidence and an inconclusive test of interaction (see supplement).”
Interpretation: “A formal credibility assessment rated the apparent effect modification as likely spurious. We recommend considering the overall effect estimate for all patients.”

Warning

We do not recommend reporting overall credibility as a percentage (e.g. “30% credible”).

Users can download the fillable ICEMAN form for RCTs or meta-analyses, complete it on their device, and attach it to the appendix of their publication. Alternatively, the online tool generates a downloadable reporting table directly.

We also provide four table templates for summarising one or more ICEMAN assessments in a publication. Download all templates (.docx)

Legend for all tables: (--) definitely reduces credibility; (-) probably reduces credibility or unclear; (+) probably increases credibility; (++) definitely increases credibility. Not applicable items receive no code.

Template 1: Single assessment (RCT). Full item-level table for one outcome, effect measure, and effect modifier in an RCT.

Item	Response	Rationale
1 — Direction of effect modification hypothesized a priori?	(+) Probably yes	Direction stated before analysis.
2 — Effect modification supported by prior evidence?	(+) Some support	Consistent indirect evidence.
3 — Chance unlikely explanation of effect modification?	(+) Chance may not explain	Interaction p = 0.008.
4 — Few effect modifiers tested or multiplicity considered?	(-) Probably no or unclear	Six modifiers tested; no adjustment.
5 — Arbitrary cut points avoided for continuous modifier?	Not applicable
6 — Additional credibility considerations?	None	No additional concerns.
Overall credibility	Moderate: likely effect modification; use separate subgroup effects, but note uncertainty	Multiplicity lowered credibility.

Template 2: Single assessment (meta-analysis). Full item-level table for one outcome, effect measure, and effect modifier in a meta-analysis.

Item	Response	Rationale
1 — Based on within- rather than between-trial comparison?	(+) Mostly within	Most information came from within-trial subgroup comparisons.
2 — Effect modification similar across trials?	(+) Mostly similar	Trial-specific estimates had similar direction, with some variation in magnitude.
3 — Number of trials large for between-trial comparison?	Not applicable	Assessment did not rely on between-trial comparison.
4 — Direction of effect modification hypothesized a priori?	(+) Probably yes	Direction stated before analysis.
5 — Chance unlikely explanation of effect modification?	(+) Chance may not explain	Meta-regression p = 0.008.
6 — Few effect modifiers tested or multiplicity considered?	(-) Probably no or unclear	Several modifiers tested; no adjustment.
7 — Random-effects model used?	(++) Definitely yes	Authors explicitly used a random-effects model.
8 — Arbitrary cut points avoided for continuous modifier?	Not applicable
9 — Additional credibility considerations?	None	No additional concerns.
Overall credibility	Moderate: likely effect modification; use separate subgroup effects, but note uncertainty	Multiplicity lowered credibility.

Template 3: Multiple effect modifiers. Compact table comparing ICEMAN items across several candidate effect modifiers.

Item	Age	Sex	Diabetes status	Baseline severity
Interaction p-value	0.008	0.04	0.003	0.07
1 — Direction a priori?	(+) Probably yes	(-) Probably no or unclear	(++) Definitely yes	(-) Probably no or unclear
2 — Prior evidence?	(+) Some support	(-) Little or no support	(++) Strong support	(+) Some support
3 — Chance unlikely?	(+) Chance may not explain	(-) Chance likely	(++) Chance unlikely	(-) Chance likely
4 — Multiplicity?	(-) Probably no or unclear	(-) Probably no or unclear	(-) Probably no or unclear	(-) Probably no or unclear
5 — Arbitrary cut points?	Not applicable	Not applicable	Not applicable	(-) Probably no or unclear
6 — Additional?	None	None	None	None
Overall credibility	Moderate: likely effect modification, but uncertainty remains	Low: some but insufficient support	High: very likely effect modification	Very low: minimal to no support

Template 4: Summary table. One row per assessment; suitable for summarising several assessments across outcomes and effect modifiers.

Footnote: Each row represents one candidate effect modification (one outcome, effect measure, and effect modifier). ICEMAN was applied only when the interaction p-value was 0.1 or smaller. Full item-level assessments appear in a supplement.
Outcome	Effect measure	Effect modifier	Interaction p	Main credibility concerns	Overall credibility
30-day mortality	Risk ratio	Age	0.008	Multiplicity not addressed	Moderate
30-day mortality	Risk ratio	Sex	0.04	Direction not prespecified; weak prior evidence; multiplicity not addressed	Low
Stroke at 1 year	Odds ratio	Diabetes status	0.003	None major	High
Pain at 6 months	Mean difference	Baseline severity	0.07	Weak statistical support; arbitrary cut point	Very low
Serious adverse events	Risk difference	Age	0.14	ICEMAN not applied (p > 0.1)	Not assessed
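The screening rule in the footnote of this template (apply ICEMAN only when the interaction p-value is 0.1 or smaller) could be applied to a list of candidate effect modifications as sketched below. The threshold comes from the footnote and the rows from the example table; the names `P_THRESHOLD`, `needs_iceman`, and the tuple layout are assumptions for illustration, and a different threshold may suit other projects.

```python
P_THRESHOLD = 0.1  # from the template footnote: assess only if p <= 0.1

# (outcome, effect modifier, interaction p-value), taken from the example table.
candidates = [
    ("30-day mortality", "Age", 0.008),
    ("30-day mortality", "Sex", 0.04),
    ("Stroke at 1 year", "Diabetes status", 0.003),
    ("Pain at 6 months", "Baseline severity", 0.07),
    ("Serious adverse events", "Age", 0.14),
]


def needs_iceman(p_value, threshold=P_THRESHOLD):
    """Return True if the interaction p-value meets the screening rule."""
    return p_value <= threshold


# Split the candidates into those to assess and those reported as "Not assessed".
to_assess = [(o, m) for o, m, p in candidates if needs_iceman(p)]
not_assessed = [(o, m) for o, m, p in candidates if not needs_iceman(p)]

print("Apply ICEMAN to:", to_assess)
print("Not assessed:", not_assessed)
```

Rows that fail the screen still appear in the summary table, marked as not assessed, so that readers can see every subgroup analysis that was run.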

Using ICEMAN with other instruments

ICEMAN can be combined with the Cochrane Risk of Bias tool for RCTs¹ or the ROBIS tool for systematic reviews,² and with the GRADE framework³:

Moderate or high credibility: Apply GRADE to subgroup-specific estimates. Note remaining uncertainty if moderate. Considering subgroup-specific estimates may sometimes resolve concerns due to heterogeneity and consequently increase certainty of evidence and strength of recommendation.
Low or very low credibility: Apply GRADE to the overall effect estimate. Note remaining uncertainty if low, especially if the potential effect modification appears to explain heterogeneity.
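The two rules above amount to a small decision table, sketched here for readers who want to apply it consistently across many assessments. The function name `grade_target` and the returned tuple are hypothetical; the mapping itself restates the guidance above and is not an official GRADE or ICEMAN implementation.

```python
def grade_target(overall_credibility):
    """Map an overall ICEMAN credibility rating to the estimate GRADE is applied to.

    Returns (target, note_uncertainty), where note_uncertainty is True for
    the borderline ratings ("moderate" and "low").
    """
    level = overall_credibility.strip().lower()
    if level in ("moderate", "high"):
        target = "subgroup-specific estimates"
    elif level in ("low", "very low"):
        target = "overall effect estimate"
    else:
        raise ValueError(f"Unknown credibility rating: {overall_credibility!r}")
    note_uncertainty = level in ("moderate", "low")
    return target, note_uncertainty


for rating in ("High", "Moderate", "Low", "Very low"):
    print(rating, "->", grade_target(rating))
```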

References

1. Higgins J, Sterne J, Savović J, Page M, Hróbjartsson A, Boutron I, Reeves B, Eldridge S. A revised tool for assessing risk of bias in randomized trials. Cochrane Database of Systematic Reviews. 2016;10:29–31.

2. Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, Davies P, Kleijnen J, Churchill R, ROBIS group. ROBIS: A new tool to assess risk of bias in systematic reviews was developed [Internet]. J. Clin. Epidemiol. 2016 Jan;69:225–234. Available from: http://dx.doi.org/10.1016/j.jclinepi.2015.06.005

3. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Glasziou P, Jaeschke R, Akl EA, Norris S, Vist G, Dahm P, Shukla VK, Higgins J, Falck-Ytter Y, Schünemann HJ, GRADE Working Group. GRADE guidelines: 7. Rating the quality of evidence–inconsistency [Internet]. J. Clin. Epidemiol. 2011 Dec;64(12):1294–1302. Available from: http://dx.doi.org/10.1016/j.jclinepi.2011.03.017