Ossiculoplasty Atlas · Acoustics, Mechanics & Classification Systems · Module 14

14 Comparing Risk Scores: MERI, OOPS, and the EER

How legacy ossiculoplasty indices stack up against the multi-institutional Ear Environment Risk score in correlating with real postoperative hearing outcomes.

T · Why compare the risk scores at all

Every ossiculoplasty ends with the same quiet question: how close to normal will this ear hear? For half a century surgeons have tried to answer it before the operation by reducing a complicated middle ear (its ossicles, its mucosa, its history of drainage and previous surgery) to a single number that predicts the postoperative air–bone gap (ABG). The Middle Ear Risk Index (MERI), the Ossiculoplasty Outcome Parameter Staging (OOPS) index and Black’s SPITE method are the legacy tools most trainees learn first [1994, 2001, 1992].

The trouble is that a score is only worth using if its number actually moves with the outcome it claims to predict. Until recently the indices were validated mostly in small single-surgeon series, on different cohorts, with different definitions of “success” — which made head-to-head comparison almost impossible. The arrival of the Ear Environment Risk (EER) score in 2025, derived from a large multi-institutional database and explicitly benchmarked against MERI and OOPS in the same cohort, is the first chance to ask the question properly [2025]. This module sets the indices side by side and asks which of them, if any, earns its place in the clinic.

T · The legacy indices: MERI, OOPS and SPITE

All three legacy systems share a skeleton: take the variables that experience says matter, assign each a weight, and sum them into a prognosis. They differ chiefly in which variables they trust.

The MERI grew out of Kartush’s work and was given its familiar 0–12 form by Becvarovski and Kartush, who added smoking to the mix. It combines the Austin–Kartush ossicular grade, Bellucci’s otorrhoea classification, perforation, cholesteatoma, granulation/effusion and prior surgery; a score of 7 or more marks a high-risk ear [1994, 2001, 1973].
The OOPS index of Dornhoffer and Gardner was built differently: by multivariate statistical analysis of 200 ossiculoplasties rather than by clinical intuition. Strikingly, in that derivation the presence of the stapes superstructure and of cholesteatoma did not independently predict ABG closure, so OOPS leaves them out and leans instead on mucosa, drainage, malleus and ossicular status, revision and the type of surgery [2001].
Black’s SPITE method distilled twelve significant features from 535 ossiculoplasties into five families (Surgical, Prosthetic, Infection, Tissue and Eustachian), designed more for structured counselling than for a single tidy number [1992].

The disagreement between MERI and OOPS over the stapes superstructure is more than trivia. It shows that “obvious” prognostic factors do not always survive statistical scrutiny, and it warns that any index is only as good as the cohort and the method that built it. That caveat becomes the whole story when these tools are finally tested against each other.
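The MERI threshold described above can be written as a minimal Python sketch. It assumes each component's points have already been looked up in the published MERI tables, which this module does not reproduce; the dictionary keys and point values in the usage lines are hypothetical and purely illustrative. Only the 0–12 range and the cut-off of 7 come from the text.

```python
from collections.abc import Mapping

MERI_MAX = 12                  # MERI's 0-12 range, as described in the text
MERI_HIGH_RISK_THRESHOLD = 7   # "a score of 7 or more marks a high-risk ear"


def meri_total(component_points: Mapping[str, int]) -> int:
    """Sum MERI component points that were already scored from the published tables."""
    total = sum(component_points.values())
    if not 0 <= total <= MERI_MAX:
        raise ValueError(f"MERI total {total} is outside the 0-{MERI_MAX} range")
    return total


def meri_is_high_risk(total: int) -> bool:
    """Apply the high-risk cut-off to a MERI total."""
    return total >= MERI_HIGH_RISK_THRESHOLD


# Hypothetical ear: key names and point values are illustrative, not from the MERI tables
example = {"ossicular_status": 2, "otorrhoea": 1, "perforation": 1,
           "prior_surgery": 2, "smoking": 2}
total = meri_total(example)
print(total, meri_is_high_risk(total))  # 8 True
```

The same two-step pattern (sum the looked-up points, then compare with a fixed cut-off) is shared by all the summed indices in this module; only the components and weights differ.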

T · The EER: a multi-institutional rebuild

The Ear Environment Risk score took the legacy approach and rebuilt it on a far larger, multi-centre foundation. Gluth and colleagues assembled 1,679 ossiculoplasty cases performed between 2011 and 2019 across several institutions, with a mean follow-up of about 34 months and a mean postoperative PTA–ABG of roughly 21 dB. They then used multiple-variable linear regression to find which factors were independently associated with the postoperative gap, and turned only the statistically significant ones into a weighted scale [2025].

The result is a compact 0–12 working range built from seven factors, with weights that reflect how strongly each moved the outcome. The heaviest belong to multiple prior revisions and a lateralized or blunted tympanic membrane; a canal-wall-down cavity and an absent malleus carry intermediate weight; and pediatric age, absent stapes superstructure and frequent otorrhoea add a point each. Notice what survived: the EER, like OOPS before it, gives the stapes superstructure only a small weight, while elevating the surgical and inflammatory burden of the ear.
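The weighting idea can be sketched in Python as a sum over the seven factors named above. The weights below are hypothetical placeholders chosen only to respect the ordering described in the text (two heaviest factors, two intermediate, three worth one point each); they are not the published EER weights, and their sum (13) is not meant to match the 0–12 working range. The factor keys are likewise illustrative names, not identifiers from the EER study.

```python
from collections.abc import Mapping

# HYPOTHETICAL placeholder weights: they mirror only the ordering in the text
# (heaviest, intermediate, one point each). They are NOT the published EER weights.
PLACEHOLDER_EER_WEIGHTS: dict[str, int] = {
    "multiple_prior_revisions": 3,
    "lateralized_or_blunted_tm": 3,
    "canal_wall_down_cavity": 2,
    "absent_malleus": 2,
    "pediatric_age": 1,
    "absent_stapes_superstructure": 1,
    "frequent_otorrhoea": 1,
}


def eer_score(findings: Mapping[str, bool],
              weights: Mapping[str, int] = PLACEHOLDER_EER_WEIGHTS) -> int:
    """Sum the weights of every factor marked present; missing keys count as absent."""
    unknown = set(findings) - set(weights)
    if unknown:
        # Catch misspelled factor names instead of silently scoring them as zero
        raise KeyError(f"unrecognised factor(s): {sorted(unknown)}")
    return sum(weight for factor, weight in weights.items() if findings.get(factor, False))


# Hypothetical primary adult ear with frequent otorrhoea and no stapes superstructure
example_ear = {"frequent_otorrhoea": True, "absent_stapes_superstructure": True}
print(eer_score(example_ear))  # 2 under the placeholder weights
```

Passing the published weights through the `weights` argument is the intended way to use the sketch; the placeholder dictionary exists only so the example runs.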

Two design choices are worth flagging. First, the EER rewards scoring the whole ear environment (its name is deliberate) rather than fixating on the ossicular gap alone. Second, because it was derived and weighted on a large, heterogeneous, real-world dataset, it is less hostage to one surgeon’s technique or case mix than the single-centre indices that came before it.

C · Head to head: which predicts best

The genuinely new contribution is that the EER study scored every one of its 1,679 ears with MERI and OOPS as well, so all three could be correlated with the same postoperative gaps. Using Kendall’s τ, a rank correlation that asks how reliably a higher score goes with a larger gap, the EER came out ahead at 0.193, against 0.164 for OOPS and 0.149 for MERI [2025]. The chart sets them side by side.

Read that chart honestly and two messages emerge. The EER is the best of the three: a real edge from a purpose-built multi-institutional model. But all three correlations are modest. A τ near 0.2 means the score and the outcome move together more often than chance alone would produce, yet the score still leaves most of the variability in an individual result unexplained. An independent single-centre validation reached the same conclusion from the other direction: MERI and OOPS correlated only weakly with hearing outcome (r around 0.2) and, crucially, neither reliably predicted the failures, the very cases counselling most needs to flag [2020].

That pairing — a clear winner that is still only modestly predictive — is the honest state of the art. It would be a mistake to abandon scoring because the numbers are imperfect, and an equal mistake to treat any score as a precise individual forecast.
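To make Kendall’s τ concrete, the Python sketch below counts concordant and discordant pairs directly. It assumes the tau-b variant, which corrects for the many ties that integer risk scores produce; this module does not state which variant the EER study used. `scipy.stats.kendalltau`, whose default variant is tau-b, should return the same value and is a convenient cross-check. The second helper converts τ into the share of concordant pairs, which holds only when ties are ignored: then τ = (C − D)/(C + D), so C/(C + D) = (1 + τ)/2. The toy data are invented.

```python
from collections.abc import Sequence
from itertools import combinations
from math import sqrt


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between paired observations, e.g. risk scores and postoperative ABGs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from numerator and denominator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # higher score paired with larger gap
        else:
            discordant += 1  # higher score paired with smaller gap
    # Pairs untied in x = C + D + tied_y_only; pairs untied in y = C + D + tied_x_only
    denominator = sqrt((concordant + discordant + tied_y_only)
                       * (concordant + discordant + tied_x_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denominator


def concordant_share(tau: float) -> float:
    """Share of pairs ordered the same way, assuming no ties: tau = (C - D) / (C + D)."""
    return (1 + tau) / 2


# Invented toy data, for illustration only: five ears' scores and postoperative ABGs (dB)
scores = [0, 2, 2, 5, 9]
gaps_db = [15, 22, 18, 20, 35]
print(round(kendall_tau_b(scores, gaps_db), 3))  # 0.738, i.e. 7 / sqrt(90)
```

Fed the published values, `concordant_share` gives roughly 60% for the EER, 58% for OOPS and 57% for MERI, against 50% for a coin flip: a better-than-chance ordering of pairs of ears, but far from a reliable individual forecast.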

C · Reading the bands for honest counselling

The reason a modestly correlated score is still clinically valuable becomes obvious when you stop looking at individual predictions and look at risk bands. The EER sorts ears into four groups: favourable (score 0), low (1–4), intermediate (5–8) and high (9+). The mean postoperative gap climbs cleanly across them, from about 16 dB in the favourable band to roughly 32 dB in the high-risk band [2025].

That stepwise gradient is exactly what a clinic conversation needs. You cannot promise a given patient a 19 dB gap, but you can tell the patient whose ear sits in the high-risk band that the average ear like theirs ends up with a gap roughly twice the size of a favourable ear (about 32 dB against 16 dB), and that staging the reconstruction, or accepting more modest goals, is reasonable. The score turns a vague sense that “this is a difficult ear” into a defensible, comparable statement.

EER band	Score	Mean postop ABG	Counselling stance
Favourable	0	~16 dB	Expect near-closure; reconstruct in one stage.
Low	1–4	~20 dB	Good odds; standard single-stage approach.
Intermediate	5–8	~25 dB	Guarded; consider staging, temper expectations.
High	9+	~32 dB	Frank discussion; staging and amplification on the table.
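The band lookup in the table above reduces to a few comparisons. This Python sketch hard-codes the band boundaries and approximate mean gaps from the table; the function name is hypothetical, and the means are the rounded band averages shown there, not patient-level predictions.

```python
from typing import Optional

# Band boundaries and approximate mean postoperative ABG (dB), copied from the EER table
EER_BANDS: tuple[tuple[str, int, Optional[int], int], ...] = (
    ("favourable", 0, 0, 16),
    ("low", 1, 4, 20),
    ("intermediate", 5, 8, 25),
    ("high", 9, None, 32),  # None marks the open-ended "9+" band
)


def eer_band(score: int) -> tuple[str, int]:
    """Return (band label, approximate mean postoperative ABG in dB) for an EER score."""
    if score < 0:
        raise ValueError("EER scores cannot be negative")
    for label, low, high, mean_abg_db in EER_BANDS:
        if score >= low and (high is None or score <= high):
            return label, mean_abg_db
    raise AssertionError("bands are contiguous from 0, so this is unreachable")


# The "roughly twice the size" comparison from the counselling paragraph
print(eer_band(9)[1] / eer_band(0)[1])  # 2.0
```

Keeping the boundaries in one table-like constant means a future revision of the bands changes data, not logic.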

C · Using scores well, and their limits

So how should a thoughtful surgeon actually use these tools? The evidence supports a few clear principles.

Score the whole ear, not just the chain. The factors that dominate every modern index (revisions, canal-wall-down anatomy, a lateralized drum, drainage) describe the environment the prosthesis must survive in, and they move the outcome far more than the choice between a PORP and a TORP. A dry, primary ear with a single missing incus and a scarred, draining, multiply revised cavity may share an ossicular grade yet belong in completely different risk bands.

Prefer the best-validated tool, but hold it lightly. On current evidence the EER edges out MERI and OOPS for correlation with outcome and rests on the largest, most representative cohort, which is a reasonable basis to favour it [2025, 2020]. Yet familiarity and ease of use are legitimate reasons to keep using MERI or OOPS, and a score you compute consistently beats a better score you compute carelessly [2020].

Respect what the scores cannot see. None captures the microscopic intraoperative nuances (fibrotic adhesions, round-window obstruction, the true mobility of the footplate, the quality of the malleus as a load-bearing strut) that often decide an individual case, and some of their inputs (extent of granulation, “blunting”) carry interobserver subjectivity. Use the score to structure honest, risk-banded counselling and to standardize comparison between cases and centres. Do not use it to promise a decibel, to refuse an operation outright, or to replace the audiogram and your own eyes at the microscope. Within those limits, the modern risk score is one of the few genuinely evidence-based instruments the ossiculoplasty surgeon owns.

Case 2.14

A 41-year-old presents for revision ossiculoplasty. This is her second revision after two prior canal-wall-down procedures for cholesteatoma. The ear has been intermittently draining, the malleus handle is eroded, the stapes superstructure is intact and mobile, and the drum graft is blunted at the anterior angle. You want to give her an honest, evidence-based estimate of the likely postoperative air-bone gap before she consents.

Using the multi-institutional Ear Environment Risk (EER) framework, which combination of her findings carries the greatest weight toward a poor predicted outcome?

Self-assessment · Comparing Risk Scores: MERI, OOPS, and the EER · 4 questions

Question 1 · Foundation

What is the shared purpose of the MERI, OOPS and EER scoring systems in ossiculoplasty?

Question 2 · Trainee

A surgeon notes that the OOPS index, unlike MERI, did not retain the stapes superstructure as an independent predictor of outcome. What does this reflect?

Question 3 · Trainee

In the multi-institutional study that derived the Ear Environment Risk (EER) score, how did its correlation with the postoperative air-bone gap compare with MERI and OOPS?

Question 4 · Clinician

Given that even the best-performing current index (EER) correlates only modestly with outcome, what is the most defensible way to use these scores in clinic?
