4. Pitfalls in Reporting and Comparing Hearing Results
Frequency selection, follow-up timing, and overclosure artifacts that make published ossiculoplasty results hard to compare.
Why hearing results refuse to compare
Pick up two ossiculoplasty papers and you will often find one claiming a 90% success rate and another a sober 55% for what looks like the same operation. Before concluding that one surgeon is twice as good as the other, read the small print. Almost always the gap between the headlines is not a difference in surgery at all but a difference in how the result was measured and reported. Hearing outcome is a number the author constructs from several choices — which frequencies are averaged, what decibel threshold is called a “success”, which bone-conduction line the air–bone gap is referenced to, and at what follow-up interval the audiogram was taken — and each of those choices can be made in a way that flatters or deflates the figure.
The outcome of ossiculoplasty is the air–bone gap (ABG): the distance between what the ear hears through air and the cochlear reserve measured by bone conduction. The operation can only close that gap toward the bone line; it cannot raise the bone line itself. Using the gap rather than the raw air-conduction threshold is itself a deliberate choice — it isolates the conductive component the surgery actually repairs from any coincident sensorineural change. But the moment you commit to the gap, every one of the downstream conventions becomes a lever that can move the published result. This module is about those levers, why the otology community built standards to fix them, and why comparing series that ignore the standards is a category error. The explorer below holds one fixed cohort of operated ears constant and lets you watch the reporting choices alone change the headline.
Recognising that the same ears can be reported as a triumph or a disappointment is the first defence against being misled — whether by a journal, a conference abstract, or a colleague at the morbidity meeting. The remedy is not cynicism but standardisation: a shared convention that every author follows so that the numbers mean the same thing.
Which frequencies you average
The air–bone gap is not one number; it is a curve across frequency. After ossiculoplasty the gap typically closes well in the low and mid frequencies but leaves a residual deficit in the high frequencies, because a reconstructed chain — with its added mass, altered stiffness and imperfect coupling — transmits the short wavelengths of high-frequency sound less faithfully than the native ossicles did. If you average only the low and mid tones, you report the part of the spectrum where the operation does best and quietly drop the part where it struggles.
This is exactly why the 1995 AAO-HNS guidelines fixed a four-frequency pure-tone average at 0.5, 1, 2 and 3 kHz, substituting 3 kHz for the older 4 kHz to keep the average within the speech range while still reaching toward the high frequencies [1995]. A series that averages only 0.5, 1 and 2 kHz, or that quietly substitutes a two-tone low average, is not comparable to one that follows the standard — it is reporting an easier exam. And even the standard four-tone average has a blind spot: it does not extend to 4–8 kHz, where a real conductive deficit can persist after an apparently successful reconstruction. Polanik and colleagues showed that measurable high-frequency air–bone gaps persist after ossiculoplasty even when the conventional four-tone gap looks closed [2020]. The chart below contrasts the spectrum before and after surgery and shows what the speech-frequency average conceals.
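The effect of the frequency set can be shown with a minimal sketch. The audiogram values below are hypothetical, chosen only to illustrate a residual gap at 3 kHz, and the function name `air_bone_gap` is an assumption of this example rather than part of any standard.

```python
from statistics import mean

AAO_HNS_FREQS_KHZ = (0.5, 1, 2, 3)  # four-tone set named in the text
THREE_TONE_KHZ = (0.5, 1, 2)        # the easier average that drops 3 kHz


def air_bone_gap(air_db, bone_db, freqs_khz):
    """Average air-bone gap in dB over the chosen frequencies.

    Averaging the per-frequency gaps is arithmetically the same as
    subtracting the bone pure-tone average from the air pure-tone average.
    """
    return mean(air_db[f] - bone_db[f] for f in freqs_khz)


# Hypothetical postoperative thresholds (dB HL), keyed by frequency in kHz.
post_air = {0.5: 20, 1: 20, 2: 25, 3: 35}
post_bone = {0.5: 10, 1: 10, 2: 15, 3: 15}

three_tone_gap = air_bone_gap(post_air, post_bone, THREE_TONE_KHZ)
four_tone_gap = air_bone_gap(post_air, post_bone, AAO_HNS_FREQS_KHZ)

print(f"three-tone gap: {three_tone_gap} dB")
print(f"four-tone gap:  {four_tone_gap} dB")
```

In this invented ear the three-tone average reports a 10 dB gap while the four-tone average reports 12.5 dB, purely because the second calculation includes the frequency where closure is poorest.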
The lesson is twofold. When you report, state your frequencies and use the AAO-HNS set so others can compare. When you read, check the frequencies before you believe the number — and remember that even a textbook-perfect four-tone closure may leave a patient complaining of muffled high tones that the headline never mentioned.
What counts as success, and against which bone line
Two further conventions silently decide the headline. The first is the success threshold. Most modern series call an ossiculoplasty a success when the postoperative gap closes to within 20 dB, but this is a convention, not a law of nature. Quote a result “within 30 dB” and you reclassify a swathe of mediocre ears as successes; demand “within 10 dB” and the same cohort suddenly looks poor. A success rate is therefore uninterpretable until the threshold is stated, and two series using different thresholds cannot be compared by their headline percentages at all.
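A short sketch makes the threshold effect concrete. The ten postoperative gaps below are a hypothetical cohort invented for illustration, and `success_rate` is a helper name assumed for this example.

```python
def success_rate(gaps_db, threshold_db):
    """Percentage of ears whose postoperative gap is at or below the threshold."""
    closed = sum(1 for gap in gaps_db if gap <= threshold_db)
    return 100 * closed / len(gaps_db)


# Hypothetical postoperative air-bone gaps (dB) for ten operated ears.
cohort_gaps_db = [5, 8, 12, 15, 18, 22, 25, 28, 35, 45]

# Same ears, three different definitions of "success".
rates = {t: success_rate(cohort_gaps_db, t) for t in (10, 20, 30)}

for threshold, rate in rates.items():
    print(f"gap <= {threshold} dB: {rate:.0f}% success")
```

With these made-up figures the identical cohort scores 20%, 50% or 80% depending only on where the line is drawn.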
The second, subtler convention is the bone-conduction reference line. The air–bone gap is air conduction minus bone conduction, so its value depends entirely on which bone line you subtract. The AAO-HNS standard is explicit: the postoperative gap must be computed against the postoperative bone-conduction line, using the same reference before and after where overclosure is assessed [1995]. Referencing the postoperative air conduction to the old preoperative bone line is a common and seductive error, because the bone line itself often improves after surgery, and subtracting a higher (worse) preoperative bone line makes the gap look falsely smaller, as if the operation closed more than it did. The next section explains why that bone-line improvement is usually an artifact rather than a genuine gain in cochlear function.
Overclosure and the Carhart artifact
Surgeons sometimes report overclosure: a postoperative air–bone gap that appears smaller than zero, or a result that seems to have improved the bone line itself. Taken at face value this would mean the operation improved cochlear function, which ossiculoplasty cannot do. The usual explanation is a measurement artifact, and the classic one is the Carhart notch. In 1950 Carhart described a depression of bone-conduction thresholds, maximal near 2 kHz, that is mechanical in origin: a fixed or disrupted ossicular chain alters the inertial and osseotympanic components of bone-conducted hearing [1950]. It overstates the true sensorineural loss; it is not a real cochlear deficit.
When the ossicular lesion is corrected, that mechanical depression is released, so the measured bone-conduction threshold improves — not because the cochlea recovered, but because the artifact has gone. If you now compute the postoperative gap against the old preoperative bone line, the improvement is double-counted and the gap appears to have closed by more than the surgery actually achieved, sometimes into apparent overclosure of several decibels. The disciplined fix follows directly from the AAO-HNS rule: always reference the gap to the postoperative bone line, so that release of a Carhart-type artifact is reflected in the bone line where it belongs rather than masquerading as extra conductive closure [1995, 1950]. A series that reports headline overclosure without specifying its bone reference should be read with caution.
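The arithmetic of the bone-reference error can be sketched in a few lines. The thresholds below are hypothetical values at a single frequency, chosen to mimic a 10 dB Carhart-type depression that disappears after surgery; they are not taken from any cited series.

```python
def gap(air_db, bone_db):
    """Air-bone gap in dB: air-conduction minus bone-conduction threshold."""
    return air_db - bone_db


# Hypothetical thresholds (dB HL) at 2 kHz for one ear.
pre_air, pre_bone = 55, 25    # bone line depressed by a mechanical artifact
post_air, post_bone = 20, 15  # bone line 10 dB better once the artifact is released

bone_shift = pre_bone - post_bone           # improvement in the bone line
gap_vs_post_bone = gap(post_air, post_bone)  # referenced as the text recommends
gap_vs_pre_bone = gap(post_air, pre_bone)    # the flattering mis-reference

print(f"bone-line shift:            {bone_shift} dB")
print(f"gap vs postoperative bone:  {gap_vs_post_bone} dB")
print(f"gap vs preoperative bone:   {gap_vs_pre_bone} dB")
```

In this invented example the correctly referenced gap is 5 dB, while the same air-conduction result set against the preoperative bone line comes out at -5 dB, an apparent overclosure produced entirely by the choice of reference.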
| Reporting choice | Flattering version | AAO-HNS standard |
|---|---|---|
| Frequencies averaged | 0.5/1/2 kHz (drops the worst tones) | 0.5/1/2/3 kHz four-tone average |
| Success threshold | Gap ≤ 30 dB | State it explicitly; commonly ≤ 20 dB |
| Bone reference | Preoperative bone line (inflates closure) | Postoperative bone line |
| Follow-up | Earliest audiogram, before drift | Defined, adequate interval |
When you measure: the follow-up trap
Even with frequencies, threshold and bone line all standardised, two honest series can still disagree because they measured at different times. Ossiculoplasty results are not static. Early audiograms flatter the operation because the late mechanisms of decline — progressive adhesions and fibrosis loading the prosthesis, partial migration or extrusion, recurrent mucosal disease, and worsening ventilation — have not yet had time to act. Report at six weeks and you capture the peak; report at five years and you capture what the patient actually lives with.
The longitudinal data are unambiguous. In one cohort, success fell from 61.3% at six months to 54.3% at five years [2008]; in another, from 66.5% to 50.3% over the same interval [2006]. The chart traces both. A paper reporting an early endpoint is therefore not strictly comparable with one reporting a late endpoint, even if every other convention matches — and a series that does not state its follow-up interval cannot be interpreted at all.
There is a related distributional pitfall. Outcome data are typically skewed, with a cluster of good results and a tail of poor ones, so summarising them as a single mean ± standard deviation can mislead, implying a symmetric spread that does not exist. Govaerts and Offeciers argued that audiometric results are better shown as medians, quartiles and extremes in box-and-whisker plots, which reveal the scatter and the tail of failures that a mean conceals [1998]. When you compare series, look past the headline average to the spread: a 20 dB mean gap with a long tail of 50 dB failures is a very different operation from a tight cluster at 20 dB.
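As a rough illustration of the point about spread, the sketch below compares two hypothetical cohorts that share the same mean gap. The data are invented, and `five_number_summary` is a helper assumed for this example; it uses the Python standard library's `statistics.quantiles` with its default method, which is one of several quartile conventions.

```python
from statistics import mean, median, quantiles


def five_number_summary(gaps_db):
    """Minimum, quartiles, median and maximum: the numbers behind a box plot."""
    q1, _, q3 = quantiles(gaps_db, n=4)
    return {
        "min": min(gaps_db),
        "q1": q1,
        "median": median(gaps_db),
        "q3": q3,
        "max": max(gaps_db),
    }


# Two hypothetical cohorts of postoperative gaps (dB), both with a mean of 20.
tight = [18, 19, 20, 20, 20, 20, 21, 22]
tailed = [8, 8, 10, 10, 12, 12, 50, 50]

tight_summary = five_number_summary(tight)
tailed_summary = five_number_summary(tailed)

for name, data, summary in (("tight", tight, tight_summary),
                            ("tailed", tailed, tailed_summary)):
    print(name, "mean:", mean(data), "summary:", summary)
```

Both cohorts report a 20 dB mean, yet the second has a median near 11 dB and a maximum of 50 dB, which is the tail of failures that the mean alone would hide.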
Binaural benefit and honest reporting
The deepest pitfall is that a perfectly closed monaural gap may mean little to the patient. Hearing is binaural, and everyday disability is governed by the better-hearing ear. If the contralateral ear hears well, closing the operated ear’s gap to an immaculate 18 dB may buy the patient almost no perceived benefit, because the brain was already relying on the good ear. Conventional air–bone gap reporting, however meticulous, captures the technical success of the operation but not the benefit the patient feels.
Two tools exist to bridge that gap. The Belfast rule of thumb, derived by Smyth and Patterson from patients’ own assessments, judges a unilateral reconstruction worthwhile when the operated ear reaches 30 dB or better, or comes to within 15 dB of the other ear; these thresholds were chosen because beyond them the poorer ear tends to be ignored by the auditory cortex [1985]. The Glasgow Benefit Plot displays pre- and postoperative air-conduction thresholds of both ears on one graph, so a reader can see at a glance whether a technically successful operation actually moved the patient into a binaurally better position [1991]. Both make explicit what a monaural percentage hides: technical success and patient benefit are not the same thing.
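The Belfast rule of thumb as stated above can be written as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, both inputs are taken to be average air-conduction thresholds in dB, and "within 15 dB of the other ear" is read here as the operated ear being no more than 15 dB worse than the other ear.

```python
def belfast_worthwhile(operated_db, other_db):
    """Apply the Belfast rule of thumb as described in the text.

    Worthwhile if the operated ear reaches 30 dB or better, or lies within
    15 dB of the other ear (assumed: no more than 15 dB worse).
    """
    reaches_30 = operated_db <= 30
    within_15 = operated_db - other_db <= 15
    return reaches_30 or within_15


# Hypothetical postoperative air-conduction averages (dB HL).
examples = {
    "good absolute level": (28, 10),  # passes on the 30 dB criterion
    "close to other ear": (40, 30),   # passes on the 15 dB criterion
    "neither criterion": (45, 10),    # fails both
}

results = {label: belfast_worthwhile(*ears) for label, ears in examples.items()}
for label, verdict in results.items():
    print(f"{label}: {'worthwhile' if verdict else 'not worthwhile'}")
```

The third ear may have a respectable air–bone gap, but by this rule it is unlikely to contribute to binaural hearing, which is the distinction the monaural percentage does not show.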
Put together, honest outcome reporting is a discipline of specifying choices, not a single number. State the frequencies (AAO-HNS four-tone), the success threshold, the bone reference (postoperative), and the follow-up interval; show the distribution, not just the mean; and where the question is patient benefit rather than surgical success, reach for a binaural tool. Apply the same checklist when you read others’ results, and the implausible 90%-versus-55% chasm usually dissolves into two papers that simply measured different things [1995, 1998].
Which combination of methodological choices most plausibly inflated the reported success rate?
Why is the air-bone gap, rather than the air-conduction threshold alone, the preferred outcome measure for ossiculoplasty?
Which four-frequency average did the 1995 AAO-HNS guidelines recommend for reporting the air-bone gap after surgery for conductive hearing loss?
A series reports apparent bone-conduction overclosure after ossiculoplasty, with the postoperative air-bone gap referenced to the preoperative bone line. What is the most likely explanation, and what is the reporting fix?
An ossiculoplasty closes the operated ear's air-bone gap to 18 dB, but the patient reports little subjective benefit. The contralateral ear hears at 10 dB. How do binaural reporting tools explain this?