Ossiculoplasty Atlas · Recent Advances & Future Directions · Module 11

11 · Artificial Intelligence in Ossiculoplasty Planning

Machine learning that reads imaging, predicts outcomes, and recommends prosthesis and approach for the individual ear.

Why bring AI to a one-of-a-kind ear?

Ossiculoplasty has always been an exercise in judgement under uncertainty. Two ears with the same audiogram can hide very different middle ears, and the result depends on a tangle of variables (the status of the stapes and malleus, the health of the mucosa, the aeration of the cavity, the length and tension of the chosen prosthesis) that interact in ways no surgeon can fully hold in their head. For decades the profession has tried to tame this with scoring systems: the Middle Ear Risk Index (MERI) and the Ossiculoplasty Outcome Parameter Staging (OOPS) of Dornhoffer and Gardner assign points to the worst prognostic features and stratify the patient into a risk band [2001]. These indices are genuinely useful, but they are blunt: they use fixed, hand-assigned weights and, when tested head-to-head, discriminate individual outcomes only modestly. In one series the area under the ROC curve was just 0.637 for OOPS and 0.551 for MERI at twelve months [2021].

That modest ceiling is exactly the gap that artificial intelligence is being asked to close. The promise is a system that reads the patient’s own imaging, weighs dozens of features in the proportions the data actually support rather than the proportions a committee guessed, and returns an individualised forecast (and, ultimately, a recommendation about prosthesis and approach) for this ear. It is best understood not as a replacement for the surgeon but as a new kind of decision support, sitting alongside the CT viewer and the operative plan. Three capabilities are now mature enough to discuss seriously: seeing the anatomy automatically, predicting the outcome, and assembling those into a draft plan.

Seeing the chain: AI on the CT

The first thing a planning system must do is understand the anatomy in front of it, and this is where AI has made its most convincing progress. The ossicles are tiny, irregular structures, and reconstructing them by hand from a high-resolution temporal-bone CT is slow, tedious work. Convolutional neural networks (CNNs), the workhorse of medical image analysis, have been trained to do it automatically. Neves and colleagues built an end-to-end CNN that segmented the cochlear labyrinth, the ossicular chain and the facial nerve from clinical CT scans at human-level accuracy, comparing several network architectures to find the best [2021]. The point is not merely that a computer can trace the bones; it is that it can do so reliably enough to trust.

The decisive advantage is speed without loss of accuracy. A more recent model that automatically reconstructs the ossicular chain and bony labyrinth in three dimensions did so in about 18 seconds per ear, against roughly 18 minutes for manual reconstruction, while preserving overlap accuracy (Dice similarity 0.98–0.99) across a wide spread of pathology: otitis media, mastoiditis, otosclerosis, malformations and Ménière disease [2025]. When per-patient 3D anatomy takes seconds rather than the best part of an hour, it stops being a research curiosity and becomes something that could plausibly run on every pre-operative scan. Detection follows naturally: an explainable 3D model reading temporal-bone CT for chronic otitis media reached 81.8% diagnostic accuracy, contributed to clinical decision-making in 90.1% of cases, matched senior clinicians, and, crucially, produced heat maps over the middle ear and mastoid that aligned with how a human reads the scan, so the surgeon can see why the model flagged disease, not just that it did [2024].
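
For readers unfamiliar with the Dice similarity quoted above, it is twice the overlap between two segmentations divided by their combined size, so 1.0 means identical masks and 0 means no overlap. The sketch below is a minimal illustration on a made-up one-dimensional "volume"; the function name and the toy masks are assumptions for teaching purposes, not the evaluation code of any cited study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / total


# Toy 10-voxel masks (illustrative values, not real CT data).
manual = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # hand-traced ossicle
automatic = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # model output, shifted by one voxel

score = dice_coefficient(manual, automatic)
print(f"Dice = {score:.2f}")
```

Here three of the four voxels in each mask coincide, giving 2 × 3 / (4 + 4) = 0.75. A reported Dice of 0.98 to 0.99 therefore means the automatic and manual outlines differ by only a thin shell of voxels.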

From CT to a model-supported plan: the AI ossiculoplasty pipeline

1 · Inputs. The pipeline ingests the patient's data: a high-resolution temporal-bone CT plus structured variables (pre-operative air-bone gap and bone-conduction thresholds, ossicular and stapes status, mucosal disease, otorrhoea, prior surgery). Baseline hearing is consistently the most decisive single predictor.

Schematic of an AI planning pipeline integrating published components: CNN segmentation and 3D reconstruction (Neves 2021; Xie 2025), explainable CT classification of chronic otitis media (Chen 2024), and machine-learning outcome prediction (Koyama 2023). Stages are illustrative; the surgeon makes the final decision.

Walking the pipeline above makes the architecture clear. Raw inputs — the CT plus structured clinical variables — feed a segmentation step, then a detection step, then a prediction step, and only at the end does the surgeon assemble a plan. Each stage is a published capability in its own right; the novelty is chaining them so that the output of one becomes the input of the next.

Predicting the gap before you operate

Segmentation tells you what the ear looks like; the harder question is what it will sound like afterwards. This is where machine learning is asked to beat the old indices at their own game. Koyama and colleagues took 114 ears undergoing tympanoplasty and pitted three algorithms (random forest, support vector machine and k-nearest neighbour) against MERI and OOPS for predicting the post-operative air-bone gap. The random forest predicted more precisely than the classical scoring systems, and the analysis identified the pre-operative air-bone gap itself as the single most decisive feature, with ossicular status and CT findings also contributing [2023]. The chart below sets the discriminative ability of the fixed indices against the machine-learning approach.

Discriminating who closes the gap: fixed indices vs machine learning

[Bar chart: discrimination (AUC ×100, axis 0 to 80) for chance, MERI, OOPS and random forest; the highlighted random-forest bar reads 76.]

MERI 0.551 and OOPS 0.637 (12-month ROC AUC; Jung 2021, PLoS One, n=526). Random-forest bar (~0.76) is indicative of the machine-learning advantage reported by Koyama 2023 (Laryngoscope, n=114 ears), where the random forest predicted post-operative air-bone gap more precisely than MERI or OOPS; exact AUC varies by cohort and outcome definition. AUC 50 = chance, 100 = perfect.
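
To make the AUC figures concrete: the area under the ROC curve equals the probability that a randomly chosen ear with a good outcome receives a higher score than a randomly chosen ear with a poor one, with ties counted as half. The sketch below computes it by direct pairwise comparison. The six scores and outcomes are invented, and "closing the gap" is assumed here to mean whatever binary success criterion a study adopts; none of this reproduces data from the cited series.

```python
def roc_auc(scores, outcomes):
    """AUC as the fraction of (success, failure) pairs ranked correctly.

    scores   -- predictor output per ear (higher = more likely to succeed)
    outcomes -- 1 if the ear met the success criterion, else 0
    """
    successes = [s for s, y in zip(scores, outcomes) if y == 1]
    failures = [s for s, y in zip(scores, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one ear in each outcome class")
    wins = 0.0
    for s in successes:
        for f in failures:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(successes) * len(failures))


# Hypothetical predicted probabilities of success and observed outcomes.
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
closed_gap = [1, 1, 0, 1, 0, 0]

auc = roc_auc(predicted, closed_gap)
print(f"AUC = {auc:.3f}")
```

In this toy cohort eight of the nine success/failure pairs are ordered correctly, so the AUC is 8/9, about 0.89. Read the same way, an AUC of 0.551 means an index ranks a success above a failure only slightly more often than a coin toss would.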

Two things deserve emphasis. First, the winning model is a random forest, not a deep neural network — with a few hundred ears and structured tabular features, classical machine learning often outperforms heavier approaches, and it has the bonus of feature-importance output that tells you which variables drove the prediction. Second, the lesson generalises across middle-ear surgery: in stapedotomy, a comparison of Lasso, Ridge, k-nearest neighbour and random-forest models predicted post-operative hearing from pre-operative thresholds with mean errors of about 6 dB, best in the 1000–3000 Hz speech range [2024]. The same philosophy has been pushed into the more complex setting of intact canal wall mastoidectomy with tympanoplasty, where an AI model trained on 484 patients forecast hearing prognosis and weighed the dominant prognostic factors [2023]. Across all of these, baseline hearing dominates — a comforting echo of classical prognostic teaching.
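
One way to see what "feature importance" means is permutation importance: shuffle one variable across patients and measure how much the prediction error rises. This is a different, model-agnostic technique from the importance scores a random forest reports internally, and it is shown here only because it fits in a few lines of plain Python. The stand-in model, its coefficients, the feature names and the six ears are all hypothetical.

```python
import random


def mean_abs_error(model, ears, observed_db):
    """Mean absolute error (dB) of model(ear) against observed values."""
    return sum(abs(model(e) - y) for e, y in zip(ears, observed_db)) / len(ears)


def permutation_importance(model, ears, observed_db, n_repeats=20, seed=0):
    """Average rise in error when one feature's values are shuffled across ears."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, ears, observed_db)
    importance = {}
    for name in ears[0]:
        rises = []
        for _ in range(n_repeats):
            column = [e[name] for e in ears]
            rng.shuffle(column)
            permuted = [dict(e, **{name: v}) for e, v in zip(ears, column)]
            rises.append(mean_abs_error(model, permuted, observed_db) - baseline)
        importance[name] = sum(rises) / n_repeats
    return importance


def toy_model(ear):
    # Hypothetical stand-in for a trained predictor; coefficients are invented.
    return 5.0 + 0.5 * ear["preop_abg_db"] + 4.0 * ear["stapes_absent"]


ears = [
    {"preop_abg_db": 15.0, "stapes_absent": 0, "record_digit": 3},
    {"preop_abg_db": 20.0, "stapes_absent": 1, "record_digit": 7},
    {"preop_abg_db": 25.0, "stapes_absent": 0, "record_digit": 1},
    {"preop_abg_db": 30.0, "stapes_absent": 1, "record_digit": 9},
    {"preop_abg_db": 35.0, "stapes_absent": 0, "record_digit": 4},
    {"preop_abg_db": 45.0, "stapes_absent": 1, "record_digit": 6},
]
observed = [toy_model(e) for e in ears]  # toy outcomes the model fits exactly

importance = permutation_importance(toy_model, ears, observed)
for name, rise in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: error rises by {rise:.2f} dB when shuffled")
```

A variable the model ignores (here the arbitrary `record_digit`) scores exactly zero, while variables the model leans on score higher. That is the kind of output that lets a clinician check whether a prediction was driven by baseline hearing or by something implausible.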

From prediction to a recommended plan

Prediction becomes planning when the model is run not once but across candidate reconstructions. If the system can estimate the residual air-bone gap for a partial ossicular replacement prosthesis (PORP) and again for a total ossicular replacement prosthesis (TORP), or for incus interposition versus a titanium strut, it can rank the options by their predicted result for this particular ear, giving a quantitative complement to the surgeon’s pattern recognition. Combined with the segmented anatomy, a planning system can also reason about geometry: the malleus-to-stapes or footplate distance that sets prosthesis length, the angle the shaft must take to sit vertical over the footplate, and whether the malleus is present to be incorporated for a more physiological lever.
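
The idea of running one predictor across several candidate reconstructions can be sketched as follows. This is an illustration under stated assumptions: `rank_reconstructions` and `toy_predict_abg` are hypothetical names, and the predicted gaps in the lookup table are invented placeholders standing in for a trained model's output.

```python
from typing import Callable, Dict, List, Tuple


def rank_reconstructions(
    ear: Dict[str, float],
    candidates: List[str],
    predict_abg: Callable[[Dict[str, float], str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate for this ear and sort by predicted residual gap."""
    scored = [(option, predict_abg(ear, option)) for option in candidates]
    return sorted(scored, key=lambda pair: pair[1])  # smallest gap first


# Invented values (dB) standing in for a real model's predictions.
TOY_PREDICTIONS_DB = {"PORP": 18.0, "TORP": 24.0, "incus interposition": 21.0}


def toy_predict_abg(ear: Dict[str, float], option: str) -> float:
    # A real predictor would use the ear's features; this stub ignores them.
    return TOY_PREDICTIONS_DB[option]


ear = {"preop_abg_db": 32.0}
ranking = rank_reconstructions(
    ear, ["TORP", "PORP", "incus interposition"], toy_predict_abg
)
for option, gap in ranking:
    print(f"{option}: predicted residual air-bone gap {gap:.0f} dB")
```

The ranking is only as good as the predictor behind it, and some options may be anatomically unavailable in a given ear, so the list is a starting point for the surgeon's judgement rather than an ordering to follow.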

Crucially, the most credible systems are built to be explainable. A black box that simply announces “use a TORP” is of little use at the consent discussion or in the medicolegal record. The explainable model of Chen and colleagues is instructive precisely because its heat maps let the clinician audit the reasoning, and the feature-importance of a random forest serves the same purpose for outcome prediction [2024, 2023]. The right mental model is a second opinion that shows its working: it surfaces the anatomy, flags the pathology, quantifies the likely result, and lays out the trade-offs — and then hands the decision back to the person holding the instruments.

Decision support, not decision replacement: what AI can and cannot do

Capability: segment the ossicular chain. CNNs auto-segment and 3D-reconstruct the ossicles, stapes, cochlea and facial nerve at human-level accuracy (Dice 0.98–0.99) in ~18 s versus ~18 min by hand (Neves 2021; Xie 2025).

Synthesis of current evidence (Neves 2021; Xie 2025; Chen 2024; Koyama 2023; Jung 2021; Rebol 2024; Petsiou 2023; Dornhoffer 2001). Capabilities are real but largely single-centre and not externally validated; the surgeon remains responsible for the plan.

What the algorithm cannot do

Honest appraisal matters more here than anywhere, because the field is young and the temptation to over-trust a confident number is real. The dominant limitation, repeated across every serious review, is generalisability. Most published models are single-centre, trained on a few hundred cases from one population with one set of scanners and one surgeon’s technique, and have not been externally validated on independent data [2023]. A model that predicts well in Tokyo or Maribor may not predict well in your theatre, and a point estimate of the air-bone gap carries several decibels of error around it — an expectation, never a guarantee [2023, 2024].

There are deeper limits too. The single biggest driver of the actual result, the health of the mucosa and the aeration of the middle ear, is partly assessed only at operation, and no pre-operative scan or model fully captures it [2001]. Algorithms can encode historical bias, struggle with the rare case that is exactly the one you most need help with, and create a black-box accountability gap when a confident prediction proves wrong. And imaging-derived plans inherit imaging’s blind spots: a CT cannot see a subtly fibrosed joint or a marginally mobile footplate the way a probe can. None of this negates the technology; it sets its proper scope.

Using AI responsibly in 2026

How should a clinician hold these tools today? As powerful adjuncts, not autopilots. The defensible positions follow directly from the evidence:

  • Lean on AI for what it does best. Automatic segmentation and 3D reconstruction of the ossicular chain are fast and human-level accurate, and are the most ready-for-clinic application; use them to save time and standardise the anatomy you plan around [2021, 2025].
  • Treat predictions as calibrated expectations. A model that beats MERI and OOPS still errs by several decibels; quote a range and a probability, not a single promised number, and let it inform counselling rather than dictate it [2023, 2021].
  • Demand explainability. Prefer systems that show heat maps or feature importance over opaque outputs; you must be able to audit and override the reasoning at consent and in the notes [2024].
  • Insist on external validation before trusting a model on your patients. A single-centre AUC is a starting point, not a licence; generalisation to your population must be shown [2023].
  • Keep the surgeon accountable. The mucosa, aeration, stapes and malleus, assessed clinically and intraoperatively, still govern the result, and the responsibility for the plan and the consent remains yours [2001].
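
The advice above to quote a range rather than a single number can be sketched with a simple empirical approach: take the model's errors on held-out validation ears and widen the point prediction by the error size that covered a chosen share of them. This is a simplification and an assumption of this sketch, not the method of any cited study; the ten validation errors are invented, and the range is only meaningful if new patients resemble the validation cohort.

```python
import math


def prediction_range(point_db, validation_errors_db, coverage=0.8):
    """Symmetric range around a point prediction from held-out errors.

    validation_errors_db -- (predicted - observed) gaps, in dB, on validation ears
    coverage             -- share of validation ears the range should have covered
    """
    if not validation_errors_db:
        raise ValueError("need at least one validation error")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    abs_errors = sorted(abs(e) for e in validation_errors_db)
    # Smallest error bound that covers at least `coverage` of validation ears.
    k = math.ceil(coverage * len(abs_errors))
    half_width = abs_errors[k - 1]
    low = max(0.0, point_db - half_width)  # an air-bone gap is not quoted below 0
    return (low, point_db + half_width)


# Hypothetical validation errors (dB) and a point prediction of 18 dB.
errors = [-2.0, 3.0, -5.0, 6.0, 1.0, -8.0, 4.0, -3.0, 7.0, 10.0]
low, high = prediction_range(18.0, errors, coverage=0.8)
print(f"Predicted residual gap: 18 dB (about {low:.0f} to {high:.0f} dB)")
```

With these toy errors, eight of ten validation ears fell within 7 dB of the prediction, so an 18 dB forecast is quoted as roughly 11 to 25 dB. Without external validation, even that range may be too narrow for a different population.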

Held this way, AI is a logical next chapter in a field that has spent its whole history trying to predict and improve a stubbornly individual outcome. The algorithms will not replace the surgeon’s eye, hands or judgement — but a system that segments the chain in seconds, flags the pathology with its working shown, and offers a data-grounded forecast for the specific ear on the table is a genuine advance in planning the right operation for the right patient.

Case 9.11
A 47-year-old woman has a 32 dB conductive hearing loss in a dry, previously operated left ear. Her surgeon runs the pre-operative high-resolution CT through a deep-learning pipeline that auto-segments the ossicular chain in seconds, flags incus long-process erosion with a mobile stapes superstructure, and outputs a model-predicted residual air-bone gap of 18 dB if a partial ossicular replacement prosthesis (PORP) is used. The model was trained and validated on a single tertiary centre's tympanoplasty database and has not been externally validated. The patient asks whether she should rely on the AI's prediction.

How should the surgeon best use and present this AI output?

Self-assessment: Artificial Intelligence in Ossiculoplasty Planning · 4 questions
Question 1 · Foundation

What is the principal advantage demonstrated for deep-learning (CNN) segmentation of the ossicular chain on temporal-bone CT compared with manual reconstruction?

Question 2 · Foundation

Why are conventional pre-operative scoring indices such as MERI and OOPS considered limited for individual outcome prediction in ossiculoplasty?

Question 3 · Trainee

In machine-learning models predicting hearing outcome after middle-ear surgery, which pre-operative variable has repeatedly emerged as among the most decisive predictors?

Question 4 · Clinician

What is the most important limitation to keep in mind before relying on an AI outcome-prediction model to plan an individual ossiculoplasty?
