How Data Prioritization and Model Diagnostics Help Grow AI for Digital Accessibility

The Challenge

Labeling a single webpage screen for accessibility compliance takes nearly ten minutes. With thousands of screens across their web and mobile datasets, the team had built a valuable collection of training data, but manual annotation quickly became a major bottleneck. As they looked to scale, it was clear that continuing this approach would be too slow and too costly.

The Approach

To accelerate progress, the team turned to Databrewery’s Catalog and Model Diagnostics. These tools allowed them to zero in on weak points in their models, detect noisy or unreliable labels, and focus their efforts on the data that mattered most. Instead of treating all data equally, they prioritized high-impact examples, making better use of their time and resources.

The Outcome

With smarter filtering and clearer model signals, they eliminated a third of low-confidence data points and improved overall model performance by over 5%. Even more importantly, they cut their annual labeling volume and cost by more than half, unlocking faster iteration cycles and freeing their team to push forward on new AI-driven accessibility features.

Datasets

This team has spent decades at the forefront of digital accessibility. From the early days of the internet, their mission has been to ensure equal access to web and mobile experiences for everyone, including people with disabilities. Today, they’re using machine learning to drive the next generation of accessibility testing, automating what used to be fully manual processes and scaling their impact like never before.

Annotating just one web page screen for accessibility compliance can take around ten minutes. Multiply that by thousands of screens across web and mobile platforms, and the workload quickly becomes overwhelming. As their dataset grew, the team realized they needed a better solution. They turned to Databrewery for both the tooling and support to help scale their efforts. Prior to this, they were relying on open-source annotation tools stitched together with Jupyter notebooks and spreadsheets, a setup that made collaboration difficult and consistency nearly impossible.
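To make the bottleneck concrete, here is a rough back-of-envelope calculation. The ten-minute figure comes from the team; the screen count is a hypothetical stand-in for “thousands of screens,” so treat the output as illustrative rather than their actual workload:

```python
# Back-of-envelope estimate of the manual annotation workload.
# MINUTES_PER_SCREEN is from the case study; NUM_SCREENS is an
# assumed, illustrative dataset size.
MINUTES_PER_SCREEN = 10
NUM_SCREENS = 10_000

total_hours = MINUTES_PER_SCREEN * NUM_SCREENS / 60
annotator_years = total_hours / (40 * 52)  # one annotator at 40 h/week

print(f"{total_hours:,.0f} hours, about {annotator_years:.1f} annotator-years")
# ~1,667 hours, about 0.8 annotator-years for a single labeling pass
```

Any review or re-labeling passes multiply that figure, which is why cutting annual labeling volume in half mattered so much.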

“Before we had access to diagnostics within Databrewery, everything was a manual lift,” said Javier Moretti, Machine Learning Engineer. “We were calculating metrics on our own and trying to visualize predictions through our own tooling. The moment we moved those workflows into Databrewery, everything started moving faster. Iteration became natural, not something we dreaded.”

Model Diagnostics

By using Model Diagnostics, Javier’s team was able to evaluate how their models were performing and quickly identify weak spots. When they reviewed their existing dataset, they found noisy samples that were dragging performance down. Roughly one-third of those data points were filtered out, leading to a measurable 5% improvement in performance. After re-labeling high-impact examples, results improved even further. Many of these edge cases were difficult for both labelers and models, so the ability to focus on and refine known problem areas was a breakthrough.
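While the case study doesn’t describe the exact mechanics, the filtering it reports maps onto a common noisy-label heuristic: flag samples where a trained model confidently disagrees with the human-provided label. The sketch below is a minimal NumPy illustration of that idea, not Databrewery’s Model Diagnostics API; the function name and the 0.9 confidence threshold are assumptions for illustration.

```python
import numpy as np

def flag_noisy_labels(pred_probs: np.ndarray,
                      labels: np.ndarray,
                      confidence_threshold: float = 0.9) -> np.ndarray:
    """Return indices of suspect samples: the model confidently
    predicts a class that disagrees with the annotated label.

    pred_probs: (n_samples, n_classes) predicted class probabilities
    labels:     (n_samples,) integer class labels from annotation
    """
    predictions = pred_probs.argmax(axis=1)
    confidence = pred_probs.max(axis=1)
    # Suspect = confident model AND disagreement with the human label.
    suspect = (predictions != labels) & (confidence >= confidence_threshold)
    return np.flatnonzero(suspect)
```

Flagged samples can then be dropped from training or routed to a review queue for re-labeling, which matches the filter-then-relabel sequence the team describes.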

By refining their dataset instead of expanding it randomly, the team significantly reduced the amount of labeling required without sacrificing model quality. “We’ve seen equal or better performance using half the data,” Moretti added. “That only became possible because we were finally able to pinpoint model weaknesses and match them to the right data. Otherwise, we’d still be labeling twice as much and making half the progress.”
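Matching model weaknesses to the right data is, in spirit, an active-learning workflow: spend the labeling budget on the examples the model is least sure about. Below is a minimal sketch of one standard criterion, predictive entropy; the function name and budget parameter are hypothetical, not part of the team’s tooling.

```python
import numpy as np

def rank_for_labeling(pred_probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` most uncertain unlabeled samples,
    ranked by predictive entropy (highest entropy first)."""
    eps = 1e-12  # guard against log(0)
    entropy = -(pred_probs * np.log(pred_probs + eps)).sum(axis=1)
    return np.argsort(entropy)[::-1][:budget]
```

Labeling only the top-ranked slice is what makes “equal or better performance using half the data” plausible: the examples that never get labeled are the ones the model already handles confidently.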

With Databrewery powering their workflows and a sharper focus on high-value data, the team is pushing accessibility forward through smarter, faster AI, proving that inclusive technology isn’t just possible; it’s scalable.