Combining deep historical records, family trees, and DNA samples, the team works on extracting structured genealogical insights from highly unstructured, complex data. They’ve invested heavily in neural networks and transformer-based models, but as projects grew in size and complexity, it became clear that model performance was limited by one thing: the quality of their training data and the speed at which it could be produced. To move faster, they shifted toward a more data-centric approach and began rethinking how labeling fit into their MLOps pipeline.
Previously, data scientists owned much of the labeling work from end to end. Even when domain experts were brought in to help, the process was clunky. Experts had deep familiarity with historical documents, but not always with how machine learning models consumed labeled data. This created friction. The team needed a way to connect both worlds, expert judgment and machine learning, in a workflow that encouraged iteration, not slowdowns.
That’s where Databrewery came in. Using its Annotate platform, the team integrated model-assisted labeling, dynamic annotation relationships, and a flexible image and text editor that allowed everyone, engineers and historians alike, to work in sync. With Databrewery, they sourced high-quality annotators who could keep up with both the scale and complexity of the work without requiring constant supervision.
“Before switching platforms, we could build models quickly, but the data pipeline was a bottleneck,” said Priya Mehta, Senior Applied Scientist.
“Once we brought in a platform that let us collaborate with labelers in real time, everything changed. We now move at the pace our models demand.”
One major shift came from the ability to leave real-time feedback: dropping comments directly on specific image regions, asking questions, and clarifying edge cases without delay. This back-and-forth unlocked faster decision-making and reduced errors that often stemmed from ambiguous labeling specs. “Other platforms felt like a black box; we’d wait for all the labels to come back before we could even start giving feedback,” Mehta added.
To keep momentum, the team also leaned into Databrewery’s analytics and QA tools, which made it easier to evaluate label performance and fine-tune processes as they scaled. They set up strong review systems to ensure accuracy, even on previously unlabeled data, and focused on high-value tasks like handwriting recognition and data extraction from historical PDFs. As a result, their training cycles became faster, their models more accurate, and their entire workflow more collaborative and efficient.