Training AI to automate complex legal workflows through custom datasets
A fast-moving generative AI legal startup set out to build an AI agent that could handle tasks like contract review, risk analysis, compliance checks, and legal document processing, work that typically demands hours from legal professionals.
To get there, the startup needed highly specialized training data. Foundation models lacked the domain-specific reasoning and context required for legal use cases. The company had to develop original datasets built around the nuances of the legal industry, then apply targeted fine-tuning and post-training to teach the model to perform like a legal expert.
Building high-quality legal datasets with expert-driven document analysis
To support the AI agent’s development, Databrewery quickly assembled a team of legal experts through the Brewforce network. These professionals were onboarded to handle complex legal data tasks. At the same time, Databrewery collaborated with the startup to define clear labeling instructions and build a custom ontology that captured every essential data point.
The startup shared a set of prompts linked to multi-page legal documents. The experts were tasked with extracting key insights, writing well-reasoned responses backed by evidence from the documents, and reviewing the model’s output for accuracy, safety, and legal soundness.
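As a rough illustration of the workflow described above, one labeled record could be modeled as follows. This is a minimal sketch and not Databrewery's or the startup's actual schema: the class names, field names, and sample values (`LabeledExample`, `EvidenceSpan`, `ExpertReview`, `doc-001`) are all hypothetical, chosen only to mirror the three expert tasks of extracting insights, writing evidence-backed responses, and reviewing model output for accuracy, safety, and legal soundness.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceSpan:
    """A passage from the source document that supports the response (hypothetical)."""
    page: int   # 1-based page number in the multi-page document
    quote: str  # verbatim text copied from that page


@dataclass
class ExpertReview:
    """An expert's verdict on a model output, one flag per review criterion."""
    accurate: bool
    safe: bool
    legally_sound: bool
    notes: str = ""

    def passed(self) -> bool:
        # The output passes only if it meets all three criteria.
        return self.accurate and self.safe and self.legally_sound


@dataclass
class LabeledExample:
    """One prompt linked to a document, with the expert's response and review."""
    prompt: str
    document_id: str
    expert_response: str
    evidence: List[EvidenceSpan] = field(default_factory=list)
    model_review: Optional[ExpertReview] = None

    def is_grounded(self) -> bool:
        # A response counts as grounded if it cites at least one non-empty quote.
        return bool(self.evidence) and all(e.quote.strip() for e in self.evidence)


# Illustrative record with made-up placeholder values.
example = LabeledExample(
    prompt="Does the agreement limit liability?",
    document_id="doc-001",
    expert_response="Yes, see the limitation clause cited below.",
    evidence=[EvidenceSpan(page=7, quote="placeholder clause text")],
    model_review=ExpertReview(accurate=True, safe=True, legally_sound=False,
                              notes="Model overlooked an exception."),
)
```

Keeping the evidence and the review as separate structures means a record can be checked for grounding independently of whether the model's output passed expert review.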
A second phase of the project focused on insurance and medical billing documents, ensuring the model learned to correctly interpret and extract highly specific industry details.
Driving legal AI adoption with expert-labeled data and custom workflows
After testing other providers, the company chose Databrewery for its ability to deliver accurate, high-quality human-labeled legal data quickly, and for its powerful platform, which made it easy to build and manage tailored project ontologies.
With support from Databrewery’s data factory, the startup improved model accuracy and accelerated development, staying ahead in a rapidly evolving legal AI market.