
"We have loads of data." This is one of the most common things we hear from organisations starting their AI journey — and one of the most misleading. Having data and having AI-ready data are very different things. In our experience, data preparation accounts for 40–60% of total AI project cost, and inadequate data is the single most common reason AI projects fail to deliver.

This guide gives you a framework for assessing your data readiness before you commit to an AI project — so you can surface problems early when they're cheap to fix, rather than mid-project when they're expensive.

The Five Dimensions of Data Readiness

1. Availability

Does the data you need actually exist, and can you access it? Common availability problems include data trapped in legacy systems with no API, data that exists only on paper and has never been digitised, data owned by third parties who won't share it, and data that exists in principle but has never been systematically collected.

2. Quality

Is the data accurate, consistent, and complete? Quality issues include missing values, duplicate records, inconsistent formatting, outdated information, and systematic biases in how the data was collected. Even a small error rate adds up at scale: a 5% error rate in a training dataset of 1 million records means 50,000 incorrect examples teaching your model the wrong patterns.

Rule of thumb: Run a data quality audit on a random sample of 1,000 records from your target dataset before any AI project begins. The findings will almost always change your project timeline and budget estimates.
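As a rough illustration of what such an audit might measure, here is a minimal Python sketch. It assumes your records can be loaded as a list of dictionaries; the function name audit_sample, the key_fields parameter, and the choice to treat None and empty strings as missing are our assumptions for this example, not a standard.

```python
import random
from collections import Counter


def audit_sample(records, key_fields, sample_size=1000, seed=0):
    """Audit a random sample for missing values and duplicate keys.

    Hypothetical helper: `records` is a list of dicts, `key_fields` names
    the fields that should uniquely identify a record.
    """
    if not records:
        raise ValueError("no records to audit")

    # Fixed seed so the audit is repeatable.
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))

    # Share of sampled records where each field is None or an empty string.
    fields = sorted({field for record in sample for field in record})
    missing_rate = {
        field: sum(1 for r in sample if r.get(field) in (None, "")) / len(sample)
        for field in fields
    }

    # Every record beyond the first with the same key counts as a duplicate.
    key_counts = Counter(tuple(r.get(f) for f in key_fields) for r in sample)
    duplicates = sum(n - 1 for n in key_counts.values() if n > 1)

    return {
        "sample_size": len(sample),
        "missing_rate": missing_rate,
        "duplicate_rate": duplicates / len(sample),
    }
```

One caveat on the design: a small sample will rarely contain both copies of a duplicated record, so the duplicate rate from a sample is likely to understate the real figure. Where possible, run the duplicate-key check across the full dataset and use the sample for the checks that need human review.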

3. Volume

Do you have enough labelled examples for the type of model you're planning to build? Volume requirements vary enormously by use case and model type. Fine-tuning a large language model may require only hundreds of high-quality examples. Training a computer vision model from scratch may require millions. Understanding volume requirements before you start prevents the painful discovery mid-project that you don't have enough data to build what you planned.
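A quick way to surface a shortfall early is to count labelled examples per class and compare them with a target. The sketch below is illustrative only: the function name volume_report is hypothetical, and min_per_class is a threshold you would have to choose for your own use case and model type, since no single number applies across projects.

```python
from collections import Counter


def volume_report(labels, min_per_class):
    """Summarise labelled volume per class against an assumed minimum.

    `labels` holds one entry per record; None means the record is unlabelled.
    `min_per_class` is a project-specific assumption, not a universal rule.
    """
    labelled = [label for label in labels if label is not None]
    per_class = Counter(labelled)

    # How many more examples each under-represented class would need.
    shortfalls = {
        label: min_per_class - count
        for label, count in per_class.items()
        if count < min_per_class
    }

    return {
        "labelled": len(labelled),
        "unlabelled": len(labels) - len(labelled),
        "per_class": dict(per_class),
        "shortfalls": shortfalls,
    }
```

Note that a class with no examples at all will not appear in the shortfalls, because the function can only count labels it has seen. Compare the per-class counts against the list of classes you expect to exist.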

4. Governance

Do you have the legal right to use this data for AI training? GDPR and UK data protection law impose significant constraints on using personal data for purposes beyond those originally specified. Data used to train AI models may require specific legal bases, data subject consent, or anonymisation — and not all anonymisation techniques are sufficient to prevent re-identification.
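To make the re-identification point concrete, one widely used heuristic is k-anonymity: for a chosen set of quasi-identifiers (fields such as postcode or birth year that are not names but can single people out in combination), find the size of the smallest group of records sharing the same values. The sketch below is a simplified illustration, not a legal test. The function name and the example field names are our assumptions, and which fields count as quasi-identifiers is a judgement for your data protection team.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that share
    the same values for every quasi-identifier (0 if there are no records).

    Hypothetical helper for illustration; `records` is a list of dicts.
    """
    groups = Counter(
        tuple(record.get(field) for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Example with made-up records: postcode alone gives k = 2,
# but postcode combined with birth year makes every record unique (k = 1).
example = [
    {"postcode": "SW1A", "birth_year": 1980},
    {"postcode": "SW1A", "birth_year": 1992},
    {"postcode": "M1", "birth_year": 1975},
    {"postcode": "M1", "birth_year": 1990},
]
k_postcode = smallest_group_size(example, ["postcode"])
k_combined = smallest_group_size(example, ["postcode", "birth_year"])
```

A result of k = 1 means at least one record is unique on those fields, which is a warning sign. A high k does not by itself show that the data is anonymous in the legal sense, so treat this as a screening check rather than a sign-off.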

5. Infrastructure

Can you get the data to where the AI system needs it, at the speed and volume required? Production AI systems often need real-time or near-real-time data feeds. If your data sits in siloed systems with batch exports and no API layer, significant infrastructure investment may be required before AI can be deployed.
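One simple check at this stage is to compare how fresh each data source actually is with how fresh the AI system needs it to be. The following sketch assumes you can obtain a last-updated timestamp per source. The function name stale_sources, the source names in the usage example, and the one-hour limit are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_updated, max_age, now=None):
    """Return {source: age} for sources older than `max_age`.

    `last_updated` maps a source name to a timezone-aware datetime.
    `max_age` is the freshness the downstream system is assumed to need.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        source: now - updated_at
        for source, updated_at in last_updated.items()
        if now - updated_at > max_age
    }


# Usage with made-up sources: a nightly batch export fails a one-hour target.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
feeds = {
    "crm_api": now - timedelta(minutes=5),
    "billing_export": now - timedelta(hours=26),
}
too_old = stale_sources(feeds, max_age=timedelta(hours=1), now=now)
```

All timestamps here are timezone-aware; mixing them with naive datetimes raises a TypeError in Python, which is worth normalising when timestamps come from different systems.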


A data readiness assessment at the start of every AI project is not optional — it is the single most cost-effective investment you can make. The organisations that consistently deliver AI in production are those that treat data infrastructure as a first-class engineering discipline, not an afterthought.
