Category: Data & AI • Research Note

Can LLMs Read UK Company Accounts?

This research note tests whether frontier and open-weight large language models can extract reliable information from UK company accounts. The practical conclusion is blunt: the models are already good enough to read the documents; the scarce asset is the clean, verified, provenance-rich data layer.

Tags: UK company accounts • Companies House • iXBRL • LLM benchmark • Data defensibility

1. Core thesis

The benchmark asks a commercially important question: can modern LLMs read UK company accounts well enough to extract structured financial facts? Based on the project description, the benchmark uses a 1,000-question verified Q&A set and compares proprietary and open-weight models.

The finding is strategically useful because it shifts the moat away from generic model selection and toward data ownership, verification, provenance, and refresh infrastructure.

2. Benchmark framing

Component	Description
Question set	1,000 verified questions and answers based on UK company-account filings.
Model coverage	Proprietary and open-weight systems, including Claude, GPT-5.5, Gemini, DeepSeek V4 Pro, and GLM 5.2.
Reported range	Approximately 96% to 99.6% performance across the tested systems, according to the project summary.
Commercial implication	The bottleneck is not whether LLMs can read the accounts. It is whether the accounts have been normalised, verified, and packaged into a reusable data product.
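The project summary reports scores but does not say how answers are marked. The sketch below shows one plausible way to score a verified Q&A set. It assumes exact matching after normalising pound signs, thousands separators and bracketed negatives, with an optional relative tolerance for numbers. `QAItem`, `is_correct` and `score` are hypothetical names, not the benchmark's actual harness.

```python
import re
from dataclasses import dataclass

# Hypothetical record type: one verified question with its reference answer.
@dataclass
class QAItem:
    question: str
    expected: str

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def _normalise(answer: str) -> str:
    """Lower-case, drop pound signs and thousands separators, collapse whitespace."""
    text = re.sub(r"[£,]", "", answer.lower())
    return " ".join(text.split())

def _as_number(text: str):
    """Return a float for numeric answers, reading '(1200)' as -1200; otherwise None."""
    if text.startswith("(") and text.endswith(")"):
        text = "-" + text[1:-1]
    return float(text) if _NUMBER.fullmatch(text) else None

def is_correct(predicted: str, expected: str, rel_tol: float = 0.0) -> bool:
    """Numbers must agree within rel_tol; other answers must match after normalising."""
    p, e = _normalise(predicted), _normalise(expected)
    pn, en = _as_number(p), _as_number(e)
    if pn is not None and en is not None:
        return abs(pn - en) <= rel_tol * abs(en)
    return p == e

def score(items: list[QAItem], predictions: list[str], rel_tol: float = 0.0) -> float:
    """Fraction of questions answered correctly (assumed exact-match scoring)."""
    if not items or len(items) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    hits = sum(is_correct(p, item.expected, rel_tol) for item, p in zip(items, predictions))
    return hits / len(items)

items = [
    QAItem("Turnover for the year?", "1,234,567"),
    QAItem("Net assets at year end?", "(12,000)"),
    QAItem("Registered company name?", "Acme Ltd"),
]
print(round(score(items, ["£1,234,567", "-12000", "Acme Limited"]), 3))  # 0.667
```

Strict matching penalises "Acme Limited" against "Acme Ltd". Whether a real benchmark accepts such variants is a scoring-policy choice that can move headline percentages.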

3. Why it matters

If multiple frontier and open-weight models can already perform the reading task, the durable advantage moves to the inputs. The model is no longer the hard part. The hard part is obtaining clean public filings, parsing them consistently, mapping fields across taxonomies, preserving provenance, and making the dataset safe to use in downstream analytics and model workflows.

Plain English takeaway: LLMs make messy filings usable, but they do not magically create a verified financial database. The winner owns the cleaned, normalised, provenance-tracked layer.
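The mapping and provenance steps above can be illustrated with a small sketch. It assumes that facts have already been extracted and that each taxonomy concept maps to a single canonical field. The concept names in `CONCEPT_MAP` (`taxA:`, `taxB:`) are hypothetical placeholders, not real UK taxonomy entries. `Fact`, `CanonicalFact`, `normalise` and `find_conflicts` are also illustrative names. Unmapped concepts and conflicting values are surfaced rather than silently dropped, because the verification layer needs to see both.

```python
from dataclasses import dataclass

# Hypothetical concept names: real ones must come from the published UK taxonomies.
CONCEPT_MAP = {
    "taxA:Turnover": "revenue",
    "taxB:TurnoverRevenue": "revenue",
    "taxA:NetAssets": "net_assets",
    "taxB:NetAssetsLiabilities": "net_assets",
}

@dataclass(frozen=True)
class Fact:
    concept: str       # concept name as it appears in the filing
    value: float
    period_end: str    # ISO date, e.g. "2023-12-31"
    source_file: str   # provenance: the filing the value was read from

@dataclass(frozen=True)
class CanonicalFact:
    field_name: str    # canonical field shared across taxonomies
    value: float
    period_end: str
    source_file: str   # provenance carried through unchanged
    source_concept: str

def normalise(facts):
    """Map raw facts onto canonical fields; return (mapped, unmapped) so gaps stay visible."""
    mapped, unmapped = [], []
    for fact in facts:
        field_name = CONCEPT_MAP.get(fact.concept)
        if field_name is None:
            unmapped.append(fact)
        else:
            mapped.append(CanonicalFact(field_name, fact.value, fact.period_end,
                                        fact.source_file, fact.concept))
    return mapped, unmapped

def find_conflicts(mapped):
    """Flag pairs of facts that give different values for the same field and period."""
    first_seen = {}
    conflicts = []
    for fact in mapped:
        key = (fact.field_name, fact.period_end)
        earlier = first_seen.setdefault(key, fact)
        if earlier.value != fact.value:
            conflicts.append((earlier, fact))
    return conflicts

facts = [
    Fact("taxA:Turnover", 500_000.0, "2022-12-31", "filing_2022.html"),
    Fact("taxB:TurnoverRevenue", 650_000.0, "2023-12-31", "filing_2023.html"),
    Fact("taxB:Employees", 12.0, "2023-12-31", "filing_2023.html"),
]
mapped, unmapped = normalise(facts)
```

Each canonical value keeps the filing and original concept it came from. That is the property that lets a downstream user, or a model, trace a number back to its source.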

4. Link to the UK Company Financials dataset

This note supports the broader UK Company Financials data product: an ML-ready dataset of UK company filings parsed from Companies House iXBRL accounts and normalised across the relevant UK GAAP taxonomies. The benchmark matters here because it shows that the dataset can be consumed not only by analysts but also by AI systems.
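The note does not describe how the iXBRL accounts are parsed. As a minimal sketch, assuming standard Inline XBRL markup: numeric facts sit in `ix:nonFraction` elements. Their `name`, `contextRef`, `unitRef`, `scale` and `sign` attributes carry the concept, reporting context, unit, power-of-ten multiplier and negation. `NonFractionExtractor` is a hypothetical helper built on Python's standard `html.parser`. It ignores `ix:nonNumeric`, continuations, context resolution and most display-format transforms, and it assumes dot-decimal figures with comma thousands separators.

```python
from decimal import Decimal, InvalidOperation
from html.parser import HTMLParser

class NonFractionExtractor(HTMLParser):
    """Collect numeric ix:nonFraction facts from an Inline XBRL (XHTML) filing."""

    def __init__(self):
        super().__init__()
        self.facts = []      # extracted facts as dicts
        self.skipped = []    # displayed values this sketch could not parse
        self._attrs = None   # attributes of the currently open ix:nonFraction
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so match on lower case.
        if tag == "ix:nonfraction":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "ix:nonfraction" or self._attrs is None:
            return
        attrs, raw = self._attrs, "".join(self._text).strip()
        self._attrs = None
        if not raw:
            return  # e.g. nil facts with no displayed value
        try:
            # Assumes dot-decimal display with comma thousands separators.
            value = Decimal(raw.replace(",", ""))
        except InvalidOperation:
            self.skipped.append((attrs.get("name"), raw))
            return
        value = value.scaleb(int(attrs.get("scale", "0")))  # scale="3" means thousands
        if attrs.get("sign") == "-":
            value = -value
        self.facts.append({
            "concept": attrs.get("name"),
            "context": attrs.get("contextref"),
            "unit": attrs.get("unitref"),
            "value": value,
        })
```

Typical use is `parser = NonFractionExtractor()`, then `parser.feed(html_text)`, then reading `parser.facts`. `Decimal` avoids binary floating-point rounding on monetary amounts. The `skipped` list records displayed values that a fuller transform-aware parser would need to handle.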

Daniel Cheah
