The Graph Project

It's boom time for graph databases but...

there's a shortage of 'graph-ready' plug-and-play datasets

UK Companies & Company Officers

Companies House, the UK company registry, provides bulk datasets and daily updates as well as numerous API endpoints. Even at the summary level there is a lot of data: some 12M companies, 21M company officers, and 20,000 daily record updates.

We maintain this data in a Neo4j graph database and provide access via a browser.

Latest stats: 216,937,296 nodes, 710,316,821 relationships

Data Model

The model is based on POLE (People, Object, Location, Event), which is often used in investigations. Companies are connected to company officers, who are connected to other companies, and so on. In turn these are connected to addresses, to points in time (such as birth, incorporation or joining dates), and to other common features such as occupations or nationalities.
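
As a rough illustration of how such a model can be walked, here is a minimal in-memory sketch in Python. It is not the project's Neo4j schema: the node labels, the relationship names (`OFFICER_OF`, `REGISTERED_AT`, `BORN_IN`), the `Graph` class and all sample records are invented for this example.

```python
from collections import defaultdict


class Graph:
    """Tiny undirected property-less graph; nodes are (label, name) tuples."""

    def __init__(self):
        self.edges = defaultdict(set)

    def connect(self, a, rel, b):
        # Store the relationship in both directions so it can be walked either way.
        self.edges[a].add((rel, b))
        self.edges[b].add((rel, a))

    def neighbours(self, node, rel):
        # All nodes reachable from `node` over one relationship of type `rel`.
        return {other for r, other in self.edges[node] if r == rel}


def companies_sharing_officers(graph, company):
    """Companies one hop away via a shared officer (company -> officer -> company)."""
    linked = set()
    for officer in graph.neighbours(company, "OFFICER_OF"):
        linked |= graph.neighbours(officer, "OFFICER_OF")
    linked.discard(company)
    return linked


# Hypothetical sample data, not real Companies House records.
alpha = ("Company", "Alpha Ltd")
beta = ("Company", "Beta Ltd")
gamma = ("Company", "Gamma Ltd")
person_a = ("Person", "A. Example")
person_b = ("Person", "B. Sample")

g = Graph()
g.connect(person_a, "OFFICER_OF", alpha)
g.connect(person_a, "OFFICER_OF", beta)
g.connect(person_b, "OFFICER_OF", beta)
g.connect(person_b, "OFFICER_OF", gamma)
g.connect(alpha, "REGISTERED_AT", ("Location", "1 Example Street"))
g.connect(person_a, "BORN_IN", ("Event", "1970-01"))

print(companies_sharing_officers(g, beta))
```

Starting from Beta Ltd, the traversal reaches Alpha Ltd and Gamma Ltd through the two officers they share with it. Repeating the same step from each result is what "expanding the connections" amounts to.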

See the Documentation for more information.

Jacob Rees-Mogg (a UK MP) and the companies he is connected to

And if we then expand the connections...

Use Cases

This graph database can answer questions that other databases cannot:

  • Entity resolution. Companies House data is not verified; it is filed raw. This results in multiple records for many businesses and individuals, whether through error or intentionally to hide activity. Separate records can be connected by common relationships and graph analysis to establish a complete record.
  • Understanding business networks. Mapping out who is connected to whom, and how, is very useful for unearthing conflicts of interest, establishing how closely two parties are connected, and identifying clusters of businesses around certain locations or people. Good for sales and customer profiling, fraud detection and a host of other cases.
  • Scoring. Enhance the network structure by scoring with native graph algorithms (such as PageRank) to discover which people, businesses, locations or events have the most influence or control. Include performance data to generate quality metrics for due diligence or background checks.
  • Monitoring. The dataset is updated daily (around 20k record changes) and reaches back many years. All the cases above can be monitored to generate alerts.
  • Machine Learning. Native graph machine learning algorithms can take advantage of the hundreds of millions of nodes and relationships in our dataset for predictive analysis.
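
To make the entity resolution case more concrete, the sketch below links officer records that share a normalised name plus either a birth month or an address, then groups them into connected components with a union-find. This is only one simple approach under stated assumptions, not the method used in the project: the field names (`id`, `name`, `born`, `address`), the matching rule and the sample records are all hypothetical.

```python
from collections import defaultdict


def normalise(name):
    """Lower-case, strip punctuation and sort tokens so 'EXAMPLE, Jane' matches 'Jane Example'."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def resolve(records):
    """Cluster record ids that are linked by name + birth month or name + address."""
    parent = {rec["id"]: rec["id"] for rec in records}
    first_seen = {}  # matching key -> id of the first record that carried it
    for rec in records:
        name = normalise(rec["name"])
        for field in ("born", "address"):
            value = rec.get(field)
            if value is None:
                continue  # a missing value must never link two records
            key = (name, field, value)
            if key in first_seen:
                a = find(parent, rec["id"])
                b = find(parent, first_seen[key])
                parent[a] = b  # merge the two clusters
            else:
                first_seen[key] = rec["id"]
    clusters = defaultdict(list)
    for rec_id in parent:
        clusters[find(parent, rec_id)].append(rec_id)
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical filings, not real Companies House records.
records = [
    {"id": 1, "name": "Jane Example", "born": "1970-01", "address": "1 Example Street"},
    {"id": 2, "name": "EXAMPLE, Jane", "born": "1970-01", "address": "2 Sample Road"},
    {"id": 3, "name": "Jane  Example", "born": "1970-02", "address": "2 Sample Road"},
    {"id": 4, "name": "John Sample", "born": "1980-05", "address": "1 Example Street"},
]

print(resolve(records))  # [[1, 2, 3], [4]]
```

Record 3 has a different birth month from record 1 and shares no address with it, yet the two end up in the same cluster because both link to record 2. That transitive linking is the reason to treat the problem as a graph rather than as a pairwise comparison.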

Explore the Examples to see the detail.