jHound

jHound is our tool for in-depth analysis of JSON documents. It profiles documents with respect to metrics such as maximum nesting depth, amount of data per depth level, data type distribution, and property optionality, among others. It follows a multi-node map-reduce design, so documents can be analyzed in parallel on multiple machines.
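
To illustrate the map-reduce idea, here is a minimal single-process sketch in Python. It is not jHound's actual code: the function names (`type_name`, `map_document`, `reduce_metrics`) and the shape of the metrics dictionary are assumptions made for this example. The map step computes metrics for one document, and the reduce step merges two partial results, which is what lets the work be split across nodes.

```python
import json
from collections import Counter
from functools import reduce


def type_name(value):
    """Return the JSON type name of a parsed value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def map_document(doc):
    """Map step (hypothetical): compute metrics for one parsed JSON document."""
    types = Counter()
    values_per_depth = Counter()
    max_depth = 0
    stack = [(doc, 1)]  # the document root counts as depth 1
    while stack:
        value, depth = stack.pop()
        types[type_name(value)] += 1
        values_per_depth[depth] += 1
        max_depth = max(max_depth, depth)
        if isinstance(value, dict):
            stack.extend((child, depth + 1) for child in value.values())
        elif isinstance(value, list):
            stack.extend((child, depth + 1) for child in value)
    return {
        "max_depth": max_depth,
        "types": types,
        "values_per_depth": values_per_depth,
    }


def reduce_metrics(a, b):
    """Reduce step (hypothetical): merge two partial results."""
    return {
        "max_depth": max(a["max_depth"], b["max_depth"]),
        "types": a["types"] + b["types"],
        "values_per_depth": a["values_per_depth"] + b["values_per_depth"],
    }


raw_docs = ['{"a": 1, "b": {"c": [true, null]}}', '{"a": "x"}']
docs = [json.loads(text) for text in raw_docs]
summary = reduce(reduce_metrics, map(map_document, docs))
```

Because `reduce_metrics` is associative and commutative, partial results can be merged in any order, regardless of which node produced them.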

jHound at a Glance

The analysis backend of jHound is written in Python, and it comes with an easy-to-use web frontend as well as CLI support. The dashboard gives the user a brief overview of the data sources and of the health and availability of the analysis nodes.

Nodes can be started or shut down on demand. It is even possible to start additional nodes while an analysis process is already running; jHound then takes those nodes into account and adapts how it distributes the analysis. As of now, this elasticity is controlled manually, but we are planning to automate the choice of nodes.

Analyzing Real Public Data From Around the World

The jHound repository overview shows the repositories that have been processed by jHound. jHound is capable of analyzing data from any source that implements the CKAN API with an API version of at least 3.
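
As a rough illustration of working against the CKAN API, the sketch below picks JSON resource links out of a `package_show` response. In API version 3, the Action API is served under `/api/3/action/`, for example `package_list` and `package_show`. The helper name `json_resource_urls`, the filtering rule, and the sample data are assumptions made for this example; how jHound itself selects links is not described here.

```python
def json_resource_urls(package_show_response):
    """Return the URLs of JSON resources from a CKAN package_show response.

    Hypothetical helper: a resource counts as JSON if its "format" field
    says so or if its URL ends in ".json".
    """
    if not package_show_response.get("success"):
        return []
    resources = package_show_response.get("result", {}).get("resources", [])
    urls = []
    for resource in resources:
        fmt = (resource.get("format") or "").strip().lower()
        url = resource.get("url") or ""
        if url and (fmt == "json" or url.lower().endswith(".json")):
            urls.append(url)
    return urls


# Made-up sample shaped like a package_show response (no network access needed).
sample_response = {
    "success": True,
    "result": {
        "name": "example-dataset",
        "resources": [
            {"format": "CSV", "url": "https://example.org/data.csv"},
            {"format": "JSON", "url": "https://example.org/data"},
            {"format": "", "url": "https://example.org/other.JSON"},
        ],
    },
}
json_urls = json_resource_urls(sample_response)
```

Checking both the declared format and the file extension is a design choice for the sketch, since resource metadata in public repositories is not always filled in consistently.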

In a multi-step process, the user adds repositories, scrapes the links to all JSON files, downloads all data, analyzes it, and inspects the results afterwards. The download and analysis steps can be parallelized.


Beyond Analysis: Provenance Included

jHound sheds light on characteristics such as the distribution of data sizes, nesting depths, data type distribution, where the bulk of the data is located, property requirements, and more. Additionally, a provenance inspector traces every generated result back to its sources. A click on a diagram reveals which documents contributed to which result. If required, the user can inspect a document in a web browser. jHound stores all collections and documents under unique names so that the results can be reconstructed in the future.
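
The link between a result and its source documents can be sketched as follows. This is a simplified illustration, not jHound's implementation: the function names, the document names, and the restriction to top-level properties are all assumptions. Each property is mapped to the set of uniquely named documents that contain it, which yields both its optionality and its provenance.

```python
from collections import defaultdict


def property_provenance(named_docs):
    """Map each top-level property to the names of the documents containing it.

    named_docs: dict of unique document name -> parsed JSON object (hypothetical).
    """
    sources = defaultdict(set)
    for name, doc in named_docs.items():
        for key in doc:
            sources[key].add(name)
    return dict(sources)


def classify_properties(sources, total_docs):
    """Label a property "required" if every document has it, else "optional"."""
    return {
        key: "required" if len(doc_names) == total_docs else "optional"
        for key, doc_names in sources.items()
    }


# Made-up documents stored under unique names.
named_docs = {
    "repo1/dataset1/a.json": {"id": 1, "title": "x"},
    "repo1/dataset1/b.json": {"id": 2},
}
sources = property_provenance(named_docs)
labels = classify_properties(sources, len(named_docs))
```

Here `sources["title"]` names the single document responsible for `title` being classified as optional, which is the kind of information a provenance inspector would surface.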

Contact: Mark Lukas Möller, University of Rostock. ✉ mark.moeller2@uni-rostock.de