The 839 Data Sets

(Under heavy construction.)

The data sets numbered 1 to 26 were created by students in the CS839 data science class at UW-Madison in Spring 2019 as part of their class project.

Although the data was originally created for entity matching, it can also be used for experiments on other tasks, such as data cleaning and visualization. If you use the data in this repository, please cite it using the instructions mentioned here. This helps others obtain the same data sets and replicate your experiments.

Goals

To be filled in later.

The Data

The table below details the intermediate output and the accuracy for the data sets listed above. Precision and recall were estimated following the procedure described in Section 6 of the Corleone paper.

The Baseline Result

Estimating Precision and Recall

To be filled in later.
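The reported precision and recall follow Section 6 of the Corleone paper; the snippet below does not reproduce that procedure. It is a minimal, generic sketch, assuming each metric is estimated as a proportion over a random sample of manually labeled pairs, with an approximate confidence interval from the normal (Wald) approximation. The function name `estimate_proportion` is hypothetical, and the sample counts are made up for illustration, not taken from any of the data sets.

```python
import math
from typing import Sequence, Tuple


def estimate_proportion(outcomes: Sequence[bool], z: float = 1.96) -> Tuple[float, float]:
    """Estimate a proportion from a random sample of manually labeled pairs.

    Hypothetical helper, not part of this repository. Returns (estimate, margin),
    where estimate +/- margin is an approximate confidence interval based on the
    normal approximation to the binomial (z=1.96 gives roughly 95%).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("the sample must contain at least one labeled pair")
    p = sum(outcomes) / n  # True counts as 1, False as 0
    margin = z * math.sqrt(p * (1.0 - p) / n)
    return p, margin


# Precision (illustrative numbers): sample pairs the matcher predicted as
# matches, label each by hand, and record True when it is a true match.
precision_sample = [True] * 45 + [False] * 5
precision, precision_margin = estimate_proportion(precision_sample)

# Recall (illustrative numbers): among sampled pairs a human labeled as true
# matches, record True when the matcher also predicted a match.
recall_sample = [True] * 16 + [False] * 4
recall, recall_margin = estimate_proportion(recall_sample)

print(f"precision = {precision:.3f} +/- {precision_margin:.3f}")
print(f"recall    = {recall:.3f} +/- {recall_margin:.3f}")
```

The Wald interval is used here only because it is short; it is unreliable for small samples or proportions near 0 or 1, where a Wilson interval is usually preferred.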