The 839 Data Sets
(Under heavy construction.)
The data sets numbered 1 to 26 were created by students in the CS839 data science class at UW-Madison in Spring 2019 as part of their class project.
Although the data was originally created for entity matching, it can also be used for experiments on other tasks, such as data cleaning and visualization. If you use the data in this repository, please cite it using the instructions mentioned here. This helps others obtain the same data sets and replicate your experiments.
Goals
To be filled in later.
The Data
The table below details the intermediate output and the accuracy for the data sets mentioned above. Precision and recall were estimated following the procedure described in the Corleone paper (Section 6).
The Baseline Result
Estimating Precision and Recall
To be filled in later.
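As a generic illustration of the two metrics reported for these data sets, here is a minimal Python sketch that computes precision and recall from a manually labeled sample of candidate pairs, plus a normal-approximation (Wald) margin of error for a proportion estimated from that sample. It is not the Corleone estimation procedure. The input format (one `(predicted_match, true_match)` tuple per pair) and the function names `precision_recall` and `margin_of_error` are hypothetical choices for this example.

```python
import math
from typing import Iterable, Tuple

# Hypothetical input format: each labeled pair is (predicted_match, true_match).
LabeledPair = Tuple[bool, bool]


def precision_recall(pairs: Iterable[LabeledPair]) -> Tuple[float, float]:
    """Compute precision and recall over a labeled sample of candidate pairs."""
    tp = fp = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1  # predicted match that is a true match
        elif predicted and not actual:
            fp += 1  # predicted match that is not a true match
        elif actual and not predicted:
            fn += 1  # true match the matcher missed
    # Guard against empty denominators (no predicted or no true matches).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Wald margin of error for a proportion p estimated from n labeled items.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n == 0:
        return 0.0
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Toy sample: 8 true positives, 2 false positives, 2 false negatives, 8 true negatives.
    sample = ([(True, True)] * 8 + [(True, False)] * 2
              + [(False, True)] * 2 + [(False, False)] * 8)
    p, r = precision_recall(sample)
    # Precision is estimated from the 10 predicted matches (tp + fp) in the sample.
    print(f"precision={p:.2f} recall={r:.2f} margin={margin_of_error(p, 10):.3f}")
```

The Wald interval is used here only because it is the simplest to write down; for small samples or proportions near 0 or 1, an interval such as Wilson's is usually better behaved.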