Current Work

Pseudo.pdf

Expanding to Decision Trees V2

To the left is our current approach to expanding the ideas from our previous work into decision trees. This approach is based on a paper that takes a very similar approach, except using K-Nearest Neighbors. Essentially, we generate a candidate set for each dirty data point, containing M repairs sampled from the repair space. As we build the decision tree, we relax its usual semantics from perfect partitioning to candidate-set agreement: potential splits are judged first on how strongly the candidate sets of the dirty tuples agree on a side of the split, and afterwards on the quality of the partition they produce, as sketched below.
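The sketch below is a minimal illustration of what such a split criterion might look like, not our actual implementation. The function names, the use of Gini impurity for partition quality, and the weighted combination of the two scores (rather than a strict two-stage ordering) are all assumptions made for the example; candidate_sets is assumed to hold, for each dirty tuple, an array of its M sampled repairs.

```python
import numpy as np

def candidate_agreement(split_feature, threshold, candidate_sets):
    """Fraction of dirty tuples whose M sampled repairs all fall on the
    same side of the proposed split (full candidate-set agreement)."""
    agreements = []
    for candidates in candidate_sets:  # candidates: (M, n_features) array of repairs
        sides = candidates[:, split_feature] <= threshold
        agreements.append(sides.all() or (~sides).all())
    return float(np.mean(agreements)) if agreements else 1.0

def split_score(split_feature, threshold, X_clean, y_clean, candidate_sets,
                agreement_weight=0.5):
    """Score a candidate split by (1) agreement within the dirty tuples'
    candidate sets and (2) partitioning quality (Gini reduction) on the
    clean tuples. The weighted combination is an illustrative assumption."""
    agreement = candidate_agreement(split_feature, threshold, candidate_sets)

    def gini(y):
        if len(y) == 0:
            return 0.0
        _, counts = np.unique(y, return_counts=True)
        p = counts / len(y)
        return 1.0 - np.sum(p ** 2)

    mask = X_clean[:, split_feature] <= threshold
    left, right = y_clean[mask], y_clean[~mask]
    n = len(y_clean)
    child_gini = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    partition_quality = gini(y_clean) - child_gini  # impurity reduction

    return agreement_weight * agreement + (1 - agreement_weight) * partition_quality
```

In this sketch, a split that the sampled repairs of the dirty tuples consistently agree on scores higher even when its impurity reduction on the clean tuples is only moderate, which mirrors the shift from perfect partitioning to candidate-set agreement described above.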

Expanding to Decision Trees V1

Our current work involves taking the same premise from the last publication and applying it to a new, and more popular, learning algorithm: decision trees. This was the framework for our first attempt at a solution. Initially we targeted a different type of dirty data, violations of functional dependencies, by expanding the ideas found in the paper Sampling Repairs of Functional Dependencies, published at the VLDB conference in 2010. That approach proved far too computationally expensive and did not succeed at scale, so we shifted our focus to a simpler type of dirty data: null values.