Week 5

10.10.2016

Studied for Prof. Schneider's exam.

10.11.2016

Added a deduplication prototype that removes duplicate IDs from Schurter's POS data during segmentation.

Renamed all classes, variables and functions so that their names properly communicate what they do in the Java program.

Added a StringDistance factor to the deduplication method so that only the most accurate result of each set of duplicates gets saved.
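A minimal sketch of the idea, assuming each duplicate carries a company name and that "most accurate" means the smallest Levenshtein distance to a reference name (for example the name from the segmentation data). The class and method names (`DedupSketch`, `levenshtein`, `mostAccurate`) are hypothetical and not taken from the project's StringDistance implementation, which may use a different metric.

```java
import java.util.List;

public class DedupSketch {

    // Dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate closest to the reference name (case-insensitive); ties keep the first.
    static String mostAccurate(List<String> candidates, String reference) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(candidate.toLowerCase(), reference.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> duplicates = List.of("Schurtr", "Schurter AG", "Schurter Holding");
        System.out.println(mostAccurate(duplicates, "Schurter AG")); // Schurter AG
        System.out.println(levenshtein("kitten", "sitting"));       // 3
    }
}
```

Only the winning record per ID would be kept; the others are the ones whose rows get cleared.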

10.12.2016

Added detection of empty rows to SPOSDES. This is needed so that the removed duplicates can be deleted later.

The Java library that we use to manipulate Excel files is unable to "delete" a row.

All it is able to do is remove the data of all cells in a row. Duplicate found -> remove the data of all cells -> row is empty -> remove the empty row.
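The clear-then-remove workaround can be sketched without the Excel library by modelling a sheet as a list of rows, each row an array of cell values. This is an illustrative assumption, not the Apache POI API: the hypothetical `clearRow` stands in for wiping every cell of a duplicate, `isEmpty` for the empty-row detection, and `removeEmptyRows` for the final cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyRowSketch {

    // Stand-in for "remove the data of all cells in a row".
    static void clearRow(String[] row) {
        Arrays.fill(row, "");
    }

    // A row counts as empty when every cell is null or only whitespace.
    static boolean isEmpty(String[] row) {
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Builds a new sheet that keeps only the non-empty rows, preserving their order.
    static List<String[]> removeEmptyRows(List<String[]> sheet) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : sheet) {
            if (!isEmpty(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> sheet = new ArrayList<>();
        sheet.add(new String[] {"1001", "Schurter AG"});
        sheet.add(new String[] {"1001", "Schurtr"}); // duplicate to be wiped
        sheet.add(new String[] {"1002", "Other GmbH"});

        clearRow(sheet.get(1));                          // duplicate -> clear all cells
        List<String[]> cleaned = removeEmptyRows(sheet); // empty row -> removed
        System.out.println(cleaned.size());              // 2
    }
}
```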

Removed the "Customer" model. We previously delivered two different ArrayLists: one with POS data (Customers) and one with segmentation data (Segments).

We now deliver both ArrayLists as ArrayList<Segment> instead.

10.13.2016

Implemented the deduplication process throughout the program.

Empty rows are detected during parsing, duplicates are detected and "tagged" during comparison, and both the empty rows and the duplicates are then removed during the program's writing process (the output).
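The three phases can be sketched as follows, reusing a single list of segments in the spirit of the ArrayList<Segment> change from 10.12. The `Segment` fields (`id`, `company`, `empty`, `duplicate`) and the method names are hypothetical. For brevity this sketch keeps the first occurrence of each ID instead of applying the StringDistance choice described on 10.11.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PipelineSketch {

    // Hypothetical Segment model; the real class in the project may differ.
    static class Segment {
        final String id;
        final String company;
        boolean empty;     // set during parsing
        boolean duplicate; // set ("tagged") during comparison

        Segment(String id, String company) {
            this.id = id;
            this.company = company;
        }
    }

    // Phase 1: parsing. Rows with neither an ID nor a company are flagged as empty.
    static List<Segment> parse(List<String[]> rows) {
        List<Segment> segments = new ArrayList<>();
        for (String[] row : rows) {
            String id = row.length > 0 && row[0] != null ? row[0].trim() : "";
            String company = row.length > 1 && row[1] != null ? row[1].trim() : "";
            Segment s = new Segment(id, company);
            s.empty = id.isEmpty() && company.isEmpty();
            segments.add(s);
        }
        return segments;
    }

    // Phase 2: comparison. Every occurrence of an ID after the first is tagged.
    static void tagDuplicates(List<Segment> segments) {
        Set<String> seen = new HashSet<>();
        for (Segment s : segments) {
            if (!s.empty && !seen.add(s.id)) {
                s.duplicate = true;
            }
        }
    }

    // Phase 3: writing. Empty and tagged segments are skipped in the output.
    static List<String> write(List<Segment> segments) {
        List<String> output = new ArrayList<>();
        for (Segment s : segments) {
            if (!s.empty && !s.duplicate) {
                output.add(s.id + ";" + s.company);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"1001", "Schurter AG"},
                new String[] {"", ""},
                new String[] {"1001", "Schurter AG"},
                new String[] {"1002", "Other GmbH"});
        List<Segment> segments = parse(rows);
        tagDuplicates(segments);
        System.out.println(write(segments)); // [1001;Schurter AG, 1002;Other GmbH]
    }
}
```

Deferring the actual removal to the writing phase means the earlier phases never shift row positions, so tags set during comparison still point at the right rows.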

10.14.2016

Lots of bug fixing. The program did not properly detect which rows it needed to remove.

Ran into an issue that made it impossible to read in old .xls files due to a bug in the Apache POI library. Developed a workaround.

Fixed a bug that wrote the incorrect company name into the final .xlsx file if the row was part of a set of duplicates.

Fixed a bug that prevented us from detecting more than two duplicates for any given ID.
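The log does not say what caused this limit; one way such a cap can arise is by comparing rows only in pairs. As a hedged illustration, grouping all rows by ID first places no limit on the number of duplicates per ID. `GroupByIdSketch`, `groupById` and `duplicatesOnly` are hypothetical names, and rows are assumed to hold the ID in column 0 and the company name in column 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByIdSketch {

    // Groups company names by ID; each group can hold any number of entries.
    static Map<String, List<String>> groupById(List<String[]> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        return groups;
    }

    // Keeps only the IDs that occur more than once.
    static Map<String, List<String>> duplicatesOnly(Map<String, List<String>> groups) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"1001", "Schurter AG"},
                new String[] {"1001", "Schurtr"},
                new String[] {"1001", "Schurter Holding"},
                new String[] {"1002", "Other GmbH"});
        Map<String, List<String>> dups = duplicatesOnly(groupById(rows));
        System.out.println(dups.get("1001").size());  // 3
        System.out.println(dups.containsKey("1002")); // false
    }
}
```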

Refactored the code to make it cleaner and more readable.