Week 6

Management Summary

For the SCHURTER, Inc. Project Point of Sale (POS) Customer Segmentation

SCHURTER, Inc. possesses a Point of Sale (POS) data set with information on its indirect customers. The data includes the name, the country and the zip code of each customer. This data was further enhanced by last year’s ICE group, which began developing the SCHURTER Point of Sales Data Enrichment Software (SPOSDES). The software uses the Google and Bing APIs to find the URL of a customer based on the customer’s name. The final goal of SPOSDES was to read the metadata from the customer websites and assign each customer to an industry segment based on that metadata. This method, however, proved insufficient because the metadata does not contain the required information. Furthermore, SPOSDES no longer functioned after the Bing API changed. Our first objective was therefore to get the basic data enrichment function of SPOSDES working again and then to find a way to assign the customers to the different segments.

Finding the bug in the code written by the previous team proved difficult. Fortunately, Microsoft provides an excellent guide for its API, thanks to which the error in the code was identified. While we were looking for the problem with Bing, new API keys were generated, which also fixed the problem with the Google API. Unfortunately, as the previous team had already pointed out to SCHURTER, Google only allows 100 searches a day and Bing only 5’000 a month. Furthermore, Microsoft will shut down Bing’s current API and add a similar service to the Azure Marketplace. Since the main goal of our project is not to fix last year’s program but to further enhance the data and deliver a customer segmentation, SCHURTER decided to “abandon” the old solution.

Instead of using metadata to find the industry segment of a company, SPOSDES now takes the name of the company and looks for a match in a second list of company names and segments. This list of companies and the segments in which they operate is obtained by looking up online company registers and extracting the name and segment with an Excel VBA script. The two resulting lists, POS and industry segment, are read into the Java program and compared to check whether a customer of SCHURTER appears in the industry segment list. If a match is found, the segment in which the company operates is written into SCHURTER’s POS data file. As many company names end with “Ltd, Co, AG” or sometimes just a simple “,”, the Java application looks for similarity between two names instead of an exact match.
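The comparison step described above can be sketched as follows. This is a minimal illustration and not the SPOSDES source code: the class SegmentMatcher, its Match type, the threshold value and the prefixRatio function are hypothetical names and assumptions introduced for this example. The similarity function is passed in as a parameter so that any string similarity measure returning a value between 0 and 1 can be plugged in; prefixRatio is only a simple stand-in.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: looks up the best-matching register entry for a POS customer name. */
final class SegmentMatcher {

    /** Segment, best-matching register name and similarity score of a lookup. */
    static final class Match {
        final String segment;
        final String matchedName;
        final double score;

        Match(String segment, String matchedName, double score) {
            this.segment = segment;
            this.matchedName = matchedName;
            this.score = score;
        }
    }

    private final Map<String, String> segmentByCompany; // register name -> segment
    private final ToDoubleBiFunction<String, String> similarity;
    private final double threshold; // assumed cut-off below which no segment is assigned

    SegmentMatcher(Map<String, String> segmentByCompany,
                   ToDoubleBiFunction<String, String> similarity,
                   double threshold) {
        this.segmentByCompany = segmentByCompany;
        this.similarity = similarity;
        this.threshold = threshold;
    }

    /** Returns the most similar register entry, or empty if none reaches the threshold. */
    Optional<Match> find(String customerName) {
        String needle = customerName.toLowerCase(Locale.ROOT);
        Match best = null;
        for (Map.Entry<String, String> entry : segmentByCompany.entrySet()) {
            String candidate = entry.getKey().toLowerCase(Locale.ROOT);
            double score = similarity.applyAsDouble(needle, candidate);
            if (score >= threshold && (best == null || score > best.score)) {
                best = new Match(entry.getValue(), entry.getKey(), score);
            }
        }
        return Optional.ofNullable(best);
    }

    /** Stand-in similarity: length of the common prefix divided by the longer length. */
    static double prefixRatio(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0;
        }
        int common = 0;
        int limit = Math.min(a.length(), b.length());
        while (common < limit && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        return (double) common / longer;
    }
}
```

With such a lookup, a POS entry “Acme Medical” can still be matched to a register entry “Acme Medical AG” even though the two names are not identical. Both names and the segment labels are invented for illustration.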

For this purpose, a string distance algorithm is used. Such an algorithm takes two strings, in this case two company names, and returns a measure of how similar the strings are to one another. The algorithm chosen is the “Jaro-Winkler distance”, an algorithm well suited for the comparison of short strings such as names. The output of the string comparison function is the segment of the company, the company name which yields the best string similarity, and the “Jaro-Winkler distance”.
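To illustrate how such a comparison works, the following is a minimal Java sketch of the Jaro-Winkler measure. It is not taken from SPOSDES; the class and method names are hypothetical. The prefix scaling factor of 0.1 and the maximum prefix length of 4 are the commonly used defaults and are assumed here. The sketch returns a similarity between 0 and 1, where 1 means identical strings; some implementations instead report the distance as 1 minus this value.

```java
/** Hypothetical sketch of the Jaro-Winkler similarity for two strings. */
final class JaroWinkler {

    private JaroWinkler() {}

    /** Jaro similarity in [0, 1]; 1.0 means the strings are identical. */
    static double jaro(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        int la = a.length();
        int lb = b.length();
        if (la == 0 || lb == 0) {
            return 0.0;
        }
        // Characters only count as matching within this distance of each other.
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] matchedA = new boolean[la];
        boolean[] matchedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = true;
                    matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < la; i++) {
            if (!matchedA[i]) {
                continue;
            }
            while (!matchedB[k]) {
                k++;
            }
            if (a.charAt(i) != b.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / la + m / lb + (m - transpositions) / m) / 3.0;
    }

    /** Jaro-Winkler similarity: Jaro score boosted for a shared prefix of up to 4 characters. */
    static double similarity(String a, String b) {
        double jaro = jaro(a, b);
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

For the classic textbook pair, JaroWinkler.similarity("MARTHA", "MARHTA") evaluates to roughly 0.961. The prefix boost is the reason the measure suits company names, which typically differ at the end (legal form, punctuation) rather than at the beginning.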

The new customer segmentation function requires two inputs: SCHURTER’s POS file and the new Industry Segment file. The latter is an easily editable Excel file with two columns: the company name and the segment of the company. This allows SCHURTER to add new segments and companies without the need for a programmer, simply by editing the Excel file. For now, the program focuses on companies from the medical industry. We suggest adding more companies and segments to the segmentation file, since the more companies the file covers, the more customers can be matched and the more accurate the segmentation becomes.

As the previous group had already written a Java application, we added the string comparison functionality to the existing software. Doing so proved problematic, as the software was not designed to accommodate additional functionality. The graphical user interface (GUI) had to be revamped, and the two functionalities, data enhancement and customer segmentation, were separated to ensure simple navigation through the program and to make sure that they do not interfere with each other. The design of SPOSDES stayed the same: a main window shows the user the inputs necessary to continue, while a properties window allows for deeper customization of the output. To ensure that the user always feels safe and comfortable, a help window opens a manual showing the user exactly what type of inputs the program expects.

In the last stage of the project, a deduplication function was added to SPOSDES upon request. It removes all duplicated customers based on the customer ID. This considerably increased the runtime of the program, from 10 minutes for 53’000 customers and 10’000 companies to up to 2 hours. The reason is the large number of duplicates coupled with the long time it takes Java to remove each duplicate once it is detected. To keep the runtime short, we suggest deduplicating the file only once and then using the deduplicated file in the future.
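As a point of comparison, deduplication by customer ID can be sketched in Java as a single pass that copies each first occurrence into a new list instead of removing duplicates from the existing one. This is not the SPOSDES implementation, and we have not measured it against the runtimes above; the Customer type and all names are hypothetical and assumed for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: single-pass deduplication by customer ID. */
final class Deduplicator {

    /** Minimal stand-in for a POS record; the real file has more columns. */
    static final class Customer {
        final String id;
        final String name;

        Customer(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private Deduplicator() {}

    /** Keeps the first record for every customer ID and preserves the input order. */
    static List<Customer> deduplicate(List<Customer> customers) {
        Set<String> seenIds = new HashSet<>();
        List<Customer> unique = new ArrayList<>();
        for (Customer customer : customers) {
            // Set.add returns false if the ID was already present.
            if (seenIds.add(customer.id)) {
                unique.add(customer);
            }
        }
        return unique;
    }
}
```

Because nothing is removed from the middle of a list, the work grows roughly linearly with the number of records. Whether this would shorten the runtime of SPOSDES depends on how the existing deduplication is implemented.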

To summarize, all requirements of SCHURTER, Inc. have been met. The past functionalities of SPOSDES have been restored, and the program now allows customer segmentation based on external company registers and shows the segment as well as the accuracy of each result. By increasing the number of companies in the segmentation file, the segmentation can be made more accurate. The added deduplication function further allows SCHURTER to get rid of duplicate records at the press of a button.