Week 6 + Final

We spent our last week working on the deployment of our software and on our final presentation on Friday, October 23rd. In addition, we wrote a Management Summary for the audience of our presentation.

On this page we provide our final Management Summary and our Lessons Learned.

MANAGEMENT SUMMARY

SCHURTER, Inc. maintains a Point of Sale (POS) database with a vast amount of information about their indirect customers, i.e. customers that purchase from one of their many authorized distributors. Unfortunately, it does not tell them much about these customers. The POS data includes the customer’s company name, their zip code and what they have purchased, but not their business/market/application or how SCHURTER, Inc.’s products are used. The ultimate goal is to categorize the indirect customers by industry and find out what products they make. In the past this has been done manually from time to time by looking up the company’s website with a search engine like Google. This manual process should now be automated and run on a monthly basis by a software application.

The approach to this complex task is to first load the POS data into the application and use a search engine API (Application Programming Interface) to find the correct URL of the respective customer’s website. The application then collects the website’s metadata, which can later be used to categorize the customers and allocate them to an industry. Since categorizing customers based on unstructured, unstandardized metadata is a complex matter and developing such a system is very time-consuming, this step is not part of the agreed requirements; SCHURTER, Inc. has not yet defined how the gathered metadata should be processed further. The main target is therefore to find the right customer website in a short period of time with a high hit rate.
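The sketch below illustrates this pipeline in simplified form. The Customer record, the SearchEngine interface and the use of the jsoup library for HTML parsing are illustrative assumptions only and are not taken from the actual application.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Minimal sketch of the pipeline described above:
// POS record -> search engine lookup -> website metadata.
public class CustomerWebsitePipeline {

    /** One row of the POS export: company name, zip code, purchased product. */
    record Customer(String companyName, String zipCode, String product) {}

    /** Abstraction over Bing/Google so the pipeline does not depend on one API. */
    interface SearchEngine {
        /** Returns the most likely website URL for the query, or null if none was found. */
        String findWebsite(String query);
    }

    private final SearchEngine searchEngine;

    public CustomerWebsitePipeline(SearchEngine searchEngine) {
        this.searchEngine = searchEngine;
    }

    /** Looks up the customer's website and returns its meta description, if any. */
    public String collectMetadata(Customer customer) throws Exception {
        String url = searchEngine.findWebsite(customer.companyName() + " " + customer.zipCode());
        if (url == null) {
            return "";                                       // no website found for this customer
        }
        Document page = Jsoup.connect(url).get();            // download and parse the homepage
        return page.select("meta[name=description]").attr("content");
    }
}
```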

According to the discussed and defined requirements, the software architecture is based on a stand-alone application design with Java as the programming language. Since SCHURTER, Inc. will run the application primarily on a single, ordinary desktop computer, Java and a stand-alone architecture were a natural fit. Java runs on almost any platform, so the software is portable and independent of the underlying operating system.

Bing Search API is the preferred search engine. However, the Google Custom Search API is also implemented in the software at SCHURTER, Inc.’s request. Unfortunately, Google Custom Search allows only 100 web search requests per day, even with expensive pricing models. This restriction makes Google almost unfeasible for productive use and is the main reason Bing became the preferred search engine. Bing also offers more attractive pricing models, including 5,000 free search requests per month. Since SCHURTER, Inc. is unlikely to have more than 5,000 new customers per month, using the application will not incur any costs. Tests with both search engines have shown only minor differences between the search results.

Thanks to the sophisticated search logic, the achieved hit rate for the correct URL lies between 70% and 80%. It is important to bear in mind, however, that some customers, for instance individuals with no company name given, do not have a website at all and therefore cannot be found by the search logic. Counted against only those customers that actually have a website, the hit rate is even higher. Furthermore, results that the search logic cannot confirm as correct are marked as unsure: they are highlighted in yellow to inform the end-user that a human review is needed.
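As an illustration of how such a check could work, the following hypothetical plausibility test compares the company name against the host name of the found URL and flags the result as unsure when they do not match. The actual search logic in the application is more elaborate than this sketch.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical plausibility check, only meant to illustrate how a result could be
// flagged as "unsure" for manual review (the yellow highlighting in the GUI).
public class ResultChecker {

    /** Result of a URL lookup, with a flag for the "unsure" highlighting. */
    public record SearchResult(String url, boolean unsure) {}

    /** Marks the result as unsure when the company name does not appear in the host name. */
    public static SearchResult check(String companyName, String url) {
        String host = URI.create(url).getHost().toLowerCase(Locale.ROOT);
        // Strip legal suffixes and non-letters so "ACME Tools, Inc." becomes "acmetools".
        String normalized = companyName.toLowerCase(Locale.ROOT)
                .replaceAll("\\b(inc|llc|ltd|corp|co)\\b", "")
                .replaceAll("[^a-z]", "");
        boolean match = !normalized.isEmpty() && host.replace("-", "").contains(normalized);
        return new SearchResult(url, !match);
    }

    public static void main(String[] args) {
        // Example: a matching domain is accepted, an unrelated one is marked unsure.
        System.out.println(check("ACME Tools, Inc.", "https://www.acmetools.com"));
        System.out.println(check("ACME Tools, Inc.", "https://www.distributor-portal.com"));
    }
}
```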

The graphical user interface (GUI) comes with a straightforward design to avoid complexity and to be comfortable for the end-user. The main window contains only the functions necessary to run the search; further configuration can be done in a separate properties window. Performance is always a key criterion for software, so the application architecture had to be adapted to run smoothly even with large datasets. Because multithreading was introduced during the development phase, the shipped application now runs four times faster than the initial prototype.
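The following sketch shows how such a parallelization can look in Java using a fixed pool of worker threads. The class and method names, as well as the pool size of four, are illustrative assumptions rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: run the website lookups on a fixed pool of worker threads
// instead of one after another on the main thread.
public class ParallelLookup {

    public static List<String> findWebsites(List<String> companyNames) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // four worker threads
        try {
            // Each lookup becomes a task; the pool runs up to four of them at once.
            List<Callable<String>> tasks = companyNames.stream()
                    .map(name -> (Callable<String>) () -> lookup(name))
                    .toList();
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    /** Placeholder for the actual search-engine call. */
    private static String lookup(String companyName) {
        return "https://example.com/" + companyName.toLowerCase().replace(' ', '-');
    }
}
```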

To tackle the further processing of the gathered metadata, an additional concept has been developed. It is based on a reference list that contains common keywords for each industry. The application runs these keywords against the gathered metadata: if a keyword from the reference list occurs in the metadata, the customer is temporarily associated with that industry. This quantitative approach follows the theory that the more keywords are found in the metadata, the more likely it is that the customer belongs to the corresponding industry. Since SCHURTER, Inc. does not yet maintain such a list of keywords and industries, this approach is limited for now.
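The sketch below shows the keyword-counting idea with a made-up reference list; the real industries and keywords would have to be provided by SCHURTER, Inc.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of the keyword-counting concept: the industry whose keywords occur most
// often in the website metadata wins. The reference list is an invented example.
public class IndustryClassifier {

    /** Example reference list: industry name -> typical keywords. */
    private static final Map<String, List<String>> REFERENCE = Map.of(
            "Medical",    List.of("medical", "patient", "diagnostic", "clinical"),
            "Automotive", List.of("vehicle", "automotive", "engine", "chassis"),
            "Lighting",   List.of("led", "luminaire", "lighting", "lamp"));

    /** Returns the industry whose keywords occur most often in the metadata, or "Unknown". */
    public static String classify(String metadata) {
        String text = metadata.toLowerCase(Locale.ROOT);
        String best = "Unknown";
        int bestScore = 0;
        for (Map.Entry<String, List<String>> entry : REFERENCE.entrySet()) {
            int score = 0;
            for (String keyword : entry.getValue()) {
                if (text.contains(keyword)) {
                    score++;            // one hit per keyword found in the metadata
                }
            }
            if (score > bestScore) {    // keep the industry with the most keyword hits
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("Manufacturer of LED luminaires and lighting controls"));
    }
}
```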

It has to be considered that not every company maintains its metadata well, so the results depend on the quality of the website. Samples have shown that only around 50% of the customers’ websites contained useful, well-maintained metadata. Nevertheless, these findings provide a proof of concept and can be used as a basis for further discussion.

In summary, all requirements have been met. SCHURTER, Inc. is now able to find their customers’ websites with a high hit rate and in an extremely short time compared to the former manual process. They can also now discuss further steps to process the gathered information and finally categorize their customers by industry segment. We recommend that SCHURTER, Inc. screen the market for professional data-mining tools that are able to process the gathered data.

LESSONS LEARNED

  • What we have learned during the International Campus Experience 2015 is that communication is key for a successful project.

  • Talking to our project provider about the requirements showed us that mutual understanding helps a lot in delivering what is required.

  • Furthermore, we spent a lot of time on the architecture, which paid off in the end: we did not struggle with misunderstandings about the design of the software.

  • In addition, we made sure to split tasks appropriately and into small portions. With this approach we were always aware of who was working on what and where exactly we faced problems.

  • A major improvement to the application was the performance boost through multithreading. It was the first time we wrote an application using multiple threads.

  • A key finding during these six weeks was that you should always schedule enough time for testing. Even though it may not be very enjoyable, it is crucial for delivering a good product to the client.

SOURCE CODE

If you are interested in our source code, please have a look at GitHub.

Final-presentation_SCHURTER_Inc..pptx