Advertisers often want to know specific demographic information about a person so that they can target their advertising more effectively.
This Python project aims to help users understand what people in different regions care about.
In this project we select a sample of people from two cities in the United States and compare the text they publish to see whether people in the two cities differ in the advertising terms they use. If they do differ, we then look at which words each group commonly uses, and from those words infer what people in each city care about.
Development Process
Data collection: Collect data from RSS feeds by building an interface to the RSS feeds.
Prepare the data: Parse the text file into a term vector.
Analyze the data: Inspect the terms to make sure the text was parsed correctly.
Train the algorithm: Build the training function.
Test the algorithm: Observe the error rate and make sure the classifier works. The text-splitting (tokenization) step can be modified to reduce the error rate and improve the classification results.
Use the algorithm: Build a complete program that encapsulates everything. Given two RSS feeds, the program will display the most common words in each.
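
The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the classifier is a multinomial naive Bayes model (the text above only says "classifier"), it replaces the RSS feeds with a few made-up posts so that it runs without network access, and every function name and sample sentence in it is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Prepare the data: split text into lowercase tokens longer than two characters."""
    return [t.lower() for t in re.split(r"\W+", text) if len(t) > 2]


def to_term_vector(vocab_index, tokens):
    """Turn a token list into a bag-of-words count vector over the vocabulary."""
    vec = [0] * len(vocab_index)
    for tok in tokens:
        if tok in vocab_index:  # terms outside the vocabulary are ignored
            vec[vocab_index[tok]] += 1
    return vec


def train(vectors, labels):
    """Train: estimate log priors and smoothed log P(term | class) (assumed naive Bayes)."""
    model = {}
    for cls in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        counts = [1 + sum(col) for col in zip(*rows)]  # add-one (Laplace) smoothing
        total = sum(counts)
        model[cls] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_likelihood": [math.log(c / total) for c in counts],
        }
    return model


def classify(model, vector):
    """Return the class with the highest log posterior score for this vector."""
    def score(cls):
        params = model[cls]
        return params["log_prior"] + sum(
            x * lp for x, lp in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


def error_rate(model, vectors, labels):
    """Test: fraction of examples whose predicted class differs from the label."""
    wrong = sum(classify(model, v) != y for v, y in zip(vectors, labels))
    return wrong / len(labels)


def top_words(documents, n=3):
    """Use: the n most frequent tokens across a list of tokenized documents."""
    return Counter(tok for doc in documents for tok in doc).most_common(n)


# Made-up stand-ins for the entries of two city RSS feeds (hypothetical data).
city_a_posts = ["Surfing lessons near the beach this weekend",
                "Beach volleyball and surfing meetup"]
city_b_posts = ["Looking for a Broadway theater partner",
                "Subway commute and theater tickets"]

docs_a = [tokenize(p) for p in city_a_posts]
docs_b = [tokenize(p) for p in city_b_posts]
vocab = sorted({tok for doc in docs_a + docs_b for tok in doc})
vocab_index = {term: i for i, term in enumerate(vocab)}

vectors = [to_term_vector(vocab_index, d) for d in docs_a + docs_b]
labels = ["A"] * len(docs_a) + ["B"] * len(docs_b)
model = train(vectors, labels)

new_post = to_term_vector(vocab_index, tokenize("surfing at the beach"))
print("Predicted city:", classify(model, new_post))
print("Training error rate:", error_rate(model, vectors, labels))
print("Top words, city A:", top_words(docs_a, 2))
print("Top words, city B:", top_words(docs_b, 2))
```

In a real run, the sample posts would come from the two RSS feeds, and the error rate would be measured on held-out entries rather than on the training data. Very common words such as "the" and "and" would probably also need to be filtered out before the top words say much about either city.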