People's Concerns of Two Different Regions

Keywords: advertising words, RSS, Naive Bayes, machine learning, text segmentation, Python3

Introduction

Advertisers often want to know specific demographic information about a person so that they can choose better strategies for recommending advertisements.
This Python-based project tries to help users understand what people in different regions care about.
In this project we select posts published by people in two cities in the United States and compare them to see whether the two cities differ in the advertising terms they use. If they do differ, we then look at which words each city commonly uses and, from those words, infer what people in each city care about.

Data collection: Collect data from RSS feeds by building an interface to the feeds.
Prepare the data: Parse the text into term vectors.
Analyze the data: Check the terms to make sure the text was parsed correctly.
Training algorithm: Build the training function.
Test algorithm: Observe the error rate to make sure the classifier works. The text-segmentation routine can be modified to reduce the error rate and improve the classification results.
Use the algorithm: Build a complete program that wraps everything together. Given two RSS feeds, the program displays the most common words in them.
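
The prepare, train, and test steps above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: the function names (tokenize, train_naive_bayes, classify) and the sample posts are made up for the example, and add-one (Laplace) smoothing is assumed. In the real project the posts would come from the two RSS feeds instead of hard-coded strings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if len(tok) > 2]


def train_naive_bayes(docs, labels):
    """Estimate log priors and add-one smoothed log word likelihoods per class."""
    vocab = sorted({word for doc in docs for word in doc})
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    model = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)  # denominator with add-one smoothing
        model[label] = {
            "log_prior": math.log(doc_counts[label] / len(docs)),
            "log_likelihood": {w: math.log((counts[w] + 1) / total) for w in vocab},
        }
    return model


def classify(model, doc):
    """Return the label with the highest log score; words outside the vocabulary are skipped."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for word in doc:
            if word in params["log_likelihood"]:
                score += params["log_likelihood"][word]
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up sample posts standing in for entries pulled from two RSS feeds.
city_a_posts = ["Selling my surf board near the beach", "Beach house for rent, surf gear included"]
city_b_posts = ["Snow tires for sale, barely used", "Looking for a ski partner, have snow gear"]

docs = [tokenize(p) for p in city_a_posts + city_b_posts]
labels = ["city_a"] * len(city_a_posts) + ["city_b"] * len(city_b_posts)
model = train_naive_bayes(docs, labels)
predicted = classify(model, tokenize("surf lessons on the beach"))
```

To estimate an error rate, one could hold out some posts from training, classify them, and count how many predictions disagree with the true city.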

You can download the project from GitHub.

Choose the RSS feeds that you want to focus on from Craigslist.
Fill in the "key_words" that you want to look up in the "craigslist" function.
Run the project; it returns the area where people are more concerned with the key words you entered.
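
One way to picture the last step is the sketch below. It is an assumption about how such a comparison could work, not the project's "craigslist" function: the helper names keyword_rate and more_concerned_area are hypothetical, the posts are made up, and "more concerned" is taken to mean that the key words make up a larger share of all the words in an area's posts.

```python
import re
from collections import Counter


def keyword_rate(posts, key_words):
    """Fraction of all tokens in the posts that match one of the key words."""
    tokens = [tok for post in posts for tok in re.findall(r"[a-z]+", post.lower())]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w.lower()] for w in key_words) / len(tokens)


def more_concerned_area(posts_by_area, key_words):
    """Return the area whose posts mention the key words at the highest rate."""
    return max(posts_by_area, key=lambda area: keyword_rate(posts_by_area[area], key_words))


# Made-up posts; in practice these would be the entries of the chosen RSS feeds.
sample_posts = {
    "city_a": ["Surf board for sale", "Beach rental with parking"],
    "city_b": ["Ski boots for sale", "Parking spot near the ski lift"],
}
area = more_concerned_area(sample_posts, ["ski"])
```

Comparing rates rather than raw counts keeps an area with more posts from winning only because it has more text.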

I pledge my honor that I have abided by the Stevens Graduate Student Code of Academic Integrity.
