Overview
In this meeting we continued sharing our progress with the Exploratory software and kept building graphs based on token matches. We also exported the files for the majority of the data and compiled them. The team then discussed how to divide the work for the milestone presentation, how to get started with KHCoder, and the report.
Overview
In this meeting we shared our progress so far in Exploratory. We figured out how to gather data through Exploratory itself rather than relying on Tweepy for all data collection. This helped us use other data cells to make connections between popular words used in the tweets. We were able to display that data in a bar graph and are now working on connecting word pairs, which will be displayed in a heat map. We also discussed the upcoming milestone and the documents that will need to be submitted for it.
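The word counts and word pairs described above can be sketched in plain Python. This is only an illustration of the idea, not how Exploratory computes it: the sample tweets and the function names (tokenize, word_counts, pair_counts) are made up for this example, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample tweets; real input would come from the exported data.
tweets = [
    "Happy pride month to everyone",
    "Pride month parade was amazing",
    "Everyone loved the parade",
]

def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(texts):
    """Count every occurrence of each word (input for a bar graph)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def pair_counts(texts):
    """Count how many tweets contain each unordered pair of distinct words.

    Pairs are stored as alphabetically sorted tuples, so ("month", "pride")
    and ("pride", "month") are the same key. These counts are the cell
    values a word-pair heat map would display.
    """
    counts = Counter()
    for text in texts:
        unique_words = sorted(set(tokenize(text)))
        counts.update(combinations(unique_words, 2))
    return counts

words = word_counts(tweets)
pairs = pair_counts(tweets)
print(words.most_common(5))
print(pairs[("month", "pride")])
```

Counting each pair once per tweet keeps a single long tweet from dominating the heat map. Counting every repeated occurrence instead would be a reasonable alternative.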
Activities Meeting Log:
This week we focused on downloading and getting acclimated with Exploratory, the data analytics tool we will use to analyze the data we collected during the first portion of this project. Exploratory offers a wide range of analysis options and tools, so we’ve been working through its tutorials and materials to understand the platform and its uses. We have begun to create projects in Exploratory using the .csv files created during our first milestone and hope to start seeing some connections among them.
Group 3 Milestone Meeting
Activities Meeting Log:
This week was mostly about continuing to collect data with the Tweepy library. This has proven to be the most effective way for us to pull data, which will later be used to identify relationships. Currently we’re pulling data from selected hashtags while also researching which hashtags will yield the most valuable data. This includes searching hashtags around specific events or trends on Twitter, such as gay pride events. We’ve also considered using PRAW to extract data from Reddit but have not yet been able to get a successful outcome. Attempts are still being made, however, so that we have an extra source to compare against Twitter. We also discussed the goals for the upcoming milestone, which are to download, test, and use the Exploratory software to analyze our data and find relationships within it.
Week 2 Activities Meeting Log:
This week started out with a breakthrough in the web scraping portion of this project. The main issue we were having with the Tweepy library was that the permissions on our Twitter application were set to read and write only rather than read, write, and direct message. After solving that error we were able to start mining data through Twitter’s APIs. Some of the hashtags we have explored so far are pride, gay, and transgender. Looking closely at the data, many of the other hashtags included on most of the tweets referred to sub-communities of the LGBT+ community. This is not surprising, since the LGBT+ community is a rather large one. There were also hashtags such as depression and anxiety, which suggest that there may be a large number of LGBT+ youth suffering from these mental health conditions. Lastly, the group has decided to hold an in-person meeting on Tuesday, February 12th at 5pm. During this meeting we will finish the tasks that need to be completed before the first milestone and discuss goals and deadlines for the second milestone.
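The observation about hashtags that appear alongside the ones we searched for can be checked with a short count. The sketch below is a minimal illustration under stated assumptions: the sample tweets are invented, co_occurring_hashtags is a hypothetical helper name, and the regular expression only catches simple word-style hashtags.

```python
import re
from collections import Counter

# Matches a "#" followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

def co_occurring_hashtags(tweets, seed):
    """Count the hashtags that appear in the same tweet as the seed hashtag.

    Matching is case-insensitive, and each hashtag is counted at most once
    per tweet. The seed itself is excluded from the result.
    """
    seed = seed.lower()
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if seed in tags:
            counts.update(tags - {seed})
    return counts

# Invented sample data, for illustration only.
sample = [
    "Feeling seen today #pride #transgender",
    "Rough week #pride #anxiety #depression",
    "Parade photos! #Pride #gay",
    "Unrelated post #anxiety",
]

counts = co_occurring_hashtags(sample, "pride")
print(counts.most_common())
```

A count like this only shows which hashtags travel together. It does not by itself say anything about who is posting or why.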
Week 1 Activities Meeting:
This week we researched several different open source ways to get data off of Twitter. The main reason we chose to web scrape Twitter is that the demographic we are researching, LGBT+ youth, mainly uses Twitter as its primary social media. This information was obtained by speaking with the vice president of the Kennesaw Pride Alliance. After collecting several different web scraping options, we watched several YouTube tutorials before deciding to test one. Most of the tutorials we watched required a valid Twitter account along with Python version 2.7 or higher and the Anaconda command prompt. The main one we started to test was Tweepy, which is open source, hosted on GitHub, and known to communicate with Twitter through its APIs. In order to obtain access to the APIs, we had to create an application on Twitter and describe what we intended to do with the data. After that was approved, we downloaded Anaconda in order to install the Python library Tweepy. Currently we are having a problem: when we run the code, it states that Tweepy is not a module that exists, but Anaconda reports that Tweepy has already been installed successfully. We are searching online for solutions and will visit the CCSE tutoring lab to see if they have a solution for this bug. If there are no other solutions, we will try another open source web scraper called Scrapy and use a library called Beautiful Soup instead of Tweepy.
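One common cause of a "module does not exist" error for a package that shows as installed is that the script runs under a different Python interpreter than the one the package was installed into. We have not confirmed that this is the cause here. The sketch below, using only the standard library, prints which interpreter is running and where it would load Tweepy from; describe_module is a hypothetical helper name.

```python
import importlib.util
import sys

def describe_module(name):
    """Return the location the running interpreter would import `name` from.

    Returns None when this interpreter cannot find a top-level module with
    that name, which is the situation that produces the import error.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# The interpreter that is actually executing this script.
print("Interpreter:", sys.executable)

# None here means this interpreter cannot see Tweepy, even if another
# environment on the same machine has it installed.
print("tweepy found at:", describe_module("tweepy"))
```

If the interpreter path does not point into the Anaconda installation, running the script from the Anaconda command prompt, or installing Tweepy with that same interpreter, would be the next thing to try.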