preparing the text
I decided to delete tweets from a marketing user that contained only photos of attendees using the company's product. Based on the initial social network analysis, which identified the user @gcouros as a main influencer of #NCTIES17, I also deleted any tweets that mentioned him. See figure 2. This removed 1,108 textless tweets from the corpus.
Because Mr. Couros was a keynote speaker and presented multiple sessions throughout the conference, most tweets mentioning him expressed positive sentiment: quotes that appeared to resonate with users and affirmations or agreements such as "yes! yes! yes!", with little contextual information pertinent to the question of content/knowledge contribution. Given their repetitive, affirming nature, these tweets did not lend themselves to my question, so they were removed from the sample.
figure 2. Gephi social network graph of #NCTIES17 showing George Couros (gcouros) as a main influencer and Richard Byrne (rmbyrne) as an additional major player
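These deletions were done by hand in a spreadsheet; the same logic could be sketched in pandas. The column names and sample rows below are illustrative assumptions, not the study's actual data.

```python
import pandas as pd

# Hypothetical sample of the tweet corpus; the column names and rows are
# illustrative, not the study's actual spreadsheet.
tweets = pd.DataFrame({
    "user": ["edu_vendor", "teacher1", "teacher2", "teacher3"],
    "text": ["", "Loved the keynote @gcouros!", "Great session on coding", ""],
})

# Drop photo-only tweets, i.e. rows whose text field is empty.
cleaned = tweets[tweets["text"].str.strip() != ""]

# Drop any remaining tweets that mention @gcouros.
cleaned = cleaned[~cleaned["text"].str.contains("@gcouros", case=False)]
```

After both steps, only tweets with substantive text from other users remain for labeling.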
I almost removed the tweets related to the other major influencer in the DPLN revealed by the social network analysis, Richard Byrne. However, as I skimmed those 22 tweets, I realized that most of them shared resources, which spoke directly to my question, so I did not delete them from the final data set. See figure 3.
figure 3. pivot filter of tweets mentioning Richard Byrne
The next set of tweets I removed, to produce a more manageable sample for labeling, came from a pivot filter of any tweets that included "thank", "shoutout", or "proud". My reasoning was to remove tweets that simply affirmed participation or expressed gratitude; while these are valuable, they offer no content that helps answer the question. See figure 4. This filtering brought the data set down to 3,479 tweets. For my purposes, I chose roughly a 10% sample (347 of the 3,479 tweets): starting from a randomly selected row, I highlighted consecutive rows up the document until I had 347. I then deleted any duplicates, which further reduced the sample size.
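The keyword filter, sampling, and duplicate removal could likewise be sketched in pandas. The corpus below is a small hypothetical stand-in for the remaining tweets, and the sampling here is random rather than the manual row-highlighting used in the study.

```python
import pandas as pd

# Hypothetical corpus standing in for the remaining tweets.
tweets = pd.DataFrame({
    "text": [
        "Thank you for a great session!",
        "Shoutout to my PLN",
        "So proud of my students",
        "Try this free formative assessment tool",
        "Try this free formative assessment tool",  # duplicated retweet text
        "Sketchnoting tips from today's workshop",
    ]
})

# Filter out gratitude/affirmation tweets containing any of the keywords.
pattern = "thank|shoutout|proud"
filtered = tweets[~tweets["text"].str.contains(pattern, case=False)]

# Take roughly a 10% sample (347 of 3,479 in the study); random_state
# makes the draw reproducible. Then drop duplicate tweet text.
n = max(1, round(0.10 * len(filtered)))
sample = filtered.sample(n=n, random_state=1).drop_duplicates(subset="text")
```

Dropping duplicates after sampling mirrors the study's order of operations, which is why the final sample fell below 347 rows.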
figure 4. text containing "thank" before filtering out