Which roles Cambridge Online Learning Community followers identify themselves with

Post date: 31-Jul-2016 07:07:41

As most people who work with large sets of data know, it takes a long time to sift through them. Spreadsheets and CSV files made the move to #BigData easier in the beginning. Now we have #AI doing a lot of the sorting. As has been seen in some of the posts on Twitter, having a # tag or no # tag can make a big difference to what is picked up and categorised and what becomes the remainder without a home. Twitter lists are a good way of sorting some of your data visually if you are not happy looking at CSV files, and they also pick up on those pesky missing # tags.

However, if you decide to sort your data at some point after you have started getting followers for a small business or community group, you may struggle if you want to do it all yourself. Time-poor people may not have the time to sort their lists. The "data mining" exercise I am about to carry out is meant to give feedback to those who have taken part as followers of @cambolc up to 1200 UTC+1 on the 30th July 2016, and to those who have joined since and will be part of the second #IoT week. We have at present (today) 1148 followers. That is #IoT Week 1 and #IoT Week 2 together. #IoT Week 2 has 24 members. From a glance at the little silent movie (silent except for a squeak from the mouse) we could do some analysis now and work out the percentage of the followers who were present for #IoT Week 1:

  • 1148 - 24 = 1124 gives the number for #IoT Week 1
  • (1124/1148) * 100 therefore gives us a percentage of 98% after a little bit of rounding up. So 2% of the @cambolc followers have joined since 1200 UTC+1 on the 30th July
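The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the two numbers come from the post, while the function name `share_percent` and the variable names are my own and not part of any Twitter or Google tool.

```python
def share_percent(part, total):
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


total_followers = 1148  # followers of @cambolc today (from the post)
week2_members = 24      # members of the #IoT Week 2 list (from the post)

# Everyone who is not in #IoT Week 2 was already following in #IoT Week 1.
week1_members = total_followers - week2_members

week1_share = share_percent(week1_members, total_followers)
week2_share = share_percent(week2_members, total_followers)

print(f"#IoT Week 1: {week1_members} followers, {week1_share:.1f}%")
print(f"#IoT Week 2: {week2_members} followers, {week2_share:.1f}%")
```

Rounded to whole numbers this gives the 98% and 2% quoted above.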

This data will be constantly changing. The individual lists may overlap with other lists; for instance, the business list will include followers from both #IoT Week 1 and #IoT Week 2. In order to separate them out you would need the raw data from Twitter analytics. What you are seeing in a Twitter list is data already processed by Twitter: one category applied across all your followers.
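If the raw membership of two lists were available, the overlap could be separated out with ordinary set operations. The sketch below assumes each list has been exported as a plain list of handles. The handles shown are made-up placeholders, not real followers, and `split_overlap` is a hypothetical helper name.

```python
def split_overlap(list_a, list_b):
    """Split two membership lists into 'only A', 'both' and 'only B'."""
    a, b = set(list_a), set(list_b)
    return {"only_a": a - b, "both": a & b, "only_b": b - a}


# Placeholder handles for illustration only (not real data).
business = ["@example_shop", "@example_cafe", "@example_maker"]
iot_week_2 = ["@example_maker", "@example_student"]

result = split_overlap(business, iot_week_2)
for group in ("only_a", "both", "only_b"):
    print(group, sorted(result[group]))
```

Here `only_a` would be the business followers who are not in #IoT Week 2, and `both` the ones who appear in both lists.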

Twitter lists appear to be arranged so that the last created list is found at the top of the page. Count down from the top and the 10th entry is the list that was created 10th from last. To make the spreadsheet easier to handle, and more importantly to display, I have given each category or list a number, with 1 being the last created.

Table 1. The names of the numbered lists (I will be filling in the table over the next day or two, as sorting the categories is more important. Until then it is the old-fashioned look and see from one tab to another, counting down to find which number matches which bar in the chart.)

The spreadsheet is set up to calculate the percentage of each category based on how many people I have classified by looking at their profiles. By doing this on a relatively small sample (if 1000+ is small) I also get a chance to get a "feel" for the data, and so would hopefully be able to spot anything glaringly wrong. I am taking the whole data set, as obviously every follower matters, and this is almost a census of the group. As people are categorised and lists change, the spreadsheet will act as a snapshot at a particular time. Saving a copy of the spreadsheet with a different name at that point will allow us to compare points in time. If the analysis is being done by a real person this may take some time. Later in the week I will probably download the .csv file and produce another spreadsheet. But for now this is a crude analysis that anybody could do using Google Sheets.
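For anyone who prefers a script to a spreadsheet, the per-category calculation might look something like this in Python. It is a rough sketch rather than a copy of the spreadsheet: the follower total is from the post, but the category counts are invented placeholders and `category_percentages` is a hypothetical name.

```python
def category_percentages(counts, total_followers):
    """Map each list number to its share of all followers, in percent."""
    if total_followers <= 0:
        raise ValueError("total_followers must be positive")
    return {
        number: round(100.0 * members / total_followers, 1)
        for number, members in counts.items()
    }


total_followers = 1148  # from the post

# Placeholder counts keyed by list number (1 = last created list).
# These figures are made up for illustration, not taken from the lists.
counts = {1: 30, 2: 12, 3: 45}

snapshot = category_percentages(counts, total_followers)
for number, percent in sorted(snapshot.items()):
    print(f"List {number}: {percent}% of followers")
```

Because one follower can sit in several lists, the percentages do not have to add up to 100. Keeping each `snapshot` under a dated name does the same job as saving a renamed copy of the spreadsheet.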

I have a bit of work to do with the Google Sheets Help pages to find out how to select part of the sheet to embed. I suspect it is in the Chart Gadget rather than the spreadsheet. It is a while since I last inserted a sheet, possibly 3 years, so I will use the Google Effect.