As a professional learning tool, this approach might be useful to educators and their evaluators when building a professional learning plan. If a teacher designs professional goals, often documented on a Professional Growth Plan, around certain technology competencies, their mentor could peruse a database of topics from a knowledge base, much like the FOCI portfolios cited by Gupta and Lehal (2009). Although FOCI was designed for "competitive intelligence" (Gupta & Lehal, 2009, p. 68), I can envision a teacher searching for relevant resources and content by keywords that were gathered, organized, and published by a system with "a user interface front end for graphical visualization and users interaction" (Gupta & Lehal, 2009, p. 69). The goal would be to make information gathering and sharing among members of a DPLN easier to sift through than Twitter's self-serving algorithms currently allow.
As an instructional technology facilitator, I encourage my teachers to join Twitter as they become comfortable and to start participating in chats, even as lurkers at first. My own growth as a "connected" educator has been slow and steady: I joined in 2009 but did not begin to see its real value for my current role until 2015. I even used it to research the school district I joined last year, vetting its online presence and public "face".
Limitations and Considerations
The first limitation was my operating system, which cannot run NodeXL.
I would like to hand over this corpus to someone who could begin the actual supervised machine learning process with fidelity. In particular, I am very interested in seeing a bag of words generated from a larger NCTIES corpus, which could be mined once the algorithm has been trained on this initial data set. Once the supervised algorithm has learned the categories, it could be applied to the full data set and return other topics ranked by frequency.
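For illustration, here is a minimal sketch of that bag-of-words step in Python with scikit-learn. The CSV file name and the "text" column are hypothetical stand-ins for the exported tweet archive, not the actual files from this project.

```python
# Minimal bag-of-words sketch with scikit-learn.
# "ncties_tweets.csv" and the "text" column are hypothetical names
# for the exported tweet archive.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

tweets = pd.read_csv("ncties_tweets.csv")

vectorizer = CountVectorizer(stop_words="english", max_features=500)
counts = vectorizer.fit_transform(tweets["text"])

# Rank terms by total frequency across the corpus.
totals = counts.sum(axis=0).A1
ranked = sorted(zip(vectorizer.get_feature_names_out(), totals),
                key=lambda pair: pair[1], reverse=True)
print(ranked[:20])  # the twenty most frequent terms
```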
If I had access to a machine that could run NodeXL, I would run the data set through it rather than hand code in order to train the model. I am curious what topics its algorithms would surface that I have missed due to my novice status.
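Since NodeXL is off the table on my machine, a cross-platform stand-in for its topic detection could be a generic LDA pass. This is only a sketch of that idea, not NodeXL's actual method, and it continues from the document-term matrix built in the sketch above.

```python
# Hedged sketch: LDA topic modeling as a cross-platform stand-in
# for NodeXL's topic detection (not NodeXL's actual method).
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(counts)  # document-term matrix from the sketch above

# Print the ten highest-weighted words in each discovered topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-10:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```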
I would also have preferred a larger sample, but I needed to keep it small enough to label manually in a short amount of time.
One choice I am particularly happy with in selecting the sample is sorting the text column from A to Z so that tweets were not grouped by user. I believe that if I had chosen text inadvertently sorted by Twitter user, it would have presented an artificial picture, with certain "power" users over-represented; a more explicit version of this fix is sketched below.
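A random shuffle makes that de-clustering intentional rather than a side effect of an alphabetical sort. A minimal sketch, where the "user" column name and the sample size of 200 are hypothetical:

```python
# A random shuffle de-clusters users explicitly, rather than relying
# on an alphabetical sort of the text column as a side effect.
# The "user" column and the sample size of 200 are hypothetical.
sample = tweets.sample(frac=1, random_state=7).head(200)
print(sample["user"].value_counts().head())  # check that no single user dominates
```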
Before importing this data set as an input, I would spend more time cleaning up the text to rid it of emojis and other non-text symbols.
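A rough cleaning pass might look like the following. The ASCII filter is deliberately blunt, and URLs are left intact on purpose, since the labeling described below depends on where they point.

```python
import re

def clean_text(text: str) -> str:
    """Blunt cleaning pass: drop emojis and other non-ASCII symbols,
    then collapse whitespace. URLs are kept because the labeling
    described below depends on where they point."""
    text = text.encode("ascii", errors="ignore").decode()
    return re.sub(r"\s+", " ", text).strip()

tweets["text"] = tweets["text"].astype(str).apply(clean_text)
```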
Another challenge presents itself in the form of labeling that requires analysis of pictures behind hyperlinks. The example in Figure 10 below shows that the label is unclear until one clicks the hyperlink, which points to an Edutopia article on virtual reality and empathy. This cannot be discerned from the text alone, although the link resolution itself could be automated (see the sketch below).
Figure 10. Snapshot of text from a tweet labeled "both" for "content" and "celebration" based on where the URL points.
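Part of that clicking could be automated: a coder could resolve each shortened link to its final destination to see the target domain before deciding on a label. A sketch using the requests library, with a hypothetical placeholder URL:

```python
# Resolve a tweet's shortened link (e.g., a t.co URL) to its final
# destination so a coder can see the target domain without opening
# a browser. The example URL is a hypothetical placeholder.
import requests

def resolve_url(short_url: str) -> str:
    # HEAD follows the redirect chain without downloading the page body.
    response = requests.head(short_url, allow_redirects=True, timeout=10)
    return response.url

# print(resolve_url("https://t.co/example"))  # would print the final article URL
```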
A challenge going forward when applying machine learning to this multi-label data set arises if a fixed data set is not captured and treated as a "snapshot" of the ever-changing Twitter feed. Training on a live stream appears to be problematic "as the number of stream records are unprecedently large and it is impractical to label all of them for model training" (Wang, Zhang & Guo, 2012, p. 1131). Those taking this work to the next step could consult the "ensemble-based active learning framework" (Wang, Zhang & Guo, 2012, p. 1140), which the authors argue can address the challenge of training models with this much data and multiple labels.
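Wang, Zhang, and Guo's ensemble framework is beyond a sketch here, but the underlying active-learning idea, asking a human to label only the records the model is least sure about, can be shown in a few lines. The sketch below is a generic uncertainty-sampling pass, simplified to single-label classification rather than multi-label, and is not the authors' algorithm; the "label" column is hypothetical, and it reuses objects from the earlier sketches.

```python
# Generic uncertainty-sampling sketch (not Wang, Zhang & Guo's
# ensemble framework), simplified to single-label classification.
# "label" is a hypothetical column holding the hand-coded categories;
# "counts" is the document-term matrix from the earlier sketch.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

labeled = tweets["label"].notna().values

model = MultinomialNB()
model.fit(counts[labeled], tweets.loc[labeled, "label"])

# Low maximum class probability means the model is unsure.
probs = model.predict_proba(counts[~labeled])
uncertainty = 1 - probs.max(axis=1)
ask_human = np.argsort(uncertainty)[-10:]  # the ten least certain tweets

# Route only these tweets to a human coder instead of the whole stream.
print(tweets.loc[~labeled, "text"].iloc[ask_human])
```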