I wrote some boneheaded Java code to grab news articles and all their reader comments from an online newspaper, parse the HTML and text with jsoup, and produce GraphML files with the article as the root and all comments under it. Each node was assigned the text of the article or comment plus metadata such as the comment's author, date and time, and whom it was responding to. I did this about a year ago, after the Snowden revelations, with an eye toward performing some textual and network analysis...
D:\java\GetNewsAndComments>javac -cp .;./* ReadNews6.java
D:\java\GetNewsAndComments>java -cp .;./* ReadNews6
NSA warned to rein in surveillance as agency reveals even greater scope
NSA officials testify to angry House panel that agency can perform 'three-hop queries' through Americans' data and records
Spencer Ackerman
Wednesday 17 July 2013 15.19 EDT
Total Comments 824
D:\java\GetNewsAndComments>javac -cp .;./* ReadNews7.java
D:\java\GetNewsAndComments>java -cp .;./* ReadNews7
NSA warned to rein in surveillance as agency reveals even greater scope
NSA officials testify to angry House panel that agency can perform 'three-hop queries' through Americans' data and records
Spencer Ackerman
Wednesday 17 July 2013 15.19 EDT
Total Comments 824
D:\java\GetNewsAndComments>
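The ReadNews code itself isn't shown here, so this is only a minimal sketch of the output side, assuming the comments have already been pulled out of the page (e.g. with jsoup) into simple records. It writes the article as a root node and each comment as a child of whatever it replies to, with text, author and time stored as GraphML data attributes. The `Comment` record, its field names, and the key names are hypothetical, not what my original code used.

```java
import java.util.List;

public class GraphMLSketch {

    // Hypothetical comment record. replyTo is null for a top-level comment,
    // which then hangs directly off the article's root node.
    record Comment(String id, String author, String timestamp, String replyTo, String text) {}

    // Escape XML special characters so comment text can't break the file.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String toGraphML(String articleTitle, List<Comment> comments) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        // GraphML declares the data keys before the graph element.
        for (String k : List.of("text", "author", "time")) {
            out.append("  <key id=\"").append(k).append("\" for=\"node\" attr.name=\"")
               .append(k).append("\" attr.type=\"string\"/>\n");
        }
        out.append("  <graph id=\"article\" edgedefault=\"directed\">\n");
        out.append("    <node id=\"root\"><data key=\"text\">").append(esc(articleTitle))
           .append("</data></node>\n");
        for (Comment c : comments) {
            out.append("    <node id=\"").append(esc(c.id())).append("\">")
               .append("<data key=\"text\">").append(esc(c.text())).append("</data>")
               .append("<data key=\"author\">").append(esc(c.author())).append("</data>")
               .append("<data key=\"time\">").append(esc(c.timestamp())).append("</data></node>\n");
        }
        int e = 0;
        for (Comment c : comments) {
            // Directed edge from the parent (article root or replied-to comment) to the reply.
            String parent = c.replyTo() == null ? "root" : c.replyTo();
            out.append("    <edge id=\"e").append(e++).append("\" source=\"").append(esc(parent))
               .append("\" target=\"").append(esc(c.id())).append("\"/>\n");
        }
        out.append("  </graph>\n</graphml>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy data for illustration only, not real comments.
        List<Comment> comments = List.of(
                new Comment("c1", "reader1", "2013-07-17T15:30", null, "First comment"),
                new Comment("c2", "reader2", "2013-07-17T15:42", "c1", "A reply to <c1>"));
        System.out.print(toGraphML("NSA warned to rein in surveillance as agency reveals even greater scope", comments));
    }
}
```

The output is plain GraphML, so yEd can open it and apply its own layouts.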
News Articles from The Guardian:
NSA warned to rein in surveillance as agency reveals even greater scope
White House stays silent on renewal of NSA data collection order
Revealed: US spy operation that manipulates social media
yEd circular layout of the 1st article
Maybe connect nodes by author with undirected edges of a separate color, hmm... so each author's comments form a clique in the graph... and arrange the nodes clockwise by the date and time of each post...
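A rough sketch of both ideas, assuming each comment is reduced to an id, an author and an ISO-8601 timestamp string (the `Post` record is hypothetical). Grouping by author and adding one undirected edge per pair means an author with k comments contributes k(k-1)/2 edges. Sorting by time and stepping around a circle from 12 o'clock gives the clockwise arrangement; how those coordinates get into the yEd file is left open here.

```java
import java.util.*;

public class AuthorCliques {

    // Hypothetical minimal comment. ISO-8601 strings in one format sort chronologically.
    record Post(String id, String author, String timestamp) {}

    // One undirected edge for every pair of comments by the same author.
    static List<String[]> authorEdges(List<Post> posts) {
        Map<String, List<String>> byAuthor = new LinkedHashMap<>();
        for (Post p : posts) {
            byAuthor.computeIfAbsent(p.author(), k -> new ArrayList<>()).add(p.id());
        }
        List<String[]> edges = new ArrayList<>();
        for (List<String> ids : byAuthor.values()) {
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    edges.add(new String[] {ids.get(i), ids.get(j)});
                }
            }
        }
        return edges;
    }

    // Place posts clockwise by time on a circle, first post at 12 o'clock.
    // Screen coordinates: y grows downward, as in most graph editors.
    static Map<String, double[]> clockwisePositions(List<Post> posts, double radius) {
        List<Post> sorted = new ArrayList<>(posts);
        sorted.sort(Comparator.comparing(Post::timestamp));
        Map<String, double[]> pos = new LinkedHashMap<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n; // 0 = top, increasing clockwise on screen
            pos.put(sorted.get(i).id(),
                    new double[] {radius * Math.sin(angle), -radius * Math.cos(angle)});
        }
        return pos;
    }

    public static void main(String[] args) {
        // Toy data for illustration only.
        List<Post> posts = List.of(
                new Post("c1", "reader1", "2013-07-17T15:30"),
                new Post("c2", "reader2", "2013-07-17T15:31"),
                new Post("c3", "reader1", "2013-07-17T15:40"),
                new Post("c4", "reader1", "2013-07-17T15:50"));
        for (String[] e : authorEdges(posts)) {
            System.out.println(e[0] + " -- " + e[1]);
        }
        clockwisePositions(posts, 100).forEach((id, xy) ->
                System.out.printf("%s at (%.1f, %.1f)%n", id, xy[0], xy[1]));
    }
}
```

With the toy data, reader1's three comments yield the edges c1 -- c3, c1 -- c4 and c3 -- c4, while reader2's single comment gets none.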
yEd Organic layout with comments connected by author in red
Of course, this article links to other articles on the web, and the comments often link to still other articles...
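Those outbound links could become extra nodes in the graph. With jsoup that would be something like `doc.select("a[href]")`; to keep the sketch free of dependencies, the version below swaps in the JDK's built-in HTML parser (`javax.swing.text.html.parser.ParserDelegator`) and just collects the `href` of every `<a>` tag. Resolving relative URLs and filtering out the site's own navigation links are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Collect the href of every <a> tag, in document order.
    static List<String> links(String html) {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // skip named anchors without an href
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader shouldn't fail, but parse() declares it.
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"https://www.theguardian.com/\">this</a>"
                + " and <a name=\"top\">an anchor</a>.</p>";
        System.out.println(links(html));
    }
}
```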
How to identify an author by style? Idiosyncratic misspellings and grammar errors, choice of words; you'd need a dictionary and thesaurus for semantic analysis to gauge demeanor, beliefs, politics. Can a computer detect sarcasm?
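A very simple starting point, swapped in here as an illustration rather than an answer to the questions above: compare how often two texts use common function words ("the", "of", "but"...), which people tend to use unconsciously, via cosine similarity. The word list below is a tiny hypothetical sample, and nothing here touches semantics or sarcasm.

```java
import java.util.*;

public class StyleProfile {

    // A small, illustrative set of English function words (hypothetical choice).
    static final List<String> FUNCTION_WORDS = List.of(
            "the", "a", "of", "and", "to", "in", "that", "is", "it", "but", "not", "i", "you", "they");

    // Lower-case and split on anything that isn't a letter or apostrophe.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Relative frequency of each function word in the text.
    static double[] profile(String text) {
        List<String> toks = tokens(text);
        double[] v = new double[FUNCTION_WORDS.size()];
        for (String t : toks) {
            int i = FUNCTION_WORDS.indexOf(t);
            if (i >= 0) v[i]++;
        }
        if (!toks.isEmpty()) {
            for (int i = 0; i < v.length; i++) v[i] /= toks.size();
        }
        return v;
    }

    // Cosine similarity: 1.0 for proportional profiles, 0.0 when no function words overlap.
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0 : dot / Math.sqrt(nx * ny);
    }

    public static void main(String[] args) {
        double[] a = profile("I think that the NSA is not telling us the whole story.");
        double[] b = profile("They say it is legal, but it is not.");
        System.out.printf("similarity %.3f%n", cosine(a, b));
    }
}
```

Per-author profiles could be built by concatenating all of an author's comments before calling `profile`.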
Formula to detect an author's literary 'fingerprint'
The meta book and size-dependent properties of written language
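I won't restate the formulas from those two pieces here. As a hedged starting point, the raw measurement that this kind of size-dependent analysis works with can be sketched: how many distinct words an author has used after the first n words of their text. Whitespace tokenization and lower-casing are my assumptions; fitting a curve to the result is left to the papers.

```java
import java.util.*;

public class VocabularyGrowth {

    // Returns {n, distinct words so far} after every `step` words and at the end.
    // Plotting these points shows how fast the vocabulary keeps growing with text length.
    static List<int[]> growthCurve(List<String> words, int step) {
        if (step < 1) throw new IllegalArgumentException("step must be >= 1");
        Set<String> seen = new HashSet<>();
        List<int[]> curve = new ArrayList<>();
        for (int n = 1; n <= words.size(); n++) {
            seen.add(words.get(n - 1).toLowerCase(Locale.ROOT));
            if (n % step == 0 || n == words.size()) {
                curve.add(new int[] {n, seen.size()});
            }
        }
        return curve;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "the nsa said the program is legal and the program is limited".split(" "));
        for (int[] point : growthCurve(words, 4)) {
            System.out.println(point[0] + " words read, " + point[1] + " distinct");
        }
    }
}
```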
Tools: