Dr. Bichen Shi
 Postdoctoral Researcher
 University College Dublin,
 Dublin, Ireland

 Email: Bichen.Shi@insight-centre.org

Research Interests:
  • Machine learning from massive streams.
  • Real-time large-scale text mining in news and social media data. 

Education:
  • Doctor of Philosophy (Ph.D.) in Computer Science, University College Dublin, Dublin, Ireland (Sep 2013 - Nov 2017)
       Thesis: Real-time Learning for News and Social Streams (pdf)
Supervisors: Dr. Neil Hurley and Dr. Georgiana Ifrim

  • Master's degree by research (M.Sc.) in Computer Science, University College Cork, Cork, Ireland (2013)
Thesis: A Machine Learning Approach to Estimating the Smoothed Complexity of Sorting Algorithms (pdf)

  • Bachelor's degree (B.Sc.) in Computer Science, University College Cork, Cork, Ireland (2012)
  • Bachelor's degree (B.Sc.) in Computer Science, Beijing Technology and Business University, Beijing, China (2012)

Publications:
  • B Shi, G Poghosyan, G Ifrim and N Hurley. Hashtagger+: Efficient High-Coverage Social Tagging of Streaming News. IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE, 2017 (IF 3.4). (PDF)
  • G Ifrim, D Green, M T Keane, C Orellana-Rodriguez, B Shi and G Poghosyan. On Supporting Digital Journalism: Case Studies in Co-Designing Journalistic Tools. Computation+Journalism Symposium, October 2017. (PDF)
  • T Mai, B Shi, PK Nicholson, D Ajwani and A Sala. Scalable Disambiguation System Capturing Individualities of Mentions. International Conference on Language, Data and Knowledge (LDK), Galway, Ireland, June 2017. (PDF)
  • B Shi, G Ifrim and N Hurley. Learning-to-Rank for Real-Time High-Precision Hashtag Recommendation for Streaming News. The 25th International World Wide Web Conference (WWW), Montreal, Canada, April 2016. (PDF)
  • B Shi, G Ifrim and N Hurley. Insight4news: Connecting News to Relevant Social Conversations. The European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Demo Track. Nancy, France, September 2014. (PDF)
  • B Shi, G Ifrim and N Hurley. Be In The Know: Connecting News Articles to Relevant Twitter Conversations. The European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), PhD Track. Nancy, France, September 2014. (PDF)
  • G Ifrim, B Shi and I Brigadir. Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering. Second Workshop on Social News on the Web (SNOW) at the 23rd International World Wide Web Conference (WWW), Seoul, Korea, April 2014. Data Challenge Winner, 1st Prize. (PDF)
  • B Shi, M Schellekens and G Ifrim. A Machine Learning Approach to Estimating the Smoothed Complexity of Sorting Algorithms. arXiv:1503.06572, 2015. (pdf)
  • M P Schellekens, A Hennessy and B Shi. Modular Smoothed Analysis. Preprint, 2014. (pdf)

Projects & Downloads:
  • Story Disambiguation: Tracking Evolving News Stories across News and Social Streams (2017)
We combine the areas of topic tracking and entity disambiguation and propose a framework named Story Disambiguation: given a target story, we aim to classify streaming documents (e.g., news articles, blogs, comments, posts, and tweets) according to whether or not they belong to that pre-specified news story.
Labelled data: here
  • PML: Per-Mention Learning Named Entity Disambiguation (Internship at Bell Labs, Nokia, 2016)
We propose a novel per-mention entity disambiguation approach that is both accurate and fast at runtime. Our approach aims to learn the individual peculiarities of entities (words and phrases) in the English language by training a specialized classifier for each ambiguous phrase (i.e., mention).
Reference: PDF
Insight4News is a system that connects news articles to social conversations in order to provide a richer context for ongoing and past news stories. The system extracts relevant topics that summarise the tweet activity around each article, recommends relevant hashtags, and presents complementary views and statistics on the tweet activity, related news articles, and the timeline of the story with regard to the Twitter reaction.
Labelled data for Hashtagger+: here
Reference: PDF

Extracts newsworthy topics from a given Twitter stream every 15 minutes.
Data collection & Python 2 code (final version): here
Reference: PDF
  • Machine Learning & Smoothed Complexity (M.Sc. project, UCC, 2013)
Estimating the smoothed complexity of sorting algorithms using a machine learning approach (linear/non-linear regression & surface fitting).
Reference: PDF
  • Machine Learning with Tree Search for Connect-4 (B.Sc. final year project, UCC, 2012) 
A Connect-4 game agent using unsupervised learning (TD-prediction) and tree search (Minimax).
Reference: PDF