Literature Review

While researching my final project on text mining, I found the following sources to be the most informative. In this literature review section of my Practicum project, I discuss how each article relates to my topic and how each offered me a distinct perspective on text-mining software.

  • Text Mining and Data Mining in Knowledge Organization and Discovery: The Making of Knowledge-Based Products


This article by L.J. Haravu and A. Neelameghan describes advances in text and data mining for knowledge-based products. Drawing on their research and their experience with these technologies, Haravu and Neelameghan propose two essential methods for creating such platforms. The first is applying natural language processing software to text mining. The second is planning, designing, and developing a comprehensive multimedia product that satisfies its target audience's needs. The authors also use text-mining software to link concept terms from a processed text to a related thesaurus, a process similar to the one used at Condé Nast. They argue that text and data-mining products can become more useful only if the features of a subject classification system are incorporated into text-mining techniques and products. In other words, the specialized role of human language technologies in library and information science has the potential to become standardized, and therefore predictable.


Haravu, L.J., and A. Neelameghan. "Text Mining and Data Mining in Knowledge Organization and Discovery: The Making of Knowledge-Based Products." Knowledge Organization and Classification in International Information Retrieval. Ed. Nancy J. Williamson and Clare Beghtol. Binghamton, NY: Haworth, 2003.
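
The thesaurus-linking step described in the Haravu and Neelameghan summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not Haravu and Neelameghan's software, and it is not the tool used at Condé Nast. The thesaurus entries are hypothetical, and the matching is a simple longest-phrase lookup that maps variant terms in a text to their preferred concept terms.

```python
import re

# Hypothetical thesaurus for illustration: each preferred concept term lists
# its variant (non-preferred) forms and one broader term.
THESAURUS = {
    "evening gown": {"variants": {"evening dress", "ball gown"}, "broader": "dress"},
    "haute couture": {"variants": {"couture"}, "broader": "fashion design"},
    "trench coat": {"variants": {"trenchcoat"}, "broader": "coat"},
}


def build_lookup(thesaurus):
    """Map every surface form (preferred or variant) to its preferred term."""
    lookup = {}
    for preferred, entry in thesaurus.items():
        lookup[preferred] = preferred
        for variant in entry["variants"]:
            lookup[variant] = preferred
    return lookup


def link_concepts(text, thesaurus, max_ngram=3):
    """Return preferred thesaurus terms found in text, in order of first appearance."""
    lookup = build_lookup(thesaurus)
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lookup:
                term = lookup[phrase]
                if term not in found:
                    found.append(term)
                i += n
                break
        else:
            i += 1  # no phrase matched here; move to the next token
    return found


if __name__ == "__main__":
    text = "The collection paired a silk evening dress with a classic trenchcoat, a nod to couture."
    terms = link_concepts(text, THESAURUS)
    print(terms)
    print([THESAURUS[t]["broader"] for t in terms])
```

The sketch prefers the longest match at each position. This way, a phrase such as "evening dress" is linked as one concept rather than as two separate words. A production system would also need stemming, disambiguation, and the kind of human review described elsewhere in this review.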

  • Mining and Tracking Evolving Web User Trends from Large Web Server Logs


This article, written by computer engineers, relates closely to Condé Nast's text-mining objectives through its discussion of web usage mining, user profiles, web analytics, and data streams. Publishing organizations have recently begun dedicating resources to tracking users' behavior on their online databases in order to better understand and satisfy those users' needs. In response, web usage mining tools were developed to help these organizations analyze web server logs and discover usage patterns and user profiles. Many publishing companies treat this information as valuable evidence, or case studies, for usability. With this data, companies like Condé Nast are also better able to generate accurate text-mining vocabularies that satisfy their target audiences.


Hawwash, Basheer, and Olfa Nasraoui. "Mining and Tracking Evolving Web User Trends from Large Web Server Logs." Statistical Analysis and Data Mining 3.2 (2010): 106-125. Wiley Periodicals, Inc., MA, 03/11/2010.
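
The basic log-mining step behind web usage mining can be illustrated with a minimal Python sketch. The sketch makes two assumptions that are common conventions rather than details taken from the article. The first is that the server writes Apache-style Common Log Format lines. The second is that a 30-minute idle gap ends a user session. The sketch only groups requests into sessions and counts how many sessions touch each page. Hawwash and Nasraoui's own method clusters sessions into evolving profiles over a data stream and is considerably more involved. The log lines and paths below are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
# Assumed idle gap that ends a session (a common heuristic, not from the article).
SESSION_TIMEOUT = timedelta(minutes=30)

# Hypothetical log lines, for illustration only.
LOGS = [
    '10.0.0.1 - - [17/Aug/2010:10:00:00 -0500] "GET /vogue/runway HTTP/1.1" 200 512',
    '10.0.0.1 - - [17/Aug/2010:10:05:00 -0500] "GET /vogue/archive HTTP/1.1" 200 1024',
    '10.0.0.1 - - [17/Aug/2010:11:00:00 -0500] "GET /vogue/runway HTTP/1.1" 200 512',
    '10.0.0.2 - - [17/Aug/2010:10:01:00 -0500] "GET /vogue/archive HTTP/1.1" 200 1024',
    '10.0.0.2 - - [17/Aug/2010:10:02:00 -0500] "GET /missing HTTP/1.1" 404 0',
]


def parse(lines):
    """Yield (host, timestamp, path) for successful (2xx) GET requests."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and m["method"] == "GET" and m["status"].startswith("2"):
            ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield m["host"], ts, m["path"]


def sessionize(records):
    """Group each host's requests into sessions split by long idle gaps."""
    by_host = defaultdict(list)
    for host, ts, path in records:
        by_host[host].append((ts, path))
    sessions = []
    for hits in by_host.values():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(path)
        sessions.append(current)
    return sessions


def page_profile(sessions):
    """Count in how many sessions each page appears (a crude usage profile)."""
    return Counter(page for session in sessions for page in set(session))


if __name__ == "__main__":
    sessions = sessionize(parse(LOGS))
    print(sessions)
    print(page_profile(sessions).most_common())
```

In this sample, the first visitor's 55-minute gap splits that visitor's requests into two sessions, and the 404 request is discarded. Counts like these are the raw material from which richer user profiles are built.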


  • Nstein’s TME 5.0: Optimize Your Web Content for the Semantic Web


This article by David Roe reviews TME (Text Mining Engine), a software program from Nstein that is similar to the software from the [unnamed text-mining software company] that Condé Nast uses for its keyworder project. One of Roe's most important points connects the semantic web with text analytics. Condé Nast adopted a similar program when it began facing an overload of content and unstructured data. Such a program applies a layer of semantically interpreted metadata alongside the content. This unlocks the information, making it more visible, understandable, organized, centralized, and ready for analysis. The software automatically annotates the unstructured data, identifying context, meaning, people, categories, and entities, and outputs a standardized language. Many text-mining software companies claim that their products identify nuance and meaning in content and prepare it for various applications. However, my experience in this position at Condé Nast and my project with the Vogue Digital Archive showed otherwise. In this role, I was required to edit the software's output and to supply it with specific text-mining vocabularies.


Roe, David. “Nstein’s TME 5.0: Optimize Your Web Content for the Semantic Web.” CMSWire. Simpler Media Group, Inc. 6/08/2009.



  • Developing Affective Lexical Resources


This article addresses lexicons and their connection to text mining. In linguistics, the lexicon of a language is its stock of expressions, words, and vocabularies. In other words, a lexicon is a language's inventory of lexemes and the patterns in which they combine, much like the vocabularies used in text mining. Affective computing is a steadily advancing field that enables new forms of human-computer interaction alongside the use of standardized natural language. There is a common perception that the future of human-computer interaction lies in areas such as entertainment, aesthetics, and publishing, to name a few. The article examines the relationship between natural language and affective information and how that information can be treated computationally, and it presents this work as crucial to future development. Later in the article, the authors present WORDNET-AFFECT, another linguistic resource for the lexical representation of affective knowledge, which competes with Condé Nast's text-mining software.


Valitutti, Alessandro, and Carlo Strapparava. "Developing Affective Lexical Resources." PsychNology Journal 2.1 (2004): 61-83.


  • Template Matching Techniques in Computer Vision: Theory and Practice


This book addresses pattern recognition, which aims to classify data, or patterns, based on statistical information that computers extract from those patterns. The patterns being classified are represented as points in a multidimensional space. At Condé Nast, staff first receive the automatic text-mining results and manually correct the computer-generated output into accurate language. They then analyze these findings to create their own vocabularies and patterns that codify their target audience's customary language. Condé Nast also uses facial recognition technology in its model identification projects. Brunelli discusses the detection of objects in images and its relation to the interpretation of data. He also focuses on template matching, a widely applicable subset of object recognition techniques that has proved particularly effective for face recognition.


Brunelli, Roberto. Template Matching Techniques in Computer Vision: Theory and Practice. MA: Wiley Publications, 2009.
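
Template matching, as summarized above, slides a small template across an image and scores each position by how similar the underlying window is. The minimal Python sketch below uses zero-mean normalized cross-correlation, one standard similarity measure for this purpose. It is an illustration under stated assumptions rather than code from Brunelli's book. Images are plain 2-D lists of grayscale values, and the search is brute force for clarity rather than speed.

```python
import math


def ncc(window, template):
    """Zero-mean normalized cross-correlation of two equal-sized 2-D lists."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    w_mean = sum(w) / len(w)
    t_mean = sum(t) / len(t)
    num = sum((a - w_mean) * (b - t_mean) for a, b in zip(w, t))
    den = math.sqrt(sum((a - w_mean) ** 2 for a in w) * sum((b - t_mean) ** 2 for b in t))
    # A perfectly flat window or template has no variance; treat it as
    # uncorrelated (a convention chosen for this sketch).
    return num / den if den else 0.0


def match_template(image, template):
    """Return ((row, col), score) of the best-matching top-left position."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -math.inf
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    image = [[rng.random() for _ in range(20)] for _ in range(20)]
    template = [row[7:12] for row in image[5:9]]  # cut a 4x5 patch at (5, 7)
    print(match_template(image, template))
```

Because the score is normalized, it ranges from -1 to 1. A copy of the template whose brightness has been uniformly scaled and shifted (for example, 2 * value + 3) still scores about 1.0 at the same position. This insensitivity to lighting changes is one reason correlation-based matching has been useful for faces.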



**Note: Because this website is open to the public, I have omitted all confidential company information and specific software names.

Hadass Blank,
Aug 17, 2010, 9:11 AM