Data Mining in Libraries


Mining the Library Catalog: Emerging Trends
A Literature Survey by Dr. Mohamed Taher



"It allows us an opportunity to reinvent what people think a library catalog is," says James Michalko, RLG, describing a project: Union Catalog on the Web and Catalogs: The next generation

"When the costs of mining a particular source -- say, an old-fashioned library card catalog, for instance -- begin to outweigh the residual benefits, we tend to switch to a new 'information patch,' say an Internet Web site, database or search engine..." (User Interface Research)



A Literature Survey

What is Bibliomining?

The basic definition is "data mining for libraries."

For years, bibliometrics has been used to track patterns in authorship, citation, etc. Today, many more tools from data mining and statistics are available for discovering similar patterns in complex datasets. In addition, tools from management science such as Online Analytical Processing (OLAP) can be used to explore the data for patterns.

Therefore, a more complex definition is:
Bibliomining is the combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior-based artifacts from library systems.
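
As a rough illustration of the OLAP-style exploration mentioned above, the following Python sketch counts checkouts along chosen dimensions. It is only a minimal example under stated assumptions: the records, the field names (patron_type, subject, month), and the roll_up helper are all hypothetical and are not taken from any library system or from the works surveyed here.

```python
from collections import Counter

# Hypothetical, anonymized circulation records: (patron_type, subject, month).
# Field names and values are illustrative only, not from a real library system.
CHECKOUTS = [
    ("undergraduate", "history", "2005-01"),
    ("undergraduate", "history", "2005-02"),
    ("graduate", "chemistry", "2005-01"),
    ("faculty", "history", "2005-01"),
    ("graduate", "chemistry", "2005-02"),
    ("undergraduate", "chemistry", "2005-02"),
]

# Position of each dimension inside a record tuple.
DIMENSIONS = {"patron_type": 0, "subject": 1, "month": 2}


def roll_up(records, *dims):
    """Count checkouts grouped by the chosen dimensions (an OLAP-style roll-up)."""
    idx = [DIMENSIONS[d] for d in dims]
    return Counter(tuple(rec[i] for i in idx) for rec in records)


# Coarse view: one dimension. Finer view: two dimensions (a "drill-down").
by_subject = roll_up(CHECKOUTS, "subject")
by_type_and_subject = roll_up(CHECKOUTS, "patron_type", "subject")

for key, count in sorted(by_type_and_subject.items()):
    print(key, count)
```

Grouping on fewer dimensions gives the summary view; adding a dimension drills down into it. A real bibliomining project would more likely run such queries against a data warehouse than against in-memory tuples.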

See other works by Scott Nicholson

  • Best Practices: Data Warehousing Project in Libraries by Joe Zucca at the University of Pennsylvania's Data Farm

  • e-Services Projects. Center for Software Engineering at USC.  Data Mining the Library Catalog (Team 14a)

  • A Glimpse of Trends in Developmental Perspectives:

The goal of the Normative Data Project for Libraries (NDP) is to compile transaction-level data from libraries throughout North America; to link library data with geographic, demographic, and other key types of data; and, thereby, to empower library decision-makers to compare and contrast their institutions with real-world industry norms on circulation, collections, finances, and other parameters.

About the OCLC WorldMap
The OCLC WorldMap is a prototype system that provides an interactive visual tool for selecting and ... Access the OCLC WorldMap and select a dataset. ... researchworks/worldmap/default.htm

Public Library Geographic Database (PLGDB) Mapping
Welcome to the Public Library Geographic Database (PLGDB). The database includes the locations of America’s 16,000 public libraries, population characteristics from the US Census that best describe the people who use libraries, and library use statistics from the National Center for Education Statistics. Florida State University's GeoLib Program is developing this first-ever National Public Library Geographic Database. The project partner is FSU’s Information Institute.



  • Information visualization aesthetics @ The Seattle Public Library:
    [Making Visible the Invisible: a data visualization installation consisting of 6 large LCD screens located horizontally on a glass wall behind the librarians’ main information desk in the Seattle Central Library (designed by Rem Koolhaas)...
    "Floating titles" shows the sequence of checked-out items as a linear representation based on time. "Dot matrix rain" displays checked-out items by time and by their Dewey Classification number.
    "Keyword map attack" positions and colors keywords of checked-out titles by the average of their Dewey subcategory links. An interesting quote from its main designer, George Legrady, on nwsource: "What's the result? 'It's information visualization aesthetics,' he said, beaming. 'I made it up.'"]
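
The idea of arranging checked-out items by time and Dewey number can be sketched in a few lines of Python. This is not the installation's actual code, only a minimal illustration under assumptions: the checkout events, the hour-level time bins, and the helper names dewey_main_class and bin_by_hour_and_class are all hypothetical.

```python
from collections import defaultdict

# Hypothetical checkout events: (hour of day, Dewey call number as a string).
EVENTS = [
    (10, "005.133"),
    (10, "641.5"),
    (11, "004.678"),
    (11, "813.54"),
    (11, "823.912"),
]


def dewey_main_class(call_number):
    """Return the Dewey main class (0, 100, ..., 900) for a call number."""
    whole = int(call_number.split(".")[0])
    return (whole // 100) * 100


def bin_by_hour_and_class(events):
    """Count checkouts per (hour, Dewey main class) cell."""
    grid = defaultdict(int)
    for hour, call_number in events:
        grid[(hour, dewey_main_class(call_number))] += 1
    return dict(grid)


grid = bin_by_hour_and_class(EVENTS)
for (hour, main_class), count in sorted(grid.items()):
    print(f"{hour:02d}:00  {main_class:03d}  {count}")
```

Each (hour, class) cell could then be drawn as a column or a dot whose size reflects the count; the drawing step is left out here.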

  • Chau, May Y. Web Mining Technology and Academic Librarianship: Human-Machine Connections for the Twenty-First Century. First Monday, Volume 4, 1999
    [Abstract: John Naisbitt predicted in his book Megatrends (1980) that high technology would bring the need for "high human touch." This prediction is reflected in today's information-intense world. Due to the rapid development of technology, the library profession faces an uncertain future. Library professionals must use insight to identify technology's potential to benefit the academic library's role in the twenty-first century. This paper focuses on the human-machine connection between academic librarians and Web mining technology with respect to electronic reference service. The connection is featured in processes of: (a) identifying problems of electronic reference service; (b) selecting a technology to solve the problem; and, (c) envisioning the potential of the selected technology for librarianship. Scenarios address pertinent questions, including: (a) What role should librarians play to facilitate implementation of a technology? and, (b) What opportunities does technology offer to the profession in return?]

  • Banerjee, Kyle  Is Data Mining Right for Your Library? [1999?] 
    [Extract: Before committing to data mining technologies on a large scale libraries need to determine how data mining fits with existing resources and organizational goals. Generally speaking, data mining technologies are most beneficial to libraries that are interested in purchasing access to databases rather than physical materials. Full text, dynamically changing databases tend to be better suited to data mining technologies than the online catalog which is cumbersome and expensive to update. On the other hand, libraries concerned with providing long term access to physical items which exist within the library would be well advised to adopt a sit and wait attitude at this point -- especially since good access to these materials is provided through the online catalog.]

  • Humphreys, R. M. Beyond the Database -- "Mining" the APS Catalog of the POSS I. American Astronomical Society, 195th AAS Meeting, #30.01; Bulletin of the American Astronomical Society, Vol. 31, p.1413
    [Abstract: The size of astronomical databases is growing rapidly; a more than 40 TB-sized, Internet-wide astronomical dataset will soon exist. The size of the databases plus the complexity and variety of astronomical data present new computational challenges. To derive the maximum scientific benefit from this vast multi-wavelength resource will require efficient access, such as federated databases that can query several databases, and new software techniques and tools often referred to as "data mining." The APS Catalog of the POSS I is an excellent resource for perfecting and testing these data mining techniques in an astronomical environment. The morphological classification of galaxies in the APS Catalog is presented as an example of applying these methods. The APS Project is supported by NASA and the University of Minnesota.]

  • J.E. Wolff, J. Kalinski: "Mining Library Catalogues: Best-Match Retrieval based on Exact-Match Interfaces". Proceedings of the Int. Workshop on Issues and Applications of Database Technology (IADT'98). Society for Design and Process Science, IDPT-Vol.2, pages 376-383. Berlin, Germany. July 1998.


A model catalog:
  • AquaBrowser Library: Search finds results that are relevant to you. AquaBrowser Library allows patrons to search for information using a standard query box, the way they are used to doing. The results are portrayed in a typical browser search list by relevance to the query terms given by the patrons. Clicking a result takes the patron to the exact sources, both inside and outside the library.
