Competency E
Databases
Design, query, and evaluate information retrieval systems.
Introduction
Humans were visiting libraries to retrieve or get help retrieving information long before there were computers, but because of the subsequent digitization of that information, “the concept of information retrieval came to mean the retrieval of bibliographic information from stored document databases” (Chowdhury, 2019, p. 93). Databases and other information retrieval systems are now the predominant way we store, organize, and access information.
The goal of an information retrieval system is to make it as quick and easy as possible for a user to find the information they seek. Competency E tells me that I must have intimate knowledge of such systems in order to best serve patrons. That knowledge extends to the principles guiding how to design and build information retrieval systems, how to search within a system in order for it to return the best results, and how to evaluate the efficacy of different aspects of the system.
Design
Designing an information retrieval system begins with two questions: Who is going to use it, and how are they likely to use it? Answering these questions requires thorough research of the user group, both prior to and during the design process. My coursework taught me that a well-designed system will be good not only at returning results that are relevant to this user group, but also at not providing irrelevant information. In the language of information retrieval, this means understanding the difference between “recall,” or the system’s ability to retrieve all relevant documents, and “precision,” or the system’s ability to retrieve only relevant documents.
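To make the distinction between recall and precision concrete, the following is a minimal sketch in Python. The record identifiers and the function name precision_recall are hypothetical and chosen only for illustration. It treats precision as the share of retrieved records that are relevant and recall as the share of relevant records that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: record IDs the system returned
    relevant:  record IDs that actually meet the user's need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records returned, 3 of them relevant,
# out of 6 relevant records in the whole collection.
retrieved = {"r1", "r2", "r3", "r9"}
relevant = {"r1", "r2", "r3", "r4", "r5", "r6"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.5
```

In this example the system is fairly precise (three of the four results are useful) but misses half of what the user would have wanted, which is the kind of trade-off a designer has to weigh for a given user group.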
One way a database designer can improve search results is through aggregation, discrimination, and disambiguation. For example, somebody might want to search a database for iPhones. Aggregation means the system returns all results that fit that description, that is, all records having to do with iPhones. Discrimination means the database can distinguish between records of iPhones and those of Android and other phones and return only the appropriate results. Disambiguation, meanwhile, makes it clear which term a person inputting data should assign to a record and which term the user should search for. The user should not have to search both “iPhone” and “Apple phone” to find all the results they want. Similarly, if they want to search for Android phones, they should not be met with records about robots.
The information retrieval system designer has certain tools they can use to deliver upon these principles. Database designers must carefully define the attributes of the system’s records. In this case, the attributes are the characteristics of the documents that serve as the records. They are the terms that describe the document, and in particular, the ones that the user group might search for.
The designer might also wish to employ a controlled vocabulary, which is a list of standardized terms that users can search for in a database. Van Hooland and Verborgh (2014) wrote that a controlled vocabulary can help “avoid the problems which arise with the use of natural language during the indexing and retrieval of information” (p. 112). In my coursework building controlled vocabularies for databases, I learned how they make it easier for all people entering data into the system to do so consistently, using the same terms. This in turn makes it easier for a database user to be confident that they are retrieving all the records associated with that term—there is no other term they should be searching. Thus, it can improve recall. Furthermore, because it narrows down the choices available to the user without limiting the information retrievable, a controlled vocabulary can also improve precision—the user will get just the records that are associated with that term.
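The effect of a controlled vocabulary on indexing and searching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the record identifiers, and the function names (normalize, index_record, search) are all hypothetical and do not reflect how any particular database application works internally.

```python
# Hypothetical controlled vocabulary: every accepted variant maps to
# exactly one preferred term.
PREFERRED_TERMS = {
    "iphone": "iPhone",
    "apple phone": "iPhone",
    "android phone": "Android phone",
}


def normalize(term):
    """Return the preferred term for a variant, or None if it is not listed."""
    return PREFERRED_TERMS.get(term.strip().lower())


def index_record(index, record_id, raw_term):
    """File a record under its preferred term; reject terms outside the vocabulary."""
    preferred = normalize(raw_term)
    if preferred is None:
        raise ValueError(f"'{raw_term}' is not in the controlled vocabulary")
    index.setdefault(preferred, set()).add(record_id)


def search(index, raw_term):
    """Look up records by the preferred term behind whatever the user typed."""
    return sorted(index.get(normalize(raw_term), set()))


index = {}
index_record(index, "r1", "iPhone")
index_record(index, "r2", "Apple phone")
index_record(index, "r3", "Android phone")

print(search(index, "apple phone"))  # ['r1', 'r2']
print(search(index, "iPhone"))       # ['r1', 'r2']
```

Because both indexers and searchers are routed to the same preferred term, either search retrieves both iPhone records (recall) and neither retrieves the Android record (precision).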
Querying
Perhaps the greatest lesson I learned from my coursework on information retrieval systems is the relationship between how information is stored and how to go about searching for that information. If information is stored, say, by subject rather than author, then, accordingly, if I search by author, my search will probably come up blank. This example is simplistic, yet nevertheless reflects the essence of this key principle of information retrieval system querying.
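A small Python sketch can illustrate this principle. The records, field names, and the build_index helper below are hypothetical. The point is only that a lookup succeeds when the query matches the field the index was built on.

```python
# Hypothetical records with two descriptive fields.
records = [
    {"id": "r1", "author": "Author A", "subject": "cataloging"},
    {"id": "r2", "author": "Author B", "subject": "reference services"},
    {"id": "r3", "author": "Author A", "subject": "reference services"},
]


def build_index(records, field):
    """Map each value of the chosen field to the IDs of records that carry it."""
    index = {}
    for record in records:
        index.setdefault(record[field], []).append(record["id"])
    return index


# The system stores information by subject only.
subject_index = build_index(records, "subject")
print(subject_index.get("reference services", []))  # ['r2', 'r3']
print(subject_index.get("Author A", []))            # []

# Only an index built on the author field can answer an author search.
author_index = build_index(records, "author")
print(author_index.get("Author A", []))             # ['r1', 'r3']
```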
There are many different types of information retrieval systems, including full-text databases, online public access catalogs (OPACs), and web search engines, and successful interactions with any of them require knowing the best way to query them. Most systems allow for different search strategies—you can find a certain resource by searching for the author or subject or both—but in each case, there are optimal strategies as well as limitations to querying. My coursework has made me a better searcher by making me aware of this fact and the need to research any given information retrieval system’s design and its resulting limitations and optimal strategies so that I can find better results faster.
Evaluation
Understanding the principles underlying the design and querying of information retrieval systems is also essential to fulfilling another role I might be called upon to play as an information professional: evaluating systems for use or purchase at my organization. Databases can be expensive, so it is important to be able to compare different systems and choose the one that best meets the institution’s needs.
My coursework showed me the importance of evaluating different systems to compare how well they returned sought-after results, as measured by recall and precision. I learned to differentiate between a system’s interface, which is how the user interacts with it, and its functionality, which is how well it performs searches. Bad searches on the part of the user can lead to bad results, but I also learned how good databases can help foster good searches. For example, I now know that I should evaluate a system’s disambiguation according to how easy it is to determine which search terms I should use.
In evaluating a system, I would pay attention to how and why a database uses a controlled vocabulary, natural language searching of the full text of records, or a classification system. This last organization method uses categories in hierarchies (from general to specific) and can facilitate the discovery of records in different but related topics.
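As a rough illustration of how a hierarchical classification supports discovery, the Python sketch below models a tiny hierarchy. The category names and the functions path_to_root and siblings are hypothetical and not drawn from any real classification scheme.

```python
# Hypothetical hierarchy: each category maps to its parent (None at the top).
PARENT = {
    "Health sciences": None,
    "Nursing": "Health sciences",
    "Public health": "Health sciences",
    "Pediatric nursing": "Nursing",
}


def path_to_root(category):
    """List a category and its ancestors, from specific to general."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def siblings(category):
    """Return the other categories that share the same parent."""
    parent = PARENT[category]
    return sorted(c for c, p in PARENT.items() if p == parent and c != category)


print(path_to_root("Pediatric nursing"))  # ['Pediatric nursing', 'Nursing', 'Health sciences']
print(siblings("Nursing"))                # ['Public health']
```

Moving up the path broadens a search, while looking at sibling categories points the user toward different but related topics.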
One more key to evaluating information retrieval systems is understanding the institution’s needs in the first place. As Weedman (2019) put it, “Concrete knowledge of both the users and the subject domain are important in evaluating as well as in designing an information system” (p. 17).
Evidence
Evidence 1: INFO 202 Information Retrieval System Design – Database Prototype
As part of a group project, my teammates and I designed a database prototype in the web-based application WebData Pro. In this document, I was the primary writer of The Scenario section. Writing this section required me to think through how the user group would use the database, and to explain the database’s structure so as to reduce user confusion and improve search results.
Together, my teammates and I built out the Data Structure Plan table and defined the values to describe the attributes of the hats within the database. We then worked individually to write the definition and indexer entry rules for two fields each, with mine being Size and Gender. Finally, we worked as a group again to build the database in WebData Pro.
This piece of evidence shows that I am able to think through and execute the steps necessary to design and build a database that makes it easy for users to find the information they seek. It taught me that when designing a database, there are two users to consider: the indexer and the target user. This stressed to me the importance of making sure indexer rules are clear, systematic, and consistent from field to field—rules that avoid being vague while still giving the indexer the freedom they need to create accurate and descriptive records for the end user.
Furthermore, WebData Pro is a popular database application and representative of similar programs, and this project shows my ability to learn and work in this type of technology.
Evidence 2: INFO 202 Information Retrieval System Design – Controlled Vocabulary
This piece of evidence displays my ability to build a controlled vocabulary for use in a database. Through this exercise I learned that such a list of standardized terms for indexing records can help ensure consistency in database entries and spare indexers and users from having to figure out which term to use, reducing confusion and time spent entering fields and searching.
In this group project, my teammates and I contributed three articles each to serve as records for the controlled vocabulary. My three were Mat-Hassan and Levene (2001), Nielsen (2013), and Zhang (2014). In successive steps we defined the target user group for the database, individually identified the main concepts for the records we contributed, rejoined as a group to turn those concepts first into draft terms and then into a final vocabulary list, and, lastly, indexed the records using that list.
This exercise demonstrates that I am able to work alone and on a team to consider and describe the intended audience for a database in detail, to then create a targeted and user-friendly controlled vocabulary specifically for that audience, and to use that vocabulary to index records, again with that user group in mind.
Evidence 3: INFO 210 Reference and Information Services – Database Evaluation
In sections 4 and 5 of this assignment, I was called upon to play the role of librarian in evaluating two similar databases under consideration for purchase. Throughout, I kept in mind Weedman’s (2019) edict that “The key to evaluating an information system is relevance—does it retrieve information the user wants and avoid information the user doesn't want?” (p. 16). These exercises let me explore and evaluate real-world databases I might encounter as an information professional in a healthcare industry setting. I learned to compare the effectiveness of different databases using criteria such as coverage (the number and extent of resources), searchability (the control the database gives users in conducting a search via advanced options), discovery of content through features such as filters to refine results, and currency.
In section 3, I accessed and compared results from different databases within the same topic area (healthcare). This gave me insights into the different sources that similar searches on different databases can yield. It also made me aware of the different ways that databases can be organized, as evidenced by the database’s search fields and types of sources (e.g., video vs. only text-based documents).
In section 6, I compared different web portals for bibliographic information. Here I considered components such as usability, organization, thoroughness of record fields, and discovery of similar resources. This was a valuable exercise in critically evaluating web portals and search engines for my own use on the job and for recommending to patrons.
Together the exercises in this assignment gave me a good deal of hands-on experience with different types of information retrieval systems, and provided me the guidance and tools I could apply to evaluating them methodically and objectively in a future job.
Conclusion
The abilities to design, query, and evaluate different types of information retrieval systems all rest upon knowledge of some of the same key principles. Some of these principles are more abstract, such as understanding the user group and their information needs and searching behaviors, and some are more concrete, such as the technical aspects of building a database in an application such as WebData Pro. One of the most important principles I will keep in mind is the effect the information retrieval system’s organization has upon the best way to search that system. Armed with these principles, I believe I am set up for success in a future role that might call upon me to work in most any capacity with databases, web portals, search engines, and other types of information retrieval systems.
References
Chowdhury, G. G. (2019). Basic concepts of information retrieval. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (6th ed., pp. 93–105). AcademicPub/XanEdu.
van Hooland, S., & Verborgh, R. (2014). Linked data for libraries, archives and museums: How to clean, link and publish your metadata. https://ebookcentral.proquest.com/lib/sjsu/detail.action?docID=1993231
Weedman, J. (2019). Information retrieval: Designing, querying, and evaluating information systems. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (6th ed., pp. 6–20). AcademicPub/XanEdu.