Design, query, and evaluate information retrieval systems.
Information retrieval systems are a powerful component of catalogs, allowing users of all kinds to browse through collections of games, products, and of course, in the case of libraries, books. However, there is much more going on below the surface of these searches than one might first assume. Specific tagging tools are used to classify information, different types of search engines are utilized for different items, and ease of access is (or should always be) considered on behalf of the user. Additionally, search strategies and search syntax can drill down to specific topics and ensure relevant results. Finally, the effectiveness of an information retrieval system can be evaluated by how well it produces the desired results. By considering all of these elements, the design, query process, and overall effectiveness of information retrieval systems can be judged – and improved.
In order to create an information retrieval system, the first thing to be considered is the content. Such systems are designed so that individuals may search for specific artifacts according to their most notable characteristics. These characteristics are recorded as metadata fields, which are selected based on the needs (or perceived needs) of the user. For videos, fields like quality level, length, and creator could be discerning factors. For physical objects like socks, fields could include style, color, and material. Regardless of the specific metadata attached to a digital object, the fact that it can be filtered and searched for is a crucial part of the information retrieval system design process (Tucker, 2021). Designers must determine which fields are searchable, whether each operates via text entry or dropdown selection, and whether it can be toggled on or off, so that users can search for items with the most flexibility.
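To make the idea of toggleable metadata fields concrete, the following is a minimal sketch in Python rather than a description of any real catalog. The sock records, the `CATALOG` list, and the `filter_records` function are all hypothetical; only the field names (style, color, material) come from the example above.

```python
# Hypothetical catalog of sock records. The field names mirror the
# style/color/material example in the text; the values are invented.
CATALOG = [
    {"title": "Trail Crew", "style": "crew", "color": "gray", "material": "wool"},
    {"title": "Summer Ankle", "style": "ankle", "color": "white", "material": "cotton"},
    {"title": "Office Crew", "style": "crew", "color": "black", "material": "cotton"},
]


def filter_records(records, **facets):
    """Return the records that match every facet that is switched on.

    A facet passed as None is treated as toggled off and is ignored,
    which mimics a search form where the user leaves a field blank.
    """
    active = {field: value for field, value in facets.items() if value is not None}
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in active.items())
    ]


# Search by style and material, leaving the color field toggled off.
matches = filter_records(CATALOG, style="crew", material="cotton", color=None)
print([record["title"] for record in matches])
```

In this sketch, leaving every field off returns the whole collection, while each field that is switched on narrows the results further.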
However, the onus of effectively utilizing an information retrieval system is not entirely placed upon the designers. While they are required to create the system according to the uses that they perceive as the most critical, specific search strategies and syntax can be utilized on the user-side of a search to improve the quality of the results. One of the easiest ways to yield high-quality search results, regardless of designer implementation, is with Boolean operators. These operators are words such as AND, OR, NOT (Owais, 2006), and can be included in searches to preclude or yield certain results. For example, if one was looking for information on a state, but not an archeologist, one could use the search ‘Indiana NOT Jones.’ By understanding the overlapping Venn diagrams that can be created by Boolean logic, users can become better equipped to browse for their intended material.
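The Venn diagram view of Boolean operators maps directly onto set operations. The sketch below is one simple way to illustrate this, assuming a tiny invented collection of four document titles and a basic word-level index; real search engines are far more elaborate, and the names `DOCUMENTS`, `build_index`, and `postings` are hypothetical.

```python
# Hypothetical four-document collection; the titles are invented.
DOCUMENTS = {
    1: "Indiana state parks guide",
    2: "Indiana Jones film review",
    3: "Jones family genealogy",
    4: "Ohio state parks guide",
}


def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)


def postings(term):
    """Return the set of document ids that contain the term (empty if none)."""
    return INDEX.get(term.lower(), set())


# Each Boolean operator corresponds to one region of the Venn diagram.
indiana_not_jones = postings("Indiana") - postings("Jones")  # NOT: set difference
indiana_and_jones = postings("Indiana") & postings("Jones")  # AND: intersection
indiana_or_jones = postings("Indiana") | postings("Jones")   # OR: union

print(sorted(indiana_not_jones))  # [1]
print(sorted(indiana_and_jones))  # [2]
print(sorted(indiana_or_jones))   # [1, 2, 3]
```

Here the search 'Indiana NOT Jones' keeps the state parks guide and drops the film review, while OR widens the result set and AND narrows it.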
Once a system has been designed and users have applied search strategies to browse its contents, there is still one element left to consider: evaluation. An information retrieval system can only truly be judged by the results it delivers. Two measures are central here: precision and recall. Precision judges a system by what it excludes, or in other words, how relevant the retrieved data is to the initial query. Usually expressed as a percentage, precision is the share of retrieved results that are actually relevant (Sadeli & Lawanda, 2023). Recall, on the other hand, is a measure of what is included: the share of all relevant documents in the collection that the system manages to retrieve for a query (Sadeli & Lawanda, 2023). Ideally, an information retrieval system pulls all of the relevant results for a query and nothing else, yielding both high recall and high precision.
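As a worked illustration, the sketch below computes both measures under their standard definitions: precision is the number of relevant documents retrieved divided by the total number retrieved, and recall is the number of relevant documents retrieved divided by the total number of relevant documents in the collection. The function name `precision_recall` and the document ids are hypothetical and chosen only for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of the documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns four documents, two of which are
# relevant, while the collection holds five relevant documents in total.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
# precision = 50%, recall = 40%
```

The example shows why the two measures must be read together: returning fewer documents can raise precision while lowering recall, and returning everything does the reverse.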
In order to ensure that both high recall and high precision are present in an information retrieval system, extensive testing must be done. The design process outlined in Weedman’s steps of implementation (2008) emphasizes the key step of user feedback. Designers can inadvertently have blinders on when creating an information retrieval system, so it is essential to test systems repeatedly throughout development, so as to avoid frustrating one’s userbase by the time deployment occurs.
Ultimately, the goal of an information retrieval system is to make it easy for an individual to access and acquire information from a database. By educating ourselves and others on proper search syntax, applying appropriate metadata tags, and using extensive testing to ensure high precision and recall, we can develop such systems.
My experience in designing, querying, and evaluating information retrieval systems prior to starting my MLIS was minimal. I knew the basics of Boolean search terms, and was proficient at finding the subjects that I was looking for online. However, as I took courses like INFO 202: Information Retrieval, INFO 248: Introduction to Cataloging and Classification, and INFO 281: Metadata, I found myself falling in love with the complexity of such systems.
I love cataloging and classifying information. My computer is filled with meticulously cataloged folders, all arranged neatly in their appropriate locations, and my clothes, my desk, and my bookshelves are similarly classified. When everything has its place, I feel at ease and better able to work, so learning a new way to classify and organize information is a delight. I have taken to these courses about information retrieval extremely enthusiastically, and have been able to immediately apply what I have learned in other courses. I can now recognize poor design on websites and see how they could be better organized so that their information can be retrieved more easily. I think more carefully about which metadata elements are most relevant for my own tagging systems. After INFO 248, I began regularly looking up MARC records to identify information about books, and my overall ability to search for information has drastically improved. I have built further on these experiences in my HTML course this semester, and feel that without the previous classwork, I would be much more mystified. Overall, reviewing the processes that permit effective information retrieval has made me a more competent reference librarian.
While my initial level of knowledge with designing, querying, and evaluating information retrieval systems was slight, I have truly enjoyed filling in this gap in my knowledge. To demonstrate the proficiency I have gained, I assembled three projects that represent my increasing knowledge over the course of my degree. The first of these is the Yarners of America Database, created with my groupmates in INFO 202. This was our final project, a system that included several types of yarn, and allowed others to add to our database according to our preselected metadata categories. We explored a variety of options for classifying our metadata, but ultimately created a robust system that stood up to the scrutiny of our peers. My second document also came from INFO 202, in the form of our final group project. Here, the same team that I worked with to create the Yarners of America Database attempted to redesign the San Jose Public Library website. At the time, the website was confusing and messy, and it limited the ability of even well-informed Boolean searchers to find results. Our edits included a reinterpretation of the overall site map, and the combination and elimination of minimally used categories to design a much more easily navigable site. Finally, the last document that I believe represents my understanding of how to navigate information retrieval systems is a search activity I completed in INFO 210: Reference Information Services. Being a reference staffperson means being able to utilize proper search syntax oneself, and this assignment challenged me to use Boolean operators and a variety of information retrieval systems to produce certain results. I believe that when taken in aggregate, this collection of evidence displays the growing progress I have made in understanding and utilizing information retrieval systems across my coursework.
The Yarners of America Database was created by my group to classify and categorize different kinds of yarn. We selected yarn because it has several key qualities that we could use for metadata: color, material, weight, brand, and others. Throughout the development of this website, we tested our site rigorously, and tried to find ways that we could break our classification – we initially found that multicolor yarn was a problem. However, thanks to our design process, as we noticed these issues, we were able to resolve them by adding another metadata field: Multicolor: Yes/No. This project also taught us to be sure to practice user-focused development. In addition to elements that would be important to the creation of a fabric art like color, we also included metadata that would define the final product, like washing instructions and price. We learned a lot from this project, and it laid an excellent groundwork for my future understanding of metadata.
For our final group project in INFO 202, my group met once again to attempt to improve the Books and eResources section of the San Jose Public Library website. One of our group members had been employed by this system for a while, and had noticed that a recent update to the site did more to confuse patrons than assist them. Many links were ambiguous, and failed to appropriately reflect the pages that they led to – leading to a very imprecise web search experience. Furthermore, the groupings of the site map felt unintuitive, as the differences between ‘Books and More,’ ‘eBooks and Other Formats,’ and ‘eResources’ would appear to have several overlapping categories. In order to alleviate the most egregious of these issues, we proposed a site redesign. All links were to be checked and appropriately routed. A new site map was created, one that simply categorized items as ‘Books,’ ‘eResources,’ and ‘More.’ We also added categories for resources that were present at the library, but not mentioned openly on the website previously. This exercise allowed us to examine a site that could use some assistance in better providing direction to its patrons, and attempt to apply our understanding of information retrieval systems to improve it.
It is not enough to be merely aware of search strategies and syntax – such skills must be practiced and honed in order to stay sharp. As such, I was excited for the opportunity to utilize Boolean terms and quotation marks in ProQuest database searches. In this assignment, I was able to prune the results I had found by using AND, and expand my results with OR. This practice also reminded us how to filter results by scholarly journals, and the difference between those and peer-reviewed ones. Finally, I was offered the opportunity to explore a topic of my choice (I picked Neon Knights by Black Sabbath) through different search engines to observe what different resources were pulled. This helped to illustrate the concept of recall, and how different search engines will pull from different sources. Overall, I felt like this assignment did a good job of refreshing my familiarity with good search strategies.
The design process and inner workings of information retrieval systems are fascinating to me, and opened to me the possibility that I might like working in the back-end of a library system, should I ever burn out on interacting with the public. The ability to accurately and specifically quantify information is extremely impressive, and I know that the knowledge that I have gained as a result of fulfilling Competency E is going to assist me with organizing my own information even more logically. Being able to refresh myself on the processes that enable effective searching has allowed me to be a better reference librarian, and designing my own IR websites was exactly the kind of fun challenge I enjoy working on. As this was the area that I felt I had the least expertise in at the beginning of my MLIS program, it is now one of the ones that I feel I have made the most satisfying progress in.
Owais, S. S. J. (2006). Optimization of Boolean queries in information retrieval systems using genetic algorithms - Genetic programming and fuzzy logic. Journal of Digital Information Management, 4(4), 249–254.
Sadeli, A. F., & Lawanda, I. I. (2023). Recall, precision, and F-measure for evaluating information retrieval system in electronic document management systems (EDMS). Khizanah Al-Hikmah (Online), 11(2), 231–241. https://doi.org/10.24252/kah.v11i2a8
Tucker, V. M. (Ed.). (2021). Information retrieval system design: Principles & practice (Edition 6.1). AcademicPub/XanEdu.
Weedman, J. (2008). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & B. Sheldon (Eds.), The portable MLIS. Westport, CT: Libraries Unlimited.