E. Design, query, and evaluate information retrieval systems
INTRODUCTION
Information retrieval (IR) is at the heart of librarianship. Being able to design, access, and evaluate information accurately and effectively is crucial to any information professional’s skill set. This entails understanding the ways various information systems store and catalog data. Although configured differently, all search engines have the same basic components. An information system can be anything that stores collections of information in an organized fashion. Phone books, websites, magazines, filing cabinets, and Apple Music are all examples of information systems. For the purposes of this paper, the information systems discussed will be library-related search engines such as databases and Online Public Access Catalogs (OPACs). The key to IR is knowing how information is stored and cataloged, and how to retrieve it: “retrieval depends on two things: the ability of the searcher to construct an incisive query and the ability of the designer to incorporate features that will result in the query retrieving documents with the desired attributes” (Weedman, 2018, p. 184). In this document I will discuss three aspects of IR: design, query, and evaluation.
DESIGN
The key elements of a search engine are its documents, index, ranking model, user query, and retrieved results. Search engines are IR programs designed to search a database of document records and identify the items that correspond to keywords or characters specified by the searcher. In the library world, the word document, or “object in a collection” (Tucker, n.d.b, slide 2), is the term used to represent “any information-bearing entity” (Weedman, 2017c, p. 29) or data being stored. Designers must have a clear understanding of who will be accessing and using the system in order to determine how documents should be stored: “it is important to know what information users are expecting to find and to provide them with clear links so they don’t waste time searching for information” (Weedman, 2017a, p. 401).
Each digital record in a search engine is assigned metadata, either manually or automatically by a software program. According to the National Information Standards Organization (NISO), metadata is “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (Weedman, 2018, p. 175). Indexer software contains a large index file or controlled vocabulary that contains the words and phrases found in these documents. When an information seeker enters keywords into a search engine, the program searches the system’s index, a file separate from the records that can be sorted and optimized for searching or querying. It is therefore crucial that designers building a search engine install an extensive index, which enables the cataloguer to input sufficient data about each record for findability: “if a term isn’t present in a vocabulary, then for all practical purposes, that term doesn’t exist” (Weedman, 2017b, p. 126).
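The relationship between records and a separate index can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the records, titles, subject terms, and function names below are hypothetical and are not drawn from any real catalog or from the sources cited here.

```python
from collections import defaultdict

# Hypothetical records: titles and subject terms are invented for illustration.
RECORDS = {
    1: {"title": "Freshwater Lakes of North America", "subjects": ["lakes", "freshwater ecology"]},
    2: {"title": "Ocean Currents and Climate", "subjects": ["oceans", "climate"]},
    3: {"title": "Lakes and Oceans Compared", "subjects": ["lakes", "oceans"]},
}


def build_index(records):
    """Map each subject term to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for term in record["subjects"]:
            index[term.lower()].add(record_id)
    return dict(index)


def lookup(index, term):
    """Return sorted record ids for a term; an unknown term returns an empty list."""
    return sorted(index.get(term.lower(), set()))


# The index is stored separately from the records and is what a query consults.
INDEX = build_index(RECORDS)
print(lookup(INDEX, "lakes"))   # [1, 3]
print(lookup(INDEX, "rivers"))  # []
```

The second lookup returns nothing because "rivers" was never assigned to a record, which mirrors the point that a term absent from the vocabulary effectively does not exist for the searcher.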
Extensive indexing correlates with the ability to find information. It also dictates the way objects in a collection are categorized, classified, catalogued, and stored in an IR system. The type of information system, whether it is a database, OPAC, or website, determines which design principles and standards will be used. Regardless of the type of system, according to Tucker (n.d.a), the desired outcome is selectivity and the ability to aggregate and discriminate when searching (slide 7).
The ability of the seeker to aggregate (group like things together) and discriminate (reject unwanted items) depends on how well the system is designed, as well as on the cataloguer’s thoroughness and attention to detail. When designing or cataloging, it is imperative to use a standardized system with a ranking model that stores information in a way the information seeker will be able to understand and use. An IR system must be designed so that cataloguers can attach information to its records through keywords or values. Libraries generally follow either the Library of Congress (LOC) or the Dewey Decimal Classification System (DDC), both of which use a hierarchical ranking system. A hierarchical system classifies documents in an established ranking order “according to relative importance or inclusiveness” (Weedman, 2018, p. 390). “In information retrieval, a hierarchical relationship is a relationship between a large set and its sub-sets. Each of the sub-sets may have its own smaller sub-set as well” (Weedman, 2017a, p. 391). LOC and DDC also use pre-coordinate indexing, which solved a space issue when records were stored in physical catalogs and allowed the cataloger to “use a single access point to refer to multiple semantic notions with a great degree of specificity” (Brown & Bell, 2018, p. 45). In other words, each subject heading is predetermined and “always treated in the same way, no matter how it may be expressed in natural language” (Bodoff & Kambil, 1998, p. 1255).
The cataloguer must then determine how to classify each document, choosing which attributes or values to assign to the fields in each record. These values, whether they are called classification labels, keywords, or taxonomies, are the “controlled vocabularies” (Weedman, 2017b, p. 125) or searchable terms that an information seeker can use when entering a query in order to retrieve relevant documents.
QUERY
When searching for information, it is important to understand how the IR system works: “how does it create a match between what you want and the representation of the appropriate documents” (Weedman, 2018, p. 176). Knowing how an IR system such as a library catalog or database works allows the searcher to retrieve relevant information effectively and efficiently: “you have to make your search fit the system” (Weedman, 2018, p. 178). It is therefore also important to understand how the system stores information: “how you store information determines how (and how well) you can retrieve it—and conversely the fact that how people are going to retrieve information affects how you should store it” (Weedman, 2017c, p. 23). With the invention of the search box in search engines like Yahoo and Google, many people have gotten lazy in their searching techniques. You can find information on just about anything by typing a word or phrase into a search engine. A basic internet search is often more than adequate for mundane information such as the definition of a word, a business address, or the menu of the local pizza shop. However, will you find accurate information when the information needed is more involved or complicated? Finding the correct information then requires precision searching: “when the need for accuracy is high (for instance, information about drug interactions with herbal supplements), or when you need varying expert opinions, then more specific (exact) retrieval becomes important” (Weedman, 2018, p. 179). In these situations, information seekers should be using a vetted database.
It is best to familiarize yourself with the particular database, its default search style, and the fields it uses before you begin your search. It is also important to establish what type of search you are conducting. There are two types of searches: a known-item search, where you know the exact item you are trying to locate and can usually find it by searching for the title or author’s name, and a subject search, which can be tricky. When deciding how to direct a subject search in a database, it is important to determine whether you will search the full text of a document, which “will give you a broader retrieval, but many ‘false drops’ because words have many meanings, and may occur in a document without being its central subject” (Weedman, 2018, p. 179), or search using controlled vocabulary “that will often give you the best results, since it is consistent and assigned by indexers” (Weedman, 2018, p. 179). How the search engine is configured and how much information the seeker wants to retrieve should determine how the search is conducted. Even the most experienced information professionals may have to try several searches and keyword combinations before retrieving useful results: “very few professional searchers find the information in a single search; a searcher needs to be very observant about how language is being used and how topics are combined, and then try several approaches, often in different resources, to be sure of getting the best possible results” (Weedman, 2018, p. 181).
Most databases offer both basic and advanced search options, allowing you to search by keyword, subject, author, or title. One important aspect of using the advanced search in a database is knowing how to use Boolean operators. The Boolean operators AND, OR, and NOT allow you to either broaden or narrow your search, depending on the operator selected, which reinforces the ability to aggregate and discriminate results. The Boolean operator AND will narrow a search. This seems counterintuitive, since the word “and” usually means more, but in a search it means less. Telling the search engine you want to search lakes AND oceans means you will only retrieve results that reference both lakes and oceans, not one without the other. To further narrow your results, you can use the Boolean operator NOT; however, the use of NOT should be carefully considered: “NOT is overly powerful, knocking out results you don’t want and often results you should be seeing” (Brown & Bell, 2018, p. 67). Searching lakes NOT oceans means your search would not retrieve any results containing information about oceans, even if a document also referenced lakes. Alternatively, the Boolean operator OR broadens results. Searching for oceans OR lakes would retrieve results that contain both oceans and lakes, as well as results that contain just oceans or just lakes. Whether using Boolean operators, a simple keyword search, or an alphabetical index, it is important to understand how the search engine performs. This helps the searcher determine which search tool(s) to use: “each of these tools has its own set of strategies for getting the best results; as noted (several times) earlier, understanding the structure of the tool and of the representations is critical for doing effective searches” (Weedman, 2018, p. 182).
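The narrowing and broadening behavior of the three operators can be sketched with Python sets. This is a simplified illustration, not a description of how any particular database implements Boolean searching; the record ids and the POSTINGS table are hypothetical.

```python
# Hypothetical postings: each term maps to the ids of records that mention it.
POSTINGS = {
    "lakes": {1, 3, 4},
    "oceans": {2, 3, 5},
}


def search_and(term_a, term_b):
    """AND narrows: keep only records that mention both terms."""
    return POSTINGS.get(term_a, set()) & POSTINGS.get(term_b, set())


def search_or(term_a, term_b):
    """OR broadens: keep records that mention either term, or both."""
    return POSTINGS.get(term_a, set()) | POSTINGS.get(term_b, set())


def search_not(term_a, term_b):
    """NOT excludes: keep records with term_a, dropping any that mention term_b."""
    return POSTINGS.get(term_a, set()) - POSTINGS.get(term_b, set())


print(sorted(search_and("lakes", "oceans")))  # [3]
print(sorted(search_or("lakes", "oceans")))   # [1, 2, 3, 4, 5]
print(sorted(search_not("lakes", "oceans")))  # [1, 4]
```

In this sketch, record 3 discusses both lakes and oceans, yet lakes NOT oceans drops it, which is the kind of lost result Brown and Bell (2018) warn about.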
EVALUATE
Once you have determined the way a search engine is configured and entered your query, what criteria will you use to evaluate the results retrieved? According to Weedman (2018), “the key to evaluating an information system is relevance—does it retrieve information the user wants and avoid information the user doesn’t want?” (p. 182). Whether results are relevant is for the information seeker to decide: “relevance is found in the same location as beauty—in the eye of the beholder. It is thus extremely difficult to assess” (Weedman, 2018, p. 182). To evaluate how well an information system is performing, it is important to measure both its recall and its precision: “recall—how close the system gets to retrieving all of relevant documents” (Weedman, 2018, p. 182) and “precision—how close it gets to retrieving only the relevant documents” (Weedman, 2018, p. 182) [emphasis mine]. No information system is perfect, and in any search you are likely to retrieve some irrelevant results and to miss some of the relevant information available on the subject. Knowing how to determine what information is relevant is crucial in conducting any query. Things to consider include access to the full text of the document, how recent the information is, and where it originated; in other words, “are the works authoritative?” (Weedman, 2018, p. 182).
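Recall and precision are commonly calculated as simple ratios, and a short Python sketch can make the two definitions concrete. The formulas below are the standard ones (relevant retrieved divided by all relevant, and relevant retrieved divided by all retrieved); the record ids and the searcher's relevance judgments are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant record ids.

    precision = relevant records retrieved / all records retrieved
    recall    = relevant records retrieved / all relevant records
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: the system returned four records, and the searcher
# judges that five records in the whole collection are relevant.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5, 6, 7}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.40
```

Because the relevant set depends on the searcher's own judgment, two users running the same query could arrive at different precision and recall figures.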
In order to effectively evaluate how well an information system retrieves information for the user, one must understand the capabilities of the information system as well as the needs of the user. The user depends on being able to match their query to whichever characteristics or attributes are assigned to a record: “just as a document is represented in an information system, so a user’s need has to be represented in a way the system can process” (Weedman, 2018, p. 184). Therefore, whether the information retrieved is relevant and of value is strictly for the user to determine in each situation.
EVIDENCE
Info 202 Information Retrieval System Design - Project 1 Database Design
The first project in Info 202 introduced me to the database design process. The scope of the project was to create a database of records for a non-traditional collection of items. We were told to imagine who our users would be and what their needs might be. The first thing we had to do was establish a set of rules for indexers to follow when cataloging new items, keeping in mind the needs of the target audience. After creating our database, we were instructed to beta-test and evaluate another group’s database, including creating a new record using their rules.
Project 1 was a group project, and my group consisted of five members. We decided to meet weekly, using Zoom as our platform. During these meetings we determined what our product would be (specialty chocolates), who our target audience would be (the adventurous chocolate connoisseur), and who would be responsible for each task. We decided that each of us would be responsible for creating one rule, with three of us (I was one of the three) creating two. We also decided we would each design one type of treat, assigning it its field names and values, and inserting this information into the Webdata Pro Database. I was also responsible for writing the statement of purpose and finding the images for our collection sample set. We set up a Google Drive folder for ease of communication throughout the project.
This project gave me an understanding of how a database is designed, of which I had no real concept prior to this class. Understanding how a database works has helped me recognize the functions needed to search a database and the importance of indexing language. This knowledge has helped me in my studies at SJSU as well as in my professional life: I now realize that if my query is not working, I most likely need to change my terminology, or possibly the database I am using. It has also taught me how important it is to explain this concept when instructing students on search techniques.
Info 202 Information Retrieval System Design - Project 3 Site Maps
This project required the analysis and redesign of a website’s site map. The purpose of this assignment was to determine the usability of the existing site map, taking into consideration the needs of the target audience. We were then required to create a new design, keeping any features that currently worked while changing those that did not. Completing this project led me to understand how important the design of a website is and how that design affects the user’s experience. This knowledge has helped me in my position as a library teacher designing LibGuides for both teachers and students. This project was completed with a partner. Since there were only two of us, it was simple to split the project evenly. Although we discussed all aspects of the project and both contributed ideas to the whole, we each had specific roles. My role was to design the site map graphics and to write the introduction and the site redesign section. We co-wrote the recommendations and edited the entire document together.
Info 244 - Online Searching - Key Concepts Presentation
This project helped me gain an in-depth understanding of how Boolean operators work and why they are so important when constructing a search in a database. Before starting at SJSU, I knew the basics of database searching, such as plugging in a subject or an author in order to find information. However, I did not truly understand how Boolean operators could direct my search by either broadening or narrowing it. Understanding this concept has helped me not only in all of my courses at SJSU, but also in my professional life, teaching students how to use a database effectively. This knowledge helped me considerably during my internship, as I was able to guide 6th grade students who were conducting a research project to relevant information for their topics.
CONCLUSION
In conclusion, understanding the three aspects of IR systems (design, query, and evaluation) is crucial for information professionals. As a school library teacher, it is imperative not only that I have a working knowledge of these principles, but also that I am able to instruct students on how to construct an effective search. I believe that I have the capacity not only to teach students information searching techniques, but also to evaluate the usability of the resources in the library, including the library website and the databases. I resolve to use my knowledge of IR systems to design a user-friendly, accessible library website based on feedback from library users, particularly students and faculty. I will also ensure that the databases the library subscribes to are effective and easy to operate, as students with varying degrees of ability and curriculum needs will be using them.
References
Bodoff, D., & Kambil, A. (1998). Partial coordination. I. The best of pre-coordination and post-coordination. Journal of the American Society for Information Science, 49(14), 1254-1269. http://libaccess.sjlibrary.org/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=lls&AN=502807509&site=ehost-live&scope=site
Brown, C. C., & Bell, S. S. (2018). Librarian's guide to online searching: Cultivating database skills for research and instruction (5th ed.). Libraries Unlimited.
Tucker, V. M. (n.d.a). Information retrieval systems for search and navigation part 1 [Lecture transcript]. https://sjsu-ischool.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=d2765d5e-7a7a-457b-a676-086b88ba296e
Tucker, V. M. (n.d.b). Introduction to core concepts [Lecture transcript]. https://sjsu-ischool.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6d1ab2e3-8a1c-4ea2-8dcc-d7d1b534e3b5
Weedman, J. (2017a). Lecture: Designing for navigation. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (5.1 ed., pp. 389-434). Academic Pub.
Weedman, J. (2017b). Lecture: Designing for search. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (5.1 ed., pp. 119-139). Academic Pub.
Weedman, J. (2017c). Lecture: Overview of concepts. In V. M. Tucker (Ed.), Information retrieval system design: Principles & practice (5.1 ed., pp. 22-38). Academic Pub.
Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp. 171-186). Libraries Unlimited, an imprint of ABC-CLIO.