I am able to design, query, and evaluate information retrieval systems
Introduction
Information is only as valuable as the systems in place to find it. A collection with millions of volumes has little value if the information in that collection cannot be located. This simple observation highlights the importance of information retrieval systems (IRSs). Though such systems are never perfect, they make it possible to find specific types of information – to find that proverbial “needle” in an information haystack. Without IR systems, librarians could not perform their essential service to patrons: helping users find the information they need.
The librarian’s mastery of IR systems encompasses three areas. First, she understands IRS design. She recognizes that IRS design is preceded by a planning process that involves assessing the information needs of the user population. IRS design requires careful consideration of appropriate fields and attributes in order to make the database responsive to user needs. It also involves decisions about indexing, interface design, search features, and display options.
Second, she is able to query IR systems to retrieve desired information. She is able to translate an information need into a query statement appropriate to the varying design constraints of different IR systems. She is able to assess the best initial search strategy based on the nature of the query and the scope of the database. She is familiar with IRS search features (truncation, proximity searching, phrase searching, etc.) and knows how to broaden or narrow a search as needed. She is also able to use descriptor lists or thesauri to optimize recall and precision of subject searches.
Third, she is able to evaluate IR systems. She is familiar with evaluation criteria (e.g., recall, precision, cost, response time, usability) and able to apply that knowledge to identify an IR system’s strengths and weaknesses. She is able to use information gained from formal evaluation to inform future design decisions.
Almost all my SLIS courses have touched upon these skills and abilities, either directly or indirectly. Two courses, however, focused specifically on skills related to this competency: LIBR 202 and LIBR 244. The former covered the three primary aspects of information retrieval systems (design, querying, and evaluation); the latter dealt directly with querying online IRSs. Through readings and assignments in these two courses I have gained knowledge and skills in all these areas, giving me the ability to design, query, and evaluate information retrieval systems.
Commentary
LIBR 202 was my first explicit introduction to information retrieval systems. (I say “explicit” because anyone who has used Google or an OPAC has been introduced – whether she knows it or not – to IRSs.) Initially, the most important thing I gained from LIBR 202 was a broad understanding of the organization and structure of an IRS. I learned that an IR system is a database of objects (as in a full-text database) and/or of surrogates representing objects (as in a bibliographic database), with information about those objects specified through field attributes, subject indexing, and/or full-text indexing. It is this indexing that makes it possible for users to retrieve text files or records that satisfy specific information needs. This general knowledge of the basic structure of an IRS provided the background and context for understanding more specific features of IRSs.
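To make this structure concrete, the following short Python sketch (my own illustration, not drawn from the coursework) models surrogate records with field attributes alongside a simple inverted index of the kind that supports full-text retrieval; all record data and field names are hypothetical.

```python
# Surrogate records with field attributes (hypothetical data).
records = [
    {"id": 1, "title": "Indexing Principles", "author": "Smith, J.",
     "descriptors": ["indexing", "controlled vocabulary"]},
    {"id": 2, "title": "Online Search Strategies", "author": "Lee, K.",
     "descriptors": ["online searching", "Boolean logic"]},
]

# A simple inverted index maps each word in an indexed field to the
# ids of the records containing it, enabling term-based retrieval.
inverted_index = {}
for rec in records:
    for word in rec["title"].lower().split():
        inverted_index.setdefault(word, set()).add(rec["id"])

print(inverted_index["indexing"])   # -> {1}
```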
In LIBR 202 I worked with a team to plan, design, create, and evaluate two simple IR systems. The first was a non-bibliographic database of records representing foods in the refrigerator. The second was a bibliographic database of journal article records. The process of planning a database was an exercise in IRS design. Planning the non-bibliographic database required considerable reflection on the potential needs of users in order to design an appropriate data structure. What attributes and values would make it possible for users to find the information they needed? Which fields required validation lists? What rules needed to be created in order to ensure consistent data entry? In the case of the bibliographic database, we also developed lists of pre-coordinate and post-coordinate descriptors that could then be used to index articles by subject. These are just some of the issues we had to address at the design stage of the project. Through the planning process, my teammates and I were able to create a data structure for individual records, validation lists, rules for data entry, etc.
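The following Python sketch suggests, in miniature, what such design decisions look like in practice. It is loosely analogous to (not copied from) our refrigerator database, and every field name, validation list, and entry rule shown is hypothetical.

```python
# Validation lists restrict a field to an approved set of values.
VALID_CATEGORIES = {"dairy", "produce", "meat", "condiment"}
VALID_CONTAINERS = {"jar", "carton", "bag", "box"}

def validate_record(record):
    """Return a list of data-entry errors; an empty list means the record passes."""
    errors = []
    if record.get("category") not in VALID_CATEGORIES:
        errors.append("category must come from the validation list")
    if record.get("container") not in VALID_CONTAINERS:
        errors.append("container must come from the validation list")
    # Entry rule: item names are recorded in lowercase form.
    name = record.get("name", "")
    if name != name.lower():
        errors.append("name must be entered in lowercase")
    return errors

print(validate_record({"name": "Milk", "category": "dairy", "container": "carton"}))
# -> ['name must be entered in lowercase']
```

Encoding validation lists and entry rules in this explicit way is what keeps data entry consistent when several team members are adding records.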
Whether directly or indirectly, the art of querying an IRS has been part of every SLIS course I have taken. LIBR 244, however, involved intensive instruction in this aspect of information retrieval systems. In addition to learning about Boolean operators and the specific features of three major online information services (Dialog, Factiva, and LexisNexis), I learned about the importance of planning a search before logging on to a service. Effective keyword searching may require combining synonyms so that an important concept is not missed simply because different authors use different terms for it. Part of the planning process, then, involves identifying synonyms prior to a search. In the case of Dialog, another important part of the planning process involves gathering information about specific databases. This alerts the researcher to unique fields that may facilitate the search process.
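As a simple illustration of this kind of pre-search planning (my own sketch, not an assignment), the fragment below groups synonyms for each concept and combines the groups with Boolean operators. The `?` truncation symbol follows Dialog's convention, though the concepts and terms themselves are hypothetical.

```python
# Each concept in the information need gets its own synonym group
# (hypothetical terms; "?" is Dialog-style truncation).
concepts = {
    "adolescents": ["teenager?", "adolescen?", "youth"],
    "depression":  ["depress?", "mood disorder?"],
}

# OR together the synonyms within a concept, then AND the concepts.
query = " AND ".join(
    "(" + " OR ".join(terms) + ")" for terms in concepts.values()
)
print(query)
# -> (teenager? OR adolescen? OR youth) AND (depress? OR mood disorder?)
```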
In LIBR 244 and other SLIS courses I have learned additional querying strategies, including using thesauri to identify descriptors, adjusting search terms, and applying limiters to widen or narrow a search. In cases where thesauri are not readily accessible, I understand how to identify appropriate descriptors by using keyword searches to retrieve relevant records and scanning those records for useful descriptors. The descriptors from these records can then be used to execute a subject search, increasing both the recall and the precision of the search.
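The sketch below illustrates this harvesting strategy in miniature: a keyword search retrieves some relevant records, the descriptors assigned to those records are tallied, and the most frequent ones are carried into a subject search. The records, fields, and descriptors are all hypothetical.

```python
from collections import Counter

def keyword_search(records, term):
    """A crude keyword search over the title field."""
    return [r for r in records if term in r["title"].lower()]

def harvest_descriptors(hits, top_n=3):
    """Tally the descriptors assigned to retrieved records."""
    counts = Counter(d for r in hits for d in r["descriptors"])
    return [d for d, _ in counts.most_common(top_n)]

records = [
    {"title": "Literacy programs in public libraries",
     "descriptors": ["information literacy", "public libraries"]},
    {"title": "Teaching literacy skills",
     "descriptors": ["information literacy", "library instruction"]},
]

hits = keyword_search(records, "literacy")
print(harvest_descriptors(hits))
# -> ['information literacy', 'public libraries', 'library instruction']
```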
In LIBR 202 I learned criteria for evaluating an IRS and applied those criteria in the evaluation of a specific bibliographic database. I learned the meaning of precision and recall, how to establish normative values for each of these variables, and how to apply those norms in IRS evaluation. My team also employed beta testing to evaluate our non-bibliographic database. This served as a means of incorporating user feedback in the design process. Though evaluation is ideally a process of gathering objective, quantitative data, I came to understand the subjective variables that affect IRS performance and the problems these entail for “objective” evaluation of an IRS. For example, a retrieved set may be relevant in terms of subject relatedness and yet – depending on user needs – have little value or utility to the user. On the other hand, a user might retrieve a very large set with a high percentage of irrelevant records (low precision) and be quite happy with the results because the initial display includes three relevant records (which is all the user needed). The librarian needs to be aware of these issues when evaluating IRS performance.
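The two core measures can be stated and computed directly. In the sketch below (my own illustration), precision is the proportion of retrieved records that are relevant, and recall is the proportion of relevant records that are retrieved; the record ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / total retrieved.
       Recall    = relevant retrieved / total relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   # records the search returned
relevant = {2, 5, 9, 42}                       # records judged relevant to the need

print(precision_recall(retrieved, relevant))   # -> (0.3, 0.75)
```

Even with these numbers in hand, as noted above, a user who needed only three relevant records might be entirely satisfied with a low-precision set, which is why the quantitative measures must be interpreted in light of actual user needs.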
Evidence
I am submitting three course assignments as evidence of my mastery of this competency. I am submitting my final paper for LIBR 202, which demonstrates an understanding of IRS evaluation through an analysis of subject access in a bibliographic database. In the paper I explain key evaluative criteria (e.g., recall and precision), establish normative performance values for different types of subject searches (natural language vs. descriptor searches), and compare subject access using natural language and controlled vocabularies. By analyzing and comparing the results of various searches I identify strengths and weaknesses of the IRS.
I am submitting a team project that I helped create for LIBR 202. The assignment covers two of the areas of this competency – design and querying – through a description of database structure, guidelines for searching the database, and rules for creating/adding controlled vocabulary terms and for creating and indexing individual records. My specific contribution to the group assignment included developing some of the post-coordinate and pre-coordinate terms, creating several bibliographic records and indexing those records, creating the field structure and query screen for the database, and providing editorial feedback on the final project.
I am including an assignment from LIBR 244 that describes the steps of a Dialog search and demonstrates important skills related to querying a database: planning the search by identifying synonyms of important keywords, using Dialog Bluesheets to research unique features of specific databases, and using Boolean operators and various field delimiters to refine a search through successive iterations.
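To suggest what such successive refinement looks like, the sketch below (an illustration of my own, not the assignment itself) narrows a result set step by step, mimicking the numbered sets Dialog builds as a search proceeds; the records and field values are hypothetical.

```python
# Hypothetical records with fields that support limiting.
records = [
    {"title": "Telehealth in rural clinics",  "year": 2003, "doc_type": "journal article"},
    {"title": "Telemedicine policy review",   "year": 1998, "doc_type": "report"},
    {"title": "Rural broadband access",       "year": 2004, "doc_type": "journal article"},
]

s1 = [r for r in records if "tele" in r["title"].lower()]    # broad keyword set
s2 = [r for r in s1 if r["doc_type"] == "journal article"]   # limit by document type
s3 = [r for r in s2 if r["year"] >= 2000]                    # limit by date field

for step, s in (("S1", s1), ("S2", s2), ("S3", s3)):
    print(step, len(s))
# -> S1 2 / S2 1 / S3 1
```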
Conclusion
Librarians help make information accessible to users. In order to do this, librarians make almost continuous use of information retrieval systems, whether the library catalog, online databases, or internet search engines. Even though most librarians will not be responsible for the design of an IRS, a strong understanding of IRS design provides librarians with essential background knowledge that supports searching and evaluating information retrieval systems. Librarians are better able to help users find the information they need when they understand the organization of an IRS – the structure of an IRS record, fields and attributes, different types of indexing languages, etc. A background in IRS evaluation also helps them recognize weaknesses in IRS design. This knowledge can then be used to inform future IRS design – even if the librarian herself relies on technical staff to implement design changes. Through readings and assignments in my SLIS courses I have gained this essential competency. I am able to design, query, and evaluate information retrieval systems – skills that will help me effectively meet the information needs of library users.