In the age of computers and the Internet, we can access information at any time and from any place. However, the amount of information now available can be overwhelming and confusing to sort through. As information professionals, it is our responsibility to help library users find what they need within all of this available information. Instead of stacks of card catalogs, we must now be familiar with databases, which have become the main method of storing and organizing information in the 21st century. Databases are only one of many types of information retrieval (IR) systems. As information professionals, it is of utmost importance that we understand the principles involved in the design, query, and evaluation of IR systems.
Design:
Designing a database (or any IR system) requires much forethought. At its most basic, database design is the consideration of how documents will be stored and organized. When talking about IR systems, the term "documents" encompasses anything that might be stored in an IR system, from physical objects to digital pieces of information (Haycock & Sheldon, 2008). The audience (the people who will be using the database) and the type of documents being organized will largely determine how a database is designed.
The process of database design generally begins with the assignment of a set of attributes to each document. Attributes can be roughly thought of as the characteristics of a document. They are used to describe a document and can include characteristics such as "color," "size," and "texture," each of which takes on a value for a given document. When assigning attribute values, the concept of disambiguation must also be considered. Disambiguation is any attempt made to minimize the ambiguity of attribute values. Ambiguity can occur when identical words have drastically different meanings, such as "bass"—is the word referring to a type of fish or a type of instrument? In scenarios like this, it is necessary to have controlled vocabularies. Controlled vocabularies are "organized lists of approved words and phrases" (Morville, 2005). For many databases, controlled vocabularies are required to introduce some form of standardization into the (often large) sets of attribute values.
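The idea of enforcing a controlled vocabulary at the point of data entry can be sketched in a few lines of code. The approved terms and synonym mappings below are invented for illustration; a real system would draw them from an authority file maintained by the database's designers.

```python
# Hypothetical controlled vocabulary: only approved terms may be stored.
CONTROLLED_VOCABULARY = {"stringed instrument", "freshwater fish"}

# Disambiguated synonym mappings resolve ambiguous words like "bass".
SYNONYM_MAP = {
    "bass (instrument)": "stringed instrument",
    "bass (fish)": "freshwater fish",
}

def normalize_term(raw_term: str) -> str:
    """Map a raw attribute value onto an approved term, or reject it."""
    term = raw_term.strip().lower()
    if term in CONTROLLED_VOCABULARY:
        return term
    if term in SYNONYM_MAP:
        return SYNONYM_MAP[term]
    raise ValueError(f"'{raw_term}' is not in the controlled vocabulary")

print(normalize_term("Bass (fish)"))  # freshwater fish
```

Because every indexer's input passes through the same mapping, records stay consistent no matter who creates them.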
In database design, it is also important to consider how documents will be retrieved. When assigning attributes to documents, database designers must decide how terms/values will be coordinated for search. Designers can approach this in two different ways. Pre-coordination requires that sub-terms are coordinated (put together) beforehand, which can lead to greater specificity (Scott, n.d.). Each document is usually assigned only one string of terms, put together in a specific order. Pre-coordination mostly appears in print-based indexes. Post-coordination, on the other hand, requires that a document be assigned many different and separate sub-terms (Scott, n.d.). This makes documents easier to search for, but comes at the cost of specificity. Post-coordination is mostly found in modern databases and search engines.
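Post-coordinate retrieval can be illustrated with a small sketch: each document is assigned separate index terms, and a search simply intersects the document sets for the terms the searcher combines at query time. The index and document IDs below are invented examples.

```python
# Hypothetical post-coordinate index: term -> set of document IDs.
index = {
    "pirates": {1, 2, 3},
    "caribbean": {2, 3},
    "history": {3, 4},
}

def post_coordinate_search(*terms: str) -> set:
    """Return IDs of documents assigned every requested term."""
    result = None
    for term in terms:
        docs = index.get(term, set())
        # Intersect this term's documents with the running result.
        result = docs if result is None else result & docs
    return result or set()

print(post_coordinate_search("pirates", "history"))  # {3}
```

The searcher, not the indexer, decides which terms to combine—which is exactly why post-coordination gains flexibility while giving up the specificity of a single pre-built term string.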
Query:
A query is something that is submitted to an IR system to retrieve information from it. Many databases and search engines take queries that consist of phrases, words, or numbers. For information professionals, knowing how to structure queries is an essential part of conducting efficient searches. Bibliographic databases, online search engines, and library catalogs are often structured in different ways, necessitating different search strategies for each of them. Some may allow the use of Boolean operators, such as "AND," "OR," or "NOT." These operators can be used in conjunction with search terms to create a more or less specific query. Information professionals will also need to know how to refine results. Oftentimes, information professionals will need to conduct multiple consecutive searches to find the information needed. This requires the ability to parse through a potentially large number of search fields to limit results by date, author, subject, and so on. It also requires the ability to analyze results from initial searches to determine whether broader or narrower search terms should be used. Forming effective queries for various IR systems may also require a familiarity with some of the common subject terms or controlled vocabulary used. Not all IR systems provide a reference list for the subject terms used in that system, so obtaining this sort of knowledge generally requires some trial and error.
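The effect of each Boolean operator on a result set can be shown with a short sketch. The term-to-document index below is invented; the point is only how AND, OR, and NOT widen or narrow a query.

```python
# Hypothetical index: term -> set of document IDs containing that term.
index = {
    "pirates": {1, 2, 3, 4},
    "caribbean": {2, 3},
    "disney": {3, 4},
}

def search(term: str) -> set:
    return index.get(term, set())

# "pirates AND caribbean": both terms must appear (narrows the query).
both = search("pirates") & search("caribbean")    # {2, 3}

# "caribbean OR disney": either term may appear (broadens the query).
either = search("caribbean") | search("disney")   # {2, 3, 4}

# "pirates NOT disney": exclude unwanted documents.
filtered = search("pirates") - search("disney")   # {1, 2}
```

This is why an initial search returning too many results calls for AND or NOT, while one returning too few calls for OR or broader terms.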
Evaluation:
Information professionals should also know how to evaluate IR systems to ensure that they are actually retrieving relevant results. Relevancy in this case refers to how well an IR system retrieves “all and only the relevant information” to fulfill an information need (Haycock & Sheldon, 2008). Two measurements relating to relevancy are recall and precision. “Recall” is how successful an IR system is in retrieving all relevant documents, while “precision” is how successful an IR system is in retrieving only the relevant documents (Haycock & Sheldon, 2008). Both of these aspects can be measured through thorough testing, which would likely involve the submission of many different queries to the system and an analysis of the results returned.
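Both measurements can be expressed as simple ratios. The sketch below uses invented document IDs: given the set of documents known to be relevant to a test query and the set the system actually retrieved, recall is the share of relevant documents found, and precision is the share of retrieved documents that are relevant.

```python
# Hypothetical test-query results (document IDs are invented).
relevant = {1, 2, 3, 4, 5}    # all documents that should be retrieved
retrieved = {2, 3, 4, 6, 7}   # what the system actually returned

true_hits = relevant & retrieved  # relevant documents that were found

recall = len(true_hits) / len(relevant)      # found 3 of 5 relevant docs
precision = len(true_hits) / len(retrieved)  # 3 of 5 results were relevant

print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.60, precision=0.60
```

Repeating this calculation over many test queries gives the kind of thorough evaluation described above, since a single query can make a system look misleadingly good or bad.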
The evaluation of IR systems can also involve the evaluation of the system’s organizational structure and whether it is appropriate for its audience. Evaluation can also be conducted on the IR system’s search interface to determine if the interface is easy to parse, if the fields are clearly marked, or if the button functions are easily understood. Furthermore, IR systems can undergo usability testing even before they are officially launched to the public. This involves recruiting regular users (preferably from the audience that the IR system is aimed at) to test the functionality of the system.
I gained experience using databases and other IR systems throughout my undergraduate years, although I only truly learned about the design and evaluation of databases while I was in the MLIS program. Info 202 (Information Retrieval System Design) was one of my first classes, but it was also the one that taught me the most about creating and evaluating IR systems. I further honed my searching skills in Info 210 (Reference and Information Services) through the many searching assignments I had to complete. My experience as a library volunteer has also given me practice conducting searches in online public access catalogs (OPACs), as library patrons often ask me to find a book or item.
Haycock, K., & Sheldon, B. E. (Eds.) (2008). The portable MLIS: Insights from the experts. Westport, CT: Libraries Unlimited.
Morville, P. (2005). Ambient findability. Sebastopol, CA: O’Reilly Media, Inc.
Scott, A. (n.d.). Information retrieval system design: Review, week 4. Lecture presented online at San José State University, San José, CA.
1. Info 202 Database Creation and Evaluation (Group Project)
The first piece of evidence that I am submitting for competency E is a three-part group project from Info 202 (Information Retrieval System Design).
This was a fairly involved group project, so it requires more explanation than usual. The project required us to build a basic database structure in WebData Pro for any type of collection we wanted. It would be a post-coordinate database with some controlled vocabulary. As a group, we decided to build a database for Ben & Jerry's ice cream flavors. We then had to create a set of rules and controlled vocabulary terms which would be the standard to follow when entering records into the database.
The alpha prototype build was essentially the conceptual stage of our project. We further refined the rules and database structure in the beta prototype. I was responsible for creating the data structure plan table in the alpha prototype document. I was also in charge of building the database structure in WebData Pro for the beta prototype, and contributed to the database rules section (seen in the beta prototype document). After finalizing our database at the beta stage, we handed our set of rules and our database to another group, and received their set of rules for their database in exchange. For this part of the project, we were tasked with evaluating the other group’s database structure and rules. As a team, we all worked together and contributed to the evaluation document.
When working on the alpha and beta prototypes of our database, we had to carefully consider several factors: who was the audience for our database, what attributes of our collection (ice cream flavors) should be included, and what field types should be used. When writing up the rules for our database, we also had to eliminate (as much as possible) any ambiguity, so that any records entered into the database would be consistent even if they were created by different indexers. The end result is that our rules are very specific and written in a way that accounts for any possible exceptions. (Since the creation of these documents, the Ben & Jerry's company appears to have changed the way they list flavors on their website, so these rules may no longer be applicable.) Our work on the alpha and beta prototypes demonstrates our awareness of the principles of database design.
When evaluating another group's database and database rules, we focused on several different aspects. We not only looked at their overall database structure, but also at their statement of purpose, database field styles, and the rules they created. We also offered suggestions for improvement, e.g., changing the "landscape" database field from a list-type field to a text-box field to give users more freedom when entering records. This demonstrates our ability to evaluate multiple components of a database and to test the functionality of in-development databases. For these reasons, I submit this piece of evidence to demonstrate my ability to create and evaluate a database.
I have removed access to the main document due to privacy concerns. However, I am able to include a link to screenshots of the database that was created for this project.
2. Info 202 Discussion Post -- "Searching"
The second piece of evidence I am submitting is a discussion post on the topic of "searching" from Info 202 (Information Retrieval System Design). The details of this discussion post are listed on the first page of the document. To summarize, I had to conduct a keyword/natural language search on two online search engines with the query "pirates of the caribbean". I chose two lesser-known web search engines, Webcrawler and Hotbot. I gave a brief explanation of how these two search engines displayed results, establishing my understanding of the many different ways a search engine may be designed and how this affects the retrieval of information. I then walked through the steps I took to filter out results related to the Disney film franchise, which involved a quick evaluation of the various functions the two search engines featured.
Ultimately, I had to submit a very long query using Boolean operators to filter out most of the Disney-related results. This discussion post establishes my understanding of the advantages and disadvantages of search engines that only allow keyword/natural language search. Such search engines make finding information simpler for the general public, but the lack of a controlled vocabulary means that the results may not fit a person's exact information needs. I submit this piece of evidence toward competency E to demonstrate my ability to assemble queries for keyword-based search engines to find relevant information.
I have included my discussion post in the MS Word document below.
3. Info 210 Mini Activity Assignments Part 1
The final piece of evidence I am submitting is a collection of reference question assignments from Info 210 (Reference and Information Services). For this series of assignments, I was tasked with performing searches in various IR systems to find the answers to a set of questions posed by the professor. These assignments required me to retrieve information from many different kinds of IR systems, such as online search engines, bibliographic databases, and OPACs. Depending on how the IR system was structured, I had to use many different search methods and strategies to find the information I needed. Some of the IR systems I searched through allowed the use of Boolean operators, which I used when forming queries if I needed more specific information. Some IR systems had multiple dimensions of organization, which allowed me to refine my searches by source or through title keyword matches.
I was also asked to evaluate some IR systems. I studied the precision and recall of a few bibliographic databases (pages 8 and 9) by observing the types of documents they retrieved in the results. I also evaluated a few online encyclopedias on factors such as site usability, by studying their searchability and the types of research aids they provided (pages 12-14). For these reasons, I submit this piece of evidence to demonstrate my understanding of the principles involved in forming queries for various IR systems, as well as the principles involved in evaluating various types of IR systems.
The document containing these completed assignments can be found below.
As I mentioned at the very beginning of this page, it is now the norm for information professionals to use databases in their daily work. Thankfully, my time in the MLIS program has given me the skills needed to conduct searches and form queries in many types of IR systems. This is a skill I will use often to help library users find the information they need. While I am fairly confident in my search and query skills, I feel that I could still learn much more about the design and evaluation of IR systems. After I graduate from the MLIS program, I plan to study these two aspects more deeply by reading books on the subject. Being more informed about the design and evaluation of IR systems will only make me better able to serve library users.