"Each graduate of the Master of Library and Information Science program is able to...evaluate programs and services using measurable criteria"
Introduction
Evaluation is a critical skill for information professionals. At some point in our careers we will be called upon to assess the value of a program or service. Often this is to justify continuing the program or service, to change it (for example, by increasing or decreasing the level of service), or to introduce a new service. Evaluation is part of an ongoing effort to make sure the information organization continues to provide relevant services, and it can be a significant part of the strategic planning process. No matter what is being measured, it is important to understand why we are measuring it, what question needs to be answered, who the key stakeholders are, what will actually be measured and how, how the collected data will be managed, and how the data will be analyzed and reported. It is also worthwhile to cultivate a culture of assessment within the organization in order to get the widest possible buy-in from management and staff.
Evaluation
Why evaluate?
Without users, there would be no reason to provide a service. Users typically apply four criteria when judging services: “They expect to get what they want, when they want it, at a cost that is acceptable to them, and delivered in a way that meets their expectations” (Evans and Ward, 2007, p. 224). This applies equally to patrons of libraries and purchasers of other goods and services. Assessment of programs and services helps the information organization determine whether users and patrons are happy with a service and whether it meets their expectations. Evaluation is also key to determining whether an ongoing program or service is actually providing value. It gives those who fund services a way to determine their value to the organization and to make decisions for planning purposes. Finally, assessment helps to make sure that programs and services are continually improved.
Evaluate What?
Effective assessment requires that those responsible for evaluation are very clear on the question they are trying to answer. What exactly is being measured? For information organizations this can include user satisfaction, economic or social impact, program or service quality, resource and collection utilization, and even staff performance. What are the goals of the study? To show value? To improve services? To increase resources and/or collections? Being very specific about what is being measured not only helps to focus the assessment, but also informs the rest of the evaluation process and makes it easier to design the appropriate assessment effort. Specifying criteria for assessment such as effectiveness, efficiency, extensiveness, service quality, impact, and usefulness helps to create effective communication with stakeholders such as board members, community members, and staff (McClure, 2008). One thing to keep in mind, however, is that just because we can measure something does not mean we should. We have the ability to acquire and analyze vast amounts of data, but we need to be cognizant of issues like privacy and storage.
For Whom are we Evaluating?
The key stakeholders are those for whom the assessment holds some value. The evaluation should target a specific audience, such as students, seniors, young children, library board members, or other bodies that allocate funding for the organization. Demographics are important: are we looking only at users in our immediate geographic area, or do we have a web presence that allows us to effectively reach across the globe? It is important to understand what our target audience already knows in order to clearly communicate the value of the assessment to them. If the assessment addresses a known issue, how can the results be used to resolve the issue (Stenström, 2015)? Another important question to ask is whether the evaluation results might be used by management for future planning.
Evaluate How?
There are many ways to design an evaluation, and understanding what criteria to use is very important. While standards have been developed for some types of evaluations commonly done in information organizations, the criteria are often developed based on the specific needs of the organization. For example, a survey by Tsakonas et al. (2013) examined evaluations of digital libraries by looking at published research in four venues: IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery), JCDL (Joint Conference on Digital Libraries), and ECDL (European Conference on Digital Libraries). The goal of the research was to discover the “logic and mentality that governed the design of evaluation experiments for the period 2001–2011”. They found that for digital libraries there was limited interest in evaluating service quality and outcomes. There was much more interest in determining which steps increase a specific digital library system’s effectiveness and in measuring performance in order to advance the design of the digital library. These results suggest that most researchers prefer to work on performance before users can really be involved in determining the usefulness of a new information technology. Notice that the criteria for most of the researchers in this survey were based on performance, not on quality.
Service quality, on the other hand, is an entirely different question with a different set of criteria. Much research into service quality evaluates the gap between the service provided and patron expectations. Here the criteria may involve concepts that are more difficult to measure, like “excellence”, “value”, “conformance to specifications”, and “meeting and/or exceeding expectations” (Hernon, 1999). There is no universally agreed-upon definition for these concepts. Therefore, when attempting to measure concepts such as these, indicators should be defined that clearly point to whether a specific criterion was met (Rubin, 2006, chapter 1). For example, if a program to teach elderly patrons how to use the internet were being evaluated for effectiveness, one indicator could be the average amount of time it takes the sample population to find a specific site using Google. In this situation, criterion-referenced results (Grassian and Kaplowitz, 2009, chapter 11) contain a comparison between an objective (elderly patrons can find a specific internet site within 5 minutes using Google) and the actual measure (an average time of 8 minutes).
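To make the criterion-referenced comparison above concrete, the short Python sketch below tallies a hypothetical set of search times against the 5-minute objective. The numbers and variable names are illustrative assumptions only, not data from an actual study.

    # Criterion-referenced comparison: objective vs. actual measure.
    # All values below are hypothetical, for illustration only.
    criterion_minutes = 5.0                            # objective: find the site within 5 minutes
    observed_minutes = [8.5, 7.0, 9.2, 6.8, 8.4]       # hypothetical sample of measured times

    average = sum(observed_minutes) / len(observed_minutes)
    criterion_met = average <= criterion_minutes

    print(f"Average search time: {average:.1f} minutes")
    print(f"Criterion (<= {criterion_minutes} minutes) met: {criterion_met}")

The point of the sketch is simply that once an indicator is defined, deciding whether the criterion was met becomes a direct, repeatable comparison.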
In addition to the type of assessment and whether we are looking for indicators or direct measures of results, the type of design should be considered. While there are others, the most common are quantitative and qualitative designs (Long, 2014). Quantitative designs are typically very focused in scope, and the variables under study are generally known before the study begins. An example of a quantitative study would be determining how popular certain collections are based on a count of the number of times patrons check out the materials or place them on hold. The variables being measured in this case are easily identifiable. Qualitative designs are more open-ended and exploratory in nature. They suit instances where there is a known issue but the causes are unknown, and a study is undertaken to determine the causes before a solution can be devised.
Making sure that what we want to measure is actually what we are measuring is worth the work that normally goes into assessment planning. If we are not careful about designing our assessments, we risk producing ‘watered-down “duck soup”’ (Lyons, 2012) in the results we gather.
How will we manage the data?
During the design phase of the assessment, data collection, long-term storage, security, and maintenance should be addressed. The type of study will influence where and how the data is stored. If the study is longitudinal, data could be relevant for years after it begins, depending on the time frame of the study. For example, if a library wanted to determine the effectiveness of story-time on the academic performance of economically disadvantaged children, the library would collect initial data on the participating children such as age, frequency of attendance, family income, and so on. Assuming the library had access to some form of academic records for those children, it could periodically collect data and compare it to a control group of children who may visit the library but never attend story-time. The data will span several years in this case, and accommodations for keeping the data together and intact need to be considered at the outset of the study. Other important considerations for managing data include establishing a formal mechanism for regular evaluation and assigning specific responsibilities for the effort; integrating the data with other library-controlled data such as demographics or community information; and maintaining and documenting data systematically in a management information system (McClure, 2008, chapter 16). Finally, sensitive data must be protected. In the hypothetical story-time study described above, personal information about the children and their parents is needed in order to track the indicators for the study. As librarians, we are ethically responsible for making sure that data like this is protected against being leaked or stolen.
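As a purely illustrative sketch of these data-management ideas, the Python fragment below shows one way hypothetical story-time records could be pseudonymized and kept together in a single documented file. The field names, file name, and values are assumptions for illustration; an actual study would follow the organization's own data-management and privacy policies.

    import csv
    import hashlib
    from dataclasses import dataclass, asdict

    def pseudonymize(name: str, salt: str) -> str:
        """Replace a name with a salted hash so records can be linked across
        years without storing the child's name in the study data."""
        return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()[:12]

    @dataclass
    class Participant:
        participant_id: str      # pseudonymous ID stored in place of the name
        age: int
        attends_storytime: bool  # False for the control group
        visits_per_month: int

    # Hypothetical cohort for one year of the study.
    participants = [
        Participant(pseudonymize("Alex Example", salt="study-2016"), 6, True, 4),
        Participant(pseudonymize("Sam Example", salt="study-2016"), 7, False, 2),
    ]

    # Keep each year's observations together in one documented file.
    with open("storytime_cohort_2016.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(participants[0]).keys()))
        writer.writeheader()
        for p in participants:
            writer.writerow(asdict(p))

The design choice the sketch illustrates is separating personally identifiable information from the analysis data at the outset, so that longitudinal records can still be linked year to year.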
How will we analyze and report the results?
The final consideration for evaluation is how the results will be analyzed and reported. Analysis and reporting should be done with the stakeholders in mind: the people who will make decisions based on the results and those who will be affected by those decisions. As discussed above, if the study is quantitative, the analysis will be very different than if it were qualitative. If a significant amount of data is collected, the results should be analyzed for patterns and condensed into meaningful information for the stakeholders. If the target audience consists of decision makers who will determine whether a service or program goes forward, is terminated, or changes, the report should highlight key findings and recommendations (McClure, 2008).
Regardless of the services or programs the evaluation covers, if the results don’t support the recommendations, the entire exercise might amount to a wasted effort. This is why it is so important to put the extra effort into the above questions before actually running the assessment.
Coursework & Work Experience
I have taken three courses at the iSchool that have prepared me for understanding and using this competency, and I have also had applicable work experience. In INFO-204, Information Professions, in the fall of 2011, one project involved determining the most important question facing an information organization. Working with a group of three other people, we chose a question and worked on identifying resources and approaches to addressing it; the question had to do with understanding how technology will change relationships between (normally) isolated special libraries. This is really the first step in evaluation, because evaluations always begin with a question that needs answering. Another course that was important for understanding the competency was INFO-250, which I took in the summer of 2013. During this course we did several types of assessments, such as assessments of the tools we might use in our instructional design and assessment of the instructional design itself. These assessments involved coming up with my own set of criteria for evaluation. Finally, in INFO-282 I had to prepare an evaluation plan for the grant application I worked on during the fall semester of 2015. In this situation the criteria were less rigid than those for the INFO-250 class, mostly because the grant application had a limit of only 200 words and, because the nature of the evaluation is more exploratory, only two criteria were used. Both of these are really only measurable indirectly, through reflections by the instructor on her website.
In addition, while working on my Master’s and Ph.D. I had several opportunities to evaluate research done in my field, tools needed for my research, and research done by others in the form of paper reviews. There are usually very specific criteria I need to use while reviewing, and these criteria are developed by the party for whom I am reviewing. For example, I recently reviewed a paper for the Software Quality Journal, and as part of that review I was given several criteria on which to base my review, including whether the evidence supports the claims made in the article. Most recently at work, I have been part of a team defining a project for which I was required to evaluate several tools that can create and extract text from PDF files. The criteria consisted of a long list of required or wanted features, such as the ability to recognize embedded XML formats. These are measurable criteria in that all one has to do is check whether a feature exists or not. Between my iSchool and work experience, I feel I have the skills to address this competency.
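To illustrate what checklist-style, measurable criteria look like in practice, the short Python sketch below scores hypothetical tools against required and wanted features. The tool names and feature lists are placeholders, not the actual products or requirements from the project described above.

    # Checklist-style evaluation: each criterion is a yes/no feature check.
    # Tool names and features are hypothetical placeholders.
    required_features = {"extract_text", "create_pdf", "recognize_embedded_xml"}
    wanted_features = {"batch_processing", "command_line_interface"}

    tools = {
        "Tool A": {"extract_text", "create_pdf", "batch_processing"},
        "Tool B": {"extract_text", "create_pdf", "recognize_embedded_xml",
                   "command_line_interface"},
    }

    for name, features in tools.items():
        meets_required = required_features <= features    # are all required features present?
        wanted_count = len(wanted_features & features)     # how many wanted extras are present?
        print(f"{name}: required met = {meets_required}, "
              f"wanted features = {wanted_count}/{len(wanted_features)}")

Because each criterion is a simple presence/absence check, the results are directly measurable and easy to communicate to the rest of the team.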
Evidence
I have four pieces of evidence to support my understanding and experience in putting this competency to work. The first is an assignment I did for INFO-250 in which I was required to evaluate several tools that might be of use for implementing my instructional design for a course on starting a Junior First Lego League (FLL) team. The criteria I used for determining which tools I would use included learning curve, ease of use, ability to use in a synchronous, face-to-face instructional situation, and cost. The learning curve and ease of use were indirectly measurable by determining how long it would take me to use the system effectively, but not expertly. The other two criteria were directly measurable simply by finding the data. And, although the instructional design was focused on a course that would be delivered in the same physical location as the students, I evaluated virtual Learning Management System (LMS) options as well; understanding the capabilities of different LMS systems would be important if I decided to take the course online. I chose this evidence to show that very specific criteria were used to evaluate several tools.
Other evidence from the same course includes the Formative Evaluation I did for the Junior FLL instructional design. This is a required part of the instructional design that allows us to do ongoing evaluation of the course. Since Junior FLL is a yearly event and kids grow out of it at some point, new parent volunteers will need to be trained every year. The formative evaluation allows us to take what we’ve learned in the previous season and apply it to the next one. The criteria used for the formative evaluation include how well the children performed in competitions and how the team felt at the end of the season. These are actually only indicators, as discussed above, of the effectiveness of the training. I used these indicators because, for example, if it turns out that most team members feel they did poorly, we can look at whether the coaches need more training in providing encouragement to the kids. I chose this evidence because it shows how I was able to create a formative evaluation.
The next piece of evidence is the evaluation plan I did for INFO-282: Grant Writing in the fall of 2015. This was a very hands-on course in which we found a client with whom we could collaborate on writing a grant. In my case, I worked with a local elementary school to find a grant for technology to be used in a maker lab. An evaluation plan was required not only by the instructor for the class, but also as part of the application. Space limitations in the grant application itself (a maximum of 200 words) meant that the evaluation plan had to be very succinct, with very little detail. The “Evaluation_Plan.doc” contains the overall goal for the program the money would be used for, the expected outcome, specific objectives, how we plan to evaluate the success of the program, and the timeline the grant program evaluation will cover. The evaluation plan covers two objectives that are tied to the school’s overall improvement plan, and the evaluation methods used are qualitative (exploratory). The plan mainly addressed how we would assess the effectiveness of the program to which the grant monies would be applied. One of the main objectives was to use the technology as part of a “design thinking” project. The main evidence for design thinking, and for the students’ increased ownership of their academic and personal success, will consist of the instructor’s own observations and the material posted about the experience by the children who participate in the program. Again, these are just indicators of a discovery process. The grant application, including the evaluation plan, was sent to the school district, the body that will decide whether the school will receive the grant. As part of the course we needed to find funders and grants that met our criteria, as well as make sure we met the criteria of the funders; for example, it makes no sense for an elementary school to apply for a grant from an organization looking only to fund higher education. I have also included the SPIE.doc document to demonstrate that we were able to meet the grant criteria of the organization to which we applied.
My final piece of evidence is an article co-authored with my Ph.D. adviser, published in IEEE Computer magazine in August 1995 (von Mayrhauser and Vans, 1995). While the article is based on research I completed during my Master’s program at Colorado State University, it demonstrates my ability to evaluate research related to my work. At the time it was published, very little research was available online, and while this article is available in the IEEE digital library, it is not free, so I have included an image of the cover of the magazine (Figure 1), the first page of the article (Figure 2), and the part of the article most relevant to the evaluation competency (Figure 3, below). I chose this evidence because I believe it shows my ability to develop good criteria for assessment. The criteria are also measurable, because all the data shown in the table were taken from the data published by each researcher. The article is a survey of various cognition models of software engineers attempting to understand program code in various situations. In the article I generated the criteria for comparison with another model I developed and used later for my Ph.D. thesis. The criteria included the type of static knowledge structures each model theorized were used by engineers, and their abstraction levels; how static mental representations are presented in each model, such as plans or schemas; what dynamic processes are involved, for example top-down or bottom-up; and how these models were tested. Figure 3 is a tabular representation of the evaluation that was published. I believe this is a good example because, as of February 2016, this article has 478 citations according to Google Scholar, including 7 so far this year. I believe this signifies that even after 20 years, this evaluation is still relevant to researchers in this area.
Figure 3: Evaluation criteria for von Mayrhauser & Vans article, 1995.
Conclusions
The ability to assess programs and services provided by information organizations is an important skill for information professionals. I have defined evaluation and assessment and have shown how I applied my knowledge in this area to several projects in my MLIS coursework, as well as in other coursework and in my daily work. I believe that clearly defining what needs to be evaluated and why, how to evaluate it, and for whom we are assessing, and then designing and running the evaluation and reporting its results, are key steps regardless of whether we are trying to figure out if we should keep, start, or shut down a program or service. The work I did on the survey article demonstrates my ability to develop important criteria when assessing the work of other researchers. While we won’t know for another month whether the grant I wrote for the elementary school will be funded, I feel confident that I am able to find and evaluate potential funders for any grant I may pursue in the future. In my current position as a research scientist, the ability to evaluate everything from research papers to tools for my own use is part of what I do almost daily. I believe that my knowledge and experience would transfer easily to most information organizations.
References
Evans, E., & Ward, P. L. (2007). Management basics for information professionals. New York, NY: Neal-Schuman Publishers.
Grassian, E. S., & Kaplowitz, J. R. (2009). Information literacy instruction (2nd ed.). New York, NY: Neal-Schuman Publishers.
Hernon, P., Nitecki, D., & Altman, E. (1999). Service quality and customer satisfaction: An assessment and future directions. Journal of Academic Librarianship, 25(1), 9-17.
Long, D. (2014). Assessment and Evaluation Methods for Access Services. Journal of Access Services, 11(3), 206-217.
Lyons, R. (2012). Duck Soup and Library Outcome Evaluation. Public Library Quarterly, 31(4), 326-338.
McClure, C.R. (2008). Learning and using evaluation: A practical introduction. In Haycock, K. and Sheldon, B.E. (Eds.) The portable MLIS: Insights from the experts. Libraries Unlimited, Westport, CT.
Rubin, R. J. (2006). Demonstrating results: Using outcome measurement in your library. American Library Association.
Stenström, C. (2015). Demonstrating value: Assessment. In S. Hirsh (Ed.), Information services today (pp. 271-277).
Tsakonas, G., Mitrelis, A., Papachristopoulos, L., & Papatheodorou, C. (2013). An exploration of the digital library evaluation literature based on an ontological representation. Journal of the American Society for Information Science and Technology, 64(9), 1914-1926.
Von Mayrhauser, A., & Vans, A. M. (1995). Program comprehension during software maintenance and evolution. IEEE Computer, 28(8), 44-55.