The Liberated End-User: Issues in Planning National Information Services

Derek Law

In the United Kingdom, the Higher Education community has taken advantage of developments in networking to begin to develop centrally planned services which are available to all end-users. These are predicated on two absolute requirements: firstly, that it must be possible to show that there is a saving in acting collectively rather than at individual institutions; and secondly, that services should be free at the point of use. It is intended to use such services positively to encourage the use of electronic information. In effect, we take advantage of the size of the educational sector to bulk-purchase information. I want then to share with you today some of the issues which surround this process, not least that the transfer of power to the end-user can threaten the traditional role of the librarian. I have also been asked to consider the issues of retroconversion and standards, and I want to say a little about the changing role of librarians.

As background to all of this, let me very briefly describe the range of services offered and how they are managed. In the United Kingdom, the Government has appointed an agency called the Higher Education Funding Council, which is charged with distributing funds to the universities. This Council has in turn appointed several sub-committees to which it allocates relatively small sums of money. One committee manages our national academic network, JANET, which links hospitals as well as universities and is connected to the Internet. A further committee manages the network services and has an annual budget of ten million US dollars. This may seem a lot, but the total higher education budget is some five billion dollars, making the share for networked services a tiny fraction of one per cent of the whole. With that we provide:

- Bibliographic datacentres. At present these provide the ISI databases, Embase, Compendex and the International Bibliography of the Social Sciences. Other deals are in hand.
- Datasets at other centres. National census data, household survey data, labour data, demographic data and satellite data. Access is provided to data from NGOs and other governments.
- Shareware. Two sites provide shareware for PCs and Unix machines respectively. This covers everything from basic word processors to sophisticated formulae.
- Current information and gateway. Newspapers, yellow pages, press releases, travel information and gateways to other services abroad.
- Overseas data. A cache of the most heavily requested overseas data in order to reduce transatlantic traffic loads.
- Librarians' directory. A resource guide pointing out useful resources and locations on the Internet.
- Electronic mail and bulletin boards. A unit dedicated to setting up list servers for subject- and activity-based groups and then training them in their use.
- Image database [proposed]. To provide a link to images created by institutions, likely to start with a databank of medical and dental images.
- A "national" OPAC. To record the catalogues of the major libraries in a single system.

It seems fair to say that this represents good value for our investment. Perhaps the best proof of this is that the total number of users is in excess of 250,000 each month.

Let me turn to the sort of issues which faced us in each of these areas. Some of these services had existed for some time, but in an uncoordinated way. In the same uncoordinated fashion we purchased the ISI tapes (covering the Citation Indices of the Institute for Scientific Information), without really being clear what client group they were aimed at, but probably with a vague assumption that the product was aimed at researchers. We were, however, clear that the end-user should not pay, since we wished to encourage use. The result of this haphazard growth is that within two years we are running the largest ISI dataservice in the world, with some 4,000 users a day, most of whom are undergraduates. So that is perhaps the first lesson that we have learned: there is a need for a set of strategic goals and objectives, as well as success measures, before such projects are initiated.

ISI had some experience of running large multi-campus deals, which meant that there was a basic framework for negotiating and pricing a deal, although this would be the first one to cover a whole country. The principal areas of concern were the creation of security mechanisms which would prevent commercial re-use, and settling the balance between capital and recurrent expenditure so that if the number of participating sites grew, ISI would receive additional revenue. None of these points was contentious and the debate centred on mechanisms for implementing them. This was in the end satisfactorily agreed. It is also worth recording that neither we nor ISI are aware of any widespread misuse of the data. A single case has occurred - in a medical school - and the individual was severely punished. We also determined that although the end-user should not pay, institutions should pay a modest sum - about $7,000 - so that they would be committed to the project and to ensuring that non-subscribers were not hacking in.

The data was then mounted at a site in the UK, which had the additional benefit of ensuring that nationally we possessed the skills to run major datasets - ISI alone is a 50 gigabyte file - and were not entirely reliant on external third parties. The host site also developed the software interface. Given the huge success of the ISI deal we quickly committed ourselves to purchasing up to twenty datasets covering all subject disciplines. We therefore determined that as far as possible we should seek a common search engine on the assumption that many users would wish to search multiple databases and would not wish to learn a new command structure for each. This has prejudiced our ability to provide some other datasets. For example, MEDLINE was only available with the Grateful Med search engine and it proved impossible for this and other reasons to provide access to the data using our search engine. As a result we have entered a deal with Elsevier to provide Embase, which again attracts huge usage figures.
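By way of illustration only, the following sketch shows one way a common search layer might present several datasets behind a single command structure, which is the assumption underlying our search engine policy. The language (Python), the dataset classes and the record fields are all assumptions made for the example; this is not the software actually developed at the host site.

```python
# Illustrative sketch only: a common search layer over several bibliographic
# datasets, so that the end-user learns one command structure rather than one
# per product. Dataset names and record fields are assumptions for the example.

from abc import ABC, abstractmethod


class Dataset(ABC):
    """Adapter wrapping one dataset behind a shared query interface."""

    name = "unnamed"

    @abstractmethod
    def search(self, query, limit=10):
        """Return records as simple dicts with common field names."""


class ISIDataset(Dataset):
    name = "ISI Citation Indices"

    def search(self, query, limit=10):
        # In reality this would translate the common query into the host
        # system's own search syntax; here we return a placeholder record.
        return [{"title": f"Placeholder ISI record for '{query}'",
                 "source": self.name}][:limit]


class EmbaseDataset(Dataset):
    name = "Embase"

    def search(self, query, limit=10):
        return [{"title": f"Placeholder Embase record for '{query}'",
                 "source": self.name}][:limit]


def search_all(datasets, query, limit=10):
    """One command searches every registered dataset in the same way."""
    results = []
    for ds in datasets:
        results.extend(ds.search(query, limit))
    return results


if __name__ == "__main__":
    portfolio = [ISIDataset(), EmbaseDataset()]
    for record in search_all(portfolio, "information management"):
        print(record["source"], "-", record["title"])
```

The design point is simply that each new dataset is added as another adapter behind the same interface, so the end-user's command structure never changes.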

These usage figures provided our second lesson. We were not clear who the service was for, and we are now trying to commission work to find out, as a form of market research. For example, we know that four thousand people a day use ISI for an average of nineteen minutes. But we do not know who they are, how frequently they search, how satisfied they are with the outcomes or how effective the searches are. We need to pay more attention to monitoring, and this is true of all services.
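The kind of monitoring in question could be as simple as the following sketch, which derives users per day and average session length from a session log. The log format and the figures shown are invented for illustration and are not drawn from the actual service.

```python
# Illustrative sketch: deriving the usage measures discussed above (users per
# day, average session length) from a simple session log. The log format and
# entries are assumptions for the example, not the actual service logs.

from collections import defaultdict
from datetime import datetime

# Each entry: (user_id, session_start, session_end) as ISO timestamps.
LOG = [
    ("u001", "1994-06-01T09:00:00", "1994-06-01T09:21:00"),
    ("u002", "1994-06-01T09:05:00", "1994-06-01T09:22:00"),
    ("u001", "1994-06-02T10:00:00", "1994-06-02T10:15:00"),
]


def usage_summary(log):
    sessions_per_day = defaultdict(int)
    users_per_day = defaultdict(set)
    total_minutes = 0.0
    for user, start, end in log:
        t0, t1 = datetime.fromisoformat(start), datetime.fromisoformat(end)
        day = t0.date()
        sessions_per_day[day] += 1
        users_per_day[day].add(user)
        total_minutes += (t1 - t0).total_seconds() / 60
    avg_minutes = total_minutes / len(log) if log else 0.0
    return sessions_per_day, users_per_day, avg_minutes


if __name__ == "__main__":
    sessions, users, avg = usage_summary(LOG)
    for day in sorted(sessions):
        print(day, "sessions:", sessions[day], "distinct users:", len(users[day]))
    print("average session length (minutes): %.1f" % avg)
```

Counting sessions tells us how busy the service is; only linking sessions to user categories would tell us who it serves, which is precisely the market research we lack.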

Secondly, we need to review delivery mechanisms, the way in which the data is made available to the community. It is important to remember that the ISI deal was an experiment, a very big experiment it is true, but still an experiment. Did we need a single national data centre or should there be several throughout the country? Was the model adopted for the first centre the best one? For example, there have been discussions, admittedly unavailing, about buying time on existing commercial services rather than setting services up in the UK. Further, all of our services are based on the assumption that the majority of users still need VT100 access. This makes the interface crude if robust. But we are now undertaking a national terminal census to see if that assumption is correct or whether we can now proceed to develop more sophisticated interfaces. We have the same problem with supporting other standards. Although it is clear that the systems used by most people have switched to IP protocols, a significant minority still use OSI or earlier standards which are still supported by the network. But what should our transition strategy be? How long is it reasonable to support old standards for? All of these issues have real costs. On the other hand, the adoption of a national strategy rather than allowing individual sites to make decisions ensures a cohesion and commonality which is much valued.

In considering monitoring, the intention was not just to monitor the ISI deal itself, but to ensure that there was sufficient user input into the setting up of the service. This was intended to cover end-users as well as librarians, and to look at how the database was serviced and supported as much as at user interface issues. It would also consider whether there should be user groups for each dataset or whether there is sufficient commonality to have one overarching group considering service issues rather than product deficiencies. This area remains one of concern. Although there has been and will continue to be an effort to involve end-users in evaluation panels, their participation in the user group has been at best limited. Most of the response has come from library and computer centre staff.

It very quickly became clear that the major costs of datasets were ownership rather than purchase costs. The need to support software development and the creation of national level training materials was underestimated. The unreadiness of most sites to deliver a truly mass service was also surprising. Some sites were not geared to give undergraduates passwords for more than a term. One computer centre director expressed a fear that if students were allowed access to the network they might use it! The original costing of services at the first host site was made on a marginal basis. It soon became clear that it was a better practice to undertake costing on a full cost basis. As it was evident that this would involve running costs of perhaps $750,000 a year, the view hardened that a small number of host sites should be selected to achieve economies of scale. It was also clear that the service was a runaway success with usage doubling every two months. A debate had begun on the question of flat rate charges for all sites, irrespective of size. To everyone's surprise, actual usage was almost exactly in inverse ratio to the size of the institution, making a simple resolution of charging issues quite difficult. This ratio has since changed and the policy on charging institutions remains a contentious issue.
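A back-of-the-envelope sketch may make the charging difficulty concrete. The institutions, student numbers and usage shares below are invented; only the approximate $750,000 running cost comes from the discussion above. The point is simply that flat-rate, size-based and usage-based models produce very different bills when usage runs in inverse ratio to size.

```python
# Back-of-the-envelope sketch of the charging problem discussed above.
# The institutions, student numbers and usage shares are invented for
# illustration; only the ~$750,000 running cost figure comes from the text.

RUNNING_COST = 750_000  # approximate annual running cost in US dollars

# (name, full-time students, share of total usage) -- usage here is roughly
# in inverse ratio to size, as the early figures surprisingly showed.
INSTITUTIONS = [
    ("Large university", 20_000, 0.10),
    ("Medium university", 10_000, 0.30),
    ("Small college", 4_000, 0.60),
]


def flat_rate(institutions, cost):
    return {name: cost / len(institutions) for name, _, _ in institutions}


def by_size(institutions, cost):
    total_students = sum(n for _, n, _ in institutions)
    return {name: cost * n / total_students for name, n, _ in institutions}


def by_usage(institutions, cost):
    return {name: cost * share for name, _, share in institutions}


if __name__ == "__main__":
    for label, model in [("flat rate", flat_rate),
                         ("by size", by_size),
                         ("by usage", by_usage)]:
        charges = model(INSTITUTIONS, RUNNING_COST)
        print(label)
        for name, charge in charges.items():
            print(f"  {name}: ${charge:,.0f}")
```

Under these invented figures the small college pays a third of the cost on a flat rate, a tenth by size, and over half by usage, which is why no single formula has yet commanded agreement.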

By now our datasets acquisition strategy was known as the Doughnut Strategy. ISI was at the centre of the doughnut and was to be surrounded by up to twenty subject datasets. If the ISI deal were renegotiated, they would be the jam in the doughnut, if the renegotiation failed, they would be the hole in the doughnut. I am delighted to say that a further deal has now been settled with ISI covering the next few years and we hope to continue to be their major distribution point in the UK.

Why should one be active in this area at all? One response is clear: it is the only way to make certain types of data widely available within the research community. However, on a broader scale, it may be argued that such an approach is absolutely central to managing current shifts in higher education, and in particular the shift from teaching to learning. Students will have to become adept at information management as part of their degree studies, and one aspect of this will be the mastery of electronic information sources. In the various discussions that have taken place, one school of thought argued that it was essential that major datasets aimed at mass use by the undergraduate market be purchased, and not just those of interest to researchers. It is also felt that even databases aimed at researchers can have training and support materials prepared which make them of value to students. The network services committee has a responsibility above all to this mass market, where it can help the Higher Education Funding Council to achieve its expressed aim of producing more graduates without an increase in resource. Thus, in a small way, dataset provision becomes an arm of higher education policy.

Another important concept to emerge in our reviews of services was that of a portfolio of services. It would be possible to have a series of "beauty contests", asking users to vote for what they most want and then going out to buy MEDLINE, BIOSIS, INSPEC, and so on, until the money ran out. This has at least two severe disadvantages. Firstly, it would put vendors in a strong position when it comes to negotiating deals, since they would know what our priorities were. Secondly, the selection would be dominated by the existing major services, which are already easy to get at, if expensive. The network services committee would much prefer an attempt to purchase up to twenty datasets covering all the major disciplines, giving as much prominence to the humanities as to the sciences. Higher education needs a spread of resource and not a concentration in big science and business. This again reflects the ambition of exposing everyone to the need to use electronic information sources. If it proves impossible to renegotiate a contract with a supplier, this may be disappointing but not disastrous, and the business will be taken elsewhere. In fact there is some slight evidence that the providers of new or "number two" databases in a discipline are more willing to negotiate a deal on our terms. There is then further anecdotal evidence that this both begins to reduce the usage of the "number one" database and also increases downstream sales of the other product as graduates go into the professions and industry and look for what are now familiar electronic tools.

A coherent policy for the acquisition of a datasets portfolio is now in place. Other issues, particularly of management and charging, remain to be ironed out in detail, as does the balance between bibliographic and non-bibliographic resources, but essentially the way ahead is clear for the network services committee to sponsor the purchase of a significant number of datasets over the next few years. Mechanisms for relating to the many communities who believe they are being given minor rather than central roles in the process must also be addressed. Making the community feel it has a role in the acquisition process is important, even if there is a belief - which I share - that there are actually many communities rather than just one.

Most of the issues raised above are repeated in the other services we operate. Major statistical datasets are acquired cheaply or free from government and non-governmental organizations. It is then almost self-evident that what we pay for here is the support costs. These are even larger in this area as users will tend to require assistance in manipulating the statistical packages which draw meaning from the data. Issues are also raised about the need for long-term archiving and who should undertake this. We also have to decide how far to store data from other countries and international organizations and how far to negotiate reciprocal access agreements. This requires quite careful calculations of how much international traffic will be created by reciprocal arrangements, since the growth of international traffic is causing worries to the network providers over the cost of the necessary bandwidth.
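The traffic calculation involved can be illustrated, very roughly, as follows; every figure in the sketch (request volume, record size, cache hit rate) is an assumption for the purpose of the example rather than a measurement from the network.

```python
# Illustrative calculation only: estimating the extra transatlantic traffic a
# reciprocal access arrangement might generate. All figures are invented
# assumptions for the sketch, not measurements from the actual network.

requests_per_day = 2_000     # assumed overseas requests under the agreement
mean_response_kb = 150       # assumed average size of a returned record set
cache_hit_rate = 0.4         # assumed share served from the national cache

international_kb_per_day = requests_per_day * mean_response_kb * (1 - cache_hit_rate)
international_gb_per_month = international_kb_per_day * 30 / (1024 * 1024)

print(f"Estimated extra international traffic: {international_gb_per_month:.1f} GB/month")
```

Even crude estimates of this kind let the network providers judge whether the bandwidth cost of a reciprocal arrangement is justified by the access gained.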

The issues with shareware also have their own particular features. We offer almost no support here because the range of packages is so varied. Is this the correct response? It is also an area where we could offer the shareware to the general public for a fee and use the revenue to subsidize the service. This is not just a regulatory matter but raises issues of how far we are running a public good network and what its relationship is with commercial ventures.

The data cache to reduce traffic, the cooperatively developed resource directory and the current awareness tools are again all sensible tools to improve the efficiency of the community as a whole. They are expensive to maintain, but because the costs are in effect spread across the whole community, they become very cheap for each institution.

Our proposed image database also poses interesting issues. Most people will be conscious of major publishers and software houses touring the world apparently buying up every image in sight. In part then we want to ensure that the copyright of images created in the universities is retained within them and not sold off. In part we also want to encourage sites to invest in networks and peripherals capable of dealing with high quality images and video in order to push technology uptake forward.

Finally let me mention the proposed national OPAC. We are taking the catalogues of seven or eight major libraries and making them available in a single national OPAC. The option is also available of adding the collections of specialized libraries to the database. We also intend to build a national higher education document delivery service on top of this OPAC. OPAC design is an area of fairly general failure. It is an all too common complaint of librarians and their users that it is easy to log in to an OPAC but impossible to log out. We hope to make nice simple links between the OPAC and journal listings such as the Science Citation Index so that the end-user can easily order required material.
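The sort of link we have in mind might, in outline, look like the following sketch, in which a citation-index record is matched against OPAC journal holdings by ISSN and turned into a document delivery request. The record structures, ISSNs and library names are invented for illustration and do not describe the actual OPAC software.

```python
# Minimal sketch of linking a citation-index record to OPAC holdings so the
# end-user can place a document delivery request. Record structures, ISSNs
# and library names are invented for illustration.

HOLDINGS = {
    # ISSN -> libraries holding the journal
    "0036-8075": ["University Library A", "University Library B"],
    "0028-0836": ["University Library C"],
}


def locate(citation):
    """Return libraries that hold the journal cited, matched on ISSN."""
    return HOLDINGS.get(citation.get("issn"), [])


def order_document(citation):
    libraries = locate(citation)
    if not libraries:
        return (f"No holding library found for {citation['journal']}; "
                "refer to interlending staff.")
    return (f"Request '{citation['title']}' ({citation['journal']}) "
            f"from {libraries[0]} via document delivery.")


if __name__ == "__main__":
    record = {"title": "An example article", "journal": "Science",
              "issn": "0036-8075", "year": 1994}
    print(order_document(record))
```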

The basis for an OPAC is, of course, retrospective catalogue conversion. Even after some thirty years of library automation this remains an issue of concern in many countries. I am aware of at least three European countries where major reviews of retrospective conversion are under way. In at least one, this is to allow a decision to be made on whether to fund further conversion of collections which are seen as strategically important nationally. Methods of conversion vary hugely, but generally the alternatives lie between in-house conversion, record matching by organizations such as OCLC, or data conversion by firms such as SAZTEC. Some libraries have used all three methods, depending on the nature of various parts of the collection. It is also a common experience of libraries that usage of the collections shoots up as records are added to the database. Some libraries have converted only part of their stock, arguing that the remaining 10-15% is little used. The logic of this may seem attractive with so many things competing for priority in a world of inadequate resources. On the other hand, it is not clear to me why the library then keeps the collections it apparently values so little. If a collection is worth keeping it is worth cataloguing, particularly when we know that the very act of cataloguing it onto a database will increase use.
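For readers unfamiliar with record matching, the following sketch shows the general idea: a brief local entry is compared with full records in a union database on normalised title, author and year, and unmatched items are sent for manual conversion. The matching rule and the sample records are assumptions for the example, not the algorithm of OCLC or any other agency.

```python
# Illustrative sketch of the record-matching approach to retrospective
# conversion: a brief local entry is matched against full records in a union
# database on normalised title, author and year. The matching rule and sample
# records are assumptions for the example, not any agency's actual algorithm.

import re

UNION_DATABASE = [
    {"title": "A History of the English-Speaking Peoples",
     "author": "Churchill, Winston S.", "year": 1956, "id": "rec-0001"},
    {"title": "The Origin of Species", "author": "Darwin, Charles",
     "year": 1859, "id": "rec-0002"},
]


def normalise(text):
    """Lower-case and strip punctuation so near-identical strings compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def match(brief_entry, database):
    """Return the first full record matching on title, author surname and year."""
    for record in database:
        if (normalise(record["title"]) == normalise(brief_entry["title"])
                and normalise(record["author"]).startswith(normalise(brief_entry["author_surname"]))
                and record["year"] == brief_entry["year"]):
            return record
    return None


if __name__ == "__main__":
    card = {"title": "The Origin of Species", "author_surname": "Darwin", "year": 1859}
    hit = match(card, UNION_DATABASE)
    print("Matched:", hit["id"] if hit else "no match - send for manual conversion")
```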

The consequence of all that I have been discussing is to empower significantly the end-user. A great deal of work has now gone into taking services which were developed independently and piecemeal and organizing them as a coherent whole. We are working towards standardized marketing, support and training. A tailored package of services is provided, recording the holdings of major libraries and giving access to them through document delivery. Current awareness tools, software, bibliographic and statistical datasets are all available at the desktop. The end-user is liberated from the need physically to visit the library. Indeed, all libraries are open. The liberated end-user may borrow from several libraries, request documents from all networked libraries, undertake reference queries over the network, join in debate over listservers and publish results on bulletin boards. However, this freedom is not as simple as it seems. It is rather like Mickey Mouse in the cartoon of the Sorcerer's Apprentice from the film Fantasia. The end-user can release the power in Mickey's wand, but like Mickey cannot control it and will find the brooms get out of control and fill the cellar with buckets of water (for which we might substitute information). There is still a need for the Sorcerer and his skills.

Support, training and archiving are perhaps the most important issues we have to address. It has become a commonplace of debate that, apart from a very few very large libraries, we must move away from the notion of possession of books as the paradigm of library operation and move towards the provision of access to information as our model. The role of the librarian moves much more towards that of product champion and gatekeeper. The role librarians have to develop is that of mass instruction in information management skills, and by that is meant rather more than the annual ritual of the tour of the library for first-year undergraduates. Take the case of our networked services. How are institutions going to instruct 8,000 users in system use? What helpdesks will they have? Which of the staff will be responsible for the production and dissemination of documentation? How will the data be linked to the work of the institutional research strategy committee? Do institutions plan bibliometric studies of research impact? How will libraries liaise with the advisory desk in the Computer Centre? What are the demarcation lines? Or will libraries simply ignore all of this and assume it is, or make it, someone else's problem? On the teaching side we also need to look very hard at the commonality of the interfaces.

The other element of the future to be considered is that of the library's role. Consider some statistics. HENSA - the shareware archive - is used by over 70,000 people a month. The ISI database is used by 60,000 users a month; the NISS gateway by 70,000 users a month. The number of listservers and bulletin boards on the Internet is estimated to have grown from 3,000 to 5,000 in the last year - and anyone connected to a bulletin board knows how much activity that generates. The first networked, end-user driven document delivery services are also just hitting the market. What impact has this had on our libraries? Precious little, I suspect, with the exception of training, which I shall come back to in a moment. The point I want to make is that in the last two to three years in the UK alone we have gone from almost zero to hundreds of thousands of networked information-seeking sessions each month. Is this new use, or new users, or is it use displaced from somewhere else? The bleak picture is that this is a combination of new users and displacement activity which will eventually lead to the marginalisation of libraries except as archive stores. It is worth noting here that the Royal Society report on the STM information system, which has just been published, shows researchers in the 25-35 age group as least satisfied with and least likely to use libraries. It also hypothesises that the trend to document delivery will expand and shift to the end-user, thus moving funding control inexorably away from libraries. The optimistic view is that in what is now known as cyberspace there are no maps, and that it will be the role of information professionals to map that new world and to train others to find their way. I find it hugely ironic that the technical world claims to have solved - if not implemented - all the problems of running high-speed networks but acknowledges failure in describing information content. They are attempting to reinvent cat and class.

The other element which is missing is what may loosely be described as management. There is something approaching chaos out there on the networks. We need standards such as Z39.50 and Uniform Resource Locators, and adequate resource guides. We need new ways of cataloguing data. The recent, hugely successful American Civil War series runs to some sixteen hours. How does one catalogue the three-minute sequence on the Gettysburg Address so that it is retrievable? We need quality control of data, not only in the formal sense of identifying and eliminating bugs but also in defining what its value is. The expectations associated with printed volumes, where, for example, the publisher's name will give a clue to status and quality, do not translate to the network. A fairly average bulletin board will contain some four gigabytes of information from multiple sources of origin, and there are thousands of these. We also need to develop the notion of "connectedness" expounded by Lorcan Dempsey, allowing transparent and seamless movement between data sources with any information or search strategy carried forward.
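One possible answer to the cataloguing question, sketched below purely for illustration, is a segment-level record that points into its parent work with a time code and carries its own subject headings; the field names and the time code shown are invented and do not represent an existing cataloguing standard.

```python
# Sketch of one possible way to catalogue a short sequence within a long video
# work so that it is independently retrievable: a segment record that points
# into its parent with a time code. Field names and the time code are invented
# for the example; this is not an existing cataloguing standard.

from dataclasses import dataclass


@dataclass
class SegmentRecord:
    parent_title: str      # the work as a whole
    segment_title: str     # the retrievable sequence
    start: str             # time code into the parent, HH:MM:SS
    duration_seconds: int
    subjects: tuple        # subject headings used for retrieval


GETTYSBURG = SegmentRecord(
    parent_title="The Civil War (documentary series, c. 16 hours)",
    segment_title="The Gettysburg Address",
    start="05:42:10",      # invented time code for illustration
    duration_seconds=180,
    subjects=("Lincoln, Abraham", "Gettysburg Address",
              "United States -- History -- Civil War"),
)


def matches(record, term):
    """Very crude subject search over segment records."""
    term = term.lower()
    return (term in record.segment_title.lower()
            or any(term in s.lower() for s in record.subjects))


if __name__ == "__main__":
    if matches(GETTYSBURG, "gettysburg"):
        print(f"{GETTYSBURG.segment_title}: {GETTYSBURG.parent_title}, "
              f"from {GETTYSBURG.start}, {GETTYSBURG.duration_seconds // 60} minutes")
```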

Let me try, finally, to draw all of the threads together. We are moving beyond local solutions for two reasons. The first is that better technical solutions are coming along which can only be achieved through collective action. The second, and perhaps more important, is that central funds are being poured in an unprecedented way into the development of networked information services. The ability to access that data poses threats to the traditional custodial and even physical role of the library. It is, however, quite clear that what we have is a flea market and what we need is a department store. The traditional problems and skills associated with the organization of knowledge remain central to the development of these services. Next comes the issue of training. In higher education in particular, an area where there is no real cumulative learning curve as students move in and out of the system and where numbers are multiplying rapidly, the whole issue of training literally tens of thousands of people in information management skills becomes critical. Finally, the issue of quality control comes firmly within our domain. Using the Internet has been graphically described as like trying to drink from a firehose. It is our skills in selecting and organizing knowledge which will provide the necessary filters to make the information flow manageable.

It may be trite to suggest that these are not problems but opportunities, and that in large measure we face the collective task of redefining our professional future and skills. This is not blue-sky fantasy, nor is the horizon very far off; tens of thousands of people join the Internet each day, and in my small country thousands of them are using the networked services we manage. Too many libraries are still looking at how to deal with library problems by such expedients as networking CD-ROMs. These are big issues, and they go far beyond putting pretty colored screens in parts of the library and thinking that we are revolutionizing user choice. Resource sharing has proved notoriously difficult in the past. I hope that our experience shows that new technology offers new solutions to old problems.