
Buying Databases: Many Problems, Many Solutions

By Tony McSean, Derek Law

When the authors left their respective library schools, the problems libraries faced when buying bibliographic databases were almost exclusively financial. Could they afford the annual subscription, and did they have the shelving to accommodate the rows of bound volumes? The professional problems only arose when trying to use these behemoths. Almost twenty years later the situation is entirely reversed. With the range of search packages on display at any library trade show, even a clever child, even a working doctor, can carry out literature searches of a complexity inconceivable (or at least uneconomic) in the days when print was king. The librarian’s important choices now are which technologies to offer to what we still fondly call The Reader. This paper will look briefly at the developments that have taken place in the delivery of secondary source bibliographic databases – files such as Medline. It suggests that their usefulness is now at or near its peak, at least as far as the developed world is concerned, and that in 10-15 years’ time these same users of information will go straight to the primary source material, and may no longer be users of anything that we recognise as a library.

Overload

Changes in the way that information is managed do not just happen by accident. They are sparked into existence by the creeping inadequacy of current services. In librarianship the imperative has been overload – the increase in scholarly publishing. It was said of Thomas Young, the 19th century doctor, that he was the last man who knew everything. That is no longer an achievable goal in even a single sub-discipline. Physics Abstracts has grown from 80,000 articles in 1968 to 150,000 in 1991. Just to read all the abstracts in a working year would mean galloping through them at 100 per working hour. To cover a single core journal, e.g. Journal of Applied Physics, would require a researcher to read and digest two articles an hour every working day. Ridiculous even to contemplate. Each successive phase of overload has drawn a response. Towards the end of the 19th century the first indexing journals appeared, and these allowed us to cope reasonably well until the late 1960s. By then they were huge and unwieldy; the 1974 volumes of Index Medicus weigh 28 kg, and occupy 62 cm of shelving. Despite being enhanced by more or less primitive computer indexing, they were unable to cope economically with the sophisticated search arguments resulting from increasing specialisation.
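The arithmetic is easily checked. The sketch below (in Python, purely for illustration; the working year of 46 weeks at 37.5 hours is our own assumption, not a figure taken from the abstracting services) shows how the figure of roughly a hundred abstracts an hour arises:

```python
# Back-of-the-envelope check of the reading load quoted above.
# The working-year figures (46 weeks of 37.5 hours) are illustrative assumptions.

abstracts_per_year = 150_000        # Physics Abstracts output, 1991
working_hours = 46 * 37.5           # roughly 1,725 hours in a working year

print(f"Abstracts per working hour: {abstracts_per_year / working_hours:.0f}")
# Prints roughly 87, i.e. of the order of the 100 per hour quoted in the text.
```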

Enter the Computer Search

Online searching opened up the literature again: in well-run libraries it was back under control and readers were better served than ever. Towards the end of the 1980s, the continuing decline in the cost of computing was opening up computer searching to readers themselves. Some of the large, enterprising and prosperous private US university libraries (e.g. Johns Hopkins) set up their own small-scale online host services, networked onto the desks and workbenches of their users. Down at the level of capital investment on which most of us operate, there appeared the CD-ROM workstation. This technology brought self-service database searching within reach of library budgets for the first time, and many medical libraries have made good use of Medline workstations.

The Options Today

So much for history. It is interesting to examine the options available today for the medical librarian wishing to make sensible decisions about databases. All sensible solutions will involve mixing up a cocktail of database systems, and the ingredients and balance of each library’s optimum recipe will depend on who its users are, how it relates to them, and how much money it has. One thing that is very striking is that when we talk about database systems for libraries, we are still essentially speaking of secondary source bibliographic databases. Some primary sources are becoming available but these are still essentially fringe products with negligible sales and little impact on library work in general. We shall be returning to this topic in the concluding section of this paper.

Paper indexes still exist, and both the authors’ libraries still subscribe – although we both wonder why. It is now perfectly possible for a library to consider itself a significant medical library without current subscriptions to the printed Index Medicus and Excerpta Medica.

Commercial online services are still indispensable and seem likely to remain so for the foreseeable future (3-5 years, in this context), if only because of the ease with which you can stretch a search across families of related databases. There is an old marketing adage that you cannot sell something if someone else is giving it away – it is true, and it applies here. So, online services will continue to run for use by professional intermediary searchers, and will be increasingly used to supplement local database services.

Private online services are becoming a viable proposition as computing becomes cheaper and networking easier.

In the UK there has been an outstanding example of what can be achieved by informed, cooperative action within the profession. The academic community, aware of the amount of money it spends on online searching with third-party host services, has set up a rolling programme to implement key databases on a cooperative basis. The Combined Higher Education Software Team (CHEST) negotiates a deal directly with the database producer, sets up the implementation of a system connected to the national Joint Academic Network (JANET) and then funds the operation by subscription payment from universities and other scholarly organisations. The staff of subscribing organisations enjoy free, unlimited access to the search system. CHEST’s first medical database, EMBase, is now being implemented, and in the authors’ view it is a great pity that there is no immediate prospect of a Medline service. To set up such a high-powered service is still beyond the ability and budget of most medical libraries, but as the price of the hardware component continues to plummet this may not be the case for much longer.

The possibility for individual libraries to mount their own database services is already there – an extension of the pioneering mini-Medline services in the US. At the time of writing there are ten North American universities and research establishments running their own campus-wide Medline services using the PlusNet2 system, developed by CD-Plus from its CD-ROM search and retrieval package and using conventional magnetic disc technology. The key development making this possible is the availability of cheap, reliable PC-type discs of 1-2 gigabyte capacity. Although PlusNet2 and its equivalents are still too expensive to rank as a casual purchase, it is already a competitive option for intensive Medline users, and likely to get cheaper as time passes and as competitors emerge. Again at the time of writing, the BMA Library is buying a PlusNet2 system in order to provide a free Medline service as a component of the Association’s professional services to its members. Obviously a choice of this sort very much depends on a library’s individual circumstances, but in two or three years it will be a standard upgrade for stressed CD-ROM stations.

CD-ROM has been a good friend to libraries, and both authors are happy to acknowledge that their libraries’ users have made good use of their CD-ROM workstations. CD-ROM database packages have made two important contributions to library database services: user interfaces good enough for the untrained to achieve acceptable success in their searches; and an acceptance by libraries of the idea of subscription payment for databases, of “buying” databases [the legal reality with Medline products is that the data are only leased, but this is only a quibble and has not stopped many Western libraries routinely posting their old discs to sister libraries in the third world].

The authors’ view, expressed many times elsewhere and only summarised here, is that the CD component of the systems is the weak link and largely incidental to their runaway success. The main problem lies in the technology’s origins in consumer audio. Normal computer discs rotate at constant speed, have data stored in concentric rings, and are optimised for an erratic life of random access to disconnected data elements. CDs of all sorts have data stored in a single spiral and rotate at varying speed depending on which part of the spiral is being read. This is fine for playing back a piece of music, but when used for randomly accessing a database it is physically impossible for a CD-ROM’s performance to reach levels comparable with those of a magnetic disc. The CD-ROM database system is probably at its zenith. System improvements are now in the area of diminishing returns, and alternatives beckon at the expensive end of the market and for networked systems. Even small libraries should consider that CD-ROM is not going to seem state-of-the-art for much longer and that while the PC will live on as a word processor or whatever, the CD-specific equipment may have to be quietly disposed of. The authors’ long-held opinion that the CD-ROM is a big floppy disc (rather than a slow hard disc) is being borne out by its recent emergence as a component in computer game systems and as a vehicle for distributing clip art and software packages.
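To make the point concrete, the following sketch uses nominal single-speed CD figures (a linear velocity of about 1.3 metres per second and a programme area running from roughly 25 mm to 58 mm radius; standard published values, not measurements of any particular drive) to show how much the spindle speed must change when the read head jumps across the disc:

```python
# Illustrative only: why constant-linear-velocity (CLV) CD-ROM drives pay a
# penalty on random access. Values are nominal single-speed CD figures.
import math

linear_velocity = 1.3    # metres per second along the spiral track
inner_radius = 0.025     # metres, start of the programme area
outer_radius = 0.058     # metres, outer edge of the programme area

def spindle_rpm(radius: float) -> float:
    """Rotation speed needed to keep the spiral passing the head at constant velocity."""
    return linear_velocity / (2 * math.pi * radius) * 60

print(f"Spindle speed at inner edge: {spindle_rpm(inner_radius):.0f} rpm")
print(f"Spindle speed at outer edge: {spindle_rpm(outer_radius):.0f} rpm")
# A long seek forces the motor to change speed by more than a factor of two
# before reading can resume; a constant-speed magnetic disc has no such wait.
```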

From This Moment On

The final section of this paper gives two examples of developments that might change the face of medical librarianship. Prediction is perilous, of course, because of the almost unimaginable rate of change in computers and telecommunications – from wild fantasy to commonplace in 5-7 years. If you can visualise a desirable application, you should plan for its implementation, because the technology will catch up with your price/performance requirement. The library profession is increasingly part of the computer world, and needs to adapt its thinking to accommodate such speed of development.

Super-Networks: Even within its own rapidly developing context, telecommunications is set for a period of radical development. The very units which specialists use to measure the bandwidth (data capacity) of a data link are changing – from bits per second to kilobits, to megabits, to gigabits, to terabits and beyond. Each stage marks a thousandfold increase on its predecessor, and researchers are already talking of the thousand-terabit link. Partly, these developments are taking place because they have become technically possible – make the network and then search about for sources of traffic to fill it. But the main practical thrust has been the transmission of images, and in particular of moving images. The runaway world-wide success of the fax machine shows that there is a ready market for high quality image transmission – using photo-quality colour, even holographic images. The current experiments in high-definition television and video are also driving experiments in high-bandwidth telecoms channels to bring the signals into our homes.
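What these steps mean in practice is easily illustrated. Taking a photo-quality colour image of, say, 10 megabytes (an assumed figure, chosen only for illustration, as are the link speeds), the transmission times fall roughly as follows:

```python
# Rough transmission times for one 10-megabyte image at successive bandwidths.
# The image size and the choice of link speeds are illustrative assumptions.

image_bits = 10 * 1_000_000 * 8    # 10 megabytes expressed in bits

links = {
    "9.6 kbit/s modem": 9_600,
    "1 Mbit/s": 1_000_000,
    "1 Gbit/s": 1_000_000_000,
    "1 Tbit/s": 1_000_000_000_000,
}

for name, bits_per_second in links.items():
    print(f"{name:>17}: {image_bits / bits_per_second:,.4f} seconds")
# The same image falls from over two hours on a modem to a negligible fraction
# of a second on the links now being discussed.
```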

These developments produce networks where the level of data throughput is an operational irrelevance: smart networks, with smart packets of information knowing where they are going and able to get there and reassemble themselves under all possible circumstances. This is best illustrated by an analogy. Current data packets are like a 7-year-old child flying from London to Los Angeles with two stopovers: it is barely conceivable that she could arrive at the other end without constant guidance at every switching point, and some degree of supervision even on the flights themselves. The smart packet is the experienced adult traveller who can get himself to the destination in reasonable time, through all manner of vicissitudes and with virtually 100% certainty of arrival. The smart network opens up limitless possibilities to the system designer, for example in the telemonitoring of life-critical processes, or in browsing an electronic library of high-definition colour images. When access to information is routinely a matter of electronic conversation, the library of first resort is much less likely to be the nearest.
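To make the analogy slightly more concrete, a toy sketch of what such a “smart” packet might carry is given below; the field names and the reassembly logic are invented for illustration and do not describe any real protocol:

```python
# Toy illustration of the "smart packet" idea: each fragment carries enough
# information to find its own way and to reassemble itself at the far end.
# Field names and logic are invented; this is not a description of a real protocol.
from dataclasses import dataclass

@dataclass
class SmartPacket:
    destination: str    # where the fragment is ultimately going
    message_id: int     # which message this fragment belongs to
    sequence: int       # this fragment's position within the message
    total: int          # how many fragments make up the whole message
    payload: bytes      # the data itself

def reassemble(fragments: list) -> bytes:
    """Rebuild a message from fragments that may arrive in any order, by any route."""
    ordered = sorted(fragments, key=lambda p: p.sequence)
    if len(ordered) != ordered[0].total:
        raise ValueError("fragments still missing")
    return b"".join(p.payload for p in ordered)
```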

The Electronic Ferret: Until now, this paper, and library database provision, have been almost entirely concerned with secondary sources. This is because secondary sources are what our profession has been able to provide, not because they are what our users want. Being able to give our users Medline to search is a real achievement, but we should not pretend to ourselves that it is the real answer to their problems. It is only a signpost.

What researchers want is a system that shields them from the mechanics of information retrieval and which presents them with a set of papers (in the loosest possible sense) which are relevant to their work and interests, which have a known reliability factor, and which are formatted in a familiar and congenial style and layout. Ideally they want this to be done on a routine and automatic basis and to scan every possible source of information in the world. They want their own librarian, in fact. And in a very few, very eminent cases that is what they get.

It is now possible to conceive of a piece of expert software which will scurry about the academic networks of the future, performing this service for even the humblest postgraduate student and researcher. In future the electronic ferret (so called after the small predator which is traditionally sent into warrens to terrify rabbits into bolting blindly out of their holes into the waiting nets) will sit on a personal workstation and: (a) guide the researcher through a formulation of his or her subject interest, weighting terms according to their importance and relevance, and building in a random element for those oddities who believe in the importance of serendipity; (b) work invisibly in background mode and out of hours, searching the networked journals, bulletin boards, research notes and other data sources, applying an informed judgement as to their reliability, and retrieving all items crossing the current relevance/acceptability threshold; (c) translate all materials into the languages familiar to the researcher; (d) organise the material thematically (or otherwise as required); (e) format the resulting bespoke “journal” into the organisational format, typeface and page layout preferred by the researcher, and output it to screen, printer or whatever else is specified. At this point, the role of the readers’ services librarian becomes a little unclear.
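A deliberately simplified sketch of how the ferret’s core loop might look is given below; the profile terms, weights, threshold and “serendipity” factor are all invented for illustration, and a real system would of course be far more sophisticated:

```python
# Simplified sketch of the "electronic ferret": a weighted interest profile,
# a relevance threshold, and a small random element for serendipity.
# Terms, weights and scoring rules are invented purely for illustration.
import random

profile = {"myocardial": 3.0, "infarction": 3.0, "thrombolysis": 2.0, "aspirin": 1.0}
threshold = 4.0       # minimum relevance score for routine retrieval
serendipity = 0.05    # small chance of retrieving an item regardless of score

def relevance(abstract: str) -> float:
    """Weight each profile term by the number of times it appears in the abstract."""
    words = abstract.lower().split()
    return sum(weight * words.count(term) for term, weight in profile.items())

def ferret(candidates: list) -> list:
    """Return items crossing the relevance threshold, plus the occasional wild card."""
    return [item for item in candidates
            if relevance(item["abstract"]) >= threshold
            or random.random() < serendipity]
```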

Conclusion

The ferret will not start retrieving its smart packets this year. Perhaps all this will not happen this century. But most of its elements can already be seen in primitive form, and the first tentative links are being put in by systems such as CARL. It was only 12 years between offline Medlars searching and the first CD-ROM prototypes. Retrieving primary documents will assume greater importance and sources such as Medline will go into relative decline. The intelligent element in computerised search and retrieval will continue to move away from the database and into the searcher’s workstation. The librarian-mediated search will join the scriptorium and the Computer Department in the mausoleum of library history.

Some readers may think that this paper predicts a gloomy future for the working librarian. Perhaps it does, but much, much less gloomy than that of the commercial scholarly publisher, who has so much more to lose and is so much more certain of losing it. The pattern of scholarly work is changing as the PC becomes a pervasive tool in all disciplines and bulletin boards and respectable electronic journals emerge. Librarians may wish, or even need, to become the gatekeepers for changing patterns of scholarly communication.