Future of Distributed Databases

ACIS Seminar 7.11.1990: The Future of Distributed Databases

Although my brief is to raise some of the issues surrounding the future of databases and to describe the work of the National Task Group on Datasets, I thought it might be useful to set down my impressions of the library side of the history of the ISI initiative. Its origins lie in slightly confused and differing strands, which bear on any discussion of the future and which deserve to be more widely known. I am certainly aware that for at least seven years there has been interest in the university sector in mounting databases such as ISI campus-wide and, of course, we know that this has happened in industrial concerns for many years. About three years ago, some of the bigger schools of the University of London began to look at a London-based initiative built on Science Citation Index. At roughly the same time the Radcliffe Science Library in Oxford was considering the same idea, planning possibly to network the data throughout the University of Oxford. Oxford and London keep irregular and informal contact, and since both the cost of data and the availability of software were an issue, we agreed to keep in touch with a view to sharing costs. Quite independently, John Lamble, the then librarian of the University of Bath, considered an initiative to purchase the tapes for the South West region but had trouble in finding the five institutions then considered necessary for such an arrangement. He approached some colleges of the University of London to see whether we were interested in joining him. As a result of his discussions he called a public meeting in Bath to see whether a consortium was viable.

As a separate and simultaneous development, NISS and CHEST, which Mike Johnson has described, were moving closer to the library community, where colleagues such as Peter Stone at Sussex and the staff at Southampton University were demonstrating the links between the two communities of libraries and computer centres. To his eternal credit, Mike saw the potential for development here, was aware of the amount of money available to the Computer Board and, most important of all, was willing to act. At this point, thanks to the overlap of individuals in these various initiatives I have been describing, the SCONUL Advisory Committee on Automation Policy became concerned at the mushroom development of uncoordinated initiatives and invited Mike to a meeting for discussions on a national initiative. We thus followed Dick De Gennaro's dictum "Make big plans and aim high". Starting from the basis of seeking campus-wide solutions, we finished up with a mind-bogglingly large national deal in which Mike was able to say that he was acting on behalf of the library community.

Mike has explained the mechanics of how the deal was set up. For better or worse Peter Stone and I had been the main library contacts, and from the start we have made it clear that we consider this to be a potentially revolutionary deal. It seems clear, however, that it is the view of the Board that history is not made by accident - a quite false view, as any humanities student knows. They prefer the conspiracy to the cock-up theory of history, even if the records have to be adjusted to show this. So when it was made clear that not only was this a dramatic leap in the dark but one that would have to be repeated at regular intervals, the Board felt the need to have an ex post facto justification for what they had done, to bring retrospective rationality to an irrational situation and to justify as policy a piece of naked opportunism. It was also hoped that this would provide the justification and methodology for future action, and not just for the ISI deal. Certainly Pete, Mike and I at least were determined that this should be a first step to other deals and not just an isolated and curious aberration. But before turning to these future deals, it may be as well if I go through the work and conclusions of the National Datasets Task Group, which was the vehicle set up by the Board to justify what it had done.

First of all, then, its remit and membership. The remit was:

a) To assess the future needs for national database and dataset services.

b) To consider resourcing issues.

c) To explore potential use by Polytechnics, Further Education Colleges and Schools.

d) To assess the implications for networking infrastructure and links to other networks.

e) To review access and delivery mechanisms.

f) To report to the Computer Board by September 1990.

For the record I should perhaps also mention that the Computer Board solicited bids to run the data service and a separate review panel was set up. Three bids were received, and that of Bath University was accepted.

Needless to say, the tight six-month deadline which the Group was given in which to report made it impossible to consider all the issues fully, and so we effectively considered only a subset.

The membership of the Group was interesting. There were two librarians, Peter Stone and myself, a Polytechnic Director, a University Professor of Computing and another of Business Studies, as well as a member of the Computer Board staff. All of these clearly felt varying degrees of complete ignorance or discomfort at dealing with this unfamiliar territory. Yet equally they were quite happy, if adequately briefed, to make decisions on topics which were not their own and to spend very large sums of money. They also made very strong attempts to clamp down on the sort of high-flown nonsense I am prone to, which sees this project as breaking new ground. At most, this was to be seen as a minor extension of existing activity, with the role of the Computer Board being to bring order to a disordered scene. I certainly gained the impression that the Board is happy to be innovative but does not want to be seen as such.

Well, we set about taking evidence from various groups and bodies, and it may be worth mentioning that there was as much input from those interested in the provision of sets of data as from those interested in bibliographic data. It should be borne in mind that things like census data and mapping data are competing for this source of funding, and as librarians we will have to consider what our involvement and role will be with non-bibliographic data. Anyway, without too much difficulty, the Task Group agreed to recommend:

that the Board continue to spend a lot of money on data for the next three years

that there should be a review of options for service delivery

that the ISI service should be monitored

that the Group or a successor body should be used to continue looking at the issues

that sites be asked whether they would be interested in bidding to be a host for any future deals

Well, the money side is straightforward. As you know, the sites are putting some money into the ISI deal, and that notion of matched funding will continue to be the pattern for mainstream datasets. However, it was agreed that as datasets become more specialised the relevant user community should be expected to provide a progressively higher proportion of the resource, although it seems likely that the network infrastructure would remain free. This is quite important, because while there can be little doubt that the ISI databases are general, there were mixed views on whether something like Medline is a general or a community-specific database. So there will be a hierarchy of provision, with the level of Board funding being related to the size of the user community.

Second was the review of delivery mechanisms. It is important to remember that the ISI deal is an experiment, a very big experiment it's true, but still an experiment. Do we need a single data centre at Bath, or should we be looking for several up and down the country? Is the model adopted for Bath the best one? For example there have been discussions, admittedly unavailing, about buying time on existing commercial services rather than setting up services in the UK. And of course ISI will be using STATUS rather than BRS for retrieval software. Is that the right decision?

Thirdly that the ISI deal be monitored. Well the intention here is not just to monitor, but to ensure that there is a lot of user input into the setting up of the service. This is intended to cover users as well as librarians and to look at how the database is serviced and supported as much as at how easy the software is to use. It will also consider issues of whether there should be user groups for each dataset or whether there is sufficient commonality to have one overarching group considering service issues rather than product deficiencies.

Finally the Group is to continue, to look further at its original brief and to consider issues arising from the experimental service.

I said earlier that it was the role of this Task Group to stress the evolutionary rather than the revolutionary nature of this development and if it suits political needs to do so, that is fine by me. However for this audience, I really do want to stress the revolutionary nature of what is going on. I imagine that most of you are now familiar with the concept of the change from the Ptolemaic to the Copernican world, but it bears repeating, because it is central to what may be about to happen with the ISI data.

In the traditional Ptolemaic world, the world most of us inhabit, the library is at the centre of the universe. Round it flow all the things that keep the library operating: library staff meetings, readers, money, then booksellers and fat publishers, photocopying, computers, telephone, mail, books, journals and more meetings. It is not entirely a caricature to say that each of these is equally important to us. Now let's look at the Copernican world, where the user is at the centre of the universe. The PC or terminal gives a window on the world, but it is only one of many sources of information. Of course books and journals are there, but so is television, microform, film, laboratory results, post and telephone. In this information-rich environment, the library plays only a small role, and the more barriers and regulations we erect and the more difficult we make the library to use, the more likely it is that our Copernican user will concentrate on the alternative information sources.

Now one of the important points of the change from the Ptolemaic to the Copernican world was that the heavens did not in fact change. The sun did not stop circling the earth and the earth begin circling the sun. What changed was people's perception of the forces that governed their environment. There are still people who believe that the world is flat, but an ever diminishing and not very influential minority. I therefore want to suggest that the issues surrounding the ISI deal are issues of how we perceive our future. I have emphasised and will continue to emphasise that this is an experiment. The fact that it is a world first and the largest experiment we are ever likely to see does not make it an operational system. I stress that it is an experiment because it is not the only possible method of operation. Many issues remain to be settled. Do we want one data centre or several? Do we want a STATUS rather than a BRS front-end, or do we want neither? What datasets do users (rather than librarians) want? Should we look at buying time on commercial systems rather than setting up rival operations? Have you talked to your Computer Centre? Can they cope with suddenly having a population of 6-10,000 e-mail users? What will the effect be on ILL services? Will document delivery simply find new end-user to end-user channels? A whole host of problems is thrown up which it will be exciting to resolve. Some are problems of practice, but many are problems of principle surrounding the central issue of how end-users will manage their information gathering needs when presented with a new generation of computer tools.

I'd also like to talk about the immediate impact of the deal on libraries, and to do that in the context of the SCONUL/IUCC Working Party Report on Information Provision. It is clear that in most institutions it is not yet the norm for all staff and students to have easy, untrammelled access to terminals at places convenient to them. At the same time it seems that the most common response to the ISI deal is to stick the library with the bill. The natural library response, due to financial pressure, is to cancel either the CD or the hard copy of Science Citation Index. This is, of course, false short-term logic, if an understandable one. In the short term this actually reduces access to the data. Where in the past everyone had access to the hard copy, now only those who can access terminals will have admittedly much improved access to the data. Fine for one database. But what is going to happen when we add one or two products a year? Are we really going to denude our reference collections to the disadvantage of the library users? Will some of you think of offering value-added services such as seminars and publications to defray the cost? The point I then want to make very strongly, also bearing in mind that from next year the Computer Board will have some responsibility for libraries, is the need for institutions to hammer out an information strategy which brings together the Library, the Computer Centre, the MAC initiative, information systems and so on. It is wholly unacceptable to plant this institutional issue on the library within the confines of their existing budgets. It will be quite some time before every student has unfettered access to computer systems. Institutions spend up to 15% of their budget on information in the widest sense and really must develop policies for its financing and management.

The next issue is the effect this may have on libraries and librarians and the way in which they operate. That is why I see this deal as a key marker for the future and one which requires very great attention on the part of the profession. If we get this wrong we may blow the future. I have been saying for some time that, apart from a very few large libraries, we must move away from the notion of possession of books as the paradigm of library operation and move more towards the provision of access to information as our model. The role of the librarian moves much more towards that of product champion and gatekeeper. The role we have to develop is that of mass instruction in information management skills, and I mean by that rather more than the annual ritual of the tour of the library for first-year undergraduates. Take the case of the ISI deal. How are you going to instruct 6,000 users in system use? What helpdesks will you have? Which of your staff will be responsible for the production and dissemination of documentation? How will the data be linked to the work of your institutional research strategy committee? Do you plan bibliometric studies of research impact? How will you liaise with the advisory desk in the Computer Centre? What are the demarcation lines? On the other hand, will you simply ignore all of this and assume, or make it, someone else's problem?

One of the interesting things to me about this experiment is that despite all the problems, it is an easy issue where the problems can be identified and answers found. It is just a new and very large problem. Within the next couple of weeks the results of the British Library RDD-funded Information UK 2000 Project will be published. ISI presents us with a great mass of data which we are all familiar with and have used for years in hard copy. The Project 2000 report will suggest that in future the number of producers and distributors of data will mushroom, as has happened in conventional publishing. Issues of data quality and data context will begin to appear. There are significant worries about how this flood of data will be judged as well as managed. Who will advise on these issues? Again I would suggest that this is an area in which, as a profession, we can have a considerable impact, but only if we have mastered the problems of the easy databases like ISI. Where will users turn when they want information sets, and who will actually be qualified to advise them? We have no natural right to a position of authority - but no more does anyone else - and it is a position to be earned if we wish.

So just in closing I want to repeat one or two key points about my view of the future. It is clear that a lot of money is going to go into datasets, from a body which will be increasingly influential in library affairs - the Computer Board. We can treat our share of this as competition for the present bookfund, or we can fight to have this discussed as an institutional issue. Whether or not we succeed in this, there will be an inexorable build-up of end-user access to databases. I then find the choice for the future quite clear. Either we stick with the perfectly honourable role of doling out short-loan textbooks to undergraduates - which will see me, at least, through to retirement - or we can move into a quite different world where we are in effect information brokers and information skills teachers. That is the more difficult but the more exciting path. I still don't think the Computer Board knows what it is letting itself in for on this, when one looks at the potential scale of what we are unleashing. The SCONUL/IUCC survey showed about 300 CD-ROMs in universities. We think of that as a great leap forward. But if we throw several million pounds and the entire JANET infrastructure at a problem, that completely dwarfs our CD initiatives. It is bound to have a major impact. The ISI initiative seems to me the most significant development since the introduction of BLAISE twenty years ago. So, my parting message is that you can ignore the future or join it - but look at how quaint the views of Ptolemy seem nowadays.