
The state of retroconversion in the United Kingdom: a review

DEREK LAW*

ABSTRACT: This paper was commissioned in March 1987 by the Centre for Catalogue Research (now Centre for Bibliographic Management) at the University of Bath, for presentation to the LIBER Working Group on Library Automation. It discusses the background to retroconversion, the advantages and disadvantages of the various methods of tackling the work, project costs and the problems to be faced. There is a discussion of future trends and desirable areas of research both nationally and internationally.

BACKGROUND

Retrospective conversion of traditional catalogues into machine-readable form is a phenomenon of the last decade in British academic and research libraries. Retroconversion essentially attempts to remake the catalogue from the beginning and as such it represents an investment of dozens of man-years of effort for the smallest library and hundreds for the largest. This in turn means that the process is enormously expensive in resources. That fact has led to a large variety of ways of attempting the task and a constant search for new solutions in an effort to be as economical as possible. Conversion has fallen into three broad and overlapping phases:

1 The earliest attempts were driven by the need to create files for circulation systems, usually consisting of records created to a locally defined standard. The earliest reasons for retrospective conversion were rarely spelt out and can now seem rather vague, but appear to have at their root the way in which computer systems integrate the catalogue and circulation functions. It therefore becomes necessary to create records which are usable for both functions, if effort is not to be wasted. Some libraries have followed the path of creating short author/title records to local standards which can be used for circulation, while others have preferred to adopt MARC standards, later stripping the records down for the circulation function.

2 This was followed by projects which were still local in nature, even when they made use of the resources of a cooperative or perhaps BLAISE, usually concentrating on the most used stock, or perhaps the stock of a less accessible collection, such as an education branch library. The central point within these variations is that the retroconversion is still seen as an internal response to the needs of a single library system.

3 Finally, coincidental developments in communications technology and the appearance of online public access catalogues have opened up possibilities for sharing and cooperation amongst groups of libraries. These possibilities are only just beginning to be explored, but they bring us full circle to programmes of work designed to convert the whole catalogue, but now through the acquisition or creation of exchangeable records. Here, however, catalogue conversion is seen as a necessary preliminary to more general resource sharing.

This bald division conceals a wealth of varying practice. At least in the larger institutions, a retroconversion project may last for years, as funding ebbs and flows, and the library may have experienced all of these phases in what is basically a single exercise.

THE UNITED STATES EXPERIENCE

Retroconversion programmes in the United States are also a product of the last decade and the American Library Association’s Retrospective Conversion Discussion Group was founded only in 1981. Adler and Baber (1) give the flavour of the American experience in a book of case studies. These show several approaches differing from the methods most commonly found in the UK. Full details of individual projects are hard to come by and this book is therefore also useful in containing a full methodology from Rice University. It will be inappropriate for general copying but gives many useful pointers on the management of such projects.

THE PRESENT POSITION IN THE UNITED KINGDOM

It is extremely difficult to quantify how much work has been done and how much remains to be done. Many libraries have fairly full circulation files, which may or may not be upgradable; libraries record the number of volumes, while their databases tend to record the number of titles; and it is not always clear whether libraries are referring to total stock or monograph stock. The fullest information is available for the universities. A recent survey by Hoare (2) tabulates the situation in university libraries as at Easter 1986 and shows an enormous range of experience. Some 49 libraries responded, and mentioned conversions ranging from one special collection of a few thousand volumes at Brunel to over 1 million for the Bodleian Library’s pre-1920 catalogue. The largest files, at Aberdeen, Hull, Sussex and the Bodleian Library at Oxford, are of non-MARC records. Liverpool, using a simplified format, has the only MARC-based file of comparable size. Most of the libraries claimed to have some work in progress, but it is clear that completion rates vary widely.

Until now, Aberdeen has been unusual in the UK in having an external firm convert records to a non-MARC standard rather than converting in-house. These records have served the university well for many years, but present a problem now that the Scottish universities are looking at ways of cooperating, including the sharing of records. Yet it must be unattractive to consider writing off the 600 000 records possessed by Aberdeen. Both Hull and Sussex converted their records through in-house keyboarding, the former using Manpower Services Commission staff. Both the Bodleian and Liverpool used bureau services to undertake the keyboarding. Only Southampton, with a file of 300 000 records, appears to have created its database from its own resources.

A second group of libraries has begun retroconversions, often using their own staff and often linked to membership of a cooperative. These efforts are intended to be selective, whether by date or some determinant of use. In addition to heavy use and short loan collection books, some libraries seem to have converted a branch library, either as an experiment to prove systems or to make information about the collections more accessible. Many libraries, such as Kent, Reading, St Andrews and Durham, have followed this route. One of the largest conversions is at Newcastle, unusual in that almost all staff are used to retrieve records from the OCLC database as a matter of routine rather than as a special project, and with a very high rate of productivity.

The final and still rare approach depends on the growing availability of records, which allows larger areas of less common stock to be converted while still retaining high hit rates. These libraries will tend to attempt to convert complete or almost complete sections of stock, irrespective of likely use of that stock. It may be that most libraries will aim eventually to convert the whole catalogue, but from Hoare’s survey it is clear that only a few libraries have planned total conversions, ranging from King’s College and University College in London, to Warwick, UMIST and, most ambitiously of all, Edinburgh.

It can be deduced that the majority of universities have converted between 3 and 30 per cent of their stock and that those with the highest percentages converted to MARC standards have the smallest files. It appears that in the university sector, perhaps 30 million items remain to be converted, although whether this represents the number of volumes or the number of titles is unclear. In either case it will require a massive capital investment (3).

The position varies in other major research libraries. The growth of stand-alone circulation systems means that almost all the polytechnics have at least short title circulation records, while many have MARC files of some size, particularly of heavily used stock. Many of the major public libraries with lending services have short title files, but the large reference libraries typically have less than 20 per cent of stock converted, while some of the very largest, such as Manchester and Leeds, have no machine-readable records. The absence of a need for circulation systems as well as the sheer size of files has delayed significant progress in the national libraries. Scotland took an early interest in current cataloguing online in the mid 1970s but has not moved into retroconversion. Wales used REMARC records when little else was available, but is now having these upgraded, while the British Library has finally begun work on its large GK3 catalogue, after several false starts.

METHODS OF RETROCONVERSION

In-house copying

The simplest method of retroconversion would seem to be copying the existing records in full or in part into machine-readable form within the library or through the work of a bureau. It is particularly attractive where the keyboarding does not involve the addition of complicated tagging, such as for MARC, and is therefore in large measure copy typing. The largest such conversion was at Hull and it quickly provided an enormous and extremely useful file. However, a later analysis by Dyson (4) shows that the apparent economy of in-house conversion led to a hidden penalty in the shape of errors averaging over two per record. These were of varying degrees of importance, but 11.4 per cent of records were considered to have serious errors. Clearly the need for extensive quality control at all stages has to be taken into account when using this method. Nevertheless, the method has been adopted by half a dozen university libraries, which are more or less satisfied with the files so created. They are, however, unique to the library concerned and effectively bar the door to any record sharing with other institutions.

New formats

In the late 1960s and early 1970s a number of the new universities took a great interest in the development of shortened catalogue records. The experiments at Bath on the mini-catalogue (5) are probably the best known result of this period, but it also saw the development of the MINICS record format at Loughborough (6). This format is intended to be upward convertible to MARC, but that feature has not been used and the format has never achieved wide acceptance. The fear that such records would prove a poor investment seems to have been eased by Bath’s upgrading of its original records to MARC compatibility, although the quality of the old format was the key in this case.

Optical Character Recognition

Since human involvement in record copying appears to increase the number of errors in the catalogue, there has been a corresponding interest for some years, e.g. in such major libraries as the Bodleian, in the possibility of automatic input through Optical Character Recognition (OCR). This should not be confused with the American version of OCR conversion, which consists of retyping parts of the card catalogue entries in an OCR font. This is then read by an OCR reader and matched with a database belonging to either a bookseller or a utility, such as OCLC. British experiments on the other hand have looked at a solution which seems in principle neater, where the original record is scanned on an OCR machine and becomes the new record in an exact copy of the old.

Despite a number of experiments the method has so far produced disappointing and inconclusive results, notwithstanding its apparent promise. Diamond (7) describes the experience of Glasgow, where a Kurzweil Data Entry Machine was used with typed catalogue slips covering the 1968-79 period. The report is an optimistic one, but it shows a system converting only 10 slips an hour (with the hope of rising to 60 slips) and with a success rate of only 85 per cent, which means that virtually every record required editing.

Much the most promising OCR system is OPTIRAM/LIBPAC, described by Harrison (8), although it has yet to be proved in production. Existing card or sheaf records are read by a scanner based on Group 3 fax machines. The digitized images are converted to ASCII code and held on a microprocessor for analysis. Errors are reduced by identifying words as the unit and comparing the result with a dictionary held on computer, followed by automatic error correction. Perhaps more importantly, the software also uses format recognition to determine the various fields of a record and then adds MARC tagging. These records may well be to a low cataloguing standard, particularly if of some age, but they are at least MARC-compatible. Such a method seems particularly attractive for areas, such as music, where hit rates from conventional databases may be quite low. More generally, it in theory allows a library to combine the speed of in-house copy-typing conversions with the accuracy of automatic data capture.
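The dictionary-comparison stage described above can be sketched in code. The following is a hypothetical Python illustration of the general technique, not the OPTIRAM/LIBPAC software itself; the word list, the edit-distance-1 threshold, and the rule of correcting only when a single candidate exists are all assumptions made for the example.

```python
# Illustrative sketch of dictionary-based OCR error correction: each word
# recognized by the scanner is checked against a word list, and a near-miss
# is corrected to the closest dictionary entry when exactly one entry lies
# within one character substitution, insertion or deletion.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct_word(word: str, dictionary: set[str]) -> str:
    """Return the word unchanged if known; else the unique dictionary
    entry at edit distance 1; otherwise leave it for manual editing."""
    if word.lower() in dictionary:
        return word
    candidates = [d for d in dictionary if edit_distance(word.lower(), d) == 1]
    return candidates[0] if len(candidates) == 1 else word

dictionary = {"catalogue", "library", "history", "english"}
scanned = ["cata1ogue", "hist0ry", "library"]   # typical OCR digit/letter confusions
print([correct_word(w, dictionary) for w in scanned])
# → ['catalogue', 'history', 'library']
```

Words with no close match, or with several, would be passed to a human editor, which is consistent with the observation that such systems reduce rather than eliminate the editing burden.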

External databases

The major alternative to in-house conversion is the acquisition of records from an external database. Despite the costs involved this method becomes increasingly attractive. Not only does the use of standard MARC records offer great hope for sharing and cooperation, but as the databases grow in size, hit rates rise correspondingly and even the most specialized of libraries will find them of substantial benefit. OCLC is possibly the best known supplier of records, holding over 12 million in all.

General

Libraries can achieve hit rates in excess of 80 per cent, while the School of Slavonic and East European Studies of the University of London estimates a hit rate of almost 60 per cent. The main databases or utilities available in the UK are the British Library’s BLAISE-LINE service, OCLC Europe and UTLAS. The two major British cooperatives, BLCMP and SWALCAP, also have large files of several million records. Although they can be used for retroconversion projects, the databases are very different in nature, with BLCMP having a single central file and SWALCAP favouring a distributed arrangement. Most of these utilities were discussed in a survey of the services available in 1985 published by Vine (9). Even in the 2 years since then, there has been a shift in the databases available and in the range of services they are willing to offer in what has become a competitive market. The Vine article is, however, a useful snapshot of available services at that time. Leeves’s (10) guide to library systems gives the most up-to-date picture, although since its publication the British Library has announced a link with a North American database (11).

The current OCLC range of services is typical of those now available from the major utilities: a retroconversion service, where OCLC prepare a machine-readable catalogue using the library catalogue or shelflist, with no involvement from the library; a fixed-price fixed-term contract, for libraries with precise budgetary constraints; a reference service accessed via PSS for non-members, with the option of having the selected records written to tape; a microcomputer-based system where search keys are entered offline onto floppy disk; a tape conversion service which allows files of non-MARC records to be upgraded; and an online service which allows access to all shared files.

Two basic methods exist for acquiring these records: offline batch matching often by control number, or online matching with the records later supplied on tape. The first method tends to be the cheaper, but it has hidden costs which are not clearly understood. Assembling a file of control numbers or acronym keys on floppy disk at the keyboarder’s pace avoids telecommunications costs, but leads to an element of mismatch in the records which might be avoided through online matching. These mismatches require positive effort to remove them. As an aside, one should note here the fascinating work done by Ayres and his team at Bradford (12) on creating USBC, a technique aimed at merging large bibliographic databases while eliminating duplication. The long turnround time in receiving records and/or diagnostics from the utility allows sufficient opportunity for the same record to be requested several times before the first request has been satisfied, at least on a major project. This difficulty can be reduced through sophisticated project management, but that in turn imposes a cost. Online catalogue searching imposes high telecommunications costs and also opens up an area of debate on the type and therefore cost of staff required to match records satisfactorily.
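The derivation of an acronym key of the kind assembled on floppy disk can be sketched as follows. Each utility defines its own key structure, so the simple "3,3" author/title key below, and the rule of skipping leading English articles, are assumptions made purely for illustration.

```python
# Hypothetical illustration of deriving a truncated (acronym) search key
# from an author and title. Because several works can share one key, a
# batch file of such keys yields the mismatches discussed in the text,
# which online searching would let an operator resolve on the spot.

def search_key(author: str, title: str,
               author_len: int = 3, title_len: int = 3) -> str:
    """Build e.g. 'dic,tal' from 'Dickens' / 'A Tale of Two Cities',
    skipping leading English articles in the title."""
    articles = {"a", "an", "the"}
    words = [w for w in title.lower().split() if w not in articles]
    first = words[0] if words else ""
    return f"{author.lower()[:author_len]},{first[:title_len]}"

print(search_key("Dickens", "A Tale of Two Cities"))   # → dic,tal
```

The trade-off described above falls out directly: keys are quick and cheap for a keyboarder to produce offline, but their ambiguity is what generates the mismatched records that must later be weeded out.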

The whole debate is further confused by the perceived threat of the deskilling of cataloguing, since such projects are rarely conducted entirely by existing staff, but more usually in whole or in part by specially appointed staff on clerical-related grades.

Bureau conversion

The newest conversion method lies with OCLC and Saztec, companies offering what is more commonly a North American or Australian service. Essentially, they offer an agency service. Working from a copy of the card catalogue or shelflist, these companies acquire or create MARC records in much the same way as a library would itself do. In theory, with a trained and resident staff, they can convert catalogues more quickly, without involving the library in management or staff training. Further, since much of the material not found on the databases can have an EMMA record created with great speed and accuracy by clerical staff, the cost is not necessarily higher than in-house methods. This is clearly a clean and efficient method which should appeal to libraries. Whether their institutions will be as keen on speed and the resultant need to find capital at once rather than spread over several years remains to be seen. Saztec have just won the contract to produce the British Library’s much delayed GK3 catalogue and, although the method used is rather different from a normal retroconversion and creates records which only approximate to the MARC standard, the contract will be watched with much interest and will prove the greatest possible test for this method. Saztec have also been working with the National Library of Wales on upgrading records originally acquired from REMARC.

Potential requirements files

Still to be explored is the use of potential requirements files (PRF) held in-house. One major element of online searching is telecommunications costs and holding a PRF locally would avoid this. Edinburgh experimented with this approach by holding a section of the SCOLCAP file, but has now abandoned it. It may be that CD-ROM is a more promising route for this approach but it again remains to be tested. It is worth noting that as part of its contract with the British Library, Saztec has gained the right to produce and market the catalogue on CD-ROM (13). Interestingly, the University of Illinois has produced a test CD-ROM disk containing 700 000 records from the 1976-86 period, for use within the state. Although not seen as its main aim, this clearly could act as a PRF.

Perhaps the largest commercial CD-ROM venture is LaserQuest, produced by GRC. It is an American database with some 4.5 million MARC records held on five disks. To minimize effort it is desirable to have up to five of the still moderately expensive players linked to a microcomputer, but since the entire file sells for around $4250 this appears a very attractive option financially. However, before it is viable in the European marketplace, work remains to be done on conversion between MARC formats and on Kermit type facilities to allow downloading of records to the main library system. GRC is at present exploring this.

Record sharing

As some libraries move ahead with their conversions, their growing files also begin to offer possibilities for sharing, outside the existing framework of cooperatives. Perhaps the closest operational approach to this is in London, where nine of the university’s schools share a file of 750 000 MARC records. The most likely vehicle for this sharing seems to be JANET, the Joint Academic Network. In the late 1970s a number of networks sprang up in the UK to link research staff in the universities and research institutes of particular regions of the country. The UK Computer Board decided in the early 1980s that it would be more sensible to turn this mild chaos into an ordered national network and appointed a Joint Network Team to undertake the task. Almost by chance, librarians with an interest in computing discovered the network and have been quicker than the general academic community to uncover its possibilities. The Joint Network Team has proved an enthusiastic supporter of library use. So far interest has lain in the possibility of interrogating the catalogues of other libraries and then transferring records between institutions. The most developed project to explore this has come from the Consortium of University Research Libraries (CURL). CURL is an informal grouping of the libraries of the seven largest universities in the UK (Cambridge, Edinburgh, Glasgow, Leeds, London, Manchester and Oxford), which between them have one of the richest potential databases in the world, in excess of 20 million volumes. The Scottish universities have also begun work on a resource-sharing project called SALBIN and have begun a programme of research into possibilities. Both these initiatives are promising, but it will take some time to establish whether they can be turned into operational systems. This possibility has perhaps been made more realistic by the decision of the University Grants Committee to provide some funding for both of them. JANET itself is not receptive to large-scale file transfer online and initial efforts may have to concentrate on identifying records online for later conventional transmission on exchange tapes.

RETROCONVERSION COSTS

The use of any of these methods requires careful costing. It has been assumed that by accepting high-quality MARC records a similar quality is required for EMMA cataloguing. Since this is generally recognized as a professional and costly activity the production of these EMMA records becomes very expensive; conventional wisdom puts the cost of creating such a record at £5-6, if using professional staff. But this uniformity of quality is self-imposed and it is not clear that it is an absolute need. It certainly emphasizes the importance of maximizing the hit rate. Raw records from the utilities can vary in price from about 20p to 60p depending on the method of retrieval used. To this must be added the cost of producing the local data on number of copies, locations, class mark, with the further possible addition of editing costs if authority control is used. Then there are telecommunications costs, format conversion costs, equipment costs and staff costs. These cost elements have varied enormously from conversion to conversion without obvious reason, but, as a rule of thumb for large conversions relying mainly on external record suppliers, an average figure of up to £2 per record is probably close to the mark. Most libraries will manage to reduce the actual cash outlay through the use of existing staff and equipment or find some of the resource from external sources such as the Manpower Services Commission. The most varied experience in the UK probably lies in Edinburgh, which has used many of the available services and a wide range of staff with different levels of qualification, as described by Ralls (14). Newcastle has used a quite different but equally long-term approach, as described by Bagnall (15).
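The per-record arithmetic above can be made concrete with a small worked example. The figures used below are assumptions drawn from the ranges quoted in the text (an all-in cost of £2 per record found on a utility and £5.50 per professionally created EMMA record), not actual project costings.

```python
# A worked example of the cost arithmetic described above: records found
# on an external database cost the all-in hit price, while misses require
# an original (EMMA) record created at professional-staff rates.

def project_cost(n_records: int, hit_rate: float,
                 hit_cost: float, emma_cost: float) -> float:
    """Total cost in pounds for a conversion of n_records."""
    hits = n_records * hit_rate
    misses = n_records - hits
    return hits * hit_cost + misses * emma_cost

# Assumed figures within the ranges quoted in the text.
total = project_cost(100_000, hit_rate=0.8, hit_cost=2.00, emma_cost=5.50)
print(f"£{total:,.0f}")   # → £270,000
```

The example also shows why maximizing the hit rate matters so much: at these assumed prices, each percentage point of hit rate on a 100 000-record file is worth £3500.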

The bureau approach is, at least on paper, a cheaper option. The exact price varies from library to library and has to be negotiated with the bureau. Factors involved are the type of materials, the likely hit rate, the amount of local data to be added, the state of the original records on which the work will be based and the timescale. Given these factors, both Saztec and OCLC are likely to quote a unit price in the range 60p to £1.

PROBLEM AREAS

A number of quite varied problems and areas of debate have emerged over time, as projects have been carried out.

1 There have been arguments over the cost of converting from USMARC to UKMARC. This has become an international problem, but it seems likely that the move towards UNIMARC will resolve the difficulty. IFLA’s promotion of the International MARC (IM) Programme to develop UNIMARC should progressively reduce this barrier to transborder data flow (16).

2 There are less esoteric problems in trying to reconcile non-MARC and MARC records. SWALCAP’s Libertas system claims an ability to upgrade brief circulation records to MARC standard, as does OCLC, while the Scottish universities are considering how Aberdeen’s Oriel records may be re-used.

3 The use of records from a variety of external sources has opened up what promises to be a lively debate on standards. Edinburgh, which has drawn records from perhaps the widest variety of sources, considers high standards of editing essential, while Hoare (2) quotes Newcastle as describing ‘over-zealous editing as the bottomless pit into which a conversion may sink’. Adler gives the succinct view that ‘You may start by being a perfectionist.... You will end by being a pragmatist’ (1).

4 The debate on standards reflects a real and legitimate difference of view on cataloguing standards and a real difference of view on how far the creation of online catalogues with multiple access points will affect the way in which catalogues are used. It also reflects a lack of clarity of aim: conversions are supposed to be generally beneficial without there being any clear determination of what they are trying to achieve. But there is a substantial difference between moving from an imperfect manual catalogue to a perfect automated one and, say, moving from a series of split catalogues to a single machine-readable file whose virtue is seen as unity rather than consistency. Both of these goals have equal validity, but they may well require different standards of practice. In these cases there is a great deal of room for debate, with no ‘right’ answer, precisely because libraries are trying to achieve different goals. Ralls (14) describes the retroconversion programme in Edinburgh with clarity and conviction and an acute consciousness of the long repetitive grind of the work involved in a major library. Edinburgh’s methods and judgements will not apply everywhere, but they amply demonstrate the importance of explicitly determining what is to be achieved. A second paper by Ralls (17), presented to an Essen symposium, explores all of these issues.

5 Other areas of debate have tended to centre on more pragmatic difficulties. Is it necessary to convert the whole catalogue or, given cost constraints, should the library concentrate only on the most heavily used stock, and if so, how is that stock to be defined?

6 Should one work from the catalogue record or the book in hand? If the library has been regularly shelf-checked and the catalogue accurately reflects the stock, working from the shelf records will prove satisfactory. If not, is it sensible to convert the records for books which no longer exist in library stock? Conversely, to conduct a major shelf-check and catalogue correction exercise adds considerably to the cost of the retroconversion project.

7 Is it practical to work from the daily returns of books? These are clearly the most heavily used items, but to take the circulating stock out of circulation for any length of time damages service to readers.

8 How important is the consistency of catalogue headings once a multiplicity of access points is offered? Much time can be spent on authority control and the achievement of consistency. Would this be better spent on adding further records? This again reflects the need for a clear view of the aims of the project.

THE FUTURE OF RETROCONVERSION

At the lowest level, the work of retroconversion will continue more or less slowly, with the newer methods described above being explored and tested. As time goes on, the conversions should become cheaper and quicker as the available files of records grow and hit rates rise. This in turn is likely to re-emphasize the primacy of the MARC format. As a consequence, we may expect an increasing interest in methods of upgrading records automatically. Some already exist and are based on the derivation of acronym keys from the existing record, which is then matched with a MARC database. One major technical advance has, however, still to be made and that is the ability of open-system networks (as opposed to the closed networks of the cooperatives) to transfer large data files online. Many libraries have had split sequence catalogues for many years and may be happy to accept partial conversions and continue to accept split sequences, even if these splits are now between machine-readable records and traditional catalogues. A handful of libraries which see complete catalogue conversion as one aspect of a much wider change in the growth of automated information services will continue to devote resources to the creation of a single catalogue file.

The development of UNIMARC should promote greater international cooperation and it is to be hoped that the welcome interest of the EEC in libraries may act as a vehicle for this. Once JANET has been mastered by UK libraries, they may be expected to develop an interest in the possibility of using the European Academic Research Network (EARN) to explore record exchange within Europe. Large files already exist in some countries, such as the Netherlands, while the new interest from countries such as France and Portugal raises the prospect of large-scale European cooperation. As trans-border data flow grows, there will no doubt be a growing interest in language problems. This has been examined to a degree in countries such as Belgium and Canada but, apart from work at the National Library of Wales, has yet to be explored in any significant way in the UK.

AREAS FOR RESEARCH

From the above comments, it will be clear that there is an enormous task to be faced in the UK and it is safe to assume that the scale of the problem is mirrored throughout Europe. Cooperation will allow this work to be tackled with the greatest economy of effort and there are three particular areas where research would simplify either the task or the decisions surrounding it.

1 The online transfer of large files presents real difficulties. There are both technical difficulties in the transmission of large files and operational difficulties in controlling external access to a library’s primary files. When viewed as an international collaborative effort there may be language problems to contend with, e.g. in the area of subject access. This whole topic requires definition and exploration.

2 There is very little knowledge in the library community of the availability of databases and the networks, such as EARN and ARPANET, which might be used to access them. If the information collected by the EEC on the state of the art of library automation in Europe (the LIB2 Programme) is to be translated into action, some prototype record exchange systems must be set up to assess and monitor the practical difficulties.

3 There is a need for some research into the effect on library users of partial conversion of the catalogue. A strong theoretical case can be made both for and against total conversion, but impact studies would give hard evidence.

* DEREK LAW is Librarian of King’s College London. He has been involved in library automation programmes in various universities since the early 1970s. He has managed substantial retrospective conversion projects in Edinburgh and London and acted as a consultant to other institutions.

REFERENCES

(1) Adler, Anne G, and Baber, Elizabeth A. Retrospective conversion. Pierian, 1984.

(2) Hoare, Peter A. Retrospective catalogue conversion in British university libraries. British Journal of Academic Librarianship, 1986, 1 (2), 95-131.

(3) Library Technology Centre/Library Association. State of the art of the applications of new information technologies in libraries and their impact on library functions in the United Kingdom. LTC/LA, 1987.

(4) Dyson, Brian. Data input standards and computerization at the University of Hull. Journal of Librarianship, 1984, 16 (4), 246-61.

(5) Bryant, P, Venner, G M, and Line, M B. The Bath mini-catalogue: a progress report. Bath University Library, 1972.

(6) Lewis, D E, and Robinson, M E. Computer based cataloguing at Loughborough University of Technology 1968-1982: a review. Program, 1983, 17 (2), 52-7.

(7) Diamond, R J. Recon via KDEM at Glasgow University. Vine, 1982, 45, 17-22.

(8) Harrison, Martin. Retrospective conversion of card catalogues into full MARC format using sophisticated computer-controlled imaging techniques. Program, 1985, 19 (3), 213-30.

(9) Retrospective conversion: a look at some of the services available. Vine, 1985, 58, 19-25.

(10) Leeves, Juliet. Library systems: a buyer’s guide. Gower, 1987.

(11) BLAISE records-a new MARC record service. Bibliographic Services Newsletter, 1987, 4, 37.

(12) Ayres, F H et al. USBC, its use for union file creation: a feasibility study for a national database. British Library, 1984.

(13) British Library Catalogue conversion contract. Bibliographic Services Newsletter, 1987, 44, 12.

(14) Ralls, Marion C. The evolution of a retroconversion. Vine, 1985, 58, 31-8.

(15) Bagnall, J. LS/2000 live at Newcastle University Library: a progress report. Vine, 1985, 59, 20-5.

(16) National MARC records and international exchange. Bibliographic Services Newsletter, 1987, 42, 4-5.

(17) Ralls, M C. Retrospective catalogue conversion: policy, standards, strategy and quality control. Future of on-line catalogues: 1985 Essen Symposium, 30 September-3 October 1985. Essen, Gesamthochschulbibliothek, 1986.