
THE STATE OF RETROCONVERSION IN THE UNITED KINGDOM: A REVIEW

By

Derek G. Law

Librarian of King’s College London

Commissioned by the Centre for Catalogue Research, University of Bath

The Centre for Catalogue Research is funded by the British Library Research and Development Department. However, the Department is not in any way responsible for the details and opinions contained in this report.

March 1987

CONTENTS

Retroconversion in the United Kingdom: a summary

Retroconversion in the United Kingdom

References

Appendix 1: UK University Libraries and retroconversion

Appendix 2: Edinburgh University: a case study*

*Reprinted from Vine, No 58 (1985) with the permission of the British Library

RETROCONVERSION IN THE UK: A SUMMARY

The problem. Retroconversion programmes have followed a number of common paths over the last decade, and although there has been a slow change in the methods used there remains the basic problem that in remaking the library catalogue the institution has to find resources, in cash or kind, equivalent to dozens or in some cases hundreds of man-years.

Methods

1. Copying existing files. The earliest stimulus to the creation of machine readable record files came from the introduction of circulation systems and many such files are being created and used. These are usually derived from existing manual records and may be created either by a bureau working from a copy of the catalogue or shelflist or by an in-house keyboarding project, with the keyboarding typically undertaken by Manpower Services Commission staff. This method may be used to produce a complete catalogue very quickly and economically, but unless there is extensive quality control it will not only replicate existing errors, but will introduce new ones. The use of a bureau to undertake the keyboarding has the same advantages and disadvantages but will tend to eliminate the introduction of new errors.

2. Optical Character Recognition. A further variant of this approach is the use of Optical Character Recognition equipment. OCR takes existing records and transcribes them as they stand, but the absence of human intervention both theoretically eliminates transcription errors and offers the possibility of the automatic addition of field tags. But the theory has not yet been transferred satisfactorily into practice on a production basis.

3. Use of Databases. Probably the commonest method now in use is to capture MARC records from one or more databases and to complete the file through the addition of newly created EMMA records. Although databases such as OCLC's contain many millions of records, most libraries have found it difficult to get high hit rates for all areas of stock and have preferred to select heavily used areas of stock where high hit rates and low EMMA creation rates are likely. A choice has then to be made between off-line batch submission of acronym keys or ISBNs on floppy disk and on-line searching of the database. The former is generally cheaper, but produces a proportion of mismatched records which have to be identified and eliminated, while the latter is more accurate but more expensive. In both cases it is still necessary to edit the records and add local data.

4. Upgrading files. Where short circulation records exist, some suppliers such as SWALCAP and OCLC now offer an upgrading service which creates acronym keys for automatic matching of records with MARC files.

5. Bureau services. More recently some firms have offered a complete bureau service whereby they will undertake the database searching and the creation of EMMA records for unmatched items. While this can be an attractive option it does pose the question of whether retroconversion should be done on the basis of existing records or whether it should be done from the book in hand.

6. Potential Requirements Files and CD-ROM. There has been some limited interest in the use of locally held potential requirements files. This may be expected to become a more attractive proposition when such technologies as CD-ROM allow the files to be created and held with minimal effort. The British Library's decision both to have Saztec convert its catalogue and to give them the CD-ROM rights is a hopeful sign here. A few libraries have also begun to look at some form of communal effort, whether by simply sharing records or, more radically, by identifying areas of overlapping stock and then tackling them co-operatively.

Conversion costs. Costs can vary enormously, but for a large conversion, total costs will probably be close to £2 per title, although the cash outlay may be much lower if existing library resources are redeployed to assist with the project. The great danger is to underestimate the scale of the planned operation and to neglect proper project planning techniques. Bureau services quote prices in the 60p-£1 range, but this depends greatly on the nature of the library.

Problem areas. The great area of debate is over standards. What level of error is acceptable, if any? Even taking records from a single database introduces a range of different cataloguing standards from different libraries. Older records will have been prepared to different cataloguing rules. There has to be a clear analysis of the reason for undertaking retroconversion. If a single high quality manual file is being converted, inconsistent records from a variety of sources and without authority control will be unacceptable. But if a large number of catalogues are being brought together, unity may be more important than consistency. This debate is further clouded by the question of whether the consistency of main entries becomes largely irrelevant when OPACs provide multiple access points. It is likely that the view of the library manager will differ from that of the cataloguer. Is it in fact necessary to convert the whole stock? Most large libraries have had split catalogue sequences for years without apparent ill effect.

Another range of difficulties emerges over the different MARC standards and their incompatibility. This should be resolved by the emergence of UNIMARC. A parallel question is how far shorter or lower-grade records can be upgraded automatically using acronym keys. There has been some success here.

Finally, there is a range of practical problems. Does one work from the catalogue or the book in hand? This may depend on how good shelf-checking has been or whether the collection is closed or open access. Should one attempt to convert heavily used stock first and, if so, how is it to be defined? The scale of such operations is enormous. How are such projects to be costed and managed efficiently?

Scale of the problem. It is difficult to quantify the exact scale of work to be done. Library statistics tend to be both incomplete and inconsistent, but it appears that the polytechnics as a group have fairly complete files; the major public lending libraries have full non-MARC circulation files, while public reference libraries have undertaken little or no conversion. The Universities are the best documented sector, as the appendix to this paper shows, although the warning on the accuracy of statistics remains. The majority of libraries have converted less than 30% of their stock to MARC standards, while the few that are complete or near complete tend to have the smallest collections. A recent survey suggested that perhaps 30 million records have still to be converted by the universities. The national libraries, without the impetus of circulation systems, and with the burden of enormous files have been slower to become involved.

The future. As record files grow, hit rates become higher and conversions progressively easier. Over 25 million records are already readily available to libraries from the North American utilities alone, although it must be said that this includes an unquantified proportion of duplicates, and quality control of the records has been a long-standing problem. The ability to transmit large files over open systems networks has still to be achieved, but it may be expected that the impetus to retroconversion will confirm the primacy of the MARC format and that this will be further enhanced by the development of UNIMARC. The growth of Open Systems Interconnection (OSI) standards and the promotion of networks should lead to the linking of networks such as JANET in the UK, ARPANET in the United States and EARN in Europe. Large files are known to exist in Europe, although little is known about them in the UK, but initiatives to share these resources may be expected to develop. As trans-border data flow grows it seems inevitable that there will be a parallel growth of interest in the sort of language problems so far experienced in Belgium and Canada.

RETROCONVERSION IN THE UNITED KINGDOM

Background. Retrospective conversion of traditional catalogues into machine readable form is a phenomenon of the last decade in British academic and research libraries. Retroconversion essentially attempts to remake the catalogue from the beginning, and as such it represents an investment of dozens of man-years of effort for the smallest library and hundreds for the largest. This in turn means that the process makes enormous demands on resources. That fact has led to a large variety of ways of attempting the work and a constant search for new solutions in an effort to be as economical as possible. Conversion has fallen into three broad and overlapping phases.

1. The earliest attempts were driven by the need to create files for circulation systems and usually they consisted of records created to a local standard. The earliest reasons for retrospective conversion are rarely spelt out and can now seem rather vague, but appear to have at their root the way in which computer systems integrate the catalogue and circulation functions. It therefore becomes necessary to create records which are usable for both functions, if effort is not to be wasted. Some libraries have followed the path of creating short author/title records to local standards which can be used for circulation, while others have preferred to adopt MARC standards, later stripping the records down for the circulation function.

2. This was followed by projects which were still local in nature, even when they made use of the resources of a co-operative or perhaps BLAISE, usually concentrating on the most used stock, or perhaps the stock of a less accessible collection, such as an education branch library. The central point within these variations is that the retroconversion is still seen as an internal response to the needs of a single library system.

3. Finally, coincidental developments in communications technology and the appearance of On-line Public Access Catalogues have opened up possibilities for sharing and co-operation amongst groups of libraries, which are only just beginning to be explored, but which come full circle to programmes designed to convert the whole catalogue, now through the acquisition or creation of exchangeable records.

This bald division conceals a wealth of varying practice. At least in the larger institutions a recon project may last for years as funding ebbs and flows, and the library may have experienced all of these phases in what is basically a single exercise.

The United States experience. Retroconversion programmes in the United States are also a product of the last decade, as is the American Library Association's Retrospective Conversion Discussion Group, founded in 1981. Adler and Baber (1) give the flavour of the American experience in a book of case studies. These show several approaches differing from the methods most commonly found in the UK. Full details of individual projects are hard to come by, and this work is useful in containing a full methodology from Rice University. It will be inappropriate for general copying but gives many useful pointers on the management of such projects.

The present position in the United Kingdom. It is extremely difficult to quantify how much work has been done and how much remains to be done. Many libraries have fairly full circulation files, which may or may not be upgradable; libraries record volumes, while their databases tend to be recorded as numbers of titles; and it is not always clear whether libraries are referring to total stock or monograph stock. The fullest information is available for the universities. A recent survey by Hoare (2) tabulates the situation in university libraries as at Easter 1986 and shows an enormous range of experience. Some forty-nine libraries responded, and mentioned conversions ranging from one special collection of a few thousand volumes at Brunel to over one million for the Bodleian Library's pre-1920 catalogue. The largest files, at Aberdeen, Hull, the Bodleian and Sussex, are of non-MARC records. Liverpool, using a simplified format, has the only MARC-based file of comparable size. Most of the libraries claimed to have some work in progress, but it is clear that completion rates vary widely.

Until now, Aberdeen has been unusual in the UK in having an external firm convert records to a non-MARC standard, rather than converting in-house. These records have served the University well for many years, but present a problem now that the Scottish universities are looking at ways of co-operating, including the sharing of records. Yet it must be unattractive to consider writing off the 600,000 records possessed by Aberdeen. Both Hull and Sussex converted their records through in-house keyboarding, the former using Manpower Services Commission staff. Both the Bodleian and Liverpool used bureau services to undertake the keyboarding. Only Southampton, with a file of 300,000 records, appears to have created its database from its own resources.

A second group of libraries has begun retroconversions, often using their own staff and often linked to membership of a co-operative. These efforts are intended to be selective, whether by date or some determinant of use. In addition to heavy-use and short loan collection books, some libraries seem to have converted a branch library, either as an experiment to prove systems or to make information about the collections more accessible. Many libraries, such as Kent, Reading, St Andrews and Durham, have followed this route. One of the largest conversions is at Newcastle, unusual in that almost all staff are used to retrieve records from the OCLC database as a matter of routine rather than as a special project, and with a very high rate of productivity.

The final and still rare approach depends on the growing availability of records, which allows larger areas of less common stock to be converted while still retaining high hit rates. Such conversions will tend to cover complete or almost complete sections of stock, irrespective of likely use. It may be that most libraries will aim eventually to convert the whole catalogue, but from Hoare's survey it is clear that only a few libraries have planned total conversions, ranging from King's College and University College in London, to Warwick, UMIST and, most ambitiously of all, Edinburgh.

It can be deduced that the majority of universities have converted between 3% and 30% of their stock and that those with the highest conversion ratios to MARC standards have the smallest files. It appears that in the university sector, perhaps 30 million items remain to be converted, although whether this represents the number of volumes or the number of titles is unclear. In either case it will require a massive capital investment.

The position varies in other major research libraries. The growth of stand-alone circulation systems means that almost all the polytechnics have at least short title circulation records, while many have MARC files of some size, particularly of heavily used stock. Many of the major public libraries with lending services have short title files, but the large reference libraries typically have less than 20% of stock converted, while some of the largest, such as Manchester and Leeds, have no machine readable records. The absence of a need for circulation systems, as well as the sheer size of files, has delayed significant progress in the national libraries. The National Library of Scotland took an early interest in on-line current cataloguing in the mid-1970s but has not moved into retroconversion. The National Library of Wales used REMARC records when little else was available, but is now having these upgraded, and the British Library has finally begun work on the large GK3 catalogue, after several false starts.

Methods of retroconversion

1. In-house copying. The simplest method of retroconversion would seem to be copying the existing records in full or in part into machine readable form within the library or through the work of a bureau. It is particularly attractive where the keyboarding does not involve the addition of complicated tagging, such as for MARC, and is therefore in large measure copy typing. The largest such conversion was at Hull and it quickly provided an enormous and extremely useful file. However, a later analysis by Dyson (3) shows that the apparent economy of in-house conversion led to a hidden penalty in the shape of errors averaging over two per record. These were of varying degrees of importance, but 11.4% of records were considered to have serious errors. Clearly the need for extensive quality control at all stages has to be taken into account when using this method. Nevertheless, the method has been adopted by half a dozen university libraries, which are more or less satisfied with the files so created. They are, however, unique to the library concerned and effectively bar the door to any record sharing with other institutions.

2. New formats. In the late 1960s and early 1970s a number of the new universities took a great interest in the development of shortened catalogue records. The experiments at Bath on the mini-catalogue (4) are probably the best known result of this period, but it also saw the development of the MINICS record format at Loughborough (5). This format is intended to be upward convertible to MARC, but that feature has not been used and the format has never achieved wide acceptance. The fear that such records would prove a poor investment seems to have been eased by Bath's upgrading of its original records to MARC compatibility – although the quality of the old format was the key.

3. Optical Character Recognition. Since human involvement in record copying appears to increase the number of errors in the catalogue, there has for some years been a corresponding interest, for example in such major libraries as the Bodleian, in the possibility of automatic input through Optical Character Recognition (OCR). This should not be confused with the American definition of OCR conversion, which consists of retyping parts of the card catalogue entries in an OCR font. This is then read by an OCR reader and matched with a database belonging to either a bookseller or a utility, such as OCLC. British experiments, on the other hand, have looked at a solution which seems in principle neater, where the original record is scanned on an OCR machine and the new record becomes an exact copy of the old. Despite a number of experiments, the method has had disappointing and inconclusive results notwithstanding its apparent promise. Diamond (6) describes the experience of Glasgow, where a Kurzweil Data Entry Machine was used on typed catalogue slips covering the 1968-79 period. The report is an optimistic one, but it shows a system converting only ten slips an hour – with the hope of rising to sixty slips – and with a success rate of only 85%, which meant that virtually every record required editing.

Much the most promising OCR system is OPTIRAM/LIBPAC, described by Harrison (7), although it has yet to be proved in production. Existing card or sheaf records are read by a scanner based on Group 3 fax machines. The digitised images are converted to ASCII code and held on a microprocessor for analysis. Errors are reduced by taking the word as the unit, comparing each with a dictionary held on computer, and following this with automatic error correction. Perhaps more importantly, the software also uses format recognition to determine the various fields of a record and then add MARC tagging. These records may well be to a low cataloguing standard, particularly if of some age, but they are at least MARC compatible. Such a method seems particularly attractive for areas such as music, where hit rates from conventional databases may be quite low. More generally, it in theory allows a library to combine the speed of in-house copy-typing conversions with the accuracy of automatic data capture.
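The principle of format recognition can be illustrated with a small sketch. The rules below, which assign rough MARC-style tags from the position and content of each transcribed line, are invented for illustration; the actual OPTIRAM/LIBPAC software is considerably more sophisticated.

    # A toy illustration of format recognition: position and simple pattern
    # tests assign rough MARC-style tags to the lines of a card record.
    # The rules are invented; they are not the OPTIRAM/LIBPAC algorithm.

    import re

    def tag_card(lines):
        record = {}
        if lines:
            record["100"] = lines[0]       # first line taken as main entry
        if len(lines) > 1:
            record["245"] = lines[1]       # second line taken as title
        for line in lines[2:]:
            if re.search(r"\b(18|19)\d\d\b", line):
                record["260"] = line       # a plausible date suggests imprint
            elif re.search(r"\d+\s*p\.", line):
                record["300"] = line       # '254 p.' suggests the collation
        return record

    card = ["Dickens, Charles",
            "A tale of two cities",
            "London : Chapman and Hall, 1859",
            "254 p. ; 20 cm"]
    for tag, value in tag_card(card).items():
        print(tag, value)

Even rules as crude as these show why the approach is attractive: the tagging, the most skilled part of transcription, is done by the machine and only the failures need human attention.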

4. External databases. The major alternative to in-house conversion is the acquisition of records from an external database. Despite the costs involved this method becomes increasingly attractive. Not only does the use of standard MARC records offer great hope for sharing and co-operation, but as the databases grow in size, hit rates rise correspondingly and even the most specialised of libraries will find them of substantial benefit. OCLC is possibly the best known supplier of records, holding over 12 million in all. General libraries can achieve hit rates in excess of 80%, while the School of Slavonic and East European Studies of the University of London estimates a hit rate of almost 60%. The main database utilities available in the United Kingdom are the British Library's BLAISE-LINE service, OCLC Europe and UTLAS. The two major British co-operatives, BLCMP and SWALCAP, also have large files of several million records. Although they can be used for recon projects, the databases are very different in nature, with BLCMP having a single central file and SWALCAP favouring a distributed arrangement. Most of these utilities were discussed in a survey of the services available in 1985 published by Vine (8). Even in the two years since then, there has been a shift in the databases available and in the range of services they are willing to offer in what has become a competitive market. The article is however a useful snapshot of available services at that time. Leeves' (9) guide to library systems gives the most up-to-date picture. The current OCLC range of services is typical of those now available from the major utilities. They offer a recon service, where OCLC prepares a machine readable catalogue using the library catalogue or shelflist, with no involvement from the library; a fixed-price, fixed-term contract, for libraries with precise budgetary constraints; a reference service accessed via PSS for non-members, with the option of having records written to tape; a microcomputer-based system where search keys are entered off-line onto floppy disk; a tape conversion service which allows files of non-MARC, typically circulation, records to be upgraded; and ordinary membership of OCLC, which allows access to all shared files.

Two basic methods exist for acquiring these records: off-line batch matching, often by control number, or on-line matching with the records later supplied on tape. The first method tends to be the cheaper, but it has hidden costs which are not clearly understood. Assembling a file of control numbers or acronym keys on floppy disk at the keyboarder's pace avoids telecommunication costs, but leads to an element of mismatch in the records which might be avoided through on-line matching. These mismatches require positive effort to remove. As an aside, one should note the fascinating work done by Ayres and his team at Bradford (10) on creating USBC, a technique aimed at merging large bibliographic databases while eliminating duplication. The long turn-round time in receiving records and/or diagnostics from the utility allows sufficient opportunity for the same record to be requested several times before the first request has been satisfied, at least on a major project. This difficulty can be reduced through sophisticated project management, but that in turn imposes a cost. On-line catalogue searching imposes high telecommunication costs and also opens up an area of debate on the type, and therefore cost, of staff required to match records satisfactorily. The whole debate is further confused by the perceived threat of the de-skilling of cataloguing, since such projects are rarely conducted entirely by existing staff, but more usually in whole or in part by specially appointed staff on clerical-related grades.
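One simple project-management safeguard against the duplicate requests described above is a register of search keys already submitted, consulted before each new submission. The sketch below shows the idea; the data structures and key values are hypothetical.

    # A sketch of a register of outstanding search keys, so that the same
    # record is not requested again while an earlier batch is still out.
    # The ISBN values here are invented examples.

    pending = set()      # keys submitted to the utility, awaiting a reply
    completed = set()    # keys for which a record or a miss has come back

    def submit(key):
        """Queue a key only if it is neither pending nor already dealt with."""
        if key in pending or key in completed:
            return False                   # duplicate request suppressed
        pending.add(key)
        return True

    def receive(key):
        """Record the arrival of a tape record or diagnostic for a key."""
        pending.discard(key)
        completed.add(key)

    for isbn in ["0416300022", "0416300022", "019281254X"]:
        print(isbn, "queued" if submit(isbn) else "skipped: already requested")

The register itself costs almost nothing; the real cost noted in the text is the clerical discipline of maintaining it across a project lasting months or years.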

5. Bureau conversion. The newest conversion method lies with companies such as OCLC and Saztec, offering what is more commonly a North American or Australian service. Essentially, they offer an agency service. Working from a copy of the card catalogue or shelf list, these companies acquire or create MARC records in much the same way as a library would itself do. In theory, with a trained and resident staff, they can convert catalogues more quickly, without involving the library in management or staff training. Further, since much of the material not found on the databases can have an EMMA record created with great speed and accuracy by clerical staff, the cost is not necessarily higher than in-house methods. This is clearly a clean and efficient method which should appeal to libraries. Whether their institutions will be as keen on the speed, and the resultant need to find capital at once rather than spread over several years, remains to be seen. Saztec have just won the contract to produce the British Library's much delayed GK3 catalogue and, although the method used is rather different from a normal retroconversion and creates records which only approximate to the MARC standard, the contract will be watched with much interest and will prove the greatest possible test for this method. Saztec have also been working with the National Library of Wales on upgrading records originally acquired from REMARC.

6. Potential Requirements Files. Still to be explored is the use of potential requirements files held in-house. One major element of on-line searching is telecommunications costs, and holding a PRF locally would avoid this. Edinburgh experimented with this approach by holding a section of the SCOLCAP file, but has now abandoned it. It may be that CD-ROM is a more promising route for this approach, but it again remains to be tested. It is worth noting that as part of its contract with the British Library, Saztec has gained the right to produce and market the catalogue on CD-ROM (11). Interestingly, the University of Illinois has produced a test CD-ROM disc containing 700,000 records from the 1976-86 period, for use within the state. Although not seen as its main aim, this clearly could act as a PRF.

7. Record sharing. As some libraries move ahead with their conversions, their growing files also begin to offer possibilities for sharing, outside the existing framework of co-operatives. Perhaps the closest existing approach to this is in London, where nine of the University's schools share a file of 750,000 records. The most likely vehicle for this sharing seems to be JANET, the Joint Academic Network. In the late 1970s a number of networks sprang up in the UK to link research staff in the universities and research institutes of particular regions of the country. The UK Computer Board decided in the early 1980s that it would be more sensible to turn this mild chaos into an ordered national network and appointed a Joint Network Team to undertake the task. Almost by chance, librarians with an interest in computing discovered the network and have been quicker than the general academic community to uncover its possibilities. The Joint Network Team has proved an enthusiastic supporter of library use. So far interest has lain in the possibility of interrogating the catalogues of other libraries and then transferring records between institutions. The most developed project to explore this has come from the Consortium for University Research Libraries (CURL). CURL is an informal grouping of the libraries of the seven largest universities in the UK – Cambridge, Edinburgh, Glasgow, Leeds, London, Manchester and Oxford – which between them have one of the richest potential databases in the world, in excess of twenty million volumes. The Scottish universities have also begun work on resource sharing and a programme of research into the possibilities. Both these initiatives are promising, but it will take some time to establish whether they can be turned into operational systems. JANET itself is not receptive to large-scale file transfer on-line and initial efforts may have to concentrate on identifying records on-line for later conventional transmission on exchange tapes.

Retroconversion costs. The use of any of these methods requires careful costing. It has been assumed that, by accepting high quality MARC records, a similar quality is required for EMMA cataloguing. Since this is generally recognised as a professional and costly activity, the production of these EMMA records becomes very expensive; conventional wisdom puts the cost of creating such a record at £5-£6 if using professional staff. But this uniformity of quality is self-imposed and it is not clear that it is an absolute need. It certainly emphasises the importance of maximising the hit-rate. Raw records from the utilities can vary in price from about 20 pence to 60 pence depending on the method of retrieval used. To this must be added the costs of producing the local data on number of copies, locations and class mark, with the further possible addition of editing costs if authority control is used. Then there are telecommunications costs, format conversion costs, equipment costs and staff costs. These factors have varied enormously from conversion to conversion without obvious reason, but, as a rule of thumb for large conversions relying mainly on external record suppliers, a figure of up to £2 per record is probably close to the mark. Most libraries will manage to reduce the actual cash outlay through the use of existing staff and equipment or find some of the resource from external sources such as the Manpower Services Commission. The most varied experience in the UK lies in Edinburgh, which has used many of the available services and a wide range of staff with different levels of qualification, as described by Ralls (12) in a paper reprinted here as Appendix 2. Newcastle has used a quite different but equally long-term approach, as described by Bagnall (13).
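As a rough check on these figures, the sketch below costs a hypothetical project from the unit prices quoted in this section. The hit rate, the per-record overhead for editing, local data and equipment, and the project size are all assumed values chosen for illustration.

    # A back-of-envelope cost model built from the unit prices quoted above.
    # Hit rate, overhead and project size are assumptions for illustration.

    def conversion_cost(titles, hit_rate, record_price=0.40,
                        emma_cost=5.50, overhead=0.75):
        """Estimate total project cost in pounds and the cost per title."""
        hits = titles * hit_rate
        misses = titles - hits
        total = hits * record_price + misses * emma_cost + titles * overhead
        return total, total / titles

    total, per_title = conversion_cost(500_000, hit_rate=0.85)
    print(f"total: £{total:,.0f}; per title: £{per_title:.2f}")

On these assumptions the result comes out close to the £2 rule of thumb, and the model makes plain how sensitive the total is to the hit rate, since each miss costs several times as much as a hit.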

The bureau approach is, at least on paper, a cheaper option. The exact price varies from library to library and has to be negotiated with the bureau. Factors involved are the type of materials, the likely hit rate, the amount of local data to be added, the state of the original records on which the work will be based, and the timescale. Given these factors, both Saztec and OCLC would quote a unit price in the range 60p to £1.

Problem areas. A number of quite varied problems and areas of debate have emerged over time, as projects have been carried out.

1. There have been arguments over the cost of converting from US to UKMARC. This has become an international problem, but it seems likely that the move towards UNIMARC may resolve the difficulty. IFLA’s promotion of the International MARC (IM) programme to develop UNIMARC should remove this barrier to trans-border data flow (14).

2. There are less esoteric problems in trying to reconcile non-MARC and MARC records. SWALCAP’s Libertas system claims an ability to upgrade brief circulation records to MARC standard, as does OCLC, while, as mentioned above, the Scottish universities are considering how Aberdeen’s Oriel records may be re-used.

3. The use of records from a variety of external sources has opened up what promises to be a lively debate on standards. Edinburgh, which has drawn records from perhaps the widest variety of sources considers high standards of editing essential, while Hoare quotes Newcastle as describing “over-zealous editing as the bottomless pit into which conversion may sink.” Adler gives the succinct view that “You may start by being a perfectionist…You will end by being a pragmatist.”

4. The debate on standards reflects a real and legitimate difference of view on cataloguing standards and a real difference on how far the creation of on-line catalogues with multiple access points will affect the way in which catalogues are used. It also reflects a lack of clarity in defining the aims of each recon project. Such projects are supposed to be generally beneficial without any clear determination of what they are trying to achieve. But there is a substantial difference between moving from an imperfect manual catalogue to a perfect automated one and, say, moving from a series of split catalogues to a single machine-readable file whose virtue is seen as unity rather than consistency. Both of these goals have equal legitimacy, but they may well require different standards of practice. In these cases, there is a great deal of room for debate, with no ‘right’ answer, precisely because libraries are trying to achieve different goals. Ralls (12) describes the retroconversion programme in Edinburgh with clarity and conviction and an acute consciousness of the long repetitive grind of the work involved in a major library. Edinburgh’s methods and judgements will not apply everywhere, but they amply demonstrate the importance of explicitly determining what is to be achieved. A second paper by Ralls (15) presented to the Essen Symposium explores all of these issues.

5. Other areas of debate have tended to centre around more pragmatic difficulties. Is it necessary to convert the whole catalogue or, given cost constraints, should the library concentrate only on the most heavily used stock – and if so how is it to be defined?

6. Should one work from the catalogue record or the book in hand? If the library has been regularly shelf-checked and the catalogue accurately reflects the stock, working from the shelf records will prove satisfactory. If not, is it sensible to convert the records for books which no longer exist in library stock? Conversely, to conduct a major shelf-check and catalogue correction exercise adds considerably to the cost of the recon project.

7. Is it practical to work from the daily returns of books? These are clearly the most heavily used items, but to take the circulating stock out of circulation for any length of time damages service to readers.

8. How important is the consistency of catalogue headings once a multiplicity of access points is offered? Much time can be spent on authority control and the achievement of consistency. Would this be better spent on adding further records? This again reflects the need for a clear view of the aims of the project.

The future of retroconversion. At the lowest level, the work of retroconversion will continue more or less slowly, with the newer methods described above being explored and tested. As time goes on, the conversions should become cheaper and quicker as the available files of records grow and hit rates rise. This in turn is likely to re-emphasise the primacy of the MARC format. As a consequence, we may expect an increasing interest in methods of upgrading records automatically. Some already exist and are based on the derivation of acronym keys from the existing record, which is then matched with a MARC database. One major technical advance has however still to be made, and that is the ability of open system networks (as opposed to the closed networks of the co-operatives) to transfer large data files on-line. Many libraries have had split-sequence catalogues for many years and may be happy to accept partial conversions and continue to accept split sequences, even if now between machine-readable records and traditional catalogues. A handful of libraries which see complete catalogue conversion as one aspect of a much wider change in the growth of automated information services will continue to devote resources to the creation of a single catalogue file.
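To make the idea of automatic matching concrete, the sketch below derives a truncated author/title key of the general kind used for such matching. The 4,4 layout and the stop-word list are illustrative assumptions; the exact key formats used by individual utilities varied from service to service.

    # Illustrative only: derives a truncated author/title search key of the
    # general kind used for automatic record matching. The 4,4 layout and
    # the stop-word list are assumptions, not any utility's actual format.

    def search_key(surname, title, surname_len=4, title_len=4):
        """Build a key such as 'dick,tale' for Dickens / A Tale of Two Cities."""
        stopwords = {"a", "an", "the"}            # minimal illustrative list
        words = [w for w in title.lower().split() if w not in stopwords]
        first = words[0] if words else ""
        return surname.lower()[:surname_len] + "," + first[:title_len]

    print(search_key("Dickens", "A Tale of Two Cities"))   # prints 'dick,tale'

A key of this kind can be generated clerically from even a brief circulation record and matched against a MARC file without bibliographic judgement, which is what makes automatic upgrading possible.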

The development of UNIMARC should promote greater international co-operation and it is to be hoped that the welcome interest of the EEC in libraries may act as a vehicle for this. Once JANET has been mastered by UK libraries, they may be expected to develop an interest in the possibility of using the European Academic Research Network (EARN) to explore record exchange within Europe. Large files already exist in some countries such as the Netherlands, while the new interest from countries such as France and Portugal raises the prospect of large-scale European co-operation. As trans-border data flow grows, there will no doubt be a growing interest in language problems. This has been examined to a degree in countries such as Belgium and Canada, but, apart from the work of the National Library of Wales, it has yet to be explored in any significant way in the United Kingdom.

REFERENCES

1. Adler, Anne G. & Baber, Elizabeth A. Retrospective conversion. Ann Arbor, Pierian, 1984. ISBN 0-87650-177-3

2. Hoare, Peter A. Retrospective catalogue conversion in British university libraries. British Journal of Academic Librarianship 1(2) 95-131, 1986

3. Dyson, Brian. Data input standards and computerization at the University of Hull. Journal of Librarianship 16(4) 246-261, 1984

4. Bryant, P., Venner, G. M. & Line, M. B. The Bath mini-catalogue: a progress report. Bath, University Library, 1972

5. Lewis, D. E. & Robinson, M. E. Computer based cataloguing at Loughborough University of Technology 1966-1982: a review. Program 17(2) 52-57, 1983

6. Diamond, R. J. Recon via KDEM at Glasgow University. Vine 45 17-22, 1982

7. Harrison, Martin. Retrospective conversion of card catalogues into full MARC format using sophisticated computer-controlled imaging techniques. Program 19(3) 213-230, 1985

8. Retrospective conversion: a look at some of the services available. Vine 58 19-25, 1985

9. Leeves, Juliet. Library systems: a buyer's guide. London, Gower, 1987. ISBN 0-566-03553-7

10. Ayres, F. H. [et al.] USBC, its use for union file creation: a feasibility study for a national database. London, British Library, 1984

11. British Library catalogue conversion contract. Bibliographic Services Newsletter 42 1-2, 1987

12. Ralls, Marion C. The evolution of a retroconversion. Vine 58 31-38, 1985

13. Bagnall, J. LS/2000 live at Newcastle University Library: a progress report. Vine 59 20-25, 1985

14. National MARC records and international exchange. Bibliographic Services Newsletter 42 4-5, 1987

15. Ralls, M. C. Retrospective catalogue conversion: policy, standards, strategy and quality control. Proceedings of the 1985 Essen Symposium, the future of on-line catalogues. Essen, University Library, 1985

APPENDIX 1: United Kingdom University Libraries and Retroconversion

RETROSPECTIVE CATALOGUE FILES IN BRITISH UNIVERSITY LIBRARIES

This table follows that of Hoare (2), updated with subsequently published information. Note that in many cases it is difficult to say whether the figure given for 'Present size of file' includes records added over the years as current cataloguing. Note too that the figure for total stock variously refers to total volumes and to total number of titles in stock. Allowing for both these facts, the figures reflect reasonably fairly the amount of work still to be done.

Appendix 2: Edinburgh University Library, a case study


The following paper appeared in Vine in 1985. There are very few detailed case studies of the exact operation of retrospective conversion, which makes the paper particularly interesting. It is also of value in that Edinburgh has probably the most varied experience of such projects in the United Kingdom. They have set out deliberately on a major exercise with a long time frame and have at the same time chosen to experiment with a large number of record suppliers and operating methods. Retroconversion poses many questions of technical interest to libraries, but the largest projects require a range of management tools and pragmatic decision making processes not always found in the ordinary running of such libraries. All of these themes are explored in this paper, reprinted with the kind permission of the British Library.

THE EVOLUTION OF A RETROCONVERSION

An interim report on Edinburgh University Library’s retrospective catalogue conversion, by Marion C. Ralls.*

Edinburgh University Library is a classic example of distributed data, distributed processing, and distributed service, until 1982 all in manual form. It is dispersed, with the University, over several square miles in the centre and south of the city. The Main Library in George Square houses the central administration, the Arts and Social Sciences collections, the main Undergraduate Reading Room, the Special Collections, the Map collection, the main Reference and Statistical Reference collection, other archives, collections and special processing and service units such as the Bindery and the Photographic Department. There are also major collections in New College Theological Library, the Medical Libraries, the Science Libraries on the King's Buildings campus, the Law and Centre for European Government libraries, the Music Library and the Veterinary libraries. All of these are professionally staffed, and professional library work (selection, acquisition, cataloguing and classification, reader services etc.) is carried out there. Greater co-ordination is being achieved since 'the cuts', and automation is seen as an instrument for further beneficial rationalization and co-operation in improved services. There are also numerous class and departmental libraries of varying size, some of which the University Library controls and supports, some of which it merely advises and helps as best it can. Altogether the stock is thought to comprise between one and a half million and two million items, but this includes approximately half a million uncatalogued items in Special Collections and New College.

The target database is estimated at 750,000 titles, which are expected to represent 1,500,000 volumes. Initial checks from two different angles are confirming this original guesstimate. The breakdown of this target base, in the present order of priority, is:

Main Library Reading Room 4.5%
KB Science Libraries 7%
Main Library classified sequence 33%
Veterinary Libraries 2%
Medical Libraries 6%
Main Library Reference 2%
*Music Library 8.5%
New College catalogued collection 5%
Law Library and CEGS 6.5%
Unclassified material 10%
Class libraries 15%
Total 100%

From * onwards the priority is dependent on funding

*Marion Ralls is Director of Automation, Edinburgh University Library

The decision to convert all the Library’s catalogues and build a full and high quality database was taken both because the catalogue is seen as a prime library service in itself, and because the database will be the foundation for all other areas of Library service and housekeeping to be automated. The long term agenda includes not only cataloguing, on-line public catalogue access, circulation control and public access to circulation status, but also acquisitions, serials control, binding and inter-library loans processing, accounting and financial control, and a second, more sophisticated, public catalogue to serve scholarly and research use, browsing, and file transfer to personal work stations. In case you wonder, I should say that we do not plan to have artificially intelligent robots on the Issue and Reader Enquiries desks!

The university gave approval in principle and agreed to purchase the initial equipment but was unable to guarantee funding to meet the full estimated cost. The UGC contributed a Re-structuring Grant, to cover extra management costs over two years, and this was used for the creation of an Automation Management Team partly by secondment and partly by new appointments. The Manpower Services Commission accepted an application for a Library Automation Project (LAP) Team to work on the barcoding of the stock and the retroconversion of the catalogue. A contract was signed with Geac Computers Ltd for a system (hardware and software) to support a database/catalogue of up to 750,000 items, circulation control and public on-line catalogue access throughout the library network, cataloguing and acquisitions systems, and an interface with the Edinburgh Regional Computing Centre network, PSS and SCOLCAP.

Barcoding

In February 1983, only weeks after all these necessary foundation stones had been laid, thirty-nine enthusiastic MSC LAP assistants arrived in the library, and the Automation Project began in earnest, with the barcoding of the stock. The LAP team worked along the shelves, with the shelf list, identifying each item and sticking one barcode label in the book and its matching twin on the back of the shelf list slip. At first they worked in pairs but later they preferred to work alone, taking personal responsibility for the sheaf binder for which they had 'signed', and continuing to see it through its database searches and editing. Over the two years of MSC projects we have barcoded approximately 770,000 items (50% of the target stock), mostly monographs, but 9% of them journal volumes. Current acquisitions and bound journal volumes are barcoded as part of normal processing.

Most of the work was accurate and conscientious but, with hindsight and experience, we would not repeat an arrangement which left people doing this work all day, every day, for their sake and the Library's. For the second year it has been done as a break from editing on the terminals, but it still runs ahead of the searching and editing.

Catalogue conversion

EUL’s main catalogue is a beautiful and much loved guard book, created with care and accuracy and conscientiously maintained. There are also various card and sheaf-slip catalogues for particular sections, but there were no ‘date splits’ as in most large libraries. However, the records are brief, to EUL’s own rules and lacking in any subject data except for our classification marks; and various sections of the library use different classifications – Dewey variants, UDC, NLM, Barnard etc. Only the Medical Library and New College Library were already cataloguing to AACR2.

1. Policy

A clear policy decision was taken to buy MARC records for as many titles as possible. This would give fuller bibliographic information and subject data, to support all the improved services and increased use of the catalogue we envisaged. It would also maintain compatibility with BLAISE and SCOLCAP, which EUL planned to use for current cataloguing, and allow co-operation with all other libraries using the MARC format. Thirdly it would give us a database with the record format assumed by all major library system suppliers. As the data has a longer life than any system this is felt to be of crucial importance.

2. MARC database services

The retrospective conversion database service which offered the speediest setting up with a reasonable hit-rate at an acceptable price was the Carrollton Press REMARC service, offered through the UK agents, Chadwyck-Healey. Apple microcomputers were quickly installed, the training required for creation of search keys was minimal, and in the first year nearly 300,000 were keyed and the floppy disks sent to the U.S. for processing. Carrollton promised a conversion to UK MARC format, and this, combined with the completion of their own database, delayed any response to the searches until the autumn. Geac had to set up MRMS (their cataloguing system) for us, and it was 1984 before editing started. This grand sweep across wide sections of the library, covering barcoding and searchkeying, gave a great sense of achievement: large numbers were quoted as “done”; the initial impact of being included in the automation project was felt through most of the library network; the momentum and excitement were a heady brew.

The editing brought us down to earth, not with a bang, nor a whimper, but with the slow sinking in of just how long a haul this retroconversion was really going to be. It has taken us a long time to get the full measure of the task; we now share the experience and the lessons learned for the benefit of others. An average 45% hit rate on REMARC left 55% to be sought elsewhere. Other databases had to be searched. All the records had to be checked and edited. Far from being 'done', all sections were only just begun. We started using BLAISE Services in February 1984, and negotiated with SCOLCAP (of which EUL has been a full member for some time) for a selection from the union file to mount as a potential requirements file on our Geac, since we did not wish to use the off-line LOCAS service. This selection (c. 500,000 records) was provided from the LOCAS union file by August 1984 and mounted in September. At first we had to dedicate terminals to SCOLCAP work, but since March 1985, with Geac's release 11, we can switch from the EUL file to SCOLCAP on the one terminal, using the high-level menu. We also contracted with OCLC for an on-line record service, and again the British Library were helpful and agreed to provide conversion to UK MARC format. This service began in August 1984, with OCLC having installed three terminals in the Main Library and one at the King's Buildings Science Library Centre.

Figure 1 overleaf gives a diagrammatic description of our present database searching, a cascade movement through the different sources until, for the hard-core misses, EMMA records are finally created. (REMARC records received are already on the EUL database, of course. A fresh contract, but on a smaller scale, is expected to be signed with REMARC shortly).
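The cascade can be expressed very simply in code. In the sketch below the search functions are stand-ins for the real services, in the order of priority described in the text; everything else is illustrative.

    # A sketch of the cascade: each title is tried against the sources in
    # priority order, and only the hard-core misses fall through to EMMA
    # creation. The search functions are stand-ins for the real services.

    def cascade(key, sources):
        """Return (source name, record) from the first source that hits,
        or ('EMMA', None) when every database misses."""
        for name, search in sources:
            record = search(key)
            if record is not None:
                return name, record
        return "EMMA", None          # no hit anywhere: create one by hand

    sources = [                      # hypothetical stand-ins, in EUL's order
        ("REMARC",  lambda key: None),
        ("SCOLCAP", lambda key: None),
        ("BLAISE",  lambda key: None),
        ("OCLC",    lambda key: {"245": "A tale of two cities"}),
    ]

    print(cascade("dick,tale", sources))   # -> ('OCLC', {'245': ...})

The ordering matters economically as well as bibliographically: the cheaper or higher-quality sources are tried first, so that the expensive ones see only the residue.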

3. Hit-rates

Hit-rates, in the nature of the cascade, are not calculable for any but the first search as a true hit-rate against our catalogue. REMARC, as the first, ranged between 38% and 48%. Experiments using SCOLCAP as the first database on promising areas of stock (Social Sciences and the Medical Library) gave a high hit-rate (100% on one Social Sciences binder) but it can also be quite low (5% in another) in patches of older stock, for instance. BLAISE gives us a 77% hit-rate on searches selected as "British publications post-1950 and US publications post-1970, with a standard control number". As BNB numbers were not generally recorded this means mainly post-1969. OCLC gives a very high hit-rate on the stock if used as a first database; we are glad to use it for records not found on SCOLCAP, BLAISE or REMARC, and without it our EMMA workload would be difficult, if not impossible, to sustain. 'Know thyself and know thy databases' before plunging into large-scale contracts is the obvious rule.

4. Quality and service

All our suppliers have been helpful, and willing to negotiate contracts to suit Edinburgh's particular needs. BLAISE and SCOLCAP records are the most consistently high quality as bibliographic records, to AACR2, and therefore need least editing time and expense. OCLC's and REMARC's records are of more variable quality and on average require twice as much editing, but the coverage is wider and reaches parts of the stock the British Library cannot reach. The provision of 650 subject heading data is of crucial importance and, since OCLC is searched on-line, these can be checked before the record is accepted.

On-line access to databases is a great advantage, refining the ‘hit-rate’ to a ‘hit-an-acceptable record’ rate. Almost as important, it allows the worker to proceed to the next stage knowing where he or she is, which is preferable, and saves time and money.

The BLAISE service is absolutely reliable, but the wait for tapes makes a long turn-round cycle, tolerable only because of the high hit-rate. On-line checking or use of SRS would help, but increase costs. REMARC was very slow to produce the first tapes, but seemed to have settled down by the end of 1983. OCLC has suffered both 'down time' and very slow response times, but the repair of a leaky transatlantic cable is promised as the solution to all our problems. It is easy to use, and popular with the MSC LAP team. Installation was efficient and training was given in a friendly and effective way.

SCOLCAP is on our own system at present, and any problems that arise there lie with us, not with SCOLCAP.

5. Editing

Editing varies considerably according to the quality of the record, the experience and competence of the assistant, the reliability and response times of the system, and its suitability for this type of work. Rates of work are also dependent on the availability of terminals. Record quality has been discussed above: detailed assessments would take too much space here. The lesson is to watch it carefully, and develop clean rules and guidance for non-librarian staff to follow.

The unavoidable high turnover in MSC assistants inevitably affects productivity rates in editing. The optimal workforce would seem to be teams of five or six, appointed for at least two years to give security and stability, led by a trained librarian, and advised by a professional cataloguer. Needless to say many of our good MSC 'LAP' team members would be appointed to such teams if we only had funding to support them. In any case training is crucially important; so is good clean documentation and guidance; so also are checking procedures built into the work pattern, good supervision and management, and the maintenance of good morale. Two excellent LAP managers and several very good supervisors have played a large part in the doubling of "productivity" in Edinburgh in the last year.

The Geac MRMS system is not primarily designed for the kind of bulk retroconversion EUL is using it for: at present it is slow and clumsy, and although Geac have made real improvements, and are planning more, it still means that EUL’s work rates may not be a reliable guide to what could be done on a faster system. (Geac was chosen primarily for its merits as a circulation system, with the possibility of extension to a satisfactory full integrated library system).

6. The “Productivity Rate”

Analysing what had been done last summer revealed the alarming fact that at that rate and pattern of work, the project would take to the end of the century to complete and could cost between £5 and £6 per record at 1984 prices. By February 1985 productivity had increased to such an extent that 5-7 years was a more reasonable estimate, and the cost had been pulled down to under £2 per record. One way of assessing productivity is to analyse the total cost of conversion, and then measure how long it is taking to complete 1000 records through all stages. For EUL this is:

Barcoding and first database search 15%

Searching other databases 15%

Editing 55%

EMMA creation 8%

Problem solving and completion 7%

The time required for these stages will average these proportions over many ‘thousands’ of records. From an alarming 20 man weeks per thousand records we can now achieve 10 man weeks per thousand records as long as terminals and experienced staff are available. On a faster system a better rate might be possible, but quality of work should not be sacrificed to speed, as this rebounds and costs more in the long run.
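The implication of these rates can be checked with a little arithmetic. The man-weeks figures are the paper's own; the number of titles remaining, the team size and the working year assumed below are hypothetical round numbers.

    # A worked check of the productivity figures. The man-weeks rates come
    # from the text; the 600,000 remaining titles, team of 30 and 45-week
    # working year are assumed round numbers for illustration.

    def years_to_complete(records, man_weeks_per_1000, team, weeks_per_year=45):
        man_weeks = records / 1000 * man_weeks_per_1000
        return man_weeks / (team * weeks_per_year)

    for rate in (20, 10):            # the old rate and the improved rate
        years = years_to_complete(600_000, rate, team=30)
        print(f"{rate} man-weeks per 1000 records: about {years:.1f} years")

On these assumptions the improved rate brings the project from roughly nine years down to under five, which is consistent with the revised 5-7 year estimate given above.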

7. Management and monitoring

EUL has found weekly review meetings, with actions assigned, an indispensable part of monitoring the project. Monitoring in the form of a visible growing 'thermometer' was found to be inappropriate, as records added to the master file did not constitute a real measure of the amount of work done. We now keep full, weekly updated, statistics of all aspects of the work, and produce progress charts which show intermediate goals ("milestones") as well as growth of the master file. Figure 2 is a real-life example.

8. Summary of progress

Approximately 20% of the work required to complete the target database has been done: the Main Library Reading Room is completed, and the Science Libraries over 80% so. The Veterinary Library and the Main Library Classified Sequence are well under way and other sections have had some work done (e.g. barcoding and first search keyed).

The Future

A further MSC project has been granted for 1985/86, enabling work to continue, chiefly on the Main Library Classified Sequence. At an estimated 250,000 titles the target date for completion is the summer of 1987, so further funds will be required in 1986/7, to achieve this. The Social Sciences area is being tackled first.

For those areas of the catalogue, such as Music, where we do not expect to find many MARC records available, we are exploring the possibility of automatic conversion of our own catalogue by a combination of optical scanning, format recognition and automatic tagging and specified amendment. We have had a successful pilot run of a set of simple Science periodical records by Optiram, and once we had suggested a link-up with LIBPAC for the production of a MARC exchange tape, we were able to receive this in a form we could load successfully on to our Geac. A great deal of work has been done writing specifications for Optiram/LIBPAC conversions of our Theses catalogue and our Music catalogue, and it is very much hoped that these can be implemented in the near future.

Funding for us, as for most University Libraries is the major problem, but we remain totally convinced that a comprehensive, high quality retrospective conversion is the only possible foundation for the future, and that it will prove more economical in the long run than any “quick and dirty” or minimal and selective approach. Moreover it is the only one which makes sense in the context of ever-increasing co-operation and information and record sharing between libraries.