Methodology for conversion

November 29th, 2011

Traditional library retroconversion projects have focussed on finding existing high-quality MARC records to ‘match’ against the existing manual catalogue records. These MARC records would be acquired in bulk from a library catalogue agency as part of a dedicated project. Such a project would proceed through a machine-matching phase, followed by additional quality control and editing, to deliver an end result of high-quality records with local library shelf-marks attached.

This is, however, an expensive approach, with per-record costs typically in the range of $4–10 even when records exist for machine-matching. With some 536,400 records requiring conversion – implying a total cost in the region of $2–5 million at those rates – and a high proportion of records that cannot be derived through machine-matching, this approach is unaffordable for a project of this kind.

Another route is to take the information found on the existing manual catalogue records – often reliable and authoritative, but not structured according to the accepted standards of the MARC format – and to convert this data, without augmentation, into a recognised data structure standard. This approach requires the Library to accept that these records will not fulfil the requirements of ‘high-quality’ MARC records, but it has the potential to provide improved access to the existing level of information without the limitation of on-site-only availability.

Such a workflow could be based on the methodology already used to convert card catalogues successfully at the Bodleian[1]. This involves the mass digitisation of the cards, the keyboarding of the data contained on the cards by offshore keyboarding agencies, and the mark-up of this keyboarded data into an XML-based encoding schema such as EAD[2] or TEI[3]. As far as the underlying data model is concerned, all the data sources from the project will be incorporated into a single object store. The indexes and views built on top of the object store will provide the ability to extract and distinguish individual collections and to provide expressions of the content.
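
To make the shape of such keyed and marked-up data more concrete, the short sketch below shows how a single transcribed card might be wrapped in TEI-style bibliographic markup. The element names and field set here are illustrative assumptions in the spirit of TEI, not the project’s actual encoding schema, and the record content is invented purely as a placeholder.

```python
import xml.etree.ElementTree as ET

def card_to_tei(card: dict) -> ET.Element:
    """Wrap one keyed card (a plain dict of transcribed fields) in TEI-style markup."""
    bibl = ET.Element("biblStruct", {"xml:id": card["card_id"]})
    mono = ET.SubElement(bibl, "monogr")
    ET.SubElement(mono, "author").text = card.get("author", "")
    ET.SubElement(mono, "title").text = card.get("title", "")
    imprint = ET.SubElement(mono, "imprint")
    ET.SubElement(imprint, "pubPlace").text = card.get("place", "")
    ET.SubElement(imprint, "date").text = card.get("date", "")
    # Keep the local shelf-mark so indexes and views can group records by collection.
    ET.SubElement(bibl, "idno", {"type": "shelfmark"}).text = card.get("shelfmark", "")
    return bibl

# Hypothetical keyed output for a single card, purely for illustration.
example = card_to_tei({
    "card_id": "card-000001",
    "author": "Smith, John",
    "title": "An example title",
    "place": "Oxford",
    "date": "1898",
    "shelfmark": "Example shelf-mark 1",
})
print(ET.tostring(example, encoding="unicode"))
```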

Some of the software tools needed to undertake a project of this kind, such as a web-based editing tool, have already been created by the Cultures of Knowledge project[4]. In particular, we have already developed a simple yet highly efficient editorial interface that enables cataloguing staff to verify and amend the encoded textual outputs produced by a keying company by comparing them against images of the original card catalogue records.

Each catalogue component will be stored as a single Fedora object – and as a result can be readily augmented with comments, additional data, attachments and links to digitised images without requiring changes to the architecture of the storage system. The data from the object store can then be ‘piped’ into the Bodleian’s central resource discovery system (the Primo software provided by Ex Libris[5]) where the data can be searched alongside other large bibliographic datasets.
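
The sketch below illustrates the one-object-per-card idea, assuming a Fedora 3-style REST API (an object is created with a POST to /objects/{pid}, and content is attached as datastreams). The repository URL, PID, datastream identifiers and credentials are all hypothetical placeholders rather than details taken from the project itself; the point is that comments, links or further datastreams can later be attached to the same object without changing the storage architecture.

```python
import requests

FEDORA = "http://localhost:8080/fedora"   # hypothetical repository URL
AUTH = ("fedoraAdmin", "fedoraAdmin")     # hypothetical credentials

def ingest_card(pid: str, tei_xml: str, image_path: str) -> None:
    """Create one repository object for one catalogue card and attach its content."""
    # Create the object that represents the card.
    requests.post(f"{FEDORA}/objects/{pid}",
                  params={"label": f"Catalogue card {pid}"},
                  auth=AUTH).raise_for_status()

    # Attach the keyed and encoded transcription as an inline XML datastream.
    requests.post(f"{FEDORA}/objects/{pid}/datastreams/TEI",
                  params={"controlGroup": "X", "dsLabel": "Keyed TEI record",
                          "mimeType": "text/xml"},
                  data=tei_xml.encode("utf-8"),
                  auth=AUTH).raise_for_status()

    # Attach the digitised card image; further datastreams (comments, links,
    # corrections) can be added later without altering the architecture.
    with open(image_path, "rb") as img:
        requests.post(f"{FEDORA}/objects/{pid}/datastreams/IMAGE",
                      params={"controlGroup": "M", "dsLabel": "Card image",
                              "mimeType": "image/jpeg"},
                      data=img,
                      auth=AUTH).raise_for_status()
```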

Depending on the findings of our pilot project with OCLC, it may also prove possible to take the resulting keyed and encoded data, use it to generate suitable queries for the WorldCat API automatically, and obtain matching MARC-21 records from WorldCat. Although records obtained in this way would need to be checked to ensure that they are true matches, they offer a rapid and potentially cost-effective way to enrich the Bodleian’s online catalogue.
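
As a rough sketch of how such queries might be generated, the snippet below combines whichever fields were keyed from a card into a CQL query against the SRU interface of the WorldCat Search API. The endpoint, index names (srw.ti, srw.au, srw.yr), ‘wskey’ parameter and response handling are assumptions made for illustration, not a description of the pilot project’s actual workflow; the candidate records returned would still need manual checking.

```python
import requests

WORLDCAT_SRU = "http://www.worldcat.org/webservices/catalog/search/sru"  # assumed endpoint
WSKEY = "YOUR-WSKEY"                                                      # placeholder API key

def build_cql(card: dict) -> str:
    """Combine whichever fields were keyed from the card into a CQL query."""
    clauses = []
    if card.get("title"):
        clauses.append(f'srw.ti = "{card["title"]}"')
    if card.get("author"):
        clauses.append(f'srw.au = "{card["author"]}"')
    if card.get("date"):
        clauses.append(f'srw.yr = "{card["date"]}"')
    return " and ".join(clauses)

def find_candidates(card: dict) -> str:
    """Return the raw MARCXML response listing candidate matches for one card."""
    response = requests.get(WORLDCAT_SRU,
                            params={"query": build_cql(card),
                                    "maximumRecords": "5",
                                    "wskey": WSKEY})
    response.raise_for_status()
    return response.text  # candidate records still need manual checking
```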


[1] See http://www.history.ox.ac.uk/cofk/oxfordresources/bodleian-catalogue

[2] Encoded Archival Description; see http://www.loc.gov/ead/

[3] Text Encoding Initiative; see http://www.tei-c.org/index.xml

[4] See http://www.history.ox.ac.uk/cofk/

[5] See http://www.exlibrisgroup.com/category/PrimoOverview