LONTAD has a set of four primary, broad objectives. These objectives integrated existing objectives and long-standing needs of the organization. In the broadest terms, the project aims to:
- Digitize the entirety of the League of Nations Archives
- Provide online access to the entirety of the collection
- Improve the long-term physical preservation of the collection
- Provide for the long-term digital preservation of the project's outputs
An overall workflow, largely driven by experiences on previous small- and medium-scale projects, was initially envisioned. This overall workflow attempts to permit the fastest achievement of the project objectives, reducing the project risks and production bottlenecks, while also identifying “release valves” or processes that could be postponed or delayed if necessary. The workflow design also had to allow the continuation of in-person and online reference services with a minimum of disruptions, seen as critical due to heightened research interest fueled by the centenary of the League’s founding.
Three primary operations teams were established, with their general activities identified below:
- Pre-digitization: physical preparation for scanning, preservation and conservation treatment, and AMS work to prepare for digitization
- Digitization: digital capture (scanning), format conversions, optical character recognition (OCR)
- Post-digitization: quality control (QC) of digitization, metadata creation (description and indexing), and electronic file management and publishing
It is important to note that the digitization operation has been outsourced to a third-party service provider, although the work is done on-site. Because of this, the transfer points between the pre- and post-digitization teams and the digitization team are more formalized, and also require an additional QC phase and a clearly defined product acceptance process.
The establishment of a general project timeline of five years meant that it would be impossible to run each step of the project in series and that they instead would run concurrently, although with staggered start and end points. This stagger was planned as a means to ensure that sufficient material would be prepared by pre-digitization before the digitization commenced. See Figure 1 for a schematic of the project timeline with the various operations activities.
Initial analysis, as well as previous experience, identified a significant project risk at the point of transfer between pre-digitization and digitization: it is essential that the digitization team always has sufficient materials to scan, both to meet the project timeline and to ensure compliance with the service provider contract. We were also able to identify means to mitigate this risk, however, by planning and allowing for personnel to move from pre- to post-digitization if and when necessary.
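The transfer risk described above can be illustrated with a small calculation. The following is only a sketch: the function name, the weekly box counts, and the idea of measuring the buffer in boxes per week are all assumptions made for illustration and are not taken from the project's actual planning figures.

```python
def first_starved_week(initial_buffer, prepared_per_week, scanned_per_week, weeks):
    """Return the first week (1-based) in which the scanning team would
    run short of prepared boxes, or None if that never happens.

    All quantities are hypothetical box counts used for illustration.
    """
    buffer = initial_buffer
    for week in range(1, weeks + 1):
        # Pre-digitization adds newly prepared boxes to the buffer.
        buffer += prepared_per_week
        # The scanning team needs a full week's worth of boxes.
        if buffer < scanned_per_week:
            return week
        buffer -= scanned_per_week
    return None


# Hypothetical example: a head start of 40 boxes, with preparation
# slightly slower than scanning.
shortfall_week = first_starved_week(40, 20, 25, 52)
```

Under these invented numbers the head start only delays the shortfall, which is the situation in which moving personnel between teams would be considered.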
The decision was made to approach the project by digitizing materials by the League’s sections (such as the Political Section, Health Section, etc.). This provides a number of advantages: it allows the project to produce and communicate results in smaller increments; it corresponds to existing finding aids and indexes, including the ongoing description work; and, perhaps most importantly, it lets us establish a processing order that corresponds to research interest and other usage demands, including exhibitions and centenary events. This is critical, because a possible “release valve” would be to postpone metadata creation and publishing of seldom-used portions of the collection to the end of the project, or even, if necessary, to leave this work to regular IMS staff after the closure of the project. This decision was not the most obvious or the only possibility, however, as the collection boxes are only very roughly organized by section, and proceeding by section thus requires us to pull out rather randomly grouped sets of boxes. It would likely have been simpler, from a pure digitization perspective, to start at “Box 1” and proceed through the collection. As suggested, decisions on how to prioritize the order of League sections must also take into consideration a range of other factors outside of the project, and are in themselves a reflection of other analog and digitalization concerns.
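As a rough sketch of how a section-based processing order and the "release valve" might be expressed, the example below sorts sections by demand and flags those whose metadata work could be postponed. The `Section` class, the `research_interest` score, the `defer_below` threshold, and all values are hypothetical; the two section names are the ones mentioned above, and the third is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    research_interest: int  # hypothetical score; higher means more demand


def plan_processing(sections, defer_below):
    """Order sections by demand and flag candidates for deferred metadata.

    Returns a list of (section name, may_defer_metadata) pairs, with the
    most requested sections first. Ties are broken alphabetically.
    """
    ordered = sorted(sections, key=lambda s: (-s.research_interest, s.name))
    return [(s.name, s.research_interest < defer_below) for s in ordered]


# Illustrative scores only; they do not reflect actual usage figures.
sections = [
    Section("Health Section", 2),
    Section("Political Section", 3),
    Section("Placeholder Section", 1),
]
plan = plan_processing(sections, defer_below=2)
```

In practice the ordering would also weigh exhibitions, centenary events, and other factors outside the project, which a single score cannot capture.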
Although some level of physical processing and preparation is usually necessary when archival materials are being digitized, the LONTAD pre-digitization team goes beyond the minimal steps necessary to allow for safe digitization. These additional actions are taken with the long-term preservation of the materials in mind. In addition to the removal of a multitude of fasteners and the unfolding of documents that would typically occur prior to any digitization, the pre-digitization team may also perform basic repairs and stabilization of torn paper, place materials into preservation enclosures to ensure their longer-term stabilization, and isolate any photographs with photo-safe preservation paper (photographs are generally kept in the files and not removed). Significantly, all old boxes are removed and replaced with modern, preservation-quality archives boxes. This has entailed substantial effort and cost, not just in terms of the purchase of the boxes themselves, but also in terms of the time and resources needed to create, print, and affix labels. When necessary, spacers are used to better support materials within boxes, or annex boxes are created to relieve over-filled containers. These steps clearly have effects that will last well beyond the project life cycle and address broader IMS concerns.
After these physical treatments, files are sorted in appropriate order according to pre-existing registry or other reference numbers, and checked against any existing entries in the AMS and corrected if necessary, which improves the overall accuracy of the existing metadata. Where no entries exist, very basic metadata (a unique reference code and physical characteristics) are entered, which permits a digitization file cover sheet to be generated and also allows subsequent metadata work to be accomplished more rapidly and accurately.
There was a significant amount of discussion in the early planning stages about the extent to which these processes would be industrialized, that is, how finely to split up individual tasks in order to improve efficiency. It would theoretically be possible, for example, to have one or more staff specialize in preservation work, with others managing the replacement of boxes and box labels, and perhaps another verifying and entering data in the AMS. Some limited attempts at this have shown, however, that while short-term time gains can be realized, overall quality suffers in the long term, which can create further issues downstream. As a result, we have settled on a process where pre-digitization staff are assigned multiple boxes at a time and perform all the pre-digitization tasks for any given box, although they typically industrialize the process to some extent as well, performing a step for a group of boxes before moving on to the next step in the process. This also makes it easier to address any quality issues, as all tasks related to a box were performed by only one person.
The initial planning had intended for the pre-digitization team to also implement the steps necessary to digitalize the archives' physical inventory by codifying and entering storage locations and shelf information into the AMS. This was quickly postponed, however, both because it became clear that all the staff allocated would be needed to prepare materials in order to avoid the production bottleneck, and because of unexpected changes to the overall size of the collection as a result of re-boxing and the addition of annex boxes. It therefore made more sense to wait until all materials have been processed and established in their final containers before implementing this system. This will be more efficient and will also improve the accuracy and digitalization of later stacks management and loans.
Several choices relating to the digitization phase were also determined with broader contexts in mind. Overhead scanners designed for cultural heritage digitization were specified in the procurement process as the only acceptable scanners. Image formats and specifications were, in part, selected based on the desire to provide preservation-quality images that could serve as surrogates for consultation of originals, while access copies are generated with the intention of providing the widest possible use. Two sets of delivery files are produced: a single-page master file (JPEG-2000, 300 ppi, 24-bit color) and a multipage access PDF that has also been processed with OCR.
Previous digitization projects with League of Nations materials provided evidence that high-quality full-text renditions of League materials based on OCR are, in most cases, difficult to achieve due to the abundance of hand- and typewritten documents, poor-quality mimeograph text, and the large number of languages. Despite these shortcomings, OCR was seen as a potential benefit to full-text searching and online visibility of the documents, particularly as description would only be done at the file level. In addition, a fairly robust technical metadata schema was requested, with an eye toward the digital preservation aspects of the project. The digitization service provider was also requested
The delivery of digitized images was requested on a relatively frequent basis, every two weeks. From the IMS perspective, this was largely to ensure continued reference access to the original materials. Once boxes are pulled and entered into the pre-digitization process, they are considered unavailable to researchers until after scanning, as a means to ensure their readiness and reduce the risk of researchers unintentionally re-ordering files or pages. A short delivery cycle ensures materials are available for traditional reference needs as quickly as possible.
Quality control is an essential component of any digitization project, particularly when external service providers perform the scanning work. LONTAD is no exception. Early in the project, the post-digitization team performed a 100% check on scanned images, largely due to a recurring error that was eventually corrected; a 10% sampling is now performed. In addition to this QC, a physical QC check is also performed on a sample of materials, again with physical preservation in mind. Staff here are looking to ensure that materials are returned to boxes in good order and physical condition, an important check in view of the possibility that the originals will remain un-consulted for some time after digitization. It should also be noted that some decisions, such as the generation of checksums at digitization, have complicated QC, as many issues that could be fairly easily addressed by post-digitization staff must go back to the service provider in order to generate new checksums.
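The sampling and checksum points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual tooling: the function names are hypothetical, SHA-256 is assumed as the checksum algorithm (the algorithm actually used is not stated here), and the manifest is assumed to be a simple mapping from file identifier to the digest recorded at digitization.

```python
import hashlib
import random


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's bytes (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


def select_qc_sample(file_ids, rate=0.10, seed=None):
    """Pick a random sample of delivered files for image QC.

    At least one file is selected from any non-empty delivery.
    """
    file_ids = list(file_ids)
    if not file_ids:
        return []
    rng = random.Random(seed)
    size = max(1, round(len(file_ids) * rate))
    return sorted(rng.sample(file_ids, size))


def find_checksum_mismatches(delivered, manifest):
    """Return ids whose current digest differs from the recorded digest.

    delivered: dict of file id -> file bytes as they are now
    manifest:  dict of file id -> digest recorded at digitization
    """
    return sorted(
        file_id
        for file_id, data in delivered.items()
        if sha256_of(data) != manifest.get(file_id)
    )
```

The second function shows why checksums generated at digitization complicate QC: any local correction to a delivered file changes its digest, so the file no longer matches the recorded value until a new checksum is issued.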
Metadata creation is a key component of post-digitization activity, but also the one that is perhaps most tied to pre- and post-project activities. The descriptive metadata created by the project will serve as a key component in providing access to its results. As mentioned previously, the LONTAD project has also had the benefit of being able to readily take advantage of existing description and indexing work that has been underway for a number of years, although with some caveats. Because description has been underway for some time, a certain amount of drift in description quality can be observed in the AMS metadata of the League archives, due to changes in personnel as well as changes in recommended practices and processes that were not universally adopted or retroactively enforced. As a result, significant efforts have been made to standardize and document description processes, and existing metadata is thoroughly checked.
In addition to the descriptive metadata, post-digitization also provides a link within the AMS to the physical box or container. This is completely unnecessary for the digitization process, but is critical to eventually allowing management of loans and storage spaces with the AMS. As LONTAD will require work on each file-level entry in the AMS, however, it became clear that the most efficient point at which to do this was within post-digitization processing.