Pamphlet collections
This section provides profiles of the collections, including their condition, duplication with other collections, and factors influencing their selection and scheduling for scanning. It draws on survey work, visits and correspondence with those responsible for the collections.
Collection profiles presents brief profiles of the individual collections, drawing on the surveys, visits and additional information.
Recommended strategies recommends strategies for addressing selection, copyright, de-duplication, and the scheduling of the collections, based on information in the preceding sections.
See the project workflow section for how these activities fit within the overall project work plan and workflow.
The format and condition of the pamphlets will have an impact on: their selection (e.g. item may be too fragile to send); handling considerations; time required to scan (e.g. number of pages, tightness of binding); and the quality of the OCR output achieved (e.g. use of unusual typefaces).
In preparation for the initial project proposal, contributing libraries were asked to provide information about the size and condition of their collections. Each library supplied the total number of pamphlets in its collection. Total page numbers were then estimated by multiplying the number of pamphlets by an average of 25 pages per pamphlet (for 3 collections) or 35 pages per pamphlet (for 4 collections that expected their average to be higher than 25). Based on these calculations, the initial project bid estimated that 1 million pages (the BOPCRIS capacity for this project) would approximate 30,000 pamphlets.
As part of the scoping work for the revised project proposal, a survey was undertaken to collect more accurate page statistics and determine other factors likely to affect scan time and image or OCR quality.
Methodology
Libraries were asked to check a random selection of 100 items drawn from the collections chosen for inclusion in this project. In five cases a computer program was used to generate this selection from lists supplied by the libraries; in two cases (Durham and UCL) the sampling was made on a systematic basis by library staff, but without sight of the pamphlet or its title.
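The computer-generated selection can be illustrated with a minimal sketch. The program actually used is not described in this report, so the function name, the seed and the shelfmark format below are assumptions made purely for illustration.

```python
import random


def draw_sample(item_ids, sample_size=100, seed=None):
    """Return a simple random sample (without replacement) of item identifiers."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # A list shorter than the sample size is returned in full (in random order).
    k = min(sample_size, len(item_ids))
    return rng.sample(item_ids, k)


# Hypothetical list of shelfmarks, standing in for a list supplied by a library.
shelfmarks = [f"PAM-{n:05d}" for n in range(1, 1451)]
sample = draw_sample(shelfmarks, seed=1)
```

Sampling without replacement ensures that no pamphlet is checked twice, and recording the seed would allow the same selection to be regenerated later.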
Each library was asked to check a range of characteristics, such as size, location, condition of binding, and presence of greyscale or colour illustrations. These criteria were drawn up by Julian Ball (BOPCRIS Manager) and the author. They were based on experience and some initial testing in Southampton (see Digital datasets).
The following criteria were chosen for the format and condition survey:
1. Overall condition (i.e. would the library be happy to send for scanning)
2. Location (i.e. open shelves, on-site store, on-site archival, off-site store, off-site archival)
3. Whether bound or separate
4. If bound, whether volume opens flat to 180°
5. If bound, whether volume has loose boards
6. If separate, whether there is margin stitching (i.e. additional stitching to hold sections together)
7. Whether loose pages
8. Whether obviously missing pages
9. Number of pages
10. Number of pages with grey illustrations
11. Number of pages with colour
12. Presence of foldouts (i.e. intentional foldouts)
13. Presence of folded pages (i.e. folded to fit into a volume)
14. Presence of adverts
15. Presence of annotations
16. Gothic or other unusual typeface used as main body text
17. Existence of multiple copies
18. Closest size (to A5, A4 or A3)
A copy of a page from the survey form is included as Appendix A. Questions 1-18 on this form relate to this survey and follow the order of the criteria listed above. As the survey began it became clear that some collections included significant proportions of non-English language pamphlets, so libraries were additionally asked to gather this information. This was not always possible where the survey work was already underway.
This survey was undertaken during 4-11 August, with some libraries spending up to 10 hours on it. This time was provided as a partner contribution to the scoping study.
Results
The tables below provide findings from the format and condition survey. General discussion is provided in this section, with the significant characteristics of each collection detailed in Collection profiles below.
Table 2.1.1:A Location and page averages (based on answers to questions 2 and 9)
| Collection | Location | Number of items checked [note 1] | Total pages found | Average pages per pamphlet |
|---|---|---|---|---|
| Bristol | Off-site store (83%) and on-site archive (17%) | 100 | 4548 | 45.5 |
| Durham [note 2] | On-site archive | 94 | 6141 | 65.3 |
| Liverpool | On-site archive | 99 | 4246 | 42.8 |
| LSE | On-site store | 100 | 4564 | 45.6 |
| Manchester | On-site store | 100 | 3466 | 34.7 |
| Newcastle | On-site store | 100 | 2903 | 29.0 |
| UCL | On-site archive | 99 | 4173 | 42.2 |
Note 1. In some cases fewer than 100 items were checked, due to missing items or time constraints.
Note 2. Durham's sample included some 20th century items and other forms of publication, which may account for its higher page average.
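As a worked illustration of the final column, the sketch below recomputes the average pages per pamphlet for four of the collections from the items checked and total pages found in Table 2.1.1:A. The function and variable names are our own, and rounding to one decimal place is an assumption about how the published averages were derived.

```python
# Items checked and total pages found, taken from Table 2.1.1:A.
survey = {
    "Bristol": (100, 4548),
    "Durham": (94, 6141),
    "LSE": (100, 4564),
    "Manchester": (100, 3466),
}


def average_pages(items_checked, total_pages):
    """Average pages per pamphlet, rounded to one decimal place (assumed)."""
    return round(total_pages / items_checked, 1)


averages = {name: average_pages(n, pages) for name, (n, pages) in survey.items()}
for name, value in averages.items():
    print(name, value)
```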
Table 2.1.1:B Grey and colour pages (based on answers to questions 9, 10 and 11)
| Collection | Pamphlets containing greyscale | Total grey pages | % of total pages that are grey | Pamphlets containing colour | Total colour pages |
|---|---|---|---|---|---|
| Bristol | 10% | 108 | 2.4% | 2% | 2 |
| Durham [note 1] | 16% | 256 | 4.2% | 3% | 3 |
| Liverpool | 0% | 0 | 0% | 3% | 3 |
| LSE | 6% | 55 | 1.2% | 1% | 6 |
| Manchester | 11% | 38 | 1.1% | 10% | 10 |
| Newcastle | 6% | 22 | 0.8% | 0% | 0 |
| UCL | 7% | 12 | 0.3% | 1% | 2 |
Note 1. Durham's sample included some 20th century items and other forms of publication, which may account for the higher number of grey pages.
Table 2.1.1:C Binding condition (based on answers to questions 3-6)
| Collection | Pamphlets bound in volumes | % of pamphlets in bound vols. which cannot be opened to 180° | % of pamphlets in bound vols. with loose boards | Separate pamphlets | % of separate pamphlets with margin stitching |
|---|---|---|---|---|---|
| Bristol | 8% | 63% | 12% | 92% | 18% |
| Durham | 0% | NA | NA | 100% | 13% |
| Liverpool | 100% | 2% | 47% | 0% | NA |
| LSE | 100% | 0% | 5% | 0% | NA |
| Manchester | 71% | 28% | 7% | 29% | 45% |
| Newcastle | 100% | 3% | 15% | 0% | NA |
| UCL | 100% | 0% | 0% | 0% | NA |
Table 2.1.1:D Pamphlets with foldings, annotations and adverts (based on answers to questions 12-15)
| Collection | Pamphlets containing fold-outs | Pamphlets with pages folded to fit volumes | Pamphlets containing annotations | Pamphlets containing adverts |
|---|---|---|---|---|
| Bristol | 5% | 0% | 4% | 12% |
| Durham | 5% | 0% | 7% | 6% |
| Liverpool | 5% | 0% | 0% | 3% |
| LSE | 3% | 2% | 4% | 5% |
| Manchester | 10% | 0% | 10% | 7% |
| Newcastle | 1% | 1% | 9% | 24% |
| UCL | 9% | 5% | 43% | 9% |
Table 2.1.1:E Pamphlets with unusual typefaces, multiple copies and foreign languages (based on answers to questions 16-17 and an additional question [note 1])
| Collection | Unusual typeface for body text | Multiple copies | Non-English language pamphlets [note 1] | Non-English languages represented in collection (no. of pamphlets) [note 1] |
|---|---|---|---|---|
| Bristol | 2% | 0% | 1% | French (1) |
| Durham | 1% | 6% | Not checked [note 1] | Not checked [note 1] |
| Liverpool | 0% | 2% | 7% | French (7) |
| LSE | 3% | 0% | 23% | German (10); French (9); Dutch (1); Italian (1); Russian (1); Swedish (1) |
| Manchester | 0% | 5% | 23% | French (14); German (4); Italian (4); Arabic (1) |
| Newcastle | 4% | 3% | Not checked [note 1] | Not checked [note 1] |
| UCL | 0% | 3% | 1% | French (1) |
Note 1. The foreign language check was introduced after the survey had begun when it appeared this would be significant for some collections. Consequently, not all libraries recorded this information.
Table 2.1.1:F Size, loose or missing pages and overall suitability for scanning (based on questions 1, 7-8, 18)
| Collection | Closest to A5 in size | Closest to A4 | Closest to A3 | Pamphlets with loose pages | Pamphlets with missing pages | Would not send for scanning in current state |
|---|---|---|---|---|---|---|
| Bristol | 100% | 0% | 0% | 19% | 0% | 1% |
| Durham [note 1] | 54% | 44% | 2% | 0% | 0% | 0% |
| Liverpool | 98% | 1% | 1% | 1% | 0% | 2% |
| LSE | 91% | 8% | 1% | 0% | 0% | 7% |
| Manchester | 90% | 10% | 0% | 15% | 0% | 1% |
| Newcastle | 99% | 1% | 0% | 3% | 0% | 0% |
| UCL | 100% | 0% | 0% | 1% | 0% | 0% |
Note 1. Durham's sample included some 20th century items and other forms of publication, which may account for the larger sizes.
Discussion
Some care needs to be taken in drawing conclusions based on this survey, since only 100 (or fewer) items were assessed from each collection. For the smaller collections (Durham and Liverpool) this represents 6 or 7% of the collection; but for the larger collections (LSE and Bristol) it is less than 1%. Nonetheless, it provides better information than the previous estimates and enables some profiling to be done. Should the project go ahead it would enable some of these statistics to be checked and would provide an indication of how accurate and useful this approach to characterising collections is.
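To illustrate why this caution is needed, the sketch below computes the share of a collection covered by the sample and an approximate 95% margin of error for a surveyed percentage. The normal-approximation formula is a standard textbook choice and an assumption on our part, not a method used in the survey; the collection sizes are the estimates quoted elsewhere in this report.

```python
import math


def sampling_fraction(items_checked, collection_size):
    """Percentage of the collection that was actually examined."""
    return 100 * items_checked / collection_size


def margin_of_error(percent, items_checked, z=1.96):
    """Approximate 95% margin of error, in percentage points.

    Uses the normal approximation for a proportion and ignores the
    finite-population correction, so it is only a rough guide.
    """
    p = percent / 100
    return 100 * z * math.sqrt(p * (1 - p) / items_checked)


# Durham: 94 items checked out of an estimated 1,450 pamphlets.
print(round(sampling_fraction(94, 1450), 1))
# Bristol: 100 items checked out of an estimated 22,150 pamphlets on Copac.
print(round(sampling_fraction(100, 22150), 2))
# UCL: 43% of the 99 sampled pamphlets contained annotations.
print(round(margin_of_error(43, 99), 1))
```

On these assumptions, a surveyed figure near 43% carries an uncertainty of roughly ten percentage points either way, which is why the percentages in the tables above are best read as indicative rather than exact.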
Collection profiles provides a brief profile of each collection and highlights characteristics identified by the condition survey that are likely to impact upon scan time. These and other factors noted in that section have influenced the scheduling of the collections (both their order and allocation of time).
One of the most important findings from the survey was that the page averages for these collections were much higher than the 25-35 previously estimated. Because the project had set the number of pages to capture at 1 million (the BOPCRIS capacity for this project), this meant reducing the overall number of pamphlets we would expect to capture. The revised bid recalculated each collection's page numbers based on the averages found in this survey (see Table 2.1.1:A above for averages). When combined with a reduction for anticipated duplication (see next section), this reduced the overall number of pamphlets from nearly 30,000 to just over 23,000. Table 2.1.2:B, below, presents new estimates for each collection. Note that because selections are to be made from the LSE and Bristol collections, the numbers of pamphlets from these two collections were adjusted until 1 million pages was achieved.
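The effect of a higher page average on a fixed page allocation can be sketched as follows, using the Bristol and LSE figures from Tables 2.1.1:A and 2.1.2:B. The function names are our own, and rounding to whole pages and whole pamphlets is an assumption.

```python
def estimated_pages(pamphlets, average_pages):
    """Estimated page count for a number of pamphlets at a given average."""
    return round(pamphlets * average_pages)


def pamphlets_within(page_allocation, average_pages):
    """Whole pamphlets that fit within a page allocation at a given average."""
    return int(page_allocation // average_pages)


# Revised bid: 6,250 pamphlets each, at the surveyed averages.
bristol_pages = estimated_pages(6250, 45.5)
lse_pages = estimated_pages(6250, 45.6)

# The initial Bristol allocation of 297,500 pages assumed 35 pages per pamphlet.
initial_bristol = pamphlets_within(297500, 35)
revised_bristol = pamphlets_within(bristol_pages, 45.5)
print(bristol_pages, lse_pages, initial_bristol, revised_bristol)
```

A similar page allocation therefore supports far fewer pamphlets once the surveyed averages replace the original assumptions.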
Survey of duplication across collections
Published pamphlets are not unique items and with seven contributing libraries some level of duplication is to be expected. It is important for the project to gauge the extent and pattern of that duplication in order to determine: (a) the best place within the workflow to de-duplicate, (b) the best order in which to scan the collections, and (c) the likely reduction in number of scanned pamphlets due to de-duplication. Estimates of duplication from those familiar with this material ranged from 20%-60% (across Copac as a whole), so a more accurate measure was required.
Methodology
It was hoped that duplication could be gauged by an automated means using the CURL or Copac databases. This did not prove possible due to the existence of multiple records and the lack of time and resources to develop suitable tools for comparison. A part of the work of MIMAS in Work Packages 3 and 8 of the project (see 4.3 below) will be to develop such tools to enable all libraries holding a pamphlet that has been digitised to be provided with links. In the absence of an automated means, the same 100 item sample used for the condition survey (Collection surveys above) was checked by libraries against Copac for duplicates: (a) across the six other libraries contributing to Phase 1 and (b) against all holdings on Copac. This check was included as questions 19 and 20 on the survey form (see Appendix A).
The duplication survey took place alongside the format and condition survey, during 4-11 August. In some libraries the same staff members completed both surveys; in others, special collections staff undertook the format and condition survey (questions 1-18) and cataloguing staff undertook the duplication survey (questions 19-20). Checking 100 records on Copac took up to 6 hours for some libraries. This time was provided as a partner contribution to the scoping study.
Results
The table below presents findings from the duplication survey. General discussion is provided in this section, with comments relating to individual collections discussed in Collection profiles below.
Table 2.1.2:A Duplication survey results (based on questions 19 and 20 of survey form)
| Collection | Unique on Copac | Duplicated within any partner library | Duplicated within individual partner libraries | Duplicated within any non-partner libraries |
|---|---|---|---|---|
| Bristol | 23% | 32% | LSE (19%); Manchester (10%); Liverpool (6%); Newcastle (3%); Durham (1%); UCL (0%) [note 1] | 70% |
| Durham | 41% | 40% | LSE (20%); Bristol (19%); Manchester (18%); Liverpool (4%); Newcastle (4%); UCL (1%) [note 1] | 53% |
| Liverpool | 24% | 45% | Bristol (28%); LSE (24%); Manchester (16%); Durham (4%); UCL (2%) [note 1]; Newcastle (0%) | 61% |
| LSE | 37% | 40% | UCL (22%) [note 1]; Bristol (13%); Manchester (9%); Durham (5%); Liverpool (3%); Newcastle (3%) | 50% |
| Manchester | 44% | 21% | Bristol (10%); LSE (8%); Liverpool (5%); Newcastle (2%); Durham (1%); UCL (1%) [note 1] | 55% |
| Newcastle | 25% | 44% | Bristol (28%); LSE (18%); Liverpool (8%); Manchester (8%); Durham (2%); UCL (0%) [note 1] | 61% |
| UCL [note 1] | 32% | 28% | LSE (12%); Manchester (10%); Bristol (4%); Durham (3%); Liverpool (3%); Newcastle (1%) | 57% |
Note 1. There has been a delay in loading some of UCL's records into the CURL/Copac databases, so the duplication with its collection is not fully represented here. UCL's records will be loaded before the project commences.
Discussion
As previously mentioned, care needs to be taken in drawing conclusions based on this survey, since only 100 (or fewer) items were assessed from each collection. Nonetheless, it provides better information than previous guesses and enables some estimation to be done. Should the project go ahead it would enable the accuracy and usefulness of this approach to be evaluated.
Between 56 and 77 percent of these collections were found to be duplicated within at least one of the other 23 libraries represented on Copac. However, this still leaves a significant amount of material that is unique to each of these collections (23-44%).
Between 21 and 45 percent of the pamphlets in these collections were found to be duplicated within at least one of the other 6 libraries taking part in Phase 1 of the 19th Century Online Pamphlets project. However, it is important to note here that the collections being digitised are a subset of each library's 19th century pamphlet holdings. For example, Manchester's Foreign and Commonwealth collection represents only about 21% of its total estimated holdings of 19th century pamphlets on Copac (17,000). So the 10% of UCL's Hume Collection found duplicated with Manchester's holdings is unlikely to equate to a 10% duplication for this particular project.
There is a fairly high level of duplication with the LSE and Bristol, which have large 19th century pamphlet collections (an estimated 15,989 and 22,150, respectively, on Copac). However, the project intends to make a selection from these two collections rather than capture them in their entirety, so the overlap with these collections can be compensated for in the selection process.
It is important to note, however, that the records for UCL's 19th century pamphlets are not yet fully loaded onto the CURL and Copac databases. It is likely that there is higher duplication with this collection than is apparent in the survey. UCL expect to have their full records on Copac before the project start date (January 2008).
For the purposes of putting together the revised bid we have assumed a duplication of half of the total amount found 'duplicated within any partner collection' for the Durham, Liverpool, Manchester, Newcastle and UCL collections and reduced the number of pamphlets we expect to capture accordingly. It may be that this reduction is larger than is necessary and we would seek to refine these estimates as the project proceeds. As selections are being made from Bristol and LSE, their numbers were not reduced for duplication, but were adjusted upwards to compensate for the reduction across the other collections, in order to make the collection up to 1 million pages.
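The reduction rule described above can be sketched as follows. The function name and the rounding to whole pamphlets are our assumptions; the sketch is checked only against the Durham and Liverpool rows of Tables 2.1.2:A and 2.1.2:B.

```python
def reduced_pamphlets(initial_pamphlets, partner_duplication_pct):
    """Reduce a pamphlet count by half of the partner duplication percentage."""
    reduction_pct = partner_duplication_pct / 2
    return round(initial_pamphlets * (100 - reduction_pct) / 100)


# Durham: 1,450 pamphlets, 40% duplicated within any partner library.
durham = reduced_pamphlets(1450, 40)
# Liverpool: 1,560 pamphlets, 45% duplicated within any partner library.
liverpool = reduced_pamphlets(1560, 45)
print(durham, liverpool)
```

Halving the surveyed figure reflects the expectation that only part of the duplication found against partners' full holdings will fall within the collections actually selected for this project.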
The table below shows the combined effects of the higher page averages and the reductions made to account for duplication. Although it is hoped that this is closer than the previous estimates, it is likely that further adjustments will need to be made as the project proceeds and the numbers of pamphlets may rise or fall.
Table 2.1.2:B Effects of higher page averages and reduction for duplication
| Partner library | Collection | Estimated pamphlets (initial bid) | Estimated pamphlets (revised bid) | Estimated pages (initial bid) | Estimated pages (revised bid) |
|---|---|---|---|---|---|
| UCL | Hume Tracts | 4,200 | 3,528 | 147,000 | 148,881 |
| Durham University | Earl Grey Pamphlets | 1,450 | 1,160 | 36,250 | 75,478 |
| University of Liverpool | Knowsley Pamphlet Collection | 1,560 | 1,209 | 39,000 | 51,745 |
| University of Newcastle | Cowen Tracts | 1,974 | 1,579 | 49,350 | 45,796 |
| University of Manchester | Foreign & Commonwealth Pamphlets | 3,662 | 3,149 | 143,000 | 109,281 |
| University of Bristol | National Liberal Club Pamphlets (selection) | 8,500 | 6,250 | 297,500 | 284,375 |
| LSE | Pamphlet Collection (selection) | 8,500 | 6,250 | 297,500 | 285,000 |
| Totals | | 29,846 | 23,125 | 1,009,600 | 1,000,556 |
This section provides profiles for each of the seven collections, including overviews of the collection content and a list of significant points that emerged from the surveys, or from visits and discussions with collection contacts. All collections were visited during the scoping study.
Preferences for transportation, insurance, storage, scheduling of collections, and other issues, were discussed with collection managers during the visits or by correspondence. In addition to specific issues included in the profiles below, several points can be made here that apply to all or most of the collections:
- Collections were presented with the option of doing their own scanning for this project, but all are happy to use BOPCRIS as a consortia scanning service
- Everyone was happy with the suggested transport arrangements (HarrowGreen, Momart or equivalent) and most preferred the transport company to pack the pamphlets
- Everyone was happy with the level of insurance cover provided by the transport firms we asked to quote (minimum of £100,000 per consignment) and by BOPCRIS whilst on premises (minimum of £100,000)
- Everyone was happy with the storage conditions being offered by BOPCRIS, which is temperature controlled, has low ultraviolet levels and is only accessible by authorised personnel
- No library would require original items to be returned from BOPCRIS if users requested them while the collection was away: each would notify users of the collection's absence and be happy to make do with photocopies or digital images (if already scanned) from BOPCRIS
- Collections will have the option to receive digital datasets relating to their own items (images, metadata and OCR), but few intend to take up this option within the life of the project (Durham, Manchester and possibly UCL)
University of Bristol: National Liberal Club Pamphlets (selection from this collection)
These pamphlets are from the libraries of, amongst others, Charles Bradlaugh, John Noble, the Liberation Society, the Land Nationalisation Society and the Cobden Club. There are also many individual items given by W.E. Gladstone and other prominent politicians. The collection is especially strong on 19th century commerce, economics, finance, politics, religion and sociology. It includes publications by and about not only the Liberal Party, but also the Conservative and Labour Parties.
Key points emerging from surveys, visits and correspondence
- Although this is a liberal club collection, its inclusion of other party pamphlets will provide a good political representation
- Collection is largely held off-site (83%) which is likely to slow the selection
- Collection contains a lot of separate pamphlets (92%) which is likely to slow scanning
- There is a high proportion of in-margin stitching (18% of the separates) which will slow scanning and may affect the quality of imaging (due to page bowing)
- The high proportion of separates means the collection may be useful in replacing poor quality copies bound into other collections (we note the high duplication of other collections with Bristol)
- The proportion of bound volumes is small (8%), but there may be difficulties capturing these because of the tightness of binding (63% of bound volumes cannot be opened flat)
- Bristol have no particular timing issues, but the collection will need to be broken into consignments because of the volume and the added time required for selection
- The bulk of the Bristol material should be staged towards the end of the project to fill gaps and replace poor duplicates
Durham University: Earl Grey Pamphlets (all 19th Century pamphlets from collection)
This is a family collection accumulated largely by the 2nd, 3rd and 4th Earls Grey. Charles was Foreign Secretary in 1806-1807 and Prime Minister 1830-1834. Henry George was Under-Secretary for Home Affairs in 1830, Under-Secretary for the Colonies in 1830-1834, Secretary at War in 1835-1839 and Secretary of State for the Colonies in 1846-1852. Albert Henry George was Administrator of Rhodesia in 1896-1897 and Governor-General of Canada in 1904-1911. The Greys were strongly interested in parliamentary reform, colonial affairs and Catholic emancipation.
Key points emerging from surveys, visits and correspondence
- The Earl Grey Pamphlets collection is not owned by Durham but on loan from the family. Lord Howick, the current owner, has given permission for the collection to be included within the project
- The condition survey found surprisingly high page averages (65), greyscale counts (16% of pamphlets, 4.2% of pages), and larger sized items (44% were nearer to A4 than A5). Unfortunately the survey was skewed by the presence of some 20th century material and some books, journal issues, and official publications within the sample (these form part of the Earl Grey collection). With this material excluded, as this project would do, the number of items will drop to about 1,250 and these statistics would be likely to change
- None of Durham's pamphlets are bound, but are all held separately in archive boxes, requiring special storage and handling and slowing the scanning
- There is a reasonable proportion of in-margin stitching (13% of items) which will slow scanning and may affect the quality of imaging (due to page bowing)
- Although not asked for in the survey criteria, the person completing the form for Durham noted that 7 items (out of 94) contained uncut pages. Southampton will need to cut these pages for scanning or seek duplicates (Durham have approved cutting)
University of Liverpool: Knowsley Pamphlet Collection (all 19th Century pamphlets from collection)
This is a family collection, reflecting the careers of the Earls of Derby. Edward George was successively Irish Secretary (1830-33), Colonial Secretary (1833-34 and 1841-44) and three times Prime Minister (1852, 1858-59 and 1866-68). His career was summarised by Disraeli as follows: "He abolished slavery, he educated Ireland, he reformed parliament". His son, Edward Henry, 15th Earl of Derby (1826-1893) was Colonial Secretary and later Indian secretary in his father's administration of 1858-59.
Key points emerging from surveys, visits and correspondence
- The Knowsley collection is 100% bound in volumes which are uniform (consistent binding, pages trimmed to size) and open easily, factors which could speed the scanning process
- However, a large proportion of these pamphlets (47%) are in volumes with loose boards, which will require special care and handling and the use of a specialist scanner
- Although there is high duplication, this is largely with Bristol and LSE and so can be avoided through the selection and de-duplication processes
- This collection would need to be scheduled in 2007, because the collection is due to be wrapped for building work in 2008
- Liverpool would prefer that the collection goes after Easter, since the first part of the year is the busiest for this material, but will be flexible
LSE: Pamphlet Collection (selection from this collection)
This collection includes a comprehensive set of material from political parties, including election manifestos and political cartoons. Issues in British political history include the corn laws, land question, Church and state and home rule for Ireland. There is a wealth of material on the co-operative movement, including the Cooperative Women's Guild, and from long-standing pressure groups such as the Fabian Society and organisations which have long disappeared such as the Cobden Club, the Imperial Federation Defence Committee, the Poor Law Reform Association, the Workhouse Visiting Society and the Liberty and Property Defence League.
Key points emerging from surveys, visits and correspondence
- The LSE pamphlet collection is 100% bound, which is good for scanning but not so good for selection - we will try to select by the volume
- The condition survey suggested that a high proportion of pamphlets might not be available for scanning due to their condition (7%), but this can be avoided through the selection process
- There is a high proportion of foreign language pamphlets (23%), but this is also likely to be avoided through the selection process (which will focus on UK debates)
- Work is planned for the store where the pamphlets are housed in either 2007 or early 2008 - this will fit well with the project because LSE needs to be scheduled late in the project in order to pick up duplicates and fill gaps around other collections
- As this is a larger collection, it will need to be broken into consignments
University of Manchester: Foreign & Commonwealth Office Pamphlet Collection (all 19th Century pamphlets from collection)
This collection is on deposit from The Foreign and Commonwealth Office (FCO). It comprises two earlier collections. (1) The Foreign Office pamphlet collection, consisting largely of pamphlets acquired by British ambassadors overseas and sent back to London as being of value for the formulation of policy. This collection is rich in material from South America (where the British government was the formal arbitrator in boundary disputes), the Near East (both the last century of the Turkish Empire and the growth of Zionism) and the various great European "Questions", from the Congress of Vienna through to the aftermath of the creation of the German Empire. (2) The Colonial Office pamphlet collection, consisting chiefly of local imprints including, e.g., unique early Australiana.
Key points emerging from surveys, visits and correspondence
- The Foreign and Commonwealth Office pamphlet collection is not owned by the University of Manchester but is on loan from the FCO. Permission has been obtained for the collection to be included within the project
- The larger Foreign Office component in this collection is bound (71%) and often in tight binding (28%), which will require special scanning
- The smaller Colonial Office component is unbound, but was held in folders with pillars, which have occasionally punched through the text - this might necessitate replacement with duplicates, if available
- A very large proportion of the separate pamphlets also have in-margin stitching (45%), which will slow scanning and may affect the quality of imaging (due to page bowing)
- During the visit, many older font styles were found (e.g. long s's, diphthongs, and ligatures) - these will require the use of specialist OCR software
- There is a high proportion of foreign language material (23%), including non-Latin scripts (1%). Specialist OCR software will be required for the foreign languages, but it cannot recognise the pamphlets in non-Latin scripts
- This collection contains a higher proportion of foldouts and (consequently) colour pages than other collections - these will require specialist scanning
- There is a high proportion of annotations (10%), which will require greyscale scanning
- There is a higher proportion of unique items than others, so there is less reduction for duplication, but also less chance of finding replacements for poor copies
University of Newcastle: Cowen Tracts (all 19th Century pamphlets from collection)
This is a personal collection of Newcastle-born Joseph Cowen (1829-1900), Member of Parliament (MP) and social reformer. On his father's death in 1873, he was elected in his place as MP for Newcastle and, though he came into conflict with both the Parliamentary and local Liberal parties, he remained MP for Newcastle until he retired in 1886. The Cowen Tracts date, in the main, from Cowen's active years of the late 1840s to early 1880s, though there is some earlier and later material. The topics covered largely reflect his main interests of social, educational and economic issues.
Key points emerging from surveys, visits and correspondence
- The Cowen collection is 100% bound (in three types of binding) and will generally open flat (3% will not)
- There are some potential issues with loose boards (15%), which will require special care and handling and a specialist scanner
- There is a high proportion of annotations (9%), which will require greyscale scanning
- There are some unusual typefaces (4%) which will require use of special OCR software
- Newcastle is happy for the collection to be done in one consignment and there are no particular timing issues
UCL: Hume Tracts (all 19th Century pamphlets from collection)
This is the personal collection of Joseph Hume (1777-1855), Radical Member of Parliament. Its broad subject-matter reflects the major political, economic and social developments and reforms taking place in Britain in the early part of the nineteenth century, and includes some of the causes championed by Hume during his parliamentary career, such as universal suffrage, Catholic emancipation, a reduction in the power of the Anglican church and an end to imprisonment for debt.
Key points emerging from surveys, visits and correspondence
- The Hume collection is very well bound (100%), with no loose boards or issues with opening flat
- A very high percentage of the pamphlets are annotated (43%), which will require grey or colour scanning, take longer to scan, and may pose some OCR issues
- There are some personal items (e.g. letters) bound between pamphlets - these will be excluded from the project
- The collection contains a high proportion of fold-outs (9%), which will require specialist scanning
- There is a low level of duplication with other partner libraries suggesting this is a fairly unique collection
- The entire collection is pre-1855, so there are no copyright issues for this collection.
The key points identified for these collections have influenced the proposed scheduling.
During the scoping study we focused on four issues of particular importance in ensuring a good selection and flow of pamphlets to BOPCRIS for scanning: (1) selection of material; (2) addressing any copyright issues; (3) addressing the duplication across the collections; and (4) the scheduling of the collections for scanning. This section presents strategies for dealing with each of these issues. These would be firmed up in the project plan and further amended as necessary once the project was underway.
Selection strategy
As stated in the outline, an aim of this project is to provide users with a wide selection of pamphlets that focus on the political, social and economic issues of the 19th century. Instead of just digitising 23,000 pamphlets relating to this theme (which could have been achieved through a combination of two of the larger collections), this project has chosen to focus on capturing as much as is possible of a number of smaller collections associated with individuals or families (Durham, Liverpool, Newcastle and UCL) or organisations (Manchester), and to supplement these with pamphlets drawn from larger collections (Bristol and LSE). There are several advantages of capturing whole collections:
- In addition to understanding the motivation of pamphlet producers, users will be able to appreciate the motivations of their collectors. As seen in the format and condition survey above, some of these collections include significant levels of annotation by their collectors.
- Although these collections have a political, economic or social focus, they include material relating to the wider interests of their collectors, providing a good overview of the whole pamphlet literature.
- The collections can be linked with the collection-level descriptions in the online Guide to 19th Century Pamphlets, and can be found in collection-based searches.
- By including a larger number of library collections within this project, the workload is spread, the benefits are shared more widely (e.g. digital copies for libraries), and more can be learned to support the extension of the project to other collections in later phases.
De-selection criteria
For five collections, then, the appropriate strategy is not a selection strategy, but a de-selection strategy. Although the goal is to capture as much as is possible of these collections, the project anticipates de-selecting some pamphlets from digital capture, for these reasons:
- There are copyright concerns (see Copyright strategy)
- The pamphlet was published outside the bounds of the 19th century (either earlier or later). We note that extending into the 20th century would greatly increase copyright issues.
- The pamphlet has already been captured or is better captured from another collection (see De-duplication strategy)
- The pamphlet is too fragile to send for scanning.
Note that the project will not be de-selecting on the basis of difficulties in scanning or OCR. BOPCRIS has specialist scanners that can capture any material a library is willing to send.
Selection criteria
For the remaining two collections (Bristol and LSE), a positive selection of pamphlets will be made and will be based on the following criteria:
- Their relevance to themes of the great 19th century debates (e.g. universal suffrage, relationship of church and state, colonial policy), as identified by the collection curators, and the academics and teachers involved in the management and steering groups.
- Their usefulness in addressing gaps in the digital collection (e.g. themes not well covered, formats not represented, particular authors who should have a voice). Again, these gaps will be identified by curators, researchers and teachers.
- Feedback and demand from collection users: the bulk of Bristol and LSE material will be selected later in the project, by which stage there will be material available online and the possibility of tracking usage and surveying users.
- Replacements for copies held in the smaller collections that are in too poor or fragile a state to scan.
Copyright strategy
Copyright is the primary IPR issue facing a 19th century collection of published pamphlets. Although these collections are owned by their libraries (or have been included in the project with owners' permission), this ownership or permission does not automatically give libraries a right to copy them. For most of the pamphlets copyright will have expired and there will be no barrier to copying. But it is anticipated that a small proportion of the late 19th century material within these collections will still be within copyright and require either permission or exclusion. Project partners have some experience in these issues, but the project will seek advice during Work Packages 1 and 2.
We expect to identify three classes of material:
Class 1. Items clearly out of copyright (based on their age or anonymity)
Class 2. Items potentially within copyright due to age but of unknown status (e.g. because the death dates of the author or identity of their inheritors are not easily discoverable)
Class 3. Items known to be in copyright (based on age or the known dates of the author)
The following approach may be adopted:
- Libraries can send any Class 1 materials without concern
- Libraries must decide whether to send any Class 2 materials and bear full responsibility for this decision: the project will expect them to make efforts to determine the status of these items and require them to provide a suitable indemnification (this will form part of a Memorandum of Understanding)
- Libraries must not send any Class 3 materials for scanning without obtaining appropriate copyright clearance. The project will provide some assistance to libraries, including a suitable licence.
The following diagram shows a possible workflow for determining a pamphlet's copyright status.
Figure 1 Suggested Copyright Workflow
*This is a conservative cut-off - the project will seek legal advice on this.
Note that this chart assumes a conservative cut-off publication date of 1857 (150 years) where the author's death date is uncertain. We are aware of other projects that use 120 years (i.e. 1887) as their cut-off and will seek advice on this during the initial stages of project planning. It is anticipated that the vast majority of pamphlets in this collection will belong to Class 1. Many of the pamphlet records checked on Copac include dates for their authors, which will assist in determining the pamphlet's copyright status.
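The cut-off arithmetic and the three classes can be illustrated with a short Python sketch. It is not a reconstruction of Figure 1. The reference year of 2007 is inferred from the figures quoted above (1857 being 150 years earlier), the 70-year term after the author's death is an assumption that the project's legal advice would need to confirm, and the function names are hypothetical. Anonymity, which can also place an item in Class 1, is not modelled.

```python
from typing import Optional

REFERENCE_YEAR = 2007  # assumption: inferred from "1857 (150 years)"
COPYRIGHT_TERM = 70    # assumption: years after the author's death; not stated in the text


def cutoff_year(years_back: int, reference_year: int = REFERENCE_YEAR) -> int:
    """Latest publication year treated as safely out of copyright."""
    return reference_year - years_back


def copyright_class(publication_year: int,
                    author_death_year: Optional[int] = None,
                    cutoff: int = cutoff_year(150)) -> int:
    """Return 1, 2 or 3, following the three classes described above."""
    if author_death_year is not None:
        # Known death date: compare it directly against the assumed term.
        expired = author_death_year + COPYRIGHT_TERM < REFERENCE_YEAR
        return 1 if expired else 3
    # Unknown death date: fall back on the conservative publication cut-off.
    # Treating the cut-off year itself as Class 1 is an assumption.
    return 1 if publication_year <= cutoff else 2
```

Under these assumptions, a pamphlet of 1890 with no known author dates falls into Class 2, while moving to a 120-year cut-off (`cutoff_year(120)`) would shift undated items published up to 1887 into Class 1.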
If, despite this strategy, issues arise with any pamphlet, the project would seek to remove it from the digital collection.
De-duplication strategy
As the duplication survey above has indicated, there is likely to be some duplication across these collections, although it is difficult to gauge the exact extent.
Because the project is selecting from the two larger collections (Bristol and LSE) there is an opportunity to de-duplicate these at the selection stage - provided there is some means of catching duplication when the selection is being made.
There are several approaches that might be taken to de-duplication for the smaller collections:
- Do not de-duplicate: This may make sense given the focus on individual collections and the level of annotations of some pamphlets. However, this must also be balanced against the project's aim to create as large and accessible a selection of pamphlets as possible. Where there are a handful of personally annotated copies in different collections, the occasional duplicate may be permitted. But as a general policy, the approach of ignoring duplication would lead to wastage and reduce the number of unique resources that could be captured and delivered.
- De-duplicate at library, before sending for scanning: Advantages of de-duplicating at the library are that pamphlets are not handled, transported, and removed from circulation unnecessarily. Disadvantages in de-duplicating at the library are that a comparison between different copies will not be possible, and the library going first in the queue will have more of their collection scanned than those following. There needs to be some mechanism for the LSE and Bristol to select around the other collections, so a database tool developed for this purpose could also be used to manage de-duplication between the smaller collections.
- De-duplicate at BOPCRIS, before scanning: This approach has the advantage of enabling a comparison between physical copies. However, this would require that a pamphlet is kept until its duplicate arrives, which will take up space, delay return (and possibly require additional transportation). It would also require some means of identifying that the pamphlet has a duplicate in another collection.
- De-duplicate at BOPCRIS, post-scanning: This approach was suggested in the initial bid, before the extent of duplication was explored. Although this does not require the originals to be held at BOPCRIS, it does assume that the appropriate datasets can be easily retrieved and compared. This may not be the case if BOPCRIS is sending the data to JSTOR and then deleting it from its own storage system (BOPCRIS does not have the resources to hold the entire digital collection at once). This approach also leads to great wastage, since one of the duplicate pair has been handled, transported and digitised unnecessarily.
There is no perfect solution to de-duplication. However, as a result of this scoping study the recommended strategy is to de-duplicate at the library, prior to sending the item to Southampton (Option 2). Library partners have agreed to this approach. It will require some sort of system (e.g. database) to identify duplicates among the smaller collections and enable the LSE and Bristol to make their selections with the other collections in view.
System to manage de-duplication
The project proposes that a database be developed at the beginning of the project (see WP3) and maintained by a Project Officer located at BOPCRIS. We would expect to populate this database with simple bibliographic records exported from library OPACs or CURL/Copac at the beginning of the project. The database records would include additional fields to, for example, log the status and condition of the pamphlets.
As a library prepares its collection (or selection) it searches the database and identifies any duplicate records in other collections. It then checks the status of these items.
A pamphlet record can have one of several statuses:
1. Not checked - the default; for items that have not yet been checked or selected from a collection
2. Missing - pamphlet could not be found
3. Not suitable - pamphlet is too fragile to send
4. Sent - pamphlet is waiting to be sent or in transit
5. Received - pamphlet is at BOPCRIS and awaiting scanning
6. Scanned - pamphlet has been scanned and is awaiting signoff and return
7. Returned - pamphlet has been returned
8. Duplicate - not sent
If the librarian finds a duplicate with a status of 1-3, then the item should be sent (unless that library's own copy is also missing, not suitable or in poor condition). If another copy has a status of 5-8, then the item should be returned to its place on the shelf (or ignored within the volume) and the appropriate status recorded in the database.
Some libraries may still wish to check their duplicate for any significant annotations and make a case with the project team for a duplication.
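The rule above can be sketched in Python, numbering the statuses 1 to 8 in the order in which they are listed. This is an illustration only: the names `Status` and `decide` are hypothetical, and because the text does not say what should happen when another copy has status 4 (Sent), the sketch refers that case to the project team rather than guessing.

```python
from enum import IntEnum


class Status(IntEnum):
    # Numbered in the order the statuses are listed above.
    NOT_CHECKED = 1
    MISSING = 2
    NOT_SUITABLE = 3
    SENT = 4
    RECEIVED = 5
    SCANNED = 6
    RETURNED = 7
    DUPLICATE = 8


def decide(other_copy_statuses, own_copy_ok=True):
    """Return 'send', 'do not send' or 'refer to project team'."""
    if not own_copy_ok:
        # The library's own copy is missing, not suitable or in poor condition.
        return "do not send"
    if any(s >= Status.RECEIVED for s in other_copy_statuses):
        # Another copy has a status of 5-8: reshelve and record the duplicate.
        return "do not send"
    if any(s == Status.SENT for s in other_copy_statuses):
        # Status 4 is not covered by the rule above; this handling is an assumption.
        return "refer to project team"
    # Every other copy has a status of 1-3, or no other copy exists.
    return "send"
```

For example, `decide([Status.MISSING, Status.NOT_CHECKED])` returns `"send"`, while `decide([Status.NOT_CHECKED, Status.SCANNED])` returns `"do not send"`.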
The following diagram shows how de-duplication might be achieved within the library workflow using the proposed database, based on the suggestions above. Note that it makes reference to a record slip. It is recommended that such a slip accompany all pamphlets to be scanned and include the library name, the catalogue ID number and other information. This will be particularly useful in marking which parts of a bound volume are to be scanned. The project would supply a record slip template and recommend that it be printed on acid-free or pH-neutral paper.
Figure 2 Suggested De-duplication Workflow
Strategy for scheduling collections
The Gantt chart below sets out a possible scheduling of collections for digital production at BOPCRIS (see WP5). The time allocations take into account the size of the collections (i.e. number of pages), with some adjustment for their condition, based on the profiles in Collection profiles. The ordering of the collections is also influenced by the key characteristics identified in Collection profiles. We have tried to avoid overlaps in the preparation of material, and to avoid having more than three libraries' pamphlets at BOPCRIS at any one time.
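The limit of three libraries on site at once can be checked mechanically against any draft schedule. The Python sketch below is a hypothetical helper, not part of the project plan: it assumes each consignment is recorded as a library name with an arrival week and a return week, and the example plan uses invented placeholder libraries and weeks rather than the project's actual schedule.

```python
def max_libraries_on_site(consignments):
    """Largest number of distinct libraries with material on site in any week.

    Each consignment is (library, arrival_week, return_week). Material is
    treated as on site from arrival_week up to, but not including,
    return_week.
    """
    if not consignments:
        return 0
    first = min(arrive for _, arrive, _ in consignments)
    last = max(leave for _, _, leave in consignments)
    return max(
        (len({lib for lib, arrive, leave in consignments
              if arrive <= week < leave})
         for week in range(first, last)),
        default=0,
    )


# Placeholder plan: library names and week numbers are invented for illustration.
draft_plan = [
    ("Library A", 0, 10),
    ("Library B", 8, 16),
    ("Library C", 9, 20),
    ("Library D", 15, 25),
]
within_limit = max_libraries_on_site(draft_plan) <= 3
```

Two consignments from the same library count once, since the constraint is expressed in terms of libraries rather than van loads.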
We have indicated single consignments for the smaller collections and two consignments each for the larger collections, although some of the smaller collections may also be broken in half or sent as two van loads. The collections being captured in their entirety are scheduled earlier, and those from which selections are made (LSE and Bristol) generally occur towards the end of the schedule. This will allow the selections to be better informed and will minimise duplication. It will also enable the project to adjust the numbers from LSE and Bristol to ensure that the 1 million page target is met.
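The adjustment of LSE and Bristol numbers against the page target is simple arithmetic, sketched below in Python. The function name and the input figures in the example are hypothetical; only the 1 million page target and the 35-page average used in the initial estimate come from this document.

```python
PAGE_TARGET = 1_000_000  # BOPCRIS capacity for this project


def remaining_pamphlets(pages_committed, avg_pages_per_pamphlet,
                        target=PAGE_TARGET):
    """Pamphlets that can still be selected without exceeding the page target."""
    remaining_pages = max(target - pages_committed, 0)
    return remaining_pages // avg_pages_per_pamphlet


# Hypothetical figure: 650,000 pages already taken up by the whole collections.
selectable = remaining_pamphlets(650_000, 35)
```

With these illustrative inputs, 350,000 pages remain, which corresponds to 10,000 pamphlets at 35 pages each.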
It must be stressed that this is one possible scheduling for the collections. Should the project go ahead the timetable would be negotiated with library partners.
Figure 3 A scanning schedule
See the scanning schedule