
By Arran Rees on

Occupations, Machines, People: the connective tissue that helps connect collections

For four weeks during June and July last year, a group of researchers in the Congruence Engine carried out a set of mini-investigations that had been formed during a co-production workshop held at the University of Leeds. We’ve mentioned some of those inquiries and some of our findings in our blog on the reflection workshop that we held at the end of the four-week research sprint. This blog is an attempt to dig a bit deeper into the inquiry I ran and draw out the main things I learnt.

At the end of the Co-production Workshop in Leeds, one of the questions that emerged was how we might link lists of textile machines with the different occupations in the textile industry. We knew that the Bradford Industrial Museum had a large number of machines for us to start with, and that the Saltaire Collection had some good data about machines too. We also thought that we might be able to look at some Trade Directories and census returns for the local area and start building a list of people and occupations. Asa Calow from MadLab had shown an interest in exploring layout parsing to help make the trade directories more machine-readable, so we decided to collaborate on that element of the work. I can’t say that we had a fully formed question – just a bunch of starting points and some potential digital technique experiments we wanted to do. The rest was all pretty emergent.

The main finding of this work is that in order to connect different collections, a range of different types of data is needed. Collections data, as recorded in museum and archive cataloguing systems, isn’t particularly interoperable, nor is it good at referencing the type of information that is needed to hang collections together. I’ve been calling this the ‘connective tissue’. This connective tissue is data that exists as the content in a lot of collections, not as the metadata or catalogue data that is used to describe them. In this investigation, lists and descriptions of occupations proved to be a particularly powerful form of connective tissue that helped me make links between actual people, referenced in the historical record, and museum objects like the Noble Comb machine in the Bradford Industrial Museum’s collection.

Arran's flow chart diagram with images and data, showing connections from wikidata entries to the 1921 census. Each data collection has a different coloured arrow point between images and catalogue entries.
Diagram of connections, © Arran Rees

As a proviso to anyone reading this, my technical skills mainly sit in the middling range of the people involved in this project – I am pretty good at using anything with a user interface and can follow conversations about digital tools and techniques better than I can use them. However, as we learn more about what we feel the social infrastructural requirements of a national collection are, it is important that people with a range of technical abilities are actively experimenting and undertaking inquiries as part of this project. I’ve broken this blog piece down into the three areas of work (getting a list of occupations, getting a list of machines and connecting them to people), trying to give an honest reflection on the work I did, the barriers I came across and the insights I gained.

OCCUPATIONS

I started by trying to gather information about occupations, using digitised and OCR’d Trade Directories for Leeds and Bradford from 1854. The OCR had been done in 2004, so Asa said he would run it again using the latest version of Tesseract, and try some layout parsing on it using the Layout Parser tool too. Unfortunately, the results were not any better than the 2004 version – in fact, because the 2004 versions had been edited after OCRing, they were more readable. Neither the 2004 nor the 2022 OCR’d text was machine-readable enough for us to automatically extract the type of data we might be interested in. We parked that, and the work has since been developed through a workshop organised by Congruence Engine Communications Research Fellow, Daniel Wilson, and Asa.

In the meantime, the very same Daniel Wilson had suggested the I-CeM project and its anonymised and filtered search and download of census data. I accessed the census material for Shipley (the ward where Saltaire is based) and Manningham (Bradford). This gave me huge amounts of data, but it was still very messy and needed a lot of data cleaning to get anywhere. I found 11 variations of the spelling of Alpaca, which I thought was impressive! I was doing a lot of this on spreadsheets, and knowing what I know now, it would have been much better to use OpenRefine. After numerous painful hours spent with a spreadsheet with too much data in it, I decided to park the census data… but not before I came across a great resource. Whilst Googling an occupation type, trying to work out what it was, I came across a fully transcribed, online version of an occupations list compiled from the 1921 census return. It had the textile industry as a category (as well as many other occupation types relevant to the other Congruence Engine themes).

I started working from there, but due to the huge number of jobs, I needed to organise them in a way that made sense to me – in a way that linked with the processes involved in a textile mill. I used Classr – an online taxonomy builder. This, again, was a really long process and I had a few crises of confidence with it. It would have been much easier if I had been able to build a quick tool to parse the data on the web page into a more usable format, but that was beyond me at this point. In the end, I created an example taxonomy – some of which is probably incorrect and would need to be looked over by a historian who knows the trades and occupations better than I do. All I wanted was to build an example taxonomy that might be useful for structuring occupations and relating them to processes. Since doing this, Alex Butterworth, the Congruence Engine Digital History Lead, has managed to get the data from the occupations list into a machine-readable tabulated format.

A table with data from the 1921 census on variants in workers using alpaca.
Alpaca variants from the census data, © Arran Rees 🦙
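Spelling variants like the alpaca ones above are the kind of thing that could be collapsed automatically rather than by hand in a spreadsheet. The following is a minimal sketch in Python, using only the standard library's difflib, and loosely in the spirit of the clustering that OpenRefine offers. The function name, the 0.8 threshold and the example spellings are my own assumptions for illustration; they are not taken from the census data itself.

```python
from difflib import SequenceMatcher


def canonicalise(occupation, canonical_terms, threshold=0.8):
    """Replace any word that looks like a misspelling of a canonical term.

    `canonical_terms` is a hand-made list of preferred lowercase spellings
    (hypothetical here). `threshold` is a guess and would need tuning
    against real census strings.
    """
    cleaned = []
    for word in occupation.lower().split():
        # Find the canonical term most similar to this word.
        best = max(
            canonical_terms,
            key=lambda term: SequenceMatcher(None, word, term).ratio(),
        )
        score = SequenceMatcher(None, word, best).ratio()
        cleaned.append(best if score >= threshold else word)
    return " ".join(cleaned)


# Invented example spellings, not real entries from the census download.
for raw in ["Alpacca  Sorter", "alpaka weaver", "Wool Sorter"]:
    print(raw, "->", canonicalise(raw, ["alpaca"]))
```

A threshold that is too low will start merging words that really are different, so any output from something like this would still need a human check.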

MACHINE DATA

At the end of the June workshop, I had thought the machine data held by the Bradford Industrial Museum (BIM) was in the form of a list of machines from the museum’s collections management system, but I was wrong. The machine data related to the actual machines in the BIM collection. Working with one of the museum’s curators, Lauren Padgett, we defined the type of data we were interested in, and Lauren extracted the data, categorised by process. This was incredibly helpful as the process wasn’t necessarily recorded in the machine catalogue data – the categorisation was done according to Lauren’s acquired curatorial knowledge. I didn’t have that knowledge. This was the first clue that ‘connective tissue’ like textile manufacturing processes, which sits outside of the collections data, would be an essential element of connecting collections.

The data from BIM, although not standardised (is any museum data truly standardised?!), was helpful and I only needed to do a small amount of data cleaning (mostly on manufacturer name and location). Sometimes the data was recorded in the description, but not the ‘production organisation’ field. I was able to use Grace’s Guide’s list of textiles manufacturers to help with this too. A machine-learning process that could help with this sort of data cleaning would be a huge asset to managing museum data for contributions to a national collection.

Lauren also identified images of machines in the collection, and we filtered the search to just images of machines that we know were taken in either Salts or Lister’s Mill. Jonathan Ashworth, from the Saltaire Collection, sent me a CSV file and a set of images of machines from their collection too. I recorded the data in a combined spreadsheet with a new column for which collection they came from.

A black and white photograph of works on wool combing machines known as 'Noble Combs' in Salts Mill, Saltaire.
Noble Combs at Salts Mill, © Bradford District Museums and Galleries

CONNECTING PEOPLE

After I’d created some sort of order that might allow me to understand that a Yarn Baller worked as part of the Warping process and would have been using a machine categorised under Warping, I started to think about how to add people to this and to bring a human element. This is where the incredible work of Colin Coates from the Saltaire Collection became very helpful. Colin has created an extensive list of biographies for people who worked in Salts Mill, using the digitised Shipley Times archive as the main source.  

Colin shared a spreadsheet of all the biographies he has written, indicating if the person definitely worked for Salts and whether the biography is available online. I filtered the data to show those whose biographies were online and who worked in Salts Mill. Then I set to work to draw out from the biographies what the person’s occupation was, and what their active times might have been. The person’s birth and death dates are recorded in the biographies (where known) and in many of them, their occupations are mentioned in passing. The process of pulling that information out into a spreadsheet ready for filtering and querying was pretty involved. IF I had been able to use my list of occupations and do some fuzzy matching or feed the occupations list into some sort of named entity recognition tool (reaching with big IFs here), then this process could have taken just a fraction of the time. However, I did it manually for around 60 people (out of 193!).
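To give a sense of what that fuzzy matching might have looked like, here is a rough sketch in Python using only the standard library. It slides a window over the words of a biography and compares each window against every term in an occupations list. The function name, the 0.85 threshold and the sample sentence are all assumptions made for illustration; the sentence is invented and is not from Colin’s biographies.

```python
import re
from difflib import SequenceMatcher


def find_occupations(biography, occupations, threshold=0.85):
    """Return the occupations that appear (approximately) in the text.

    `occupations` is a list of terms such as those in the 1921 list.
    `threshold` is a guess; lower values catch more misspellings but
    also more false matches.
    """
    words = re.findall(r"[a-z]+", biography.lower())
    found = []
    for occupation in occupations:
        target = occupation.lower()
        size = len(target.split())
        # Compare every run of `size` consecutive words with the term.
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            if SequenceMatcher(None, candidate, target).ratio() >= threshold:
                found.append(occupation)
                break
    return found


# Invented sentence for illustration only.
sample = "He worked with the wool-combers before becoming an overlooker."
print(find_occupations(sample, ["wool comber", "overlooker", "yarn baller"]))
```

This only flags that a term is present; working out whether it describes the subject of the biography, and when they held the job, would still need a person (or a much cleverer tool).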

I chose to start trying to connect the collections with an image of a Noble Comb from the BIM collection that was taken in Salts in 1949. There is a Noble Comb in the BIM collection that was manufactured by Prince Smith and Stells, and was dated roughly within the same time range as the photo was taken. I found that the machine in the BIM collection could be connected with the Grace’s Guide record for Prince Smith and Stells, and with an image of a Prince Smith and Stells Noble Comb in the gallery of BIM which is used on the Wikipedia article for the textiles manufacturing process of combing. From this, I thought we might be able to start building a picture of the interconnectedness of the textiles industry in Bradford. From the combined data sets, I could see that there was also an image of Noble Combs at Salts Mill in the Saltaire Collection – but it was not dated. However, the two images could definitely be connected through association with Salts Mill and the process of Combing. There are Wikipedia entries for both the process of Combing and for a generic ‘Wool Combing machine’.

An image of the textile machines at Salts Mill in black and white. These machines are Noble Combs - wool combing machines.

A screenshot of the Wikipedia entry for combing.
Wikipedia article and Wikidata entry for Combing, ©Wikipedia

In Colin’s biographies, there were a few ‘Wool Combers’, but there was an entry for Susan Excell who would have just been coming to the end of her career around the same time the Noble Comb image was taken in 1949 (it is possible she was retired by then – so this isn’t a definite link, but certainly a fuzzy link). There is also an oral history in the Bradford collection (ID: A0001) where the interviewee speaks about being trained on the combing machines in the late 1930s and early 40s at Salts. Both Susan and the oral history interviewee can be connected to the photo and the Noble Comb machine in the BIM collection through their occupation. Although we cannot say that either actually used the machine in the photo – or whether the machine in the BIM collection is in the photo – we can associate them.

CONCLUSIONS

From this mini-investigation, it became clear that information about occupations and processes was vital in connecting collections; it was the connective tissue behind the connections. However, this connective tissue-type data is often not recorded as linkable and connectable data within museum and archive collections databases. As a human reading about Susan Excell, I was able to make the link that because she was a wool comber, she would have been involved in the process of combing and, because of the location and the dates she was working, was likely to have used a Noble Comb. However, without being explicitly told that a wool comber undertakes a process called combing and would have used a combing machine, this is not a knowable connection for a machine.
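As a very small illustration of what “explicitly telling” a machine might involve, the sketch below writes the occupation-to-process and process-to-machine links down as plain Python lookup tables. It only contains the pairings mentioned in this post (a wool comber and combing, a yarn baller and warping, the Noble Comb as a combing machine); the structure and the function name are assumptions of mine, not a model the project has adopted.

```python
# Occupation -> process. Both pairings are mentioned in this post; a fuller
# table would need to come from something like the 1921 occupations list.
OCCUPATION_TO_PROCESS = {
    "wool comber": "combing",
    "yarn baller": "warping",
}

# Process -> machines. Only the Noble Comb is named in this post, so the
# warping entry is deliberately missing.
PROCESS_TO_MACHINES = {
    "combing": ["Noble Comb"],
}


def machines_for_occupation(occupation):
    """Return machines associated with an occupation via its process.

    An association only: it does not show that a given person used a
    given machine.
    """
    process = OCCUPATION_TO_PROCESS.get(occupation.strip().lower())
    if process is None:
        return []
    return PROCESS_TO_MACHINES.get(process, [])


print(machines_for_occupation("Wool Comber"))
print(machines_for_occupation("Yarn Baller"))
```

Once the links are written down like this, the step from ‘wool comber’ to ‘Noble Comb’ becomes a lookup rather than something that relies on a human reader’s background knowledge.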

In this case, the Wikipedia articles (and thus, the Wikidata entries) were instrumental in making those associated links, but there was no record in Wikipedia, or link in the Wikidata entries to the occupation of wool comber. This was only available in the list of occupations from the 1921 census. If ‘wool comber’ had a Wikidata entry, then Wikidata would have had the potential to be a computational linking mechanism for the whole inquiry. 

We are now interested in exploring the potential of Wikidata as a central location where these types of connective tissue resources can be recorded and linked to computationally. But we are also interested in the larger potential of occupations data and how we might model an occupation’s relationship to a textiles manufacturing process and machine – creating ontologies for industrial working lives.