Photographs of historic scenes exert a compelling effect on our imagination. Even those that are fading or stained seem to draw us into the lost moment of their capture. Indeed, if the quality or resolution of the photograph makes the scene or person depicted appear somehow elusive, the effect can be even stronger. We long to get closer to the subject, to fill in the gaps and solve the mystery of the unknowability of the past. In a sense, our engagement with such images offers a metaphor for our study of history more widely. It is a process which requires us to fill the gaps, responsibly, even while recognising that the gaps exist: that the evidential records on which historians rely are partial and often fragmentary.
In the Congruence Engine’s experimentation to discover new ways that technology can enable us to link the data that describes the objects and records in museum and archive collections – which is the central aim of the project – we are employing many methods that involve Machine Learning or ‘AI’. Among these, the new ‘generative’ forms of AI feed on human-made representations of reality – whether written or visual – to emulate what they construe, although currently without the conscious judgement that human experts can employ. We must therefore be cautious in their use, not least to ensure that the machine does not hallucinate false accounts of the world that it presents as real.
It is easy to have one’s head turned, though, by what is possible. In the first year and a half of the project alone, these AI techniques have developed at a dizzying velocity, to the point where they have begun to be able to reconstruct three-dimensional immersive spaces from – the hype would suggest – a small number of photographic images. One such method, which seemed most promising for our purposes, was Neural Radiance Fields (‘NeRF’): an approach which – by registering the position from which each photograph was taken in a virtual space – can extrapolate a model of how light falls on surfaces and so ‘regenerate’ the space represented ‘in the round’.
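For readers curious about the mechanics, the core idea can be sketched in a few lines of code: a learned function maps any 3D point to a colour and a density, and each pixel is produced by compositing those values along a camera ray. The sketch below is purely illustrative – the ‘field’ here is a hand-made stand-in for NeRF’s trained neural network, and every name and number in it is invented for the example:

```python
import numpy as np

def toy_field(points):
    """Stand-in for NeRF's trained network: maps 3D points to
    (RGB colour, volume density). Here: a soft blob at the origin."""
    dist = np.linalg.norm(points, axis=-1, keepdims=True)
    density = np.exp(-4.0 * dist)                # denser near the centre
    colour = np.clip(points * 0.5 + 0.5, 0, 1)   # colour varies with position
    return colour, density

def render_ray(origin, direction, n_samples=64, near=0.5, far=3.0):
    """Composite colour along one camera ray using the standard
    volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta))."""
    t = np.linspace(near, far, n_samples)
    delta = t[1] - t[0]
    points = origin + t[:, None] * direction
    colour, density = toy_field(points)
    alpha = 1.0 - np.exp(-density[:, 0] * delta)                  # opacity per sample
    trans = np.cumprod(np.concatenate([[1.0], 1 - alpha[:-1]]))   # light surviving so far
    weights = trans * alpha
    return (weights[:, None] * colour).sum(axis=0)                # final pixel RGB

# One ray, fired from in front of the blob towards it:
pixel = render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]))
```

In the real method the stand-in function is a neural network trained so that rays rendered this way reproduce the input photographs; the camera positions registered for each photograph are what make that training possible.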
The ‘NeRF’ investigation that is covered by the film in the Discovery Museum exhibit set out to explore how this method could be applied to reconstructing historical scenes from a range of archival photographs. These include the kind of grainy, blurred and damaged images that are often the only visual record remaining of lost locations: demolished buildings or temporary exhibitions. We were asking three main questions. Two of these were technical and practical: how many overlapping images of the same place were needed to build a virtual 3D representation, and how feasible would it be to find sufficient images of different kinds of location?
The third, though, was aesthetic: how well would the virtual world that was conjured also communicate, visually, what was missing in the source material? What we envisaged here was the possibility that holes or blanks would appear in the 3D space, presenting a graphical account of historical ‘terra incognita’ – those areas on old maps which hadn’t yet been explored – in this case relating to the views of a scene not covered by any existing photograph. (My own interest here concerned what people in the past did and didn’t pay attention to. In contemporary terms, you might think of it as the ‘Instagrammable’ view that everyone posts, compared with the overlooked corner of the room, which might contain something of particular interest to this historian.)
We worked with Dr Sara Rimiti, a postdoctoral researcher from the Data Science centre at the University of Sussex, with oversight from Dr Ivor Simpson, a senior lecturer in the School of Informatics and Engineering. While I and others on the team sought out batches of images – textile mill interiors, bridges, telephone exchanges, power stations and so on – in search of sufficient numbers of overlapping photographs of the same location, Sara set about a rigorously methodical investigation of what qualities and numbers of images, in combination, would produce what quality of output.
As it turned out, the answers to our three questions pulled in different directions. The cataloguing of photographs in archives – we explored a dozen or so collections, spanning a variety of themes – is often quite thin. The record might say simply ‘mill interior’, or give the name of a power station but not the location within it, or record only ‘wool carding room’. Piecing together which images showed the same location was challenging, and rarely were more than a handful available, even across different collections. This work was very useful, though, in helping us understand how to design the kind of visual interface to data that will make it possible to see, at a glance, the density of images available for a particular space.
Meanwhile, starting with several hundred images of the Palace of Westminster already gathered from online sources, Sara’s experiments produced highly nuanced insights to inform our future work. Her findings clearly indicated that anything fewer than a couple of dozen images was unlikely to be enough to create a good ‘NeRF’ virtual 3D reconstruction. Frames grabbed from films and videos could offer an ideal source of overlapping images, especially where the camera was moving through the space – a tracking or dolly shot – but in the early documentaries with which we were concerned, production costs rarely allowed for such elaborate filming techniques. What was more, and slightly disappointingly, rather than generating an aesthetic of ‘partiality and absence’, as I had hoped, the NeRF method simply failed to produce any meaningful output when several areas of a scene were not covered.
A month before the end of the project, though, the frenzied pace of AI innovation delivered a solution: one, in fact, which looked back to previous approaches to the problem, whilst bringing to bear the vastly greater computational processing power available now compared with a decade ago. The technique in question is the amusingly named ‘Gaussian Splatting’, which rebuilds the appearance of a scene using minuscule coloured ovoids – something like the equivalent of three-dimensional pixels – and can now generate a photo-realistic scene in high resolution.
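The principle behind those coloured ovoids can also be illustrated in miniature. In the sketch below, each scene element is a coloured 2D Gaussian (in the real method they are 3D ovoids, projected onto the image and optimised to match the photographs); sorted by depth, they are blended front-to-back into an image. Again, every parameter here is invented purely for illustration:

```python
import numpy as np

def splat(gaussians, size=64):
    """Toy 'Gaussian splatting' rasteriser: blend coloured, depth-sorted
    2D Gaussians front-to-back into a size x size RGB image."""
    img = np.zeros((size, size, 3))
    remaining = np.ones((size, size, 1))  # transmittance left at each pixel
    ys, xs = np.mgrid[0:size, 0:size]
    # sort by depth so nearer splats occlude farther ones
    for g in sorted(gaussians, key=lambda g: g["depth"]):
        d2 = (xs - g["x"]) ** 2 + (ys - g["y"]) ** 2
        alpha = g["opacity"] * np.exp(-d2 / (2 * g["scale"] ** 2))
        img += remaining * alpha[..., None] * np.array(g["colour"])
        remaining *= 1 - alpha[..., None]
    return img

# Two invented splats: a near reddish one partly covering a farther bluish one.
scene = [
    {"x": 24, "y": 30, "scale": 6, "opacity": 0.9, "depth": 1.0, "colour": [0.8, 0.3, 0.2]},
    {"x": 36, "y": 34, "scale": 9, "opacity": 0.7, "depth": 2.0, "colour": [0.2, 0.4, 0.8]},
]
image = splat(scene)
```

A real reconstruction uses millions of such splats, with their positions, shapes, colours and opacities fitted to the source photographs or video frames – which is where the greatly increased processing power of the last decade comes in.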
Applying the process to a two-minute video moving through a derelict petrol station that I’d recorded on a recent visit to Wales generated a scene which transported me back to the place in detail, down to the dust on the windows and the flaking rust on the pumps. We didn’t get as far as trying to include old photographs alongside the video images, but it already seems clear that this method will register the kind of aesthetics of partiality and absence for which we’d hoped. In this curious moment of AI revolution, the question is: by the time we get to test this, will another technology already have superseded the approach?