Session Summary: First, Capture Your Data

Peter Ainsworth of the University of Sheffield opened by describing what he views as the most exciting work in the humanities in the UK at the moment: linking the humanities with scientific disciplines.

He went on to introduce the Digging into Image Data project by showing us some of the data. The project brought together 10 manuscript copies of the Chronicles of Jean Froissart, all produced within the same square mile of Paris by two artists and two sets of scribes, and now held in different collections around the world. To make these books accessible for study, the team has created digital versions of each manuscript, amounting to about 10 terabytes of data, so the texts are no longer locked away in vaults but freely available in the cloud. Ainsworth demonstrated that the image quality lets you zoom in extremely close, compare manuscripts and measure details very precisely.

One of the major problems they faced in this enterprise was persuading the libraries and collectors who hold the originals of the benefits of producing these electronic editions. The owners and curators have different attitudes to open access: some are more than happy for the materials to be made openly available as a way of preserving them and bringing them to new audiences, whilst others require large payments in return for any use. His main challenge, therefore, was to let the technology do the talking and convince the libraries of the huge value of this sort of comparative work.

The Digging into Data challenge helped to connect this project with others working on similar types of data, including teams at the University of Illinois at Urbana-Champaign, Michigan State University and the Alliance for American Quilts in North Carolina. These institutions are developing technologies to study historical maps and quilts, both of which involve analysing large amounts of image data. The joint project looks at the computational scalability of adaptive image analysis and at how it can help to identify the authors of the works in their collections of maps, quilts and manuscripts.

Ainsworth explained how the mixture of specialist academics and technicians enables them to look at the images in a variety of different ways. He also discussed their methodology, including how they ensure the reliability and comparability of their data sets: for example, by photographing by hand and using the same metadata throughout, rather than digitising automatically.

Despite differences in the types of data, the research methodology across the three projects is the same. They hope the work will ultimately produce two outputs: data describing the salient characteristics of an artist relative to other artists, and software for extracting those characteristics, which could also be applied to identifying fakes. A related challenge is determining whether the “master” titles used by art historians refer to a single individual or to several people. Ainsworth showed us a series of image features from various illustrations to explain how particular image elements can be compared and analysed for these purposes.

Ainsworth’s own work at the University of Sheffield focuses more on the scribes than on the artists of the manuscripts, and attempts to get around the more subjective methods that have traditionally been used to distinguish between different scribes. To conclude, he took us through the technicalities with a series of slides demonstrating how they are using vectors to analyse the handwriting in more objective, scientific and unprecedented ways, using e-science to conduct kinds of analysis that humanists have not been able to attempt before because they did not have the tools.