Session Summary: Memento – Versioning Resources to Support HTTP-based Discovery

Herbert Van de Sompel, Staff Scientist at Los Alamos National Laboratory, gave us an enthusiastic introduction to Memento, an NDIIPP-funded project examining the concept of time travel for the web. The aim of Memento is to make it easier to navigate the web of the past. Many web resources have representations that change over time, but fortunately archived copies of prior versions of those pages often exist. However, the archived version will be at a different URL to the original content. In Wikipedia, for example, there is a general URI for the current version of a page, which persists, and a different URI for each of the archived versions. That these records exist is a great source of optimism for Van de Sompel, but he observed that finding and navigating these resources can be very difficult and is a matter of search.

He described how to find the archive pages for CNN.com and Wikipedia, which you then have to browse. There could be huge numbers of entries to navigate to get to what you want. Once you have found the page you require, navigating onward is more difficult still. He illustrated this by looking at an archive page from Wikipedia, where the links go to the current version of the linked page rather than the version contemporary with the archived one, so you are not really navigating in time. There are also things missing, so your navigation experience of the past is not complete. Archives often rewrite the links, but resources archived outside that archive will not necessarily be seen correctly.

Van de Sompel explained that Memento is a framework which leverages the original URI of a resource to do a search in time. At present, to get a prior version of a resource you have to use a different URI, but Memento wants to change that by making it possible to go to the URI of the current resource and ask for it in a previous form.

He then got technical, explaining how an HTTP GET request works and how its headers express preferences, for example whether you want a page with a particular compression, in a certain language and so on. The Memento team want to add a new preference to the current list: datetime. This would introduce content negotiation in the time dimension, leveraging HTTP Link, an upcoming standard. They have created a browser plugin that works with Wikipedia and uses a time slider to navigate seamlessly. This is currently available for download at the project website. They feel that the idea has a real chance of gaining traction and wide adoption.
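To make the idea of a datetime preference concrete, here is a minimal sketch of the request headers a client might send. It is an illustration under assumptions, not the project's own code: the header name Accept-Datetime follows the spelling later standardised in RFC 7089 and was not given in the session, and the function name and the choice of language and compression values are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when: datetime) -> dict:
    """Return GET request headers that ask for a resource as it was at `when`."""
    return {
        # Familiar content negotiation preferences: language and compression.
        "Accept-Language": "en",
        "Accept-Encoding": "gzip",
        # Assumed name for the datetime preference (RFC 7089 spelling),
        # carrying an HTTP-style date in GMT.
        "Accept-Datetime": format_datetime(when, usegmt=True),
    }


# `when` must be timezone-aware and in UTC for the GMT formatting to apply.
headers = datetime_negotiation_headers(
    datetime(2009, 11, 20, 12, 0, 0, tzinfo=timezone.utc)
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The point of the sketch is that the request still goes to the original URI; only the extra header says which moment in time is wanted.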

He went on to talk about versioning, which is particularly significant in research. He discussed time-generic resources, which deliver the current representation when accessed. There are then time-specific resources, which are effectively snapshots of the state at a particular point in time. Coupled with Memento, this becomes a powerful way to navigate across versions of resources purely using HTTP. He took us through a practical example using pictures of himself taken over a series of days, which were published at the same URI but use this system of time-specific resources. He also showed an example using DBpedia, navigating historical data using HTTP when you only need the generic URI.
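The relationship between a time-generic resource and its time-specific snapshots can be sketched as a simple lookup. This is only an illustration: the URIs and dates are invented, and the selection rule (the most recent snapshot at or before the requested time) is an assumption, since the session did not say how a server chooses between versions.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Snapshot = Tuple[datetime, str]

# Hypothetical time-specific URIs behind one time-generic URI,
# sorted by the time each snapshot was taken.
SNAPSHOTS: List[Snapshot] = [
    (datetime(2009, 11, 1, tzinfo=timezone.utc), "http://example.org/photo/20091101"),
    (datetime(2009, 11, 2, tzinfo=timezone.utc), "http://example.org/photo/20091102"),
    (datetime(2009, 11, 3, tzinfo=timezone.utc), "http://example.org/photo/20091103"),
]


def resolve(snapshots: List[Snapshot], when: datetime) -> Optional[str]:
    """Return the URI of the latest snapshot taken at or before `when`.

    Assumed rule for illustration; returns None if `when` predates
    every snapshot.
    """
    times = [taken for taken, _ in snapshots]
    index = bisect_right(times, when)  # number of snapshots taken <= when
    if index == 0:
        return None
    return snapshots[index - 1][1]


requested = datetime(2009, 11, 2, 15, 0, tzinfo=timezone.utc)
print(resolve(SNAPSHOTS, requested))
```

A client only ever needs the generic URI plus a datetime; the mapping to a time-specific URI happens on the server side.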

Van de Sompel then took us through the potential application of Memento for scholarship, specifically looking at annotations, which are expected to become more web-centric rather than desktop-centric. However, if you attach an annotation to a URI and the resource changes, your annotation can lose its context and become irrelevant. In fact, for certain regularly updating resources, the annotation is only relevant at that specific moment in time. Currently, the architecture of the web does not address this, but Memento allows annotations to become persistent, and in turn supports sharing of annotations. They have done experiments with online annotation tools, and provided the archive copies exist, the annotations are associated with the right content, even if the current content on the page has changed. Van de Sompel used this as a convincing use case to show that the combination of Memento with an annotation model that supports datetime will be very useful.

He concluded by emphasising the value of the URI for discovery. The linked data concept means we now have a human-readable page and machine-readable data linked with the same URI, which has increased the value of the URI. Adding Memento will increase that value again, as the one URI would also give access to the archived versions of that page.