Session Summary: Supporting Technical Innovation in the UK – Repositories UK

Paul Walk, Deputy Director of UKOLN, used his presentation to discuss Repositories UK, a project that is really about innovation support. UKOLN is now one of two JISC-funded Innovation Centres, a role that is still being worked out together with JISC. However, close to Walk’s heart is UKOLN’s work to support the developer community, so he discussed the project from this angle.

Walk outlined some of the lessons they had learnt from the history of the project, including the lesson that aggregation, as an activity, has general potential value and that a search service is only one realisation of that value. They were very keen that no particular service should dictate what should be done with the aggregation, as they became more aware that they were dealing with something that could become an infrastructure component.

Walk then took us through a diagram which described the pattern he had seen repeated over the years in publicly funded service development. This showed a collection of data in the centre which is of broad interest and general usefulness. In response, the funders would want to see some kind of end-user interface for interacting with that data. This would be the main focus of the funding, but there would also be the promise that the developers would put standard machine-based APIs on to that data so that it could potentially be used in the future for other purposes. Walk observed that you would often see this pattern repeated by other funded projects in the JISC information environment. However, the APIs exist only notionally: they are not supported and are often either unused or, even worse, useless. He referred to this as a service anti-pattern: a design pattern for solving problems that is generalised, plausible and attractive, but has some kind of hidden problem.

In assessing why this pattern recurs, Walk noted that it is often caused by funding, which follows happy users. Something that has no clear end user, but would instead enable further, as yet not fully understood, development that may help end users in the future, is a really hard sell to funders. Funders also want something they can showcase, whilst infrastructure is near enough invisible (particularly if it is good infrastructure), and it is hard to ascertain the direct impact of infrastructure on users. This leads to a clear motivation to develop a user-facing service and not really develop the backend.

Walk went on to suggest a slightly better pattern, which involves using the API to build the user-facing application, so there is less uncertainty about the status of the API. This means that the API is no longer tacked on, which keeps them honest and means that they know that it works. The result is that there is more likelihood that another developer will build on the API to reach another end user.
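As an illustration only, the better pattern might be sketched in Python as below. This is not code from the project: the `Record` fields, the `AggregationAPI` class and the `render_results_page` function are all hypothetical names chosen for the example. The point it tries to show is that the user-facing view has no private route to the data, so the API is exercised every time a user sees a page.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical fields, chosen only for this illustration.
    identifier: str
    title: str
    institution: str


class AggregationAPI:
    """Hypothetical machine-facing API over a collection of records."""

    def __init__(self, records):
        self._records = list(records)

    def search(self, term):
        """Return records whose title contains the term (case-insensitive)."""
        needle = term.lower()
        return [r for r in self._records if needle in r.title.lower()]


def render_results_page(api, term):
    """User-facing view that obtains its data only through the API."""
    hits = api.search(term)
    lines = [f"{len(hits)} result(s) for '{term}'"]
    lines += [f"- {r.title} ({r.institution})" for r in hits]
    return "\n".join(lines)
```

Another developer could call `AggregationAPI.search` directly and build a different front end on it, with some confidence that it works because the original front end already depends on it.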

Walk then described RepositoriesUK, a managed aggregation of repository metadata from UK HE institutions. They deliberately do not normalise the records; they simply provide a cache, focussed on academic papers, from which others can then normalise the data, so they are effectively solving a network latency problem. The goal is to support innovation. To develop using the pattern Walk described above, they needed a use case, so they aim to develop some business intelligence: in this case, a series of visualisations of the metadata showing patterns of research represented in repositories across the UK, to help an end user carry out business analysis. The final aim is to develop an infrastructure component for services.
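A rough sketch of this division of labour, in Python, is given below. It is an assumption-laden illustration and not the project's implementation: `MetadataCache`, the `date` field, `normalise_year` and `papers_per_year` are all invented for the example. The cache stores each record exactly as harvested, and a consumer applies its own normalisation before producing the kind of tally a visualisation could plot.

```python
from collections import Counter


class MetadataCache:
    """Hypothetical cache that stores harvested records exactly as received."""

    def __init__(self):
        self._store = {}

    def put(self, repository, identifier, raw_record):
        # No normalisation here: the record is kept as the source supplied it.
        self._store[(repository, identifier)] = raw_record

    def records(self):
        """Yield (repository, raw_record) pairs for consumers to process."""
        for (repository, _identifier), raw in self._store.items():
            yield repository, raw


def normalise_year(raw_record):
    """Consumer-side normalisation: take a four-digit year from a free-text date."""
    date = str(raw_record.get("date", "")).strip()
    return date[:4] if date[:4].isdigit() else None


def papers_per_year(cache):
    """Count cached records per year, skipping records with no usable date."""
    years = (normalise_year(raw) for _repo, raw in cache.records())
    return Counter(y for y in years if y is not None)
```

Because the normalisation lives with the consumer, a second consumer with different needs could read the same cached records and clean them up in its own way.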

They introduced some design principles, which included a tiered service model, where they build a core service, then allow other people to work with them to develop further services, allowing that infrastructure to emerge and enable others. He hopes that they can develop patterns that they can then generalise and feed back into the infrastructure.

Walk then talked us through a diagram of how RepUK works (see picture below) and some of the technicalities, including how the data can be cleaned up for various uses, and what the licensing restrictions are for certain uses (despite taking data from open access repositories).

Their current progress includes 0.75 million records, with 6 projects consuming this data. They are also getting a lot of love from developers. However, they have identified issues with linking: do they want people to link to the records they are holding in the cache? Would they be undermining the source repositories if RepUK was really successful with its SEO? Walk identified these as areas for further discussion. He also identified state management as the real challenge, with maintaining the state of the records and federating one aggregation to the next, and the next, being really difficult. They have not yet tried to solve this, but have flagged it up as an issue, and Walk emphasised that he would be interested in talking with anyone with a view in this area.

Finally, he reported on the new lessons they have learnt, including the lesson that developers need infrastructure too. He quoted Tom Coates, who said “you need to develop for users, machines and developers”, which is what they are trying to do with RepUK. Walk concluded by emphasising that there needs to be a leap of faith and a focus on creating the right environmental conditions for things to evolve.