Session Summary: Open Data Policies, Liz Lyon

Liz Lyon, the Director of UKOLN, built on Heather Joseph’s talk by contrasting the US policy environment with UK data policy, both today and in the future, and examined some of the challenges.

In a snapshot of the present situation in UK HE institutions, Lyon quoted from a forthcoming report, which included comments by faculty members along the lines of: “I just back everything up onto data sticks, I didn’t even know you could back up to servers”. This was used to demonstrate that researchers have different levels of knowledge and practice when it comes to backing up, and that many have experienced catastrophic data loss.

The report also identified other issues on the ground relating to data management and open access. Giving away data is one big barrier for researchers, who liken it to giving away their baby. A lack of trust in the methods of other researchers often stops them wanting to share their data or use the data shared by others for further research. The report found that many researchers considered the policy documents wordy and difficult to understand – they wanted guidance, but viewed policies as hollow mandates. Lyon explained how the findings of this report help to give an idea of how a data policy should be packaged if it is to be useful.

Lyon moved on to look at the future data landscape, using genomics as an example case. Sequencing genomes generates huge amounts of data, and is getting faster and cheaper, ramping up the data deluge. This presents a challenge to researchers, who need advice and guidance on data storage – particularly given that their aim is to: “analyse an entire human genome in a single day sitting in with a laptop in Starbucks”. Lyon noted that several pharmacological companies are moving towards putting data into the cloud, but our university researchers are lacking guidance about the most appropriate ways to store their data and make it accessible.

Accessibility is a particular issue for this type of data, but there are changes in attitude taking place. Lyon drew our attention to the 24 human genomes now openly published on the web, some of them belonging to quite high-profile people such as Desmond Tutu. She referenced websites that allow you to pay for a kit, send off a sample and get your genome sequenced, and the communities that have formed around these products, consisting of people sharing their genomic information on the web. She also highlighted a new method of anonymising medical records for genomics research at Vanderbilt University, which will make it easier to make data available for mining.

Elaborating on this theme, Lyon quoted from a Wall Street Journal article, “My Data, Your Data, Our Data”, which predicted that in the future people may choose to make their genomic data available on Facebook. She commented that if patients demand that their data be shared, researchers have no choice, and cited the example of one patient who is trying to get data about bone cancer unlocked so researchers can find a cure for his disease faster. Lyon felt that this demonstrates there is a growing role for university ethics committees to get involved in this space.

Lyon went on to discuss the BBC’s Lab UK projects, which are designed to use public participation and citizen science. These experiments are developed by scientists and are leading to real, useful, peer-reviewed and published science. This example raises questions about the attitudes of researchers in our faculties towards working with the public, and whether they have the guidance they need to do so in a similar way, engaging more people and gathering larger data sets for their research.

Lyon also talked about current work to develop new metrics for data citation, which can be very complex, ranging from citing the journal at the macro level to citing the visualisations and data sets at the micro level. Again, she felt that there needs to be more policy guidance to help our researchers understand these new levels of granularity resulting from open data.

Finally, Lyon discussed the activities of the DCC in this area. They scanned the policy documents of UK institutions to see what guidance is already available and are now developing a data management planning tool. This includes a checklist based on the information in the various funding council policies, which was put out for comment and received useful feedback. The online tool, which they hope will be available in the summer at http://www.dcc.ac.uk/dmponline, is intended to give researchers prompts to help them develop a specific data management plan.

She emphasised that we need to get to a position where a data management plan is the norm, embedded in the policies and in the research lifecycle for the researcher. Lyon also wants clearer guidance for all of the major players, and for the data management plan to be part of the review process, so the DCC are developing tools to help reviewers. She admitted that there are then questions, and some thinking that needs to be done, around compliance.

Lyon also observed that it would be good to share these plans, so there needs to be some infrastructure for this. Assuming that a data management plan is going to lead to better data, she also identified the need for some careful cost-benefit analysis.

To conclude, Lyon observed that practice is currently disconnected from policy. She noted some policy gaps and the amount of work to be done with the funders to make the data management plans actually work.