Session Summary: Open Data Policies, Heather Joseph

Heather Joseph, Executive Director of the Scholarly Publishing and Academic Resources Coalition (SPARC), gave us an overview of how SPARC’s work has been focusing on making journal articles more openly accessible and raising awareness at all levels – including the grass roots in libraries on campuses, and the grass tops with the policy makers. SPARC want to advance the accessibility of academic articles coupled with explicit liberal sharing and re-use rights. They also recognise that the articles are linked with the corresponding research data – and are in fact just one reduction of that data – so they have also done work on effective open data policies, moving seamlessly from the digital articles to the data set.

To put this work into context, Joseph provided an overview of the policy environment in the US and described how the current situation provides what she considers to be the best shot to date for doing some serious work on open access for data.

She began with copyright policy, which lays out very clear guidelines and gives a framework for what citizens can expect in terms of accessing and re-using data from government-funded research. She then highlighted the strong Freedom of Information Acts, which also give good guidelines for accessibility. Finally, she highlighted OMB A-130, which lays out a very clear approach recommending “open and unrestricted access to public information at no more than the cost of dissemination”. Access to information policies are being developed around this, with taxpayer access as a central aspect in creating open access policies that are now extending into the data arena.

Joseph outlined the goals of the US public access policies:

  • expand access to results of taxpayer-funded research
  • accelerate pace of discovery and innovation
  • create permanently accessible archive for public use
  • enhance accountability and transparency of federal agencies

Joseph then talked through the thinking behind these goals and the benefits that the policy makers are aiming for.

These goals have been widely used in the conversations around articles for some time, but are only just being transferred into the conversations around data. Joseph discussed why these conversations about data have been slower, highlighting the important drivers for the development in open data in the different areas. She noted that policy makers had consciously decided to separate out data and articles and keep them on different trajectories, which has contributed to the different rates of development for those open access conversations.

To chart the difference in more detail, Joseph observed that in the article movement there is a reasonably engaged grass roots in the scientific community calling for open access. The libraries, consumer groups and advocates are very engaged, and key leaders at the funding level have been calling for significant change. This has all resulted in some positive top-down mandates from Congress.

However, in the data area, whilst there is engagement within the scientific community, the library community is not leading the way this time. Instead, there is an engaged group within the general public, including people who want to remix data and create applications and visualisations. There is not yet much interest within the various agencies surrounding open data, but there is huge interest at the executive branch level.

Joseph stressed that for policy change to happen, all three layers – top, middle and bottom – need to be moving together. She supported this with a quote from Sir Tim Berners-Lee, who said: “It has to start at the top, it has to start in the middle, and it has to start at the bottom.”

She emphasised that strong, clear open data policies have emerged in certain areas, but that they tend to be community specific rather than overarching. These policies are important, but Joseph wanted to focus on the movement towards coalescence.

She referred to an open government directive issued by President Obama, with the goal of making the output of federal government more transparent, so that the outputs can be used and interacted with by the general public. The result of this has been the Data.gov site, which, although progressive in concept, has been criticised in terms of the value of the data that has been added. Joseph noted that the data sets available are not generally serious science research material. However, the site is only a year old, and so far 97m people have visited. Pew have found that 40% of visitors have downloaded data from the site, which shows that people are engaging with the data. Joseph views this as a great cause for optimism. She also described how the site is not only being used to disseminate data, but also to collect it from the general public to further research.

Joseph moved on to discuss the significant development by the NSF, whereby they are calling for a data management plan to be included in all funding applications, addressing what is going to happen to the research data and how it is going to be made accessible for the future. The guidelines are not prescriptive, reflecting what Joseph described as a recognition that, at this transitional stage, we need a holistic approach.

Finally, Joseph discussed some of the emerging themes that are resulting from the current policy environment. There is a recognition that maximising access maximises benefits, which is driving a shift towards setting the default for scholarly publishing to Open. However, in the data world there are shades of Open, so there is an increasing recognition in the conversations that exceptions will be the rule, given legitimate concerns about confidentiality and privacy. Joseph highlighted the need for a community-driven approach to identify where the exceptions should be and how the policies are developed and implemented. She explicitly recognised the need for partnerships and a culture change to incentivise sharing data. Connecting networks of knowledge to let people get to information in new ways to solve problems is what this is all for, so the aim is not just to get everyone on board, it is to get them to recognise why we need open data.