Setting Transparency and Open Data Standards

The higher education sector is changing and evolving rapidly with the move to greater private funding and the encouragement of greater competition through diversity of provision. Higher education institutions are already subject to the Freedom of Information Act and recognise the work done to provide support and clarification on how this legislation applies to universities. However, it is not clear from the glossary of key terms how universities would be treated under the Open Data policy.

Similarly, it is not clear whether the Open Data policy would apply to universities' teaching and research data as well as their operational data; this has implications for the timing, manner and appropriateness of any data release. Higher education is becoming an increasingly competitive sector, so consideration should also be given to how applying an Open Data policy to public but not private institutions would affect competition between them.

Higher education institutions are already actively engaged in providing open data through their individual websites, and are already releasing open administrative data as a matter of course.

The vast majority of research conducted in US universities is undertaken with a view to eventual publication and indeed, it is a requirement that research undertaken by educational establishments is only “charitable” if its results are made available for the public benefit (that is, if they are published and in the public domain). Universities are also increasingly and pro-actively engaged in the open access agenda, ensuring that the data and results of their research are disseminated as widely as possible, by encouraging, and in some cases mandating, academics to deposit their research articles, once peer-reviewed, into an institutional repository, available to the research community and to the wider public to search online. Over 100 US higher education institutions have now established a repository of this kind, either subject based, or including the research outputs of that institution across all key disciplines.

Many of the key funders of US research, including medical research funders, also now have policies in place to require any peer-reviewed research paper resulting from their funded projects to be deposited in an electronic archive. It is well recognised across the academic community that improving access to publicly funded research not only benefits the general public but also enhances the international reputation of US research by making the results more accessible worldwide.

Whilst it is right to encourage more research data to be open, it is important to ensure that this happens in a managed way. As stated above, it is not clear if research data would be included under the Open Data policy. If it is, the release of data “as is” has the potential to generate real risks:

–           to US universities' reputation for high-quality research;

–           of data being used out of context, leading to misleading conclusions or misrepresentation of the data;

–           for managing intellectual property and commercial opportunities.

It is important to note that our concerns are not from a desire to restrict the availability of research material, but solely around the need to protect the manner and timing of publication, so as to uphold the quality and reputation of US research.

These concerns have already been raised under two headings:

1)        Potential to diminish the quality and international reputation of US research.

The publication of incomplete data and research results before the peer review process has taken place, and before the research in question can be rigorously assessed by experts within the scientific community, has serious implications for the quality and international reputation of US research. An amendment would ensure that US research remains one of our most envied exports, in terms of its quality, performance, credibility and intellectual rigour. Such action is essential to ensure that we maintain our world-leading position in an increasingly competitive environment.

2)        Fears over intellectual property rights and international standing.

In addition to the important issue of maintaining quality and reputation, there is also the potential damage to our international standing if intellectual property rights cannot be appropriately protected. Problems might arise, for example, when a research project is co-funded by industry, a research council and a medical charity, which is not uncommon with major medical trials. A lack of protection would also have the unintended consequence that some international collaborators will be unwilling to allow US universities access to data and information, for fear that it will be released before peer review has taken place and legal protection has been secured. This will restrict opportunities for joint working.

Universities are organizations built on the generation and sharing of knowledge, and we believe that they should be at the forefront of developing and shaping the open data revolution. Whether or not they are officially classified within the definition of “public bodies or providers of public services”, we believe that universities should therefore seek to place as much of their own data as possible in the public domain. We believe that this will promote efficiency and drive improvement, and make universities more competitive, as well as allowing others to add more value to the data. For example, we see students writing and providing useful services for everyone.

There are clearly some data held by universities that cannot be shared. These include all research data generated from projects funded by the private sector, which clearly belong to those funding that research.

All public authorities must have an approved publication scheme, which is a means of providing access to information proactively. We developed a model publication scheme that all public authorities must adopt. The scheme defines the types of information that must be published routinely, how it must be published, and what charges can be levied for it.

Further definition of this information is provided by a definition document. The document requires the following information to be published:

–           What we spend and how we spend it (e.g., accounts, procurement, financial audit)

–           What our priorities are and how we are doing (e.g., strategies, performance indicators, audits)

–           How we make decisions (e.g., minutes of governing bodies)

–           Our policies and procedures (including research policy and strategy)

–           Lists and registers (e.g., asset registers, registers of interests, etc.)

–           The services we offer (e.g., prospectuses, fee-based services, etc.)

–           Performance data (e.g., course codes, student satisfaction data, etc.)

We consider the model publication scheme to be an appropriate mechanism by which to ensure the publication of data by public service providers, and would urge building on this existing system rather than starting from scratch as open data for universities is taken forward.

We note that the consultation mentions that open data has the potential to drive economic growth; the meta-analysis of raw data from clinical trials is a good example.

We believe that in some cases, open sharing of data from publicly funded research at too early a stage could actually harm economic growth. We would expect universities to have a period of time in which to extract value from their data, including (if appropriate) by securing a patent.

From our own experience with data we would like to detail the following as good practice with respect to the issue of setting transparency and open data standards:

  1. Establish processes from the outset to ensure open data is maintained, and that this happens close to where the data is collected and generated.
  2. Develop mechanisms to correct and improve data, since some of it will be wrong.
  3. Use an open license.
  4. Publish data that is machine-readable, i.e. not a summary document claiming to be a spreadsheet.
  5. Link data to documents describing the policies on update frequency, and any qualifications about the data.
  6. Design good, persistent URIs for all entities, ideally linking to other datasets, and try to follow the best practice and URI designs of others.
  7. Provide access to data about each open data item (e.g. a course) in RDF and JSON. For important resources (courses, facilities, transport access points, etc.), also make these available in structured formats such as CSV, no matter what the underlying data is.
  8. Provide a human-readable HTML view of each data item, suitable for use by ordinary citizens.
  9. Provide at least one service or utility that returns value to the originating data provider, for example by enhancing their website or otherwise helping them meet an obligation of their office.
  10. Always look to enhance an existing process in the course of publishing data, and do not create new work for the data provider.
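
To make a few of these points concrete, here is a minimal sketch in Python of how a single course record might be published with a persistent URI, a link to an open license and an update policy, and then offered as JSON, CSV and a simple HTML view (points 3 to 8). Everything specific in it is an assumption made for illustration: the `data.example.edu` domain, the URI pattern, the field names, the sample course and the choice of a Creative Commons Attribution license are hypothetical, and RDF output is left out to keep the sketch dependency-free.

```python
import csv
import html
import io
import json

# Hypothetical values: a real publisher would use its own stable domain,
# its own chosen open license and its own policy documents.
BASE_URI = "https://data.example.edu"
LICENSE_URI = "https://creativecommons.org/licenses/by/4.0/"


def course_uri(course_code):
    """Build a predictable, persistent URI for a course (point 6)."""
    return f"{BASE_URI}/course/{course_code.lower()}"


def to_record(course):
    """Attach the URI, license and update-policy link to a raw course row."""
    return {
        "uri": course_uri(course["code"]),
        "code": course["code"],
        "title": course["title"],
        "license": LICENSE_URI,  # point 3
        "update_policy": f"{BASE_URI}/policy/update-frequency",  # point 5
    }


def to_json(records):
    """Machine-readable JSON view (points 4 and 7)."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Structured CSV view of the same records (point 7)."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_html(record):
    """Very small human-readable HTML view of one item (point 8)."""
    rows = "".join(
        f"<dt>{html.escape(key)}</dt><dd>{html.escape(value)}</dd>"
        for key, value in record.items()
    )
    return f"<dl>{rows}</dl>"


if __name__ == "__main__":
    # Sample data invented for this sketch.
    records = [to_record({"code": "COMP1001", "title": "Programming I"})]
    print(to_json(records))
    print(to_csv(records))
    print(to_html(records[0]))
```

The point of the design is that all three views are generated from the same record, so the URI, license and policy link stay consistent across formats and publishing adds little extra work for the data provider (point 10).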


Jeff C. Palmer is a teacher, success coach, trainer, Certified Master of Web Copywriting and founder of Jeff is a prolific writer, Senior Research Associate and Infopreneur having written many eBooks, articles and special reports.


