Pilot Project provides findings and advice on data sharing in development research

Launched just ahead of last year’s FORCE11 conference, which is devoted to using technological and open science advances to improve the communication of scholarship, the data sharing pilot project of Prof. Cameron Neylon, Centre for Culture and Technology, Curtin University, Australia, and his collaborators has published its final outcomes days before FORCE2017.

The project made use of the innovative open science RIO Journal, which allows for the publication of various research outcomes and sorting them into dedicated collections. The collection, “Exploring the opportunities and challenges of implementing open research strategies within development institutions: A project of the International Development Research Center” now features the project’s grant proposal as approved by the IDRC, a review article, several data management plans and case studies, and the final report as a research article.

The pilot project looked into the current state and challenges of data management and sharing policies and practices, as shown by case studies of seven IDRC-funded development research projects.

Working over the course of 16 months with projects selected from across a range of geographies, scales and subjects, the data sharing initiative began by introducing data management and sharing concepts and helping the projects to develop their own data management plans. The team then monitored and observed the implementation of those plans.

Over the course of the project, it became apparent that simply developing and implementing funder policies is not enough to change research culture. The question of how funder policy and implementation could support culture change both within research communities and within the funder itself became the focus of the initiative.

Data management plans have become a mandatory part of grant submission for many funders. However, they are often not utilised by researchers or funders later in the project, becoming “at best neutral and likely counter productive in supporting change in research culture.”

While the pilot project managed to identify a number of significant bottlenecks that impede efficient data sharing practices, both within research institutions and among grantees, including, as expected, a lack of resources and expertise, the researchers specifically point out structural issues at the funder level.

“The single most productive act to enhance policy implementation may be to empower and support Program Officers,” says the author.

“This could be achieved through training and support of individual POs, through the creation of a group of internal experts who can support others, or via provision of external support, for instance, by expanding the services provided by the pilot project into an ongoing support mechanism for both internal staff and grantees.”

The findings of the pilot project also highlight the importance of language barriers and the need for better-suited data management platforms and tools.

Furthermore, the study identified a gap between the understanding of “data” amongst cultures, pointing out that the concept of data is “part of a western scientific discourse which may be both incompatible with other cultures, particularly indigenous knowledge systems.”

In conclusion, the research article outlines a set of recommendations for funders, particularly those with a focus on development, as well as recommendations specific to the IDRC.

Original source:

Neylon C (2017) Building a Culture of Data Sharing: Policy Design and Implementation for Research Data Management in Development Research. Research Ideas and Outcomes 3: e21773. https://doi.org/10.3897/rio.3.e21773

The first microbial supertree from figure-mining thousands of papers

While recent reports reveal the existence of more than 114,000,000 documents of published scientific literature, finding a way to improve the access to this knowledge and efficiently synthesise it becomes an increasingly pressing issue.

Seeking to address the problem through their PLUTo workflow, British scientists Ross Mounce and Peter Murray-Rust, University of Cambridge, and Matthew Wills, University of Bath, have made the world’s first attempt at automated supertree construction using data extracted exclusively by machines from published figure images. Their results are published in the open science journal Research Ideas and Outcomes (RIO).

For their study, the researchers picked the International Journal of Systematic and Evolutionary Microbiology (IJSEM) – the sole repository hosting all new validly described prokaryote taxa and, therefore, an excellent choice against which to test systems for the automated and semi-automated synthesis of published phylogenies. According to the authors, IJSEM publishes a greater number of phylogenetic tree figure images a year than any other journal.

An eleven-year span of articles dating back to January 2003 was systematically downloaded so that all image files of phylogenetic tree figures could be extracted for analysis. Computer vision techniques then allowed the images to be converted automatically back into re-usable, computable phylogenetic data, which were in turn used for a formal supertree synthesis of all the evidence.
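To give a rough idea of what a supertree synthesis step works with, the sketch below shows one common way of combining source trees with only partly overlapping taxa: matrix representation (Baum-Ragan coding), where each clade of each source tree becomes one column of a taxon-by-character matrix. This is an illustration only. The announcement does not say which supertree method the authors used, and the function names, the nested-tuple tree format and the example taxa A to D are all hypothetical.

```python
# Minimal sketch of matrix representation (Baum-Ragan coding) of source trees.
# Assumption: trees are nested tuples with taxon names (str) as leaves.
# This is NOT taken from the PLUTo workflow; it only illustrates the idea.

def leaves(tree):
    """Return the set of taxon names found under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all internal nodes below the root, in preorder."""
    found = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def mrp_matrix(trees):
    """Encode each clade as a column: '1' inside the clade, '0' outside it,
    '?' when the taxon is absent from that source tree."""
    all_taxa = sorted(frozenset().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        tree_taxa = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in tree_taxa:
                    rows[taxon].append("?")
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    # Two hypothetical source trees sharing taxa B and C.
    source_trees = [(("A", "B"), "C"), (("B", "C"), "D")]
    for taxon, row in mrp_matrix(source_trees).items():
        print(taxon, row)
```

A matrix of this kind would then be passed to a separate tree-search program to infer the supertree; that step is outside the scope of this sketch.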

During their research, the scientists had to overcome various challenges posed by copyrights formally covering almost all of the documents they needed to mine for the purpose of their work. At this point, they faced quite a paradox – while easy access and re-use of data published in scientific literature is generally supported and strongly promoted, common copyright practices make it difficult for a scientist to be confident when incorporating previously compiled data into their own work. The authors discuss recent changes to UK copyright law that have allowed for their work to see the light of day. As a result, they provide their output as facts, and assign them to the public domain by using the CC0 waiver of Creative Commons, to enable worry-free re-use by anyone.

“We are now at the stage where no individual has the time to read even just the titles of all published papers, let alone the abstracts,” comment the authors.

“We believe that machines are now essential to enable us to make sense of the stream of published science, and this paper addresses several of the key problems inherent in doing this.”

“We have deliberately selected a subsection of the literature (limited to one journal) to reduce the volume, velocity and variety, concentrating primarily on validity. We ask whether high-throughput machine extraction of data from the semistructured scientific literature is possible and valuable.”

Original source:

Mounce R, Murray-Rust P, Wills M (2017) A machine-compiled microbial supertree from figure-mining thousands of papers. Research Ideas and Outcomes 3: e13589. https://doi.org/10.3897/rio.3.e13589

Additional information:

The research has been funded by the BBSRC (grant BB/K015702/1 awarded to MAW and supporting RM).

Legitimacy of reusing images from scientific papers addressed

It goes without saying that scientific research has to build on previous breakthroughs and publications. However, it feels quite counter-intuitive for data and their re-use to be legally restricted. Yet, that is what happens when copyright restrictions are placed on many scientific papers.

The discipline of taxonomy is highly reliant on previously published photographs, drawings and other images as biodiversity data. Inspired by the uncertainty among taxonomists, a team, representing both taxonomists and experts in rights and copyright law, has traced the role and relevance of copyright when it comes to images with scientific value. Their discussion and conclusions are published in the latest paper added in the EU BON Collection in the open science journal Research Ideas and Outcomes (RIO).

Taxonomic papers, by definition, cite a large number of previous publications, for instance, when comparing a new species to closely related ones that have already been described. Often it is necessary to use images to demonstrate characteristic traits and morphological differences or similarities. In this role, the images are best seen as biodiversity data rather than artwork. According to the authors, this puts them outside the scope, purposes and principles of copyright. Moreover, such images are most useful when they are presented in a standardized fashion, and so lack the artistic creativity that would otherwise make them ‘copyrightable works’.

“It follows that most images found in taxonomic literature can be re-used for research or many other purposes without seeking permission, regardless of any copyright declaration,” says Prof. David J. Patterson, affiliated with both Plazi and the University of Sydney.

Nonetheless, the authors point out that, “in observance of ethical and scholarly standards, re-users are expected to cite the author and original source of any image that they use.” Such practice is “demanded by the conventions of scholarship, not by legal obligation,” they add.

However, the authors underline that there are actual copyrightable visuals, which might also make their way to a scientific paper. These include wildlife photographs, drawings and artwork produced in a distinctive individual form and intended for other than comparative purposes, as well as collections of images, qualifiable as databases in the sense of the European Protection of Databases directive.

In their paper, the scientists also provide an updated version of the Blue List, originally compiled in 2014 and comprising the copyright exemptions applicable to taxonomic works. In their Extended Blue List, the authors expand the list to include five extra items relating specifically to images.

“Egloff, Agosti, et al. make the compelling argument that taxonomic images, as highly standardized ‘references for identification of known biodiversity,’ by necessity, lack sufficient creativity to qualify for copyright. Their contention that ‘parameters of lighting, optical and specimen orientation’ in biological imaging must be consistent for comparative purposes underscores the relevance of the merger doctrine for photographic works created specifically as scientific data,” comments Ms. Gail Clement, Head of Research Services at the Caltech Library, on the publication.

“In these cases, the idea and expression are the same and the creator exercises no discretion in complying with an established convention. This paper is an important contribution to the literature on property interests in scientific research data – an essential framing question for legal interoperability of research data,” she adds.

Original source:

Egloff W, Agosti D, Kishor P, Patterson D, Miller J (2017) Copyright and the Use of Images as Biodiversity Data. Research Ideas and Outcomes 3: e12502. https://doi.org/10.3897/rio.3.e12502

Additional information:

The present study is a research outcome of the European Union’s FP7-funded project EU BON, grant agreement No 308454.