Open-source collaborative platform to collect content from over 350 institutions’ archives

Because no single institution currently has the technical and financial capacity to archive the web efficiently, a team of American researchers has come up with an innovative solution, submitted to the U.S. Institute of Museum and Library Services (IMLS) and published in the open-access journal Research Ideas and Outcomes (RIO).

They propose a lightweight, open-source collaborative collection development platform, called Cobweb, to support the creation of comprehensive web archives by coordinating the independent activities of the web archiving community. By sharing responsibility among various institutions, the aggregator service is meant to provide a large amount of continuously updated content faster and with less effort.

In their proposal, the authors from the California Digital Library, the UCLA Library, and Harvard Library give the example of a fast-developing news event such as the Arab Spring, which unfolded online simultaneously via news reports, videos, blogs, and social media.

“Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites,” explain the researchers. “Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any curator or institution.”

“Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities.”
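The nominate-and-claim workflow described in the quotes above can be illustrated with a minimal Python sketch. It is not taken from the proposal: the `Project` class, its method names, the institution name, and the example URLs are all hypothetical, chosen only to show how an open site list and public claims might fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Project:
    """A hypothetical Cobweb-style collecting project (illustrative only)."""
    title: str
    # Seed URLs nominated by scholars, the public, or event participants.
    nominations: Set[str] = field(default_factory=set)
    # Maps each claimed URL to the institutions that have claimed it.
    claims: Dict[str, Set[str]] = field(default_factory=dict)

    def nominate(self, url: str) -> None:
        """Add a web site to the project's open site list."""
        self.nominations.add(url)

    def claim(self, institution: str, url: str) -> None:
        """Record that an institution takes responsibility for capturing a site."""
        if url not in self.nominations:
            raise ValueError(f"{url} has not been nominated")
        self.claims.setdefault(url, set()).add(institution)

    def unclaimed(self) -> List[str]:
        """Return nominated sites that no institution has claimed yet."""
        return sorted(self.nominations - set(self.claims))


# Made-up example data, not from the proposal.
project = Project("Arab Spring")
project.nominate("https://example.org/news")
project.nominate("https://example.org/blog")
project.claim("Library A", "https://example.org/news")
print(project.unclaimed())
```

In this sketch, the list of unclaimed sites is what would show curators where coverage is still missing.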

Unlike existing tools, which support some level of collaborative collecting, the proposed Cobweb service will form a single integrated system.

“As a centralized catalog of aggregated collection and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities,” elaborate the authors. “Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for Archive-It to retrieve holdings information for over 3,500 collections from 350 institutions.”

If funded, the platform will be hosted by the California Digital Library and initialized with collection metadata from the partners and other stakeholder groups. The project is planned to take a year; halfway through, the partners will share a release with the global web archiving community at the April 2017 IIPC General Assembly to gather feedback and discuss ongoing sustainability. They also plan to organize public webinars and workshops focused on creating an engaged user community.

###

Original source:

Abrams S, Goethals A, Klein M, Lack R (2016) Cobweb: A Collaborative Collection Development Platform for Web Archiving. Research Ideas and Outcomes 2: e8760. doi: 10.3897/rio.2.e8760

Making the most out of biological observations data

Creating and maintaining a biodiversity data collection has been a much-needed worldwide exercise for years, yet there is no single standard on how to do this. This has led to a myriad of datasets that are often incompatible with each other. To make the most out of biodiversity data and to ensure that its use for environmental monitoring and conservation is both easy and legal, the FP7-funded EU project Building the European Biodiversity Observation Network (EU BON) published recommendations that provide consistent Europe-wide Data Publishing Guidelines and Recommendations in the EU BON Biodiversity Portal.

The report “Data Policy Recommendations for Biodiversity Data. EU BON Project Report”, featured in the Research Ideas & Outcomes (RIO) journal, is the first contribution to a pioneering, comprehensive compilation of project outputs that takes advantage of RIO’s unique option to publish collections of project results.

Biodiversity data and information provide important knowledge for many biological, geological, and environmental research disciplines. Additionally, they are crucial for the development of strong environmental policies and the management of natural resources. Information management systems can bring together a wealth of information and a legacy of over 260 years of biological observations which are now dispersed in a myriad of different documents, institutions, and locations.

EU BON aims to build a comprehensive “European Biodiversity Portal” that will incorporate currently scattered Europe-wide biodiversity data, while at the same time helping to realize a substantial part of the worldwide Group on Earth Observations Biodiversity Observation Network (GEO BON). To achieve this ambitious plan, EU BON identifies the strong need for a coherent and consistent data policy in Europe to increase interoperability of data and make its re-use both easy and legal.

“Biodiversity data and information should not be treated as commercial goods, but as a common resource for the whole human society. The EU BON data sharing agreement is an important step in this direction,” comments the lead author of the report Dr. Willi Egloff from Plazi, Switzerland.

In its report, the EU BON project analyzes the available individual recommendations and guidelines on different topics. On this basis, the report provides structured guidelines for legislators, researchers, data aggregators, funding agencies and publishers to consider in providing standardized, easy-to-find, re-shareable and re-usable biodiversity data.

“We are extremely happy that EU BON is among the first to take advantage of our project outputs collections option in RIO. The first report they are publishing with us deals with issues of opening up data, and digitizing and collecting scientific knowledge, all close to RIO’s mission to open up the research process and promote open science,” says Prof. Lyubomir Penev, Founder and Publisher of RIO.

###

Original source:

Egloff W, Agosti D, Patterson D, Hoffmann A, Mietchen D, Kishor P, Penev L (2016) Data Policy Recommendations for Biodiversity Data. EU BON Project Report. Research Ideas and Outcomes 2: e8458. doi: 10.3897/rio.2.e8458

About EU BON:

EU BON stands for “Building the European Biodiversity Observation Network” and is a European research project, financed by the 7th EU framework programme for research and development (FP7). EU BON seeks ways to better integrate biodiversity information and implement it in policy and decision-making on biodiversity monitoring and management in the EU.

Openly published Open Science Prize Grant Proposal builds on ContentMine and Hypothes.is to bridge scientists and facts

Public health emergencies such as the currently spreading Zika disease may be successfully making the case for open access to the available biomedical research and its underlying data. Yet the difficulty of filtering the right information, so that it lands in the hands of the right people, is what keeps professionals from taking adequate measures.

Submitted to the Open Science Prize contest, the present grant proposal, prepared through the joint efforts of scientists affiliated with Hypothes.is, ContentMine, University of Cambridge, Cottage Labs LLP and Imperial College London, suggests a new scholarly assistant system, called amanuens.is, based on the existing ContentMine and Hypothes.is prototypes. Its aim is to combine machines and humans, so that mining critically important facts and making them available to the world becomes not only significantly faster, but also less costly. Through their publication in the open access journal Research Ideas and Outcomes (RIO), the scientists, who are also well-known open access and open data proponents, are looking for further support, feedback and collaborations.

While Hypothes.is is a mixture of software and communities, which together annotate the available literature, ContentMine are building an open source pipeline to extract facts from scientific documents, thus making the literature review process cheaper, more rigorous, continuous and transparent. amanuens.is is meant to bring these two systems together.

As a result, Hypothes.is is to display ContentMine facts as annotations on the online document, thereby increasing their visibility. In turn, the large Hypothes.is community, comprising users ranging from devoted and experienced Wikipedia editors to dedicated citizen scientists, would be able to add their own annotations manually, which could then be fed back into the ContentMine facts store.

“Facts are important – but science is performed by people – so ContentMine are partnering with Hypothes.is to bring communities together around facts in the scholarly literature,” sums up Dr Peter Murray-Rust. “Through combining machines and humans in a tight, iterating, loop, amanuens.is will be able to mine critically important facts and make them available to the world.”

In their proposal, the authors give a hypothetical yet foreseeable example of a Hypothes.is community centered on research and discussions regarding a bacterium, already proven to keep some mosquitoes from transmitting various viruses, and its potential use against Zika. There, amanuens.is downloads all open access papers on Zika from a multitude of sources within 3 minutes. In a matter of a couple of seconds, a total of 123 files are downloaded. Then, amanuens.is delivers a table of the extracted data, including species, human genes, DNA primers and top word frequencies.
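To give a rough idea of what the “top word frequencies” column of such a table involves, here is a minimal Python sketch. It is not ContentMine’s or amanuens.is’s actual code: the function name, the tiny stop-word list, and the two sample sentences are assumptions made up purely for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "by"}


def top_word_frequencies(texts, n=3):
    """Count words across all texts and return the n most frequent (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic runs only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Made-up sample sentences standing in for downloaded papers.
papers = [
    "Zika virus is transmitted by mosquitoes.",
    "A bacterium limits virus transmission by mosquitoes.",
]
print(top_word_frequencies(papers, n=2))
```

Run on the two sample sentences, the sketch ranks “virus” and “mosquitoes” highest, each appearing twice.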

Within the community, and thanks to the literature made available via ContentMine, users would be able to collaborate and build on the existing research outcomes. As a result, it could take only fifteen minutes and a brief proposal to mobilise the related scholarly resources and test for Zika resistance in mosquitoes infected with the virus.

“Finding facts to finding people took 15 minutes and this is how modern collaborative science should work,” Prof Peter Murray-Rust says about the given example. “The people then create knowledge from the facts. The knowledge creates communities. The communities explore science- and people-based solutions.”

In conclusion, the proposal states that, like the content and software provided by ContentMine and Hypothes.is, the outputs produced by amanuens.is will also be openly available. All of its data and annotations are to be in the public domain under a CC0 waiver.

###

Original source:

Martone M, Murray-Rust P, Molloy J, Arrow T, MacGillivray M, Kittel C, Kasberger S, Steel G, Oppenheim C, Ranganathan A, Tennant J, Udell J (2016) ContentMine/Hypothes.is Proposal. Research Ideas and Outcomes 2: e8424. doi: 10.3897/rio.2.e8424