Open-source collaborative platform to collect content from over 350 institutions’ archives

Because no single existing institution has the technical and financial capacity to run a platform that efficiently archives the web, a team of American researchers has come up with an innovative solution, submitted to the U.S. Institute of Museum and Library Services (IMLS) and published in the open-access journal Research Ideas and Outcomes (RIO).

They propose a lightweight, open-source collaborative collection development platform, called Cobweb, to support the creation of comprehensive web archives by coordinating the independent activities of the web archiving community. By sharing the responsibility among many institutions, the aggregator service is intended to provide a large amount of continuously updated content faster and with less effort.

In their proposal, the authors, from the California Digital Library, the UCLA Library, and Harvard Library, give the example of a fast-developing news event such as the Arab Spring, which unfolded online simultaneously through news reports, videos, blogs, and social media.

“Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites,” explain the researchers. “Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any curator or institution.”

“Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities.”
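The nominate-and-claim workflow described in these quotes can be pictured as a small data model. The sketch below is purely illustrative; the class and field names are hypothetical and do not reflect Cobweb's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the workflow described above: a curator
# opens a project, the community nominates seed URLs, and archiving
# institutions claim the subsets they intend to capture.

@dataclass
class Nomination:
    url: str            # nominated web site (seed)
    nominated_by: str   # scholar, subject specialist, member of the public, ...
    rationale: str = ""

@dataclass
class Claim:
    institution: str    # archiving institution claiming responsibility
    urls: List[str]     # portion of the seed list it commits to capture

@dataclass
class CobwebProject:
    title: str
    curator: str
    nominations: List[Nomination] = field(default_factory=list)
    claims: List[Claim] = field(default_factory=list)

    def unclaimed_seeds(self) -> List[str]:
        """Seeds that have been nominated but not yet claimed by any institution."""
        claimed = {u for c in self.claims for u in c.urls}
        return [n.url for n in self.nominations if n.url not in claimed]
```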

Unlike existing tools that support some level of collaborative collecting, the proposed Cobweb service would form a single integrated system.

“As a centralized catalog of aggregated collection and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities,” elaborate the authors. “Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for Archive-It to retrieve holdings information for over 3,500 collections from 350 institutions.”
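The proposal mentions retrieving holdings information through APIs being developed for Archive-It, but the article does not specify the endpoints involved. The snippet below is only a sketch of what paging through collection-level metadata for aggregation might look like; the URL is a placeholder, not a documented Archive-It API route.

```python
import requests

# Placeholder endpoint: the real Archive-It API routes are not given in the text.
HOLDINGS_API = "https://example.org/archive-it/api/collections"

def fetch_collection_metadata(page_size=100):
    """Page through collection-level holdings metadata for aggregation."""
    page = 1
    while True:
        resp = requests.get(HOLDINGS_API, params={"page": page, "limit": page_size})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:          # empty page means we have reached the end
            break
        yield from batch
        page += 1
```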

If funded, the platform will be hosted by the California Digital Library and initialized with collection metadata from the partners and other stakeholder groups. The project is planned to take a year; halfway through, the partners will present a release to the global web archiving community at the April 2017 IIPC General Assembly to gather feedback and discuss ongoing sustainability. They also plan to organize public webinars and workshops focused on creating an engaged user community.

###

Original source:

Abrams S, Goethals A, Klein M, Lack R (2016) Cobweb: A Collaborative Collection Development Platform for Web Archiving. Research Ideas and Outcomes 2: e8760. doi: 10.3897/rio.2.e8760

Roadmap: Global research data management advisory platform combines DMPTool and DMPonline

Roadmap, a global data management advisory platform that links data management plans (DMPs) to other components of the research lifecycle, is a new open science initiative from partners at the University of California Curation Center (UC3) of the California Digital Library (CDL), USA, and the Digital Curation Centre (DCC), United Kingdom.

Both organizations already sponsor and maintain such platforms, the DMPTool and DMPonline respectively. These tools allow researchers from around the world to create data management plans in less time by employing ready-to-use templates with guidance tailored to the requirements of specific funding agencies in the USA and the UK.
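As a rough illustration of what such template-driven plan creation involves, the sketch below pairs each funder-specific question with its tailored guidance and treats a plan as the researcher's answers to those questions. The structure and names are hypothetical, not taken from DMPTool or DMPonline.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of template-driven DMP creation: a funder-specific
# template pairs each required question with tailored guidance, and a plan
# is the researcher's answers keyed by those questions.

@dataclass
class FunderTemplate:
    funder: str                 # name of the funding agency (illustrative)
    questions: Dict[str, str]   # question -> funder-specific guidance text

@dataclass
class DataManagementPlan:
    title: str
    template: FunderTemplate
    answers: Dict[str, str]

    def missing_sections(self) -> List[str]:
        """Template questions that have not been answered yet."""
        return [q for q in self.template.questions if q not in self.answers]
```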

Recently, the proliferation of data sharing policies throughout the world has produced increasing demand for data management planning support from both organizations. Therefore, it makes sense for the CDL and DCC to consolidate efforts and move beyond a focus on national researchers and funders to extend their global outreach through Roadmap, a new open-source platform for data management planning. Their proposal was submitted to the Open Science Prize contest and is now published in the open access journal Research Ideas and Outcomes (RIO).

While the two teams have been working together unofficially and engaging in international initiatives, a formal partnership would signal to the global research community that there is one place to create DMPs and find advisory information.

“Research data management (RDM) that enables open science is now acknowledged as a global challenge: research is global, policies are becoming global, and thus the need is global,” explain the authors. “Open science has a global agenda, and by making DMPs true infrastructure in a global open access community we will elevate research and open data for reuse.”


In their joint project, the two organizations will combine their experience and all existing DMP functionality from their respective tools into a single technical platform.

“New work on our respective systems is already underway to enable internationalization, integrate with other organizations and technical platforms, and encourage greater openness with DMPs,” they explain. “By joining forces, the Roadmap system will consolidate these efforts and move beyond a narrow focus on specific funders in specific countries, and even beyond institutional boundaries, to create a framework for engaging with disciplinary communities directly.”

To facilitate data sharing, reuse, and discoverability, Roadmap will be integrated with a number of platforms, including the Open Science Framework, SHARE, the Crossref/DataCite DOI Event Tracking system, and Zenodo. “Linking systems and research outputs across the web increases the chances that data will be discovered, accessed, and (re)used,” note the authors.

The team’s plan for enhanced openness includes encouraging authors to share their newly created data management plans by setting their visibility to “public” by default. They also intend to assign digital object identifiers (DOIs) to all plans, thus making them citable and motivating their authors to make them openly accessible. As part of this initiative, five researchers have just published their DMPs, created with the DMPTool, in Research Ideas and Outcomes (RIO).
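A minimal sketch of these openness defaults is given below: a plan is registered as public unless its author opts out, and a DOI is requested so the plan can be cited. The minting endpoint and record layout are placeholders, not the actual Roadmap, DataCite, or DMPTool interfaces.

```python
import requests

# Placeholder minting service; the real DOI registration workflow is not
# described in the article.
DOI_MINTING_API = "https://example.org/doi-service/mint"

def publish_plan(plan_id: str, metadata: dict, private: bool = False) -> dict:
    """Register a DMP, defaulting its visibility to public, and request a DOI."""
    record = {
        "plan_id": plan_id,
        "visibility": "private" if private else "public",  # public by default
        "metadata": metadata,
    }
    resp = requests.post(DOI_MINTING_API, json=record)
    resp.raise_for_status()
    return resp.json()  # expected to contain the assigned DOI
```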

“We see greater potential for the DMP as a dynamic checklist for pre- and post-award reporting; a manifest of research products that can be linked with published outputs; and a record of data, from primary through processing stages, that could be passed to repositories,” state the authors. “The DMP will therefore not only support the management of the data but boost its discoverability and reuse.”

###

Original source:

Simms S, Jones S, Ashley K, Ribeiro M, Chodacki J, Abrams S, Strong M (2016) Roadmap: A Research Data Management Advisory Platform. Research Ideas and Outcomes 2: e8649. doi: 10.3897/rio.2.e8649

One place for all scholarly literature: An Open Science Prize proposal

Openly accessible scholarly literature is referred to as “the fabric and the substance of Open Science” in this small grant proposal, submitted to the Open Science Prize contest and published in the open access journal Research Ideas and Outcomes (RIO). At present, however, scholarly literature is chaotically dispersed across thousands of different websites and disconnected from its context.

To tackle this issue, authors Marcin Wojnarski (Paperity, Poland) and Debra Hanken Kurtz (DuraSpace, USA) build on the existing prototype, Paperity, the first open access aggregator of scholarly journals. They propose the first global universal catalog of open access scientific literature, intended to bring together all publications by automatically harvesting both “gold” and “green” ones.

Called Paperity Central, the catalog is also to incorporate a number of useful features, including a wiki-style interface that allows registered users to manually improve, curate, and extend the catalog in a collaborative, community-controlled way.

“Manual curation will be particularly important for “green” metadata, which frequently contain missing or incorrect information; and for cataloguing those publications that are inaccessible for automatic harvesting, like the articles posted on author homepages only,” further explain the authors.

To improve on its predecessor, the planned catalog is to seamlessly add “green” publications from across repositories to the already available articles indexed from gold and hybrid journals. Paperity Central is to derive its initial “green” content from DSpace, the most popular repository platform worldwide, developed and stewarded by DuraSpace and powering over 1,500 academic repositories around the world.
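The article does not say how this harvesting will be implemented. One plausible route, since DSpace repositories commonly expose an OAI-PMH interface, is standard OAI-PMH harvesting; the sketch below shows that pattern with a placeholder repository URL, and is not drawn from the Paperity Central design itself.

```python
import requests
import xml.etree.ElementTree as ET

# Sketch of harvesting "green" metadata from a repository over OAI-PMH.
# The base URL is a placeholder for an actual DSpace OAI-PMH endpoint.
OAI_BASE = "https://repository.example.edu/oai/request"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest_records(metadata_prefix="oai_dc"):
    """Yield metadata records, following OAI-PMH resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(OAI_BASE, params=params).content)
        for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
            yield record
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break
        # subsequent requests carry only the resumption token
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```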

All items available from Paperity Central are to be assigned globally unique permanent identifiers, reconnecting them to their primary source of origin. Moreover, all the different types of Open Science resources related to a publication, such as author profiles, institutions, funders, grants, datasets, protocols, reviews, and cited or citing works, are to be semantically linked, to ensure that none of them is disconnected from its context.

Furthermore, the catalog is to deduplicate entries in a systematic and consistent way, and these corrections and expansions are to be transferred back to the source repositories in a feedback loop via open application programming interfaces (APIs). Being developed from scratch, its code will also have a number of distinct features setting it apart from existing wiki-type platforms such as Wikipedia.
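The proposal does not describe the deduplication algorithm itself. The sketch below is one common approach, offered purely for illustration: key each record on a lowercased DOI when available, fall back to a normalized title otherwise, and merge records that share a key.

```python
import re

def _key(record: dict) -> str:
    """Illustrative matching key: DOI if present, otherwise a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return "title:" + title

def deduplicate(records):
    """Merge catalog records that resolve to the same key."""
    merged = {}
    for rec in records:
        key = _key(rec)
        if key in merged:
            # keep existing values, fill in fields the earlier record lacked
            merged[key].update({k: v for k, v in rec.items() if k not in merged[key]})
        else:
            merged[key] = dict(rec)
    return list(merged.values())
```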

“Every entry will consist of structured data, unlike Wikipedia pages which are basically text documents,” explain the scientists. “The catalog itself will possess internal structure, with every item being assigned to higher-level objects: journals, repositories, collections – unlike Wikipedia, where the corpus is a flat list of articles.”
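The structure the authors contrast with Wikipedia's flat page list might look roughly like the sketch below: every entry is structured data assigned to a higher-level object such as a journal, repository, or collection. The names are hypothetical, not the project's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Container:
    kind: str        # "journal", "repository", or "collection"
    name: str

@dataclass
class CatalogEntry:
    title: str
    authors: List[str]
    year: int
    doi: Optional[str]
    container: Container   # higher-level object this entry is assigned to
```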

To guarantee the correctness of the catalog, Paperity Central is to be fully transparent, meaning the history of changes will be made public. Meanwhile, edits are to be moderated by peers, preferably journal editors or institutional repository administrators overseeing the items assigned to their collections.

In their proposal, the authors note that the present development plan covers only the first phase of the project, and they outline the areas where the catalog is to be further enhanced in the future. Among others, these include the involvement of more repositories and platforms, fully developed custom APIs, and an expansion of the scholarly output types included in the catalog.

“If we are serious about opening up the system of scientific research, we must plant it on the foundation of open literature and make sure that this literature is properly organized and maintained: accessible for all in one central location, easily discoverable, available within its full context, annotated and semantically linked with related objects,” explain the scientists.

“Assume we want to find all articles on Zika published in 2015,” they exemplify. “We can find some of them today using services like Google Scholar or PubMed Central, but how do we know that no others exist? Or that we have not missed any important piece of literature? With the existing tools, which have incomplete and undefined coverage, we do not know and will never know for sure.”
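With a complete, structured catalog, the authors' example query reduces to a simple filter over well-defined fields. The sketch below assumes a hypothetical in-memory catalog of record dictionaries; the field names are illustrative only.

```python
def find_articles(catalog, topic: str, year: int):
    """Return entries from a given year whose title or keywords mention the topic."""
    topic = topic.lower()
    return [
        entry for entry in catalog
        if entry.get("year") == year
        and (topic in entry.get("title", "").lower()
             or any(topic in kw.lower() for kw in entry.get("keywords", [])))
    ]

# Example usage with a hypothetical catalog:
# zika_2015 = find_articles(catalog, "Zika", 2015)
```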

In the spirit of their principles of openness, the authors assure that, once funded, Paperity Central will release its code as open source under an open license.

###

Original source:

Wojnarski M, Hanken Kurtz D (2016) Paperity Central: An Open Catalog of All Scholarly Literature. Research Ideas and Outcomes 2: e8462. doi: 10.3897/rio.2.e8462