Pilot Project provides findings and advice on data sharing in development research

Launched just ahead of last year’s FORCE11 conference, devoted to harnessing technological and open science advances for scholarly communication, the data sharing pilot project of Prof. Cameron Neylon, Centre for Culture and Technology, Curtin University, Australia, and his collaborators has published its final outcomes days before FORCE2017.

The project made use of RIO Journal, an innovative open science journal that allows various research outcomes to be published and sorted into dedicated collections. The collection, “Exploring the opportunities and challenges of implementing open research strategies within development institutions: A project of the International Development Research Center”, now features the project’s grant proposal as approved by the IDRC, a review article, several data management plans and case studies, and the final report as a research article.

The pilot project looked into the current state and challenges of data management and sharing policies and practices, as shown by case studies of seven IDRC-funded development research projects.

The data sharing initiative worked for 16 months with projects selected from across a range of geographies, scales and subjects. It began by introducing the projects to data management and sharing concepts and helping them develop their own data management plans, and then carefully monitored the implementation of those plans.

Over the course of the project, it became apparent that simply developing and implementing funder policies is not enough to change research culture. The question of how funder policy and implementation could support culture change both within research communities and within the funder itself became the focus of the initiative.

Data management plans have become a mandatory part of grant submission for many funders. However, they are often not utilised by researchers or funders later in the project, becoming “at best neutral and likely counter productive in supporting change in research culture.”

The pilot project identified a number of significant bottlenecks, within both research institutions and grantee teams, that impede efficient data sharing, including the expected lack of resources and expertise. The researchers, however, specifically point to structural issues at the funder level.

“The single most productive act to enhance policy implementation may be to empower and support Program Officers,” says the author.

“This could be achieved through training and support of individual POs, through the creation of a group of internal experts who can support others, or via provision of external support, for instance, by expanding the services provided by the pilot project into an ongoing support mechanism for both internal staff and grantees.”

Amongst the findings of the pilot project are also the significance of language barriers and the need for better-suited data management platforms and tools.

Furthermore, the study identified a gap in how “data” is understood across cultures, pointing out that the concept of data is “part of a western scientific discourse which may be both incompatible with other cultures, particularly indigenous knowledge systems.”

In conclusion, the research article outlines a set of recommendations for funders, particularly those with a focus on development, as well as recommendations specific to the IDRC.

Original source:

Neylon C (2017) Building a Culture of Data Sharing: Policy Design and Implementation for Research Data Management in Development Research. Research Ideas and Outcomes 3: e21773. https://doi.org/10.3897/rio.3.e21773

The first microbial supertree from figure-mining thousands of papers

With recent reports revealing the existence of more than 114,000,000 documents of published scientific literature, finding ways to improve access to this knowledge and synthesise it efficiently is an increasingly pressing issue.

Seeking to address the problem through their PLUTo workflow, British scientists Ross Mounce and Peter Murray-Rust of the University of Cambridge, together with Matthew Wills of the University of Bath, have performed the world’s first attempt at automated supertree construction using data extracted exclusively by machines from published figure images. Their results are published in the open science journal Research Ideas and Outcomes (RIO).

For their study, the researchers picked the International Journal of Systematic and Evolutionary Microbiology (IJSEM) – the sole journal of record for all new validly described prokaryote taxa and, therefore, an excellent choice against which to test systems for the automated and semi-automated synthesis of published phylogenies. According to the authors, IJSEM publishes more phylogenetic tree figure images a year than any other journal.

An eleven-year span of articles dating back to January 2003 was systematically downloaded so that all image files of phylogenetic tree figures could be extracted for analysis. Computer vision techniques then automatically converted the images back into re-usable, computable phylogenetic data, which were used for a formal supertree synthesis of all the evidence.

During their research, the scientists had to overcome various challenges posed by the copyright formally covering almost all of the documents they needed to mine. Here they faced quite a paradox: while easy access to and re-use of data published in the scientific literature is generally supported and strongly promoted, common copyright practices make it difficult for a scientist to be confident when incorporating previously compiled data into their own work. The authors discuss recent changes to UK copyright law that have allowed their work to see the light of day. As a result, they provide their outputs as facts and assign them to the public domain using the Creative Commons CC0 waiver, enabling worry-free re-use by anyone.

“We are now at the stage where no individual has the time to read even just the titles of all published papers, let alone the abstracts,” comment the authors.

“We believe that machines are now essential to enable us to make sense of the stream of published science, and this paper addresses several of the key problems inherent in doing this.”

“We have deliberately selected a subsection of the literature (limited to one journal) to reduce the volume, velocity and variety, concentrating primarily on validity. We ask whether high-throughput machine extraction of data from the semistructured scientific literature is possible and valuable.”  

Original source:

Mounce R, Murray-Rust P, Wills M (2017) A machine-compiled microbial supertree from figure-mining thousands of papers. Research Ideas and Outcomes 3: e13589. https://doi.org/10.3897/rio.3.e13589

Additional information:

The research has been funded by the BBSRC (grant BB/K015702/1 awarded to MAW and supporting RM).

Legitimacy of reusing images from scientific papers addressed

It goes without saying that scientific research has to build on previous breakthroughs and publications. It is therefore quite counter-intuitive for data and their re-use to be legally restricted. Yet that is what happens when copyright restrictions are placed on many scientific papers.

The discipline of taxonomy relies heavily on previously published photographs, drawings and other images as biodiversity data. Prompted by the uncertainty among taxonomists, a team representing both taxonomists and experts in rights and copyright law has traced the role and relevance of copyright when it comes to images with scientific value. Their discussion and conclusions are published in the latest paper added to the EU BON Collection in the open science journal Research Ideas and Outcomes (RIO).

Taxonomic papers, by definition, cite a large number of previous publications, for instance, when comparing a new species to closely related ones that have already been described. Often it is necessary to use images to demonstrate characteristic traits and morphological differences or similarities. In this role, the images are best seen as biodiversity data rather than artwork. According to the authors, this puts them outside the scope, purposes and principles of copyright. Moreover, such images are most useful when presented in a standardized fashion, and they lack the artistic creativity that would otherwise make them ‘copyrightable works’.

“It follows that most images found in taxonomic literature can be re-used for research or many other purposes without seeking permission, regardless of any copyright declaration,” says Prof. David J. Patterson, affiliated with both Plazi and the University of Sydney.

Nonetheless, the authors point out that, “in observance of ethical and scholarly standards, re-users are expected to cite the author and original source of any image that they use.” Such practice is “demanded by the conventions of scholarship, not by legal obligation,” they add.

However, the authors underline that some genuinely copyrightable visuals might also make their way into a scientific paper. These include wildlife photographs, drawings and artwork produced in a distinctive individual form and intended for purposes other than comparison, as well as collections of images qualifying as databases in the sense of the European Protection of Databases directive.

In their paper, the scientists also provide an updated version of the Blue List, originally compiled in 2014 and comprising the copyright exemptions applicable to taxonomic works. In their Extended Blue List, the authors expand the list to include five extra items relating specifically to images.

“Egloff, Agosti, et al. make the compelling argument that taxonomic images, as highly standardized ‘references for identification of known biodiversity,’ by necessity, lack sufficient creativity to qualify for copyright. Their contention that ‘parameters of lighting, optical and specimen orientation’ in biological imaging must be consistent for comparative purposes underscores the relevance of the merger doctrine for photographic works created specifically as scientific data,” comments on the publication Ms. Gail Clement, Head of Research Services at the Caltech Library.

“In these cases, the idea and expression are the same and the creator exercises no discretion in complying with an established convention. This paper is an important contribution to the literature on property interests in scientific research data – an essential framing question for legal interoperability of research data,” she adds.

###

Original source:

Egloff W, Agosti D, Kishor P, Patterson D, Miller J (2017) Copyright and the Use of Images as Biodiversity Data. Research Ideas and Outcomes 3: e12502. https://doi.org/10.3897/rio.3.e12502

Additional information:

The present study is a research outcome of the European Union’s FP7-funded project EU BON, grant agreement No 308454.

Guidelines for scholarly publishing of biodiversity data from Pensoft and EU BON

Development and implementation of data publishing and sharing practices and tools have long been among the core activities of the academic publisher Pensoft. It is well understood, however, that open data practices in scholarly publishing are still in transition and hence require consistent, collaborative effort to establish.

Based on Pensoft’s experience, and elaborated and updated during the Framework Programme 7 (FP7) EU BON project, a new paper published in the EU BON dedicated collection in the open science journal Research Ideas and Outcomes (RIO) outlines policies and guidelines for scholarly publishing of biodiversity and biodiversity-related data. Newly accumulated knowledge from large-scale international efforts, such as FORCE11 (Future of Research Communication and e-Scholarship), CODATA (The Committee on Data for Science and Technology), RDA (Research Data Alliance) and others, is also included in the guidelines.

The present paper discusses some general concepts, including a definition of datasets, incentives to publish data and licences for data publishing. Furthermore, it defines and compares several routes for data publishing, namely: providing supplementary files to research articles; uploading them on specialised open data repositories, where they are linked to the research article; publishing standalone data papers; or making use of integrated narrative and data publishing through online import/download of data into/from manuscripts, such as the workflow provided by the Biodiversity Data Journal. Among the guidelines, there are also comprehensive instructions on preparation and peer review of data intended for publication.

Although currently available for journals using ARPHA, the journal publishing platform developed by Pensoft, these strategies and guidelines could be of use to anyone interested in biodiversity data publishing.

Apart from paving the way for a whole new approach in data publishing, the present paper is also a fine example of science done in the open, having been published along with its two pre-submission public peer reviews. The reviews by Drs. Robert Mesibov and Florian Wetzel are both citable via their own Digital Object Identifiers (DOIs).

###

Original source:

Penev L, Mietchen D, Chavan V, Hagedorn G, Smith V, Shotton D, Ó Tuama É, Senderov V, Georgiev T, Stoev P, Groom Q, Remsen D, Edmunds S (2017) Strategies and guidelines for scholarly publishing of biodiversity data. Research Ideas and Outcomes 3: e12431. https://doi.org/10.3897/rio.3.e12431

Robot to find and connect medical scientists working on the same research via Open Data

Sharing research data openly, a cornerstone of Open Science, aims to accelerate scientific discovery, which is of particular importance in the case of new medicines and treatments. A grant proposal by an international research team, led by Dr Chase C. Smith of MCPHS University and submitted to the Open Science Prize, suggests the development of what the authors call the SCience INtroDuction Robot (SCINDR). The project’s proposal is available in the open access journal Research Ideas and Outcomes (RIO).

Building on an open source electronic lab notebook (ELN) developed by the same team, the robot would discover and alert scientists from around the world who are working on similar molecules in real time. Finding each other and engaging in open and collaborative research could accelerate and enhance medical discoveries.

Already running and constantly updated, the electronic lab notebook stores researchers’ open data in a machine-readable, openly accessible format. The team’s next step is to adapt the open source notebook to run SCINDR, as exemplified in their prototype.

“The above mentioned ELN is the perfect platform for the addition of SCINDR since it is already acting as a repository of open drug discovery information that can be mined by the robot,” explain the authors.

Once a researcher has their data stored on the ELN, or on any similar open database, for that matter, SCINDR would be able to detect if similar molecules, chemical reactions, biological assays or other features of importance in health research have been entered by someone else. If the robot identifies another scientist looking into similar features, it will suggest introducing the two to each other, so that they could start working together and combine their efforts and knowledge for the good of both science and the public.
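The proposal does not spell out SCINDR’s matching algorithm, but the core idea can be sketched with the standard cheminformatics measure, Tanimoto similarity over molecular fingerprints. To keep the sketch self-contained, the fingerprint below is a deliberately crude one (character n-grams of a SMILES string), and the researchers, entries and threshold are invented for illustration; a real system would use proper structural fingerprints:

```python
# Minimal sketch of similarity-based matching over an open ELN:
# compare a newly entered molecule against existing entries and
# suggest introductions when Tanimoto similarity is high enough.

def fingerprint(smiles, n=3):
    """Hypothetical fingerprint: the set of character n-grams of a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def find_matches(new_entry, database, threshold=0.5):
    """Return (researcher, similarity) pairs worth an introduction."""
    fp_new = fingerprint(new_entry["smiles"])
    hits = []
    for entry in database:
        score = tanimoto(fp_new, fingerprint(entry["smiles"]))
        if score >= threshold and entry["researcher"] != new_entry["researcher"]:
            hits.append((entry["researcher"], round(score, 2)))
    return sorted(hits, key=lambda h: -h[1])

# Invented ELN entries for illustration:
eln = [
    {"researcher": "alice", "smiles": "CC(=O)Oc1ccccc1C(=O)O"},    # aspirin
    {"researcher": "bob",   "smiles": "CCCCCCCCCC"},               # decane
]
new = {"researcher": "carol", "smiles": "CC(=O)Oc1ccccc1C(=O)OC"}  # aspirin methyl ester

print(find_matches(new, eln))  # carol is matched with alice, not bob
```

A production service would run such a comparison continuously as entries arrive, which is what makes the real-time introductions described above possible.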

Because of its ability to parse information and interests from around the globe, the authors liken SCINDR to online advertisements and music streaming services, which have long targeted certain content, based on a person’s writing, reading, listening habits or other search history.

“The potential for automatically connecting relevant people and/or matching people with commercial content currently dominates much of software development, yet the analogous idea of automatically connecting people who are working on similar science in real time does not exist,” stress the authors.

“This extraordinary fact arises in part because so few people work openly, meaning almost all the research taking place in laboratories around the world remains behind closed doors until publication (or in a minority of cases deposition to a preprint server), by which time the project may have ended and researchers have moved on or shelved a project.”

“As open science gathers pace, and as thousands of researchers start to use open records of their research, we will need a way to discover the most relevant collaborators, and encourage them to connect. SCINDR will solve this problem,” they conclude.

The system is intended to be tested initially by a community of researchers known as Open Source Malaria (OSM), a consortium funded to carry out drug discovery and development for new medicines for the treatment of malaria.

###

Original source:

Smith C, Todd M, Patiny L, Swain C, Southan C, Williamson A, Clark A (2016) SCINDR – The SCience INtroDuction Robot that will Connect Open Scientists. Research Ideas and Outcomes 2: e9995. https://doi.org/10.3897/rio.2.e9995

Guiding EU researchers along the ‘last mile’ to Open Digital Science

Striving to address societal challenges in sectors including Health, Energy and the Environment, the European Union is developing the European Open Science Cloud, a complete socio-technical environment, including robust e-infrastructures capable of providing data and computational solutions where publicly funded research data are Findable, Accessible, Interoperable and Re-usable (FAIR).

Since 2007, the European Commission (EC) has invested more than €740 million in e-infrastructures through Horizon 2020 (the European Union Research and Innovation programme, 2014-2020) and FP7 (the European Union’s Seventh Framework Programme for Research and Technological Development). The EC wants to see this investment exploited in full.

Many research communities are, however, struggling to benefit from this investment. The authors call for greater emphasis on Virtual Research Environments (VREs) as the only way for researchers to capitalise on EC advances in networking and high performance computing. The authors characterise this as a “last mile” problem, a term borrowed from telecommunications networks and once coined to emphasise the importance (and difficulty) of connecting the broader network to each customer’s home or office. Without the last mile of connectivity, a network won’t generate a cent of value.

Some concerns around the transition to Open Digital Science relate to attribution and quality assurance, as well as limited awareness of open science and its implications for research. Most difficulties, however, stem from many e-infrastructure services being too technical for most users, lacking easy-to-use interfaces and not integrating easily into day-to-day research practice.

Trustworthy and interoperable Virtual Research Environments (VREs) are software layers that hide technical details and mediate between scientists and computing infrastructures. They serve as friendly environments in which scientists can work with complicated computing infrastructures while using their own concepts, practices and working protocols.

By helping researchers overcome the difficulties noted above, VREs could guide sceptical research communities along the ‘last mile’ towards Open Digital Science, according to an international team of scientists who have published their Policy Brief in the open access journal Research Ideas and Outcomes (RIO).

The authors state “These domain-specific solutions can support communities in gradually bridging technical and socio-cultural gaps between traditional and open digital science practice, better diffusing the benefits of European e-infrastructures”. They also recognise that “different e-infrastructure audiences require different approaches.”

“Intuitive user interface experience, seamless data ingestion, and collaboration capabilities are among the features that could empower users to better engage with provided services,” stress the authors.

###

Original source:

Koureas D, Arvanitidis C, Belbin L, Berendsohn W, Damgaard C, Groom Q, Güntsch A, Hagedorn G, Hardisty A, Hobern D, Marcer A, Mietchen D, Morse D, Obst M, Penev L, Pettersson L, Sierra S, Smith V, Vos R (2016) Community engagement: The ‘last mile’ challenge for European research e-infrastructures. Research Ideas and Outcomes 2: e9933. https://doi.org/10.3897/rio.2.e9933

Sharing biodiversity data: Best tools and practices via the EU-funded project EU BON

With the exponential growth of biodiversity information in recent years, the question of how to mobilize such vast amounts of data has become more pressing than ever. Best practices for data sharing, data publishing, and the involvement of scientific and citizen communities in data generation are the main topics of a recent report by the EU FP7 project Building the European Biodiversity Observation Network (EU BON), published in the innovative Research Ideas & Outcomes (RIO) journal.

The report “Data sharing tools for Biodiversity Observation Networks” provides conceptual and practical advice for implementation of the available data sharing and data publishing tools. A detailed description of tools, their pros and cons, is followed by recommendations on their deployment and enhancement to guide biodiversity data managers in their choices.

“We believe publishing this report in RIO makes a lot of sense given the journal’s innovative concept of publishing unconventional research outcomes such as project reports. This feature provides projects like EU BON with the chance to showcase their results effectively and timely. The report provides a useful practical guide for biodiversity data managers and RIO gives the project an opportunity to share findings with anyone who will make use of such information”, explains Prof. Lyubomir Penev, Managing Director of Pensoft and partner in EU BON.

The new report is the second EU BON contribution featured in a dedicated project outcomes collection in RIO. Together with the data policy recommendations, it provides a comprehensive set of resources for biodiversity data managers and users.

“We did our biodiversity data sharing tools comparison from the perspective of the needs of the biodiversity observation community with an eye on the development of a unified user interface to this data – the European Biodiversity Portal (EBP)”, add the authors.

The scientists have identified two main challenges facing the biodiversity data community. On the one hand, a variety of tools exist, but none can on its own satisfy the requirements of the wide variety of data providers. On the other hand, gaps in data coverage and quality demand more effort in data mobilization.

Envisaged information flows between EU BON and LTER Europe, showing the complexity of sharing biodiversity data (from the 3rd EU BON Stakeholder Roundtable, Granada on 9-11 December 2015).

“For the time being, a combination of tools in a new workflow makes the most sense for EU BON to mobilize biodiversity data,” comment the report authors on their findings. “There is more research to be done and tools to be developed, but for the future there is one firm conclusion and it is that the choice of tools should be defined by the needs of those observing biodiversity – the end user community in the broadest sense – from volunteer scientists to decision makers.”

###

Original Source:

Smirnova L, Mergen P, Groom Q, De Wever A, Penev L, Stoev P, Pe’er I, Runnel V, Camacho A, Vincent T, Agosti D, Arvanitidis C, Bonet F, Saarenmaa H (2016) Data sharing tools adopted by the European Biodiversity Observation Network Project. Research Ideas and Outcomes 2: e9390. https://doi.org/10.3897/rio.2.e9390

About EU BON:

EU BON stands for “Building the European Biodiversity Observation Network” and is a European research project, financed by the 7th EU framework programme for research and development (FP7). EU BON seeks ways to better integrate biodiversity information and implement it into the policy and decision-making of biodiversity monitoring and management in the EU.

Open neuroscience: Collaborative Neuroimaging Lab finalist for the Open Science Prize

Despite the abundance of digital neuroimaging data, shared thanks to funding, data collection and processing efforts, as well as the goodwill of thousands of participants, its analysis still lags behind. As a result, insight into both mental disorders and cognition is compromised.

The Open Neuroimaging Laboratory framework promises a collaborative and transparent platform to optimise both the quantity and quality of this invaluable brain data, ultimately yielding greater insight into both mental disorders and cognition.

The project was submitted for the Open Science Prize competition by Katja Heuer, Max Planck Institute for Human Cognitive and Brain Sciences, Germany, Dr Satrajit S. Ghosh, Massachusetts Institute of Technology (MIT), USA, Amy Robinson Sterling, EyeWire, USA, and Dr Roberto Toro, Institut Pasteur, France. Amongst 96 submissions from all around the globe, it was chosen as one of six teams to compete in the second and final phase of the Prize.

Simply being able to access and download brain magnetic resonance imaging (MRI) data is not enough to reap its potential benefits. To be turned into insight and knowledge, the data also need to be queried, pre-processed and analysed, which requires a substantial amount of human curation, visual quality assessment and manual editing. With such work fragmented across groups, much current effort is redundant and unreliable.

The Open Neuroimaging Laboratory, by contrast, aims to aggregate annotated brain imaging data from various resources, improving its searchability and potential for reuse. It will also develop a tool that helps distributed teams of researchers collaborate on the analysis of this open data in real time.

“Our project will help transform the massive amount of static brain MRI data readily available online into living matter for collaborative analysis,” explain the researchers.

“We will allow a larger number of researchers to have access to this data by lowering the barriers that prevent their analysis: no data will have to be downloaded or stored, no software will have to be installed, and it will be possible to recruit a large, distributed, group of collaborators online.”

“By working together in a distributed and collaborative way, sharing our work and our analyses, we should improve transparency, statistical power and reproducibility,” they elaborate. “Our aim is to provide to everyone the means to share effort, learn from each other, and improve quality of and trust in scientific output.”

Having already developed a functional prototype of the BrainBox web application, which provides an interactive online space for collaborative data analyses and discussions, the team will now turn it into a first version with improved user experience, stability and documentation. For Phase 2 of the Open Science Prize, the team plans to broaden the types of analyses supported and to explore interfaces for database-wide statistical analyses.

In the spirit of the competition, the scientists have decided to release their code open source on GitHub to facilitate bug fixes, extension and maintainability.

###

Original source:

Heuer K, Ghosh S, Robinson Sterling A, Toro R (2016) Open Neuroimaging Laboratory. Research Ideas and Outcomes 2: e9113. https://doi.org/10.3897/rio.2.e9113

Data sharing pilot to report and reflect on data policy challenges via 8 case studies

This week, FORCE2016 is taking place in Portland, USA. The FORCE11 yearly conference is devoted to harnessing technological and open science advancements for a new-age scholarship founded on easily accessible, organised and reproducible research data.

As a practical contribution to the scholarly discourse on new modes of communicating knowledge, Prof. Cameron Neylon, Centre for Culture and Technology, Curtin University, Australia, and collaborators are to publish a series of outputs and outcomes resulting from their ongoing data sharing pilot project in the open access journal Research Ideas and Outcomes (RIO).

Over the course of sixteen months, ending in December 2016, they are to openly publish the project's outputs, starting with the Grant Proposal that was submitted to and accepted for funding by the Canadian International Development Research Centre (IDRC).

The project will collaborate with 8 volunteering IDRC grantees to develop Data Management Plans, and then support and track their implementation. The project expects to submit literature reviews, Data Management Plans, case studies and a final research article to RIO. These will report and reflect on the lessons learnt concerning open data policies in the specific context of development research. The project will thus provide advice on refining open research data policy guidelines.

“The general objective of this project is to develop a model open research data policy and implementation guidelines for development research funders to enable greater access to development research data,” sum up the authors.

“Very little work has been done examining open data policies in the context of development research specifically,” they elaborate. “This project will serve to inform open access to research data policies of development research funders through pilot testing open data management plan guidelines with a set of IDRC grantees.”

The researchers agree that data constitutes a primary form of research output and that it is necessary for research funders to address the issue of open research data in their open access policies. They note that not only should data be publicly accessible and free for re-use, but they need to be “technically open”, which means “available for no more than the cost of reproduction, and in machine-readable and bulk form.” At the same time, research in a development context raises complex issues of what data can be shared, how, and by whom.

“The significance of primary data gathered in research projects across domains is its high potential for not only academic re-use, but its value beyond academic purposes, particularly for governments, SME, and civil society,” they add. “More importantly, the availability of these data provides an ideal opportunity to test the key premise underlying open research data — that when it is made publicly accessible in easily reusable formats, it can foster new knowledge and discovery, and encourage collaboration among researchers and organizations.”

However, such openness also calls for extra diligence and responsibility in sharing, handling and re-using research data. This is particularly the case in development research, where challenging ethical issues come to the fore. Among the issues the authors point out are realistic and cost-effective strategies for funded researchers to collect, manage and store the various types of data resulting from their research, as well as ethical questions such as privacy and rights over the collected data.

###

Original source:

Neylon C, Chan L (2016) Exploring the opportunities and challenges of implementing open research strategies within development institutions. Research Ideas and Outcomes 2: e8880. https://doi.org/10.3897/rio.2.e8880

Open-source collaborative platform to collect content from over 350 institutions’ archives

With no single existing institution having the technical and financial capacity to archive the web efficiently on its own, a team of American researchers has come up with an innovative solution, submitted to the U.S. Institute of Museum and Library Services (IMLS) and published in the open-access journal Research Ideas and Outcomes (RIO).

They propose a lightweight, open-source collaborative collection development platform, called Cobweb, to support the creation of comprehensive web archives by coordinating the independent activities of the web archiving community. By sharing responsibility across institutions, the aggregator service will provide a large amount of continuously updated content, faster and with less effort.

In their proposal, the authors, from the California Digital Library, the UCLA Library and Harvard Library, give the example of a fast-developing news event like the Arab Spring, observed to unfold online simultaneously via news reports, videos, blogs and social media.

“Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites,” explain the researchers. “Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any curator or institution.”

“Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities.”

Unlike already existing tools supporting some level of collaborative collecting, the proposed Cobweb service will form a single integrated system.

“As a centralized catalog of aggregated collection and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities,” elaborate the authors. “Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for Archive-It to retrieve holdings information for over 3,500 collections from 350 institutions.”

If funded, the platform will be hosted by the California Digital Library and initialized with collection metadata from the partners and other stakeholder groups. While the project is planned to take a year, halfway through the partners will share a release with the global web archiving community at the April 2017 IIPC General Assembly to gather feedback and discuss ongoing sustainability. They also plan to organize public webinars and workshops focused on creating an engaged user community.

###

Original source:

Abrams S, Goethals A, Klein M, Lack R (2016) Cobweb: A Collaborative Collection Development Platform for Web Archiving. Research Ideas and Outcomes 2: e8760. https://doi.org/10.3897/rio.2.e8760