Unilever Centre for Molecular Informatics
 

petermr’s blog

A Scientist and the Web

 

Copyrighted data

As I have blogged, I shall be presenting at the JISC/NSF meeting on data-driven science. The idea, which goes back to Kepler, Mendeleev and others, is that published data can be interpreted in new and exciting ways. The workshop explores the technical and organisational issues. Chemistry is one of the saddest sciences in this respect. It is hypopublished but, worse, the data are often copyrighted. Yes, you tell me, facts (data) cannot be copyrighted. Have a look at the rubric for supporting information (facts, facts, facts) accompanying Amer. Chem. Soc. (ACS) publications.

Permissible Use of Supporting Information

Electronic Supporting Information files are available without a subscription to ACS Web Editions. All files are copyrighted by the American Chemical Society. Files may be downloaded for personal use; users are not permitted to reproduce, republish, redistribute, or resell any Supporting Information, either in whole or in part, in either machine-readable form or any other form. For permission to reproduce this material, contact the ACS Copyright Office by e-mail at copyright@acs.org or by fax at 202-776-8112.

(This text is itself copyrighted by the ACS. I have reproduced it without permission.)

At the ACS meeting I approached a senior representative of Wiley, whom I already know and have good relations with. I asked him why Wiley copyrighted factual data accompanying publications. He said it was because “they wished to sell it” (and he willingly gave me permission to quote his answer).

I do not know what Springer do. Supplemental data for their publications are not usually visible. Because of this I do not read their publications.

Does it matter? Here’s a measure. If I want a (small-molecule) crystal structure done, it might cost 5K USD at current commercial prices. A typical chemistry department might do 500 per year or more. That would cost 2.5 M USD to purchase. Obviously there are economies of scale, no profit motive and low salaries in academia, so let’s say 1 M USD when everything is taken into account. That’s what it costs. What is it worth? It’s not even ZERO. We GIVE this data to the publishers. If I want a spectrum I have to pay a publisher. If I want a crystal structure I have to pay an aggregator. And most of the publishers aren’t even competent at managing the data quality (except for adding copyright notices).

Even the rather conservative STM publishers association has said this copyrighting is unacceptable. So why does it still happen?

I have banged on about this for a year or two, including on the SPARC Open Data mailing list, but I have seen no response from senior academia: they don’t care. Some funders (Wellcome, and some of the RCUK, but not all) DO care, but I suspect they are a minority. So, funders and academia, your acquiescence to non-Open Data is destroying large areas of potential data-driven science.

One Response to “Copyrighted data”

  1. Peter,
    Regarding your comments above about “Permissible Use of Supporting Information”

    Here’s the question. Assume that a structure database contains 1 million structures. Supporting information in an ACS journal contains a simple structure, say cholesterol, together with a particular property, say boiling point.

    In theory, provided cholesterol exists among the 1M structures, it should be LEGAL to point to the article from the database and note that experimental boiling point data exist in that article. The structure is NOT reproduced; it is already in the database. The data are not reproduced, just linked to. There should be no infringement.
