Should Data Repositories be Open?
PMR: Obviously I’m a fan of Open Data and Open Access, but I don’t take it as axiomatic that all Repositories must be completely Open. The primary purpose of (scientific) repositories (IMO) is to preserve information, and they should try to capture all meaningful output from an institution. Much of this is, necessarily, not Open in the first instance. There are, for example, theses (and the data associated with them) that are closed because they contain commercially or humanly sensitive information, and universities have managed this for many years. So it’s reasonable that some information may stay closed for a considerable time.
There is also a pragmatic aspect. Many scientists (e.g. in chemistry) would never put their data in an Open repository at the beginning. They fear being scooped (perhaps even by their own colleagues), being banned from publication by publishers who regard deposition as prior disclosure, or invalidating a patent application. To overcome that we have created an embargo process so that data can be stored and only disseminated later (at our eCrystals meeting with UKOLN and Soton, 3 years was reported as probably tolerable to chemists). I hope that by carefully choosing the protocol it may be possible to lower this time gradually, but that takes time and data.
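The embargo idea is simple enough to sketch: a deposit is captured immediately, and only its dissemination is deferred until a fixed period has elapsed. The snippet below is a minimal illustration, not the eCrystals implementation; the function names and the 3-year default (taken from the figure reported above) are assumptions.

```python
from datetime import date

# Assumed default embargo length, based on the ~3 years reported as tolerable to chemists.
EMBARGO_YEARS = 3


def embargo_end(deposited: date, years: int = EMBARGO_YEARS) -> date:
    """Return the date on which a deposit leaves embargo (hypothetical helper)."""
    try:
        return deposited.replace(year=deposited.year + years)
    except ValueError:
        # A 29 February deposit whose target year is not a leap year rolls to 1 March.
        return date(deposited.year + years, 3, 1)


def is_released(deposited: date, today: date, years: int = EMBARGO_YEARS) -> bool:
    """True once the data may be disseminated; before that it is stored but not shown."""
    return today >= embargo_end(deposited, years)


if __name__ == "__main__":
    deposit = date(2008, 3, 6)
    print(embargo_end(deposit))                       # 2011-03-06
    print(is_released(deposit, date(2010, 1, 1)))     # False
```

Keeping the embargo length as a parameter reflects the hope that the period can be lowered gradually as experience accumulates.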
Then, when the data come out of embargo, should they always be Open? I’d say yes, but there may be domain or community norms that militate against that, particularly in fields containing human data.
What is axiomatic, however, is that if we don’t capture it at all, then we cannot ever disseminate it, so my emphasis is on capture.
When giving the talk I do not feel bound to the precise topics in the abstract, so I’ll probably mention Open Data. What is on my mind at the moment is the critical need to challenge the assumption that Institutional Repositories, as currently set up, will solve the data capture problem. They won’t, and if they try they will be much less successful than the IRs have been at capturing PDFs or other full text. So the need for a new breed of Data Repositories is clear. They will look very different from IRs if they are going to succeed.
Tags: or08
This entry was posted on Thursday, March 6th, 2008 at 12:14 am and is filed under Uncategorized.