I was just about to go back to refactoring Chem4Word when I saw this pingback on my blog, and I just have to comment. It’s really important. More of my comments at the bottom…
Which blogs should be preserved?
Richard M. Davis on 26th June, 2009 at 12:00 pm
You’d think it obvious that my blog should be preserved, though I’m not so sure about yours! According to the poster summarising the fascinating 2007 survey by Carolyn Hank et al: “The majority of bloggers agreed (36%) or strongly agreed (34.9%) that their own blogs should be preserved.” Five per cent don’t want their blogs preserved at all; nearly a quarter aren’t fussed either way.
Here’s one of the data tables (which I had to retype as HTML – Peter Murray Rust is right about PDFs and data):
Table 4. Preservation perceptions – general
| | | Strongly agree or agree | Neither agree or (sic) disagree | Strongly disagree or disagree |
| Should preserve | Personal blog | 70.9% | 23.8% | 5.3% |
| | Every blog | 35.8% | 27.9% | 36.3% |
| | Every comment | 31.4% | 31.9% | 36.7% |
| | All online content | 28.2% | 22.3% | 49.5% |
| Should not preserve | Some blogs | 44.7% | 27.7% | 27.7% |
| | Some comments | 48.4% | 31.3% | 20.2% |
| | Some online content | 51.3% | 24.9% | 23.8% |
The overall pattern seems a good vindication of our own project approach, which will progressively move from capturing blog content (posts), to addressing comments and content, reflecting the scale of the bloggers’ own priorities.
It also seems a useful juncture in our project to throw open the question: which blogs should we preserve?
With over 5 million active blogs noted by Technorati, it seems daft to even start to enumerate them but in our field (libraries, archives, information science), several stand out, and it’s the very nature and importance of these that bolster the case for keeping them. I have in mind in particular Peter Suber’s Open Access News blog, but also blogs such as those of Peter Murray Rust, Brian Kelly, Lorcan Dempsey, Dorothea Salo, Jill Walker Rettberg – all ripe with contemporary accounts and robust views on matters of scholarly communication. But in every case, we have cause to wonder: will that information survive, will that link still work tomorrow?
What blogs (or types of blogs) do you think should be preserved, and why?
PMR: This is really important. Blogs are evolving and being used for many valuable activities (here we highlight scholarship). Some bloggers spend hours or more on a post. Bill Hooker has an incredible set of statistics about the cost of Open Access and Toll Access publications, page charges, etc. Normally that would get published in a journal no-one reads (I have even published in such – it was a huge effort and it’s got one citation. Not that I care about citations). So I tend to work out my half-baked ideas in public. Some people do their early science in the Open. Some are activists. Some review the current landscape, etc.
But preservation is really really difficult. I don’t know how to tackle it. Since 1993 I have been determined to preserve my digital record.
And I’ve failed.
I’ve created courses, forums, data sets, teaching-learning objects, blogs, preprints, etc.
And I’ve lost most of them.
There are many reasons. First, it’s extremely hard to preserve complex digital objects. The problems include:
compound documents (and only after 15 years is the web coming round to realising this is important)
hyperlinks
moving URLs/URIs
formats
semantic behaviour
disorganised humans (me)
moving institution (4 times)
moving computer (about 10 times)
Henry Rzepa and I have worked hard on this and he is more organized than me. We put early versions of JUMBO on CD-ROMs and got the RSC to distribute them with an issue of the journal. I have saved things on DAT tapes from the SGI. DAT??? SGI??? I don’t have a machine which will read 3.5-inch floppies at home. I have trashed my much beloved BBC Micro.
Every time I change machine I lose large amounts of data.
At some stage someone will invent a true Memex for my digital activities. Until then:
Preservation is effectively impossible.
So what’s the answer? The only one I can think of at the moment is to disseminate as widely as possible. If people want to read your material they will take copies (if that is technically possible). I would urge University Repositories:
Stop agonizing about preservation and start disseminating.
If it’s worth preserving, the web will have a reasonable chance of containing it somewhere. If it’s not, well, history will judge whether our current dross are the jewels of the future. We can’t tell.
DISSEMINATE, DISSEMINATE, DISSEMINATE
MAKE IT OPEN. FORGET COPYRIGHT. JUST PUBLISH.
CREATE LINKED OPEN DATA. LINKED OPEN DATA
CREATE AND RELEASE HERDS OF COWS, NOT PRESERVE HAMBURGERS IN A DEEP-FREEZE