Repository Listing: OAI-PMH vs Atom vs Sitemaps

June 19, 2007

A basic repository feature is providing a list of all the resources in a collection, along with a way to discover changes incrementally. The usual way for repos to enable this is OAI-PMH: the harvester uses either the ListRecords verb or the ListIdentifiers verb, the ‘from’ argument to make incremental updates efficient, and the resumptionToken system, which lets the server limit the load generated.
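To make the harvesting loop concrete, here is a minimal sketch in Python of an incremental ListIdentifiers harvest. The function names (`http_fetch`, `harvest_identifiers`), the injectable `fetch` parameter and the default `oai_dc` prefix are assumptions made for illustration; error responses and deleted records are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Plain HTTP GET returning the response body (hypothetical helper)."""
    with urlopen(url) as resp:
        return resp.read()


def harvest_identifiers(base_url, from_date, fetch=http_fetch,
                        metadata_prefix="oai_dc"):
    """Yield (identifier, datestamp) for records changed since from_date."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": from_date,
    }
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI_NS + "header"):
            yield (header.findtext(OAI_NS + "identifier"),
                   header.findtext(OAI_NS + "datestamp"))
        # The server decides the chunk size; a missing or empty token
        # means the list is complete.
        token = root.findtext(".//" + OAI_NS + "resumptionToken")
        if not token or not token.strip():
            break
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.strip()}
```

Passing `fetch` in as a parameter is only there so the loop can be tried against canned responses; a real harvester would also want to back off when the server asks it to.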

The way the rest of the world does it is with Atom or RSS. Unnecessary retrievals can be prevented using conditional GET. The server chooses the size of the feed documents so it can control its own load. It’s even possible to avoid lost updates or list an entire collection using ‘first’, ‘last’, ‘next’ and ‘previous’ links (as in this tip). There’s no direct equivalent of PMH’s ‘from’, but as long as the feed has timestamps on each entry, the client knows when to stop retrieving more feed chunks.
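The feed-based approach might look something like the sketch below: a conditional GET on the first feed document, then a walk along the ‘next’ links until an entry is no newer than the last one already seen. It assumes entries are listed newest first and that all ‘updated’ values are UTC timestamps in the same format, so they can be compared as strings; neither is guaranteed by Atom itself. The names `conditional_get` and `new_entries` are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.error import HTTPError
from urllib.request import Request, urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def conditional_get(url, etag=None):
    """Return (body, etag); body is None when the server answers 304."""
    req = Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    try:
        with urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag")
    except HTTPError as err:
        if err.code == 304:  # Not Modified: nothing new to harvest
            return None, etag
        raise


def new_entries(feed_url, last_seen, etag=None, fetch=conditional_get):
    """Return ([(id, updated), ...], etag) for entries newer than last_seen."""
    body, new_etag = fetch(feed_url, etag)
    entries = []
    while body is not None:
        root = ET.fromstring(body)
        for entry in root.findall(ATOM + "entry"):
            updated = entry.findtext(ATOM + "updated")
            # Assumes newest-first ordering and same-format UTC timestamps.
            if updated <= last_seen:
                return entries, new_etag
            entries.append((entry.findtext(ATOM + "id"), updated))
        next_url = next((link.get("href")
                         for link in root.findall(ATOM + "link")
                         if link.get("rel") == "next"), None)
        if next_url is None:
            break
        body, _ = fetch(next_url, None)
    return entries, new_etag
```

The client keeps the returned ETag and the newest ‘updated’ value between runs, which together play roughly the role of PMH’s ‘from’.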

I’m currently reading the REST book, so I’m in a frenzy of resource-oriented fervour. OAI-PMH is, in the REST patois, a STREST interface (this theme was picked up in the discussion between Carl Lagoze and Andy Powell recently). The rich resource discovery possible with OAI-PMH is also overkill for what I’m after here.

I’m also unsure about syndication – I have a feeling that the resource representations in Atom / RSS feeds are unlikely to satisfy most repository clients’ needs. Isn’t a more resource-oriented approach to simply link to the resource and let the client negotiate with the resource for an appropriate representation? If so, Sitemaps fit the bill perfectly.
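As a rough illustration of the Sitemaps idea, the sketch below reads a sitemap, picks out URLs whose ‘lastmod’ is later than the previous harvest, and prepares a request that leaves the choice of representation to content negotiation. The function names and the `application/rdf+xml` media type are assumptions for the example, and ‘lastmod’ values are compared as plain strings, which only works if they share one format.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def changed_urls(sitemap_xml, since):
    """Yield resource URLs modified after `since` (same format as lastmod)."""
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(SM + "url"):
        lastmod = url.findtext(SM + "lastmod")
        # lastmod is optional; with no date we cannot rule the URL out.
        if lastmod is None or lastmod > since:
            yield url.findtext(SM + "loc").strip()


def negotiated_request(url, media_type="application/rdf+xml"):
    """Build a GET request asking the resource itself for a representation."""
    return Request(url, headers={"Accept": media_type})
```

The sitemap only carries links and dates; everything about the representation is settled between the client and the resource when the request is sent.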

Well, maybe, but on balance I still think that Atom / RSS is a better choice; the RESTful repository will almost certainly have a feed around for human clients, and it’s better to adapt this for machine clients than adopt an additional mechanism.

6 Responses to “Repository Listing: OAI-PMH vs Atom vs Sitemaps”

  1. eFoundations Says:

    OAI-PMH vs. Atom vs. Sitemaps…

    For some time now I’ve been meaning to write a blog entry summarising the functional capabilities of the OAI-PMH and then looking at whether and how the same functionality could be …

  2. ojd20 Says:

    Update: –

    A conversation with Andy Powell over at the eFoundations blog has made me less convinced of my own conclusion here. The problem with my choice of Atom is that nobody currently supports harvesting through Atom, whereas some important resource discovery mechanisms use sitemaps (Google, Yahoo, MSN). I’m unsure why Atom didn’t choose to build in harvesting, but at a practical level Sitemaps look favourite.


  3. […] There were a couple of comments on Andy Powell’s reply post to my post comparing OAI-PMH, Atom and sitemaps for repository harvesting that make it worth revisiting the issue (sorry I didn’t pick them up at the time, I failed to add the conversation to co.mments). Scott pointed out that having link-only feeds is useless for humans – I agree, I was thinking too narrowly about machine clients. Lars Kirchhoff asks: – Isn’t that [an efficient harvesting API] actually what OAI-PMH is already? … So I would think it would be easier to strip down OAI-PMH for the general purpose use of web resource representation. […]


  4. […] On my quest for metadata formats and APIs I found that ATOM is not just another RSS but more like a simple database language. Google’s Data API GData strongly pushes ATOM forward (but may also introduce some problems). Jim Downing wrote about ATOM, OAI-PMH, and Sitemaps – three different ways to provide a list of all the resources in a collection, and to incrementally discover changes. OAI-PMH is much less prominent, but why? […]


  5. […] problems in detail (PERX, the experience of the NSDL with metadata quality, Andy Powell, Jim Downing and many others have done that), suffice to say there are issues about how protocol […]


  6. […] problems in detail (PERX, the experience of the NSDL with metadata quality, Andy Powell, Jim Downing and many others have done that), suffice to say there are issues about how protocol […]

