Golem is a markup language and library for extracting information from many kinds of CML (Chemical Markup Language) files, particularly aimed towards processing the output of atomistic simulation programs.
To see what it does, have a look at the documentation - it’s a fairly hefty bit of software, but if you’re ever needing to process CML, I hope it proves useful to you!
And an announcement: this’ll be the last post from me on Brighten the Corners. I’m starting a new job tomorrow, and leaving the University; I’ll be working for a small Web startup based in Cambridge.
I’ve been blessed with the opportunity to work alongside some fantastic people here, and I’ll miss this place. I’d like to thank my supervisor Peter M-R, Jim, Nico, Peter C, Joe, Nick D, Nick E, Lezan, Volker, Alan, Dave J, and everyone else at the Unilever Centre.
And thank you for reading! I’ll be moving to new digs at http://blog.lexical.org.uk/ - hope to see some of you there.
Recently, Jim Downing and I put together an entry for the ORE Challenge (OAI-ORE is a format for metadata about documents and other digital media in institutional repositories).
What we made was (Jim writes):
PeekORE is a javascript application that uses preloading and ORE autodiscovery to decorate links to pages representing ORE aggregations. It then allows users to quickly view and click through to the contents of an aggregation in a dynamic popup pane, without leaving the original page. It was inspired by the general coolness of feed autodiscovery, ORE autodiscovery, and Stacks, a glossy feature in OS X Leopard.
This year’s conference will concentrate on the scholars themselves, what they want as authors and readers, what they are being provided with and what they may need to be provided with in the future – all in the context of the digital environment and the broader revolutions in progress. In many scoping conferences attended by publishers and/or librarians it is a truism to say that it is the scholars themselves that are routinely ignored in favour of discussions about business models and current ideological battles.
From an email, here’s what I’m planning to go on about:
I’ll [...] talk a bit about scientific data, RDF and linked data [...] the crossover between the research we do as academics and related fields like database journalism and the work people are doing on bespoke data visualization, and between informal communication (like blogging) and formal communication (like journals).
If you’re going to be there, drop me a line - it’d be good to meet up.
My XTech 2008 paper’s now up on the conference website. If you liked the look of my slides, this goes into quite a bit more detail about what Golem is, and about some of the work we’ve done on extracting relationships from large volumes of crystallography data.
(If that sounds deathly dull to you, though, my apologies!)
Thanks to everyone who came along! It went pretty well, and the audience were very kind.
(Unfortunately, Slideshare’s weirded out some of the backgrounds, so a couple of the slides are hard to read. You can download a PDF here which doesn’t have that problem.)
Some of the slides are a bit minimalist, so I’ll write some notes on what I said later this week; the conference proceedings should be online soonish too.
Golem is a set of tools, and an ontology language, for processing data written in CML. The Golem language is XML, and the tools and libraries are written in Python.
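To give a flavour of the kind of job this is about, here’s a minimal sketch of pulling data out of a CML fragment using nothing but Python’s standard library. This is not Golem’s API: the `element_counts` function and the sample molecule are made up for illustration, and the only things taken from CML itself are its namespace and the `atom`/`elementType` names.

```python
import xml.etree.ElementTree as ET

# The CML schema namespace; the "cml" prefix is just a local alias.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# A tiny hand-written CML fragment, for illustration only.
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>"""


def element_counts(cml_text):
    """Count how many atoms of each element appear in a CML document."""
    root = ET.fromstring(cml_text)
    counts = {}
    # Find every <atom> element at any depth, respecting the CML namespace.
    for atom in root.iterfind(".//cml:atom", CML_NS):
        element = atom.get("elementType")
        counts[element] = counts.get(element, 0) + 1
    return counts


if __name__ == "__main__":
    print(element_counts(SAMPLE_CML))
```

Real simulation output is a great deal messier than this, which is rather the point of having a dedicated language for describing what to extract.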
Golem is being developed as part of MaterialsGrid, where we’re building a system to automate the process of predicting the properties of engineering materials; recently I’ve been using it to automatically extract RDF metadata from the CrystalEye database, letting me make things like this.
I’ll be talking about this work at XTech, so more soon, but if you have any problems getting Golem to work, please let us know.
The data comes from CrystalEye via Golem and the GeoNames dataset, and was stored in a Talis Platform triple store - thanks to them for letting me into the beta! I wrote the visualization in Processing.
I’ll be talking more about the process behind making this sort of thing at XTech ‘08, so I won’t bore you all with the details now - but if there’s anything you want to ask, leave a comment!
Crystallography, 2000-2007 from Andrew Walkingshaw on Vimeo.
I’m going to be busy over the next couple of months. I’ve been lucky enough to get speaking opportunities at a couple of conferences - both outwith the mainstream science thing, which is cool, if a little terrifying.
First up, I’m speaking at XTech 2008 in Dublin, a big web technology conference; here’s the abstract:
Modern science produces a lot of data; whether outputted from experimental apparatus or as the result of simulation, the volume of information a scientist can produce is now often radically greater than it was even ten years ago. It follows that there is now a need for new tools to enable scientists to filter, mine and search both their own data and that produced by other researchers.
One of the major sources of experimental data is in the “supplementary data” attached to publications in journals. Our CrystalEye repository exploits this by aggregating, and converting to CML, the supplementary data from journals which publish crystal structures.
The question then becomes how to add value to this data. One approach is by enhancing the searchability and discoverability of the data – a task for which RDF in general, and SPARQL in particular, is well-suited. We therefore, using our Golem ontology language and pyGolem toolkit (which enable the layering of richer semantics onto CML), extract metadata from CrystalEye as RDF, and use it to build new interfaces to the repository – thus making the data therein easier to find, analyse and reuse.
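(An aside from outside the abstract: to make “metadata as RDF” a bit more concrete, here’s a rough sketch of turning a handful of extracted properties into N-Triples, one of the plain-text RDF serialisations. It’s an assumption-laden toy, not how pyGolem does it: the `to_ntriples` function, the `http://example.org/` URIs and the property names are all hypothetical, and the literal escaping is simplified.)

```python
def to_ntriples(subject_uri, properties, vocab="http://example.org/crystal#"):
    """Serialise a dict of properties as N-Triples with plain string literals.

    The vocabulary URI and property names are placeholders, not a real ontology.
    """
    lines = []
    for name, value in sorted(properties.items()):
        # Escape the characters that would otherwise break a quoted literal.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject_uri}> <{vocab}{name}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(
        "http://example.org/structure/1",
        {"formula": "H2 O", "journal": "Example Journal"},
    ))
```

Once the metadata is in this shape it can be loaded into a triple store and queried with SPARQL, which is where the searchability comes from.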
I’ll chuck the slides and transcript up here once I’ve given the talk.
The other’s interesting, very interesting. A bit scary, too. And I’ve not really got any idea what I’m going to talk about there. I have a few ideas, but…
It’s got me thinking, though. What to talk about? My work’s not the most accessible thing in the world, after all; I’m not Peter Wothers (watch those videos; they’re great if you’re a repressed pyromaniac like us lot). Same here, really; when I sit down to write here, I’m reckoning chemists probably don’t want to read semantic web/Internet stuff, and the Web folks won’t be so into the science.