Unilever Centre for Molecular Informatics
 

Brighten the Corners

Andrew Walkingshaw’s weblog

 

Golem 1.0.1, and new beginnings

October 1st, 2008

Golem 1.0.1 is out.

Golem is a markup language and library for extracting information from many kinds of CML (Chemical Markup Language) files, particularly aimed towards processing the output of atomistic simulation programs.
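
If you've not met CML before, here's a rough idea of the sort of extraction involved. This is only a minimal sketch using Python's standard library, not Golem's own API: the sample document, and the name element_counts, are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The CML schema namespace; the "cml" prefix is just a local alias.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# A tiny hand-written CML fragment (illustrative, not real simulation output).
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>"""


def element_counts(cml_text):
    """Count atoms by element type in a CML document (hypothetical helper)."""
    root = ET.fromstring(cml_text)
    # Find every <atom> element, at any depth, in the CML namespace.
    atoms = root.findall(".//cml:atom", CML_NS)
    return Counter(atom.get("elementType") for atom in atoms)


counts = element_counts(SAMPLE_CML)
print(dict(counts))
```

Real simulation output is a good deal messier than this, which is rather the point of having a dedicated tool for the job.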

To see what it does, have a look at the documentation - it’s a fairly hefty bit of software, but if you ever need to process CML, I hope it proves useful to you!

And an announcement: this’ll be the last post from me on Brighten the Corners. I’m starting a new job tomorrow, and leaving the University; I’ll be working for a small Web startup based in Cambridge.

I’ve been blessed with the opportunity to work alongside some fantastic people here, and I’ll miss this place. So I’d like to thank my supervisor Peter M-R, Jim, Nico, Peter C, Joe, Nick D, Nick E, Lezan, Volker, Alan, Dave J, and everyone else at the Unilever Centre.

And thank you for reading! I’ll be moving to new digs at http://blog.lexical.org.uk/ - hope to see some of you there.

Bye!

Linked data / scientific publishing talk, streaming

August 21st, 2008

For those of you who haven’t got iTunes…


The Virtual Scholar: Plotting the Future of Scientific Data from Andrew Walkingshaw on Vimeo.

PeekORE

August 21st, 2008

Recently, Jim Downing and I put together an entry for the ORE Challenge (OAI-ORE is a format for metadata about documents and other digital media in institutional repositories).

What we made was (Jim writes):

PeekORE is a javascript application that uses preloading and ORE autodiscovery to decorate links to pages representing ORE aggregations. It then allows users to quickly view and click through to the contents of an aggregation in a dynamic popup pane, without leaving the original page. It was inspired by the general coolness of feed autodiscovery, ORE autodiscovery, and Stacks, a glossy feature in OS X Leopard.

For more details, and a movie, have a look at Jim’s blog.

On linked data and scientific publishing

August 21st, 2008

I gave a lecture earlier this year at the Second Bloomsbury Meeting on e-Publishing and e-Publications. The meeting was called “The Virtual Scholar”, and in the organisers’ words:

This year’s conference will concentrate on the scholars themselves, what they want as authors and readers, what they are being provided with and what they may need to be provided with in the future – all in the context of the digital environment and the broader revolutions in progress. In many scoping conferences attended by publishers and/or librarians it is a truism to say that it is the scholars themselves that are routinely ignored in favour of discussions about business models and current ideological battles.

I was asked to speak about semantic data, RDF and linked data, and the talk was recorded; it’s up on the Internet now, and if you’ve got iTunes you can get it from UCL’s iTunes U site (or audio only). I’m uploading the video to Vimeo now for people who can’t do iTunes too - it’ll be at http://vimeo.com/adw27.

Here’s my slides too, as you can’t really see them in the video:

Asking/dancing

May 30th, 2008

I’ve just accepted an invitation to speak at the Second Bloomsbury Conference on E-Publishing and E-Publications, being held on the 26th and 27th of June at UCL.

From an email, here’s what I’m planning to go on about:

I’ll [...] talk a bit about scientific data, RDF and linked data [...] the crossover between the research we do as academics and related fields like database journalism and the work people are doing on bespoke data visualization, and between informal communication (like blogging) and formal communication (like journals).

If you’re going to be there, drop me a line - it’d be good to meet up.

XTech paper

May 19th, 2008

My XTech 2008 paper’s now up on the conference website. If you liked the look of my slides, this goes into quite a bit more detail about what Golem is, and about some of the work we’ve done on extracting relationships from large volumes of crystallography data.

(If that sounds deathly dull to you, though, my apologies!)

My XTech 2008 talk: “Science with XML and RDF: Golem and CrystalEye”

May 10th, 2008

XTech 2008 was great. I’ve not had a chance to go through my notes yet and mull things over, but there were a few big tech trends (especially XMPP and distributed messaging, and linked data and resource discovery/disambiguation) which are going to have an impact on what us scientists do. More on them soon!

In the meantime, here’s the slide deck from my talk:

Thanks to everyone who came along! It went pretty well, and the audience were very kind. (Unfortunately, Slideshare’s weirded out some of the backgrounds, so a couple of the slides are hard to read. You can download a PDF here which doesn’t have that problem.)

Some of the slides are a bit minimalist, so I’ll write some notes on what I said later this week; the conference proceedings should be online soonish too.

Leaps

May 2nd, 2008

Just in time for my presentation at XTech 2008, I’m releasing, with a lot of help from Toby White, the final beta of Golem:

Golem is a set of tools, and an ontology language, for processing data written in CML. The Golem language is XML, and the tools and libraries are written in Python.

Golem is being developed as part of MaterialsGrid, where we’re building a system to automate the process of predicting the properties of engineering materials; recently I’ve been using it to automatically extract RDF metadata from the CrystalEye database, letting me make things like this.

I’ll be talking about this work at XTech, so more soon, but if you have any problems getting Golem to work, please let us know.

The geographic spread of crystallography

April 2nd, 2008

Here’s a video I made for my colleagues’ presentations at Open Repositories 2008.

The data comes from CrystalEye via Golem and the GeoNames dataset, and was stored in a Talis Platform triple store - thanks to them for letting me into the beta! I wrote the visualization in Processing.

I’ll be talking more about the process behind making this sort of thing at XTech ’08, so I won’t bore you all with the details now - but if there’s anything you want to ask, leave a comment!

Crystallography, 2000-2007 from Andrew Walkingshaw on Vimeo.

You can download the movie here. It’s under Creative Commons, so if you want to use it in something, please feel free, but please keep the credits - and it’d be great if you let me know where you’ve used it.

Still, I must speak frankly, Mr Shankly

March 31st, 2008

I’m going to be busy over the next couple of months. I’ve been lucky enough to get speaking opportunities at a couple of conferences - both outwith the mainstream science thing, which is cool, if a little terrifying.

First up, I’m speaking at XTech 2008 in Dublin, a big web technology conference; here’s the abstract:

Representing, indexing and mining scientific data using XML and RDF: Golem and CrystalEye
Andrew Walkingshaw (University of Cambridge)

Modern science produces a lot of data; whether outputted from experimental apparatus or as the result of simulation, the volume of information a scientist can produce is now often radically greater than it was even ten years ago. It follows that there is now a need for new tools to enable scientists to filter, mine and search both their own data and that produced by other researchers.

One of the major sources of experimental data is in the “supplementary data” attached to publications in journals. Our CrystalEye repository exploits this by aggregating, and converting to CML, the supplementary data from journals which publish crystal structures.

The question then becomes how to add value to this data. One approach is by enhancing the searchability and discoverability of the data – a task for which RDF in general, and SPARQL in particular, is well-suited. We therefore, using our Golem ontology language and pyGolem toolkit (which enable the layering of richer semantics onto CML), extract metadata from CrystalEye as RDF, and use it to build new interfaces to the repository – thus making the data therein easier to find, analyse and reuse.

I’ll chuck the slides and transcript up here once I’ve given the talk.

The other’s interesting, very interesting. A bit scary, too. And I’ve not really got any idea what I’m going to talk about there. I have a few ideas, but…

It’s got me thinking, though. What to talk about? My work’s not the most accessible thing in the world, after all; I’m not Peter Wothers (watch those videos; they’re great if you’re a repressed pyromaniac like us lot). Same here, really; when I sit down to write here, I’m reckoning chemists probably don’t want to read semantic web/Internet stuff, and the Web folks won’t be so into the science.

Flambeed Rice Krispies in liquid oxygen, though, they’re acceptable anywhere.

So what d’you think?
