Unilever Centre for Molecular Informatics
 

Peter Corbett

Teaching computers to read chemistry papers

 

BioNLP2007

Yesterday was the BioNLP 2007 workshop at the ACL, where I presented my paper. It was notable that quite a few people were focusing on medical text, such as patient records, as well as the more usual molecular biology and genetics. One of the speakers couldn’t attend, and was replaced by a representative from Nature, talking about OTMI. A number of people there were somewhat sceptical about its usefulness, especially as they considered that a lot of current interesting work operates at the discourse level, and that doesn’t work once your sentences have been shuffled.

A slight personal point: it’s gratifying to see some people using MEMMs rather than CRFs. If nothing else, I want using MEMMs to stay respectable for as long as possible, because they’re nice and quick to train, so you can test your ideas out with them quite rapidly.

Some papers on my to-read pile:

On the unification of syntactic annotations under the Stanford dependency scheme: A case study on BioInfer and GENIA (Sampo Pyysalo; Filip Ginter; Veronika Laippala; Katri Haverinen; Juho Heimonen; Tapio Salakoski) - essentially a case study in making corpora compatible with each other.

Mining a Lexicon of Technical Terms and Lay Equivalents (Noemie Elhadad; Komal Sutaria) - an interesting idea, bringing aspects of machine translation into a summarisation problem, by using a corpus of parallel articles in the technical and popular press.

Recognising Nested Named Entities in Biomedical Text (Beatrice Alex; Barry Haddow; Claire Grover) - the title says it all, really.

Exploring the Efficacy of Caption Search for Bioscience Journal Search Interfaces (Marti Hearst; Anna Divoli; Jerry Ye; Michael Wooldridge) - no NLP in this work (yet), but an interesting presentation on UI/HCI issues.

ConText: An Algorithm for Identifying Contextual Features from Clinical Text (Wendy Chapman; John Dowling; David Chu) - this one’s a bit different, as it uses mainly handwritten rules rather than machine learning to disambiguate various types of disease incidences.

BioNoculars: Extracting Protein-Protein Interactions from Biomedical Text (Amgad Madkour; Kareem Darwish; Hany Hassan; Ahmed Hassan; Ossama Emam) - another unsupervised IE paper, this one uses a PageRank-like approach to find well-attested relationships between entities and the patterns that denote those relationships.

A Study of Structured Clinical Abstracts and the Semantic Classification of Sentences (Grace Chung; Enrico Coiera) - on to the posters now, this one covers a nice, well-constrained discourse problem.

BaseNPs that contain gene names: domain specificity and genericity (Ian Lewin) - to do with finding the context for named entities.
