Computing Text: 2012

Tuesday, April 17, 2012

Open access journals

It has long been obvious that the days of closed scientific publishing are just as numbered as those of all restrictive practices. In the age of the free flow of bits, sharing information will only ever get easier (as Cory Doctorow is fond of pointing out).

As workers in text mining, we find it frustrating, of course, that journal access restrictions often stop us from applying our algorithms as widely as would be useful for the scientific users of our systems. (The results are real; see for example our recent contribution to a PLoS One paper about oral cancer, "Incorporation of prior information from the medical literature in GWAS of oral cancer identifies novel susceptibility variant on chromosome 4 - the AdAPT method", in press April 2012.)

A recent report suggests that these restrictions may be costing more than £100 million per year in lost working time:

Text mining, for example, is a relatively new research method where computer programmes hunt through databases of plain-text research articles, looking for associations and connections – between drugs and side effects, for example, or between genes and disease – that a person scouring through papers one by one may never notice.

In March, JISC, a government-funded agency that champions the use of digital technology in UK universities for research and teaching, published a report. This said that if text mining enabled just a 2% increase in productivity for scientists, it would be worth £123m-£157m in working time per year.

But the process requires research articles to be accessed, copied, analysed and annotated – all of which could be illegal under current copyright laws.

(The Guardian, 9th April 2012.)

It is time to open up!


Sunday, March 4, 2012

Thanks for the memories

When Kevin Humphreys and I wrote the first version of GATE back in the 1990s we used Tcl/Tk, a nice clean scripting language with an extensible C API underneath. One of the innovative things that Tcl provided was a dynamic loading mechanism, and I used it to allow CREOLE plugins to be reloaded at run-time. A year or two after we'd released the system I could often be heard cursing my stupidity — the reloading system worked well when it was configured, but getting it to run cross-platform at user sites with diverse collections of underlying shared C libraries was a huge pain in the bum.

Fast forward 15 years or so and the class loader code that I put into GATE version 2 (the first Java version) also has some pain associated with it, so it is a real pleasure to see this post with all its careful study and presentation. Even better, it comes with a new chunk of code that takes away one of the gotchas with classloading and memory consumption in long-running server processes. Nice one, Mark!

One other thing springs to mind: the design choices that we took for GATE 2 (around the turn of the millennium, with a first release in 2002) turned out to be pretty good, by and large (more luck than judgement on my part, of course). GATE has mushroomed orders of magnitude beyond our original plan in the intervening period, but despite a few creaking joints it still holds its own. That's a credit to several of the long-term inmates of the GATE team, and also to Java (and its later offshoots like Spring, Groovy and Grails). It's easy to get blinded by the Next Big Thing in computing, but if you stand on solid foundations (and keep working on reusability and refactoring) you can have your cake and eat it!

(And sorry for the cheesy title.)

Hamish Cunningham
