Computing Text: 2010

Friday, December 3, 2010

Harvard's Selection Process and UK Research "Careers"

One of the points that Malcolm Gladwell makes in his beautiful book Outliers is that selections made in the face of over-abundance are likely to be random. He cites a study of Harvard undergraduate admissions which shows that there is a large element of chance involved -- the point being that where there is a surfeit of excellence on offer (and the queue of young hopefuls at Harvard's door probably qualifies) it is pretty meaningless to try to select the "best" using anything more high-tech than the toss of a coin.

There's an analogous randomness in the fate of UK research staff (that is, those staff employed only to do research, as opposed to academic faculty members whose remit includes research, teaching and related administrative tasks). These staff are most often known as research assistants (RAs -- a term that gives a clue as to their general status within our Universities).

The custom and practice of RA employment arose in a time when the ratio of research volume to faculty sizes was a lot lower, and it made perfect sense in that context for the position to be a staging post between postgraduate research and faculty jobs. Indeed a common longer form of the term is postdoctoral RA, and in previous periods there was a reasonably strong expectation that this was the final fence to jump on the way to the academic finishing line. There was, in other words, a built-in assumption that being an RA was just as temporary as the state of being a PhD student, for example.

Fast forward to the present (or even to 10 years ago, in fact), and there is now a significant problem with this picture: there are too many RAs for them to ever make the conversion to faculty. In my own department, for example, it is not uncommon for the number of RAs to be double that of faculty, and unless the rate of retirement of the latter leaps into the stratosphere (not impossible, I admit, given that our pensions and the HEFCE funding backbone are currently ConDemned) then most research staff can hold little hope of ever joining the grownups.

Why does this matter? Isn't this even a good thing, given that we want to select only the most committed to take responsibility for the future of research and of degree-level teaching? Mr. Gladwell's Harvard tale would tend to indicate otherwise. UK research is certainly in the world elite, consistently over-achieving relative to its size over a good range of metrics, so when we decide which researchers make the jump to faculty we are not separating off the cream so much as taking a random sample; and, as modern employment law makes clear, any practice which leads to comparable employees getting different treatment for no good reason is illegitimate. Further, there is no need to claim that we are all of Harvard class to make the point: it is sufficient that a researcher is productive in their field (with all the specialist knowledge and long years of training that this implies) for the waste involved in treating them as casual staff to be clear.

Above and beyond this point there are several other negative outcomes. The current system is:

Divisive. In my experience RAs don't usually feel an integral part of the departments and universities in which they work, and as a consequence their commitment to those organisations is often low.
Inefficient. Over a three year project an RA often spends the first year learning the job, the second year being productive, and the third year looking for another job.
Discouraging. In common with many of my colleagues, I spent 20 years on short-term contracts. If I hadn't been lucky enough to graduate to an open-ended contract I probably would be planning a move out of academia, and given that even now my funding is contingent on continued success within the shifting sands of the research funding agencies, I still don't feel secure.

The key point is that RAs are not temporary: as long as the volume of research being done is greater than the capacity of academic faculty we're here to stay, and in large numbers. This means that a system predicated on employment insecurity is no longer appropriate, and indeed commentators of all shapes and sizes (including former Sheffield VC Gareth Roberts) have advocated radical change of one sort or another.

What are the options? Changing the big picture of research careers requires intervention at a national level, but there are several local measures that can start to change the culture and help make Universities more attractive for contract research staff:

Making employment contracts open-ended. This doesn't magically improve job security but it does send out positive signals about our support for research as a career (and also means that responsibility for triggering redundancy moves from HR to the departments, increasing the likelihood of departments taking the issue seriously).
Setting up a buffer fund for bridging between research projects. This will necessarily be small-scale to begin with, but can serve as part of our arguments for wider changes in funding structures.
Shifting terminology away from "assistant" or "postdoc" and towards "professional researcher" and encouraging funding applications and other career development steps for contract staff.

For a longer version of this list see this discussion paper from Sheffield UCU (which also has links to related documents including the Roberts report). A good summary of the issues from a principal investigator perspective is available on the national UCU site. Time for a change?


Wednesday, October 20, 2010

More Clouding

When you plug your fridge into the mains electricity supply you don't worry about all the technology sitting behind the wall socket -- it just works. Cloud computing is starting to supply IT in a similar fashion. No more worrying about backups, no more wasted hours configuring a new or repaired machine -- just plug into the network, fire up your web browser and away you go.

Researchers have tougher and more specialised IT needs than most, so realising the same ease of use that the cloud now provides for email or word processing requires work in several areas. One of these is adapting established research tools to the cloud, and that is what we plan to do with GATE in the next period. Over the last decade GATE has become a world leader for research and development of text mining algorithms.

Text has become a more and more important communication method in recent decades. Our children's thumbs often spend half the day typing on their tiny phone keypads; our evenings often include sessions on Facebook or writing email to distant friends and relatives. When we interact with the corporations and governmental organisations whose infrastructure and services underpin our daily lives we fill in forms or write emails. When we want to publicise our work for our employer or share details of our leisure activities with a wider audience we create websites, post Twitter messages or make blog entries. Scientists also now use these channels in their work, in addition to publishing in peer-reviewed journals -- a process which has also seen a huge expansion in recent years.

This avalanche of the written word has changed many things, not least the way that scientists gather information from the experiences of their peers. For example, a team at the World Health Organisation's cancer research agency recently found the first evidence of a link between a particular genetic mutation and the risk of lung cancer in smokers. Their experiments require large amounts of costly laboratory time to verify or falsify hypotheses based on samples of mutations in gene sequences from their test subjects. Text mining from previous publications makes it possible for them to reduce this lab time by factoring in probabilities based on association strengths between mutations, environmental factors and active chemicals.
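
The post doesn't say how those association strengths are computed. One common and simple way to score how strongly two terms are associated across a body of literature is pointwise mutual information (PMI) over document co-occurrence counts; the sketch below assumes that technique, and the toy "abstracts" and term names (MUT_X, MUT_Y and so on) are hypothetical, not data from the study.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Pointwise mutual information for term pairs that co-occur in a document.

    documents: a list of sets of terms (e.g. entities tagged in each abstract).
    Returns a dict mapping alphabetically sorted (term_a, term_b) pairs to log2 PMI.
    """
    n_docs = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = term_counts[a] / n_docs
        p_b = term_counts[b] / n_docs
        # Positive PMI: the pair co-occurs more often than chance would predict.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus: each set holds the terms found in one abstract.
abstracts = [
    {"MUT_X", "smoking", "lung cancer"},
    {"MUT_X", "smoking"},
    {"MUT_Y", "asbestos"},
    {"MUT_Y", "smoking"},
]
scores = pmi_scores(abstracts)
print(scores[("MUT_X", "smoking")])  # log2(4/3), roughly 0.415
print(scores[("MUT_Y", "smoking")])  # log2(2/3), roughly -0.585
```

Scores like these could be used to rank candidate hypotheses before committing lab time; a real system would also need entity recognition to find the terms in the first place, and some smoothing for rare pairs.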

A second area that has been revolutionised by the new world of text is a core function that every business must perform in order to survive. Customer relations and market research are no longer just about monitoring the goings on of the corporate call center. Keeping up to date with the public image of your products or services now means coping with the Twitter firehose (45 million posts per day), the comment sections of consumer review sites, and the point-and-click 'contact us' forms on the company website. Doing this by hand is now impossible in the general case: the data volume long ago outstripped the possibility of cost-effective manual monitoring. Text mining provides alternative, automatic methods for dealing with this data.

GATE provides four core systems to support scientists experimenting with new text mining algorithms and developers using text mining in their applications:

GATE Developer: an integrated development environment for language processing components
GATE Embedded: an object library optimised for inclusion in diverse applications
GATE Teamware: a collaborative annotation environment for high volume factory-style semantic annotation projects built around a workflow engine
GATE Mímir: (Multi-paradigm Information Management Index and Repository) a massively scaleable multiparadigm index

Our plan for the next period is to work towards making use of these systems more like electric sockets and fridges!

A caveat: current commercial cloud offerings are not yet appropriate as a drop-in replacement for all academic computing facilities. For example, the cost of running a virtual machine on Amazon's EC2 continuously for 1 year is roughly equivalent to the cost of buying a similar machine. A purchased machine may be expected to perform reliably for at least 3 years, so over that period the Amazon option costs around 3 times the hardware price; it is only cost effective if the total cost of running the server in your organisation (hardware plus hosting) is at least that much, i.e. if hosting costs over those 3 years exceed roughly twice the cost of the hardware. Careful quantification of the costs is important when moving to the cloud.
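
The break-even arithmetic can be made explicit with a few lines of code. This is a sketch using hypothetical round numbers in arbitrary currency units, not real Amazon prices; it assumes that a year of EC2 costs about the same as buying the machine and that owned hardware lasts three years.

```python
def cloud_is_cheaper(hardware_cost, cloud_cost_per_year,
                     hosting_cost_per_year, lifetime_years=3):
    """True if renting a cloud VM costs less than owning and hosting a machine
    over the hardware's expected lifetime."""
    cloud_total = cloud_cost_per_year * lifetime_years
    owned_total = hardware_cost + hosting_cost_per_year * lifetime_years
    return cloud_total < owned_total


def break_even_hosting(hardware_cost, cloud_cost_per_year, lifetime_years=3):
    """Total in-house hosting cost over the lifetime at which both options cost the same."""
    return cloud_cost_per_year * lifetime_years - hardware_cost


# Hypothetical figures: one year of cloud time costs the same as the hardware.
hardware = 1000
print(break_even_hosting(hardware, cloud_cost_per_year=1000))  # 2000
print(cloud_is_cheaper(hardware, 1000, hosting_cost_per_year=500))  # False
print(cloud_is_cheaper(hardware, 1000, hosting_cost_per_year=700))  # True
```

Under these assumptions the cloud only wins once in-house running costs over three years exceed about twice the hardware price, i.e. once owning the machine costs around three times its purchase price in total.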


Friday, July 9, 2010

How to Join Open Source Projects

This is rather tired ground that has been well trodden by other feet, but in the aftermath of a disagreement with one of the happy chappies from Ontotext I thought I'd reiterate a couple of home truths about open source projects and how you go about joining in. Along the way I'll also ask what "hack" means, for the benefit of those software people who've been locked in a small room without access to books or networks for the last couple of decades :-)

To start with something that should be obvious, all engineering projects of whatever type are social processes in which human factors are at least as important as technical ones. In open source this is even more important than in other areas because the people involved often give their time and expertise for free, and even when they're being paid specifically to participate there is almost always a discretionary element of their contribution (should I bother to answer this email from a complete beginner who obviously hasn't managed to find the user manual, or shall I finish work ten minutes early today?). This means that when you want to join an open source project (i.e. to become a developer, contribute code etc.) you need to show a little sensitivity and think about the needs of the project and its participants as a whole, not just your own take on the thing. I remember a particularly clear case of the opposite approach on the JavaCC project a few years ago (JavaCC is an excellent parser generator that was one of the first available for Java and is used in GATE for analysing JAPE transduction rules). Along came a new developer with some good ideas and some useful code -- which in principle was great news for everyone in the project. Unfortunately said developer jumped in with both feet, screaming abusive nonsense at the project administrators and demanding his own way at every juncture. The result? His useful code was useless and unused.

Why can't we use code from people with whom it is impossible to communicate and collaborate? Because, to paraphrase Stewart Brand, software is not something you finish but something you start -- if it is good and useful then it has a long life span, and during that span it changes and mutates and needs active support and maintenance. If we accept code into our projects from sources whose long-term commitment is questionable (and angry young men with poor collaboration skills are unpredictable in that respect) then we compromise the evolution of our systems (and sooner or later alienate our users).

On a more positive note, if you want to join an open source project here are some steps to help you start off on the right foot:

Talk to the developers. Communicate, communicate, communicate! I agree absolutely with Sussman et al. that "A computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute", but this doesn't mean that the only thing you need to write is code. Get in touch with the developers as early in the process as possible, tell them what you're working on or plan to look at, and ask for their advice. Very often you'll get not just advice but active support, and the flip side is that when you produce your contribution they'll know where it comes from and be better able to judge its quality and its implications for the project as a whole.
Get to know the mechanisms in place for quality assurance (and quality control) and adopt them. If the project has a test suite you must at minimum ensure that your code doesn't break tests, and you should think seriously about writing new tests to cover all the things that you work on. Look at the documentation and write patches to cover the stuff you do. Contribute to discussions on the mailing list or forum in your area. Think hard about backwards compatibility -- has that interface you just added a method to been linked by a thousand other jars out there, and is it really worth forcing recompilation of all those systems? Don't just think about your own little patch, think about the knock-on effects on the whole system and on the ecosystem of users and developers around it.
Be humble. The reason I can write this post on a fantastic Ubuntu Debian GNU Linux system is because lots of people cleverer than me worked hard and contributed their work for the good of humanity. Even if I've invented something useful in my own little corner of computing that certainly doesn't mean that I have the right to sneer at others, even if their knowledge is less than mine in some area. Who knows what greatness their hearts and heads contain? (We are all geniuses, it is part of being human. If you don't believe me, try getting a computer to do image recognition like my 1-year old daughter!)
Be prepared to help developers when you want something integrated into the project. Most of the time your work will not be top of the todo-lists of the people you're joining; most often you're adding work to their already full plates, and you should be patient and helpful while they look at your work and figure out whether to include it, or to work towards making you a committer on the project.

So: not rocket science, just basic collaboration skills.

One of the obvious things not to do is to start throwing around pejorative terms. One of these that I find annoying is 'hack'.

"Your work is just a hack! My approach is state of the art! Thou shalt do it my way!"

Oh dear.

First, the users of software don't care. They choose one tool over another because of what they can do with it, not because of the way it conforms or otherwise to visions of elegance or correctness. Of course elegance and correctness can be factors in software performance and maintainability and so on -- but most often these qualities are subjective, particularly when applied by newcomers unfamiliar with the big picture -- which is my next point...

Second, such visions are personal, and especially so to outsiders. If you've sweated over a specific problem (in this case transducing graphs with FSAs) for years at a time then I'll listen to you tell me what is the most elegant solution with interest -- but if you haven't, I'm inclined to assume that there is likely to be stuff you don't know that may well compromise your view.

Third, if we define 'hack' as a quick or heuristic route to get something done, then why would that be a bad thing? And if you start down this route where will you end? For example, the case that led to this post was in relation to a system that does finite state pattern matching and transduction over annotation graphs. Those of you with a background in formal languages may have already spotted the strangeness here: graphs can describe languages whose expressive power is greater than regular, which would seem to invalidate the whole idea of applying finite state techniques to them. It turns out that the data structures we're working with here (while doing information extraction over human language) have a lot of regular features, and that the indeterminacy that arises from the mismatch between regular FSAs and a graph-structured input stream is not an obstacle to our work (in fact it can sometimes be a good way to ignore ambiguities that we're not currently interested in). But: doesn't that fit the definition of a hack? From that point of view the whole subsystem under discussion is one big hack, which makes it even more ridiculous to criticise one approach or another as a "hack".
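
To make the idea concrete, here is a minimal sketch of running a nondeterministic finite-state automaton over an annotation graph. It is not how GATE's JAPE transducer is implemented; it assumes annotations are edges between integer nodes (token positions here), a hand-built automaton for a hypothetical pattern "Location OrgSuffix+", and made-up annotations for the phrase "Sheffield City Council". Because both "Sheffield" and "Sheffield City" are tagged as locations, the automaton explores both paths and reports two overlapping matches, which is exactly the kind of indeterminacy described above.

```python
from collections import defaultdict, deque


def find_matches(annotations, transitions, start_state, accepting):
    """Run a nondeterministic finite-state automaton over an annotation graph.

    annotations: list of (start_node, end_node, type) edges.
    transitions: dict mapping automaton state -> list of (type, next_state).
    Returns the set of (start_node, end_node) spans that reach an accepting state.
    """
    outgoing = defaultdict(list)
    for start, end, ann_type in annotations:
        outgoing[start].append((end, ann_type))

    matches = set()
    for origin in sorted(outgoing):
        # A configuration pairs a graph node with an automaton state; every
        # applicable edge/transition combination is explored, not just one.
        queue = deque([(origin, start_state)])
        seen = {(origin, start_state)}
        while queue:
            node, state = queue.popleft()
            if state in accepting and node != origin:
                matches.add((origin, node))
            for end, ann_type in outgoing.get(node, []):
                for label, next_state in transitions.get(state, []):
                    if label == ann_type and (end, next_state) not in seen:
                        seen.add((end, next_state))
                        queue.append((end, next_state))
    return matches


# Hypothetical annotations over "Sheffield City Council" (nodes are token positions).
annotations = [
    (0, 1, "Location"),   # Sheffield
    (0, 2, "Location"),   # Sheffield City
    (1, 2, "OrgSuffix"),  # City
    (2, 3, "OrgSuffix"),  # Council
]
# Automaton for the pattern: Location OrgSuffix+
transitions = {
    "s0": [("Location", "s1")],
    "s1": [("OrgSuffix", "s2")],
    "s2": [("OrgSuffix", "s2")],
}
print(sorted(find_matches(annotations, transitions, "s0", {"s2"})))
# [(0, 2), (0, 3)]
```

Whether both matches are wanted, or only the longest, is a policy decision layered on top; rule languages such as JAPE let the grammar writer choose a matching control style.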

The moral of the story? Technology is never as important as it seems in our commodity-driven age. Better communication and collaboration skills win every time. The good news is that this is one of the best things about open source :-)


Saturday, June 5, 2010

Data mining won't make you safer

(Wednesday 31st December 2008)

This holiday I read a fantastic novel called Little Brother by Cory Doctorow. It reminded me of how UK and US moves to collect more data about their citizens and give more powers to 'security' staff are in fact worse than useless. As someone who works in language processing I note with dismay the tendency of technologists to happily provide mining of personal data for state purposes, while cheerfully ignoring the fact that it won't make anyone any safer.

There are many reasons why invading privacy is counter-productive. Two important ones are:

the information isn't useful
state power is almost always abused

Why isn't the information useful? Imagine a method which is 99% successful at detecting anomalous behaviour and suggesting further investigation. Let's apply that method to 50 million adults in the UK, for example. That's 500,000 people who you now have to regard as suspects. In fact the accuracy of data mining in this type of case is much more likely to be around 50%, so if you collect all the data you can you'll still only know that 10s of millions of people might be suspect. Useless.
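
The arithmetic behind this is the base-rate problem, and it gets worse once you consider how few genuine targets there are. A minimal sketch, assuming "99% successful" means the method flags 99% of real targets and wrongly flags 1% of everyone else, and assuming a purely hypothetical 1,000 genuine targets among the 50 million:

```python
def screening_outcome(population, true_targets, sensitivity, false_positive_rate):
    """Return (number flagged, fraction of flagged people who are real targets)."""
    innocent = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * innocent
    flagged = true_positives + false_positives
    return flagged, true_positives / flagged


flagged, precision = screening_outcome(
    population=50_000_000,
    true_targets=1_000,        # hypothetical figure, for illustration only
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(round(flagged))          # 500980
print(f"{precision:.2%}")      # 0.20%
```

In other words, roughly 499 out of every 500 people flagged would be innocent.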

Second, security service personnel are just like everyone else: some are conscientious and some are unscrupulous. While you might just about consider it acceptable for all your personal data to be in the hands of a conscientious, competent, well-trained and well-provisioned state employee, are we really naive enough to imagine that this covers everyone in every police force, army barracks or 'intelligence' office? Of course not; and if we were, the recent history of appalling miscarriages of justice should soon convince us otherwise.


Friday, May 21, 2010

Open Data at the National Archives

The GATE team, Ontotext and SSL have won a contract to help open up the UK National Archives' records of .gov.uk websites (going back to 1997 and comprising some 340 million pages).

I've been quite ignorant about this stuff until recently, and it has been a pleasure to discover that the archives and related organisations are actively pursuing the vision of open data and open knowledge. This project has taken a big step forward in the UK recently, with government funding allocated to publishing more and more material on data.gov.uk in open and accessible forms. The battle is by no means over, but I'm really looking forward to contributing in a small way to this work, and, hopefully, showing how GATE can help improve access to large volumes of government data.

We're going to use GATE and Ontotext's open data systems (which hold the largest subsets of Linked Open Data currently available with full reasoning capabilities) to:

import/store/index structured data in a scaleable semantic repository
- data relevant for the web archive
- in an easy to manipulate form
- using linked data principles
- in the range of 10s of billions of facts
make links from web archive documents into the structured data
- over 100s of millions of documents and terabytes of plain text
allow browsing/search/navigation
- from the document space into the structured data space via semantic annotation and vice versa
- via a SPARQL endpoint
- as linguistic annotation structures
- as fulltext
create scenarios with usage examples, stored queries
show TNA how to DiY more scenarios

Quoting from the proposal,

"Sophisticated and complex semantics can transform this archive... but the real trick is to show how simple and straightforward mechanisms (delivered on modest and fixed budgets) can add value and increase usage in the short and medium terms. ... We aim to bring new methods of search, navigation and information modelling to TNA and in doing so make the web archive a more valuable and popular resource.

Our experience is that facetted and conceptual search over spaces such as concept hierarchies, specialist terminologies, geography or time can substantially increase the access routes into textual data and increase usage accordingly."

Text processing technology is inherently inaccurate (think of how often you mis-hear or mis-understand part of a conversation, and then multiply that by the number of times you've seen a computer do something stupid!); what can we do to make this type of access trustworthy?

"Any archive of government publications is inherently a tool of democracy, and any technology that we apply to such a tool must consider issues relating to reliability of the information that users will be led to as a result, for example:

what is the provenance of the information used for structured search and navigation? have there been commercial interests involved? have those interests skewed the distribution of data, and if so how can we make this explicit to the user?

what is the quality of the annotation? these methods are often less accurate than human performance, and again we must make such inaccuracy a matter of obvious record lest we negatively influence the fidelity of our navigational idioms

Therefore we will take pains to measure accuracy and record provenance, and make these explicit for all new mechanisms added to the archive."
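
Measuring annotation accuracy usually means comparing system output against a human-annotated gold standard and reporting precision, recall and F1. GATE has its own evaluation tools; the sketch below neither uses nor reproduces them, it just shows strict scoring (an annotation counts only if its start, end and type all match) on hypothetical annotations. Lenient scoring, which gives credit for partial overlaps, is a common alternative.

```python
def evaluate(gold, system):
    """Strict precision, recall and F1 over (start, end, type) annotations."""
    gold_set, system_set = set(gold), set(system)
    correct = len(gold_set & system_set)
    precision = correct / len(system_set) if system_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and system annotations (character offsets).
gold = [(0, 9, "Location"), (15, 22, "Organization"), (30, 35, "Person")]
system = [(0, 9, "Location"), (15, 22, "Location"), (30, 35, "Person"),
          (40, 45, "Person")]
p, r, f = evaluate(gold, system)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.67 F1=0.57
```

Publishing figures like these alongside each new navigation mechanism is one way to make its inaccuracy "a matter of obvious record".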

So open science (and our open source implementations of measurement tools in GATE) will contribute to open data and open government.

More open stuff.


Tuesday, May 4, 2010

More GATE Products Coming

Several years ago we (the GATE project, that is, not the royal "we" -- my knighthood seems to have got lost in the post for some reason) reached the conclusion that the tools that we've built for developing language processing components (GATE Developer) and deploying them as parts of other applications (GATE Embedded) were only one part of the story of successful semantic annotation projects. We like to think that our specialist open source software and our user community are the best in the world in many respects, but when we helped people who are not specialists we encountered a bunch of other perspectives and problems. We also came across some hard problems of scaleability and efficiency which led us to implement a completely new system for annotation indexing (with thanks to Sebastiano Vigna and MG4J).

So, cutting to the chase, we developed a bunch of new systems and tools, partly with our commercial partners. We did this largely behind closed doors (although we did run a workshop on multiparadigm indexing at which we got a lot of useful input), partly because of our partners' requirements and partly because we wanted to minimise our support load while we ironed out the bugs in the initial versions. That process has now run its course, and we're pleased to announce the imminent availability of lots of new stuff. Keep an eye on GATE.ac.uk over the summer, as we'll be moving it all into our source repositories in advance of our 6.0 release in the autumn.

Enjoy...


Tuesday, April 27, 2010

Open Knowledge, Linked Data, Scruffy vs. Neat

Last weekend I had the pleasure of attending this year's Open Knowledge Conference. The list of good reasons for making government (and other) data open and for breaking down barriers to finding information on-line is longer than would fit in this post; one nice one that I hadn't heard before was Glyn Moody's point that Turing equivalence implies that there can be only one digital revolution, and that this in turn can prove the impossibility of preserving analogue bad habits like 'rights management'. At this point I should probably mention my employer's lawyers and what they'll do to you if you imply that I'm in favour of file sharing, but perhaps I'll just make do with a tired but accurate comparison between the RIAA and those loveable old dinosaurs, dodos and other casualties of unsustainable lifestyle choices.

There was a lot of other interesting stuff being presented, including a talk by Jeni Tennison on large-scale open data from government and her work at data.gov.uk. Jeni ended her talk by saying that we shouldn't worry about proliferation of redundant and (potentially) contradictory material -- after all, this is what has happened with the web and no animals were harmed in the making, etc.

I like this point, and it chimes nicely with a move from "neat" to "scruffy" that we can observe around semantic technology in general and the semantic web in particular. The original vision published by Berners-Lee and others around a decade ago was very much inspired by Artificial Intelligence: your computer was going to book your dentist appointment on the right day to coincide with picking your mother up from the station, make sure the fridge was stocked with her favourite orange juice for later, and blah blah blah. Good stuff if you're a professor of logic computation looking for your next funding opportunity, but not really any nearer the horizon now than it was 10 (or 20 or 30) years ago.

Thankfully we've mostly woken up again, and now things are boiling down to a more practical residue, which, to paraphrase a more recent comment by Berners-Lee, is "all about the data, stupid". And this brings us back to Jeni's talk -- if we can get all those public data silos opened up and usable in the right way this will be a huge leap forward, and the fact that it will not be universally nice and neat and dressed in a shiny new bow tie is neither here nor there. Scruffy is neat in its own way.

A second thing that was interesting for me at this talk (and at OKCon more generally) was the question of data vs. content. The focus of the discussion today was very much about data in spreadsheets, relational databases and so on, and this seems to be where current success is happening as more and more databases are being exported to variants of RDF. This must be good news for text analysis: looked at from an information extraction point of view, linked open data is a rich source of domain terminology (seeds for our gazetteers) and conceptual backbones (seeds for our result templates, taxonomies and ontologies). The next wave, it seems to me, is to link the linked data to all that text that's lurking in the databases telling all sorts of interesting stories -- if only we could find them.
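
As a concrete example of linked data seeding a gazetteer, the sketch below pulls rdfs:label values out of N-Triples and uses them for a greedy longest-match lookup over tokenised text. Assumptions: the example.org URIs and triples are made up, and the regular expression handles only simple IRI-subject, plain-literal triples (no escaped quotes or blank nodes); a real pipeline would use a proper RDF parser.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Simplified N-Triples pattern: <subject> <predicate> "literal"(@lang)? .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@[\w-]+)?\s*\.$')


def gazetteer_from_ntriples(lines):
    """Map lower-cased label token tuples to the URI they label."""
    entries = {}
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == RDFS_LABEL:
            entries[tuple(m.group(3).lower().split())] = m.group(1)
    return entries


def tag(tokens, gazetteer, max_len=5):
    """Greedy longest-match gazetteer lookup; returns (start, end, uri) token spans."""
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in gazetteer:
                found.append((i, i + n, gazetteer[key]))
                i += n
                break
        else:  # no entry starts at this token
            i += 1
    return found


# Hypothetical linked-data triples.
triples = [
    '<http://example.org/id/sheffield> <http://www.w3.org/2000/01/rdf-schema#label> "Sheffield"@en .',
    '<http://example.org/id/tna> <http://www.w3.org/2000/01/rdf-schema#label> "National Archives"@en .',
    '<http://example.org/id/sheffield> <http://example.org/ns#kind> "City" .',
]
gazetteer = gazetteer_from_ntriples(triples)
print(tag("Records from the National Archives about Sheffield".split(), gazetteer))
# [(3, 5, 'http://example.org/id/tna'), (6, 7, 'http://example.org/id/sheffield')]
```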


Monday, February 22, 2010

Google stole my ngrams

A while ago Dave Schubmehl of Fairview pointed me to a paper by several prominent Googlers which does a nice and clear job of summarising some important lessons from the last decade of web analysis research. The upshot is that if you've a billion examples of human behaviour that pertains to your particular problem it is a good bet to use a simple non-parametric word count model to try and generalise from that behaviour.
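
To show how little machinery such a model needs, here is a minimal bigram model built purely from counts. It is a toy illustration on made-up sentences: with billions of examples the counts do the work, whereas a corpus this small would need smoothing (add-one, Kneser-Ney and so on) to cope with unseen word pairs.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_model(sentences):
    """Return P(word | previous word) estimated by maximum likelihood from raw counts."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])  # every token except the last starts a bigram
        bigram_counts.update(ngrams(tokens, 2))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: a real model would back off or smooth
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Made-up customer comments.
prob = bigram_model([
    "the product is great",
    "the product is broken",
    "the update is broken",
])
print(prob("product", "is"))  # 1.0
print(prob("is", "broken"))   # 0.6666666666666666
```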

Absolutely true. This is, in fact, the main reason why Google was so successful to start with: they realised that hyperlinks represent neatly codified human knowledge and that learning search ranking from the links in web pages is a great way to improve accuracy.
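
The classic published example of deriving a ranking from link structure is PageRank. The sketch below is the textbook power-iteration version on a hypothetical four-page graph, assuming the conventional damping factor of 0.85 and spreading the rank of pages with no outgoing links evenly; it is not a description of how Google ranks pages today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

The human knowledge here is simply the authors' decisions about what to link to; the algorithm just aggregates them.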

What do we do with the cases where we can't find a billion examples? Probably we end up lashing together a model of the domain in a convenient schema language (sorry, I mean "build an ontology"), grubbing up whatever domain terminologies and so on that come to hand, and writing some regular expression graph transducers to answer the question.

So: we're not trying to replace Google. We're not applicable to every problem everyone has ever had with text ("Not every problem someone has with his girlfriend is necessarily due to the capitalist mode of production" -- Herbert Marcuse). But neither is Google going to pop round to your office next Tuesday and help you build an ngram model of a couple of billion user queries from their search logs to help you figure out why your customers hate the latest product release.

There's not really a competition here, the approaches are orthogonal.


Friday, February 12, 2010

Cloud Computing, GATE and Text Processing

When a new thing comes along in computing the first thing that happens is that a small and exclusive set of nerds like me get all excited. If the excitement seems likely to relate to the real world in any fashion that might actually generate someone somewhere some money (or can be spun as something that might do so) then the next thing that happens is that the marketing departments of 1001 IT corporations jump in with both feet and start generating acre after acre of turgid prose about how their aged and creaking product line is actually a prime example of Phenomenon X, the Bright New Thing of Computing.

So it has been with software "in the cloud", which is, it turns out, actually quite a good idea in various ways (setting it apart from most new trends in IT). What does cloud computing commonly refer to (now that the sound and fury of the marketing teams has had a chance to settle a little)? Three things:

software as a service (SaaS), for example Google Docs or SalesForce.com
platform as a service (PaaS), for example Google App Engine
infrastructure as a service (IaaS), for example Amazon Web Services and most famously their Elastic Compute Cloud (EC2 -- which probably did most to popularise the term in the recent period)

These three now constitute the new wave: they are one of the main tracks that Google is betting on (SaaS and PaaS), what Amazon continues to succeed with (IaaS), and the grist for a hundred new startup mills (from specific applications like searching US campus sites to infrastructural help for cloud developers).

What does it have to do with GATE? IaaS is particularly well-suited to hosting text processing, which is typically bursty in its computational cost and therefore ill-suited to fixed infrastructure. SaaS is great for the provision of large web applications that are complex to install and maintain (like GATE Teamware). Hopefully this and other cloud offerings will be available on GATECloud.net in the not too distant future... so watch this space!
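
Why burstiness favours renting: fixed infrastructure has to be sized for the peak and then sits mostly idle. The sketch below compares the two on a made-up daily load profile; the per-machine-hour prices are hypothetical, and deliberately make the cloud twice as expensive per hour to show that it can still come out cheaper.

```python
def compare_capacity(hourly_demand, owned_cost_per_machine_hour,
                     cloud_cost_per_machine_hour):
    """Compare fixed capacity sized for peak demand against pay-per-use capacity.

    hourly_demand: machines needed in each hour of the period.
    Returns (fixed cost, elastic cost, utilisation of the fixed machines).
    """
    peak = max(hourly_demand)
    provisioned_hours = peak * len(hourly_demand)
    used_hours = sum(hourly_demand)
    fixed_cost = provisioned_hours * owned_cost_per_machine_hour
    elastic_cost = used_hours * cloud_cost_per_machine_hour
    return fixed_cost, elastic_cost, used_hours / provisioned_hours


# Hypothetical day: 20 quiet hours on one machine, then a 4-hour burst needing 40.
demand = [1] * 20 + [40] * 4
print(compare_capacity(demand, 1.0, 2.0))  # (960.0, 360.0, 0.1875)
```

With steady demand the picture reverses: utilisation approaches 1 and the cheaper per-hour owned machines win, which is the same trade-off behind the EC2 caveat in the later "More Clouding" post.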


Tuesday, February 9, 2010

Certifiable GATE gurus wanted.

In my previous post I described how we came to start taking our user community more seriously again; in the first part of 2010 the effect of this turn has been that the world and her dog seem to be beating a path to our door with requests for technical support, training, bespoke development and/or access to our latest prototypes. In fact it is proving difficult to keep up with demand, so: if you're a GATE expert how about getting certified and taking on some of the work with us? If you have a good knowledge of one or more parts of GATE (and/or related application domains), please get in touch. (We promise not to tell anyone that you're certifiable :-) .)

I love GATE users (though I couldn't eat a whole one).

Users. A bit of a nuisance. They insist on asking questions, testing limits, finding bugs. Around 5 years ago, after something like a decade of giving away software, the GATE team felt very like our old systems administrator, who had a habit of saying "the only secure network is one without any computers attached": we knew that our user community was a good idea in principle, but we really rather wished they'd all leave us alone. In fact we did our best to discourage GATE users: we stopped doing regular releases, we ignored the mailing list, and if we could have figured out how to take the thing out in the woods and bury it under a tree we probably would have.

We failed: GATE refused to die, people obstinately continued to use it, and, as we used it ourselves for all sorts of projects, more and more features were added, quality and functionality improved, and every time we decided it was all over someone would turn up with a pile of cash and a novel problem. So we conceded defeat and resolved to succeed. I think.

This is all a long-winded way of explaining our shift in emphasis over the past year or so: we are introverts no longer, but happy and well-adjusted user-friendly liveware. Text processing for ever! Forwards to world domination comrades! Oops, wrong blog.

So now we're back to actively supporting our users and growing our community. We've upgraded the documentation, we're running regular training weeks and developer sprints, and we've built up several new products and services around the core GATE code to cater for more of the cases we've seen of people trying to deploy text processing over the years (15 of which, incredibly, have passed under the bridge since we first set metaphorical pen to digital paper for GATE version 0.1). We've also revamped the website and no longer look like something that might have been produced at CERN circa 1995.

So far the response has been quite astonishingly positive... so perhaps users aren't such a bad thing after all.


Hamish Cunningham
