In praise of rule-based systems

The London Text Analytics Meetup never fails to provide stimulating talks. Even if you disagree with some of the ideas, or even most of them, you engage with the talk and with the speaker. Tonight’s talk (March 6, 2018) was no exception: Michael Barclay provided an overview of his rule-based system (something he rather provocatively emphasised in his presentation, since rule-based systems are considered old-fashioned today).

What he has built is impressive indeed: nothing less than a rule-based system for analysing natural language and extracting relationships from it. As a final flourish, when he completed his talk and took questions, he left his system running to parse the first twelve chapters of Jane Austen’s Emma. After the questions, he showed all the things his system had discovered about Emma, displayed as a graph.

The body of his talk involved taking English sentences and seeing just how complicated it is for a machine to parse them correctly and discover the subject. Of all the components of a sentence, you would think the subject is the most obviously identifiable, yet he cleverly used increasingly complex examples – and, in the spirit of the group, even showed us sentences where his parser failed.

From memory, the sentences reached levels of complexity such as “David’s uncle’s fish and chips were greasy”, where he pointed out that we expect a machine to identify that both the fish and the chips were greasy.
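To get a feel for why this is hard, here is a minimal sketch of what a machine has to untangle in that sentence. It uses the off-the-shelf spaCy parser rather than Michael Barclay’s system, and the model name and exact labels are assumptions about a typical spaCy installation:

```python
# A minimal sketch using the off-the-shelf spaCy parser (not the rule-based
# system from the talk) to show what a machine has to untangle: a possessive
# chain ("David's uncle's") attached to a coordinated head ("fish and chips").
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("David's uncle's fish and chips were greasy.")

for token in doc:
    print(f"{token.text:8} {token.dep_:8} -> head: {token.head.text}")

# The exact labels depend on the model version, but roughly we expect "fish"
# as the nominal subject of "were", "chips" joined to "fish" by a conjunction,
# and "David"/"uncle" hanging off the subject as possessives. Getting from that
# tree to "both the fish and the chips were greasy" still requires propagating
# the predicate across the coordination, which is exactly the kind of work
# that explicit rules have to encode.
```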

He contrasted his rule-based system with neural networks, but since he did not give any examples of a neural network, it was something of a one-sided presentation. He argued that rules can be taught, pointing out that many of the rules of mathematics are also taught in school without assuming any understanding of the underlying thinking.

After the presentation I expected many people would question using a rule-based approach. Instead, there were some equally fundamental questions. Why build a complete parser like this when Stanford has already built one? Why have a system that requires such detailed technical understanding to build and maintain the rules? Why try to capture all the syntactic rules of English when, for practical purposes, a much smaller subset might be sufficient? He showed, for example, a dictionary definition of the term “eye”, which included over 20 different senses. Of course, for most practical purposes there is no question of all 20 meanings being possible; the goal might simply be to distinguish two or three of them.
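To illustrate that last point, a handful of hand-written context rules can be enough to separate the two or three senses that actually matter for a given application. This is my own toy sketch, not anything shown in the talk, and the cue words are invented for illustration:

```python
# Toy illustration (mine, not from the talk): distinguishing a few senses of
# "eye" by looking for nearby context words, rather than modelling all 20+
# dictionary senses.
SENSE_CUES = {
    "anatomy": {"blink", "see", "vision", "tear", "stare"},
    "needle": {"thread", "needle", "sew"},
    "storm": {"hurricane", "storm", "cyclone"},
}

def guess_eye_sense(sentence: str) -> str:
    """Return the first sense whose cue words overlap with the sentence."""
    words = set(sentence.lower().split())
    for sense, cues in SENSE_CUES.items():
        if words & cues:
            return sense
    return "unknown"

print(guess_eye_sense("The thread would not go through the eye of the needle."))  # needle
print(guess_eye_sense("The eye of the hurricane passed over the island."))        # storm
```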

A few seconds of browsing the Internet reveals the fundamental distinction:

Rule-based systems are examples of "old style" AI, which uses rules prepared by humans. Neural networks are examples of "new style" AI, whose mechanism is "learned" by the computer using sophisticated algorithms … While in some cases rule-based systems could be effective, the general trend in AI has been to switch to machine-learning algorithms such as neural networks, due to their much better performance. [Yuval Filmus, comment on Stack Exchange]

It wasn’t possible from this talk to assess whether a rule-based system runs faster or not. In my view, the sheer number of rules – and their sometimes unpredictable results – makes a rule-based system very dependent on expert input. But what both systems have in common is some kind of inference. The results we saw were undeniably impressive, although when asked for a business case, Michael Barclay mentioned analysing the discourse of care home staff for signs of problems that needed to be investigated further. Such a small-scale application could be run quite effectively just by searching the text for a few keywords, e.g. “crash”, “cry”, “unhappy”, for starters. Nonetheless, what Michael Barclay demonstrated was hugely impressive; it just needs, I feel, a suitable use case to draw on what he has built.
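For what it is worth, the kind of keyword baseline I have in mind is only a few lines of code; the keyword list and example notes below are purely illustrative:

```python
# A sketch of the keyword baseline suggested above: flag care-home notes
# containing words that might warrant a closer look. The keyword list and
# the example notes are hypothetical, chosen only to illustrate the idea.
import re

ALERT_KEYWORDS = {"crash", "cry", "cried", "unhappy", "fall", "bruise"}

def flag_note(note: str) -> set:
    """Return the alert keywords found in a free-text note."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return words & ALERT_KEYWORDS

notes = [
    "Resident was unhappy at lunch and cried afterwards.",
    "Quiet afternoon, watched television with visitors.",
]
for note in notes:
    hits = flag_note(note)
    if hits:
        print(f"REVIEW: {note!r} (matched: {sorted(hits)})")
```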