semantic web

World domination through machine learning: a review of The Master Algorithm

Pedro Domingos likes big ideas. He sets out to describe how computers can write their own programs. Take the well-established case of handwriting recognition. This is a form of machine learning in which the computer is given enough examples (a training set) to learn to do something. If you show the machine the number “9” written in enough ways, it eventually becomes as good as, or even better than, a human at recognising a handwritten “9”.

Unfortunately, he alternates between very sensible and clear descriptions like this, and sweeping optimistic generalisations. Mr Domingos is in no doubt about who the new masters of the world are going to be. In his potted description of commerce, he describes how “the progression from computers to the Internet to machine learning was inevitable ... once the inevitable happens and learning algorithms become the middlemen, power becomes concentrated in them.” In fact, in his view there is no future for any company that does not use machine learning: “a company without machine learning can’t keep up with one that uses it ... businesses embrace it because they have no choice.” That’s a very stern conclusion!

How TrendMD uses collaborative filtering to show relatedness

TrendMD is (as its website states) “a content recommendation engine for scholarly publishers, which powers personalized recommendations for thousands of sites”. An interesting blog post by Matt Cockerill of TrendMD (published February 2016) claims “TrendMD’s collaborative filtering engine improves clickthrough rates 272% compared to a standard ‘similar article’ algorithm in an A/B trial”. That sounds pretty impressive.
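
For readers unfamiliar with the term, the general idea of collaborative filtering can be sketched in a few lines of Python. This is not TrendMD's engine, whose details are not given here. It is a minimal illustration that assumes a hypothetical log of which readers viewed which articles, and it scores relatedness by overlap in readership (cosine similarity over reader sets) rather than by similarity of the article text. The function name `related_articles` and the sample data are invented for the example.

```python
from math import sqrt


def related_articles(views, target, top_n=3):
    """Rank articles by how strongly their readership overlaps with `target`.

    `views` maps a reader id to the set of article ids that reader viewed
    (a hypothetical log format, assumed for this sketch).
    """
    # Invert the log: for each article, collect the readers who viewed it.
    readers = {}
    for reader, articles in views.items():
        for article in articles:
            readers.setdefault(article, set()).add(reader)

    target_readers = readers.get(target, set())
    scores = {}
    for article, who in readers.items():
        if article == target:
            continue
        overlap = len(who & target_readers)
        if overlap:
            # Cosine similarity between the two articles' reader sets.
            scores[article] = overlap / sqrt(len(who) * len(target_readers))

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Invented sample data: four readers, four articles.
views = {
    "r1": {"A", "B"},
    "r2": {"A", "B", "C"},
    "r3": {"A", "B"},
    "r4": {"D"},
}
print(related_articles(views, "A"))
```

In this toy data, article B is ranked above C as "related" to A purely because the same readers viewed both. A standard "similar article" algorithm would instead compare the content of the articles themselves.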

Getting a feel for sentiment analysis

An excellent session of the London Text Analytics Group (March 14) contrasted two approaches to sentiment analysis: one proudly (and publicly) ditches grammar, while the other uses grammar to disambiguate content. Both speakers made ambitious claims for their software; which is the better approach?

Stephen Pulman of TheySay, a start-up from the University of Oxford, took the more traditional approach. He pointed out that taking individual words by themselves can lead to great confusion. Just assessing whether something is positive or negative is not so simple: “bacteria” is negative, and “kill” is negative, but “kills bacteria” is positive. More complex still, the phrase “never fails to kill bacteria” is highly positive. A bag-of-words approach is unlikely to pick up all these distinctions.
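
The difference can be made concrete with a toy sketch in Python. This is not TheySay's method: the word lists, the scores and both function names are invented for illustration, and the "compositional" scorer is a deliberately crude stand-in for real grammatical analysis. It simply lets a negative verb or negator reverse the polarity of whatever follows it.

```python
# Hypothetical word-level polarities, invented for this illustration.
LEXICON = {"bacteria": -1, "kill": -1, "kills": -1, "fails": -1, "never": -1}

# Words that reverse the polarity of what follows them (also an assumption).
REVERSERS = {"kill", "kills", "fails", "never"}


def bag_of_words_score(text):
    """Add up the polarity of each word, ignoring word order entirely."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


def compositional_score(text):
    """Read right to left, letting each reverser flip the score built so far."""
    score = 0
    for word in reversed(text.lower().split()):
        if word in REVERSERS:
            # With nothing to act on, a reverser counts as negative itself.
            score = -score if score != 0 else -1
        else:
            score += LEXICON.get(word, 0)
    return score


for phrase in ["bacteria", "kills bacteria", "never fails to kill bacteria"]:
    print(phrase, bag_of_words_score(phrase), compositional_score(phrase))
```

The bag-of-words scorer rates each phrase as more negative the more "negative" words it contains, while the order-aware scorer rates “kills bacteria” and “never fails to kill bacteria” as positive. Real systems are far more sophisticated, but the sketch shows why word order matters.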

Single Figure Publications and nano-publications

Single Figure Publications is an interesting idea by William Mobley, published in F1000Research. F1000Research is “a publishing platform offering immediate publication of posters, slides and articles with no editorial bias, [but with] transparent peer review”. Mr Mobley proposes in his editorial that Single Figure Publications (SFP) should be a new format of short text pieces, shorter than a scholarly article, and, tantalisingly, close to machine readable. What does he mean by this?

Still more ontology definitions

Regular readers of this blog will notice an ongoing series of definitions of the term “ontology”. Here is another one, by Tom Gruber, from the Encyclopedia of Database Systems (Springer-Verlag, 2009):

“an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse”

The definition may be perfect, but as a description of a concept in terms that an untrained reader might understand, it scores about zero.