text analytics

Review of Tom Reamy, Deep Text

I don't think the title “Deep Text” does this book any favours. A more accurate description might be “Text Analytics within the Enterprise”: less catchy, but certainly more intelligible, and more indicative of what this book covers. From the title, you might think this is yet another business book inventing a catchphrase and spinning the idea out to 220 pages. In reality this is a detailed and thoughtful overview of the use of text analytics for content-based organisations, written by a highly experienced practitioner.

Who is this book for? It is aimed at readers who make business decisions (which means investment decisions) but who also need to understand something about the technology involved. Large organisations will have senior management who would not open a book of this kind; very small organisations will not have the resources to build anything. Somewhere in the middle is the organisation Reamy is aiming at: one trying to make sense of new technology without being able to turn to a research or semantic team. Although the book is aimed at a business market, it refers throughout to good IT practices, such as lean development and build-to-fail models.

What makes the book readable is that Tom Reamy isn’t afraid to speak his mind. While most consultants have spent years learning to bite their tongues and provide the advice the client asks for, Reamy states in no uncertain terms what he thinks has worked, and what hasn’t. For example, he is clear that “most metadata projects – particularly asking authors to add keywords to documents as they publish them into content management systems – have been failures.” That’s a bit of an indictment of a process that has been undertaken by many publishers, but it is today quite widely agreed that the result is no more than a folksonomy. But, Reamy continues, “the other component that was supposed to improve search is adding taxonomies to the mix. I have to admit that I used to believe that this was the best answer, and spent a few years developing taxonomies for organizations which, while they helped somewhat, were rarely worth the effort and time.” You have to admire the author’s honesty, as he goes on to clarify: “The basic problem was not with the taxonomy, but with trying to apply the taxonomy to documents, in other words, manual tagging with all its well-known problems.”

What can machines discover from scholarly content?

Just when you thought that everything was known about the academic user journey, a workshop comes along (the WSDM Workshop on Scholarly Web Mining, SWM 2017, held in Cambridge, February 10 2017) that presents a whole new set of tools and investigations to consider.

It was a rather frantic event, squeezing no fewer than 11 presentations into a half-day session, even if the event took place in the sumptuous and rather grand surroundings of the Council Chamber in the Cambridge Guildhall. Trying to summarise all 11 presentations would be a challenge; were there any common areas of inquiry?

World domination through machine learning: a review of The Master Algorithm

Pedro Domingos likes big ideas. He sets out to describe to us how computers can write their own programs. For example, there is the well-established case of handwriting recognition. This is a form of machine learning in which the computer is provided with sufficient examples (a training set) to enable it to learn to do something. If you show the machine the number “9” written enough ways, the machine eventually becomes as good as, or even better than, a human at recognising a handwritten “9”.

Unfortunately, he alternates between very sensible and clear descriptions like this, and sweeping optimistic generalisations. Mr Domingos is in no doubt who the new masters of the world are going to be. In his potted description of commerce, he describes how “the progression from computers to the Internet to machine learning was inevitable ... once the inevitable happens and learning algorithms become the middlemen, power becomes concentrated in them.” In fact, there is no future for any company without using machine learning: “a company without machine learning can’t keep up with one that uses it ... businesses embrace it because they have no choice.” That’s a very stern conclusion!