book reviews

Review of Tom Reamy, Deep Text

I don't think the title “Deep Text” does this book any favours: a more accurate description might be “Text Analytics within the Enterprise” – less catchy, but certainly more intelligible, and more indicative of what this book covers. From the title, you might think this is yet another business book inventing a catchphrase and spinning the idea out to 220 pages. In reality, this is a detailed and thoughtful overview of the use of text analytics for content-based organizations, written by a highly experienced practitioner.

Who is this book for? It is aimed at an audience that is involved in making business decisions (which means investment decisions) but that also needs to understand something about the technology involved. Large organisations will have senior management who would not open a book of this kind; very small organisations will not have the resources to build anything. Somewhere in the middle is the organisation Reamy is aiming at: one trying to make sense of new technology without being able to turn to the resources of a research or semantic team. It is aimed at a business market, but there are references throughout the book to good IT practices, such as lean development and build-to-fail models.

What makes the book readable is that Tom Reamy isn’t afraid to speak his mind. While most consultants have spent years learning to bite their tongues and provide the advice that the client asks for, Reamy states in no uncertain terms what he thinks has worked – and what hasn’t. For example, he is clear that “most metadata projects – particularly asking authors to add keywords to documents as they publish them into content management systems – have been failures.” That’s a bit of an indictment of a process that has been undertaken by many publishers, but it is today quite widely agreed that the result is no more than a folksonomy. But, Reamy continues, “the other component that was supposed to improve search is adding taxonomies to the mix. I have to admit that I used to believe that this was the best answer, and spent a few years developing taxonomies for organizations which, while they helped somewhat, were rarely worth the effort and time.” You have to admire the author’s honesty, as he goes on to clarify: “The basic problem was not with the taxonomy, but with trying to apply the taxonomy to documents, in other words, manual tagging with all its well-known problems.”

World domination through machine learning: a review of The Master Algorithm

Pedro Domingos likes big ideas. He sets out to describe to us how computers can write their own programs. For example, there is the well-established case of handwriting recognition. This is a form of machine learning in which the computer is provided with sufficient examples (and a training set) to enable the machine to learn to do something. If you show the machine the number “9” written enough ways, the machine eventually becomes as good as or even better than a human at recognising a handwritten “9”.

Unfortunately, he alternates between very sensible and clear descriptions like this, and sweeping optimistic generalisations. Mr Domingos is in no doubt who the new masters of the world are going to be. In his potted description of commerce, he describes how “the progression from computers to the Internet to machine learning was inevitable ... once the inevitable happens and learning algorithms become the middlemen, power becomes concentrated in them.” In fact, there is no future for any company without using machine learning: “a company without machine learning can’t keep up with one that uses it ... businesses embrace it because they have no choice.” That’s a very stern conclusion!

Sparklines: Beautiful Evidence or Muddled Graphics?

Do you understand this graphic? It is an example of a sparkline, by Edward Tufte. Tufte was, if not the originator of sparklines, one of their earliest advocates. He wrote about them in his book Beautiful Evidence (2006); he defines sparklines as “small, intense, wordlike graphics, embedded in the context of words and numbers”. Tufte’s ideas were very influential and were taken up by Microsoft in their 2010 release of Excel. But I don't agree with him about sparklines.

The seven classic books about computing

Library Image by Dmitrij Paskevic (CC0)


The books I’m thinking about might not be quite as old as the ones in the photo, but at least some of them will have, I hope, the patina of age. One of the slightly poignant aspects of working in computing is that the skills of many gifted individuals working in IT are lost within a few years – even if they commit their thoughts to a book. Unlike a novel or a work of art, the solutions and tools created by (say) Kernighan and Ritchie in The C Programming Language will one day be completely forgotten, as new languages and new processes replace them. That seems a shame.

The Lean Startup, or how the best entrepreneurs don’t listen to customers


"We really did have customers in those early days— true visionary early adopters— and we often talked to them and asked for their feedback. But we emphatically did not do what they said." This startling admission appears in the first page of The Lean Startup: How Constant Innovation Creates Radically Successful Businesses, by Eric Ries (2011). Should we congratulate him on his fresh approach, or laugh at him for missing the only true guidance that product development can trust, that is, the customer? The truth is somewhere between the two. Ries has written a book that some have labelled a key management text of the 21st century, while to a more jaundiced eye it reads like so many business books that come from America, combining evangelical fervour with rather dubious and questionable statements that have not been tested.

Classic computing titles: The Inmates are Running the Asylum

Whatever they teach you on a computing degree, it doesn’t seem to be sufficient to create an effective web site. One of the paradoxes of the modern world is that we are surrounded by IT, and yet those who have studied IT formally often seem incapable of creating software that genuinely meets our needs – a glance at a few developer-led websites is often sufficient to demonstrate that. Alan Cooper’s book, The Inmates are Running the Asylum, although published some fifteen years ago, gives some idea of why that might be. The author himself has a highly respectable track record as a developer – he was responsible for Visual Basic, so he can claim some understanding of the programming process, and of the programming mentality. So if he says that programming alone is not sufficient, then you are right to take notice. Everyone with an involvement in IT, whether as a user or as an information professional acting as a sponsor and influencer, could benefit from his assessment of how programmers think.


The Accidental Taxonomist

Hedden, H., 2010. The accidental taxonomist, Medford, N.J.: Information Today.

Heather Hedden has written an excellent introductory manual for anyone involved in setting up, running or expanding a taxonomy or thesaurus. Unlike many books on the subject, this is one for the practitioner, based on lots of practical experience — as Patrick Lambe describes it in his foreword, “this is taxonomy from 100 feet”.