Publishers attend their annual Rave

Yet again Rave Technologies assembled an impressive cast of speakers for their annual publishing event (London, October 2017). Despite the event being managed by a vendor, Rave resists any attempt to turn it into a corporate showcase.

This year the theme was broadly based around innovation, specifically digital innovation. You could ask whether there is any innovation that is not digital these days, but more on that below.

Review of Tom Reamy, Deep Text

I don't think the title “Deep Text” does this book any favours - a more accurate description might be “Text Analytics within the Enterprise” – less catchy, but certainly more intelligible, and more indicative of what this book covers. From the title, you might think this is yet another business book inventing a catchphrase and spinning the idea out to 220 pages. In reality this is a detailed and thoughtful overview of the use of text analytics for content-based organizations, written by a highly experienced practitioner.

Who is this book for? It is aimed at an audience that is involved in making business decisions (which means investment decisions) but that also needs to understand something about the technology involved. Large organisations will have senior management who would not open a book of this kind; very small organisations will not have the resources to build anything. Somewhere in the middle is the organisation Reamy is aiming at: trying to make sense of new technology without being able to turn to the resources of a research or semantic team. It is aimed at a business market, but there are references throughout the book to good IT practices, such as lean development and build-to-fail models.

What makes the book readable is that Tom Reamy isn’t afraid to speak his mind. While most consultants have spent years learning to bite their tongue and provide the advice the client asks for, Reamy states in no uncertain terms what he thinks has worked – and what hasn’t. For example, he is clear that “most metadata projects – particularly asking authors to add keywords to documents as they publish them into content management systems – have been failures.” That’s a bit of an indictment of a process that has been undertaken by many publishers, but it is today quite widely agreed that the result is no more than a folksonomy. But, Reamy continues, “the other component that was supposed to improve search is adding taxonomies to the mix. I have to admit that I used to believe that this was the best answer, and spent a few years developing taxonomies for organizations which, while they helped somewhat, were rarely worth the effort and time.” You have to admire the author’s honesty, as he goes on to clarify: “The basic problem was not with the taxonomy, but with trying to apply the taxonomy to documents, in other words, manual tagging with all its well-known problems.”

Sherlock Holmes and machine learning

Some people would claim there is an uncanny parallel between the methods used by Sherlock Holmes and machine learning. In both cases an apparently insoluble problem is suddenly resolved using nothing but careful assessment of the available evidence. In both cases we are startled that it was possible to find a solution when we, the readers, had no idea of it. So I was intrigued when Phil Gooch presented an analysis of a Sherlock Holmes story to discover the most important details. Could machine learning, like Holmes, solve a crime mystery? Phil’s presentation, at the excellent London Text Analytics Meetup Group, lived up to expectations, even if he didn’t quite demonstrate a machine solving the mystery. Instead, and quite an achievement in its own right, he coded the analysis in front of us while he was presenting. I’ve been to presentations where there is a live demo, but I’ve not seen coding on the screen (and taking suggestions for alterations) as the main part of the presentation. All credit to Phil, then, for coolness!
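For readers curious what such an analysis might look like, here is a minimal sketch (not Phil Gooch's actual code, which is not reproduced in this post) of the simplest form of the idea: extracting the most salient terms from a passage by frequency, after discarding common stopwords. The stopword list and sample text are illustrative assumptions only.

```python
from collections import Counter
import re

# A tiny illustrative stopword list; real analyses use much larger ones.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "it", "was", "i", "he", "that"}

def top_keywords(text, n=5):
    """Return the n most frequent non-stopword tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]

# Hypothetical sample passage, in the spirit of a Holmes story.
sample = ("Holmes examined the telegram. The telegram mentioned Baskerville, "
          "and Holmes knew the telegram held the vital clue.")
print(top_keywords(sample, 2))  # the two most frequent content words
```

A real text-analytics pipeline would go well beyond raw counts – weighting terms by TF-IDF, recognising named entities, and so on – but the underlying move is the same: let the distribution of words in the evidence point to what matters.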

What can machines discover from scholarly content?

Just as you thought that everything was known about the academic user journey, a workshop comes along (the WSDM Workshop on Scholarly Web Mining, SWM 2017, held in Cambridge, February 10 2017) that presents a whole new set of tools and investigations to consider.

It was a rather frantic event, squeezing no fewer than 11 presentations into a half-day session, even if the event took place in the sumptuous and rather grand surroundings of the Council Chamber in the Cambridge Guildhall. Trying to summarise all 11 presentations would be a challenge; were there any common areas of inquiry?
