How TrendMD uses collaborative filtering to show relatedness

TrendMD is (as its website states) “a content recommendation engine for scholarly publishers, which powers personalized recommendations for thousands of sites”. An interesting blog post by Matt Cockerill of TrendMD (published February 2016) claims “TrendMD’s collaborative filtering engine improves clickthrough rates 272% compared to a standard ‘similar article’ algorithm in an A/B trial”. That sounds pretty impressive. What is collaborative filtering and is it improving TrendMD’s results?

In the experiment described in the blog post, TrendMD’s recommendations were compared with PubMed’s “similar articles” feature in a controlled, randomised test (users did not know which method had generated the related articles they saw), and TrendMD was shown to increase click-throughs quite dramatically.

It is clear from the graphic that click-throughs increased, but were they to the most relevant articles? The blog post compares TrendMD with PubMed’s “similar articles” feature (labelled “related citations” in the graphic, which is not quite accurate), which generates its links from word counts, and explains that collaborative filtering instead makes use of click data to predict which links are likely to be most useful.
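
To make that distinction concrete, here is a minimal sketch, with invented article texts and click sessions (this is not TrendMD’s or PubMed’s actual implementation), contrasting a word-count similarity score with an item-item collaborative-filtering score built from co-click data:

```python
from collections import Counter
from math import sqrt

# Toy corpus: article id -> abstract text (invented for illustration)
articles = {
    "A": "kidney disease progression in chronic renal patients",
    "B": "chronic kidney disease and renal function decline",
    "C": "dietary sodium intake and blood pressure in renal patients",
}

# Toy click sessions: each set is the articles one reader clicked on (invented)
sessions = [
    {"A", "C"},
    {"A", "C"},
    {"A", "B"},
    {"B", "C"},
]

def word_count_similarity(a, b):
    """Cosine similarity of raw word counts -- roughly how a
    text-overlap 'similar articles' approach behaves."""
    ca, cb = Counter(articles[a].split()), Counter(articles[b].split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def coclick_similarity(a, b):
    """Item-item collaborative filtering: two articles are 'related' if the
    same readers tend to click on both, regardless of shared vocabulary."""
    both = sum(1 for s in sessions if a in s and b in s)
    count_a = sum(1 for s in sessions if a in s)
    count_b = sum(1 for s in sessions if b in s)
    norm = sqrt(count_a * count_b)
    return both / norm if norm else 0.0

for pair in [("A", "B"), ("A", "C")]:
    print(pair,
          "text:", round(word_count_similarity(*pair), 2),
          "co-click:", round(coclick_similarity(*pair), 2))
```

With this toy data the two measures disagree: the text score ranks A–B as the closest pair, while the co-click score ranks A–C higher, which is essentially the behaviour TrendMD is claiming to exploit.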

The article states:

it is clear that the most useful further reading links are not always the most semantically related. Indeed, if articles are too closely related, there may be diminishing returns from discovering more articles in precisely the same niche. By analogy, if I’ve just bought a coffee maker, I probably don’t want to buy another coffee maker, but I may well be interested in buying coffee beans, or descaler.

This comment appears to muddle very different ways in which the Internet operates. Academic researchers reading about (say) kidney disease are very interested in more articles about kidney disease; they are not in the least interested in reading about things that sufferers of kidney disease might be interested in, such as pain relief. That may not be what you or I would click next, but we aren’t researchers. In other words, that word “useful” begs the question: useful for whom? Did anyone ask researchers whether the links they clicked on were more or less useful?

Comments

Hi Michael, To address your point: "Academic researchers reading about (say) kidney disease are very interested in more articles about kidney disease." Absolutely, and the recommendations provided by TrendMD will certainly tend to be in the same general topic area. But a key role played by article recommendations is to allow the reader to explore 'around' a topic, i.e. to find and explore interesting adjacent areas of research.

If articles are clustered too strictly by 'similarity', then the reader can easily find themselves stuck on an 'island' of closely related articles, all saying much the same thing, and linking to each other. By using collaborative filtering, we focus instead on the question: once a reader has read this article, what directions of exploration are likely to be of most interest to them?

The main signal the algorithm gets as to whether an article recommendation is 'interesting' is whether the reader clicks on it, but TrendMD is also able to use information about how long the reader spends on the article they have clicked on, which is another powerful indicator of usefulness. On average, the longer a reader spends on the article, the more interesting it is likely to have been.
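
The dwell-time signal described in this comment can be illustrated with another small, invented sketch (again, not TrendMD’s actual code): a click counts for little if the reader bounces almost immediately, and for more the longer they stay, up to an assumed saturation point.

```python
def click_weight(dwell_seconds, bounce_threshold=10.0, saturation=300.0):
    """Hypothetical weighting of a click by dwell time: very short visits
    count for little, and the weight grows (capped at 1.0) as dwell time
    approaches the saturation point of roughly five minutes."""
    if dwell_seconds <= bounce_threshold:
        return 0.1  # likely an accidental or quickly abandoned click
    return min(1.0, 0.1 + 0.9 * (dwell_seconds / saturation))

# Toy examples of how different reading behaviours would be weighted
for seconds, label in [(5, "bounced"), (45, "skimmed"), (400, "read in depth")]:
    print(label, round(click_weight(seconds), 2))
```

A recommendation engine could then sum these weighted clicks per recommended article, so that links that readers actually spend time on count for more than links that merely attract clicks.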