Digital humanities: more than just text mining

A recent book by Matthew Jockers, Macroanalysis, outlines an approach to digital humanities based on what is usually referred to as text mining. I can't help feeling that the "macroanalysis" approach, which is very similar to Franco Moretti's "distant reading" (from his book of the same title), looks at only one aspect of digital humanities, one that seems to get all the attention while another important aspect is ignored.


A genuine XML workflow for journals

The terms "XML workflow" and "XML first" are used so frequently that it is as if simple repetition were sufficient proof that what is claimed is actually taking place. Many workflows that claim to be XML first do not provide full round-tripping of the content, and certainly not while remaining fully compliant with the industry-standard DTD. At a recent presentation by Rave Technologies (London, 19 November), a genuine 100% XML workflow for journals was demonstrated, and it was impressive in several ways.

AccountingWEB: a model community

AccountingWEB has been running for some 15 years, which makes it one of the longest-established communities on the Web. I'm not even sure there was a Web 15 years ago. And the slick website oozes confidence. Simply looking at the number of reads or comments for content on the home page makes you realise this is a thriving site: over 8,000 reads for "the worst mistake accountants make" (strange that such a story should be so popular with accountants), and 7,700 reads for a new article about an accountant fined by his professional association for abusing HM Revenue & Customs officials.

PatientsLikeMe: how good a community site?

A site entitled PatientsLikeMe states its goal very clearly: this is a site for sufferers to share experiences and views, and with 240,000 registered users (or 220,000; the site lists two different totals) it clearly meets a need. How does it rank as an example of a community site?

I started by looking for a condition, which is where I imagine most users would start. I keyed in Alzheimer's, and the resulting screen started with a definition of the disease, followed by a table:

Table of symptoms

Unravelling Ravelry: how to build an online community

There can never have been a more appropriate name for a website. Ravelry can refer not only to the practice of knitting (of which this site must be the definitive community) but also to the astonishingly intricate structure of the site. Ravelry comprises not one but five interlinked databases, covering pretty much the entire activity of a practising knitter. It's a very impressive capture of the activity of knitting - not just buying things, which most community sites would start with, but also evaluating patterns, dreaming about projects, and even remembering which needles you own.


Safari Flow - can you improve on Safari?

O'Reilly's new initiative, Safari Flow, which is currently available in beta form, is a timely prompt to review the success, and the qualities, of its elder brother, Safari Books Online. For several years Safari Books Online has been a great exemplar of digital publishing. For some users, including me, it represented a far more fundamental shift to digital delivery than the later, and seemingly more successful, rise of e-books.


The Accidental Taxonomist

Hedden, H., 2010. The accidental taxonomist, Medford, N.J.: Information Today.

Heather Hedden has written an excellent introductory manual for anyone involved in setting up, running or expanding a taxonomy or thesaurus. Unlike many books on the subject, this is one for the practitioner, based on lots of practical experience — as Patrick Lambe describes it in his foreword, “this is taxonomy from 100 feet”.

Still more ontology definitions

Regular readers of this blog will notice an ongoing series of definitions of the term "ontology". Here is another one, by Tom Gruber, dating from 2009 (in the Encyclopedia of Database Systems, Springer-Verlag, 2009).

an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse

The definition may be perfect, but as a description of a concept in terms that an untrained reader might understand, it scores about zero.