
Enabling Digital Publishing

Content Enrichment for Digital Publishing

As digital content becomes the norm rather than the exception, the expectations of what can be done with it, and hence the processes needed to create it, become more complex. Publishers and information owners need guidance on implementing digital publishing (to websites, ebooks, even to print!). The requirements are different for professional and academic publishers, for trade publishers, and for B2C information providers. How can I help? What is described here is not rocket science. The headings below divide the digital publishing process into twelve distinct but linked stages. I have provided help with each of them at various times (click on each one to learn more):

Talking to users and stakeholders reveals details of the current system, but equally enables users to state what they think could be improved. By interviewing users across the system, an overall picture can be obtained. The goals of the interviews are:
  • To understand how current software tools are used.
  • To understand how the current production processes work in practice.
  • To identify possible areas for improvement.

Deliverable: A summary document that explains in more detail the current publishing infrastructure, in sufficient detail to be able to recommend specific improvements.

All too often the needs of users are assumed. Yet the process of technology change has a far-reaching effect on the entire information industry. Information was once paid for; now it is free, and in some cases (such as the Financial Times) paid for again, or a mixture of the two. Do the users find what they want? Do we know what they are looking for? Surprisingly, some publishers are not very clear how and where they could reuse information they have already compiled. Other publishers find surprises in the way their information is accessed. A study of 20 pensions lawyers, all using the same software, revealed surprising differences in information retrieval and in understanding of the product - surprising, that is, to the publisher.
The content audit follows the information-gathering stage and forms a corollary to it. Although it seems obvious, a check of where and how content is held can reveal problems with the current system, as well as being an essential prerequisite for any change to it. The content audit counts the current content totals and captures samples of existing content. This exercise reveals:
  • Where and how content is held.
  • An accurate estimate of repository size requirements and likely growth.
  • Representative samples of content.
  • Details of content structure that the content owners may not be aware of.
  • Where content has multiple uses.
  • A list of inputs and outputs (e.g. to third-party delivery platforms, e-book platforms, customer interfaces).
  • Where metadata originates or is amended.

The content audit makes it possible to search content across different types, for example books as well as journals. Deliverable: a spreadsheet and explanatory document.
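The repository-size part of the audit can be partly automated. As a minimal sketch (the directory path and file layout are placeholders, not a description of any particular publisher's content store), a script can walk the content directory, count files by extension, and total their sizes:

```python
import os

# Walk a content directory, counting files and total bytes per extension.
# Point "root" at the real content store; this is only an illustration.
def audit(root):
    counts, totals = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            size = os.path.getsize(os.path.join(dirpath, name))
            counts[ext] = counts.get(ext, 0) + 1
            totals[ext] = totals.get(ext, 0) + size
    return counts, totals
```

The per-extension breakdown feeds directly into the audit spreadsheet: it shows where content lives, in which formats, and how fast storage requirements are likely to grow.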

It is rare for any publisher to have processes that are entirely unique. Typically publishers share similar issues, and potentially can share similar solutions. A remarkable feature of the publishing industry is the extent to which peers are happy to share best practice information and to demonstrate existing systems. A report that looks at four or five comparable organisations and identifies best practice can prove highly cost-effective for any organisation contemplating a change to their process. Before committing any substantial resources, then, an organisation can have an informed understanding of the real cost, and the benefits and drawbacks of different approaches. Most importantly, this methodology addresses the problem that potential stakeholders in any new processes may not be aware of the capabilities of a new system without having seen it working, so are not in a position to see the benefits of changes to the existing processes. The resulting report identifies where the centres of excellence are, together with contacts for further exchange of information.
What do we do well? How can we improve? Where are we wasting money? These are the fundamental questions for any publisher. Typically based on the information gathering stages above, I create a set of recommendations for an organisation’s information creation and delivery. The proposals comprise:
  • Suggested content structure.
  • Benefits and drawbacks of potential structures.
  • Implications for workflow.
  • Extent to which standard or customised solutions need to be found.
  • Suggestions for phased implementation.
  • Skills required to manage the new structure.
  • Outline budget to carry out the changes.

It is possible to create proposals without first gathering information or auditing the content, but this process is inevitably less precise.

It could be said that providing links to other content is what made the Web grow in the first place, yet even today many of the promises of linking information are only just starting to be realised. Digital content today arguably has the greatest value insofar as it can be linked to other, related content. Nonetheless, creating links can be an expensive business. Should it be done manually or automatically? Does it need subject experts at all? How much will it cost, and can I add more links later? This stage examines the current state of the art for creating and manipulating linked data from publishable content. It comprises:
  • Suggested tools for validating content.
  • Ways of linking content internally.
  • How to link data to the outside world.
  • Reusing content in different formats.
  • Outline costs and an idea of phased development.
  • A suggested proof of concept.
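A proof of concept for internal linking need not be elaborate. The sketch below (the articles are invented examples) scans each entry's text for the headwords of other entries and reports candidate internal links - the sort of first pass a subject expert would then review:

```python
import re

# Invented sample entries: headword -> article text.
articles = {
    "Charles Darwin": "Darwin wrote The Origin of Species after his voyage.",
    "The Origin of Species": "A book by Charles Darwin, published in 1859.",
}

def candidate_links(articles):
    """Return {headword: [other headwords mentioned in its text]}."""
    links = {}
    for headword, text in articles.items():
        found = []
        for target in articles:
            if target == headword:
                continue  # never link an entry to itself
            # Whole-phrase match; a real system would also handle
            # inflections, synonyms, and disambiguation.
            if re.search(r"\b" + re.escape(target) + r"\b", text):
                found.append(target)
        links[headword] = found
    return links

print(candidate_links(articles))
```

Even a toy like this makes costs concrete: it shows how many candidate links exist, and therefore how much expert review time a manual policy would require.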
If the organisation considers that a change to the current system is warranted, a requirements-gathering process follows, enabling suitable tenders to be invited. Requirements are based around use cases, and state the requirement rather than a specific technical means by which the problem could be solved. Deliverable: Requirements document (Request for Proposal) for a vendor selection process.
When a new system is envisaged, whether a new website or a new underlying content repository and workflow, the most reliable process for identifying a supplier is via a requirements specification: a formal description of the requirements, expressed as a series of use cases and items. Several procurement methods are possible, but typically a long list of suitable vendors is identified, from which six or seven vendors are invited to respond to the Requirements Specification. The suppliers' responses are then evaluated in a tabular form to create a shortlist. Up to three suppliers would then be invited to present their solution in person. Systems selected have ranged from £5,000 to £3m, following EU procurement rules where appropriate. The tasks below are for a closed bidding process. This stage covers:
  • Identifying suitable suppliers that have the capability to meet the requirements.
  • Taking up customer references and reference site visits.
  • Scoring vendor proposals and presentations to an agreed set of criteria.
  • Ensuring a level playing field for all vendors: clarification of questions and running an FAQ list to ensure no vendor receives privileged information exclusively.
  • Face-to-face presentations by proposed suppliers, who will be asked to carry out an agreed list of tasks.
  • Risk analysis of suppliers.

This phase ends with an agreed supplier who will deliver the solution that best meets the requirements, to an agreed cost and timescale.

Once a vendor has been chosen, it is important to ensure the relationship is clearly defined. This stage involves:
  • Checking the vendor and system integrator credit status.
  • Negotiating the contract with the vendor to protect the publisher.
  • Reviewing statements of work from the system supplier.
  • Ensuring any commissioned custom work is ring-fenced, with deliverables clearly stated and demonstrable.

Deliverable: agreed contract and statement of work for signature.

More information projects fail at the implementation stage than at any other stage. While it may seem simple to trust the developer to implement the system specified, the devil is in the detail. The implementation of a new system throws up many questions and decisions, not least about prioritisation. A wrong decision during the implementation phase can add several months to the project timescale. Hence an independent project manager who represents the publisher’s interests is required. The tasks involved here comprise:
  • Identifying appropriate tests to demonstrate fitness for purpose.
  • Managing the transfer of technical skills to the in-house users.
  • Ensuring that the organisation understands what the system can do, by providing demonstrations and workshops.
  • Ensuring any training by the system integrator or vendor meets user needs.
  • Ensuring hosting is set up in a cost-effective way that is commensurate with the system requirements.
  • Identifying and recommending external expertise as required (e.g. if changes to a schema or DTD are envisaged).

It is highly recommended that an external project manager with publishing knowledge acts for the publisher. This is because:

  • The project manager should have sufficient technical knowledge to understand what is being discussed in the project communication.
  • The project manager is familiar with the software development process.
  • The project manager is external to the organisation and so can provide objectivity: they sit outside the existing business processes.
  • The project manager provides a single point of contact with the system integrator.

Any project management methodology can of course be aligned to the publisher’s existing systems: the project style can be waterfall or Agile, but whichever is chosen the overall goal is the same: to ensure that requirements are captured accurately, costed clearly, and implemented in a clear order of priority. Agile does not mean less specification; it may mean more. Whichever methodology is adopted, I provide the following:

  • A way of ensuring senior management know what is happening.
  • No use of technical terminology that obfuscates understanding.
  • Single-page reports to the project board every two weeks, including budget updates.
  • Genuine and regularly updated risk assessment and mitigation.
  • And, at the end, a set of lessons learned.
A site in development requires a clearly collated set of criteria by which it is to be judged. In addition, any site requires a check of basic functionality across different browsers and different devices (laptops, tablets, desktops, smartphones, and so on). Equally, it’s no use the system providing results if those results don’t appear quickly enough on the user’s screen. A clear set of system tests is provided and the functionality thoroughly tested. Usability testing can effectively be carried out on new systems in development. This can involve both quantitative and qualitative tests, as well as observing the software in use, without the users necessarily being aware they are being tested. In this way the performance and effectiveness of websites can be evaluated.
The questions many analytics suites answer may be useful but are not always relevant. All too often site owners start investigating solutions for problems that are not appropriate to their own system. No analytics package can tell you if the user has found what they were looking for. Hence analytics needs to be interpreted and used alongside other tools for evaluation. Equally, not all websites need to appear high in the search engine rankings. Making a website findable is not the same as ensuring it appears high in the search rankings. Such a listing is appropriate for some, but not for all, websites. It is better to ensure that the users you want to come to the site can find it - not quite the same question.

If you are a content owner, I can manage the whole process for you, whether directly or by recommending specialist solution providers. ConsultMU Ltd was founded in 2002 by Michael Upshall. You can contact me at michael [at]

Content Licensing

Studies of content licensing tend mainly to consider contracts and licence negotiation, but there is more to it than that. In my book Content Licensing (Chandos Publishing, 2009) I aim to cover both contractual issues and the technology behind licensing, for example, how best to store and capture content, and how to hold title metadata. In addition, there is a strategic dimension to content licensing: should a publisher host their own content or license via aggregators? I have a separate website dedicated to licensing content and all that it entails.

For The Hutchinson Encyclopedia I set up an encyclopedia and reference content database system, with automatic platform-independent delivery of content to licensing partners in the US and the UK, first with Random House UK Ltd, and subsequently after a management buy-out with Helicon Publishing PLC.

For Continuum Publishing Group I found licensing partners and advised on a digital publishing strategy for their key titles.

For Encyclopaedia Britannica UK I advised on licensing to UK higher education.

For Global Market Briefings, a financial information publisher, formed by a management buy-out in 2004, I advised on licensing partners and a digital platform for content delivery.

Ten Tips for compiling a reference work

Reference works, from A-Z encyclopedias to short subject dictionaries, published online or in print, have many characteristics in common. Whether there is a single author, or a dispersed collection of compilers all over the world, many of the problems of compilation remain similar. I have tried to summarise the essential points for compiling them in a single list, which assumes a hierarchical managed compilation process, rather than the Wikipedia-style collaborative creation and update. Note that this list is very concise: these are simply a few recommendations that have remained important from my experience of many different kinds of reference-work compilation. This list may seem trite, but many large-scale reference projects fail because they ignore these fundamental points.

1. Establish the readership.
Is this work for teenagers? For higher education? For general users? Your approach will differ based on your answers to this question. How much can readers be expected to know, and what do you have to explain to them? Determine the reader profile before going any further.

2. Create a headword list.
This is the single most important task, and should comprise a substantial part of the entire compilation time (perhaps 20%). Are you going to have an entry for "The Origin of Species" as well as for "Charles Darwin", or just one or the other?

3. Create article lengths.
Unlike Wikipedia, your reference work should have entries with length roughly proportional to the topic’s importance. If you have external contributors, elicit their comments on proposed article length.

4. Create sample entries.
You should create sample entries of at least three lengths: short, medium, and long. These give indications of style and tone for the compilers. The compilers will follow these far more than a style guide (see below).

5. Think about rights.
Do you have permission to publish all the content that will appear in the work? In all formats? In all markets? It is expensive to clear rights after publication.

6. Create a style guide.
Style guides ensure a consistency of treatment that makes a good reference work. They are a pain to compile but are vital for large-scale works. They can be continually updated.

7. Establish a linking policy.
You might choose to leave it to the computer, as Wikipedia does, so that every use of certain words becomes a link, for example "He was born in France" becomes a link to the "France" entry. Or you can think about when it might be appropriate to link, and when not. The latter approach is more work but produces a better product.
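The "leave it to the computer" approach is simple to mechanise, which is exactly why it over-links. A minimal sketch (headwords, link format, and sample sentence are all invented for illustration) replaces every occurrence of a known headword with a link:

```python
import re

# Invented headword list for illustration.
headwords = ["France", "Paris"]

def autolink(text, headwords):
    """Turn every occurrence of a headword into a hyperlink."""
    # Replace longest headwords first so shorter ones cannot
    # match inside a longer phrase already handled.
    for hw in sorted(headwords, key=len, reverse=True):
        text = re.sub(
            r"\b" + re.escape(hw) + r"\b",
            lambda m: '<a href="/entry/{0}">{0}</a>'.format(m.group(0)),
            text,
        )
    return text

print(autolink("He was born in France.", headwords))
```

The mechanical rule links "France" here whether or not the reader needs it; the editorial alternative is to link only the first or most relevant mention, which is more work but produces a better product.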

8. Think multi-format publishing.
Is this going to be a print title? A Web title? Or both? Make sure you don’t have references such as “see page 245”. There are plenty of ways of eliminating this problem. In a print volume it is easy to state “in the last article” or “in the next chapter”, but on the Web there may be neither, since chunks are accessed and read independently. Make sure your content does not refer to other sections in this way.
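Checks like this are easy to automate before delivery. A small pre-flight sketch (the patterns and sample sentence are illustrative, and a real check would cover more phrasings) flags print-only cross-references:

```python
import re

# Phrases that assume a fixed print sequence and break in chunked
# Web delivery. Extend the pattern to match house style.
PRINT_ONLY = re.compile(
    r"see page \d+|in the (last|next|previous) (article|chapter)",
    re.IGNORECASE,
)

def print_only_refs(text):
    """Return every print-only cross-reference found in the text."""
    return [m.group(0) for m in PRINT_ONLY.finditer(text)]

sample = "For details see page 245, and compare the map in the next chapter."
print(print_only_refs(sample))
```

Run over the whole content set, a report like this tells the editors exactly which entries need rewording before multi-format publication.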

9. Set an illustration policy.
Are you going to illustrate only long entries? Some short entries may need an illustration to make any sense (such as DNA). Specify which entries are to be illustrated at the start.

10. Create only as much metadata as is required for the initial use of the work.
How much metadata (tagging) is sufficient? Don't start thinking of possible ways of accessing the material in five years' time; restrict yourself to the immediate market and your anticipated first users. You can always add more tags later. But make sure, via your style guide (above), that the metadata you add is consistent and unambiguous.