A genuine XML workflow for journals

The terms "XML workflow" and "XML first" are used so frequently that it is as if simply repeating them were sufficient proof that what is claimed is actually taking place. Many workflows that claim to be XML first do not provide full round-tripping of the content, and certainly not while also remaining fully compliant with the industry-standard DTD. At a recent presentation by Rave Technologies (London, 19 November) a genuine 100% XML workflow for journals was demonstrated, and it was impressive in several ways. The presentation started with the premise: if the online journal is today the version of record, then why does every current production workflow provide only PDF proofs? That's a very good question!

The presentation, by Michael Hepp and Charles O'Connor of Dartmouth Journal Services, described a full XML workflow for journals - that is, a fully web-based, XML-driven article proofing and editing system in which the content is edited via HTML. The service is currently in beta, with an anticipated live date of summer 2014. In essence, the Dartmouth approach (developed with Rave Technologies as an offshore developer) is to provide direct access to the XML of the journal article throughout the process. Both author and editor can make changes directly to the article online. The process provides a full round trip for edited content.

It might seem risky to allow the author to make changes directly in this way, so a key requirement is a full tracked-changes system. It took some nine months of development to come up with a method of tracking changes so that every author change could be recreated as required. Fascinatingly, there appears to be a limitation in using XSL-FO: some author changes can be masked if more than one change is made at the same time. To overcome this limitation, Dartmouth came up with a way of "denormalizing" all nested elements to expose every edit at a granular level.
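
The presentation did not show how the denormalizing is implemented, so the following is only a rough sketch of the general idea, not Dartmouth's method. It assumes that "denormalizing" means flattening nested inline elements into a flat list of text runs, each carrying the full list of elements that enclose it. The element names `ins` and `italic` and the function name `flatten_runs` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def flatten_runs(element, inherited=()):
    """Yield (text, tags) pairs for every piece of text under `element`.

    `tags` is the tuple of element names enclosing that text, outermost first,
    so a change marker on an outer element is repeated on every run inside it.
    """
    tags = inherited + (element.tag,)
    if element.text:
        yield element.text, tags
    for child in element:
        # Text inside the child carries the child's tag as well as ours.
        yield from flatten_runs(child, tags)
        # Text after the child's closing tag belongs to this element only.
        if child.tail:
            yield child.tail, tags


# Hypothetical paragraph: an insertion (`ins`) that wraps an italic phrase.
para = ET.fromstring(
    "<p>Mass of <ins>the <italic>E. coli</italic> sample</ins> was recorded.</p>"
)
runs = list(flatten_runs(para))

# Every run that is part of the insertion, including the nested italic text.
inserted = [text for text, tags in runs if "ins" in tags]
```

In the flattened form, the italic phrase is listed as a run in its own right that is marked both `ins` and `italic`, so the inserted text can be found with a simple filter, without walking the nested structure.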

What are the advantages of this full XML workflow?

There is no need for a typesetter to rekey author changes.
Authors see the online version of the article earlier in the process, inclusive of links, multimedia content, and any semantic enrichment (none of which are easy to provide using PDFs).
Authors and editors are editing the XML but without the need to use a full XML editor. Hence no knowledge of XML is required to use the system.
A PDF can be output at any time in the process if required for download and/or printing. But the PDF is not used for editing.

What about possible drawbacks of the system? It seems churlish to criticise such a brilliant idea, but there are a few potential problems. The most important one, it seems to me, is that the system is assembled from existing commercial software tools applied in concert: SDL Live Content Create for the editing environment, and Typefi to produce high-quality PDFs from the XML content (apparently with better output than is possible with XSL-FO). It is risky to bolt together third-party components in this way, because there is the potential for everything to fall over when one or more of the core components is updated. This is not to say it is likely to happen any time soon, only that it could happen.

The other problem, recognized by the presenters themselves, is that although the solution has great potential, the team building it is quite small and may struggle to deal with all the potential customers, quite apart from requests to adapt the system to the specific needs of new customers and new markets. There was some talk of licensing the software, but productizing software requires quite a bit of work, and I can imagine the development team being pulled in several directions to extend this system. One direction would be to move back in the workflow to include copy editing; another possible extension is to adapt the system for books as well as for journals.

Quite apart from being impressed by the elegance of the software solution, I was also fascinated to learn a bit more about how the solution had been developed. This innovation had come from a relatively small print services group of companies - not where you would expect a fully digital workflow to originate. What is more, the methodology for developing the software was very thorough. After developing the initial prototype, Dartmouth ran a roadshow presenting it to several customers across the US, followed by a focus group with authors based at a university. Many scholarly publishers have not grasped the idea of running focus groups with their authors, and it was very encouraging to see a company usually at one remove from authors taking the trouble to learn from direct author feedback and suggestions. Finally, this small group had outsourced the entire software development to an India-based company, using an Agile methodology, not only to develop to a tight brief, but also to work together as a team to identify solutions that were not known to either party at the start. Working with offshore developers is never simple, and coming up with new ideas in this way would be challenging enough for a team all based in a single office, so to have enough trust in outsourced developers to examine problems together is impressive indeed.

All in all, the Dartmouth Journal System looks to be a very intelligent and thought-through innovation in the scholarly production workflow.