This page outlines some of the projects Graham has been involved in.
SGML to XML conversion
In this project, the client had 7 GB of legacy data held in an obsolete SGML format. The data needed to be converted to XML that validated against a RELAX NG version of the DocBook schema. The aim of the project was to allow the client to upload the data to a website. The project involved a lot of clean-up. Much of it was done programmatically, thanks to Graham’s systems, but some manual clean-up was also required.
Creating an EPUB demo from an XML file
Graham has written a script that takes an XML file and creates an EPUB file that can be used as a demo. The text in an EPUB file is reflowable, meaning that it can be optimised to display correctly on whatever reader is being used to view the content. The script will be a valuable selling tool for the publisher for whom it was written.
InDesign to XML conversion
IDML is InDesign’s open, XML-based file format. Graham is working with two clients on automated IDML to XML conversion so that they can benefit from automated workflows for content publishing.
The first is Prepress Projects Ltd, where Graham is converting IDML files to XML conformant to the NLM DTD.
The second is Hope Services (Abingdon) Ltd where he is converting the IDML files to XML conformant to the Hart Publishing DTD.
Using OmniMark to style Word documents
Graham is developing an automated process for an international publisher that will significantly reduce the amount of preparation time spent by copy editors on new files for publication in books or journals.
The process takes unstructured Word documents that have some form of mark-up (e.g. the use of the heading styles or the use of a larger bold font to indicate a heading). It outputs a fully styled Word file in which every component is assigned a style. This includes:
- headings
- lists
- quotes
- verses
- figures
- tables.
The process also styles the component parts of the bibliography (e.g. first name, last name, publication year) and then tags citations in the body of the text accordingly. Finally, the process tidies up the manuscript by performing actions such as:
- checking that items such as brackets and quote marks are paired correctly
- converting items such as “–-” to “-”
- ensuring all bibliographic references are in the correct order
- checking that all tables and figures referenced in the document are present.
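As an illustration of the first check in the list above, a paired-delimiter test can be sketched with a stack. This is a minimal Python sketch only, not Graham’s OmniMark implementation; the function name `find_unpaired` and the set of delimiters it checks are assumptions made for the example.

```python
# Hypothetical sketch: report brackets and curly quotes that are not paired.
# The delimiter set below is an assumption chosen for illustration.
PAIRS = {"(": ")", "[": "]", "{": "}", "\u201c": "\u201d"}
CLOSERS = set(PAIRS.values())


def find_unpaired(text):
    """Return a sorted list of (offset, character) for unmatched delimiters."""
    stack = []      # openers still waiting for their closer
    problems = []   # closers that did not match the most recent opener
    for offset, char in enumerate(text):
        if char in PAIRS:
            stack.append((offset, char))
        elif char in CLOSERS:
            if stack and PAIRS[stack[-1][1]] == char:
                stack.pop()
            else:
                problems.append((offset, char))
    problems.extend(stack)  # openers that were never closed
    return sorted(problems)
```

An empty result means every opener has a matching closer; anything else gives the offsets a copy editor would need to look at.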
The output file is then sent to the copy editor for any necessary manual clean-up. The cleaned-up file is then checked automatically to ensure all tags are valid. Once all these processes are complete, the files are converted to NLM-conformant documents.
Word to XML conversion
This process involves taking fully styled Word files and converting them to XML instances that conform to an in-house schema. The output files are then sent to a copy editor for checking and then on to the typesetter.
Word to XML conversion
This process involves taking fully styled Word files and converting them to XML instances that conform to an enhanced DocBook schema. The resulting files are then concatenated and the enhanced elements removed so that the files conform to a standard DocBook schema.
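The concatenate-and-strip step can be sketched as follows. This is a minimal Python illustration under stated assumptions: it supposes the enhanced elements live in their own namespace (`urn:example:enhanced`, a made-up URI) and that the concatenated files become children of a `book` root. Neither detail comes from the project itself.

```python
import xml.etree.ElementTree as ET

ENHANCED_NS = "urn:example:enhanced"  # hypothetical namespace for the enhanced elements


def strip_enhanced(element):
    """Remove, in place, every descendant element in the enhanced namespace."""
    prefix = "{" + ENHANCED_NS + "}"
    previous = None  # last sibling that was kept
    for child in list(element):
        if isinstance(child.tag, str) and child.tag.startswith(prefix):
            # Keep any text that followed the removed element.
            if child.tail:
                if previous is None:
                    element.text = (element.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            element.remove(child)
        else:
            strip_enhanced(child)
            previous = child


def concatenate(documents, root_tag="book"):
    """Parse each XML string, strip enhanced elements, and join under one root."""
    root = ET.Element(root_tag)
    for xml_text in documents:
        part = ET.fromstring(xml_text)
        strip_enhanced(part)
        root.append(part)
    return root
```

The result would still need validating against the standard DocBook schema; the sketch only removes the non-standard elements.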
RTF to XML conversion
Graham worked on a project for the British Army’s vehicle maintenance manuals. The work he undertook allows data modules to be automatically generated from Word files.
XML to XML conforming to AvP70 DTD
The RAF Handling Squadron’s flight record cards and air crew manuals are available in XML. Graham has written software that automatically converts this raw XML data into XML data that conforms to the AvP70 DTD.
HTML to XML conversion
Graham has worked with the Royal Institution of Chartered Surveyors to produce isurv: an online portal providing property professionals with best practice information. The project involved 5000 HTML files in various formats, including Word HTML. The files were converted into well-formed XML and split into six information groups: building surveying, commercial property, construction, environmental, planning and valuation for use in the portal.
XML conversion in the defence industry
Graham provided setup scripts and ongoing support for a UK MoD project. The project was set up to convert a range of MoD publications from a legacy format into XML. Once files had been created by MoD staff, they were exported for processing by Graham. This involved transforming the exported files into a single file per volume, so that each file was valid SGML conforming to the target MoD DTD. These tasks were performed using OmniMark.
RTF to XML conversion in a legal context
Graham undertakes work for the Guernsey Legal Information Board website, the Jersey Legal Information Board website and the Cayman Islands Legal and Judicial Information website. All three sites are the official sources of legal information in their respective countries.
For the Guernsey Legal Information Board website, Graham converted Word versions of the Laws in Force and the superseded revised editions to RTF and then to XHTML that uses CSS. The XHTML versions of the Laws in Force and the revised editions appear on the Guernsey Legal Information Board website. PDF versions of the RTF files are also available for download. These documents cannot be confused with the official version because the Guernsey insignia is removed before conversion to PDF.
The project is an ongoing one for Graham. Each time a law is introduced or revised, the Law Draftsman places a Word file in a directory on his or her PC. This file is picked up using an automated process before undergoing the processes outlined above. The resulting files are placed on a staging server. These files are checked by the Law Draftsman before being made available on the live site. In this way, the law can be updated on the GLIB website in a matter of minutes.
For the Jersey Legal Information Board website, Graham converted Word versions of the Laws in Force and the superseded revised editions to RTF and then to XHTML that uses CSS, the same process as above.
Again, the project is an ongoing one. Each year, the JLIB sends all the files that have changed in the preceding year. The same process as for the GLIB takes place, but in bulk rather than individually.
For all three websites, Graham also converts articles relating to the laws from RTF to XHTML and PDF for publication on the respective sites.
RTF to XML conversion
In an ongoing project, Graham works with a number of UK typesetters to convert RTF files to well-formed XML. As a result of the conversion to XML, the typesetting process itself has become more time-efficient.
For the same clients he has also written a series of DTDs which allow any document to be converted into XML and formatted in a uniform house style.
Large volume, multi-stage RTF to XML conversion
Graham was commissioned to convert 25,000 legacy files into XML that met the global platform requirements of a multi-national company. To do this, Graham used the client’s DTD to convert the legacy files from RTF to XML. The XML files were then converted to current style RTF. In the final stage, the identically styled files were converted to XML.
For the same client, he wrote a conversion program that allowed staff to convert any file into a full-fledged XML document.
RTF to XML conversion and clean-up
A client required RTF documents to be cleaned up to allow staff to work with them cost- and time-effectively. Graham wrote a program which converted the RTF documents into XML. Once in XML, the data underwent up to forty clean-up and enhancement routines. The resultant data was then converted to RTF.
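A chain of clean-up routines like this can be sketched as a list of functions applied in order to the parsed XML. The two routines below are hypothetical stand-ins written in Python for illustration; the client’s actual routines and element names (such as `para`) are not described here and are assumed.

```python
import re
import xml.etree.ElementTree as ET


def normalise_whitespace(root):
    """Collapse runs of whitespace inside element text to a single space."""
    for element in root.iter():
        if element.text:
            element.text = re.sub(r"\s+", " ", element.text)
    return root


def drop_empty_paragraphs(root):
    """Remove <para> elements that have no text and no children."""
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == "para" and is_empty:
                parent.remove(child)
    return root


# The pipeline is just an ordered list, so routines can be added or reordered.
ROUTINES = [normalise_whitespace, drop_empty_paragraphs]


def clean(xml_text, routines=ROUTINES):
    """Run every routine over the document and return the cleaned XML string."""
    root = ET.fromstring(xml_text)
    for routine in routines:
        root = routine(root)
    return ET.tostring(root, encoding="unicode")
```

Keeping each routine as a separate function means a document can be run through only the routines it needs, which suits a process with "up to forty" optional steps.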
DTD development
One client required data to be converted to XML for use in XPP. Graham worked with the client to write a DTD and conversion script that allowed this to happen.
Miles 33 to XML conversion
In a recent project, Graham worked with a publisher with content created using Miles 33. The project helped the publisher streamline an annual update to two publications. Historically this had involved a typesetter manually updating each publication using a marked-up version of one of the publications.
Graham took the data from both publications and placed it in four separate databases. These databases are updated by running them against a Word file. Once updated, the databases are converted to XML for composition using XyEnterprise XPP. The data can be sorted according to a complex custom sort order. It is also used for the automatic compilation of books of 1,500 to 4,500 pages.
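A custom sort order of this kind is usually expressed as a key function. The rules in the Python sketch below (ignore a leading article, compare case-insensitively, file entries that start with a digit last) are invented for illustration and are not the publisher’s actual rules.

```python
import re

# Hypothetical rule: a leading "The", "A" or "An" is ignored when sorting.
LEADING_ARTICLE = re.compile(r"^(the|a|an)\s+", re.IGNORECASE)


def sort_key(entry):
    """Build the comparison key for one entry under the assumed rules."""
    name = LEADING_ARTICLE.sub("", entry.strip())
    starts_with_digit = name[:1].isdigit()
    # False sorts before True, so entries starting with a digit come last.
    return (starts_with_digit, name.casefold())


def sort_entries(entries):
    """Return the entries in the custom order."""
    return sorted(entries, key=sort_key)
```

For example, `sort_entries(["The Zebra Co", "apple Ltd", "3M", "An Orchard"])` returns `["apple Ltd", "An Orchard", "The Zebra Co", "3M"]`.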
The work has provided considerable cost- and time-saving benefits. The set-up costs and the initial conversion costs for this project were less than a single year’s manual update by a typesetter.
To discuss your project’s requirements, please get in touch.