Saturday, 20 November 2010

HOWTO: Deep linking into the NZETC site

As the heaving mass of activity that is the mixandmash competition heats up, I have come to realise that I should have better documented a feature of the NZETC site, the ability to extract the TEI xml annotated with the IDs for deep linking.

Our content's archival form is TEI xml, which we massage for various output formats. There is a link from the top level of every document to the TEI for the document, which people are welcome to use in their mashups and remixes. Unfortunately, between that TEI and our HTML output is a deep magic that involves moving footnotes, moving page breaks, breaking pages into nicely browsable chunks, floating marginal notes, etc., and this makes it hard to deep link back to the website from anything derived from that TEI.

There is another form of the TEI available which is annotated with whether or not each structural element maps 1:1 to an HTML: nzetc:has-text and what the ID of that page is: nzetc:id This annotated XML is found by replacing the 'tei-source' in the URL with 'etexts'

Thus for The Laws of England, Compiled and translated into the Māori language at there is the raw TEI at and the annotated TEI at

Looking in the annotated TEI at we see for example:

<div xml:id="t1-g1-t1-front1-tp1" xml:lang="en" rend="center" type="titlePage" nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1" nzetc:depth="5" nzetc:string-length="200" nzetc:has-text="true">

This means that this div has it's own page (because it has nzetc:has-text="true" and that the ID of that page is tei-GorLaws-t1-g1-t1-front1-tp1 (because of the nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1"). The ID can be plugged into:<ID>.html to get a URL for the HTML. Thus the URL for this div is This process should work for both text and figures.

Happy remixing everyone!