Monday, 3 October 2016

How would we know when it was time to move from TEI/XML to TEI/JSON?

This post inspired by TEI Next by Hugh Cayless.

How would we know when it was time to move from TEI/XML to TEI/JSON?

If we stand back and think about what it is we (the TEI community) need from the format :
  1. A common format for storing and communicating Texts and augmentations of Texts (Transcriptions, Manuscript Description, Critical Apparatus, Authority Control, etc, etc.).
  2. A body of documentation for shared use and understanding of that format.
  3. A method of validating Texts in the format as being in the format.
  4. A method of transforming Texts in the format for computation, display or migration.
  5. The ability to reuse the work of other communities so we don't have to build everything for ourselves (Unicode, IETF language tags, URIs, parsers, validators, outsourcing providers who are tooled up to at least have a conversation about what we're trying to do, etc)
[Everyone will have their slightly different priorities for a list like this, but I'm sure we can agree that a list of important functionality could be drawn up and expanded to requirements list at a sufficiently granular level so we can assess different potential technologies against those items. ] 

If we really want to ponder whether TEI/JSON is the next step after TEI/XML we need to compare the two approaches against such as list of requirements. Personally I'm confident that TEI/XML will come out in front right now. Whether javascript has potential to replace XSLT as the preferred method for really exciting interfaces to TEI/XML docs is a much more open question, in my mind.  

That's not to say that the criticisms of XML aren't true (they are) or valid (they are) or worth repeating (they are), but perfection is commonly the enemy of progress.

2 comments:

Hugh Cayless said...

I'm not so much interested in the "when to abandon XML?" question (to which the answer is, as far as I'm concerned, "not yet"), as I am in projecting TEI out of the XML box. When it's useful for TEI to be expressed as HTML or JSON, can I still check that it's TEI? Or will I be forced to convert it back to XML to do that sort of validation? Is all TEI necessarily XML? Are we constrained by the limitations and occasional bugs of XML and its attendant technologies?

Stuart Yeates said...

The only post-XML validation approach I can think of for TEI is Data Shapes https://www.w3.org/2014/data-shapes/wiki/Main_Page