ECLIX

From LMNLWiki

Reading XML as LMNL: ECLIX

ECLIX is a set of conventions for annotating arbitrary XML to declare elements within the XML as CLIX milestones, and hence as LMNL range-start and -end markers, or as LMNL empty range markers. It is a useful intermediate format between arbitrary XML (which may use any convention whatsoever, or none, for representing overlap) and LMNL, and it can generally be created from an XML format using a simple transform.

Any ECLIX document may be transparently converted into a CLIX document (and thence into LMNL) using a generic stylesheet, available from the XML Pipelining page.

In ECLIX, the presence of an attribute in the CLIX namespace designates an XML element as being a range marker, and its content as annotations on the range.

Three CLIX attributes are recognized:

  • @clix:sID identifies an element as a range start marker
  • @clix:eID identifies an element as a range end marker
  • @clix:rID identifies an element as an empty range marker

where the 'clix' prefix is bound to the namespace "http://lmnl.net/clix" (any prefix may be used).
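
As a rough illustration of how the three attributes distinguish markers from ordinary elements, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document, its element names, and the helper marker_kind are invented for this example; only the namespace URI and the attribute names come from the text above. The sample binds the prefix "c" rather than "clix" to show that the prefix itself does not matter.

```python
import xml.etree.ElementTree as ET

CLIX_NS = "http://lmnl.net/clix"  # namespace URI given above

def marker_kind(elem):
    """Return 'start', 'end' or 'empty' for a CLIX marker, None for an ordinary element."""
    for local, kind in (("sID", "start"), ("eID", "end"), ("rID", "empty")):
        # ElementTree spells namespaced attribute names as "{uri}local".
        if elem.get(f"{{{CLIX_NS}}}{local}") is not None:
            return kind
    return None

# Hypothetical sample: the <s> range overlaps two <line> ranges.
SAMPLE = """<text xmlns:c="http://lmnl.net/clix">
  <line c:sID="l1"/>To be, <s c:sID="s1"/>or not to be,<line c:eID="l1"/>
  <pb c:rID="p1"/><line c:sID="l2"/>that is the question<s c:eID="s1"/><line c:eID="l2"/>
</text>"""

root = ET.fromstring(SAMPLE)
kinds = [(elem.tag, marker_kind(elem)) for elem in root.iter()]
for tag, kind in kinds:
    print(tag, kind)
```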

Additionally, the following constraints apply:

  • All values of @clix:sID, @clix:eID and @clix:rID follow the XML naming rules.
  • Each matching pair of @clix:sID and @clix:eID values must uniquely identify the range it delimits. Similarly, each @clix:rID value must be unique among all values of @clix:sID, @clix:eID and @clix:rID.
  • A @clix:eID must follow the corresponding @clix:sID (i.e., be placed on an element later in the document), on an element of the same type as the element holding the @clix:sID.

All these rules may be validated using a Schematron schema (not implemented as of this writing, but feasible).
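
Pending such a schema, the constraints can be approximated procedurally. The following Python sketch is one possible reading of them, not a reference validator. The function check_eclix, the sample documents, and the error messages are hypothetical; the name pattern is a simplified ASCII-only stand-in for the XML naming rules; and treating an unclosed @clix:sID as an error is an assumption not stated in the constraints.

```python
import re
import xml.etree.ElementTree as ET

CLIX = "{http://lmnl.net/clix}"
# Assumption: simplified, ASCII-only approximation of an XML name.
NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

def check_eclix(root):
    """Return a list of constraint violations found in document order."""
    errors = []
    open_ranges = {}  # sID value -> tag of the element that opened the range
    seen = set()      # sID and rID values used so far
    for elem in root.iter():
        sid, eid, rid = (elem.get(CLIX + n) for n in ("sID", "eID", "rID"))
        for value in (sid, eid, rid):
            if value is not None and not NAME.match(value):
                errors.append(f"invalid ID value: {value!r}")
        # Handle eID first, so an eID cannot match an sID on the same element.
        if eid is not None:
            start_tag = open_ranges.pop(eid, None)
            if start_tag is None:
                errors.append(f"eID {eid} has no preceding open sID")
            elif start_tag != elem.tag:
                errors.append(f"eID {eid} is on <{elem.tag}> but sID was on <{start_tag}>")
        for value in (sid, rid):
            if value is not None:
                if value in seen:
                    errors.append(f"duplicate ID: {value}")
                seen.add(value)
        if sid is not None:
            open_ranges[sid] = elem.tag
    # Assumption: a start marker without an end marker counts as an error.
    errors.extend(f"sID {s} is never closed" for s in open_ranges)
    return errors

GOOD = ('<t xmlns:clix="http://lmnl.net/clix">'
        '<a clix:sID="r1"/>x<b clix:rID="r2"/><a clix:eID="r1"/></t>')
BAD = ('<t xmlns:clix="http://lmnl.net/clix">'
       '<a clix:eID="r1"/><a clix:sID="r1"/><b clix:eID="r1"/><c clix:rID="r1"/></t>')

good_errors = check_eclix(ET.fromstring(GOOD))
bad_errors = check_eclix(ET.fromstring(BAD))
print(good_errors)
print(bad_errors)
```

The BAD sample trips three checks: an end marker before its start marker, an end marker on a different element type, and an rID that reuses an existing value.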

The rules for converting ECLIX to LMNL are as follows:

  • All XML elements that have no attributes @clix:sID, @clix:eID or @clix:rID are taken as range identifiers, with the range starting where the element starts and the range ending where the element ends. The qualified name of the element becomes the name of the range. Attributes on those elements are represented as annotations on the ranges they correspond to, with the attribute name becoming the annotation name and the attribute value becoming the content of the annotation. Because attribute order in XML is not significant, these annotations may be in any order when ECLIX is generated from XML. (Generally, arbitrary LMNL can only be rendered as ECLIX when annotation order can be determined not to be significant among annotations cast to attributes. However, since LMNL annotations may also be represented as text or element contents of CLIX delimiters, the representation of ordered annotations is still possible in ECLIX.)
  • Any element with a @clix:sID is taken as the start of a range, which ends wherever the element with the matching @clix:eID appears. (Cognoscenti will recognize this as the "Trojan milestone" or "HORSE" convention for identifying overlapping ranges in XML.)
  • Any element with a @clix:rID is taken as an empty range marker.
  • All text and element contents of elements identified as CLIX range markers are taken to be annotations of the ranges concerned. There are two ways this can occur:
    • If there are no non-whitespace text contents, and if no elements are marked as CLIX delimiters (with @clix:sID, @clix:eID or @clix:rID), element contents are mapped to annotations with the same names, and in the same order, as they occur.
    • If any non-whitespace text, or any elements marked as CLIX delimiters, occur as children of an element marked as a CLIX delimiter, then an anonymous annotation is attached and all the text and element children of the delimiter (whether marked as CLIX delimiters or not) are mapped to the contents of the annotation, following the same rules as elsewhere in the document. Within such an annotation, @clix:sID and @clix:eID must resolve as they do in document-wide scope. (That is, a CLIX range cannot start in one annotation and end in another.)
  • LMNL atoms, apart from atoms implicitly represented as Unicode characters, are represented in ECLIX in the same way they are in CLIX, but that representation is not yet specified at the time of writing. See the CLIX page. Make noise if you have a need, or a good use case, for atoms in CLIX or ECLIX.

Wendell 14:45, 21 September 2007 (BST)

Examples forthcoming
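
In the meantime, here is an unofficial Python sketch of the conversion rules above, reading a small ECLIX document into a text string plus a flat list of ranges with character offsets. It is not the generic stylesheet mentioned earlier. The function eclix_to_ranges, the tuple layout, and the sample document are all invented for illustration. The sketch covers ordinary elements, sID/eID pairs and rID markers, maps attributes to annotations, and deliberately ignores any text or element content inside marker elements (which the rules treat as annotations), as well as attributes on end markers.

```python
import xml.etree.ElementTree as ET

CLIX = "{http://lmnl.net/clix}"

def eclix_to_ranges(root):
    """Return (text, ranges); each range is (name, start, end, annotations)."""
    chunks, ranges, open_starts = [], [], {}
    pos = 0  # current character offset into the text layer

    def emit(s):
        nonlocal pos
        if s:
            chunks.append(s)
            pos += len(s)

    def walk(elem):
        sid, eid, rid = (elem.get(CLIX + n) for n in ("sID", "eID", "rID"))
        # Attributes outside the CLIX namespace become annotations (name -> value).
        annotations = {k: v for k, v in elem.attrib.items() if not k.startswith(CLIX)}
        if sid is not None:
            open_starts[sid] = (pos, annotations)      # range opens here
        elif eid is not None:
            start, ann = open_starts.pop(eid)          # range closes here
            ranges.append((elem.tag, start, pos, ann))
        elif rid is not None:
            ranges.append((elem.tag, pos, pos, annotations))  # empty range
        else:
            start = pos                                # ordinary element
            emit(elem.text)
            for child in elem:
                walk(child)
                emit(child.tail)
            ranges.append((elem.tag, start, pos, annotations))

    walk(root)
    return "".join(chunks), ranges

# Hypothetical sample: a <phrase> range that straddles two <line> elements.
SAMPLE = ('<poem xmlns:clix="http://lmnl.net/clix">'
          '<line n="1">Hello <phrase clix:sID="p1"/>brave </line>'
          '<line n="2">new<phrase clix:eID="p1"/> world<pb clix:rID="b1"/></line>'
          '</poem>')

text, ranges = eclix_to_ranges(ET.fromstring(SAMPLE))
for name, start, end, ann in ranges:
    print(name, start, end, repr(text[start:end]), ann)
```

Here the phrase range covers "brave new", which crosses the boundary between the two line ranges; this is the kind of overlap that the start and end markers exist to express.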

Earlier materials

A rag-bag of ideas for ECLIX (Extended CLIX) by John Cowan:

(Note: these reflect issues in converting arbitrary LMNL into ECLIX rather than the reverse.)

  • Use a different root element, clix:eclix.
  • If an annotation contains nothing but plain text (no generalized atoms, no ranges), represent it (optionally? mandatorily?) as an attribute. (But what about annotation order?)
  • If an annotation owns no meta-annotations except the ones that are represented as attributes, represent it as a whole using an XML element rather than a pair of milestones, since annotations can't overlap.
  • Leave out redundant pairs of clix:sID and clix:eID. (Or make this part of CLIX.)
  • Represent a range as an element if it doesn't overlap with any other range (and doesn't have annotations that can be given using attributes).
  • If a single range covers the entire document, omit the clix:eclix element (to get to the point where all XML documents are ECLIX documents).
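
The idea of representing non-overlapping ranges as elements depends on being able to tell which ranges overlap. As a minimal sketch, assuming ranges are given as (name, start, end) tuples with character offsets (a representation chosen here for illustration, not one defined by LMNL or CLIX), the test could look like this in Python. The function names and sample data are hypothetical, and "overlap" is taken to mean partial overlap, so nested or disjoint ranges do not count.

```python
def overlaps(a, b):
    """True if spans a and b, each (start, end), partially overlap.

    Nested, identical, adjacent or disjoint spans do not count as overlapping.
    """
    (s1, e1), (s2, e2) = sorted([a, b])
    return s1 < s2 < e1 < e2

def element_candidates(ranges):
    """Names of ranges that overlap no other range, in input order."""
    return [
        name
        for i, (name, start, end) in enumerate(ranges)
        if not any(
            overlaps((start, end), (s2, e2))
            for j, (_, s2, e2) in enumerate(ranges)
            if j != i
        )
    ]

# Hypothetical ranges over the text "Hello brave new world".
RANGES = [
    ("poem", 0, 21),
    ("line1", 0, 12),
    ("line2", 12, 21),
    ("phrase", 6, 15),   # straddles line1 and line2
    ("word", 0, 5),      # nested inside line1
]

candidates = element_candidates(RANGES)
print(candidates)
```

Under this strict reading, both lines and the phrase would stay as milestone pairs, since each takes part in an overlap; a looser rule could keep one side of each overlapping pair as an element.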

Any more ideas?