Talk:CLIX
From LMNLWiki
Matching Without IDs
Jeni 16:23, 11 September 2006 (EDT): Perhaps CLIX isn't meant as an authoring format, but in any case, it would be a lot easier to use if you didn't have to put globally unique IDs on everything. Can't we limit the use of IDs to places where it's ambiguous? For example, allow:
<clix:root>
<book clix:role="start-range">
<title>Beginning XSLT 2.0</title>
<author>Jeni <surname clix:role="start-range" />Tennison<surname clix:role="end-range" /></author>
</book>
...
<book clix:role="end-range" />
</clix:root>
- Hmm. I'm not sure. --John Cowan 16:56, 11 September 2006 (EDT)
- I think this question goes to what we expect/intend to use CLIX for. "Easier to use" might vary across contexts. It's certainly easier to forget about the IDs if one is trying to make this stuff in an editor. But it's not easier for a stylesheet developer who's using XSLT 2.0 grouping to do structural induction back into XML, is it? (We have a fan at Columbia, Terry Catapano, who's using CLIX-like stuff to do exactly that.) I could be swayed either way, but I think maybe ECLIX is the "easy to use in an editor" format and CLIX, which ought to be derived programmatically for the most part, should be easy to process ... so maybe require the IDs here. In any case, more discussion of what these formats are for would help clarify, I think. --Wendell 18:05, 24 September 2006 (EDT)
- Yep, having thought it through I agree that CLIX should always have the IDs, with ECLIX being the level at which they're dropped. — Jeni 16:16, 27 September 2006 (EDT)
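To make the ambiguity question concrete, here is a rough sketch (not part of any CLIX proposal) of how a processor might pair the ID-less milestones from Jeni's example by element name alone. The namespace URI is borrowed from the "Namespace evasion" thread below, and the function name `pair_ranges_without_ids` is made up for illustration. The sketch closes the most recently opened range of a given name, and it flags any name for which two ranges were open at once, which is exactly the case where an ID would be needed.

```python
import xml.etree.ElementTree as ET

# Assumption: the example in this thread declares no namespace; this URI is
# the one quoted in the "Namespace evasion" thread.
CLIX_NS = "http://lmnl.net/namespace/clix"
ROLE = f"{{{CLIX_NS}}}role"

def pair_ranges_without_ids(root):
    """Pair start-range/end-range milestones by element name only.

    Returns (pairs, ambiguous). pairs is a list of (start, end) elements.
    ambiguous is the set of names for which more than one range was open
    at the same time, so that name alone cannot decide the pairing.
    """
    open_by_name = {}
    pairs = []
    ambiguous = set()
    for el in root.iter():  # document order
        role = el.get(ROLE)
        if role == "start-range":
            stack = open_by_name.setdefault(el.tag, [])
            stack.append(el)
            if len(stack) > 1:
                ambiguous.add(el.tag)
        elif role == "end-range":
            stack = open_by_name.get(el.tag)
            if not stack:
                raise ValueError(f"end-range without start-range: {el.tag}")
            # Arbitrary choice: close the most recently opened range.
            pairs.append((stack.pop(), el))
    return pairs, ambiguous

doc = f"""<clix:root xmlns:clix="{CLIX_NS}">
  <book clix:role="start-range">
    <title>Beginning XSLT 2.0</title>
    <author>Jeni <surname clix:role="start-range"/>Tennison<surname clix:role="end-range"/></author>
  </book>
  <book clix:role="end-range"/>
</clix:root>"""

pairs, ambiguous = pair_ranges_without_ids(ET.fromstring(doc))
print([start.tag for start, _ in pairs])  # ['surname', 'book']
print(ambiguous)                          # set()
```

For the example above nothing is ambiguous. With two overlapping ranges of the same name, the "most recent" rule is only a guess, which is one way of seeing why IDs end up being required at the CLIX level.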
Proposed alternative for annotations
Since annotations can't overlap other annotations, how about
<clx:clix>
<foo clx:sID="f">
<clx:annotation>
<bar clx:sID="br"/>...<bar clx:eID="br"/>
</clx:annotation>
</foo>
...
<foo clx:eID="f">
<clx:annotation>
<baz clx:sID="bz">
<clx:annotation>
<fred clx:sID="f"/>...<fred clx:eID="f"/>
</clx:annotation>
</baz>
</clx:annotation>
</foo>
</clx:clix>
Wouldn't this method also let us do away with clix:role attributes?
Abbreviating clx:annotation as clx:annot or clx:note would be okay.
--Wendell 18:39, 24 September 2006 (EDT)
- I think we can either have a markup language in which elements are named after the range/annotation/atom that they represent, or one in which elements in the CLIX namespace represent the tags. Both kinds of markup could be useful, but using both within a single markup language would be confusing. In other words, I now think it should either be
<clx:clix>
<foo clx:sID="f">
<bar clx:sID="br" clx:role="start-annotation" />...<bar clx:eID="br" clx:role="end-annotation" />
</foo>
...
<foo clx:eID="f">
<baz clx:sID="bz" clx:role="start-annotation">
<fred clx:sID="f" clx:role="start-annotation" />...<fred clx:eID="f" clx:role="end-annotation" />
</baz>...<baz clx:eID="bz" clx:role="end-annotation" />
</foo>
</clx:clix>
- or
<clx:clix>
<clx:start-range id="f" prefix="" ns="" name="foo">
<clx:start-annotation id="br" prefix="" ns="" name="bar" />...<clx:end-annotation id="br" prefix="" ns="" name="bar" />
</clx:start-range>
...
<clx:end-range id="f" prefix="" ns="" name="foo">
<clx:start-annotation id="bz" prefix="" ns="" name="baz">
<clx:start-annotation id="f" prefix="" ns="" name="fred" />...<clx:end-annotation id="f" prefix="" ns="" name="fred" />
</clx:start-annotation>...<clx:end-annotation id="bz" prefix="" ns="" name="baz" />
</clx:end-range>
</clx:clix>
- I think the former is more in the CLIX tradition (if one can use the term 'tradition' here) — Jeni 16:40, 27 September 2006 (EDT)
- I'm sympathetic with the arguments here. Yet I note also that in both examples, there's a clx:clix wrapper element, so we can't restrict the clix namespace to attributes in either case. (I agree that mixing the two forms would seem unnecessarily confusing.) Likewise, while I agree that the former is "more in the CLIX tradition", the thing that's really in the "CLIX tradition" is ECLIX. It's true that the only people I know who have demonstrated processing CLIX, namely me and Terry C, have flattened it first -- but in my case ("half-LMNL" of 2004) I flattened it all the way into your second example. (And then it was pulled into something like LOOL. :-)
- In other words, I think it's useful to separate the question of what we should do to accommodate earlier CLIX proposals (the "CLIX tradition") from what should be the officially-sanctioned simplest straightforward representation of LMNL in XML (the "canonical" notion). I think the former is very well accounted for by ECLIX (which I regard as a brilliant contribution BTW), and because of that I think the "canonical" format is free to be something just a bit easier to validate and process. (BTW if we call it something else that's fine with me: I'm not sure the name "CLIX" has stuck anywhere.) --Wendell 20:28, 1 October 2006 (EDT)
- Another, perhaps more persuasive, argument for using the names of elements as the names of ranges/annotations is that QNames are much easier to represent in the name of the element than as attributes. You need attributes for the namespace and local part of the name, plus the prefix if you want to do nice round-tripping; if you have LMNL with namespaces in it then the document becomes very long and repetitious.
- I suppose the thing that feels most weird about the above proposal is that <clx:annotation> must have a particular milestone element as its first child, and another specific one as its last child. That kind of constraint feels strange to me, but the alternatives all feel strange as well. I think we need to see how easy or hard it is to transform to ECLIX without a <clx:annotation> element to know whether it's worthwhile. -- Jeni 08:11, 2 October 2006 (EDT)
- I agree with you now. Having transformed ECLIX to CLIX (at least provisionally), I now don't think clx:annotation is actually much of a help, and it doesn't fundamentally change the mapping issues. --Wendell 14:28, 16 August 2007 (BST)
Re: open issues
Namespace evasion
The open issues section notes that "If the CLIX namespace is used in a LMNL document, it can't be serialized."
I think we need a general namespace-aliasing mechanism for handling this and similar issues at the processing layer, not unlike (though perhaps a bit easier to understand than) the feature XSLT offers for processing and then serializing XSLT. This could be as simple as a "shadow" namespace for every namespace in LMNL....
http://lmnl.net/namespace/clix would be shadowed by http://lmnl.net/namespace/shadow/clix, which would (in absurd cases) be shadowed by http://lmnl.net/namespace/shadow/shadow/clix etc.
This would require that we reserve http://lmnl.net/namespace/shadow and its ilk.
At suitable times in processing (generally, when serializing), shadow namespaces could be switched out for the real thing. Most of the time they'd simply be placeholders, meant to go unnoticed.
Note that I've used a "namespace" step in the namespace path. I think doing so generally leaves less chance for regret, should one ever want to put a page at a namespace designator (ha). This could be "ns" for conciseness.
—Wendell 18:55, 24 September 2006 (EDT)
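As a rough illustration of the shadowing idea, the mapping could be as small as the two functions below. This is only a sketch: the function names `shadow` and `unshadow` are invented here, and it assumes that shadowing applies only to URIs under http://lmnl.net/namespace/ and works by inserting a "shadow" path step, as in the URIs quoted above.

```python
# Assumption: every shadowable namespace lives under this base URI, and the
# reserved shadow space is the "shadow" step directly beneath it.
BASE = "http://lmnl.net/namespace/"
SHADOW = BASE + "shadow/"

def shadow(uri):
    """Return the shadow of a namespace URI (one more 'shadow' step)."""
    if not uri.startswith(BASE):
        raise ValueError(f"not under {BASE}: {uri}")
    return SHADOW + uri[len(BASE):]

def unshadow(uri):
    """Strip one level of shadowing; non-shadow URIs are returned unchanged."""
    if uri.startswith(SHADOW):
        return BASE + uri[len(SHADOW):]
    return uri

clix = "http://lmnl.net/namespace/clix"
print(shadow(clix))            # http://lmnl.net/namespace/shadow/clix
print(shadow(shadow(clix)))    # http://lmnl.net/namespace/shadow/shadow/clix
print(unshadow(shadow(clix)))  # http://lmnl.net/namespace/clix
```

A serializer would call something like `unshadow` once on each namespace it writes out, so a document that itself uses the CLIX namespace would carry the shadow URI during processing and get the real one back on output.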
CLIX vs. LMNL events XML
I think whether we need both depends on what we intend to do with CLIX. I think CLIX could be a very useful platform for experimentation and even real-world processing, at least in an interim period before we have a true LMNL toolkit. It keeps getting reinvented, which is a sign that it's useful.
—Wendell 18:55, 24 September 2006 (EDT)
- The way I see it, Events Markup is the XML equivalent to the events returned by a pull parser. It essentially marks up the significant tokens in a LMNL serialisation. It's the easy thing to generate from a parser, but it's pretty complicated to process because it's completely flat.
- CLIX is the next level up: it marks up the mark-up in a LMNL serialisation: the tags. It forms a bridge between events markup and ECLIX and is reasonably easy to process.
- ECLIX is the user-friendly version that people could write in an editor. It's also important to me that plain XML documents can be interpreted as LMNL documents via ECLIX.
- I can see a requirement for another kind of XML-based representation of LMNL documents in which the content and ranges of a particular limen are separated (i.e. each range is represented by an element with the start and end indexes as attributes). It would be easy to process such a representation to add, move and remove ranges from a document. (I dub this LOOL, for LMNL Out Of Line. --John Cowan 23:52, 27 September 2006 (EDT))
- I have absolutely no problems in having multiple ways of representing LMNL in XML form: they meet different needs, and they naturally form a pipeline from LMNL syntax to plain XML and back again, which I think is really useful.
- I agree on all points. --John Cowan 23:52, 27 September 2006 (EDT)
— Jeni 16:13, 27 September 2006 (EDT)
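One step of the pipeline described above, from CLIX-style milestones to an out-of-line form, can be sketched as follows. Everything here is an assumption made for illustration: LOOL has no defined syntax in this discussion, so the sketch returns plain dictionaries instead of elements; it only handles empty milestone elements directly under the root (no annotations); and the function name `to_out_of_line` and the namespace URI are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: clx is bound to the CLIX namespace URI quoted earlier on this page.
CLX_NS = "http://lmnl.net/namespace/clix"
SID = f"{{{CLX_NS}}}sID"
EID = f"{{{CLX_NS}}}eID"

def to_out_of_line(root):
    """Split flat CLIX into (text, ranges) with character offsets.

    Only handles empty milestones that are direct children of the root.
    """
    text = root.text or ""
    starts = {}   # ID -> (range name, start offset)
    ranges = []
    for el in root:
        offset = len(text)
        if SID in el.attrib:
            starts[el.get(SID)] = (el.tag, offset)
        elif EID in el.attrib:
            name, start = starts.pop(el.get(EID))
            ranges.append({"name": name, "start": start, "end": offset})
        text += el.tail or ""  # text between this milestone and the next
    return text, ranges

doc = (f'<clx:clix xmlns:clx="{CLX_NS}">'
       '<a clx:sID="a1"/>one <b clx:sID="b1"/>two<a clx:eID="a1"/>'
       ' three<b clx:eID="b1"/></clx:clix>')

text, ranges = to_out_of_line(ET.fromstring(doc))
print(text)    # one two three
print(ranges)  # [{'name': 'a', 'start': 0, 'end': 7}, {'name': 'b', 'start': 4, 'end': 13}]
```

Once the ranges are held separately like this, adding, moving or removing one is a list operation and the text is never touched, which is the property claimed for the out-of-line representation.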
