CURIEs - A Cure for URIs

thammond – 2008 December 03

A quick straw poll of a few folks at London Online yesterday revealed that they had not heard of CURIEs. And there was I thinking that most everybody must have heard of them by now. 🙂 So anyway here’s something brief by way of explanation.

CURIE stands for Compact URI and does the signal job of rendering long and difficult-to-read URI strings into something more manageable. (URIs do have the particular gift of being “human transcribable” but in practice their length and the actual characters used in the URI strings tend to muddy things for the reader.) So given that the Web is built upon a bedrock of URIs, anything that makes URIs easier to handle is going to be an important contributor to our overall ease of interaction with the Web.

Ten years back (in February 1998) when XML was first introduced it presented a flat naming system for document markup. For purposes of modularity and markup reuse the XML Namespaces specification released the following year allowed for element and attribute names to be replaced by expanded names in which the hitherto simple names would be replaced by name pairs consisting of a namespace name and a local name. The use of URIs for the namespace name thus opened the doors to assigning globally unique names for XML element/attribute names. As a practical point (both to keep the names short and to deal with URI characters), the notion of a qualified name (or QName) was introduced, whereby the local name would be qualified by a prefix which stood in for the namespace name.

This was such a successful device that over time it was applied to URIs in general as a mechanism for abbreviation. In RDF/XML especially, schema elements were referenced by QName. And the practice has spilled over into non-XML syntaxes (e.g. the N3 and Turtle RDF grammars which use a “@prefix” directive). But there were problems: since the device was grounded in XML, the local names were constrained by the allowable characters for XML elements and attributes (e.g. names cannot start with a digit character), and there was no specification for applying this same device to non-XML grammars.
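
To make that constraint concrete, here is a rough Python sketch (an illustration only, not something taken from the specifications) that checks whether a string would pass as a QName. The pattern is a deliberately simplified, ASCII-only stand-in for the XML name rules, and the function name and sample values are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName: it must start with a
# letter or underscore, followed by letters, digits, '.', '-' or '_'.
# The real XML rule admits many more Unicode characters; this is an assumption
# made to keep the sketch short.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME_RE = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def looks_like_qname(value: str) -> bool:
    """Return True if value fits the simplified prefix:localname pattern."""
    return QNAME_RE.fullmatch(value) is not None


print(looks_like_qname("dcterms:hasPart"))          # True
print(looks_like_qname("doi:10.1038/nature07184"))  # False
```

The second value fails because its local part starts with a digit and contains a slash, so it cannot be written as a QName even though it abbreviates a perfectly good URI.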

CURIE is an initiative to generalize this notion of qualified names for URIs beyond the immediate XML context for naming elements and attributes (which would also allow their use in attribute values), to a more general use in applications beyond XML. The development of CURIE is based upon work done in the definition of XHTML2, and upon work done by the RDF-in-HTML Task Force, a joint task force of the Semantic Web Best Practices and Deployment Working Group and XHTML 2 Working Group. The Editor’s draft CURIE Syntax 1.0 is currently a W3C Candidate Recommendation which is receiving comments through Jan 15, 2009, at which time it is intended to put it forward as a W3C Proposed Recommendation. Meantime, though, the new W3C Recommendation RDFa Syntax in XHTML (published Oct 14, 2008) has a normative section on CURIEs (see Sect. 7).

So, what do CURIEs look like? Taking a simple RDFa example for DOI we might have a fragment such as:

<div xmlns:doi="http://0-dx.doi.org.wam.leeds.ac.uk/" xmlns:dcterms="http://purl.org/dc/terms/">
<div about="doi:10.1038/nature07184">
<span property="dcterms:hasPart" resource="[doi:10.1038/nature07184]"/>
</div>
</div>

This would be processed by an RDFa processor to yield the RDF triple (in N3/Turtle):

<doi:10.1038/nature07184> dcterms:hasPart <http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184> .

This triple (or fact) says that the resource identified by doi:10.1038/nature07184 has as a component part (cf. DCTERMS vocabulary) the resource identified by http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184. (The abstract work identified by the DOI has as a component part the splash page identified by the proxy URL.)

OK, so what’s going on? The “property” attribute takes a CURIE as value where the prefix “dcterms” is standing in for the XML namespace URI. The “about” and “resource” attributes both take a URI or CURIE as value, but to avoid any potential confusion between the two a (so-called) “Safe CURIE” must be used, which is a CURIE wrapped in square brackets. The above example does not use brackets for the “about” attribute and therefore an RDFa processor would read this as being a full URI, i.e. <doi:10.1038/nature07184>, whereas it does use brackets for the “resource” attribute and therefore this would be read as being a (Safe) CURIE, i.e. http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184.
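
The bracket rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular RDFa processor is implemented: the prefix table simply mirrors the xmlns declarations in the example, and both function names are hypothetical.

```python
# Hypothetical prefix table mirroring the xmlns declarations in the example.
PREFIXES = {
    "doi": "http://0-dx.doi.org.wam.leeds.ac.uk/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand 'prefix:reference' by appending the reference to the mapped URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand CURIE: {curie!r}")
    return prefixes[prefix] + reference


def resolve_uri_or_safe_curie(value: str, prefixes: dict) -> str:
    """Treat '[...]' as a Safe CURIE; treat anything else as a plain URI."""
    if value.startswith("[") and value.endswith("]"):
        return expand_curie(value[1:-1], prefixes)
    return value


# Unbracketed value: read as a full URI and left alone.
print(resolve_uri_or_safe_curie("doi:10.1038/nature07184", PREFIXES))
# Bracketed value: read as a Safe CURIE and expanded.
print(resolve_uri_or_safe_curie("[doi:10.1038/nature07184]", PREFIXES))
```

The first call prints doi:10.1038/nature07184 unchanged, while the second prints the expanded proxy URL, which is the distinction the brackets exist to make.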

We can turn this around as follows:

<div xmlns:doi="http://0-dx.doi.org.wam.leeds.ac.uk/" xmlns:dcterms="http://purl.org/dc/terms/">
<div about="[doi:10.1038/nature07184]">
<span property="dcterms:isPartOf" resource="doi:10.1038/nature07184"/>
</div>
</div>

This would be processed by an RDFa processor to yield the RDF triple (in N3/Turtle):

<http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184> dcterms:isPartOf <doi:10.1038/nature07184> .

This triple (or fact) says that the resource identified by http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184 is a component part (cf. DCTERMS vocabulary) of the resource identified by doi:10.1038/nature07184. (The splash page identified by the proxy URL is a component part of the abstract work identified by the DOI.)

So what do CURIEs give us? Nothing more than a generic means to be able to make human-friendly statements such as

<doi:10.1038/nature07184> dcterms:hasPart doi:10.1038/nature07184 .

instead of having to spell it out in full triples form using long-winded URIs:

<doi:10.1038/nature07184>
<http://purl.org/dc/terms/hasPart>
<http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184> .
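
Going the other way, from long-winded URI to compact form, might be sketched as follows. This is a hedged illustration only: the function name is hypothetical, the prefix table is assumed to match the declarations used above, and choosing the longest matching namespace is a design choice of the sketch rather than anything the CURIE draft requires.

```python
# Hypothetical prefix table, assumed to match the earlier xmlns declarations.
PREFIXES = {
    "doi": "http://0-dx.doi.org.wam.leeds.ac.uk/",
    "dcterms": "http://purl.org/dc/terms/",
}


def compact_uri(uri: str, prefixes: dict) -> str:
    """Abbreviate uri as prefix:reference using the longest matching namespace."""
    best_prefix, best_base = None, ""
    for prefix, base in prefixes.items():
        if uri.startswith(base) and len(base) > len(best_base):
            best_prefix, best_base = prefix, base
    if best_prefix is None:
        return uri  # no known namespace matches, so leave the URI as it is
    return f"{best_prefix}:{uri[len(best_base):]}"


print(compact_uri("http://purl.org/dc/terms/hasPart", PREFIXES))
print(compact_uri("http://0-dx.doi.org.wam.leeds.ac.uk/10.1038/nature07184", PREFIXES))
```

These print dcterms:hasPart and doi:10.1038/nature07184 respectively, which are the short forms used in the human-friendly statement above.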