Querying XML Documents
7th December 2000
Paul Cotton
ZIG
Washington, Dec 7, 2000
(My comments are in parentheses)
Organisation
- XML query history and QL '98
- XML Query WG history, goals and status
- XML Query Requirements
- XML Query Data Model
- XML Query Algebra
- Questions
XML query history
- Early 1998: ``roll your own query language''
- XSL Working Group
- XSLT needed syntax to select nodes
- XML Linking Working Group
- XPointer needed syntax to select a location
- February 1999 joint meeting
- Rapprochement on 90% of syntax
- XPath
- W3C recommendation with XSLT
(Used by both XSLT and XPointer candidate recommendations)
XML query history - 2
Query Languages Workshop '98
- W3C sponsored workshop
- Boston (USA), December 2-3, 1998
- 98 participants: W3C members, database vendors, invited experts, etc.
- 66 position papers
(Particularly check out:
Maier - "Desiderata for an XML query language" (ten points)
Cotton & XXX - "(summary paper)")
- See: http:###
W3C XML Query WG - History
- July 1999 - Working Group proposed as part of XML Activity phase 3
rechartering
- September 1999 - WG chartered
- More than 30 W3C member companies
- Eight F2F meetings and 40+ telecons so far
- Close working relationship with other W3C Working Groups (Schema,
XSL, I18N)
(This can make life difficult)
W3C XML Query WG - Goals
- ``The goal of the XML Query WG is to produce a data model for XML
documents, a set of query operators on that data model, and a
query language based on those query operators.''
W3C XML Query WG - Status
- Jan 2000 - Requirements Working Draft
- May 2000 - XML Query Data Model WD
- May 2000 - Feedback on Schema Last Call
- August 2000 - Revised Requirements Working Draft with Use Cases
- Dec 2000 - XML Query Algebra WD
- Future public WDs every three months
- Proposed recommendation(s)
(later in 2001)
XML Query Requirements
- Usage Scenarios
### slide whipped away
General Requirements
- Non-procedural query language
(what to find, not how to find it)
- XML syntax for query language but also a readable syntax
(Yes, these requirements are contradictory)
- Protocol independent
- Standard error conditions
- Future support for updates
XML Query Data Model
- Built on XML Infoset and PSV
(Infoset seems to be something like a parsed document object)
(PSV is the Post Schema Validation infoset: the significant
difference here seems to be that the data is typed)
- Namespace aware
- Support for XML Schema data types
- Support for inter- and intra-document references
XML Query Functionality
- Operators on all data types
(e.g. "less than" for numeric types)
- Text operators across element boundaries
(e.g. proximity matching of text ignoring tags)
- Support for hierarchy and sequence
- Ability to combine data from different locations
- Aggregation and sorting
- Combination of operators including queries as operands
(analogous to embedding a query in the FROM clause of an SQL
query -- not possible in SQL.)
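(A minimal sketch of what "text operators across element boundaries" could mean, using Python's standard xml.etree.ElementTree. The sample document and the helper names text_ignoring_tags and within_words are my own illustration, not anything from the talk or the WG drafts.)

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the phrase is split up by inline markup.
doc = ET.fromstring(
    "<para>Querying <em>XML</em> documents is <b>fun</b>.</para>"
)

def text_ignoring_tags(elem):
    """Concatenate all text below elem, in document order, dropping the tags."""
    return "".join(elem.itertext())

def within_words(elem, first, second, distance):
    """True if `second` occurs at most `distance` words after `first`."""
    words = text_ignoring_tags(elem).lower().split()
    positions = [i for i, w in enumerate(words) if w == first]
    return any(second in words[i + 1 : i + 1 + distance] for i in positions)

flat = text_ignoring_tags(doc)
print(flat)
print(within_words(doc, "querying", "documents", 2))
```

(The proximity test works on the flattened text, so the <em> tag between "Querying" and "documents" does not get in the way.)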
XML Query Functionality - 2
- Support for NULL values
- Structural preservation
- Identity preservation
- Operations on names
(e.g. wildcarding element names -- SELECT *_id FROM ...)
- Operations on ``schemas''
- Extensibility
- Closure
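(To make "operations on names" concrete, here is a rough sketch of the *_id wildcard idea in Python. The sample order document and the select_by_name helper are assumptions of mine; the eventual query language may express this quite differently.)

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration.
doc = ET.fromstring(
    "<order><order_id>17</order_id><customer_id>42</customer_id>"
    "<note>rush</note></order>"
)

def select_by_name(root, pattern):
    """Return (name, text) for every element whose name matches the glob pattern."""
    return [
        (e.tag, e.text)
        for e in root.iter()
        if fnmatch.fnmatchcase(e.tag, pattern)
    ]

ids = select_by_name(doc, "*_id")
print(ids)
```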
XML Query Data Model WD
- Defines information available to a query processor
- Infoset plus the following
- - Support for XML Schema data types (PSV)
- - Support for document collections
- - Support for references
- Node-labelled tree constructor model with node identity
(I think this is a concrete interface to the infoset, which is
an abstract concept. I think)
- Mapping from Infoset to Query Data Model defined in Annex A
(Most important part of the document!)
- http://www.w3.org/TR/query-datamodel/
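(My own attempt to pin down "node-labelled tree with node identity" as a Python sketch. The Node class and the same_content helper are hypothetical names, not taken from the Working Draft. The point is only that two nodes with identical content are still two different nodes.)

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(eq=False)  # eq=False keeps Python's default identity-based equality
class Node:
    label: str
    children: List[Union["Node", str, int]] = field(default_factory=list)

def same_content(a, b):
    """Structural comparison: same label and, recursively, same children."""
    if isinstance(a, Node) and isinstance(b, Node):
        return (
            a.label == b.label
            and len(a.children) == len(b.children)
            and all(same_content(x, y) for x, y in zip(a.children, b.children))
        )
    # Leaf values (strings, integers) compare by value.
    return not isinstance(a, Node) and not isinstance(b, Node) and a == b

a1 = Node("author", ["Fruitbat"])
a2 = Node("author", ["Fruitbat"])

print(a1 == a2)              # identity comparison: two distinct nodes
print(same_content(a1, a2))  # content comparison: same label and children
```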
XML Query Algebra WD
(First release was this Monday!)
- Defines operations on Query Data Model
- Simple principles, easy to use
- Firm mathematical foundation
- Many issues still open
- - References
- - Unordered data
- - Algebra subset of syntax?
- http://www.w3.org/TR/query-algebra/
Introduction to the Query Algebra: Part I: Data model and types
Document                               Algebra
--------                               -------
<bib>                                  bib [
  <book>                                 book [
    <title>Data on the web</title>         title [ "Data on the web" ],
    <year>1999</year>                      year [ 1999 ],
    <author>Fruitbat</author>              author [ "Fruitbat" ]
  </book>                                ]
</bib>                                 ]
(... and the value of this is? ...)
(This moves too fast to take notes)
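(Reconstructing after the fact: the Document/Algebra correspondence on the slide above looks mechanical enough to sketch in Python. The to_algebra function is my own guess at the mapping. In particular, treating all-digit text as an integer is an assumption, since in the real data model the type would presumably come from the schema. Mixed content and attributes are ignored.)

```python
import xml.etree.ElementTree as ET

def to_algebra(elem):
    """Render an element as `label [ child, child, ... ]` in bracket notation."""
    children = [to_algebra(child) for child in elem]
    if not children:
        text = (elem.text or "").strip()
        # Assumption: all-digit text is an integer, anything else a string.
        children = [text if text.isdigit() else f'"{text}"']
    return f"{elem.tag} [ {', '.join(children)} ]"

doc = ET.fromstring(
    "<bib><book><title>Data on the web</title><year>1999</year>"
    "<author>Fruitbat</author></book></bib>"
)
print(to_algebra(doc))
```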
Questions