Shen Y.,HP Labs Bristol UK |
Shen Y.,University of Bristol |
Feng L.,Tsinghua University
Computer Systems Science and Engineering | Year: 2011
Sequence-based XML indexing aims at avoiding expensive join operations in XML query processing. In this paper, we first provide an overview of the state-of-the-art XML indexing and querying techniques, particularly the twig-based and sequence-based approaches, in the literature. We describe their difficulty in resolving the semantic uncertainty problem which is inherent in XPath queries with predicates. We then present a new sequencing technique based on the theory of Eulerian Cycle, called EulerSequenceing. Such method can translate an XML document (respectively XPath query) into a sequence with strict O(n) time and space complexity, where n is the number of nodes in an XML tree (respectively an XPath twig). An associated FastEulerStack subsequence matching algorithm is developed to deal with XPath queries with semantic uncertainty by generating one query sequence with a simple query equivalence theory, and then retrieve the correct answers without refinement phases. The extensive experimental results demonstrate that our FastEulerStack is efficient, effective, and scalable in processing XPath queries with predicates. © 2011 CRL Publishing Ltd.