I need to filter documents by date (last week, last month, etc.) with MarkLogic 8. The database contains 1.3 million XML documents.
The documents look like this:
<work datum_gegenereerd="2015-06-10" gegenereerd="2015-06-10T14:28:48" label="gmb-2015-12000">
...
I've created an element attribute range index on work/@datum_gegenereerd (scalar type date).
The following query works but is slow (3 seconds):
xquery version "1.0-ml";
for $a in //work
where xs:date($a/@datum_gegenereerd) > current-date()- 5*xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>
After a lot of experimenting, it turns out that I can get the query time down to 0.02 seconds by removing the xs:date cast from the where clause:
xquery version "1.0-ml";
for $a in //work
where $a/@datum_gegenereerd > current-date()- 5*xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>
Can anyone explain this behaviour?
Update:
When I delete the attribute range index, the second variant slows down to 3+ seconds as well, and recreating the index makes it fast again. This makes me wonder how to read David's statement below that there is no way to use a custom index from plain XQuery. (BTW: the query returns 1267 XML documents, out of a possible 450000 documents with root element work in a total database of 1.35 million documents.)
Update 2:
I messed up with the performance metric of 0.02 seconds. But it is very fast in the query console. Of the 3 versions, the cts-search seems a tiny bit faster.
You may have created an index, but you are not using it. You need to use an element-attribute-range-query to find all of the fragments that have dates in the range in question.
Something like:
cts:search(doc(), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5*xs:dayTimeDuration('P1D')))
But if you really just want the URIs, then the same element-attribute-range-query can be used with cts:uris (something like this, but check the docs):
cts:uris('', (), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5*xs:dayTimeDuration('P1D')))
The second one does everything in memory and just pulls the URIs from the URI lexicon that point to document fragments where the date query matches.
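For the "last week" and "last month" filters mentioned in the question, the same cts:uris pattern could be reused with a different cutoff date. The following is only a sketch: the variable names are made up for illustration, and it assumes that the date range index on work/@datum_gegenereerd exists and that the URI lexicon is enabled.

```xquery
xquery version "1.0-ml";

(: Hypothetical cutoff: one calendar month before today.
   For "last week", use current-date() - xs:dayTimeDuration('P7D') instead. :)
let $cutoff := current-date() - xs:yearMonthDuration('P1M')

(: Range query on the attribute; assumes the element attribute range index
   on work/@datum_gegenereerd (scalar type date) is configured. :)
let $query :=
  cts:element-attribute-range-query(
    xs:QName("work"), xs:QName("datum_gegenereerd"), ">", $cutoff)

(: cts:uris reads from the URI lexicon, which must be enabled. :)
for $uri in cts:uris('', (), $query)
return <hit>{$uri}</hit>
```

This returns the same hit elements as the queries in the question, but it takes the URIs from the lexicon and does not load the documents.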