5.3 Session Expression Cache

Expression caching is a powerful feature for building well-performing, functionally rich applications. You can use it for query result caching (avoiding computing the same query twice) and as a mechanism to simulate query result cursors: an expensive query that delivers a large result is evaluated once, and subsequent queries show small parts of the result set that, for example, fit on the screen.

The mechanism allows Caching of Arbitrary Subexpressions inside a so-called Multi-Query Session.

5.3.1 Multi-Query Sessions

MonetDB/XQuery allows you to interact with the database server in a single session in which you see the same snapshot of the database all the time. Such a multi-query session may last for a long time.

You get such a session by prefixing queries with XQuery options:

declare option pf:session-id "ID"; 
declare option pf:session-timeout "MSECS"; 

QUERY

Here you should replace QUERY with your query, ID with an identifier (letters, digits, underscores, and hyphens), and MSECS with an integer that indicates a duration in milliseconds.

XQuery options are part of the XQuery standard, and systems implementing it are free to define their meaning. Options that a system does not recognize are simply ignored, so adding such options does not affect the interoperability of your queries.

The effect of the pf:session-* options is that all queries wrapped in this way with the same ID use the same database snapshot.

An example query is one that displays the names of male persons:

doc("auctions.xml")//person[gender = "male"]/name

which could be wrapped in the pf:session-* options as follows:

declare option pf:session-id "my-own-id"; 
declare option pf:session-timeout "10000"; 

doc("auctions.xml")//person[gender = "male"]/name

This says that the session is called my-own-id (a name the user is free to make up), and that it should be kept alive for 10 seconds (10000 ms). After 10 seconds of inactivity in the session, the session is silently terminated, which means that the database snapshot is released at the server.
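
As a minimal sketch of how a second query can join the same session, the following query reuses the identifier my-own-id and therefore sees the same database snapshot as the query above, provided it arrives before the session times out. The counting query itself is an illustrative assumption and not part of the original example.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "10000";

(: illustrative query; runs against the snapshot held by session my-own-id :)
count(doc("auctions.xml")//person[gender = "male"])
```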

5.3.2 Caching of Arbitrary Subexpressions

Now consider in our example that we have a web interface that displays a table of person names. However, only 20 names fit on a screen, and the application provides a scroll bar and "next" and "previous" buttons to navigate through the list of persons. Each time the user clicks on those buttons, a new query will be executed like:

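subsequence(doc("auctions.xml")//person[gender = "male"]/name, START, LENGTH)

with different values for START and LENGTH, where START is the 1-based position of the first name to show and LENGTH is the number of names per screen. This means that the entire query gets re-evaluated for every click, which may take a long time and results in a poor user experience.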

The sub-expression caching infrastructure allows users to mark up any subexpression for caching, using a pragma:

(# pf:cache EXPRID #) { EXPR }

Pragmas (# xx #) are not an extension themselves; they are part of the XQuery standard, and pragmas that a system does not recognize are normally ignored. Semantically they do not change the query, so the presence of pragmas does not affect the interoperability of your queries.

Again, EXPRID is an identifier made up by the user and EXPR can be anything. It should be noted, however, that the pf:cache pragma cannot be used inside for-loops.
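
One way to live with the for-loop restriction is sketched below, under the assumption that the restriction concerns expressions evaluated once per iteration: the cached expression is bound in a let clause that runs before the loop, and the loop only iterates over the bound result. The identifier my-male-persons and the loop body are illustrative choices.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "30000";

(: the pragma sits in a let clause that is evaluated once, before the loop starts :)
let $males := (# pf:cache my-male-persons #) { doc("auctions.xml")//person[gender = "male"] }
for $p in $males
return $p/name
```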

For example, we could rewrite our previous example query into this one, which displays the first 10 male persons:

declare option pf:session-id "my-own-id"; 
declare option pf:session-timeout "30000"; 

subsequence((# pf:cache my-male-persons #) { doc("auctions.xml")//person[gender = "male"] }, 1, 10)

This says that within session my-own-id, the subquery for male persons should be cached under the name my-male-persons. The first time this query is executed in the session, the result of the expression is cached inside the session. Any subsequent expression enclosed by a pf:cache pragma with the my-male-persons identifier takes no computational effort, as the result is already cached.

For example, if a user hits the "next" button, the next 10 male persons can be produced in no time as follows:

declare option pf:session-id "my-own-id"; 
declare option pf:session-timeout "30000"; 

subsequence((# pf:cache my-male-persons #) { doc("auctions.xml")//person[gender = "male"] }, 11, 10)
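
The paging arithmetic behind such queries can be sketched as follows. The standard fn:subsequence function takes a 1-based start position and a length, so page N of size S starts at (N - 1) * S + 1. The variables $page and $page-size are hypothetical names for values that the application would fill in, and the sketch assumes the pragma may appear inside a let/return expression that contains no for clause.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "30000";

(: $page and $page-size are hypothetical values supplied by the application :)
let $page := 3
let $page-size := 10
return subsequence(
  (# pf:cache my-male-persons #) { doc("auctions.xml")//person[gender = "male"] },
  ($page - 1) * $page-size + 1,   (: 1-based start position, 21 for page 3 :)
  $page-size)                     (: number of items to return :)
```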

A side effect of the query with caching pragmas is that the session expiry time is set to the current time plus the timeout (here 30000 ms, hence 30 seconds). In other words, each query that uses a cached session keeps that session alive for the amount of time it specifies.

Note that one can terminate a session by sending a (dummy) query with a timeout of 0.
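
A minimal sketch of such a terminating query is shown below. The empty sequence () serves as the dummy query here; that choice is an assumption, and any cheap query should serve the same purpose.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "0";

(: dummy query: the empty sequence :)
()
```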

5.3.3 Consistency

We should note that the current implementation of Session Subexpression Caching in MonetDB/XQuery is rather simple, as it requires the user to annotate the interesting subexpressions with pragmas (rather than doing this automatically).

Moreover, it is the responsibility of the user to be consistent in the use of pf:cache pragma identifiers: if the same identifier is used in the same session for different subexpressions, incorrect results will be returned. MonetDB/XQuery does not check that the subexpression associated with an identifier is syntactically identical to the subexpression in the earlier query that computed the cached result.
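
To illustrate the pitfall, suppose the session my-own-id has already cached the male persons under the identifier my-male-persons, as in the examples of the previous section. The following query reuses that identifier for a different subexpression and, according to the rule above, would be answered from the cached male persons rather than from all persons. The counting queries are illustrative assumptions.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "30000";

(: PITFALL: my-male-persons was cached earlier for person[gender = "male"] :)
count((# pf:cache my-male-persons #) { doc("auctions.xml")//person })
```

Giving each distinct subexpression its own identifier avoids the problem. The identifier all-persons is a hypothetical name:

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "30000";

(: a distinct identifier for a different subexpression :)
count((# pf:cache all-persons #) { doc("auctions.xml")//person })
```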

5.3.4 Concurrent Access to a Session

The session reuse mechanism in MonetDB/XQuery caches sessions, yet allows only a single query to access a session at a time (it locks the session). This limits parallelism on multi-core machines. For this reason, we support the option pf:session-nocache:

declare option pf:session-id "ID"; 
declare option pf:session-timeout "MSECS"; 
declare option pf:session-nocache "true"; 

QUERY

The idea behind a session-nocache query is that it only re-uses a session (with potentially pre-created cached results attached to it), but is not allowed to store any new cached subexpression values. While this means that the intermediate results of such a query will not be available for re-use in subsequent queries, the fact that the state of the session is left unchanged means that multiple such nocache queries (in the same session!) can run in parallel on a multi-core server.

Thus, queries whose results are not likely to be reused, but whose computation relies on precomputed expressions, are a target for running with session-nocache, with the benefit that increased parallel performance can be obtained. This is only relevant if you have multiple queries that could be executed concurrently.

This option is also useful to avoid polluting the cached session with many constructed nodes if your query constructs many nodes, as explained below.
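
As a sketch of a nocache query, the following example assumes that an earlier query in session my-own-id has already cached the male persons under the identifier my-male-persons. It reads that cached result but stores nothing new in the session, so several such queries could run side by side. The counting query is an illustrative assumption.

```xquery
declare option pf:session-id "my-own-id";
declare option pf:session-timeout "30000";
declare option pf:session-nocache "true";

(: reuses the cached result if present; does not add anything to the cache :)
count((# pf:cache my-male-persons #) { doc("auctions.xml")//person[gender = "male"] })
```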

5.3.5 Memory Consumption

A final issue is the size of the cache. For each session, a default limit of 128MB of results is maintained. This quantity can be changed by modifying the value of the xquery_procMB MonetDB environment variable, followed by a server restart.

The item sequences in the session cache are managed automatically by the system using an LRU (least recently used) scheme.

Special attention should be paid to caching subexpressions that perform node construction. The MonetDB/XQuery implementation of node construction causes temporary tables to be populated with tuples that represent the new nodes. Therefore, such queries cause extra memory consumption: in addition to the XML document in the database that remains open and the cached sequence of items, extra data is kept that represents the new nodes.

To keep memory from filling up with constructed nodes, consider using pf:session-nocache so that they are not cached. Of course, if the constructed nodes are what you want to cache, you should do so, but beware of the size.

The complication with constructed node space is that the system cannot garbage collect it, hence this memory space only grows. There is a hard limit imposed on the number of constructed nodes (1M), after which the session gets terminated! This draconian measure is currently the only way to keep resource consumption under control.

5.3.6 Updates

Due to the snapshot semantics, users will see the same database state throughout the entire session.

Updating queries are allowed in a session; however, they always trigger the termination of the session.