van der Vlist, Eric. “XQuery Injection: Easy to exploit, easy to prevent....” Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). https://doi.org/10.4242/BalisageVol7.Vlist02.
Balisage: The Markup Conference 2011 August 2 - 5, 2011
Balisage Paper: XQuery Injection
Easy to exploit, easy to prevent...
Eric van der Vlist
Dyomedea
Eric is an independent consultant and trainer. His domain of expertise includes Web
development and XML technologies.
He is the creator and main editor of XMLfr.org, the main site dedicated to XML technologies in French, the author of the O'Reilly
animal books XML Schema and RELAX NG, and a member of the ISO DSDL (http://dsdl.org) working group focused on XML schema languages.
He is based in Paris and you can reach him by mail (vdv@dyomedea.com) or meet him at one of the many conferences where he presents his
projects.
We all know (and worry) about SQL injection, but should we also worry about XQuery
injection?
With the power of extension functions and the implementation of XQuery update features,
the answer is clearly yes! We will see how an attacker can send information to an
external site or
erase a collection through XQuery injection on a naive and unprotected application
using the eXist-db REST API.
That's the bad news...
The good news is that it's quite easy to protect your application from XQuery injection
after this word of warning. We'll discuss a number of simple techniques (literal string
escaping,
wrapping values into elements or moving them out of queries in HTTP parameters) and
see how to implement them in different environments covering traditional programming
languages, XSLT, XForms
and pipeline languages.
I am not a security expert and, as far as I know, the domain covered by this paper
is very new. The list of attacks and counter attacks mentioned hereafter is nothing
more than the list of
attacks and counter attacks I can think of. This list is certainly not exhaustive and following its advice is by no means a guarantee that you'll be safe!
If you see (or
think of) other attacks or solutions, drop me an email so that I may improve the next versions of this document.
Many thanks to Alessandro Vernet (Orbeon) for the time he has spent discussing these
issues with me and for suggesting to rely on query string parameters and to Adam Retter
(eXist-db
developer) for his thorough review of this paper!
Code Injection
Wikipedia defines code injection as:
the exploitation of a computer bug that is caused by processing invalid data. Code
injection can be used by an attacker to introduce (or "inject") code into a computer
program to
change the course of execution. The results of a code injection attack can be disastrous.
For instance, code injection is used by some computer worms to propagate.
SQL injection is arguably the most common example of code injection since it can potentially affect
any web application or
website accessing a SQL database including all the widespread AMP systems.
The second well known example of code injection is Cross Site Scripting (XSS) which could be called "HTML and
JavaScript injection".
According to the Web Hacking Incident Database, SQL injection is the number one attack
method involved in 20% of the web attacks and Cross Site Scripting is number two with
13% suggesting that code injection techniques are involved in more than 1 out of 3
attacks on the web.
If it's difficult to find any mention of XQuery injection on the web, it's probably
because so few websites are powered by XML databases but also because of the false
assumption that XQuery
is a read only language and that its expression power is limited, meaning that the
consequences of XQuery injection attacks would remain limited.
This assumption must be revised now that XML databases have started implementing XQuery Update Facilities and that XQuery
engines (either databases, libraries such as Saxon or middleware such as BEA Weblogic)
have extensive extension function libraries which let them communicate with the external
world!
Furthermore, when you think about it, even the good old XSLT 1.0 document() function or its XPath 2.0/XQuery 1.0 doc() friend are potential risks.
Example of XQuery Injection
Scenario
If you develop an application that requires user interaction, you will probably need
sooner or later some kind of user authentication, and if your application is powered
by an XML
database, you may want to store user information in this database.
Note
There are two ways to rely on a database for user authentication: you can either store
user and password information in the database (like any other information) or rely
on the database
internal security mechanism. The authentication method used in this example just stores
user and password information in the database.
In the Java world, Tomcat comes with a number of so called authentication "realms" for plain files, SQL
databases or LDAP but there is no realm to use an XML database to store authentication
information.
That's not really an issue since the realm interface is easy to implement. This interface
has been designed so that you can store the passwords either as plain text or encrypted.
Of
course, it's safer (and recommended) to store encrypted passwords, but for the sake
of this example, let's say you are lazy and store them as plain text. I'll spare you
the details, but the
real meat in your XML database realm will then be to return the password and roles
for a user with a given login name.
If you are using an XML database such as eXist with its REST API, you will end up
opening a URL with a Java statement such as:
new URL("http://localhost:8080/orbeon/exist/rest/db/app/users/?_query=//user[mail=%27" + username + "%27]")
Attack
Let's put on a black hat and try to attack a site powered by an XML database that
gives us a login screen such as this one:
We don't know the precise statement used by the realm to retrieve information or the
database structure, but we assume that the authentication injects the content of HTML
form somewhere
into an XQuery as a literal string and hope the injection is done without proper sanitization.
We don't know either if the programmer has used a single or a double quote to isolate
the content of the input form, but since that makes only two possibilities, we will
just try
both.
The trick is:
to close the literal string with a single or double quote
to add whatever is needed to avoid raising an XQuery parsing error
to add the XQuery statement that will carry the attack
to add again whatever is needed to avoid raising a parsing error
to open again a literal string using the same quote
Let's take care of the syntactic sugar first.
We'll assume that the XQuery expression follows this generic pattern:
<URL>?_query=<PATH>[<SUBPATH> = '<VALUE>']
After injection, the XQuery expression will look like:
<URL>?_query=<PATH>[<SUBPATH> = '' or <ATTACK> or .='']
The inner or expression has 3 alternatives. The first one will likely return false
(the <SUBPATH> is meant to be the relative path to the user name and most applications
won't tolerate
empty user names in their databases). The XQuery processor will thus pull the trigger
and evaluate the attack statement.
The attack must be an XQuery "Expr" production. That includes FLWOR expressions, but excludes declarations that belong
to the prologue. In practice, that means that we can't use declare namespace declarations
and that we need to embed extension function calls into elements that declare their
namespaces.
What kind of attack can we inject?
The first kind of attacks we can try won't break anything but export information from
the database to the external world.
With eXist, this is possible using standard extension modules such as the HTTP client
module or the mail module. These modules can be activated or deactivated in the eXist
configuration
file and we can't be sure that the attack will work but if one of them is activated
we'll be able to export the user collection...
An attack based on the mail module looks like the following:
A similar attack could send the content of the collection on pastebin.com using the
HTTP client module.
To inject the attack, we concatenate the start container string (' or ), the attack itself and the end container string ( or .='), normalize the spaces and paste
the result into the login entry field.
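The concatenation described above can be sketched in Java as follows. The class and method names are hypothetical, the naive query assumes the single-quoted pattern of the scenario, and ATTACK stands for whatever XQuery expression the attacker wants evaluated.

```java
final class InjectionDemo {
    private InjectionDemo() {}

    /** Naive query construction: the form value is pasted between single quotes. */
    static String naiveQuery(String username) {
        return "//user[mail='" + username + "']";
    }

    /** Text typed into the login field: start container + attack + end container. */
    static String payload(String attack) {
        String raw = "' or " + attack + " or .='";
        // Normalize the spaces so the payload fits on a single input line.
        return raw.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Shows the query that the unprotected application ends up evaluating.
        System.out.println(naiveQuery(payload("ATTACK")));
    }
}
```

The printed query has the same shape as the injected pattern shown earlier: the user's value has escaped from the string literal and the attack expression is evaluated as part of the predicate.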
The login screen will return a login error, but if we've been lucky we will receive
a mail with the full content of the collection on which the query has been run.
If nothing happened, we might have used the wrong quote and we can try again replacing
the single quotes from our container string by double quotes.
If nothing happens once again, which is the case with the naive REST URL construction
used in this example, this might be because the application does not encode the query
for URI. In that
case, we must do it ourselves and encode the string before copying it into the entry
field like the XPath 2.0 encode-for-uri() would do.
And then, bingo:
We have a new message with all the information we need to login:
The second kind of attack we can try uses the same technique to delete information from
the database. A very simple and extreme one just erases everything from the collection
and leaves empty
document elements:
for $u in //user return update delete $u/(@*|node())
Note that, in both cases, we have not assumed anything about the database structure!
SQL injection attacks often try to generate error messages that are displayed within
the resulting HTML pages by careless sites and expose information about the database
structure but
that hasn't been necessary so far.
On this authentication form, generating errors would have been hopeless since Tomcat
handles this safely and only exposes a "yes/no" answer to user entries and sends error
messages to the
server log but on other forms this could also be an option, leading to a third kind
of attacks.
If we know the database structure for any reason (this could be because we've successfully
leaked information in error messages, because the application's code is open sourced
or because
we've managed to introspect the database using functions such as xmldb:get-child-collections()), we
can also update user information with forged authentication data:
let $u := //user[role='orbeon-admin'][1]
return (
update value $u/mail with 'eric@example.com',
update value $u/password with 'foobar'
)
What about the doc() function?
It can be used to leak information to the external world:
Now that we've seen the harm that these attacks can do, what can we do to prevent
them?
A first set of recommendations is to limit the consequences of these attacks:
Do not store non encrypted passwords.
Use a user with read only permissions to perform read only queries.
Do not enable extension modules unless you really need them.
If the authentication realm of our example had followed these basic recommendations,
our attacks would have had limited consequences:
If the database user used to query the database had had no write access, the attacker wouldn't
have been able to erase the user information.
If the extension modules that allow sending mail had been disabled, the attacker wouldn't have been
able to send a mail.
These recommendations are always worth following. They can be compared to recommending
not to leave valuables in a room, but there are cases when you need to do so, and
that
doesn't mean that you shouldn't put a lock on the room's door!
To block the attacks themselves, we need a way to prevent the values copied into
the XQuery expressions from leaking out of the string literals where they are supposed
to be located.
Generic How To
The most common way to block this kind of attack is to "escape" the dangerous characters
or "sanitize" user inputs before sending them to the XQuery engine.
In an XQuery string literal, the "dangerous" characters are:
The & that can be used to make references to predefined entities or character references and needs to
be replaced by the &amp; entity reference
The quote (either single or double) that you use to delimit the literal that needs
to be replaced by &apos; or &quot;
And that's all! These two replacements are enough to block code injections through
string literals.
Of course, you also need to use a function such as encode-for-uri() so that the URL
remains valid and to block injections through URL encoding.
The second way to block these attacks is to keep the values that are entered through
web forms out of the query itself.
When using eXist, this can be done by encoding these values and sending them as URL
query parameters. These parameters can then be retrieved using the request:get-parameter() extension function.
Which of these methods should we use?
There is no general rule and it's rather a matter of taste. That being said...
Sanitizing is more portable: request:get-parameter is an eXist specific function that
cannot be used with other databases.
Parameters may (arguably) be considered cleaner since they separate the inputs from
the request. They can also be used to call stored queries.
Note
These techniques are effective and sufficient to protect your application as long as you
don't open a new breach. Such a breach is opened when your XQuery expression dynamically
executes something
against a query engine.
In a highly hypothetical case where the XQuery expression would execute a SQL query,
this SQL Query would have to be protected against SQL injection.
A more common case in XQuery land is when you use a *:evaluate() extension function
to dynamically execute an XPath or XQuery expression.
It is common to see developers filtering values as a protection against SQL Injection
and you could also do that as a protection against XQuery injection but in both cases
this is often a
bad idea!
Filtering user input is often a bad idea and whenever you do so you should be doing
that for data quality reasons and not for security reasons since the constraints will
very likely be
different.
To protect this application against XQuery injection, we could have filtered out the
user input to exclude single quotes and that would have been effective (assuming we
use a single quote
to delimit the string literal) but that would have given Tim O'Reilly a new opportunity
to rant against dumb applications that do not accept his name as an input!
We've seen that it's as easy to sanitize user input as it would have been to filter
it, so please, don't use filters for security!
Java
Assuming that we use single quotes to delimit XQuery string literals, inputs can be
sanitized in Java using this function:
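A minimal sketch of such a function, applying the two replacements listed in the generic how-to. The class and method names are hypothetical.

```java
final class XQuerySanitizer {
    private XQuerySanitizer() {}

    /** Escapes a value for use inside a single-quoted XQuery string literal. */
    static String sanitizeApos(String text) {
        // Escape & first; otherwise the & of the &apos; produced by the
        // second replacement would itself be escaped again.
        return text.replace("&", "&amp;").replace("'", "&apos;");
    }
}
```

With this function, the container string used in the attack (' or ... or .=') stays inside the literal, because both quotes are turned into &apos; references.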
Each user input must be sanitized separately and the whole query must then be encoded
using the URLEncoder.encode() method. Depending on the context, it may also be a good
idea to call an additional method such as trim() to remove leading and trailing space or toLowerCase() to normalize the value to lower case. In the authentication
realm, the Java snippet could be:
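One possible shape for this snippet, as a sketch only: the class and method names are hypothetical, the endpoint and query come from the scenario, and the URL is returned as a string that could then be passed to new URL(...). It assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class SanitizedRealmQuery {
    private SanitizedRealmQuery() {}

    // REST endpoint taken from the scenario.
    static final String BASE = "http://localhost:8080/orbeon/exist/rest/db/app/users/";

    /** Escapes a value for use inside a single-quoted XQuery string literal. */
    static String sanitizeApos(String text) {
        return text.replace("&", "&amp;").replace("'", "&apos;");
    }

    /** Builds the REST URL for the lookup of one user. */
    static String queryUrl(String username) {
        // Optional normalization, done before sanitizing.
        String normalized = username.trim().toLowerCase(Locale.ROOT);
        // Each user input is sanitized separately...
        String query = "//user[mail='" + sanitizeApos(normalized) + "']";
        // ...and the whole query is then URL-encoded.
        return BASE + "?_query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```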
When the user input is passed as a request parameter instead, the query is a fixed string that could be stored in the eXist database or encoded
in a static variable.
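The request-parameter variant could be sketched as follows. The class name, the method name and the parameter name mail are assumptions made for illustration; request:get-parameter() is the eXist extension function mentioned earlier, and Java 10 or later is assumed for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ParameterizedRealmQuery {
    private ParameterizedRealmQuery() {}

    // REST endpoint taken from the scenario.
    static final String BASE = "http://localhost:8080/orbeon/exist/rest/db/app/users/";

    // Fixed query: the user input never becomes part of the XQuery text.
    static final String QUERY = "//user[mail=request:get-parameter('mail','')]";

    /** Builds the REST URL; the user name travels as a separate URL parameter. */
    static String queryUrl(String username) {
        return BASE
                + "?_query=" + URLEncoder.encode(QUERY, StandardCharsets.UTF_8)
                + "&mail=" + URLEncoder.encode(username, StandardCharsets.UTF_8);
    }
}
```

Only URL encoding is applied to the user name here; no XQuery escaping is needed because the value is read by the query as data.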
XPath 2.0 Environments
In environments that rely on XPath 2.0 such as XSLT 2.0, XProc, XPL,... the same patterns
can be used if we replace the Java methods with their XPath 2.0 equivalents. In XSLT
2.0 it is
possible to define a sanitize function similar to the one we've created in Java but
this isn't the case for other host languages and we'll skip this step.
To sanitize user inputs in an XPath 2.0 host language, we need to add a level of escaping
because the & character is not available directly but through the
&amp; entity reference. The XQuery query combines single and double quotes that are not
very easy to handle in a select attribute (even if the escaping rules of
XPath 2.0 help a lot) and the query pieces can be put into variables for convenience.
That being said, the user input can be sanitized using statements such as:
xquery version "1.0";
(: In XQuery source, a literal & must itself be written as &amp; :)
declare function local:sanitize-apos($text as xs:string) as xs:string {
replace(replace($text, '&amp;', '&amp;amp;'), '''', '&amp;apos;')
};
declare function local:sanitize-quot($text as xs:string) as xs:string {
replace(replace($text, "&amp;", "&amp;amp;"), """", "&amp;quot;")
};
local:sanitize-apos(''' or ( for $u in //user return update delete $u/(@*|node() ) ) or .=''')
XForms
The problem is very similar in XForms with the difference that XForms is meant to
deal with user input and that the chances that you'll hit the problem are significantly
bigger!
The rule of thumb here again is: never inject a user input in an XQuery without sanitizing
it or moving it out of the query using request parameters.
When using an implementation such as Orbeon Forms, that supports attribute value templates
in resource attributes, it may be tempting to write submissions such as:
Unfortunately, this would be tantamount to the unsafe Java realm that we've used as
our first example!
To secure this submission, we can just adapt one of the two methods used to secure
XSLT accesses. This is especially straightforward with the Orbeon implementation that
implements an
xxforms:variable extension very similar to XSLT variables. You can also go with FLWOR expressions
or use xforms:bind/@calculate definitions to store intermediate
results and make them more readable but it is also possible to write a mega XPath
2.0 expression such as this one:
We have explored in depth injections targeted on XQuery string literals. What about
other injections on XML based applications?
XQuery Numeric Literal Injection
It may be tempting to copy numeric input fields directly into XQuery expressions.
That's safe if, and only if, these fields are validated. If not, the techniques that
we've seen with
string literals can easily be adapted (in fact, it's even easier for your attackers
since they do not need to bother with quotes!).
That's safe if you pass these values within request parameters but you will generate
XQuery parsing errors if the input doesn't belong to the expected data type. Also
note that request:get-parameter() returns string values and may need casting in your XQuery query.
In both cases, it is a good idea to validate numeric input fields before sending your
query (this is a case where filters can be used without the risk of making Tim O'Reilly
angry)!
When using XForms, this can be done by binding these inputs to numeric datatypes.
Otherwise, use whatever language you are programming with to do the test.
If you use literals and don't want to (or can't) do that test outside the XQuery query
itself, you can also copy the value in a string literal and explicitly cast it into
the numeric data
type you are using. The string literal then needs to be sanitized like we've already
seen.
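When the test is done in the host language, a sketch in Java could look like this. The class name, the method names and the age element are hypothetical; the point is that the value copied into the query is the re-serialized number, never the raw input.

```java
final class NumericInput {
    private NumericInput() {}

    /**
     * Parses the input as an integer and returns its canonical decimal form.
     * Throws NumberFormatException if the input is not an integer.
     */
    static String validatedInteger(String input) {
        return Long.toString(Long.parseLong(input.trim()));
    }

    /** Builds a query with a numeric literal from a validated input. */
    static String query(String ageInput) {
        // "age" is a hypothetical element name used for illustration.
        return "//user[age=" + validatedInteger(ageInput) + "]";
    }
}
```

An input such as 1 or true() is rejected with an exception before any query is built.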
XQuery Direct Element Injection
Literals are the location where user input is most likely copied in XQuery based applications
(they cover all the cases where the database is queried according to parameters entered
by
our users) but there are cases where you may want to copy user input within XQuery
direct element constructors.
One of the use cases for this is the XQuery Update Facility where update primitives
may contain direct element constructors, in which it is tempting to include input
fields
values.
Here again you're safe if you use request parameters but you need to sanitize your
input if you're doing direct copy.
The danger here is not so much delimiters but rather enclosed expressions that let
your attacker include arbitrary XQuery expressions.
The < also needs to be escaped since it would be understood as a tag delimiter, as does,
of course, the &.
That makes 4 characters to escape:
& must be replaced by &amp;
< must be replaced by &lt;
{ must be replaced by {{
} must be replaced by }}
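These four replacements can be sketched in Java as follows, with hypothetical class and method names.

```java
final class ElementContentSanitizer {
    private ElementContentSanitizer() {}

    /** Escapes a value for use as text inside an XQuery direct element constructor. */
    static String sanitizeElementContent(String text) {
        return text
                .replace("&", "&amp;")  // first, so later output is not escaped twice
                .replace("<", "&lt;")   // would otherwise open a tag
                .replace("{", "{{")     // would otherwise open an enclosed expression
                .replace("}", "}}");    // would otherwise close an enclosed expression
    }
}
```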
XUpdate injection
XUpdate is safer than the XQuery Update Facility since it has no support for enclosed
expressions. That doesn't mean that & and < don't need to be
escaped, but since XUpdate documents are well-formed XML documents, the tool or API
that you'll be using to create this document will take care of that if it's an XML
tool.
Unfortunately XUpdate uses XPath expressions to qualify the targets where updates
should be applied, and if you use a database like eXist, which supports XPath 2.0
(or XQuery 1.0) in
these expressions, this opens a door for attacks that are similar to XQuery literal
injections.
Again, if you use request parameters you'll be safe.
If not, the sanitization to apply is the same as that for XQuery injection except
that the XML tool or API that you'll be using should take care of the XML entities.
*:evaluate() injection
Extension functions such as saxon:evaluate (or eXist's util:eval()) are also prone to attacks similar to XQuery injection if user input is not properly
sanitized.
The consequences of these injections may be amplified by extension functions that
provide read and write access to system resources but even vanilla XPath can be harmful
with its
document() function that provides read access to the file system as well as network resources
that may be behind the firewall protecting the server.
These function calls need to be secured using similar techniques adapted to the context
where the function is used.
Defining variables out of the function call and using these variables within the function
call is an effective solution quite similar to using query parameters in a query.
Note
When such functions are called inside a query, you may have to sanitize twice! In
that case, the second level of sanitization can be done in XQuery.