Monday, March 10, 2008

Dom4j and XPath

In this latest installment of obscure gotchas in Java development, I'm going to discuss a Dom4j behavior that is definitely something to beware of: when an XPath expression passed to the instance methods Node.selectNodes or Node.selectSingleNode starts with '/' or '//', the search does not start at that node! In fact, it always starts at the document root, contrary to what you might expect from reading the code or the API.

Allow me to illustrate with an example. Let's say you are working with this simplified XML document:

<Account>
  <Owner>
    <ContactInfo>
      <Name>Tom Jones</Name>
      ...
    </ContactInfo>
    ...
  </Owner>
  <Cosigner>
    <ContactInfo>
      <Name>Jim Johnson</Name>
      ...
    </ContactInfo>
    ...
  </Cosigner>
</Account>

Dom4j makes it easy to find the Cosigner node:
Node cosignerNode = document.selectSingleNode("/Account/Cosigner");

and at first glance, I thought the following code snippet would return the Cosigner's name:
cosignerNode.selectSingleNode("//ContactInfo/Name") => "Tom Jones"

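Counterintuitively, this code returns 'Tom Jones', the Owner's name. This is because an XPath query that starts with '/' or '//' is an absolute path: Dom4j goes back up to the root of the document that contains the node and begins its search there, no matter which node you invoke the method on. Once you remove the leading slashes, the query is relative to cosignerNode and works as expected: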
cosignerNode.selectSingleNode("ContactInfo/Name") => "Jim Johnson"
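
As far as I can tell, this is how XPath itself defines absolute paths rather than something peculiar to Dom4j. Here is a minimal sketch that reproduces the same three lookups using only the JDK's built-in javax.xml.xpath API instead of Dom4j, so it runs without extra jars. The class name XPathContextDemo and the helper fromCosigner are made up for this illustration, and the XML is the sample document above with the elided "..." parts left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK XPath API, not Dom4j.
class XPathContextDemo {

    // The sample document from the post, minus the elided "..." parts.
    static final String XML =
        "<Account>"
      + "<Owner><ContactInfo><Name>Tom Jones</Name></ContactInfo></Owner>"
      + "<Cosigner><ContactInfo><Name>Jim Johnson</Name></ContactInfo></Cosigner>"
      + "</Account>";

    // Evaluates expr with the Cosigner element as the context node and
    // returns the text of the first matching node in document order.
    static String fromCosigner(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node cosigner =
            (Node) xpath.evaluate("/Account/Cosigner", doc, XPathConstants.NODE);
        return xpath.evaluate(expr, cosigner);
    }

    public static void main(String[] args) throws Exception {
        // Absolute path: searched from the document root, so the Owner wins.
        System.out.println(fromCosigner("//ContactInfo/Name"));  // Tom Jones
        // Relative path: searched from the Cosigner element.
        System.out.println(fromCosigner("ContactInfo/Name"));    // Jim Johnson
        // ".//" searches all descendants of the context node only.
        System.out.println(fromCosigner(".//ContactInfo/Name")); // Jim Johnson
    }
}
```

The third call is the one to reach for when you really do want a "search everything below this node" query: prefixing the expression with '.' anchors it to the context node instead of the document root.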

So in conclusion, be careful whenever you use selectSingleNode or selectNodes on a node: if the XPath query starts with '/' or '//', it is an absolute query, so the result comes from a search that starts at the root of the entire document and is not limited to descendants of the node on which you invoke it. If you do want to search everything beneath that node, start the query with './/' instead.