The DOMXPath class

(PHP 5, PHP 7)

Introduction

Supports XPath 1.0.

Class synopsis

DOMXPath {
/* Properties */
/* Methods */
public __construct ( DOMDocument $doc )
public mixed evaluate ( string $expression [, DOMNode $contextnode [, bool $registerNodeNS = true ]] )
public DOMNodeList query ( string $expression [, DOMNode $contextnode [, bool $registerNodeNS = true ]] )
public bool registerNamespace ( string $prefix , string $namespaceURI )
public void registerPhpFunctions ([ mixed $restrict ] )
}
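
As a rough illustration of the two query methods in the synopsis, the sketch below runs query() and evaluate() against a small sample document. The XML content and the variable names are invented for this example and are not part of the manual page.

```php
<?php
// Hypothetical sample document, used only for illustration.
$doc = new DOMDocument();
$doc->loadXML(
    '<books>'
    . '<book id="a"><title>One</title></book>'
    . '<book id="b"><title>Two</title></book>'
    . '</books>'
);

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList of the matching nodes.
$titles = $xpath->query('//book/title');
foreach ($titles as $title) {
    echo $title->nodeValue, "\n";
}

// evaluate() can also return a typed scalar result, here a number from count().
$bookCount = $xpath->evaluate('count(//book)');

// Both methods accept a context node for relative expressions.
$firstBook = $xpath->query('//book')->item(0);
$firstId = $xpath->evaluate('string(@id)', $firstBook);
```

query() is the natural choice when you only need nodes, while evaluate() is useful for expressions such as count() or string() that do not produce a node list.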

Properties

document

Table of Contents


User Contributed Notes (6 notes)

Mark Omohundro, ajamyajax dot com
8 years ago
<?php
// To retrieve selected HTML data, try these DOMXPath examples.

// Path to the HTML file under the web server's document root.
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);

// example 1: for everything with an id
//$elements = $xpath->query("//*[@id]");

// example 2: for node data in a selected id
//$elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");

// example 3: same as above with wildcards for the html and body steps
$elements = $xpath->query("/*/*/div[@id='yourTagIdHere']");

// query() returns false when the expression is invalid
if ($elements !== false) {
  foreach ($elements as $element) {
    echo "<br/>[" . $element->nodeName . "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue . "\n";
    }
  }
}
?>
archimedix32783262 at mailinator dot com
3 years ago
Note that evaluate() will use the same encoding as the XML document.

So if you have a UTF-16 XML, you will have to query using UTF-16 strings.

You can use iconv() to convert from your code's encoding to the target encoding for better legibility.
peter at softcoded dot com
6 months ago
You may not always know at runtime whether your file has
a namespace or not. This can make it difficult to create
XPath queries. Use the seriously underdocumented
"namespaceURI" property of the documentElement of a
DOMDocument to determine if there is a namespace.
Use code such as the following:

$doc = new DOMDocument();
$doc->load($file);
$xpath = new DOMXPath($doc);
$ns = $doc->documentElement->namespaceURI;
if($ns) {
  $xpath->registerNamespace("ns", $ns);
  $nodes = $xpath->query("//ns:em[@class='glossterm']");
} else {
  $nodes = $xpath->query("//em[@class='glossterm']");
}
//look at nodes here
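
The same check can be wrapped in a small helper and tried against both kinds of document. This is only a sketch: the function name `findGlossTerms`, the sample XML strings, and the `http://example.com/ns` URI are assumptions made for the example.

```php
<?php
// Hypothetical helper: returns the text of every matching <em>, whether or
// not the document declares a default namespace.
function findGlossTerms(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    // namespaceURI is null when the root element is not in a namespace.
    $ns = $doc->documentElement->namespaceURI;
    if ($ns) {
        $xpath->registerNamespace('ns', $ns);
        $nodes = $xpath->query("//ns:em[@class='glossterm']");
    } else {
        $nodes = $xpath->query("//em[@class='glossterm']");
    }

    $terms = [];
    foreach ($nodes as $node) {
        $terms[] = $node->nodeValue;
    }
    return $terms;
}

// Made-up sample inputs: one without and one with a default namespace.
$plain = '<doc><em class="glossterm">XPath</em></doc>';
$namespaced = '<doc xmlns="http://example.com/ns"><em class="glossterm">XPath</em></doc>';

$plainTerms = findGlossTerms($plain);
$namespacedTerms = findGlossTerms($namespaced);
```

Both calls should find the same term, because the prefixed query is only used when the root element actually carries a namespace.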
peter at softcoded dot com
6 months ago
Using XPath expressions can save a lot of programming
and allow you to home in on only the nodes you want.
Suppose you want to delete all empty <p> tags.
If you create a query using the following XPath expression,
you can find <p> tags that do not have any text
(other than spaces), any attributes,
any children or comments:

$expression = "//p[not(@*) 
   and not(*)
   and not(./comment())
   and normalize-space(text())='']";
  
This expression will only find para tags that look like:

<p>[any number of spaces]</p>
<p></p>

Imagine the code you would have to add if you used
DOMDocument::getElementsByTagName("p") instead.
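
To make the deletion step concrete, here is a minimal sketch that applies the expression above and removes the matches. The sample markup is invented for the example and is loaded as XML to keep parsing predictable.

```php
<?php
// Hypothetical sample markup: two empty paragraphs and four that must be kept.
$doc = new DOMDocument();
$doc->loadXML(
    '<div>'
    . '<p>   </p>'            // whitespace only: matches
    . '<p></p>'               // empty: matches
    . '<p class="x"></p>'     // has an attribute: kept
    . '<p>text</p>'           // has text: kept
    . '<p><b/></p>'           // has a child element: kept
    . '<p><!-- note --></p>'  // has a comment: kept
    . '</div>'
);

$xpath = new DOMXPath($doc);
$expression = "//p[not(@*)
   and not(*)
   and not(./comment())
   and normalize-space(text())='']";

// Copy the matches into an array before modifying the tree.
$emptyParagraphs = iterator_to_array($xpath->query($expression));
foreach ($emptyParagraphs as $p) {
    $p->parentNode->removeChild($p);
}

$removed = count($emptyParagraphs);
$remaining = $doc->getElementsByTagName('p')->length;
```

Copying the result into an array first is a cautious choice, so the loop never depends on how the node list behaves while nodes are being removed.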
dhz
6 years ago
I just spent far too much time chasing this one....

When running an XPath query on a table, be careful with the table's internal nodes (i.e. <tr></tr> and <td></td>). If the enclosing <table> tag is missing, query() (and likely evaluate() too) will return unexpected results.

I had a DOMNode with a structure like this:

<td>
    <table></table>
    <table>
        <tr>
            <td></td>
        </tr>
        <tr>
            <td></td>
            <td></td>
        </tr>
    </table>
</td>

I was trying to run a relative query against it (i.e. <?php $xpath_obj->query('my/x/path', $relative_node); ?>).

But because of the lone outer <td></td> tags, the inner tags were being invalidated while the nodes themselves were still recognized, meaning that the following query would work:

<?php $xpath_obj->query('*[2]/*[*[2]]', $relative_node); ?>

But when replacing any of the "*" tokens with the corresponding (and valid) "table", "tr", or "td" tokens the query would inexplicably break.
david at lionhead dot nl
8 years ago
When using DOMXPath on a document with a default namespace, consider using an intermediate function that adds a registered prefix to every step of your queries:

<?php
// The default namespace: xmlns="http://..."
// Register it under a prefix first, e.g. $xpath->registerNamespace("x", "http://...");
$path = "/Book/Title";
$path = preg_replace('#/([a-zA-Z])#', '/x:$1', $path);

// Result: /x:Book/x:Title
?>
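
For context, a fuller sketch of this idea might look like the following. The sample document, the `http://example.com/book` URI, and the variable names are assumptions made for the example.

```php
<?php
// Hypothetical document with a default namespace; the URI is made up.
$doc = new DOMDocument();
$doc->loadXML(
    '<Book xmlns="http://example.com/book"><Title>XPath Basics</Title></Book>'
);

$xpath = new DOMXPath($doc);
// The prefix "x" must be registered before prefixed queries can match.
$xpath->registerNamespace('x', 'http://example.com/book');

// Prefix every name that directly follows a slash.
$path = preg_replace('#/([a-zA-Z])#', '/x:$1', '/Book/Title');

$titles = $xpath->query($path);
$firstTitle = $titles->item(0)->nodeValue;
```

Note that such a simple pattern prefixes anything that starts with a letter after a slash, so a step like /text() would be rewritten too. It is probably best kept to plain element paths.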