Last updated: Fri, 17 May 2013

get_html_translation_table

(PHP 4, PHP 5)

get_html_translation_table: Returns the translation table used by htmlspecialchars() and htmlentities()

Description

array get_html_translation_table ([ int $table = HTML_SPECIALCHARS [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' ]]] )

get_html_translation_table() returns the translation table that is used internally by htmlspecialchars() and htmlentities().

Note:

Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the form used by htmlspecialchars() and htmlentities().

Parameters

table

Which table to return. Either HTML_ENTITIES or HTML_SPECIALCHARS.

flags

A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_COMPAT | ENT_HTML401.

Available flags constants
Constant name    Description
ENT_COMPAT       Table will contain entities for double quotes, but not for single quotes.
ENT_QUOTES       Table will contain entities for both double and single quotes.
ENT_NOQUOTES     Table will contain entities for neither single nor double quotes.
ENT_HTML401      Table for HTML 4.01.
ENT_XML1         Table for XML 1.
ENT_XHTML        Table for XHTML.
ENT_HTML5        Table for HTML 5.
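
A minimal sketch of how the three quote flags change what the table contains. The helper name has_quotes() is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: report which quote characters a table covers.
function has_quotes($flags)
{
    $table = get_html_translation_table(HTML_SPECIALCHARS, $flags);
    return array(
        'double' => isset($table['"']),
        'single' => isset($table["'"]),
    );
}

var_dump(has_quotes(ENT_COMPAT));   // double: true,  single: false
var_dump(has_quotes(ENT_QUOTES));   // double: true,  single: true
var_dump(has_quotes(ENT_NOQUOTES)); // double: false, single: false
```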

encoding

Defines the character encoding to use. The default is ISO-8859-1 in PHP versions before 5.4.0 and UTF-8 in PHP 5.4.0 and later.

The following character sets are supported as of PHP 4.3.0:

Supported charsets
Charset        Aliases                        Description
ISO-8859-1     ISO8859-1                      Western European, Latin-1.
ISO-8859-15    ISO8859-15                     Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8                                         ASCII-compatible multi-byte 8-bit Unicode.
cp866          ibm866, 866                    DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1251         Windows-1251, win-1251, 1251   Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1252         Windows-1252, 1252             Windows-specific charset for Western European languages.
KOI8-R         koi8-ru, koi8r                 Russian. Supported as of PHP 4.3.2.
BIG5           950                            Traditional Chinese, mainly used in Taiwan.
GB2312         936                            Simplified Chinese, national standard character set.
BIG5-HKSCS                                    Big5 with Hong Kong extensions; Traditional Chinese.
Shift_JIS      SJIS, 932                      Japanese.
EUC-JP         EUCJP                          Japanese.

Note: Any other character sets are not implemented; ISO-8859-1 is used in their place.
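
As a rough illustration of what the encoding parameter changes: the entities stay the same, but the keys of the table are expressed in the bytes of the requested charset. This sketch assumes PHP 5.4.0 or later (for ENT_HTML401); the variable names are illustrative only.

```php
<?php
// Same entity, different key bytes depending on the requested encoding.
$latin1 = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
$utf8   = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'UTF-8');

$eacuteLatin1 = "\xE9";      // e-acute as a single ISO-8859-1 byte
$eacuteUtf8   = "\xC3\xA9";  // e-acute as two UTF-8 bytes

var_dump($latin1[$eacuteLatin1]);       // string(8) "&eacute;"
var_dump($utf8[$eacuteUtf8]);           // string(8) "&eacute;"
var_dump(isset($utf8[$eacuteLatin1]));  // bool(false): the lone byte is not a key of the UTF-8 table
```

A table should therefore only be applied to strings in the same encoding it was requested for.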

Return Values

Returns the translation table as an array, with the original characters as keys and the entities as values.
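
A short sketch of how the returned array can be used: applying it with strtr() should give the same result as htmlspecialchars() called with the same flags and encoding, and the flipped array maps entities back to characters. The sample string is made up, and ENT_HTML401 assumes PHP 5.4.0 or later.

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;
$table = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');
$input = '<a href="/q?a=1&b=2">Fish & Chips</a>';

// Keys are the raw characters, values are the entities.
$viaTable    = strtr($input, $table);
$viaFunction = htmlspecialchars($input, $flags, 'UTF-8');
var_dump($viaTable === $viaFunction); // bool(true)

// Flipping the table (entity => character) reverses the conversion.
$decoded = strtr($viaTable, array_flip($table));
var_dump($decoded === $input);        // bool(true)
```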

Changelog

Version    Description
5.4.0      The default value for the encoding parameter was changed to UTF-8.
5.4.0      The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added.
5.3.4      The encoding parameter was added.

Examples

Example #1 Translation table example

<?php
var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5));
?>

The above example will output something similar to:

array(1510) {
  ["    "]=>
  string(5) "&Tab;"
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}

User Contributed Notes: get_html_translation_table (25 notes)
up
4
michael dot genesis at gmail dot com
1 year ago
The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default.

You may run into trouble if you find yourself tempted to do something like this:
<?php
    $trans[chr(149)] = '&bull;';    // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
?>

Don't do it. DON'T DO IT!

You can use:
<?php
    $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
?>

or just convert directly:
<?php
    $output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
?>

But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
<?php
    $output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
    // Name the encoding explicitly: the default was ISO-8859-1 before PHP 5.4.0
    $output = htmlentities($output, ENT_COMPAT, 'UTF-8');
?>
up
2
kevin at cwsmailbox dot xom
2 years ago
Be careful using get_html_translation_table() in a loop, as it's very slow.
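
One way to avoid that cost, sketched under the assumption that the same flags and encoding are needed for every iteration, is to fetch the table once before the loop and reuse it. The $rows array is made-up sample data.

```php
<?php
// Hypothetical input; any list of UTF-8 strings would do.
$rows = array('Tom & Jerry', '5 < 6', "caf\xC3\xA9 \"au lait\"");

// Build the table once, outside the loop.
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, 'UTF-8');

$escaped = array();
foreach ($rows as $row) {
    // Reuse the cached table instead of calling the function per row.
    $escaped[] = strtr($row, $table);
}

print_r($escaped);
```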
up
2
iain (duh) workingsoftware.com.au
5 years ago
I wrote a quick little function for converting something like '&middot;' into '&#183;':

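$to_convert = '&middot;';
// Ask for the single-byte ISO-8859-1 table (third argument, PHP >= 5.3.4) so that ord() returns the code point
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
$equiv = '&#'.ord(array_search($to_convert,$table)).';';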
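
Note that ord() only reads the first byte of a string, so the snippet above relies on a single-byte table. If the table is requested in UTF-8 (the default since PHP 5.4.0), the keys can be multi-byte. A possible sketch for that case follows; utf8_codepoint() and named_to_numeric() are hypothetical helpers written for this example, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: decode the first UTF-8 character of $char to its code point.
function utf8_codepoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $first = $bytes[0];
    if ($first < 0x80) {
        return $first;              // single-byte (ASCII)
    } elseif ($first < 0xE0) {
        $cp = $first & 0x1F;        // two-byte sequence
        $extra = 1;
    } elseif ($first < 0xF0) {
        $cp = $first & 0x0F;        // three-byte sequence
        $extra = 2;
    } else {
        $cp = $first & 0x07;        // four-byte sequence
        $extra = 3;
    }
    for ($i = 1; $i <= $extra; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical helper: turn a named entity into a numeric one, or return it unchanged.
function named_to_numeric($entity)
{
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $char  = array_search($entity, $table, true);
    if ($char === false) {
        return $entity;             // not in the table
    }
    return '&#' . utf8_codepoint($char) . ';';
}

echo named_to_numeric('&middot;'), "\n"; // &#183;
echo named_to_numeric('&euro;'), "\n";   // &#8364;
```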
up
1
Maurizio Siliani at trident dot it
5 years ago
If you have trouble (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful.
It adds to the standard get_html_translation_table() table the codes of the characters that M$ Word usually substitutes into typed text.
Otherwise those characters would never be displayed correctly in HTML output.

function get_html_translation_table_CP1252() {
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans[chr(130)] = '&sbquo;';    // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';    // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';    // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;';    // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;';    // Dagger
    $trans[chr(135)] = '&Dagger;';    // Double Dagger
    $trans[chr(136)] = '&circ;';    // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;';    // Per Mille Sign
    $trans[chr(138)] = '&Scaron;';    // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;';    // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';    // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';    // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';    // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';    // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';    // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';    // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
    $trans[chr(154)] = '&scaron;';    // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;';    // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';    // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';    // Latin Capital Letter Y With Diaeresis
    ksort($trans);
    return $trans;
}
up
1
Jérôme Jaglale
6 years ago
htmlentities includes htmlspecialchars, so here's how to convert a UTF-8 string:
htmlentities($string, ENT_QUOTES, 'UTF-8');
up
1
trukin at gmail dot com
6 years ago
There have been issues when Hispanic websites or other websites don't use the correct collation in MySQL.

As a result, accented characters (éä ... ) can turn into weird characters when a backup is made and restored later on, or when the data is moved to another database.

To fix this, try something like this:
function accents($text){
    // strtr() works in a single pass, so the "&" of a freshly inserted entity is not encoded again
    return strtr($text, get_html_translation_table(HTML_ENTITIES));
}

and use it as accents("Hello ....... WITH ACCENTS"); it will return the escaped string.
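
A caveat when applying a translation table: replacing one entry at a time with str_replace() rescans text that earlier passes already produced, so the ampersand of a fresh entity can get encoded again, whereas strtr() with the whole array works in a single pass. The sketch below uses a hand-written two-entry table with a fixed order purely for illustration; the order of the real table may differ.

```php
<?php
// Hand-written table for illustration only (not the real table).
$table = array('"' => '&quot;', '&' => '&amp;');
$text  = 'say "hi" & wave';

// Entry-by-entry replacement: the "&" of each new &quot; is encoded a second time.
$looped = $text;
foreach ($table as $char => $entity) {
    $looped = str_replace($char, $entity, $looped);
}

// Single pass over the input: replacement text is never rescanned.
$single = strtr($text, $table);

echo $looped, "\n"; // say &amp;quot;hi&amp;quot; &amp; wave
echo $single, "\n"; // say &quot;hi&quot; &amp; wave
```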
up
1
kevin_bro at hostedstuff dot com
10 years ago
Alan's version didn't seem to work right. If you're having the same problem, consider using this slightly modified version instead:

function unhtmlentities ($string)  {
   $trans_tbl = get_html_translation_table (HTML_ENTITIES);
   $trans_tbl = array_flip ($trans_tbl);
   $ret = strtr ($string, $trans_tbl);
   // The /e modifier no longer exists in PHP 7, so use a callback (closures need PHP >= 5.3)
   return preg_replace_callback('/&#(\d+);/m',
      function ($m) { return chr((int) $m[1]); }, $ret);
}
up
2
dirk at hartmann dot net
11 years ago
get_html_translation_table() works only with the first 256 code positions.
For higher positions, for example &#1092; (a Cyrillic letter), the output is the same as the input.
up
0
chris
6 years ago
A lot of quite common characters (or at least not rare, like oelig, euro or minus) are missing from the table unfortunately.
Here are some, if you want to make your translation table more complete and your xml data less error-prone. Not sure why some characters have 2 codes, just use one. Here goes: '&apos;'=>'&#39;', '&minus;'=>'&#45;', '&circ;'=>'&#94;', '&tilde;'=>'&#126;', '&Scaron;'=>'&#138;', '&lsaquo;'=>'&#139;', '&OElig;'=>'&#140;', '&lsquo;'=>'&#145;', '&rsquo;'=>'&#146;', '&ldquo;'=>'&#147;', '&rdquo;'=>'&#148;', '&bull;'=>'&#149;', '&ndash;'=>'&#150;', '&mdash;'=>'&#151;', '&tilde;'=>'&#152;', '&trade;'=>'&#153;', '&scaron;'=>'&#154;', '&rsaquo;'=>'&#155;', '&oelig;'=>'&#156;', '&Yuml;'=>'&#159;', '&yuml;'=>'&#255;', '&OElig;'=>'&#338;', '&oelig;'=>'&#339;', '&Scaron;'=>'&#352;', '&scaron;'=>'&#353;', '&Yuml;'=>'&#376;', '&fnof;'=>'&#402;', '&circ;'=>'&#710;', '&tilde;'=>'&#732;', '&Alpha;'=>'&#913;', '&Beta;'=>'&#914;', '&Gamma;'=>'&#915;', '&Delta;'=>'&#916;', '&Epsilon;'=>'&#917;', '&Zeta;'=>'&#918;', '&Eta;'=>'&#919;', '&Theta;'=>'&#920;', '&Iota;'=>'&#921;', '&Kappa;'=>'&#922;', '&Lambda;'=>'&#923;', '&Mu;'=>'&#924;', '&Nu;'=>'&#925;', '&Xi;'=>'&#926;', '&Omicron;'=>'&#927;', '&Pi;'=>'&#928;', '&Rho;'=>'&#929;', '&Sigma;'=>'&#931;', '&Tau;'=>'&#932;', '&Upsilon;'=>'&#933;', '&Phi;'=>'&#934;', '&Chi;'=>'&#935;', '&Psi;'=>'&#936;', '&Omega;'=>'&#937;', '&alpha;'=>'&#945;', '&beta;'=>'&#946;', '&gamma;'=>'&#947;', '&delta;'=>'&#948;', '&epsilon;'=>'&#949;', '&zeta;'=>'&#950;', '&eta;'=>'&#951;', '&theta;'=>'&#952;', '&iota;'=>'&#953;', '&kappa;'=>'&#954;', '&lambda;'=>'&#955;', '&mu;'=>'&#956;', '&nu;'=>'&#957;', '&xi;'=>'&#958;', '&omicron;'=>'&#959;', '&pi;'=>'&#960;', '&rho;'=>'&#961;', '&sigmaf;'=>'&#962;', '&sigma;'=>'&#963;', '&tau;'=>'&#964;', '&upsilon;'=>'&#965;', '&phi;'=>'&#966;', '&chi;'=>'&#967;', '&psi;'=>'&#968;', '&omega;'=>'&#969;', '&thetasym;'=>'&#977;', '&upsih;'=>'&#978;', '&piv;'=>'&#982;', '&ensp;'=>'&#8194;', '&emsp;'=>'&#8195;', '&thinsp;'=>'&#8201;', '&zwnj;'=>'&#8204;', '&zwj;'=>'&#8205;', '&lrm;'=>'&#8206;', '&rlm;'=>'&#8207;', 
'&ndash;'=>'&#8211;', '&mdash;'=>'&#8212;', '&lsquo;'=>'&#8216;', '&rsquo;'=>'&#8217;', '&sbquo;'=>'&#8218;', '&ldquo;'=>'&#8220;', '&rdquo;'=>'&#8221;', '&bdquo;'=>'&#8222;', '&dagger;'=>'&#8224;', '&Dagger;'=>'&#8225;', '&bull;'=>'&#8226;', '&hellip;'=>'&#8230;', '&permil;'=>'&#8240;', '&prime;'=>'&#8242;', '&Prime;'=>'&#8243;', '&lsaquo;'=>'&#8249;', '&rsaquo;'=>'&#8250;', '&oline;'=>'&#8254;', '&frasl;'=>'&#8260;', '&euro;'=>'&#8364;'
up
0
Patrick nospam at nospam mesopia dot com
7 years ago
Not sure what's going on here but I've run into a problem that others might face as well...

<?php

$translations = array_flip(get_html_translation_table(HTML_ENTITIES,ENT_QUOTES));

?>

returns the single quote ' as being equal to &#39; while

<?php

$translatedString = htmlentities($string,ENT_QUOTES);

?>
returns it as being equal to &#039;

I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.

-Pat
up
0
pinkpanther at swissonline dot ch
9 years ago
In case you want a 'htmlentities' function which prevents 'double' encoding of the ampersands of already present entities (&gt; => &amp;gt;), use this:

<?php
function htmlentities2($myHTML) {
   $translation_table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
   $translation_table[chr(38)] = '&';
   return preg_replace("/&(?![A-Za-z]{0,4}\w{2,3};|#[0-9]{2,3};)/", "&amp;", strtr($myHTML, $translation_table));
}
?>
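
On PHP 5.2.3 and later, a similar effect may be achievable without a custom function: htmlentities() accepts a fourth argument, double_encode, and passing false leaves existing entities alone. A small sketch with a made-up input string:

```php
<?php
$input = 'Fish &amp; Chips &gt; "Soup" & <Salad>';

// double_encode = false: entities already present are not encoded again.
$once = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

// Default behaviour (double_encode = true) for comparison.
$twice = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $once, "\n";  // Fish &amp; Chips &gt; &quot;Soup&quot; &amp; &lt;Salad&gt;
echo $twice, "\n"; // Fish &amp;amp; Chips &amp;gt; &quot;Soup&quot; &amp; &lt;Salad&gt;
```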
up
-1
subweb007 at hotmail dot com
4 years ago
This function converts the table returned by get_html_translation_table() from ISO-8859-1 to UTF-8.

<?php
function translation_table_to_utf8($arTranslationtable)
{
    $arUTFchars = array();
    //loop through the array and convert everything both keys and values
    foreach($arTranslationtable as $charkey => $char)
    {
        $charkey = utf8_encode($charkey);
        $arUTFchars[$charkey] = utf8_encode($char);
    }
    return $arUTFchars;
}

//get the translation table
$arSpecialchar     = get_html_translation_table(HTML_ENTITIES);

//call the function to convert to utf-8
$arUTFchars = translation_table_to_utf8($arSpecialchar);
print_r($arUTFchars);
?>
up
-1
Alex Minkoff
8 years ago
If you want to display special HTML entities in a web browser, you can use the following code:

<?php
$entities = get_html_translation_table(HTML_ENTITIES);
$new_entities = array();
foreach ($entities as $entity) {
    $new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>

If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)
up
-1
alan at akbkhome dot com
10 years ago
If you want to decode all those &#123; symbols as well....

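function unhtmlentities ($string)  {
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    $trans_tbl = array_flip ($trans_tbl);
    $ret = strtr ($string, $trans_tbl);
    // The /e modifier no longer exists in PHP 7, so use a callback (closures need PHP >= 5.3)
    return  preg_replace_callback('/\&\#([0-9]+)\;/m',
        function ($m) { return chr((int) $m[1]); }, $ret);
}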
up
0
adolfoabegg at gmail dot com
4 years ago
"rafael at phpit dot com dot br" your solution only works for the ISO-8859-1 encoding, I mean, it works but only for that encoding and that's because get_html_translation_table won't let you specify the charset... it uses the default one, that is ISO-8859-1

The solution from "olito24 at gmx dot de" does work for UTF-8, I just modified it a bit specifying the UTF-8 charset, also the $str parameter wasn't being used at all, I just renamed it to $string

Note:
Change ENT_NOQUOTES to ENT_QUOTES to convert both double and single quotes

These are the functions to encode HTML, except for the tags, using UTF-8 and ISO-8859-1:

<?php

class Html
{

    /* by olito24 at gmx dot de */
    function htmlButTags($string) {
        $pattern = '<([a-zA-Z0-9\. "\'_\/-=;\(\)?&#%]+)>';
        preg_match_all('/' . $pattern . '/', $string, $tagMatches, PREG_SET_ORDER);
        $textMatches = preg_split('/' . $pattern . '/', $string);

        foreach ($textMatches as $key => $value) {
            $textMatches[$key] = htmlentities($value, ENT_NOQUOTES, 'UTF-8');
        }

        for ($i = 0; $i < count($textMatches); $i++) {
            // There is one more text chunk than there are tags, so check before appending
            if (isset($tagMatches[$i])) {
                $textMatches[$i] = $textMatches[$i] . $tagMatches[$i][0];
            }
        }

        return implode($textMatches);
    }

    /* by "rafael at phpit dot com dot br" */
    function htmlButTags_iso($str) {
        // Take all the html entities
        $caracteres = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
        // Find out the "tags" entities
        $remover = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
        // Spit out the tags entities from the original table
        $caracteres = array_diff($caracteres, $remover);
        // Translate the string....
        $str = strtr($str, $caracteres);
        // And that's it!
        return $str;
    }

}

?>
up
0
yes at king22 dot com
6 years ago
Searching for a fast replacement of the MS WORD special characters which are not covered by get_html_translation_table() , I think the following function might help someone

<?php
function clean_up($str){
    $str = stripslashes($str);
    $str = strtr($str, get_html_translation_table(HTML_ENTITIES));
    $str = str_replace( array("\x82", "\x84", "\x85", "\x91", "\x92", "\x93", "\x94", "\x95", "\x96", "\x97"), array("&#8218;", "&#8222;", "&#8230;", "&#8216;", "&#8217;", "&#8220;", "&#8221;", "&#8226;", "&#8211;", "&#8212;"),$str);
    return $str;
}
?>

It replaces all types of quotes (single and double), horizontal ellipsis (...), bullet, en dash and em dash.
up
0
chris
6 years ago
and a few more :
'&image;'=>'&#8465;', '&weierp;'=>'&#8472;', '&real;'=>'&#8476;', '&trade;'=>'&#8482;', '&alefsym;'=>'&#8501;', '&larr;'=>'&#8592;', '&uarr;'=>'&#8593;', '&rarr;'=>'&#8594;', '&darr;'=>'&#8595;', '&harr;'=>'&#8596;', '&crarr;'=>'&#8629;', '&lArr;'=>'&#8656;', '&uArr;'=>'&#8657;', '&rArr;'=>'&#8658;', '&dArr;'=>'&#8659;', '&hArr;'=>'&#8660;', '&forall;'=>'&#8704;', '&part;'=>'&#8706;', '&exist;'=>'&#8707;', '&empty;'=>'&#8709;', '&nabla;'=>'&#8711;', '&isin;'=>'&#8712;', '&notin;'=>'&#8713;', '&ni;'=>'&#8715;', '&prod;'=>'&#8719;', '&sum;'=>'&#8721;', '&minus;'=>'&#8722;', '&lowast;'=>'&#8727;', '&radic;'=>'&#8730;', '&prop;'=>'&#8733;', '&infin;'=>'&#8734;', '&ang;'=>'&#8736;', '&and;'=>'&#8743;', '&or;'=>'&#8744;', '&cap;'=>'&#8745;', '&cup;'=>'&#8746;', '&int;'=>'&#8747;', '&there4;'=>'&#8756;', '&sim;'=>'&#8764;', '&cong;'=>'&#8773;', '&asymp;'=>'&#8776;', '&ne;'=>'&#8800;', '&equiv;'=>'&#8801;', '&le;'=>'&#8804;', '&ge;'=>'&#8805;', '&sub;'=>'&#8834;', '&sup;'=>'&#8835;', '&nsub;'=>'&#8836;', '&sube;'=>'&#8838;', '&supe;'=>'&#8839;', '&oplus;'=>'&#8853;', '&otimes;'=>'&#8855;', '&perp;'=>'&#8869;', '&sdot;'=>'&#8901;', '&lceil;'=>'&#8968;', '&rceil;'=>'&#8969;', '&lfloor;'=>'&#8970;', '&rfloor;'=>'&#8971;', '&lang;'=>'&#9001;', '&rang;'=>'&#9002;', '&loz;'=>'&#9674;', '&spades;'=>'&#9824;', '&clubs;'=>'&#9827;', '&hearts;'=>'&#9829;', '&diams;'=>'&#9830;'
up
0
zohar at zohararad dot com
6 years ago
Another way of converting HTML entities into numeric entities to please XML parsers is using two arrays as conversion tables in a preg_replace function. The conversion table mechanism is based on Ryan's examples above.

<?php
function xmlEntities($s){
    //build first an assoc. array with the entities we want to match
    $table1 = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);

    //now build another assoc. array with the entities we want to replace (numeric entities)
    foreach ($table1 as $k=>$v){
        $table1[$k] = "/$v/";
        $c = htmlentities($k,ENT_QUOTES,"UTF-8");
        $table2[$c] = "&#".ord($k).";";
    }

    //now perform a replacement using preg_replace
    //each matched value in array 1 will be replaced with the corresponding value in array 2
    $s = preg_replace($table1,$table2,$s);
    return $s;
}
?>
up
0
rbotzer at yahoo dot com
7 years ago
The existence of HTML entities such as &quot; inside an XML node causes most XML parsers to throw an error.  The following function cleans an input string by converting HTML entities to valid numeric entities.

<?php

function htmlentities2unicodeentities ($input) {
  // Ask for the single-byte ISO-8859-1 table (PHP >= 5.3.4) so that ord() yields the code point
  $table = get_html_translation_table (HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');
  $htmlEntities = array_values ($table);
  $entitiesDecoded = array_keys ($table);
  $utf8Entities = array();
  $num = count ($entitiesDecoded);
  for ($u = 0; $u < $num; $u++) {
    $utf8Entities[$u] = '&#'.ord($entitiesDecoded[$u]).';';
  }
  return str_replace ($htmlEntities, $utf8Entities, $input);
}
?>

So, an input of
Copyrights &copy; make &quot;me&quot; grin &reg;

outputs
Copyrights &#169; make &#34;me&#34; grin &#174;
up
0
ryan at ryancannon dot com
8 years ago
In XML, you can't assume that the doctype will include the same character entity definitions as HTML. XML authors may require character references instead. The following two functions use get_html_translation_table() to encode data in numeric references. The second, optional argument can be used to substitute a different translation table.

function xmlcharacters($string, $trans='') {
    $trans=(is_array($trans))? $trans:get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
    foreach ($trans as $k=>$v)
        $trans[$k]= "&#".ord($k).";";
    return strtr($string, $trans);
}
function xml_character_decode($string, $trans='') {
    $trans=(is_array($trans))? $trans:get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
    foreach ($trans as $k=>$v)
        $trans[$k]= "&#".ord($k).";";
    $trans=array_flip($trans);
    return strtr($string, $trans);
}
up
-1
robertn972 at gmail dot com
4 years ago
I found this useful in converting latin characters

<?php
function convertLatin1ToHtml($str) {
    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    $noTags = array_diff($allEntities, $specialEntities);
    $str = strtr($str, $noTags);
    return $str;
}
?>
up
-1
Liam Morland
4 years ago
Here is a simple way to convert named character entities to numeric character entities:

<?php
function numeric_entities($string){
    $mapping = array();
    // Ask for the single-byte ISO-8859-1 table (PHP >= 5.3.4) so that ord() yields the code point
    foreach (get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1') as $char => $entity){
        $mapping[$entity] = '&#' . ord($char) . ';';
    }
    return str_replace(array_keys($mapping), $mapping, $string);
}
?>
up
-1
edwardzyang at thewritingpot dot com
6 years ago
Quite disappointingly, get_html_translation_table() only gives the characters for ISO-8859-1, making it quite useless for UTF-8 or anything else like that (as a previous commenter noticed).
up
-2
Kenneth Kin Lum
4 years ago
to display the mapping on a webpage no matter what the server encoding is, this can be used

  echo "<pre>\n";
  echo htmlentities(print_r((get_html_translation_table(HTML_SPECIALCHARS)), true));
  echo htmlentities(print_r((get_html_translation_table(HTML_ENTITIES)), true));

since get_html_translation_table() actually gives the special chars in iso-8859-1 (Latin-1) encoding, so to see the tables correctly using

  print_r(get_html_translation_table(HTML_ENTITIES));

your server needs to give a HTTP header as iso-8859-1, unless you use header() or manually set the browser's encoding setting to iso-8859-1.  And you need to view the source of the page to see the mapping.  (except English version of IE 7 outputs the page source as iso-8859-1 anyway).
up
-2
kumar at chicagomodular.com
10 years ago
without heavy scientific analysis, this seems to work as a quick fix to making text originating from a Microsoft Word document display as HTML:

<?php
function DoHTMLEntities ($string)
    {
        $trans_tbl = get_html_translation_table (HTML_ENTITIES);

        // MS Word strangeness..
        // smart single/ double quotes:
        $trans_tbl[chr(145)] = '\'';
        $trans_tbl[chr(146)] = '\'';
        $trans_tbl[chr(147)] = '&quot;';
        $trans_tbl[chr(148)] = '&quot;';

        // Acute 'e'
        $trans_tbl[chr(142)] = '&eacute;';

        return strtr ($string, $trans_tbl);
    }
?>
