To highlight words in multi-byte text:
<?php
$s = 'Алабала';
$f = 'а';
echo preg_replace('/('.$f.')/iu', '<b>$1</b>', $s);
?>
(PHP 4 >= 4.2.0, PHP 5)
mb_eregi_replace — Replace regular expression with multibyte support ignoring case
$pattern
, string $replace
, string $string
[, string $option
= "msri"
] )
Scans string
for matches to
pattern
, then replaces the matched text
with replacement
.
pattern
The regular expression pattern. Multibyte characters may be used. The case will be ignored.
replace
The replacement text.
string
The searched string.
option
option
has the same meaning as in
mb_ereg_replace().
The resultant string or FALSE
on error.
Note:
mb_regex_encoding() 指定的內部編碼或字元編碼將會當做此函式用的字元編碼。
處理非信任的輸入時從不使用e 修飾字元,就不會轉法(即呼叫preg_replace())。不注意這些很可能會導致應用程式引發遠端程式執行的漏洞。
To highlight words in multi-byte text:
<?php
$s = 'Алабала';
$f = 'а';
echo preg_replace('/('.$f.')/iu', '<b>$1</b>', $s);
?>
Transliterator for cyrillic-to-latin letters for UTF chars:
<?php
function do_translit($st) {
$replacement = array(
"й"=>"i","ц"=>"c","у"=>"u","к"=>"k","е"=>"e","н"=>"n",
"г"=>"g","ш"=>"sh","щ"=>"sh","з"=>"z","х"=>"x","ъ"=>"\'",
"ф"=>"f","ы"=>"i","в"=>"v","а"=>"a","п"=>"p","р"=>"r",
"о"=>"o","л"=>"l","д"=>"d","ж"=>"zh","э"=>"ie","ё"=>"e",
"я"=>"ya","ч"=>"ch","с"=>"c","м"=>"m","и"=>"i","т"=>"t",
"ь"=>"\'","б"=>"b","ю"=>"yu",
"Й"=>"I","Ц"=>"C","У"=>"U","К"=>"K","Е"=>"E","Н"=>"N",
"Г"=>"G","Ш"=>"SH","Щ"=>"SH","З"=>"Z","Х"=>"X","Ъ"=>"\'",
"Ф"=>"F","Ы"=>"I","В"=>"V","А"=>"A","П"=>"P","Р"=>"R",
"О"=>"O","Л"=>"L","Д"=>"D","Ж"=>"ZH","Э"=>"IE","Ё"=>"E",
"Я"=>"YA","Ч"=>"CH","С"=>"C","М"=>"M","И"=>"I","Т"=>"T",
"Ь"=>"\'","Б"=>"B","Ю"=>"YU",
);
foreach($replacement as $i=>$u) {
$st = mb_eregi_replace($i,$u,$st);
}
return $st;
}
?>
when trying to find a way to strip newline from a multibyte UTF-8 string i got to this function just to discover later that POSIX don't "do" newline so i can't strip them, examples of what i tried are : \r\n , \\r\\n , (\\r\\n) (\\r|\\n)
and got no result
so since i wanted something like mb_nl2br() that's simple i wrote this little recursive function for UTF-8:
<?php
function mb_str_replace($find,$replace,&$str)
{
$i = mb_strpos($str,$find, 0,"UTF-8");
if ($index===false) {return;}
$str = mb_substr($str, 0,$i).$replace.mb_substr($str, $i+mb_strlen($find,"UTF-8"),mb_strlen($str,"UTF-8"));
$this->mb_str_replace($find,$replace,$str);
}
?>
note: moderate unit tesing was done, changed to other encodings