Normalizer::normalize

normalizer_normalize

(PHP 5 >= 5.3.0, PHP 7, PHP 8, PECL intl >= 1.0.0)

Normalizer::normalize -- normalizer_normalize -- Normalizes the input provided and returns the normalized string

Description

Object-oriented style

public static Normalizer::normalize(string $string, int $form = Normalizer::FORM_C): string|false

Procedural style

normalizer_normalize(string $string, int $form = Normalizer::FORM_C): string|false

Normalizes the input provided and returns the normalized string.

Parameters

string

The input string to normalize.

form

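One of the normalization forms, for example Normalizer::FORM_C (the default), Normalizer::FORM_D, Normalizer::FORM_KC, or Normalizer::FORM_KD.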

Return Values

The normalized string or false if an error occurred.
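
Because the function can return false, callers usually need a strict comparison before using the result. The following is a minimal sketch, assuming the intl extension is loaded. The helper name normalize_or_fallback() is hypothetical, and falling back to the original string is only one possible policy, not something this page prescribes.

```php
<?php
// Hypothetical helper (not part of intl): returns the NFC form of $input,
// or $input unchanged when normalization fails.
function normalize_or_fallback(string $input): string
{
    $normalized = normalizer_normalize($input, Normalizer::FORM_C);

    // Compare strictly against false: an empty string is a valid result.
    if ($normalized === false) {
        return $input;
    }
    return $normalized;
}

$decomposed = "A\xCC\x8A"; // 'A' followed by COMBINING RING ABOVE (U+030A)
echo urlencode(normalize_or_fallback($decomposed)), "\n"; // prints %C3%85
```

A loose check such as if (!$normalized) would also treat the empty string and "0" as failures, which is why the sketch uses === false.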

Examples

Example #1 normalizer_normalize() example

<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)

$char_1 = normalizer_normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = normalizer_normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );

echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

Example #2 OO example

<?php
$char_A_ring = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "\xCC\x8A"; // 'COMBINING RING ABOVE' (U+030A)

$char_1 = Normalizer::normalize( $char_A_ring, Normalizer::FORM_C );
$char_2 = Normalizer::normalize( 'A' . $char_combining_ring_above, Normalizer::FORM_C );

echo urlencode($char_1);
echo ' ';
echo urlencode($char_2);
?>

The above example will output:

%C3%85 %C3%85
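
The examples above only use FORM_C. As a hedged illustration of how the other forms differ, the sketch below decomposes the same letter with FORM_D and applies FORM_C and FORM_KC to a ligature. The ligature input and the variable names are assumptions chosen for illustration, and the intl extension is assumed to be loaded.

```php
<?php
$composed = "\xC3\x85";     // U+00C5, a single precomposed code point
$ligature = "\xEF\xAC\x81"; // U+FB01 LATIN SMALL LIGATURE FI

// FORM_D splits the precomposed letter into 'A' followed by U+030A.
$nfd = Normalizer::normalize($composed, Normalizer::FORM_D);

// FORM_C leaves the ligature alone; FORM_KC applies compatibility mappings.
$nfc  = Normalizer::normalize($ligature, Normalizer::FORM_C);
$nfkc = Normalizer::normalize($ligature, Normalizer::FORM_KC);

echo urlencode($nfd), ' ', urlencode($nfc), ' ', urlencode($nfkc), "\n";
// prints: A%CC%8A %EF%AC%81 fi
```

The canonical forms (C and D) preserve the ligature as a distinct character, while the compatibility forms (KC and KD) replace it with the plain letters "fi".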

User Contributed Notes 4 notes

anrdaemon at freemail dot ru
6 years ago
"If you get error messages while starting apache of xampp package with activated extension=intl.dll," do NOT copy any files around.

Use Apache's "LoadFile …" directive to load any missing DLLs that are not found in %PATH%, even php##ts.dll itself.
spam at oscar dot xyz
9 years ago
You can use the 'original' abbreviations if you feel more comfortable:

<?php
Normalizer::NFD;
Normalizer::NFKD;
Normalizer::NFC;
Normalizer::NFKC;
?>
akniep at rayo dot info
14 years ago
Especially when matching texts against each other or against keywords, it is helpful to normalize the texts first.
The following function removes all diacritics (marks like accents) from a given UTF-8-encoded text and returns ASCII text.

Be sure to have the PHP Normalizer extension (intl and ICU) installed.

Tip: You may also want to map the text to lower case before executing matching procedures ...

<?php

function normalizeUtf8String($s)
{
    // keep the input so it can be returned unchanged on failure
    $original_string = $s;

    // Normalizer-class missing!
    if (!class_exists("Normalizer", false))
        return $original_string;

    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s = preg_replace('@\x{00c4}@u', "AE", $s);    // umlaut Ä => AE
    $s = preg_replace('@\x{00d6}@u', "OE", $s);    // umlaut Ö => OE
    $s = preg_replace('@\x{00dc}@u', "UE", $s);    // umlaut Ü => UE
    $s = preg_replace('@\x{00e4}@u', "ae", $s);    // umlaut ä => ae
    $s = preg_replace('@\x{00f6}@u', "oe", $s);    // umlaut ö => oe
    $s = preg_replace('@\x{00fc}@u', "ue", $s);    // umlaut ü => ue
    $s = preg_replace('@\x{00f1}@u', "ny", $s);    // ñ => ny
    $s = preg_replace('@\x{00ff}@u', "yu", $s);    // ÿ => yu

    // maps special characters (characters with diacritics) onto their base character followed by the diacritical mark
    // example:  Ú => U´,  á => a´
    $s = Normalizer::normalize($s, Normalizer::FORM_D);

    $s = preg_replace('@\pM@u', "", $s);           // removes diacritics

    $s = preg_replace('@\x{00df}@u', "ss", $s);    // maps German ß onto ss
    $s = preg_replace('@\x{00c6}@u', "AE", $s);    // Æ => AE
    $s = preg_replace('@\x{00e6}@u', "ae", $s);    // æ => ae
    $s = preg_replace('@\x{0132}@u', "IJ", $s);    // Ĳ => IJ
    $s = preg_replace('@\x{0133}@u', "ij", $s);    // ĳ => ij
    $s = preg_replace('@\x{0152}@u', "OE", $s);    // Œ => OE
    $s = preg_replace('@\x{0153}@u', "oe", $s);    // œ => oe

    $s = preg_replace('@\x{00d0}@u', "D", $s);     // Ð => D
    $s = preg_replace('@\x{0110}@u', "D", $s);     // Đ => D
    $s = preg_replace('@\x{00f0}@u', "d", $s);     // ð => d
    $s = preg_replace('@\x{0111}@u', "d", $s);     // đ => d
    $s = preg_replace('@\x{0126}@u', "H", $s);     // Ħ => H
    $s = preg_replace('@\x{0127}@u', "h", $s);     // ħ => h
    $s = preg_replace('@\x{0131}@u', "i", $s);     // ı => i
    $s = preg_replace('@\x{0138}@u', "k", $s);     // ĸ => k
    $s = preg_replace('@\x{013f}@u', "L", $s);     // Ŀ => L
    $s = preg_replace('@\x{0141}@u', "L", $s);     // Ł => L
    $s = preg_replace('@\x{0140}@u', "l", $s);     // ŀ => l
    $s = preg_replace('@\x{0142}@u', "l", $s);     // ł => l
    $s = preg_replace('@\x{014a}@u', "N", $s);     // Ŋ => N
    $s = preg_replace('@\x{0149}@u', "n", $s);     // ŉ => n
    $s = preg_replace('@\x{014b}@u', "n", $s);     // ŋ => n
    $s = preg_replace('@\x{00d8}@u', "O", $s);     // Ø => O
    $s = preg_replace('@\x{00f8}@u', "o", $s);     // ø => o
    $s = preg_replace('@\x{017f}@u', "s", $s);     // ſ => s
    $s = preg_replace('@\x{00de}@u', "T", $s);     // Þ => T
    $s = preg_replace('@\x{0166}@u', "T", $s);     // Ŧ => T
    $s = preg_replace('@\x{00fe}@u', "t", $s);     // þ => t
    $s = preg_replace('@\x{0167}@u', "t", $s);     // ŧ => t

    // remove all non-ASCII characters (ASCII ends at \x7f)
    $s = preg_replace('@[^\0-\x7f]@u', "", $s);

    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>

The above function is mainly based on the following article:
http://ahinea.com/en/tech/accented-translate.html
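
For readers who only need the core step of the function above (decompose, then drop the combining marks), it can be sketched in a few lines. This is a minimal illustration, not a replacement for the full mapping table: the helper name strip_marks() is hypothetical, it assumes the intl extension is loaded, and returning the input unchanged on failure is an assumption.

```php
<?php
// Hypothetical helper: strips combining marks after canonical decomposition.
function strip_marks(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed: leave the input unchanged
    }

    // \p{Mn} matches non-spacing marks such as U+0301 COMBINING ACUTE ACCENT.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    // preg_replace() returns null on error; fall back to the input then.
    return $stripped ?? $text;
}

echo strip_marks("\xC3\x9A\xC3\xA1"), "\n"; // input is "Úá", prints "Ua"
```

Characters without a canonical decomposition, such as ß or ø, pass through this sketch unchanged, which is why the longer function maps them explicitly.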
tom dot vom dot berg at online dot de
9 years ago
If you get error messages while starting apache of xampp package with activated extension=intl.dll, copy the files

    * icudt##.dll
    * icuin##.dll
    * icuio##.dll
    * icule##.dll
    * iculx##.dll
    * icutu##.dll
    * icuuc##.dll

## = version number

from "/program files/xampp/php"
into your "/program files/xampp/apache/bin" or wherever your xampp resides :-)