
Normalizer::normalize

normalizer_normalize

(PHP 5 >= 5.3.0, PHP 7, PHP 8, PECL intl >= 1.0.0)

Normalizer::normalize -- normalizer_normalize - Normalizes the input provided and returns the normalized string

Description

Object-oriented style

public static Normalizer::normalize(string $input, int $form = Normalizer::FORM_C): string|false

Procedural style

normalizer_normalize(string $input, int $form = Normalizer::FORM_C): string|false

Normalizes the input provided and returns the normalized string.

Parameters

input: The input string to normalize.
form: One of the normalization forms.

Return Values

The normalized string, or false if an error occurred.
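
Because a failure is reported as false rather than as a string, callers usually check the result before using it. The following is a minimal sketch, assuming the intl extension is loaded; the helper name normalizeOrOriginal() is hypothetical and not part of the intl API.

```php
<?php
// Hypothetical helper (not part of intl): returns the normalized string,
// or the unchanged input when normalizer_normalize() returns false.
function normalizeOrOriginal(string $input, int $form = Normalizer::FORM_C): string
{
    $result = normalizer_normalize($input, $form);

    // Compare strictly against false: an empty string is a valid result.
    return $result === false ? $input : $result;
}

// 'A' followed by COMBINING RING ABOVE composes to U+00C5 under FORM_C.
echo urlencode(normalizeOrOriginal("A\xCC\x8A")); // %C3%85
```

Falling back to the original input is only one possible policy; throwing an exception or logging the failure may suit stricter code better.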

Examples

Example #1 normalizer_normalize() example

<?php
$carácter_A_anillo = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$carácter_anillo_superior_combinación = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$carácter_1 = normalizer_normalize( $carácter_A_anillo, Normalizer::FORM_C );
$carácter_2 = normalizer_normalize( 'A' . $carácter_anillo_superior_combinación, Normalizer::FORM_C );
 
echo urlencode($carácter_1);
echo ' ';
echo urlencode($carácter_2);
?>

Example #2 Object-oriented example

<?php
$carácter_A_anillo = "\xC3\x85"; // 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$carácter_anillo_superior_combinación = "\xCC\x8A";  // 'COMBINING RING ABOVE' (U+030A)
 
$carácter_1 = Normalizer::normalize( $carácter_A_anillo, Normalizer::FORM_C );
$carácter_2 = Normalizer::normalize( 'A' . $carácter_anillo_superior_combinación, Normalizer::FORM_C );
 
echo urlencode($carácter_1);
echo ' ';
echo urlencode($carácter_2);
?>

The above example will output:

%C3%85 %C3%85
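
The examples above only use FORM_C. As a rough illustration of how the form parameter changes the result, the sketch below decomposes the same character with FORM_D, recomposes it, and applies FORM_KC to a compatibility character. It assumes the intl extension is loaded; the variable names are chosen for illustration only.

```php
<?php
$composed = "\xC3\x85"; // U+00C5 as a single precomposed code point

// FORM_D (canonical decomposition) splits it into 'A' + U+030A.
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);
echo bin2hex($decomposed), "\n"; // 41cc8a

// FORM_C (canonical composition) reverses the decomposition.
$recomposed = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($recomposed === $composed); // bool(true)

// FORM_KC also applies compatibility mappings: the ligature U+FB01
// becomes the two ASCII letters "fi".
echo Normalizer::normalize("\xEF\xAC\x81", Normalizer::FORM_KC), "\n"; // fi
```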

See Also

normalizer_is_normalized() - Checks if the provided string is already in the specified normalization form.


User Contributed Notes 4 notes

anrdaemon at freemail dot ru

6 years ago


"If you get error messages while starting apache of xampp package with activated extension=intl.dll," do NOT copy any files around.

Use Apache's "LoadFile …" functionality to load any missing DLLs that are not found on the %PATH%, even php##ts.dll itself.

spam at oscar dot xyz

9 years ago


You can use the 'original' abbreviations if you feel more comfortable:

<?php
Normalizer::NFD;
Normalizer::NFKD;
Normalizer::NFC;
Normalizer::NFKC;
?>
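
A quick way to confirm that the short names are plain aliases is to compare them with their FORM_* counterparts. This is a small sketch, assuming the intl extension is loaded; the $aliases array is only for illustration.

```php
<?php
// Each short name is compared with the FORM_* constant it abbreviates.
$aliases = [
    'NFD'  => Normalizer::NFD  === Normalizer::FORM_D,
    'NFKD' => Normalizer::NFKD === Normalizer::FORM_KD,
    'NFC'  => Normalizer::NFC  === Normalizer::FORM_C,
    'NFKC' => Normalizer::NFKC === Normalizer::FORM_KC,
];

foreach ($aliases as $name => $matches) {
    echo $name, ': ', $matches ? 'same value' : 'different value', "\n";
}
```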

akniep at rayo dot info

14 years ago


Especially when matching texts against each other or against keywords, it is helpful to normalize the texts first.
The following function removes all diacritics (marks like accents) from a given UTF-8-encoded text and returns ASCII text.

Be sure to have the PHP Normalizer extension (intl and ICU) installed.

Tip: You may also want to map the text to lower case before executing matching procedures ...

<?php

function normalizeUtf8String( $s)
{
    // keep the input so it can be returned unchanged if normalization is not possible
    $original_string = $s;

    // Normalizer-class missing!
    if (! class_exists("Normalizer", false))
        return $original_string;
    
    
    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s    = preg_replace( '@\x{00c4}@u'    , "AE",    $s );    // umlaut Ä => AE
    $s    = preg_replace( '@\x{00d6}@u'    , "OE",    $s );    // umlaut Ö => OE
    $s    = preg_replace( '@\x{00dc}@u'    , "UE",    $s );    // umlaut Ü => UE
    $s    = preg_replace( '@\x{00e4}@u'    , "ae",    $s );    // umlaut ä => ae
    $s    = preg_replace( '@\x{00f6}@u'    , "oe",    $s );    // umlaut ö => oe
    $s    = preg_replace( '@\x{00fc}@u'    , "ue",    $s );    // umlaut ü => ue
    $s    = preg_replace( '@\x{00f1}@u'    , "ny",    $s );    // ñ => ny
    $s    = preg_replace( '@\x{00ff}@u'    , "yu",    $s );    // ÿ => yu
    
    
    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
        // example:  Ú => U´,  á => a´
    $s    = Normalizer::normalize( $s, Normalizer::FORM_D );
    
    
    $s    = preg_replace( '@\pM@u'        , "",    $s );    // removes diacritics
    
    
    $s    = preg_replace( '@\x{00df}@u'    , "ss",    $s );    // maps German ß onto ss
    $s    = preg_replace( '@\x{00c6}@u'    , "AE",    $s );    // Æ => AE
    $s    = preg_replace( '@\x{00e6}@u'    , "ae",    $s );    // æ => ae
    $s    = preg_replace( '@\x{0132}@u'    , "IJ",    $s );    // Ĳ => IJ
    $s    = preg_replace( '@\x{0133}@u'    , "ij",    $s );    // ĳ => ij
    $s    = preg_replace( '@\x{0152}@u'    , "OE",    $s );    // Œ => OE
    $s    = preg_replace( '@\x{0153}@u'    , "oe",    $s );    // œ => oe
    
    $s    = preg_replace( '@\x{00d0}@u'    , "D",    $s );    // Ð => D
    $s    = preg_replace( '@\x{0110}@u'    , "D",    $s );    // Đ => D
    $s    = preg_replace( '@\x{00f0}@u'    , "d",    $s );    // ð => d
    $s    = preg_replace( '@\x{0111}@u'    , "d",    $s );    // đ => d
    $s    = preg_replace( '@\x{0126}@u'    , "H",    $s );    // Ħ => H
    $s    = preg_replace( '@\x{0127}@u'    , "h",    $s );    // ħ => h
    $s    = preg_replace( '@\x{0131}@u'    , "i",    $s );    // ı => i
    $s    = preg_replace( '@\x{0138}@u'    , "k",    $s );    // ĸ => k
    $s    = preg_replace( '@\x{013f}@u'    , "L",    $s );    // Ŀ => L
    $s    = preg_replace( '@\x{0141}@u'    , "L",    $s );    // Ł => L
    $s    = preg_replace( '@\x{0140}@u'    , "l",    $s );    // ŀ => l
    $s    = preg_replace( '@\x{0142}@u'    , "l",    $s );    // ł => l
    $s    = preg_replace( '@\x{014a}@u'    , "N",    $s );    // Ŋ => N
    $s    = preg_replace( '@\x{0149}@u'    , "n",    $s );    // ŉ => n
    $s    = preg_replace( '@\x{014b}@u'    , "n",    $s );    // ŋ => n
    $s    = preg_replace( '@\x{00d8}@u'    , "O",    $s );    // Ø => O
    $s    = preg_replace( '@\x{00f8}@u'    , "o",    $s );    // ø => o
    $s    = preg_replace( '@\x{017f}@u'    , "s",    $s );    // ſ => s
    $s    = preg_replace( '@\x{00de}@u'    , "T",    $s );    // Þ => T
    $s    = preg_replace( '@\x{0166}@u'    , "T",    $s );    // Ŧ => T
    $s    = preg_replace( '@\x{00fe}@u'    , "t",    $s );    // þ => t
    $s    = preg_replace( '@\x{0167}@u'    , "t",    $s );    // ŧ => t
    
    // remove all non-ASCII characters (anything above U+007F)
    $s    = preg_replace( '@[^\0-\x7f]@u'    , "",    $s ); 
    
    
    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s; 
}
?>

The above function is mainly based on the following article:
http://ahinea.com/en/tech/accented-translate.html
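
For the matching use case described in this note, a much shorter variant is sometimes enough. The sketch below only decomposes the text, drops combining marks, and lower-cases the result; it does not perform the letter-by-letter transliteration of the function above. It assumes the intl and mbstring extensions are loaded, and the helper name foldForMatching() is hypothetical.

```php
<?php
// Hypothetical helper: decompose, drop combining marks, then lower-case.
function foldForMatching(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed, keep the input as-is
    }

    // \p{M} matches the combining marks left behind by the decomposition.
    $stripped = preg_replace('/\p{M}+/u', '', $decomposed);
    if ($stripped === null) {
        return $text; // the regular expression engine reported an error
    }

    return mb_strtolower($stripped, 'UTF-8');
}

// A precomposed "Café" and a decomposed, upper-case "CAFÉ" fold to the same key.
var_dump(foldForMatching("Caf\xC3\xA9") === foldForMatching("CAFE\xCC\x81")); // bool(true)
```

Characters without a canonical decomposition, such as ß or ø, pass through this helper unchanged, which is why the longer function maps them explicitly.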

tom dot vom dot berg at online dot de

10 years ago


If you get error messages while starting apache of xampp package with activated extension=intl.dll, copy the files

    * icudt##.dll
    * icuin##.dll
    * icuio##.dll
    * icule##.dll
    * iculx##.dll
    * icutu##.dll
    * icuuc##.dll

## = version number

from "/program files/xampp/php"
into your "/program files/xampp/apache/bin" or wherever your xampp resides :-)
