How do I remove accents from characters in a PHP string?


Question

I'm attempting to remove accents from characters in PHP string as the first step to making the string usable in a URL.

I'm using the following code:

$input = "Fóø Bår";

setlocale(LC_ALL, "en_US.utf8");
$output = iconv("utf-8", "ascii//TRANSLIT", $input);

print($output);

The output I would expect would be something like this:

F'oo Bar

However, instead of the accented characters being transliterated they are replaced with question marks:

F?? B?r

Everything I can find online indicates that setting the locale will fix this problem, however I'm already doing this. I've already checked the following details:

  1. The locale I am setting is supported by the server (included in the list produced by locale -a)
  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l)
  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator)
  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)

The cause of the problem:

The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
PHP manual's introduction to iconv

Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.

(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)

1
78
5/23/2017 12:18:36 PM

Accepted Answer

I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

http://ie2.php.net/strtr

6
6/19/2009 12:21:09 PM

What about WordPress implementation?

function remove_accents($string) {
    if ( !preg_match('/[\x80-\xff]/', $string) )
        return $string;

    $chars = array(
    // Decompositions for Latin-1 Supplement
    chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
    chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
    chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
    chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
    chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
    chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
    chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
    chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
    chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
    chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
    chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
    chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
    chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
    chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
    chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
    chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
    chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
    chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
    chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
    chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
    chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
    chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
    chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
    chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
    chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
    chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
    chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
    chr(195).chr(191) => 'y',
    // Decompositions for Latin Extended-A
    chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
    chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
    chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
    chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
    chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
    chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
    chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
    chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
    chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
    chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
    chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
    chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
    chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
    chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
    chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
    chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
    chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
    chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
    chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
    chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
    chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
    chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
    chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
    chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
    chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
    chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
    chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
    chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
    chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
    chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
    chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
    chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
    chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
    chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
    chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
    chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
    chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
    chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
    chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
    chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
    chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
    chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
    chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
    chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
    chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
    chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
    chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
    chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
    chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
    chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
    chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
    chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
    chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
    chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
    chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
    chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
    chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
    chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
    chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
    chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
    chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
    chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
    chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
    chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
    );

    $string = strtr($string, $chars);

    return $string;
}

To better understand what this function does, check this corresponding conversion table here:

À => A
Á => A
 => A
à => A
Ä => A
Å => A
Ç => C
È => E
É => E
Ê => E
Ë => E
Ì => I
Í => I
Î => I
Ï => I
Ñ => N
Ò => O
Ó => O
Ô => O
Õ => O
Ö => O
Ù => U
Ú => U
Û => U
Ü => U
Ý => Y
ß => s
à => a
á => a
â => a
ã => a
ä => a
å => a
ç => c
è => e
é => e
ê => e
ë => e
ì => i
í => i
î => i
ï => i
ñ => n
ò => o
ó => o
ô => o
õ => o
ö => o
ù => u
ú => u
û => u
ü => u
ý => y
ÿ => y
Ā => A
ā => a
Ă => A
ă => a
Ą => A
ą => a
Ć => C
ć => c
Ĉ => C
ĉ => c
Ċ => C
ċ => c
Č => C
č => c
Ď => D
ď => d
Đ => D
đ => d
Ē => E
ē => e
Ĕ => E
ĕ => e
Ė => E
ė => e
Ę => E
ę => e
Ě => E
ě => e
Ĝ => G
ĝ => g
Ğ => G
ğ => g
Ġ => G
ġ => g
Ģ => G
ģ => g
Ĥ => H
ĥ => h
Ħ => H
ħ => h
Ĩ => I
ĩ => i
Ī => I
ī => i
Ĭ => I
ĭ => i
Į => I
į => i
İ => I
ı => i
IJ => IJ
ij => ij
Ĵ => J
ĵ => j
Ķ => K
ķ => k
ĸ => k
Ĺ => L
ĺ => l
Ļ => L
ļ => l
Ľ => L
ľ => l
Ŀ => L
ŀ => l
Ł => L
ł => l
Ń => N
ń => n
Ņ => N
ņ => n
Ň => N
ň => n
ʼn => N
Ŋ => n
ŋ => N
Ō => O
ō => o
Ŏ => O
ŏ => o
Ő => O
ő => o
Π=> OE
œ => oe
Ŕ => R
ŕ => r
Ŗ => R
ŗ => r
Ř => R
ř => r
Ś => S
ś => s
Ŝ => S
ŝ => s
Ş => S
ş => s
Š => S
š => s
Ţ => T
ţ => t
Ť => T
ť => t
Ŧ => T
ŧ => t
Ũ => U
ũ => u
Ū => U
ū => u
Ŭ => U
ŭ => u
Ů => U
ů => u
Ű => U
ű => u
Ų => U
ų => u
Ŵ => W
ŵ => w
Ŷ => Y
ŷ => y
Ÿ => Y
Ź => Z
ź => z
Ż => Z
ż => z
Ž => Z
ž => z
ſ => s

You can generate this convesion table yourself by simply iterarting over the $chars array of the function:

foreach($chars as $k=>$v) {
   printf("%s -> %s", $k, $v);
}

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon