PHP Unicode support - or the lack thereof
Mar 31, 2010
Well, I just had the pleasure of fixing special character (umlaut) handling in a legacy PHP application. To put it briefly: it has been a while since I ran into as many i18n issues as I found in PHP (version 5) during the last hour:
- PHP strings are just plain byte arrays. Their content is not portable, because how the bytes should be read depends on whatever encoding they happened to be produced in.
- The same applies to the representation built by serialize(). It contains a length-prefixed byte representation of the string without storing any encoding information.
- Most PHP string functions have no clue about Unicode. For a detailed list including each function's risk level, refer to: http://www.phpwact.org/php/i18n/utf-8
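To make these three points concrete, here is a small sketch written in Python rather than PHP so it runs without a PHP interpreter. The helpers php_strlen, php_substr and php_serialize_string are hypothetical stand-ins that imitate the byte-oriented behaviour of PHP 5's strlen(), substr() and the string case of serialize() (format s:<byte length>:"<bytes>";). They are illustrations only, not part of any library.

```python
# Sketch (Python, not PHP): hypothetical helpers imitating PHP 5's
# byte-oriented string handling. Not a real PHP API.

def php_strlen(data: bytes) -> int:
    """Like PHP's strlen(): counts bytes, not characters."""
    return len(data)

def php_substr(data: bytes, start: int, length: int) -> bytes:
    """Like PHP's substr(): slices bytes, ignoring character boundaries."""
    return data[start:start + length]

def php_serialize_string(data: bytes) -> bytes:
    """Imitates serialize() for a string: s:<byte length>:"<bytes>";
    Only the byte length is recorded; the encoding is not stored anywhere."""
    return b's:' + str(len(data)).encode("ascii") + b':"' + data + b'";'

name = "Müller"
utf8 = name.encode("utf-8")      # b'M\xc3\xbcller'
latin1 = name.encode("latin-1")  # b'M\xfcller'

# 1. The same text becomes different byte arrays depending on the encoding.
print(php_strlen(utf8))    # 7
print(php_strlen(latin1))  # 6

# 2. The serialized form carries a byte length, but no encoding.
print(php_serialize_string(utf8))    # b's:7:"M\xc3\xbcller";'
print(php_serialize_string(latin1))  # b's:6:"M\xfcller";'

# Reading UTF-8 bytes as if they were Latin-1 silently yields garbage.
print(utf8.decode("latin-1"))  # MÃ¼ller

# 3. Byte-oriented slicing can cut a multi-byte character (the umlaut) in half.
fragment = php_substr(utf8, 0, 2)  # b'M\xc3'
try:
    fragment.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)
```

The same six-letter name is 7 bytes in UTF-8 but 6 in Latin-1, so any code that counts, slices or serializes bytes gives encoding-dependent results.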
Note to self: Never ever use PHP for a new project.