PHP UTF-8 Charsets
-
An interesting article that explains what can go wrong when you’re handling UTF-8 character sets in PHP. The current versions of PHP mangle UTF-8 characters when you use the built-in string functions, because those functions work on single bytes rather than on whole characters.
“When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.”
“Darn near impossible” is perhaps too extreme, but in PHP, if you simply “accept the defaults”, you will probably end up with all kinds of strange characters and question marks the moment anyone outside the US or Western Europe submits content to your site.
Link to article: Php I18n Charsets - Web Application Component Toolkit
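To make the problem concrete, here is a minimal sketch of how the byte-oriented built-ins differ from their multibyte-aware counterparts. It assumes the mbstring extension is enabled, which the article above does not promise, and the variable names are only illustrative.

```php
<?php
// "café" with explicit byte escapes: "é" is the two bytes C3 A9 in UTF-8.
$word = "caf\xC3\xA9";

// strlen() counts bytes, not characters.
$byteLength = strlen($word);

// mb_strlen() counts characters once it is told the encoding
// (assumption: the mbstring extension is available).
$charLength = mb_strlen($word, 'UTF-8');

// substr() cuts at a byte offset, so it can split "é" in half...
$brokenCut = substr($word, 0, 4);

// ...while mb_substr() cuts at a character offset.
$safeCut = mb_substr($word, 0, 4, 'UTF-8');

// Check whether each result is still well-formed UTF-8.
$brokenIsValid = mb_check_encoding($brokenCut, 'UTF-8');
$safeIsValid = mb_check_encoding($safeCut, 'UTF-8');
```

Here strlen() reports 5 while mb_strlen() reports 4, and the substr() result ends in a lone lead byte that is no longer valid UTF-8. That dangling byte is the kind of thing that shows up as a question mark or a stray character in the browser.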
If you’re looking for a set of PHP functions that is UTF-8 safe, head over to the following link:
http://dev.splitbrain.org/view/darcs/dokuwiki/inc/utf8.php
It provides a whole set of helper functions that let you manipulate UTF-8 data without corrupting it.
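If mbstring is not available, helpers of this kind can also be built on PCRE’s /u modifier. The sketch below is only an illustration of the idea: the function names example_utf8_strlen and example_utf8_substr are hypothetical, and this is not the code from the file linked above.

```php
<?php
/**
 * Hypothetical helper: count the UTF-8 code points in a string.
 * Returns false if the input is not valid UTF-8.
 */
function example_utf8_strlen($text)
{
    // With /u the subject is treated as UTF-8; with /s the dot also
    // matches newlines, so every code point counts as one match.
    return preg_match_all('/./us', $text, $matches);
}

/**
 * Hypothetical helper: take $length code points starting at $start.
 * Returns false if the input is not valid UTF-8.
 */
function example_utf8_substr($text, $start, $length)
{
    // Splitting on the empty pattern with /u yields one array entry
    // per code point instead of one per byte.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false;
    }
    return implode('', array_slice($chars, $start, $length));
}

$length = example_utf8_strlen("caf\xC3\xA9");
$tail = example_utf8_substr("caf\xC3\xA9", 1, 3);
$invalid = example_utf8_strlen("caf\xC3");
```

Both helpers count code points, which is enough to stop a multibyte character from being cut in half. They also return false on malformed input instead of passing it through silently.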

September 26th, 2006 at 1:01 pm
There is some interesting news on the Unicode efforts for PHP 6 in Sara Golemon’s weblog: http://blog.libssh2.org/index.php?/archives/38-PHP6-News-from-the-front….html
September 26th, 2006 at 3:28 pm
Yep - UTF character sets won’t be a problem once PHP 6 comes out. The current 4.x and 5.x versions are where the problems with UTF characters loom large.