When dealing with multiple languages and internalization in PHP, some of the default functions in PHP end up mangling up the unicode characters in PHP. This is evident when you have a lot of funny looking characters coming up on your web page instead of the actual characters. Apart from setting the UTF-8 headers in your HTML page, you should be careful on which functions you use to handle your strings. There is an extensions called mbstring which you can install in PHP which gives you a set of functions which are unicode ( actually multibyte ) ready.
ASCII characters store each character in one byte. Unicode characters like UTF-8 use multiple bytes to handle the wider range of character sets. Some of the built in functions in PHP assume each character is only one byte and ends up breaking multibyte characters due to this assumtion.
One way to ensure that your content doesn’t get mangled up is to substitue the regular php functions in your code with the mbstring variety. To get the entire list of mbstring functions, head over to: http://php.net/manual/en/ref.mbstring.php
A few examples of the function mapping are:
EMail Function : Instead of using mail, you could use the mbstring function mb_send_mail
String Functions: strtoupper becomes mb_strtoupper; strlen becomes mb_strlen, substr becomes mb_substr and so on…
Now instead of going in and changing all your code to become multibyte ready, PHP gives you an easy way to overload the default functions with the mbstring variety.
You can set a value to mbstring.func_overload in php.ini. The value set for this function decides which functionality is overloaded by default with the mbstring variety:
- 1 – overloads the mail functions. So you don’t have to substitute mail with mb_send_mail in your code. The mail functuion it self will work like mb_send_mail if mbstring.func_overload is set to 1 in php.ini
- 2 – enables string functions overloading
- 4 – enables regular expression functions overloading
- 7 – enables mail, strings and regular expressions overloading