• 18th July 2008 - By Vinu Thomas
    When dealing with multiple languages and internationalization in PHP, some of PHP's default functions end up mangling Unicode characters. This shows up as a lot of funny-looking characters on your web page instead of the actual characters. Apart from setting the UTF-8 headers in your HTML page, you should be careful about which functions you use to handle your strings. There is an extension called mbstring which you can install in PHP, and it gives you a set of functions that are Unicode (actually multibyte) ready.

    ASCII stores each character in one byte. Unicode encodings like UTF-8 can use multiple bytes per character to handle the wider range of character sets. Some of the built-in functions in PHP assume each character is only one byte and end up breaking multibyte characters due to this assumption.
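
    To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the text is UTF-8; the sample word and variable names are only illustrative.

```php
<?php
// Assumption: mbstring is loaded and the string is UTF-8 encoded.
// "café" written with explicit bytes so the example does not depend on
// the encoding of this source file: "é" is the two bytes C3 A9.
$word = "caf\xC3\xA9";

// strlen() counts bytes, so the two-byte "é" is counted twice.
$bytes = strlen($word);

// mb_strlen() counts characters in the given encoding.
$chars = mb_strlen($word, 'UTF-8');

echo "bytes: $bytes, characters: $chars\n"; // bytes: 5, characters: 4
```

    Any code that uses the byte count as if it were a character count (for truncating, padding or validating length) will be off by one for every such character.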

    One way to ensure that your content doesn’t get mangled is to substitute the regular PHP functions in your code with the mbstring variety. To get the entire list of mbstring functions, head over to: http://php.net/manual/en/ref.mbstring.php

    A few examples of the function mapping are:

    Email function: instead of using mail, you could use the mbstring function mb_send_mail
    String functions: strtoupper becomes mb_strtoupper, strlen becomes mb_strlen, substr becomes mb_substr, and so on…
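
    As a rough illustration of the string mappings above, the sketch below cuts and uppercases a UTF-8 string. It assumes the mbstring extension is loaded; the sample word and variable names are hypothetical.

```php
<?php
// Assumption: mbstring is loaded and the input is UTF-8.
$word = "caf\xC3\xA9"; // "café"; the "é" is the two bytes C3 A9

// substr() works on bytes: taking 4 bytes keeps only half of "é",
// which leaves an invalid UTF-8 sequence at the end of the string.
$byteCut = substr($word, 0, 4);

// mb_substr() works on characters: taking 4 characters keeps "é" whole.
$charCut = mb_substr($word, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// mb_strtoupper() also uppercases the non-ASCII letter: é (C3 A9) becomes É (C3 89).
$upper = mb_strtoupper($word, 'UTF-8');
var_dump($upper === "CAF\xC3\x89"); // bool(true)
```

    Passing the encoding explicitly, as done here, avoids depending on the configured internal encoding.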

    Now instead of going in and changing all your code to become multibyte ready, PHP gives you an easy way to overload the default functions with the mbstring variety.

    You can set a value for mbstring.func_overload in php.ini. The value of this setting decides which groups of functions are overloaded by default with the mbstring variety:

    • 1 – overloads the mail functions, so you don’t have to substitute mail with mb_send_mail in your code. The mail function itself will work like mb_send_mail if mbstring.func_overload is set to 1 in php.ini
    • 2 – enables string functions overloading
    • 4 – enables regular expression functions overloading
    • 7 – enables mail, string and regular expression overloading (1 + 2 + 4)
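
    The values in the list above behave like bit flags, so they can be added together. The helper below is a hypothetical sketch (it is not part of mbstring) that decodes a value into the groups it would enable. It only does arithmetic on the number and does not read php.ini; note that the setting itself was deprecated in PHP 7.2 and removed in PHP 8.0, so it only applies to older installations.

```php
<?php
// Hypothetical helper, not an mbstring function: decode a
// mbstring.func_overload value into the function groups it enables.
function describe_func_overload(int $value): array
{
    // Bit values taken from the list above.
    $groups = [
        1 => 'mail',
        2 => 'string',
        4 => 'regular expression',
    ];

    $enabled = [];
    foreach ($groups as $bit => $name) {
        // A group is enabled when its bit is set in the value.
        if (($value & $bit) !== 0) {
            $enabled[] = $name;
        }
    }
    return $enabled;
}

// 3 = 1 + 2, so mail and string functions only.
echo implode(', ', describe_func_overload(3)), "\n"; // mail, string
```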
  • 6 Comments to “mbstring Functions by default in PHP”

    • [...] a new post to his blog, Vinu Thomas talks about a set of functions that can make your life easier when [...]

    • Kevin on July 18, 2008

      Again, a very useful posting. Thanks a lot!

      You should have mentioned that the mbstring extension is not available on all hosts, but that's probably obvious to your readers…

      I can’t wait for PHP6 to come out, where we can (hopefully) forget about these problems. It's ridiculous to still be struggling with UTF-8 encoding in 2008 ;)

    • vinu on July 18, 2008

      Yeah, I agree with you – it absolutely sucks having to figure out encoding issues between PHP and MySQL – especially when working with a company which enables publishers to get their content online. You can imagine the number of sleepless nights the team had figuring out why we had those blocks appearing instead of characters :)

    • paan on July 30, 2008

      Another thing to note is that mbstring.func_overload in php.ini uses the *nix-like file permission convention too, meaning the values can be added together. If you, for example, want to enable only the mail and string mb functions, you can use the value 3.

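    • [...] Today I happened to come across an article online about how to make the mbstring family of functions PHP's default functions, so I took the opportunity to look closely at the mbstring settings in php.ini. The mbstring functions are used all the time in development involving Chinese and other Asian character sets, so they are worth studying. [...]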
