
mbstring Functions by default in PHP

  • Written by vinu (5 comments)
    Last Updated: July 18th, 2008

    When dealing with multiple languages and internationalization in PHP, some of the default string functions end up mangling Unicode characters. This shows up as odd-looking characters on your web page in place of the ones you expected. Apart from setting UTF-8 headers for your HTML pages, you should be careful about which functions you use to handle your strings. PHP has an extension called mbstring that provides a set of functions which are Unicode (actually multibyte) ready.

    ASCII stores each character in one byte. Unicode encodings like UTF-8 use more than one byte for many characters in order to cover a much wider range of character sets. Some of PHP's built-in functions assume that each character is exactly one byte, and end up breaking multibyte characters because of this assumption.
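To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample word is only an illustration. The UTF-8 bytes are written as escapes so the result does not depend on how the source file is saved.

```php
<?php
// "naïve" in UTF-8: the "ï" is stored as the two bytes C3 AF.
$word = "na\xC3\xAFve";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

echo "bytes: $bytes, characters: $chars\n";

// Cutting at a byte offset can split "ï" in the middle.
$broken = substr($word, 0, 3);              // "na" plus the first byte of "ï"
$intact = mb_substr($word, 0, 3, 'UTF-8');  // "naï"

// mb_check_encoding() reports whether a string is valid in the given encoding.
echo "byte cut valid UTF-8?      ", var_export(mb_check_encoding($broken, 'UTF-8'), true), "\n";
echo "character cut valid UTF-8? ", var_export(mb_check_encoding($intact, 'UTF-8'), true), "\n";
```

The half character left behind by the byte-based cut is the kind of thing that typically appears as a block or question mark in the browser.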

    One way to ensure that your content doesn’t get mangled is to substitute the regular PHP functions in your code with the mbstring variety. To get the entire list of mbstring functions, head over to: http://php.net/manual/en/ref.mbstring.php

    A few examples of the function mapping are:

    Email function: instead of using mail, you can use the mbstring function mb_send_mail
    String functions: strtoupper becomes mb_strtoupper, strlen becomes mb_strlen, substr becomes mb_substr, and so on…
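As a rough illustration of the string mappings above, the sketch below calls the mb_ variants on a UTF-8 string. It assumes mbstring is loaded. The sample value is made up for the example, and the encoding is passed explicitly so the code does not rely on the configured default.

```php
<?php
// "münchen" in UTF-8; the "ü" is the two bytes C3 BC.
$city = "m\xC3\xBCnchen";

// mb_strtoupper() is aware of multibyte characters, so "ü" becomes "Ü".
$upper = mb_strtoupper($city, 'UTF-8');

// mb_substr() takes its offset and length in characters, not bytes.
$first = mb_substr($city, 0, 2, 'UTF-8');

// mb_strlen() returns the number of characters.
$length = mb_strlen($city, 'UTF-8');

echo "$upper | $first | $length\n";
```

mb_send_mail is left out here because calling it would try to send a real message.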

    Now instead of going in and changing all your code to become multibyte ready, PHP gives you an easy way to overload the default functions with the mbstring variety.

    You can set a value for mbstring.func_overload in php.ini. The value of this setting decides which groups of functions are overloaded by default with the mbstring variety:

    • 1 - overloads the mail functions, so you don’t have to substitute mail with mb_send_mail in your code. The mail function itself will work like mb_send_mail if mbstring.func_overload is set to 1 in php.ini
    • 2 - enables string functions overloading
    • 4 - enables regular expression functions overloading
    • 7 - enables mail, strings and regular expressions overloading
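The values in this list are bit flags, so they can be added together. For example, a php.ini line of `mbstring.func_overload = 3` would enable mail and string overloading but not regular expressions. The sketch below only does the arithmetic. The constant and function names are hypothetical, chosen for illustration, and are not defined by PHP. Also worth knowing: mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions calling the mb_ functions directly is the only option.

```php
<?php
// Hypothetical names for the three flag values listed above.
const OVERLOAD_MAIL   = 1;
const OVERLOAD_STRING = 2;
const OVERLOAD_REGEX  = 4;

// Combine flags with bitwise OR to get the value for php.ini.
$mailAndString = OVERLOAD_MAIL | OVERLOAD_STRING;                  // 3
$everything    = OVERLOAD_MAIL | OVERLOAD_STRING | OVERLOAD_REGEX; // 7

// Hypothetical helper: list the function groups a given value enables.
function overloaded_groups(int $value): array
{
    $groups = [];
    if ($value & OVERLOAD_MAIL) {
        $groups[] = 'mail';
    }
    if ($value & OVERLOAD_STRING) {
        $groups[] = 'string';
    }
    if ($value & OVERLOAD_REGEX) {
        $groups[] = 'regex';
    }
    return $groups;
}

echo implode(', ', overloaded_groups($mailAndString)), "\n";
```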

  1. #1 Kevin
    July 18th, 2008 at 7:53 pm

    Again, a very useful posting. Thanks a lot!

    You should have mentioned that the mbstring extension is not available on all hosts, but that’s probably obvious to your readers…

    I can’t wait for PHP6 to come out, where we can (hopefully) forget about these problems. It’s ridiculous to still be struggling with UTF-8 encoding in 2008 ;)

  2. #2 vinu
    July 18th, 2008 at 7:56 pm

    Yeah, I agree with you. It absolutely sucks having to figure out encoding issues between PHP and MySQL, especially when working with a company which enables publishers to get their content online. You can imagine the number of sleepless nights the team had figuring out why those blocks were appearing instead of characters :)

  3. #3 paan
    July 30th, 2008 at 11:34 am

    Another thing to note is that mbstring.func_overload in php.ini uses a *nix-like file permission convention too, meaning the values can be added together. If you, for example, want to enable only the mail and string mb functions, you can use the value 3.
