#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Posts
    61
    Rep Power
    12

    Japanese+Regex+PHP getting Kanji characters


    Hi all,

    Been scratching my head about this all week.

    Is there a way to write a regex that matches all Japanese kanji characters? I am building a PHP website that uses pretty URLs, and I would like to keep Japanese text in the URI. The problem is that with mb_ereg_replace(), the multibyte regex function, I can only match hiragana and katakana.

    http://jp2.php.net/mb_ereg_replace

    The first example on that page shows how to match hiragana and katakana, but not kanji. Is there a way to match kanji as well?

    Thanks a lot for your help.
  2. #2
    For anyone else having trouble with this, here is what I did, based on this site, which lists the Unicode code point tables for Japanese characters:
    http://www.rikai.com/library/kanjita....unicode.shtml

    Code:
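    // Normalise Japanese characters: full-width alphanumerics and spaces
    // become half-width, half-width kana become full-width
    $url = mb_convert_kana($url, "asKHV");

    // Replace every run of characters outside the allowed set with '+'
    // Code point table: http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
    // Allowed: \w, hiragana (ぁ-ゔ), katakana (ァ-ヺ, ー), kanji (U+4E00-U+9FAF), '_' and '-'
    $pattern = '/[^\wぁ-ゔァ-ヺー\x{4E00}-\x{9FAF}_\-]+/u';
    $url = preg_replace($pattern, '+', $url);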
    Just in case anyone ever needs to do this.

    The code first normalises the characters with mb_convert_kana(), then replaces every run of other characters, Japanese symbols included, with a "+". What remains is word characters, hiragana, katakana, kanji, underscores and hyphens.
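    If you want the snippet above as something reusable, here is a minimal sketch of it wrapped in a function. The function name slugify_japanese is made up for illustration, and passing 'UTF-8' explicitly to mb_convert_kana() is my assumption (the original relies on the default internal encoding). It needs the mbstring extension, and the source file has to be saved as UTF-8 because the pattern contains literal kana.

    ```php
    <?php
    // Hypothetical helper name; the flags and pattern are the ones from the post.
    function slugify_japanese(string $url): string
    {
        // Full-width alphanumerics/spaces -> half-width, half-width kana -> full-width.
        // Assumption: the input is UTF-8.
        $url = mb_convert_kana($url, 'asKHV', 'UTF-8');

        // Collapse every run of characters outside the allowed set into one '+'.
        $pattern = '/[^\wぁ-ゔァ-ヺー\x{4E00}-\x{9FAF}_\-]+/u';

        // preg_replace() returns null on error (e.g. invalid UTF-8 input).
        return preg_replace($pattern, '+', $url) ?? '';
    }

    echo slugify_japanese('東京 タワー!'), "\n"; // prints 東京+タワー+
    ```

    Note that a symbol at the end of the input leaves a trailing "+", so you may want to trim() that off before putting the result in a URL.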

    Hope this helps anyone having this problem.
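    A possible alternative, not what I used above: PCRE also understands Unicode script properties when the /u modifier is set, so the hand-written code point ranges could probably be swapped for \p{Han}, \p{Hiragana} and \p{Katakana}. A rough sketch, with a made-up function name and ASCII letters and digits listed explicitly in place of \w:

    ```php
    <?php
    // Hypothetical helper; uses script properties instead of code point ranges.
    function keep_japanese_letters(string $text): string
    {
        // \p{Han} covers kanji. The long vowel mark ー (U+30FC) is listed
        // explicitly because Unicode assigns it to the Common script.
        $pattern = '/[^\p{Han}\p{Hiragana}\p{Katakana}ーA-Za-z0-9_\-]+/u';

        // preg_replace() returns null on error (e.g. invalid UTF-8 input).
        return preg_replace($pattern, '+', $text) ?? '';
    }

    echo keep_japanese_letters('漢字とカタカナ、ひらがな。'), "\n"; // prints 漢字とカタカナ+ひらがな+
    ```

    The script properties also pick up kanji outside the U+4E00-U+9FAF block, so the result can differ from the range-based pattern for rarer characters.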
