#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    35
    Rep Power
    5

    Moving from Latin-1 to UTF-8


    I need to move a large website from Latin-1 to UTF-8.

    The website has two databases and front ends, and only one of them will be moved to UTF-8.
    The problem is the files that are shared between them, which are PHP include files. Which encoding should I assign to the files themselves?

    Is there any way to bulk-change file encoding?
  2. #2
  3. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,180
    Rep Power
    9644
    Only one? That's not good. Having two different encodings will come back to haunt you in the future - you might as well switch both at the same time.

    File encoding only matters if you use non-ASCII characters in the files themselves, as in you literally have characters like À or £ in the source. If that's the case then you would need to change the encodings. For a one-time operation it's a matter of your operating system (e.g., it will be easier to do on Linux than on Windows), or you can write a PHP script that finds all the files, grabs the contents, runs them through iconv or mb_convert_encoding, then puts the contents back.
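
    A rough sketch of such a script, in case it helps. It assumes the files really are ISO-8859-1 (not Windows-1252), that only .php files need converting, and that you have a backup. convertTreeToUtf8() is a made-up name, and the "already valid UTF-8" test is a heuristic, not a guarantee.

```php
<?php
// Sketch: convert every matching file under $root from ISO-8859-1 to UTF-8.
// convertTreeToUtf8() is a hypothetical helper; back up the tree before running it.

function convertTreeToUtf8($root, $extension = 'php')
{
    $converted = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        if (!$file->isFile() || $file->getExtension() !== $extension) {
            continue;
        }

        $path = $file->getPathname();
        $contents = file_get_contents($path);

        // Pure ASCII is byte-identical in both encodings, and a file that is
        // already valid UTF-8 must not be converted a second time.
        if (mb_check_encoding($contents, 'UTF-8')) {
            continue;
        }

        $utf8 = iconv('ISO-8859-1', 'UTF-8', $contents);
        if ($utf8 === false) {
            continue; // leave the file untouched if conversion failed
        }

        file_put_contents($path, $utf8);
        $converted[] = $path;
    }

    // Paths of the files that were actually rewritten.
    return $converted;
}
```

    Calling convertTreeToUtf8('/path/to/includes') would return the list of files it rewrote. On Linux, the command-line equivalent for a single file is iconv -f ISO-8859-1 -t UTF-8 old.php > new.php.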

    You sure you don't have any other questions? If these files interact with the remaining latin-1 database then mixed encoding could be a problem. You may also need to do things like alter your database connection strings to use UTF-8, plus your frontend pages will need to specify that they use UTF-8 now. If you do any string processing with functions like substr() then those all need to be changed to use the mbstring equivalents. Probably other things not coming to mind right now.
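
    If a shared include ever has to handle rows from the Latin-1 database on a UTF-8 page, one option is to convert the strings at the boundary. A minimal sketch, assuming the source data is ISO-8859-1 and the row is a plain associative array; rowToUtf8() is a hypothetical helper, not something from your code base.

```php
<?php
// Sketch: convert the string fields of one fetched row to UTF-8.
// rowToUtf8() is a hypothetical helper; the ISO-8859-1 source encoding is an assumption.

function rowToUtf8(array $row)
{
    foreach ($row as $key => $value) {
        // Only touch strings that are not already valid UTF-8.
        // Plain ASCII passes the check and is left as it is.
        if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $row[$key] = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
        }
    }
    return $row;
}

$row = array('id' => 7, 'city' => "D\xFCsseldorf", 'code' => 'DE'); // Latin-1 bytes
$clean = rowToUtf8($row);

echo bin2hex($clean['city']), "\n"; // 44c3bc7373656c646f7266
```

    Setting the connection charset so the database does the conversion is usually the cleaner fix; this only covers the case where you cannot change the connection.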
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    35
    Rep Power
    5
    They are two websites of the same brand, but they take their input data from different databases, one in UTF-8 and the other in Latin-1.
    The two websites don't interact; each one is independent.

    On the new server, PHP's default_charset is set to UTF-8 and the PHP version is 5.6.30, so it should use UTF-8 by default.

    Do strlen, substr, and strstr mess up even on strings that don't contain special characters?
  6. #4
  7. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,180
    Rep Power
    9644
    Functions like strlen work on binary strings, meaning they don't understand things like multibyte character encodings (such as UTF-8). You can use them for strings that definitely do not have multibyte characters, but if the strings might have those, such as user input, then you may need the mbstring versions.
    For example,
    PHP Code:
    echo strlen("À"); // 2
    echo mb_strlen("À"); // 1 
    strlen() for the byte length of a string is fine. strlen() for the character length of a string is not fine.

    It gets worse with functions like substr:
    PHP Code:
    echo substr("Düsseldorf", 0, 2); // D� = \x44\xC3
    echo mb_substr("Düsseldorf", 0, 2); // Dü = \x44\xC3\xBC
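
    To the question about strings without special characters: on pure ASCII input the byte-based and mbstring functions should give the same answers, because every ASCII character is a single byte in UTF-8. A small sketch to check that yourself; isAscii() is a made-up helper name and the internal encoding is assumed to be UTF-8.

```php
<?php
// Sketch: byte-based and character-based lengths agree on pure ASCII input.
// isAscii() is a hypothetical helper name.

mb_internal_encoding('UTF-8');

function isAscii($s)
{
    // True when no byte falls outside the 7-bit ASCII range.
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

$plain  = "Dusseldorf";
$accent = "D\xC3\xBCsseldorf"; // "Düsseldorf" as UTF-8 bytes

var_dump(isAscii($plain));                      // bool(true)
var_dump(strlen($plain) === mb_strlen($plain)); // bool(true)
var_dump(isAscii($accent));                     // bool(false)
var_dump(strlen($accent));                      // int(11)
var_dump(mb_strlen($accent));                   // int(10)
```

    So the plain functions are fine for strings you know are ASCII (array keys, fixed identifiers and so on). Anything that can come from users or the database is where the byte count and the character count can differ.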
