April 3rd, 2017, 06:53 AM
moving from latin-1 to utf-8
I need to move a large website from latin-1 to utf-8.
The website has 2 databases and 2 front-ends, and only one of them will be moved to utf-8.
The problem is the files that are shared between them, which are PHP include files. Which encoding should I assign to the files themselves?
Is there any way to bulk-change file encoding?
April 3rd, 2017, 09:31 AM
Only one? That's not good. Having two different encodings will come back to haunt you in the future - you might as well switch both at the same time.
File encoding only matters if you use non-ASCII characters in the files themselves. As in you literally have characters like À or £. If that's the case then you would need to change the encodings; for a one-time operation it's a matter of your operating system (e.g., it will be easier to do on Linux than on Windows) or you can write a PHP script that finds all files, grabs the contents, runs them through iconv or mb_convert_encoding, then puts the contents back.
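A rough sketch of that kind of script, assuming the files are currently ISO-8859-1 and that the mbstring extension is loaded. convertTreeToUtf8() is a made-up name, and the "already valid UTF-8" check is only a heuristic, so back everything up before running something like this:

```php
<?php
// Sketch only: convert files under $root from Latin-1 to UTF-8.
// convertTreeToUtf8() is a hypothetical helper, not a built-in.
function convertTreeToUtf8($root, $extension = 'php')
{
    $converted = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Only touch regular files with the wanted extension.
        if (!$file->isFile() || strtolower($file->getExtension()) !== $extension) {
            continue;
        }
        $path = $file->getPathname();
        $contents = file_get_contents($path);
        // Skip unreadable files and files that are already valid UTF-8
        // (pure ASCII passes this check too), so a second run does not
        // double-encode anything.
        if ($contents === false || mb_check_encoding($contents, 'UTF-8')) {
            continue;
        }
        file_put_contents($path, mb_convert_encoding($contents, 'UTF-8', 'ISO-8859-1'));
        $converted[] = $path;
    }
    // Paths of the files that were actually rewritten.
    return $converted;
}
```

You would call it as convertTreeToUtf8('/path/to/includes') and look over the returned list of changed files before deploying.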
You sure you don't have any other questions? If these files interact with the remaining latin-1 database then mixed encoding could be a problem. You may also need to do things like alter your database connection strings to use UTF-8, plus your frontend pages will need to specify that they use UTF-8 now. If you do any string processing with functions like substr() then those all need to be changed to use the mbstring equivalents. Probably other things not coming to mind right now.
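For the connection and frontend parts, something along these lines is what I mean. This assumes MySQL through PDO, which you haven't actually said you're using; utf8Dsn() is a made-up helper and the host, database and credentials are placeholders:

```php
<?php
// Hypothetical helper: build a PDO DSN for MySQL that asks for a UTF-8
// connection. The charset part of the DSN is honoured from PHP 5.3.6 on.
function utf8Dsn($host, $dbname)
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Usage (placeholder credentials):
// $pdo = new PDO(utf8Dsn('localhost', 'example_db'), 'user', 'password');
//
// With mysqli the equivalent call is:
// $mysqli->set_charset('utf8mb4');
//
// And on the frontend, before any output is sent:
// header('Content-Type: text/html; charset=UTF-8');
```

The site that stays on latin-1 would keep its existing connection settings.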
April 3rd, 2017, 09:52 AM
They are 2 websites of the same brand, but they take their input data from different databases, one in UTF-8 and the other in Latin-1.
The 2 websites don't interact; each one is independent.
On this new server PHP's default_charset is set to UTF-8 and the PHP version is 5.6.30, so it should be UTF-8 by default.
Do strlen, substr and strstr mess up even on strings that don't contain special chars?
April 3rd, 2017, 10:14 AM
Functions like strlen work on binary strings, meaning they don't understand things like multibyte character encodings (such as UTF-8). You can use them for strings that definitely do not have multibyte characters, but if the strings might have those, such as user input, then you may need the mbstring versions.
strlen() for the byte length of a string is fine. strlen() for the character length of a string is not fine.
echo strlen("À"); // 2
echo mb_strlen("À"); // 1
It gets worse with functions like substr:
echo substr("Düsseldorf", 0, 2); // D� = \x44\xC3
echo mb_substr("Düsseldorf", 0, 2); // Dü = \x44\xC3\xBC
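To come back to the question about strings without special characters: on pure ASCII the byte functions and the mbstring functions agree, because every ASCII character is a single byte in UTF-8. A small sketch, where isAsciiOnly() is a made-up helper rather than a built-in:

```php
<?php
// Hypothetical helper: true when the string has no bytes outside ASCII.
function isAsciiOnly($s)
{
    // Any byte >= 0x80 means a non-ASCII character is present.
    return !preg_match('/[\x80-\xFF]/', $s);
}

$plain = "Dusseldorf";
$accented = "D\xC3\xBCsseldorf"; // "Düsseldorf" written as UTF-8 bytes

var_dump(isAsciiOnly($plain));                                 // bool(true)
var_dump(strlen($plain) === mb_strlen($plain, 'UTF-8'));       // bool(true)
var_dump(isAsciiOnly($accented));                              // bool(false)
var_dump(strlen($accented) === mb_strlen($accented, 'UTF-8')); // bool(false)
```

So strlen, substr and strstr are safe on strings you know are ASCII-only; anything that can come from users or the database is where the mbstring versions matter.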