#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    288
    Rep Power
    0

    Database collation for French and English: which one to use?


    Hi everyone, I just started a new project and I must accept French accents (à, è, é, ê, ç, etc.).

    I ran into issues and, after some reading, I saw that 'utf8_general_ci' might be the best choice of table collation to avoid trouble.

    So on every table I created, I set 'utf8_general_ci' as the table collation, and also on every field I created.

    All my PHP pages are encoded in UTF-8, and this declaration is also on every page:
    Code:
    <meta charset="utf-8">

    I tested my script as I was building it and it worked fine, but then I noticed that I hadn't tested French accents yet.

    So I typed a couple of them in a form to add entries and printed them just below the form, and it seemed to work well...

    Up until I checked the database: on the webpage the accents look right, but in the database (phpMyAdmin) they are garbled (for example, é shows up as Ã©).

    If I enter French accents directly in that field with phpMyAdmin (à è é ê ç), they display correctly in phpMyAdmin, but on the webpage I get this: �
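    The � is what a browser shows when bytes are not valid UTF-8. Here is a minimal Python sketch (for illustration only, not MySQL or PHP code) of how that can happen. It assumes, as the thread later confirms, that the site's connection uses latin1 while phpMyAdmin uses UTF-8. On the way out, the correctly stored "à" becomes its single latin1 byte, and a page declared as UTF-8 cannot decode that lone byte.

```python
# "à" stored correctly (entered through phpMyAdmin over a UTF-8 connection).
stored = "à"

# Assumption: the site reads it back over a latin1 (cp1252) connection,
# so MySQL converts it to the single byte 0xE0.
sent_to_page = stored.encode("cp1252")

# The page declares <meta charset="utf-8">, but 0xE0 on its own is not
# valid UTF-8, so the browser substitutes U+FFFD, the replacement character.
shown = sent_to_page.decode("utf-8", errors="replace")

print(sent_to_page, shown)
```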

    It seems that I don't have the right collation, and I can't seem to find the right one. Could someone tell me the right collation to use for both tables and fields to get French accents right both on the webpage and in phpMyAdmin?

    Thanks a lot.
    Last edited by ExportA; July 19th, 2017 at 09:57 PM.
#2
    Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    First, you're confusing charset with collation. The charset is responsible for converting between bytes and characters, so it controls how everything appears. The collation is basically a set of rules for comparing characters (not bytes). For example, it decides whether an uppercase letter equals its lowercase form, or even whether an accented letter equals its plain counterpart. Most of the time you'll want <charset>_general_ci for that (utf8_general_ci for utf8).
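    To make the distinction concrete, here is a small Python sketch, used purely for illustration. Encoding and decoding is the charset's job. The comparison function is a rough stand-in for a case- and accent-insensitive collation such as utf8_general_ci. The helper `general_ci_key` is hypothetical and does not reproduce MySQL's actual weight tables.

```python
import unicodedata

# Charset: converts between characters and bytes.
text = "é"
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9

def general_ci_key(s: str) -> str:
    """Hypothetical approximation of a case- and accent-insensitive
    collation: decompose accented letters, drop the combining marks,
    then fold case. This is NOT MySQL's real algorithm."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def ci_equal(a: str, b: str) -> bool:
    # Collation: rules for comparing characters, not bytes.
    return general_ci_key(a) == general_ci_key(b)

print(utf8_bytes.hex(" "), latin1_bytes.hex(" "))
print(ci_equal("é", "É"), ci_equal("é", "e"), ci_equal("é", "a"))
```

    The same character has different bytes in the two charsets, yet the comparison never looks at bytes at all.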

    But yes, you're right: you aren't using UTF-8 everywhere. Since you've already checked the pages and your database, I bet the answer is in the database connection settings.

    What does this query
    Code:
    SHOW VARIABLES LIKE '%char%'
    return when you run it from your site? (Not phpMyAdmin.)
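    In that output, the variables that matter here are character_set_client, character_set_connection and character_set_results, because they describe the connection the script itself is using. The Python sketch below makes two assumptions. First, the rows have already been fetched as (name, value) pairs by whatever driver the site uses. Second, the sample values are made up to show a latin1 connection. `non_utf8_connection_vars` is a hypothetical helper.

```python
# Connection-level variables that control how bytes sent and received
# by the client are interpreted.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)

def non_utf8_connection_vars(rows):
    """Hypothetical helper: return the connection charset variables
    that are not UTF-8 (utf8 or utf8mb4).

    `rows` is assumed to be an iterable of (Variable_name, Value) pairs,
    i.e. the result of SHOW VARIABLES LIKE '%char%'."""
    settings = dict(rows)
    return {
        name: settings.get(name)
        for name in CONNECTION_VARS
        if not (settings.get(name) or "").startswith("utf8")
    }

# Made-up output from a site whose connection still uses latin1.
sample_rows = [
    ("character_set_client", "latin1"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8"),
    ("character_set_results", "latin1"),
    ("character_set_server", "latin1"),
]
print(non_utf8_connection_vars(sample_rows))
```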
#3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    288
    Rep Power
    0
    I found an answer before you posted, then went to bed, too tired. Now I'm back to share the solution I found that works.

    It seems you were right about the database connection settings, because that is exactly where the problem got solved. Someone elsewhere with a similar problem was told to put this line just below the mysql_select_db line. I tried it, and it works. Now every time I hit save in a form, every French accent is saved exactly as it was typed and loaded back the same way.

    Code:
    mysql_set_charset('utf8');
    By putting this simple line below mysql_select_db, everything was resolved.
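    Why did the page look fine while phpMyAdmin showed garbage? Here is one plausible explanation. It assumes the connection defaulted to latin1, which in MySQL is effectively Windows-1252.
    1. The browser sent UTF-8 bytes.
    2. MySQL read each byte as a separate latin1 character and converted those characters to UTF-8 for the utf8 column.
    3. Reading back over the same latin1 connection reversed that conversion, so PHP got the original bytes.
    4. phpMyAdmin talks UTF-8, so it showed the double-encoded text.
    The Python sketch below simulates that round trip. It is not MySQL code. (Side note: the mysql_* functions were removed in PHP 7.0. With mysqli, the equivalent call is mysqli_set_charset().)

```python
# Assumption: the site's connection charset is latin1, which MySQL treats
# as cp1252. Python's cp1252 codec rejects five bytes (0x81, 0x8D, 0x8F,
# 0x90, 0x9D) that MySQL's latin1 accepts, so this simulation is only
# faithful for characters like the French ones used here.

def store_over_latin1_connection(page_bytes: bytes) -> str:
    # Each incoming byte is read as one cp1252 character, and those
    # characters are what end up in the utf8 column.
    return page_bytes.decode("cp1252")

def read_over_latin1_connection(column_text: str) -> bytes:
    # On the way out, the characters are converted back to cp1252 bytes,
    # which undoes the first conversion.
    return column_text.encode("cp1252")

typed = "è é ê ç"
sent = typed.encode("utf-8")                # what the UTF-8 form submits
stored = store_over_latin1_connection(sent)

print(stored)                                               # what phpMyAdmin shows
print(read_over_latin1_connection(stored).decode("utf-8"))  # what the page shows
```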
#4
    Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    Note that doing that won't fix data already stored in a database - that takes its own work. But that doesn't sound like it is an issue for you.
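    If someone does have double-encoded rows, the usual fix is to reverse the extra conversion. Below is a minimal Python sketch. It assumes the text was mangled in exactly one way: UTF-8 bytes read as cp1252. `fix_double_encoded` is a hypothetical helper. Any real repair should be tried on a copy of the data first.

```python
def fix_double_encoded(text: str) -> str:
    """Hypothetical helper: undo one layer of UTF-8-read-as-cp1252 mojibake.

    If the text does not round-trip (it contains characters outside
    cp1252, or the resulting bytes are not valid UTF-8), it is assumed
    to be correct already and is returned unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(fix_double_encoded("Ã©tÃ©"))    # repaired
print(fix_double_encoded("été"))      # already correct, left alone
print(fix_double_encoded("good day")) # plain ASCII, left alone
```

    This is only a heuristic. It could misfire on legitimate text that happens to re-encode into valid UTF-8, which is rare in ordinary French text but not impossible.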
#5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    288
    Rep Power
    0
    Originally Posted by requinix
    Note that doing that won't fix data already stored in a database - that takes its own work. But that doesn't sound like it is an issue for you.
    Yes, it won't fix what's already in the DB, but like you said, it doesn't matter since it's a new system and the data inserted so far is only for testing purposes.

    On a side note, I might have a new 'issue', not a big problem. I save files on the server with Notepad++, which is configured to create new documents as UTF-8. When I create a new document, it shows as UTF-8. I type only 'good day' and save it to the server, but when I open it again, the bottom right corner of Notepad++ says 'ANSI'.

    I then change the encoding to UTF-8, save it again and close it, and when I reopen it, it's still ANSI.

    Now, I've read that a file without an encoding declaration (meta charset) is assumed to be ANSI when opened, unless it contains a UTF-8 (non-ASCII) character. So I added an 'é' to make 'good day é', and now Notepad++ automatically recognizes the document as a UTF-8 file instead of ANSI... BUT, if I open the file in a web browser, I see this: good day Ã©

    After that, I wrote a small, correct file like this:

    Code:
    <!doctype html>
    <html>
    <head>
      <meta charset="utf-8">
    </head>
    <body>
    
    good day é
    
    </body>
    </html>
    Now, with that in the file, Notepad++ says it is UTF-8 AND the web browser correctly shows 'good day é', so everything seems fine.

    If I remove the 'é' from that very file (still UTF-8), save and close it, then reopen it, Notepad++ says the file is 'ANSI'.

    From what I understand, Notepad++ will consider a file 'ANSI' until there's a UTF-8 (non-ASCII) character in it. BUT, if you want that character to display correctly in a web browser, you need to declare the file as UTF-8 in the meta tag.
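    The browser side of this fits in a few lines of Python. The file's bytes are the same either way; only the charset used to decode them changes. Windows-1252 stands in here for whatever fallback the browser picks when there is no meta charset. That is an assumption: the actual fallback depends on the browser and its locale settings.

```python
# The bytes Notepad++ writes for "good day é" when saving as UTF-8.
file_bytes = "good day é".encode("utf-8")

# With <meta charset="utf-8"> the browser decodes the bytes as UTF-8.
with_meta = file_bytes.decode("utf-8")

# Assumption: without the declaration, the browser falls back to a legacy
# encoding such as Windows-1252, which turns the two bytes of "é"
# into two separate characters.
without_meta = file_bytes.decode("cp1252")

print(with_meta)
print(without_meta)
```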

    Now..., I made tests with all this in Ultra-Edit too, and it always reports the file as UTF-8 and never ANSI. So I guess it's a kind of problem with the way Notepad++ detects file encoding. BUT, whether I use Ultra-Edit or Notepad++, I need to put all the code stated earlier to display 'good day é' and not 'good day Ã©'.

    I really think the issue is how Notepad++ detects and/or reports the file encoding. Ultra-Edit doesn't have that issue (it always detects UTF-8), while Notepad++ 'needs' a UTF-8 character before it detects and reports the file as UTF-8 rather than ANSI.
#6
    Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    Originally Posted by ExportA
    From what I understand, Notepad++ will consider a file 'ANSI' until there's a UTF-8 (non-ASCII) character in it. BUT, if you want that character to display correctly in a web browser, you need to declare the file as UTF-8 in the meta tag.
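    Exactly. "ANSI" in Notepad++ means the legacy Windows code page (typically Windows-1252 for English and French). That code page is a superset of ASCII, so its bytes for the characters on an English-language keyboard are identical to UTF-8's. In fact, those bytes are identical in almost every other character encoding out there, too.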

    It's like hearing someone answer a question with "no": that word is so common in (European) languages that you can't tell which language the speaker is using from it alone. You have to wait until they say more.
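    That "wait until they say more" idea is roughly what any encoding detector has to do. Below is a simplified guess in Python, for illustration only and not Notepad++'s actual algorithm. A UTF-8 byte order mark settles the question. Pure ASCII is ambiguous, so an editor may simply label it "ANSI". Anything else is called UTF-8 if it decodes cleanly, and ANSI otherwise.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Simplified, hypothetical encoding guess for a text file."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if all(b < 0x80 for b in data):
        # Identical bytes in ASCII, UTF-8 and Windows-1252: no way to tell.
        return "ASCII (ambiguous, could be labelled ANSI)"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "ANSI"

print(guess_encoding(b"good day"))
print(guess_encoding("good day é".encode("utf-8")))
print(guess_encoding("good day é".encode("cp1252")))
print(guess_encoding(codecs.BOM_UTF8 + b"good day"))
```

    This is why a BOM (Notepad++ offers "UTF-8-BOM") lets an editor recognise even a plain-ASCII file as UTF-8. Be careful with PHP files, though: the BOM bytes are sent as output before any code runs.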

    Originally Posted by ExportA
    Now..., I made tests with all this in Ultra-Edit too, and it always reports the file as UTF-8 and never ANSI
    It sounds like Ultra-Edit just doesn't do an ANSI mode, and assumes everything is UTF-8 unless it finds out otherwise. That's a very reasonable approach.

    Originally Posted by ExportA
    So I guess it's a kind of problem with the way Notepad++ detects file encoding
    Eh, it's not so much a problem as just a different approach. Detecting file encodings can actually be quite difficult, even impossible in some situations, and it seems the Notepad++ folks and the Ultra-Edit folks simply handle it differently.

    Originally Posted by ExportA
    BUT, whether I use Ultra-Edit or Notepad++, I need to put all the code stated earlier to display 'good day é' and not 'good day Ã©'.
    Yeah, basically. It's an HTML thing.
#7
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    288
    Rep Power
    0
    Thanks a lot for your time; I understand all this much better now. It was blurry in my mind and I didn't know exactly why it sometimes behaved the way it did, but now I think I've figured it out.

    Thanks again!
