Thread: Tables charset

    #1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1

    Tables charset


    What charset should i use for greek characters !? I currently have utf8 in the html code and utf8_unicode_ci for the database and tables. The greek characters or any other characters are being displayed on the site with no problems, but when i check the table fields all i see are strange symbols. If i want to do a search on the site using the greek characters i won't get any results at all since in the database aren't showing properly.
    What charset should i use that would work with normal characters and with greek characters !?
    Thank you.
  2. #2
  3. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    UTF-8 is good. Keep using it.

    If you can't see the characters then that's something you need to fix on your side. As you said, everything else is working, right? So what are you using to view the database?
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    Using phpmyadmin to login and view databases and tables. I thought something was wrong with the database since i am using some greek characters to search a certain article on the site, but it didn't popup
  6. #4
  7. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    phpMyAdmin is probably working correctly... What does it output if you manually run the query
    Code:
    SHOW VARIABLES LIKE '%char%'
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    http://i.imgur.com/ZUsU3da.png Hmm that latin1 thing seems our of place
    edit: ok, used
    PHP Code:
    set global character_set_server utf8
    and restarted the sql server. it shows as utf8 now but still the weird symbols appear instead of the proper characters. do i need to do something else to the table/field itself ?
    Last edited by pwpx2; July 9th, 2017 at 03:08 PM.
  10. #6
  11. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    character_set_server isn't as important if the others are set correctly, which they basically are.

    Sounds like your site is not actually using the right charsets when interacting with the database. It's storing the data incorrectly, but it's doing so consistently which is why the site seems to be working.

    Next you need to figure out how bad the damage will be to fix it.

    1. Are all your databases, tables, and columns using utf8 or utf8mb4?
    2. What does the same SHOW VARIABLES query return when executed from within your site code?
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    The database i am interesting into is using utf8. The others are from the mail server. In the code when called it says
    PHP Code:
    character_set_client:latin1
    character_set_connection
    :latin1
    character_set_database
    :utf8
    character_set_filesystem
    :binary
    character_set_results
    :latin1
    character_set_server
    :latin1
    character_set_system
    :utf8
    character_sets_dir
    :/usr/share/mysql/charsets
    Hmm, in the header of the site i have
    PHP Code:
    <meta http-equiv='Content-Type' content='text/html;charset=UTF-8' /> 
    Last edited by pwpx2; July 10th, 2017 at 04:28 AM.
  14. #8
  15. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    For UTF-8 to work properly you have to use the same charset in multiple places:
    1. In your HTML (done)
    2. In your code files, if you write any non-ASCII characters in them (probably not an issue)
    3. With your database connection (not done) - that means the client, connection, and results variables should all say utf8 or utf8mb4
    4. With your table columns (done) which inherit their charset from the table encoding if not specified, which inherits its encoding from the database encoding if not specified, which inherits its encoding from the server default

    As it is, you're putting UTF-8 bytes into your queries where MySQL thinks that they are actually latin1 bytes. So it will then convert the data to use UTF-8 to store in the database. When you get data out, it will decode back to latin1 where your code will think that it's UTF-8. The result is that the data is being double-encoded as UTF-8; as I said, your site is consistent in this behavior so it's not obvious that it's happening. phpMyAdmin does not have the double-encoding problem so when you use it to look at your database you will see what everything really is.

    For example, the character '' (as seen on your site) in UTF-8 uses the bytes C3 A1. In latin1 those same bytes represent the string "á" (what you would see in phpMyAdmin).

    To fix this you have to change your code and your database at the same time - which could mean taking down your site for a couple minutes. If you don't do it at the same time then you'll keep putting bad data in and your site will get bad data out. How's that sound?
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    Hmm, ok so which steps do i need to do !? Its ok if the site will be offline for a few minutes
  18. #10
  19. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    In your codebase, find the stuff that connects to the database and make it use UTF-8. That depends on how the connection stuff works, but worst case you can do a query
    Code:
    SET NAMES 'utf8'
    right after you connect. You probably don't need to do anything else with the code.

    Have the changes ready to go/deploy/whatever for the live site but don't do it yet.

    In the database,
    0. Make a backup if you don't have one already. This could be as simple as making a bunch of copies of tables (with data) somewhere else.
    1. Set tables and databases to use utf8 as the default charset (if they aren't already) using ALTER TABLE and ALTER DATABASE.
    2. Find all the columns that have issues and fix their data...

    There are two ways to fix the data. Below I'm using "tbl" as an example table with "col" being a column to fix.

    a) If the column is already utf8 then
    Code:
    UPDATE tbl SET col = CONVERT(CONVERT(col USING latin1) USING BINARY)
    The data going in was UTF-8 bytes (from your code) interpreted as latin1 (due to the charset settings shown with that SHOW VARIABLES query) then converted to UTF-8 (because the column is utf8).
    This query does the reverse: take the UTF-8 characters, convert to latin1 (inner function), then reinterpret the bytes as UTF-8 (by first going through the "binary" charset, which does not do charset conversions).

    b) If the column is latin1 then
    Code:
    ALTER TABLE tbl CHANGE col col BLOB;
    ALTER TABLE tbl CHANGE col col [the old spec, eg, VARCHAR(100) NOT NULL] CHARACTER SET utf8;
    This uses the same process as in (a) but will also fix the column type at the same time.


    I suggest putting the queries you will need to run together into a file so you can execute them all at once rather than one at a time.


    But before actually doing that, test it out - which you can do while the site is still running:

    Make a copy of some affected table, preferably one that isn't too large,
    Code:
    CREATE TABLE sometemporaryname LIKE tbl;
    INSERT INTO sometemporaryname SELECT * FROM tbl;
    then look at it in phpMyAdmin. It should be the same as before. Then try fixing the data and look at it again.


    When you're ready, take the site down, run the file of queries, check the output to make sure there weren't problems, then deploy the code changes. Use phpMyAdmin to look through the tables to make sure they look good, then bring the site back up.

    It could be that your code change is just making a quick edit in a file on the live site, and that you have a good database server without too many tables/columns or data to fix. If so then the entire process may only take a few seconds, in which case you can probably just do the changes with the site still running. Even 10 seconds isn't much of a problem for a medium-sized site with little traffic. I don't know - that's something you'll have to decide.


    If you're uneasy about this then don't rush it. Look through the tables to see just how many columns you need to fix, how much data there is in each table to be fixed, and how you need to do the fixes, and you can get a better feel for how much time and effort the conversion will take. Remember that the site is working just fine right now so this isn't particularly urgent.
    If you really want to be sure, clone everything and try fixing the clones to see how it goes.
    Last edited by requinix; July 10th, 2017 at 01:44 PM.
  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    Yep that worked. No more weird symbols in database and this looks like
    PHP Code:
    character_set_client:utf8
    character_set_connection
    :utf8
    character_set_database
    :utf8
    character_set_filesystem
    :binary
    character_set_results
    :utf8
    character_set_server
    :utf8
    character_set_system
    :utf8
    character_sets_dir
    :/usr/share/mysql/charsets
    but when i try to search it still can't give any results. And yes, the table/fields from where it needs to pull the data shows properly this time.
  22. #12
  23. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,212
    Rep Power
    9644
    What is the raw search query being generated? What if you run it in phpMyAdmin?
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    45
    Rep Power
    1
    Fixed it eventually. Was a problem in the code.
    Thank you

IMN logo majestic logo threadwatch logo seochat tools logo