#1
    Post French characters and UTF-8


    Hello everyone,

    I have normal HTML code saved as a PHP file because I call other PHP files to include more HTML code such as my menu. In this case I am calling a form.

    Code:
    <?php include("InvoiceForm.php"); ?>
    The included PHP file contains only HTML; it populates the screen with the form's input fields.

    I am having issues with French characters. I have set the following <meta> tag inside the <head> tag of the file that calls the form:

    Code:
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" >
    It used to be set to windows-1252, which worked beautifully. I changed it to UTF-8 because I was getting an unrelated error in my console. I am new, so I read a lot of stuff, good and bad, and I can't tell the difference. Anyway, it was recommended that I switch to UTF-8. Shouldn't UTF-8 work just as well? It claims to support many languages and to be the wave of the future. Why do some of my French characters appear wrong on my form?
  2. #2
    Can you share a link to that page here?
  4. #3
    Hi,

    you can't just change the charset declaration. When you send Windows-1252-encoded data and claim that it's UTF-8, you get nonsense. The browser simply can't understand the document, because the content doesn't match the content type declaration.
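
To make the mismatch concrete, here is a minimal PHP sketch. It assumes the mbstring extension is available, and the variable names are only illustrative. In Windows-1252 the letter é is the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so a browser that has been told to expect UTF-8 cannot decode it.

```php
<?php
// "é" as stored by an editor that saves in Windows-1252: one byte, 0xE9.
$cp1252Bytes = "\xE9";

// The same letter in UTF-8 is the two-byte sequence 0xC3 0xA9.
$utf8Bytes = "\xC3\xA9";

// A lone 0xE9 is not well-formed UTF-8, which is why a page declared as
// UTF-8 but saved as Windows-1252 shows broken characters.
$looksLikeUtf8 = mb_check_encoding($cp1252Bytes, 'UTF-8');

// Transcoding rewrites the bytes so that they match the declaration.
$transcoded = mb_convert_encoding($cp1252Bytes, 'UTF-8', 'Windows-1252');

var_dump($looksLikeUtf8);             // bool(false)
var_dump($transcoded === $utf8Bytes); // bool(true)
```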

    Before you can declare the content as UTF-8, it has to actually be UTF-8. That is, you have to go through all PHP files and all external resources (databases, template files etc.) and transcode them to UTF-8. You'll also have to adjust your database connection and change the encoding to UTF-8. When that's done, you can change the declaration.

    As I already told you last time, using the meta element to declare the character encoding of the document makes no sense. How is the browser supposed to understand the element when the information necessary to understand the document is in that very element? Yes, this does in fact work under very specific circumstances, but the correct way is to declare the encoding in the response headers from the server. That's how you tell the browser what the data means.
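
A minimal sketch of declaring the encoding in the response headers from PHP rather than in a meta element. The helper name `contentTypeHeader` is made up for this example; `header()` and `headers_sent()` are standard PHP functions. The call has to happen before any output is sent, so it would presumably go at the very top of the page that includes InvoiceForm.php.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function contentTypeHeader(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers can only be set before the first byte of output has been sent.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

This only tells the browser how to read the bytes. The files themselves still have to be saved as UTF-8 for the declaration to be true.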
    Last edited by Jacques1; December 26th, 2013 at 02:19 AM.
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can't I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  6. #4
    Originally Posted by hdewantara
    Can you share a link to that page here?
    I modified my home page. The home page is not connected to anything. It's straight HTML (Home_Initial.html). It does have links to open other pages.

    www.alpagawasi.com
    Last edited by PacaMama; December 26th, 2013 at 07:43 AM. Reason: Modified a page that is accessible on web
  8. #5
    Hi there PacaMama,


    are you using "HTML Entities" for the French letters?

    Example...
    Code:
    
    & eacute; for é
    & egrave; for è
    
    or
    &#x00e9; for é
    &#x00e8; for è
    ...with no space between the & and the e.
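
For reference, a small PHP sketch showing that the named and the numeric form refer to the same character. It uses the standard `html_entity_decode()` function; the variable names are just for illustration.

```php
<?php
// Named and numeric references for the same letter.
$named   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$numeric = html_entity_decode('&#x00e9;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Both decode to "é", which is the byte pair 0xC3 0xA9 in UTF-8.
var_dump($named === "\xC3\xA9"); // bool(true)
var_dump($named === $numeric);   // bool(true)
```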

    coothead

    Comments on this post

    • Jacques1 disagrees : C'mon, the times when there was only ASCII and EBCDIC are long over.
    Last edited by coothead; December 26th, 2013 at 07:38 AM.
  10. #6
    Originally Posted by coothead
    Hi there PacaMama,


    are you using "HTML Entities" for the French letters?

    Example...
    Code:
    
    & eacute; for é
    & egrave; for è
    ...with no space between the & and the e.

    coothead
    No I am not.
  12. #7
    Originally Posted by Jacques1
    ... That is, you have to go through all PHP files and all external resources (databases, template files etc.) and transcode them to UTF-8. ...
    Hello Jacques1, what does "transcode to UTF-8" mean?

    I am new at this, even though I have managed to do some more complex stuff, so I still ask lots of noob questions. I made the meta tag switch on my home page (www.alpagawasi.com) because it shows the issue more easily and has fewer connections to other files.

    If I understand at least partly, the problem comes from my host and I have to change something there?
  14. #8
    Originally Posted by PacaMama
    No I am not.
    ...well, perhaps you should.
  16. #9
    Okay, it works, but I don't want to do this everywhere on my site. My business is in Quebec, so I must have a French version of all my pages. What I may end up doing is keeping the regular pages set to windows-1252 and fixing only the printable (French) form with entities.
  18. #10
    Originally Posted by PacaMama
    Okay, it works, but I don't want to do this everywhere on my site.
    You may not want to, but that's how HTML is designed to handle those characters. If you want consistent results, you need to use the character entities that HTML provides, as intended.

    Even setting the charset for the page is not entirely foolproof. OS settings can override it in some browsers. The charset only defines the page's input. Its output is determined by the browser and the OS. This is why entities are used.

    Comments on this post

    • Jacques1 disagrees : C'mon, the times when there was only ASCII and EBCDIC are long over.
    Don't like me? Click it.

    Scripting problems? Windows questions? Ask the Windows Guru!

    Stay up to date with all of my latest content. Follow me on Twitter!

    Help us help you! Post your exact error message with these easy tips!
  20. #11
    You can either stay with windows-1252 encoding, or upgrade to UTF-8.

    But as Jacques1 said, I think the more important thing is to be consistent: if you've typed and saved your web content as windows-1252, then declare it (with a meta tag or some other way) as windows-1252. The same goes for UTF-8.

    Here's some easy reading for the why's of UTF-8:
    Last edited by hdewantara; December 27th, 2013 at 01:42 AM.
  22. #12
    Please, please keep away from HTML entities for non-ASCII characters. I have no idea why they're still being recommended.

    The times when people were restricted to ASCII and had to fumble with HTML entities whenever they needed other characters are long over. Nowadays, every single browser supports modern encodings like UTF-8. There's absolutely no reason for sticking to ASCII anachronisms. In fact, running every piece of text through an entity converter would simply be insane.

    Switching to Unicode and UTF-8 is a very wise decision. Don't let anybody tell you differently. This combination is the de-facto standard of the modern Internet.

    How the transcoding works depends on your text editor. In Notepad++ (which is what I use), you can simply choose "Convert to UTF-8 without BOM" in the "Encoding" menu.
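
For a larger number of files, the same conversion could also be scripted instead of done by hand in an editor. A rough sketch, assuming the files really are Windows-1252 at the moment and that the mbstring extension is installed. The function name `transcodeFileToUtf8` is made up for this example, and it overwrites the file in place, so make a backup first.

```php
<?php
// Hypothetical helper: rewrites one file from Windows-1252 to UTF-8 (no BOM).
// Returns true if the file was converted, false if it was left alone.
function transcodeFileToUtf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // Skip files that are already valid UTF-8 (this includes pure ASCII),
    // so that running the script twice does not double-encode anything.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return false;
    }

    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
    if (file_put_contents($path, $utf8) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return true;
}
```

Calling it as `transcodeFileToUtf8('InvoiceForm.php')` would convert that one file; looping over a directory works the same way.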
    Last edited by Jacques1; December 27th, 2013 at 02:53 AM.
  24. #13
    Originally Posted by Jacques1
    Please, please keep away from HTML entities for non-ASCII characters. I have no idea why they're still being recommended.
    Because they are still required in certain circumstances.

    The necessity for HTML entities is determined by the DTD. If you're using any kind of XHTML, for instance, entities are always required regardless of UTF-8 support because they are not allowed by the DTD. The page will not validate.

    Some special characters, such as <, >, and &, must be written as entities in any type of document, regardless of encoding.

    I made the recommendation to use html entities because it appears the OP may be using a mix of different encodings and has not suggested any doctype or HTML/XHTML variant. The page may well be rendering in Quirks Mode. Without any further information, and without changing the encoding of every page (as he said he didn't want to do), the only way to ensure consistent rendering across all browsers, OSes, and languages, while ensuring valid markup, is to use HTML entities.

    I'm not suggesting that proper use of UTF-8 isn't the best practice.
    Last edited by Nilpo; December 27th, 2013 at 08:38 AM.
  26. #14
    HTML entities are never required except for the escaping of functional characters. All (X)HTML flavors and all contemporary platforms fully support Unicode delivered as UTF-8. Some ancient Windows 9x systems do require additional libraries, but the people using them have much worse problems than getting the right characters at www.alpagawasi.com.
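
As an illustration of that distinction, PHP's `htmlspecialchars()` escapes only the characters that have a function in HTML and leaves accented letters alone. A small sketch, assuming the source file itself is saved as UTF-8; the sample string is invented.

```php
<?php
// Sample text containing both markup characters and French letters.
$text = 'Crème brûlée <b> & café';

// Only <, >, & and quotes are turned into entities; è, û, é stay as UTF-8.
$escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $escaped, "\n"; // Crème brûlée &lt;b&gt; &amp; café
```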

    Sorry, but recommending the use of HTML entities in this case is very bad advice and makes absolutely no sense. It probably takes 10 times as long to go through the documents and replace all non-ASCII characters as it takes to simply transcode them into UTF-8. And afterwards, you're left with an unreadable, uneditable mess of legacy techniques.



    Originally Posted by Nilpo
    ... changing the encoding of every page (as he said he didn't want to do) ...
    No. He/she was referring to the HTML entities. That's what he/she doesn't want to do on every page (which is very understandable).



    Long story short: Do use UTF-8. If you don't have a proper text editor, download Notepad++. Choose "Encoding" → "Convert to UTF-8 without BOM" (the "Convert" entry rewrites the file; "Encode in" only changes how the existing bytes are interpreted). That's it. It may take a while to go through all documents, but it will save you a lot of time and trouble later.
    Last edited by Jacques1; December 27th, 2013 at 10:21 AM.