#1
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2004
    Posts
    2,968
    Rep Power
    374

    MD5ing text results in Chinese?


    I am not sure what I am doing wrong. When I echo md5($text) to the screen, the output is English (Latin) characters, but when I open the file that I also append the result to, it appears to be Chinese?

    PHP Code:
    if ($handle) {
        while (($buffer = fgets($handle)) !== false) {
            // Strip the line ending before splitting into columns.
            $buffer = str_replace("\r\n", "", $buffer);
            $buffer = str_replace("\n", "", $buffer);

            $array = explode(",", $buffer);
            $email_column = $_POST['column'] - 1; // posted column number is 1-based
            $email = $array[$email_column];
            $hashed_email = md5($email);

            $array[] = $hashed_email;
            file_put_contents($upload_directory, implode(",", $array) . "\n", FILE_APPEND);
        }
    }

  2. #2
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    I'm not quite sure what to tell you except that the problem isn't with the md5 function. As long as you don't set the second argument to true, md5 will always return a 32-character string consisting of the lowercase ASCII letters a-f and the digits 0-9; it will never return anything outside of that.
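
A minimal sketch of that claim, using a made-up address as input. The helper name `is_hex_md5` is hypothetical and only there for illustration; `md5`, `bin2hex` and `preg_match` are standard PHP functions.

```php
<?php
// Hex mode (the default): 32 characters drawn from 0-9 and a-f.
$hex = md5('user@example.com');

// Raw mode (second argument true): 16 raw bytes, not printable text.
$raw = md5('user@example.com', true);

// Hypothetical helper: true when $value looks like a hex-mode MD5 digest.
function is_hex_md5(string $value): bool
{
    return preg_match('/\A[0-9a-f]{32}\z/', $value) === 1;
}

var_dump(is_hex_md5($hex));         // does the hex digest pass the check?
var_dump(strlen($raw));             // length of the raw digest in bytes
var_dump(bin2hex($raw) === $hex);   // raw digest, hex-encoded, versus hex digest
```

So any non-ASCII output has to come from how the bytes are written or read afterwards, not from the hash itself.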

    Maybe you're using the wrong character encoding when you open the file.
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  4. #3
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2004
    Posts
    2,968
    Rep Power
    374
    Yeah, thanks. I have used this before with no problem. The file is very big, though.

    What happens is that when I MD5 each entry in the file, I append it to another file. I looked at that output file while PHP was still working through the input. When I opened it at the beginning, it was in English. Can a weird character in the text cause it to output Chinese? If that were the case, though, why would it change all of the previous entries to Chinese as well?

    I didn't want to split this into several files because it was a huge file. My colleague did it anyway, and when I worked on the smaller files, no Chinese conversion happened?
  6. #4
  7. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,961
    Rep Power
    9397
    One way to get magical Chinese characters in a file is by incorrectly treating it as UTF-16. What $handle are you reading from? Doing anything else to the $upload_directory besides what you've posted?
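
A small sketch of that effect, assuming the mbstring extension is loaded. The function name `misread_as_utf16le` and the sample address are made up for illustration. It takes the plain ASCII bytes of an MD5 digest, pretends they are UTF-16LE, and re-encodes the result as UTF-8 so a terminal can show it.

```php
<?php
// Hypothetical helper: reinterpret bytes as UTF-16LE (mbstring assumed).
function misread_as_utf16le(string $asciiBytes): string
{
    // Each pair of bytes becomes one 16-bit code unit, low byte first.
    return mb_convert_encoding($asciiBytes, 'UTF-8', 'UTF-16LE');
}

$hash = md5('user@example.com');        // 32 ASCII bytes
$garbled = misread_as_utf16le($hash);   // 16 two-byte code units

echo $hash, "\n";
echo $garbled, "\n";
echo mb_strlen($garbled, 'UTF-8'), "\n"; // number of characters after the misread
```

Two hex characters such as "ab" are the bytes 0x61 0x62, which UTF-16LE reads as the single code point U+6261. Pairs of hex digits mostly land in CJK or other East Asian blocks, which is why the result looks Chinese.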
  8. #5
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2004
    Posts
    2,968
    Rep Power
    374
    Not doing anything else, and not specifying any encoding when opening the file either. I do know that the file was not UTF-8 but "little indian" according to Komodo.
    PHP Code:
    $original_file = $directory . $argv[1];
    $new_file_name = basename($original_file, ".csv") . "-modified.csv";
    $upload_directory = $directory . $new_file_name;

    // make sure the file is empty:
    file_put_contents($upload_directory, "");

    $handle = @fopen($original_file, "r");

    $count = 0;
    if ($handle) {
        while (($buffer = fgets($handle)) !== false) {
            print_r($buffer);

            // buffer unexpectedly had spaces, so I was trying to get rid
            // of them, but this wasn't working :(
            $buffer = preg_replace("/\s/", "", $buffer);
            $buffer = str_replace("\r\n", "", $buffer);
            $buffer = str_replace("\n", "", $buffer);

            $array = explode(",", $buffer);
            $email_column = $argv[2] - 1;
            $email = $array[$email_column];
            $hashed_email = md5($email);

            $array[] = $hashed_email;
            file_put_contents($upload_directory, implode(",", $array) . "\n", FILE_APPEND);

            // here the hashed email was in English, but I didn't look at
            // the output ALL the time, just towards the start of running
            // the script. Towards the end all the data was in English too.
            echo "$email : $hashed_email \n";
        }

        echo 'done';

        if (!feof($handle)) {
            echo "Error: unexpected fgets() fail\n";
        }
        fclose($handle);
    }
  10. #6
  11. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1045
    Hi,

    what you need to understand is that a text file doesn't contain characters. It contains a bunch of bits. Nobody except you knows how to interpret those bits as characters. So unless you explicitly set the character encoding when opening the file, you're likely to get random garbage. The editor/IDE will choose some encoding and then interpret the bits accordingly. If this guess is wrong, you'll see strange characters.

    In your case, the IDE has obviously chosen UTF-16LE ("little endian") -- as requinix already assumed. Since that's not the correct encoding, you see the bits misinterpreted as Chinese characters. Choose the correct encoding, and you'll see the correct characters.
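
If the input really is UTF-16LE, one option is to decode it while reading so that the output file is consistently UTF-8. The following is only a sketch under that assumption: it relies on the iconv extension for the `convert.iconv.*` stream filter, the function name `append_md5_column` is hypothetical, and the column index is zero-based here (the script above uses a 1-based argument).

```php
<?php
// Sketch: read a UTF-16LE CSV as UTF-8 and append an MD5 column.
// Assumes the iconv extension; the function name is hypothetical.
function append_md5_column(string $inPath, string $outPath, int $emailColumn): int
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outPath");
    }

    // Decode UTF-16LE bytes to UTF-8 as they are read from the stream.
    stream_filter_append($in, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ);

    $rows = 0;
    while (($line = fgets($in)) !== false) {
        // A byte order mark at the start of the file survives the
        // conversion as the UTF-8 sequence EF BB BF; drop it.
        if ($rows === 0 && strncmp($line, "\xEF\xBB\xBF", 3) === 0) {
            $line = substr($line, 3);
        }
        $fields = explode(',', rtrim($line, "\r\n"));
        // Hash the chosen column; a missing column is hashed as ''.
        $fields[] = md5($fields[$emailColumn] ?? '');
        fwrite($out, implode(',', $fields) . "\n");
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows; // number of lines processed
}
```

Because the filter works on the stream, the file is still processed line by line and does not need to be split or loaded into memory at once.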
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can't I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  12. #7
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2004
    Posts
    2,968
    Rep Power
    374
    I did open the file in the IDE and changed the encoding to UTF-8, but that didn't change anything.

    I also don't know why, when I run the script and look at the generated file midway, it shows the correct hash, yet when I look at the file after the script has finished its job, I get garbled output.

    BTW: the issue has been "solved" in that my colleague split the file into four different files, and when I tried them, they all came out perfect (I am hoping that when he split the file, he used the correct encoding :s )
  14. #8
  15. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1045
    Originally Posted by paulh1983
    I did open the file in the IDE and changed the encoding to UTF-8, but that didn't change anything.
    Sounds like you transcoded the file, that is, you took the wrong characters and encoded them as UTF-8. That of course gets you nowhere. You need to change the encoding setting your IDE uses to display the file content.

    If you don't know how to do it with your Komodo IDE, grab Notepad++, open the file and then select the correct encoding in the "encoding" tab.
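
Before picking an encoding in the editor, it can help to check whether the file starts with a byte order mark (BOM), since editors often use it to guess the encoding. A rough sketch follows; the function name `detect_bom` is hypothetical, and a missing BOM proves nothing about the encoding either way.

```php
<?php
// Hypothetical helper: name the encoding suggested by a file's BOM, or null.
function detect_bom(string $path): ?string
{
    // Only the first four bytes are needed.
    $head = file_get_contents($path, false, null, 0, 4);
    if ($head === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // UTF-32LE is checked before UTF-16LE because its BOM begins
    // with the same two bytes (FF FE).
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-16BE' => "\xFE\xFF",
    ];

    foreach ($boms as $name => $bom) {
        if (strncmp($head, $bom, strlen($bom)) === 0) {
            return $name;
        }
    }
    return null; // no BOM found; the encoding is still unknown
}
```

If this reports UTF-16LE for the generated file, the editor is likely honouring the BOM copied over from the input, and the ASCII hashes appended after it are being read two bytes at a time.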
