#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2004
    Posts
    136
    Rep Power
    10

    Reading special characters in a file


    Hi,

    I'm trying to write a program that replaces some very special characters in an XML file. The characters I need to replace are '' and '' (not the same character as '>').

    However, when I create a StreamReader on the file and read the content, C# can't seem to read the special characters when the XML file is saved as ANSI; it just reads '' instead. If the file is saved as UTF-8, it reads them and replaces the symbols without any problem.

    Is there a way to convert the file directly to UTF-8 without losing the special characters inside before reading it? Would anyone be so kind as to give me an example?

    If not, how should I go about solving this, assuming the XML file will always be saved as ANSI?

    Thanks!

    Here is my code:

    Code:
    using System;
    using System.IO;
    using System.Text;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Default input file; a path passed on the command line overrides it.
                string path = @"C:\Documents and Settings\XPMUser\Desktop\Concord\ORDERS.XML";
                if (args.Length > 0)
                {
                    path = args[0];
                }

                // No encoding is given here, so StreamReader assumes UTF-8.
                string content;
                using (StreamReader sr = new StreamReader(path))
                {
                    content = sr.ReadToEnd();
                }

                // Replace the special characters and turn commas into periods.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < content.Length; i++)
                {
                    if (content[i] == '')
                    {
                        sb.Append('');
                    }
                    else if (content[i] == '')
                    {
                        sb.Append('');
                    }
                    else if (content[i] == ',')
                    {
                        sb.Append('.');
                    }
                    else
                    {
                        sb.Append(content[i]);
                    }
                }
                string result = sb.ToString();

                // Write the replaced text back, not the unmodified content.
                using (StreamWriter sw = new StreamWriter(path))
                {
                    sw.Write(result);
                }

                Console.Write(result);
                Console.ReadLine();
            }
        }
    }
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2004
    Posts
    136
    Rep Power
    10
    I just tried changing the encoding this way:

    Code:
                    byte[] data = new byte[test.Length];
                    test.Read(data, 0, (int)test.Length);
                    int testlength = (int)test.Length;
                    Encoding.Convert(Encoding.ASCII, Encoding.UTF8, data);
                    test.Close();
    
                    FileStream testwrite = new FileStream(path, FileMode.Truncate, FileAccess.Write);
                    testwrite.Write(data, 0, testlength);
                    testwrite.Close();
    
    
                    content = Encoding.UTF8.GetString(data, 0, testlength);
    But it seems like the "special characters" get removed in the process. Any ideas, anyone?
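Two things in the snippet above probably explain the loss. `Encoding.Convert` returns a new byte array and leaves `data` untouched, so its result is discarded here. Also, `Encoding.ASCII` only covers 7-bit characters, so any byte above 127 decodes as '?'. Here is a minimal sketch of the difference. It assumes ISO-8859-1 as a stand-in for the file's ANSI code page (the thread does not say which one is in use) and uses the byte 0xBB as a hypothetical special character; `ConvertSketch` and `ToUtf8` are illustrative names.

```csharp
using System;
using System.Text;

public static class ConvertSketch
{
    // Converts bytes from a source encoding to UTF-8 and returns the NEW array.
    public static byte[] ToUtf8(byte[] data, Encoding source)
    {
        return Encoding.Convert(source, Encoding.UTF8, data);
    }

    public static void Main()
    {
        // 'a', then 0xBB (hypothetical special character), then 'b'.
        byte[] data = { 0x61, 0xBB, 0x62 };

        // ASCII cannot represent 0xBB, so it is replaced by '?' (0x3F).
        byte[] viaAscii = ToUtf8(data, Encoding.ASCII);

        // Assumption: ISO-8859-1 stands in for the ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] viaLatin1 = ToUtf8(data, latin1);

        Console.WriteLine(BitConverter.ToString(viaAscii));  // 61-3F-62
        Console.WriteLine(BitConverter.ToString(viaLatin1)); // 61-C2-BB-62
    }
}
```

With the right source encoding, the special byte becomes the two-byte UTF-8 sequence C2 BB instead of a question mark.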
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    1
    Rep Power
    0
    Have you tried using the Unicode number instead of the literal in the comparisons?
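A small sketch of what that suggestion could look like. The thread's actual characters are not known, so U+00BB and the replacement '>' are hypothetical stand-ins, as are the names `EscapeSketch` and `ReplaceSpecial`. Writing the character as a `\u` escape keeps the source file itself free of non-ASCII text, so the comparison does not depend on how the .cs file was saved.

```csharp
using System;
using System.Text;

public static class EscapeSketch
{
    // Hypothetical stand-ins: the real characters are not given in the thread.
    const char Special = '\u00BB';   // right-pointing double angle quotation mark
    const char Replacement = '>';

    // Returns a copy of content with every Special replaced by Replacement.
    public static string ReplaceSpecial(string content)
    {
        var sb = new StringBuilder(content.Length);
        foreach (char c in content)
        {
            sb.Append(c == Special ? Replacement : c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceSpecial("a\u00BBb")); // prints "a>b"
    }
}
```

This only helps once the file has been decoded correctly; if the reader uses the wrong encoding, the character never reaches the comparison intact.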
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2004
    Posts
    136
    Rep Power
    10
    Thanks for the reply,

    I figured it out :-)


    Solution:

    Code:
    StreamReader sr = new StreamReader(path, Encoding.Default);
    content = sr.ReadToEnd();
    Encoding encoding = sr.CurrentEncoding;
    sr.Close();
    Then I added a check: if the encoding is not UTF-8, do the following:

    Code:
    byte[] encBytes = encoding.GetBytes(content); 
    byte[] utf8Bytes = Encoding.Convert(encoding, Encoding.UTF8, encBytes);
    After that I just wrote the value of Encoding.UTF8.GetString(utf8Bytes) back to the file, and the result was correct.
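For reference, the same idea can be sketched more compactly: decode the file with an explicit source encoding, then write the string back as UTF-8. This is a sketch under assumptions, not the poster's exact code. ISO-8859-1 stands in for the ANSI code page, the temp file and the names `ReencodeSketch` and `ReencodeToUtf8` are illustrative, and `Encoding.Default` is avoided because it is the ANSI code page on .NET Framework but UTF-8 on .NET Core and later.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReencodeSketch
{
    // Reads the file with the given source encoding and rewrites it as UTF-8
    // (without a byte order mark). Returns the decoded text.
    public static string ReencodeToUtf8(string path, Encoding source)
    {
        string content = File.ReadAllText(path, source);
        File.WriteAllText(path, content, new UTF8Encoding(false));
        return content;
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Hypothetical sample file: '<', 0xBB, '>' as single bytes.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x3C, 0xBB, 0x3E });

        ReencodeToUtf8(path, latin1);

        // The special byte is now the UTF-8 sequence C2 BB.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path))); // 3C-C2-BB-3E
        File.Delete(path);
    }
}
```

If the XML file carries an `encoding` attribute in its declaration, that attribute would presumably need to be updated to match after the rewrite.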
