#1
  1. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    506
    Rep Power
    128

    C, Linux - Decoding URLs


    Greetings...

    I'm working on a program that parses Apache log files, manipulates their data, and records it. I'm nearly finished, but I'm running into a small problem pulling search keywords out of the referring page's URL, for example:

    Code:
    http://search.yahoo.com/search?p=this+is+a+test&ei=UTF-8
    I've got it stripping out everything I don't need, leaving the 'this+is+a+test' remaining. Substituting the pluses for spaces is easy enough, but the problem comes in if there are any special characters encoded in the %## format.

    I have an idea of how I can translate these, but I don't doubt it's a verbose and inefficient way, so I was wondering if anyone knows of a function that already does this. Not being familiar with any of this, the answer could well be right in front of me and I'm not seeing it.

    Anyway... any info is appreciated.
    Last edited by dmittner; June 20th, 2003 at 04:26 PM.
  2. #2
  3. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,648
    Rep Power
    4248
    Well, those numbers represent the ASCII value of the character, in hex. For example, in %20 the hex value 20 is 32 in decimal, which is the ASCII code for the space character. You could use sscanf() to convert the two hex digits to an int and then cast that to a char.
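
    The sscanf() suggestion could be sketched roughly as follows. This is only an illustration, not code from the thread: the name url_decode is made up, the caller is assumed to supply an output buffer at least as large as the input, and a '%' that is not followed by two hex digits is simply copied through unchanged.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: decode src into dst.
   Assumes dst can hold at least strlen(src) + 1 bytes
   (decoded text is never longer than the encoded text). */
static void url_decode(const char *src, char *dst)
{
    while (*src) {
        unsigned int value;

        if (*src == '+') {
            /* '+' stands for a space in query strings */
            *dst++ = ' ';
            src++;
        } else if (*src == '%' &&
                   isxdigit((unsigned char)src[1]) &&
                   isxdigit((unsigned char)src[2]) &&
                   sscanf(src + 1, "%2x", &value) == 1) {
            /* "%2x" reads at most two hex digits into an unsigned int */
            *dst++ = (char)value;
            src += 3;
        } else {
            /* ordinary character, or a malformed escape: copy as-is */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

    With a buffer such as char out[64], calling url_decode("this+is+a+test", out) would leave "this is a test" in out. The isxdigit() checks come first so that sscanf() never sees a sign, whitespace, or the end of the string where a hex digit should be.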
  4. #3
  5. No Profile Picture
    Dazed&Confused
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2002
    Location
    Tempe, AZ
    Posts
    506
    Rep Power
    128
    Thanks for the reply, but I ended up taking the verbose route. I just go through the string and, when it hits a percent sign, pull the next two characters, work out what to substitute with a bunch of if statements, then add the result to a new string. Probably not foolproof, but good enough for now.
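
    For comparison, the same walk-the-string approach can be written without a chain of if statements per character by computing the byte from its two hex digits. The following is a sketch under assumptions, not the poster's actual code: hex_digit_value and decode_keywords are hypothetical names, the output buffer size is passed in so the function can refuse to overflow it, and malformed escapes are copied through literally.

```c
#include <stddef.h>

/* Hypothetical helper: return 0-15 for a hex digit, or -1 otherwise.
   Relies on 'a'..'f' and 'A'..'F' being contiguous, as in ASCII. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode src into dst, never writing more than
   dst_size bytes. Returns the decoded length, or -1 if dst is too small. */
static int decode_keywords(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    while (*src) {
        char c = *src;
        int hi = -1, lo = -1;
        size_t step = 1;

        if (c == '+') {
            c = ' ';                          /* '+' encodes a space */
        } else if (c == '%' &&
                   (hi = hex_digit_value(src[1])) >= 0 &&
                   (lo = hex_digit_value(src[2])) >= 0) {
            c = (char)(hi * 16 + lo);         /* combine the two digits */
            step = 3;
        }

        if (n + 1 >= dst_size) return -1;     /* keep room for the '\0' */
        dst[n++] = c;
        src += step;
    }

    if (dst_size == 0) return -1;
    dst[n] = '\0';
    return (int)n;
}
```

    Given char buf[32], decode_keywords("this+is+a+test", buf, sizeof buf) would return 14 and leave "this is a test" in buf. One caveat: an input containing %00 decodes to a NUL byte, which ends the C string early, so a log parser may want to reject or replace it.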
