June 20th, 2003, 03:24 PM
C, Linux - Decoding URLs
I'm working on a program that parses Apache log files, manipulates their data, and then records it. I'm nearly finished but am running into a small problem with pulling search keywords out of the referring page, for example:
I've got it stripping out everything I don't need, leaving just 'this+is+a+test'. Substituting spaces for the pluses is easy enough, but the problem comes in if there are any special characters encoded in the %## format.
I have an idea of how I can translate these, but I don't doubt it's a verbose and inefficient way, so I was wondering if anyone knows of a function that already does this. Not being familiar with any of this, the answer could well be right in front of me and I'm not seeing it.
Anyway... any info is appreciated.
Last edited by dmittner; June 20th, 2003 at 03:26 PM.
June 21st, 2003, 11:03 AM
Well, those two digits are the ASCII value of the character, written in hex. For example, in %20 the 20 is hex for 32 decimal, which is the ASCII code for the space character. You could use sscanf() with the %x conversion to translate the hex value to an integer and then cast that to a char.
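A minimal sketch of that sscanf() idea, in case it helps. The function name decode_escape is made up for illustration, and the isxdigit() check is an added assumption: %x on its own will also accept things like a leading sign, so the two characters are validated first.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: decodes a single "%XX" escape starting at s.
 * Returns 1 and stores the decoded byte in *out on success, 0 otherwise. */
int decode_escape(const char *s, char *out)
{
    unsigned int value;

    /* Require a percent sign followed by exactly two hex digits.
     * The && short-circuits, so s[2] is never read past the terminator. */
    if (s[0] != '%' ||
        !isxdigit((unsigned char)s[1]) ||
        !isxdigit((unsigned char)s[2]))
        return 0;

    /* %2x reads at most two hex characters into an unsigned int. */
    if (sscanf(s + 1, "%2x", &value) != 1)
        return 0;

    *out = (char)value;   /* e.g. 0x20 becomes ' ' */
    return 1;
}
```

Calling decode_escape("%20", &c) should leave a space in c, while something malformed like "%zz" returns 0 so the caller can copy the characters through unchanged.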
June 21st, 2003, 12:21 PM
Thanks for the reply, but I ended up taking the verbose route. I just go through the string and, if it hits a percent sign, pull the next two characters, find out what to substitute with a bunch of if statements, then add the result to a new string. Probably not foolproof, but good enough for now.
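For anyone who finds this thread later: the loop described above can probably be written without the chain of if statements by converting each hex digit arithmetically. This is only a rough sketch, not the code from the program itself. The names hex_value and url_decode are invented here, it assumes an ASCII-compatible character set, and it assumes the caller supplies a destination buffer at least as large as the source string plus its terminator.

```c
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
 * Assumes ASCII, where 'a'..'f' and 'A'..'F' are contiguous. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes src into dst and returns the decoded length.
 * dst must have room for strlen(src) + 1 bytes; output is never longer
 * than the input. */
size_t url_decode(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        int hi = -1, lo = -1;

        if (*src == '+') {
            /* A plus sign stands for a space in query strings. */
            dst[n++] = ' ';
            src++;
        } else if (*src == '%' &&
                   (hi = hex_value(src[1])) >= 0 &&
                   (lo = hex_value(src[2])) >= 0) {
            /* Two valid hex digits: combine them into one byte. */
            dst[n++] = (char)(hi * 16 + lo);
            src += 3;
        } else {
            /* Ordinary character, or a malformed escape: copy as-is. */
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

With this, url_decode("this+is+a+test", buf) should leave "this is a test" in buf, and a stray percent sign that is not followed by two hex digits is copied through rather than dropped.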