#1
  1. not a fan of fascism (n00b)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Feb 2003
    Location
    ct
    Posts
    2,756
    Rep Power
    95

    how to create a 'sub-language' in C


    If anyone is familiar with tcpdump and its packet-matching expressions, please read on...

    I want to create a similar packet-matching pseudo-language. The problem with tcpdump/libpcap is that it emulates the BPF in user space, so EVERY packet gets copied. That makes little to no sense when one can use the LSF instead and filter packets before they cross from kernel to user space, drastically decreasing the CPU/memory load. The reason libpcap doesn't do this is that it is platform independent, and not all *nix systems have the LSF. But back to my question: how does one go about creating a pseudo-language? Here is an example of what I want to do:

    expr: proto tcp and port 23
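    code:

    #include <linux/filter.h>       // struct sock_filter, BPF_STMT, BPF_JUMP
    #include <linux/if_ether.h>     // ETH_P_IP
    #include <netinet/in.h>         // IPPROTO_TCP

    // Assumed values for untagged Ethernet + IPv4 (the original definitions weren't posted)
    #define E_OFF       12          // Ethernet type field
    #define IP_H_OFF    14          // start of the IP header
    #define IP_PRO_OFF  9           // protocol byte, relative to the IP header
    #define IP_LEN_OFF  16          // IP total length field
    #define ETH_HDRLEN  14          // Ethernet header length
    #define LSF_TCP_SP  0           // TCP source port, relative to the TCP header
    #define LSF_TCP_DP  2           // TCP destination port, relative to the TCP header
    #define TLNT        23          // telnet port

    struct sock_filter f_tlnt[] =   // catches any telnet packet
    {   // match sequence: IP -> TCP -> src port 23 or dst port 23, otherwise reject
        BPF_STMT(BPF_LD + BPF_H + BPF_ABS, E_OFF),                      // verify IP
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETH_P_IP, 0, 9),            // not IP: jump to reject
        BPF_STMT(BPF_LD + BPF_B + BPF_ABS, IP_PRO_OFF + IP_H_OFF),      // verify TCP
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_TCP, 0, 7),         // not TCP: jump to reject
        BPF_STMT(BPF_LDX + BPF_B + BPF_MSH, IP_H_OFF),                  // load IP header length into the index register
        BPF_STMT(BPF_LD + BPF_H + BPF_IND, ETH_HDRLEN + LSF_TCP_SP),    // load TCP source port into the accumulator
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, TLNT, 2, 0),                // telnet: jump to the accept path; else test dest port
        BPF_STMT(BPF_LD + BPF_H + BPF_IND, ETH_HDRLEN + LSF_TCP_DP),    // load TCP destination port
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, TLNT, 0, 2),                // test for telnet; no: jump to reject
        BPF_STMT(BPF_LD + BPF_H + BPF_ABS, IP_LEN_OFF),                 // load the IP total length into the accumulator
        BPF_STMT(BPF_RET + BPF_A, 0),                                   // match: accept that many bytes (counted from the start of the frame)
        BPF_STMT(BPF_RET + BPF_K, 0)                                    // return 0: no match
    };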
    So as you can see, a filter is just an array of instructions. Those macros evaluate to a simple 8-byte structure laid out like this:
    2 bytes (opcode), 1 byte (jump-if-true offset), 1 byte (jump-if-false offset), 4 bytes (constant k)
    { 0x28, 0, 0, 0x0000000c },
    All the macros do is add up the constants passed in and put them in the proper order in the struct.
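    The 8-byte layout described above is `struct sock_filter` from `<linux/filter.h>`: a 16-bit `code`, the 8-bit `jt` and `jf` jump offsets, and a 32-bit `k`. A short sketch, assuming a Linux system with the kernel headers installed, showing that the macros only add opcode constants and place them in those fields; `print_insn` is a hypothetical helper, not part of any library.

```c
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP, opcode constants */
#include <stdio.h>

/* BPF_LD (0x00) + BPF_H (0x08) + BPF_ABS (0x20) = 0x28; k = 12, the Ethernet type field. */
const struct sock_filter load_ethertype = BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 12);

/* BPF_JMP (0x05) + BPF_JEQ (0x10) + BPF_K (0x00) = 0x15; jt = 0, jf = 9, k = 0x0800 (IPv4). */
const struct sock_filter is_ip = BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x0800, 0, 9);

/* Print one instruction in the { code, jt, jf, k } form used in the post. */
void print_insn(const struct sock_filter *f)
{
    printf("{ 0x%02x, %u, %u, 0x%08x },\n", (unsigned)f->code,
           (unsigned)f->jt, (unsigned)f->jf, (unsigned)f->k);
}
```

    Calling `print_insn(&load_ethertype)` prints `{ 0x28, 0, 0, 0x0000000c },`, which is the same line quoted in the post.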
    So how would I go about generating that type of thing dynamically? Am I misjudging the complexity of this? Does anyone have any examples of something similar?
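    One low-effort way to generate such a filter at run time is to keep the shape of the program fixed and fill only the `k` constants from variables. A sketch, assuming untagged Ethernet + IPv4 and an expression of the form "proto P and port N"; `gen_proto_port`, `OFF_ETHERTYPE`, `OFF_IP_PROTO` and `FILTER_MAX` are hypothetical names. Unlike the post's filter it returns a fixed 0xFFFF on a match instead of the IP total length.

```c
#include <linux/filter.h>     /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/if_ether.h>   /* ETH_HLEN (14), ETH_P_IP (0x0800) */
#include <stddef.h>

/* Offsets assume untagged Ethernet + IPv4; these names are hypothetical. */
#define OFF_ETHERTYPE 12               /* Ethernet type field */
#define OFF_IP_PROTO  (ETH_HLEN + 9)   /* IPv4 protocol byte */
#define FILTER_MAX    11               /* instructions emitted by gen_proto_port() */

/* Emit "proto <proto> and port <port>" into out, which must hold FILTER_MAX
 * entries. The program shape is fixed; only the k constants vary.
 * Returns the number of instructions written. */
size_t gen_proto_port(struct sock_filter *out, unsigned proto, unsigned port)
{
    const struct sock_filter prog[FILTER_MAX] = {
        BPF_STMT(BPF_LD  + BPF_H + BPF_ABS, OFF_ETHERTYPE),
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETH_P_IP, 0, 8),  /* not IP -> [10] */
        BPF_STMT(BPF_LD  + BPF_B + BPF_ABS, OFF_IP_PROTO),
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, proto, 0, 6),     /* wrong proto -> [10] */
        BPF_STMT(BPF_LDX + BPF_B + BPF_MSH, ETH_HLEN),        /* X = IP header length */
        BPF_STMT(BPF_LD  + BPF_H + BPF_IND, ETH_HLEN + 0),    /* source port */
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, port, 2, 0),      /* match -> [9] */
        BPF_STMT(BPF_LD  + BPF_H + BPF_IND, ETH_HLEN + 2),    /* destination port */
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, port, 0, 1),      /* no match -> [10] */
        BPF_STMT(BPF_RET + BPF_K, 0xFFFF),                    /* [9] accept */
        BPF_STMT(BPF_RET + BPF_K, 0),                         /* [10] reject */
    };
    for (size_t i = 0; i < FILTER_MAX; i++)
        out[i] = prog[i];
    return FILTER_MAX;
}
```

    Like the original, this sketch does not check the IP fragment offset, so a non-first fragment could have its payload misread as TCP ports; tcpdump's compiled filters add a check for that.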
  2. #2
  3. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    I am not sure what you are going after, but I am very interested in packet filtering (am diddling with a sniffer now) and would like to keep abreast of your work if you don't mind. Is there any chance you are working in Windows? If you will take another stab at explaining what you are trying to do I am happy to try and help.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  4. #3
  5. Contributing User
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jan 2003
    Location
    USA
    Posts
    7,174
    Rep Power
    2222
    Infamous, do you mean you want to create a small scripting language?

    That would involve lexical analysis and parsing. UNIX/Linux offers two tools for this: lex and yacc (the GNU versions are flex and bison). lex generates a scanner that splits the input into tokens, and yacc generates C source for a parser from a grammar file; the generated parser calls the scanner to get its tokens.

    Of course, as usual, I only know about them and haven't had time to play with them yet.
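    flex and bison are the usual route for a real grammar. For a language as small as the one in the first post, a hand-written scanner plus a small parsing loop may be enough to see the shape of the problem. A sketch, assuming the grammar is only `proto tcp|udp`, `port N`, and `and`; `struct filter_spec`, `next_token` and `parse_expr` are hypothetical names. The output is a plain description that a code generator could then turn into BPF instructions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* Hypothetical parse result; -1 means "not specified". */
struct filter_spec { int proto; int port; };

/* Scanner: copy the next whitespace-delimited word into tok.
 * Returns 0 at end of input. */
static int next_token(const char **s, char *tok, size_t size)
{
    size_t n = 0;
    while (isspace((unsigned char)**s))
        (*s)++;
    while (**s && !isspace((unsigned char)**s)) {
        if (n + 1 < size)
            tok[n++] = **s;
        (*s)++;
    }
    tok[n] = '\0';
    return n > 0;
}

/* Grammar: expr := prim { "and" prim }
 *          prim := "proto" ("tcp" | "udp") | "port" NUMBER
 * Returns 0 on success, -1 on a syntax error. */
int parse_expr(const char *s, struct filter_spec *out)
{
    char tok[32];
    out->proto = -1;
    out->port = -1;
    for (;;) {
        if (!next_token(&s, tok, sizeof tok))
            return -1;                          /* a primitive was expected */
        if (strcmp(tok, "proto") == 0) {
            if (!next_token(&s, tok, sizeof tok))
                return -1;
            if (strcmp(tok, "tcp") == 0)
                out->proto = IPPROTO_TCP;
            else if (strcmp(tok, "udp") == 0)
                out->proto = IPPROTO_UDP;
            else
                return -1;
        } else if (strcmp(tok, "port") == 0) {
            char *end;
            long v;
            if (!next_token(&s, tok, sizeof tok))
                return -1;
            v = strtol(tok, &end, 10);
            if (*end != '\0' || v < 0 || v > 65535)
                return -1;
            out->port = (int)v;
        } else {
            return -1;                          /* unknown keyword */
        }
        if (!next_token(&s, tok, sizeof tok))
            return 0;                           /* clean end of input */
        if (strcmp(tok, "and") != 0)
            return -1;
    }
}
```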
  6. #4
  7. not a fan of fascism (n00b)
    Thank you much, dwise, I think that's exactly what I'm looking for. I'll have to do some reading :)

    mitakeet: cool, I'm glad you're interested in this stuff as well. I've been eating up as much info on the subject as I can get my hands on. I am working on Linux, but I will try to give you a better explanation of what I'm doing since you seem interested.
    (Also humour me if I tell you stuff you already know.)

    Tools like tcpdump/ethereal/packit are all based on libpcap, the portable packet-capture library. Although it's handy and extremely easy to use, I've found that it also has some limitations. Due to its 'portability' it is forced to filter packets in user space. This means that every single packet that comes across the wire has to be passed from the kernel into user space, with one call to recvfrom() per packet. Obviously the overhead, and the number of dropped packets, increases significantly under a high network load.

    Usually on a Unix system you would use the BPF (Berkeley Packet Filter) to filter out packets in the kernel. You apply a filter like the one I have above, and only the packets you want are passed to the application. The performance gain grows with the amount of traffic on the network, so on a very busy box (e.g. a box doing NAT/masquerading) you won't lose packets the way you would with pcap.

    On a pure Linux box there is the LSF (Linux Socket Filter). It uses the exact same instruction set as the BPF, but creating and applying a filter is much easier: you build the filter and call setsockopt() to attach it. So what I want to do is create a simple pseudo-language to go from human-readable text (proto ip and port 23) to BPF/LSF instructions. I know you're on Windows, but if you want to check out some source, I have TONS of crap I've written.
  8. #5
  9. I'm Baaaaaaack!
    Thanks for the info! I would like to keep up with what you are doing, but I am off to the Philippines for 2.5 weeks in a week and am too busy with other stuff to look at my proto-sniffer for now. When I get back I would like to restart communication with you and try to cross-pollinate our ideas.

    I know that it is a non-trivial pain in the *** to build a packet filter in Windows, which is why I am starting with a sniffer (I have the Win2K DDK but haven't even figured out (yet, I hope) how to compile their own examples!). However, I expect that the bulk of the sniffing code should work unchanged once it has moved to kernel space, and all I have to do (I hope) is add a few lines of code to drop the packet if I don't like its smell. I also plan on a lot of logging and want to build a Java interface to view the results of the sniffed packets (on my way, I hope, to building an intrusion detection system). I have heard of WinPcap, but haven't had time to explore it yet to find out if it will work as a filter. I have some tiny bits of code that read and process packets (some of it is posted around here somewhere), but haven't had the time to do anything interesting yet.

    As for your idea of a script, I know from extensive personal experience that parsing text to convert it into some sort of machine-meaningful data can be a pain. You might want to look into ASN.1 (an older notation for describing data structures, used among other things in network management via SNMP) as a source of inspiration and possibly tons of free code.

  10. #6
  11. not a fan of fascism (n00b)
    I just had an idea that I think will make what I want to do much simpler. The packet-matching program runs strictly top-down: the interpreter starts at the first instruction and moves down, and jumps can only go forward. So I'm thinking that this format lends itself to concatenation. A lot of the matching code is very similar, with only small differences, so I think I could build up a routine dynamically; at least that's what I'm going to try first.
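    The concatenation idea can work because every conditional jump in a classic BPF program is a relative forward offset: a fragment's internal jumps stay valid wherever it is placed, and only the jumps that leave the fragment need fixing up. A minimal sketch, assuming AND-only expressions, untagged Ethernet + IPv4, and a placeholder `jf` value `REJECT` that is patched at the end to point at one shared reject instruction; `struct prog`, `add`, `frag_ip`, `frag_proto`, `frag_port` and `finish` are all hypothetical names.

```c
#include <linux/filter.h>   /* struct sock_filter, BPF_* opcodes, BPF_CLASS */
#include <stddef.h>

#define MAX_INSNS 64
#define REJECT    0xFF      /* placeholder jf meaning "jump to the final reject" */

struct prog {
    struct sock_filter insn[MAX_INSNS];
    size_t len;
    int err;                /* set if the program overflowed */
};

static void add(struct prog *p, unsigned short code, unsigned char jt,
                unsigned char jf, unsigned int k)
{
    if (p->len >= MAX_INSNS) { p->err = 1; return; }
    p->insn[p->len].code = code;
    p->insn[p->len].jt = jt;
    p->insn[p->len].jf = jf;
    p->insn[p->len].k = k;
    p->len++;
}

/* Each fragment falls through on a match and uses jf == REJECT on a mismatch. */
void frag_ip(struct prog *p)
{
    add(p, BPF_LD + BPF_H + BPF_ABS, 0, 0, 12);              /* Ethernet type */
    add(p, BPF_JMP + BPF_JEQ + BPF_K, 0, REJECT, 0x0800);     /* IPv4? */
}

void frag_proto(struct prog *p, unsigned proto)
{
    add(p, BPF_LD + BPF_B + BPF_ABS, 0, 0, 14 + 9);          /* IP protocol byte */
    add(p, BPF_JMP + BPF_JEQ + BPF_K, 0, REJECT, proto);
}

void frag_port(struct prog *p, unsigned port)
{
    add(p, BPF_LDX + BPF_B + BPF_MSH, 0, 0, 14);             /* X = IP header length */
    add(p, BPF_LD + BPF_H + BPF_IND, 0, 0, 14 + 0);          /* source port */
    add(p, BPF_JMP + BPF_JEQ + BPF_K, 2, 0, port);           /* match: skip the dst test */
    add(p, BPF_LD + BPF_H + BPF_IND, 0, 0, 14 + 2);          /* destination port */
    add(p, BPF_JMP + BPF_JEQ + BPF_K, 0, REJECT, port);
}

/* Append the accept/reject returns, then turn every REJECT placeholder into
 * a real forward offset. Returns 0 on success, -1 on overflow. */
int finish(struct prog *p)
{
    size_t reject, i;

    add(p, BPF_RET + BPF_K, 0, 0, 0xFFFF);                   /* accept */
    add(p, BPF_RET + BPF_K, 0, 0, 0);                        /* reject */
    if (p->err)
        return -1;
    reject = p->len - 1;
    for (i = 0; i < reject; i++) {
        struct sock_filter *f = &p->insn[i];
        if (BPF_CLASS(f->code) == BPF_JMP && f->jf == REJECT) {
            size_t off = reject - (i + 1);                   /* relative to the next insn */
            if (off > 255)
                return -1;                                   /* does not fit in 8 bits */
            f->jf = (unsigned char)off;
        }
    }
    return 0;
}
```

    Building "proto tcp and port 23" is then `frag_ip(&p); frag_proto(&p, IPPROTO_TCP); frag_port(&p, 23); finish(&p);`, which yields the same 11-instruction shape as a hand-written filter. Supporting `or` would need a second kind of placeholder (jump to accept), which this sketch leaves out.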
