June 11th, 2011, 06:32 PM
Trying to understand mechanics of basic DNS/HTTP proxy
I have just been asked to write a DNS proxy and an HTTP proxy in C++. To be honest, I really don't have the experience, but I did let the client know that I would have to do a lot of research to figure this out and that I have never written a DNS or HTTP proxy before. I know that after reading my questions many of you will laugh that I am too ignorant about DNS and HTTP proxies to possibly succeed at this project. Nonetheless, my questions are very straightforward and I would certainly appreciate some direction.
So now I find myself sitting in front of the computer and wondering exactly what it is that I need to accomplish. The following lists the core requirements as I understand it:
Note: Client = someone requesting a website from a computer that is behind a router that has been configured to use the DNS proxy to resolve addresses.
- Client requests www.google.com
- DNS proxy checks to see if client is allowed to request www.google.com. If client is allowed to request www.google.com, then DNS proxy somehow directs client to HTTP proxy, otherwise DNS proxy directs client to a site that explains that they are not permitted to visit www.google.com. By the way, Google is just an example here because I am not that creative.
- Suppose that the client is permitted to visit www.google.com. Then the DNS proxy directs the client to the HTTP proxy. The HTTP proxy allows the client access to the content at www.google.com and records the number of bytes returned to the client from www.google.com.
- Client makes another request and the process begins again.
That is it in a nutshell. Here are my questions that I hope will provide me with enough understanding to begin a solution. The main problem is that I do not understand the mechanics involved:
- Does the DNS proxy simply return the IP address of whatever host the client should be directed to? So for example, would the DNS proxy return the IP address of the http proxy if the client is permitted to view the content? Then the DNS proxy would return the IP address of the site that explains that the client is not allowed to view the content otherwise? In other words, does a DNS proxy simply return an IP Address?
- Does an http proxy request the content from www.google.com in bytes and then return those bytes to the client as a middle man?
- What online resources should I be using to gain enough understanding to complete this project?
June 11th, 2011, 09:22 PM
Playing with internet traffic is always a sketchy thing. There are many programs out there that do what I think you are trying to accomplish. Not too sure about how they do it, but I've seen many variations.
One of the more famous of these would be Sandvine. I will tell you now, don't really bother trying to find out how their system works. I've worked with it in the past in one of my networks and it is truly a black box. Sandvine is very secretive about it for good reasons. That being said, it works very well.
The system is known as a DNS Traffic Switch (or DTS for short). It was developed by Simplicita Software, which was later acquired by Sandvine. What the system does is pretty much what you are describing. It looks at DNS queries and responses based on policies set up by the administrator. Its primary focus is generating revenue through search engines when someone makes a typo.
This is done by looking for NXDOMAIN responses in the answer to a query. The system rewrites the packet and injects the IP for a search page of the admin's choosing. This of course is always a controversial topic with those nut jobs worrying that their traffic is being messed with; however, the system is brilliant.
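As a rough idea of what "looking for NXDOMAIN" means at the packet level, here is a minimal sketch. It is not how Sandvine does it (that is unknown); it only relies on the standard DNS header layout from RFC 1035, where the QR bit is the top bit of the third byte and the response code is the low four bits of the fourth byte, with NXDOMAIN being code 3. The function and constant names are made up for illustration, and the actual rewriting of the answer section is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kDnsHeaderSize = 12;   // fixed DNS header length in bytes
constexpr std::uint8_t kRcodeNxDomain = 3;   // "name does not exist"

// Returns true if `packet` is a DNS response whose RCODE is NXDOMAIN.
// `packet` is the raw DNS message (the UDP payload), not the whole IP packet.
bool isNxDomainResponse(const std::vector<std::uint8_t>& packet) {
    if (packet.size() < kDnsHeaderSize) return false;      // too short to be DNS
    const bool isResponse = (packet[2] & 0x80) != 0;       // QR bit set means response
    const std::uint8_t rcode = packet[3] & 0x0F;           // low 4 bits of the flags
    return isResponse && rcode == kRcodeNxDomain;
}
```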
I wouldn't even know where to begin with developing a system like that since I have no decent concept of programming. But I would imagine if you want to develop a whitelist/blacklist system, it should be easier than rewriting packets as described above.
This is very abstract but I'll describe what I would expect it to be. First of course you would need a server. The fact the device is behind a router shouldn't really matter. Just make sure the networking is done right. This server would need DNS server software (I'm a BIND fan so I would recommend that) and some special software to interface with it.
Firstly, get the DNS software working so that clients may query the server and it can recursively get answers. That should be the easiest part. Now you would need to write some special software to interface with it. So if a domain is added as a blacklisted site, it will actually just modify the DNS server. Let's say you blacklist google.com. Make the software write a zone in the DNS software for google.com. This will mean that the DNS server is now authoritative and will answer queries for anything under google.com. So just set the server's IP in that zone; that way any queries for the domain will be directed to that server. Next would be to design a splash page that you want the user to see if they try a blacklisted domain. The server will need something like Apache so it can act as a web server.
So now a query for google.com will go to the server, and the server will not recursively look up google.com but rather respond with its own IP since it is authoritative. Now the client will receive the server IP and connect to it. Now they will get the splash page of whatever you set.
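The "write a zone" step could be sketched roughly like this. It only builds the text of a BIND-style zone file that points the blacklisted domain, and every name under it, at the splash server. The function name makeBlockZone, the ns1 and hostmaster labels, and the timer values are all placeholders I picked for illustration. The matching zone entry in the BIND configuration and the reload of the server are not shown.

```cpp
#include <sstream>
#include <string>

// Builds zone file text that makes this server answer for `domain`
// and everything beneath it with `splashIp`.
std::string makeBlockZone(const std::string& domain, const std::string& splashIp) {
    std::ostringstream zone;
    zone << "$TTL 300\n"
         // SOA fields: primary server, contact, serial, refresh, retry, expire, minimum
         << "@ IN SOA ns1." << domain << ". hostmaster." << domain
         << ". 1 3600 600 86400 300\n"
         << "@ IN NS ns1." << domain << ".\n"
         << "ns1 IN A " << splashIp << "\n"   // address of the placeholder name server
         << "@ IN A " << splashIp << "\n"     // the bare domain itself
         << "* IN A " << splashIp << "\n";    // wildcard for www and any other name
    return zone.str();
}
```

Calling makeBlockZone("google.com", "192.0.2.10") would give the text to save as the zone file, with 192.0.2.10 standing in for the server's own address.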
This is the simplest way I can think to do something like that. If you ran this on a Linux server, it could be done with just some simple scripting. If you want to record traffic size and all that jazz, again on a Linux server there are many programs such as tcpdump and Snort that can easily do that. If you really wanted one machine, you could make a Linux server be the web server, DNS server, router and firewall. Market it as a single solution system.
However, this seems like scalability would be impossible unless you start implementing a database type system to track each user's whitelist/blacklist based on IP address (or MAC address). I'm sure this or some variation of it already exists somewhere online. It may even be some simple Linux program. I didn't find much, but program-wise, I'm not too sure what to look for.
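For the per-user tracking, the lookup itself could be as small as the sketch below. It assumes an in-memory table keyed by client IP as a stand-in for the database, and it assumes blocked domains are stored in lowercase. All the names (ClientPolicy, isBlockedFor and so on) are made up for illustration. A blocked domain also covers its subdomains, so blocking google.com catches www.google.com but not notgoogle.com.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <unordered_set>

using DomainSet = std::unordered_set<std::string>;
// Client IP address -> domains that client may not visit (stored lowercase).
using ClientPolicy = std::unordered_map<std::string, DomainSet>;

// DNS names are case-insensitive, so compare in lowercase.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns true if `queryName`, or any parent domain of it, is blocked
// for the client at `clientIp`. Unknown clients are not blocked.
bool isBlockedFor(const ClientPolicy& policy, const std::string& clientIp,
                  const std::string& queryName) {
    const auto entry = policy.find(clientIp);
    if (entry == policy.end()) return false;
    std::string name = toLower(queryName);
    for (;;) {
        if (entry->second.count(name) != 0) return true;
        const std::string::size_type dot = name.find('.');
        if (dot == std::string::npos) return false;
        name.erase(0, dot + 1);  // drop the leftmost label and try the parent
    }
}
```

Swapping the map for a real database later would only change where the domain set for a client comes from, not the matching logic.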
But there's always The Cloud as a solution as well. A company called AppRiver has a cloud based solution that requires no hardware. And it comes with their high end filtering software.
Sorry for the wall of text response. It sounds like you're diving into a big project.