#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2000
    Location
    wendell, mass, USA
    Posts
    87
    Rep Power
    15

    Multiple HTTP file downloader


    I'm building a music site and want to give users the opportunity to download multiple files at once, or at least in a queue, from a client-side app activated from the web. Sort of like Audio Galaxy (actually, exactly like Audio Galaxy). I want to build it in Python/Tk because I don't know Java. Does anyone know if something like this already exists and, if not, any clues on how to build it to interact with the browser? (I'm clueless.)
  2. #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    Well, from my limited use of the AG client, I remember this -
    1) you had to sign up for an account
    2) you had to run a client on your own machine that was locally set up with the account information from step 1
    3) you made your download choices via the web

    Now, ignoring the fact that P2P services like this do exist already, and assuming this is for a learning experience type thing, here's how I'd approach this.


    First, design a sort of account system. I'd recommend using a database of some sort, though flat files would do. Make it extensible as you don't really know what's going to go into it yet at this stage.



    Next, set up a sort of catalog system that will catalog all the files you are going to have listed on your system. I'd suggest using some sort of hashing algorithm that incorporates things like filename, file size, and possibly things like the mtime (modification date/time), and if at all possible, the file type (and of course, the file content itself). These will be stored along with filenames so that two exact copies of a file will always match (ideally), whereas two same-named copies will not. You have to have the filename because this is what people will search on. You can choose to store other metadata as well, if you want to search on that, but filenames are pretty much a must. All this stuff should be stored in a database - flat file access would probably be too slow for anything sizeable. Store only unique hash values, so that duplicate files don't need to be stored twice.
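
    A rough sketch of the fingerprint idea, assuming SHA-256 from Python's hashlib. For this sketch only the file content is hashed (my assumption, so that renamed copies of the same file still match); filename, size and mtime would sit in their own columns. The name file_fingerprint is made up for illustration.

```python
import hashlib


def file_fingerprint(path, chunk_size=64 * 1024):
    """Return the hex SHA-256 digest of a file's content.

    The file is read in chunks so large audio files never have to fit
    in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Two byte-identical files give the same digest whatever they are called, which is what lets the catalog store each unique file only once.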

    Then, set up the server side - build a search mechanism that searches through the DB based on things like filename (and whatever other metadata you have above). Based on the hash string that you get back from those searches, you should be able to search through the user catalogs for anything with that hash value, and build the list from there.
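
    Something like the following could serve as a starting point for that search, assuming SQLite via Python's sqlite3 module. The table names (files, user_files), the column names and both function names are hypothetical, not anything from an existing setup.

```python
import sqlite3


def build_catalog(conn):
    """Create two illustrative tables: unique files, and who holds them."""
    conn.executescript(
        """
        CREATE TABLE files (
            hash     TEXT PRIMARY KEY,   -- one row per unique file
            filename TEXT NOT NULL
        );
        CREATE TABLE user_files (
            user TEXT NOT NULL,
            hash TEXT NOT NULL REFERENCES files(hash)
        );
        """
    )


def find_sources(conn, name_fragment):
    """Return (filename, hash, user) rows whose filename contains the fragment.

    Note: % and _ inside name_fragment act as LIKE wildcards here;
    a real search would escape them.
    """
    pattern = "%" + name_fragment + "%"
    return conn.execute(
        "SELECT f.filename, f.hash, u.user "
        "FROM files AS f JOIN user_files AS u ON u.hash = f.hash "
        "WHERE f.filename LIKE ? "
        "ORDER BY f.filename, u.user",
        (pattern,),
    ).fetchall()
```

    The filename search finds the hash, and the join on the hash turns that into the list of users who have the file.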


    Lastly, you have to figure out how you want to get the transfer accomplished. Basically you want a sort of trivial FTP going on, using the main server as a sort of proxy. A simple mechanism would be to just pull the requested file from a client that has it (obviously this can be made spiffier by choosing a faster client or spreading fragments across various clients with the file), and to send the data to the client as soon as you get it. Whether you want to push TO the client or have the client pull FROM the server is up to you.
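
    The "send the data as soon as you get it" part could look roughly like this. It is only a sketch that assumes both ends are file-like objects (for sockets, something like sock.makefile("rb") and sock.makefile("wb")); the name relay is made up.

```python
def relay(source, destination, chunk_size=16 * 1024):
    """Forward bytes from source to destination one chunk at a time.

    Each chunk is written out as soon as it has been read, so the proxy
    never holds the whole file. Returns the number of bytes forwarded.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means the source is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

    Push versus pull only changes who opens the connection; the copying loop stays the same either way.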

    That's a basic sort of "skeleton" for what would need to be done, any questions?
    Debian - because life's too short for worrying.
    Best. (Python.) IRC bot. ever.
  4. #3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2000
    Location
    wendell, mass, USA
    Posts
    87
    Rep Power
    15

    should have explained better


    Thanks for the reply, although my real question only involves the final paragraph. I already have the account system, the database of files, etc. Currently the user goes to a "download locker" page which shows all the files they are allowed to download, and they can click a link which sets a MIME type and streams the file to their browser (for security). The problem with this is that they have to download the files one by one, hit save, choose a location, and so on. Users have requested a client-side app (Windows, probably) that will do 1 one two things:

    1. When user clicks on download, little client immediately begins downloading the file to a set download directory.

    2. The client connects to the server and, via SOAP or the like, gets back a recordset of available files, and the user can then hit download (segueing the locker thing)
  6. #4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2000
    Location
    wendell, mass, USA
    Posts
    87
    Rep Power
    15

    correction


    does 1 or 2 things
  8. #5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    Well, I don't know the architecture of how you have things set up, but wouldn't it just be as simple as initiating a new connection between the client and the server for each download? In the server processing loop, just have it spawn a new thread that initiates a new connection for each file.
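
    As a minimal sketch of one thread per download connection, assuming Python's standard socketserver module: the toy protocol (client sends a file name on one line, server answers with the bytes and closes) and the in-memory LOCKER dict are assumptions for illustration, standing in for whatever lookup the real server does.

```python
import socketserver

# Hypothetical stand-in for the real file store or database lookup.
LOCKER = {
    "song1.mp3": b"fake audio bytes for song one",
    "song2.mp3": b"fake audio bytes for song two",
}


class DownloadHandler(socketserver.StreamRequestHandler):
    """Serve one requested file per connection."""

    def handle(self):
        # First line from the client names the file it wants.
        name = self.rfile.readline().decode("utf-8").strip()
        data = LOCKER.get(name)
        if data is not None:
            self.wfile.write(data)
        # Returning ends the handler; the server then closes the connection.


class ThreadedDownloadServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """TCP server that runs each incoming connection in its own thread."""

    daemon_threads = True
    allow_reuse_address = True


# Usage (port 9000 is arbitrary):
#   server = ThreadedDownloadServer(("0.0.0.0", 9000), DownloadHandler)
#   server.serve_forever()
```

    With this shape the client just opens one connection per file it wants, and slow transfers don't hold up the others.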
  10. #6
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2000
    Location
    wendell, mass, USA
    Posts
    87
    Rep Power
    15

    threads?


    I'm not sure I understand what you mean. The reason I'm writing is that I don't have a client app to do multiple downloads, and I'm trying to figure out the best protocol, language, etc.
  12. #7
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    Well, if you want to do it like audiogalaxy does it, you simply set up something on the server that, when the user clicks a link, the IP address is snatched and assumed to be running a client daemon in the background (at least, this is the way it worked on Linux, I never tried it anywhere else). It also notices what file (filename + hash code if you use the above scheme) the user requested and it queries the DB for a place to get that file (or, perhaps you already have it cached from when the user did the filename search). It then basically starts sending the contents of that file to the client daemon running on the user's computer, which they have configured to place downloaded (really uploaded) files into a certain directory.

    As far as what language you want to do it in, I assumed it was Python since you are posting here. Protocol: TCP and FTP are fine.

    Threading in Python (should you choose to use that) is dead simple. Just look at the threading module docs to get started and ask questions back here if you have any. This will allow you to handle many connections much more quickly.
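
    To give a feel for it, here is a small sketch of a download queue worked through by a few threads, using only the standard threading and queue modules. The name download_all and the fetch callback are hypothetical; fetch stands for whatever actually retrieves one file.

```python
import queue
import threading


def download_all(names, fetch, workers=4):
    """Call fetch(name) for every name using a small pool of threads.

    Returns a dict mapping each name to fetch's return value, or to the
    exception it raised, so one failed download doesn't stop the rest.
    """
    pending = queue.Queue()
    for name in names:
        pending.put(name)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to download
            try:
                outcome = fetch(name)
            except Exception as error:
                outcome = error
            with lock:
                results[name] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

    The queue gives you the "download in a queue" behaviour, and the worker count caps how many files transfer at once.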
  14. #8
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2000
    Location
    wendell, mass, USA
    Posts
    87
    Rep Power
    15

    wow! thanks for the help


    Awesome advice!

    Now if I could pick your brain just a little bit more....

    So you're saying I should basically write an FTP server that the client runs. This seems like an okay idea, except, I'd prefer that the PUT happened via HTTP because port 21 is often restricted, is this possible?

    Now that I think about it, it is obvious that audiogalaxy pushed files, because they knew which ones you had downloaded already. I don't know if this would be too convoluted or actually help, but check it out:

    Client clicks,
    webapp pushes an XML doc specifying a link to the file, and any metadata that the client may want to edit before saving (value-add). The client app then connects via FTP to download the file. Oh, okay, problem: FTP is insecure, so that won't work. I guess I will have to do a push, or retrieve by HTTP. I'm just worried that setting up a server on a client's machine is

    1. Risky

    2. A pain, because of routers and firewalls.

    What do you think?
    Last edited by hanumani; December 11th, 2002 at 11:02 AM.
  16. #9
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    Sounds like you've just about got it all down. Yeah, there are inherent risks/troubles with the method I described. I was just describing the way that it seemed audiogalaxy worked. I honestly don't know if they simply pushed request info to the client for them to then request, or if they actually pushed the data itself. I do know, however, that the latter is quite possible. They could have essentially been running some sort of FTP server (with what authentication scheme, I have no clue), on port 6326 (or was it 6236?). Higher port numbers generally aren't such a problem with firewalls, nor as a security risk. Where you go from here ... is up to you
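
    If you go the "client requests the file itself" route over HTTP, the client side can be quite small. This is only a sketch using the standard urllib; the function name download_over_http and the fallback file name are made up, and it assumes the server already exposes each file at a plain URL.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_over_http(url, download_dir):
    """Fetch url with an HTTP GET and save it under download_dir.

    Returns the path of the saved file. The file name is taken from the
    last part of the URL path (assumption: the server uses sensible URLs).
    """
    os.makedirs(download_dir, exist_ok=True)
    url_path = urllib.parse.urlsplit(url).path
    filename = os.path.basename(url_path) or "download.bin"
    target = os.path.join(download_dir, filename)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

    Since the client only makes outgoing requests, nothing has to listen on the user's machine, which sidesteps the router and firewall worries.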
