#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    15
    Rep Power
    0

    Filesystem structure for website with a large amount of uploads


I'm working on a web project for a client, and it's going to rely heavily on user uploads. The current issue I'm struggling with is how exactly to design the filesystem structure. The reason I'm coming here is to see if there are any downsides to having, say, 50,000 subdirectories under an uploads/ directory? What about 1,000,000? Is there a limit to the number of directories in a Linux filesystem? I'm of course going to segment the directories into chunks of x uploaded files each, but am currently determining what x should be. Any advice on filesystem design if you were creating a youtube/flickr type site that could possibly have millions of uploads?
  2. #2
  3. No Profile Picture
    Google's No1 Supporter!
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2007
    Location
    The Crisp Packet!
    Posts
    603
    Rep Power
    152
    Originally Posted by dgath
    if there are any downsides to having say.... 50,000 subdirectories under an uploads/ directory? What about 1,000,000?
Yes! It takes a lot longer to search through and find the relevant file. You'll also notice the difference when listing the directory contents.
    Originally Posted by dgath
    Is there a limit to directories in a linux filesystem?
No particular documented limit. The most likely limit is the number of subdirectories the filesystem can cope with in any one path (Windows suffers from this, and I seem to remember *nix does too ... it just copes with a few more before the problem becomes apparent).
    Originally Posted by dgath
    Any advice on filesystem design if you were creating a youtube/flickr type site that could possibly have millions of uploads?
    Your filesystem may not be so much the issue. Perhaps the real issue is HOW you are going to search for and select the relevant file. One possibility is a database, which would provide much quicker searching: you look the file up in a table and then grab it from the full path stored there. This way you would need little filesystem optimising, as you won't be directory listing or searching.
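    A minimal sketch of that idea in Python with sqlite3 (the table name and columns are my assumptions, not anything from the thread): the database maps each upload's original filename to the full path where it lives on disk, so retrieval never touches a directory listing.

    ```python
    import sqlite3

    # Hypothetical schema: one row per upload, storing the original
    # filename and the full on-disk path.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads (id INTEGER PRIMARY KEY, original_name TEXT, path TEXT)"
    )
    conn.execute(
        "INSERT INTO uploads (original_name, path) VALUES (?, ?)",
        ("holiday.jpg", "/var/www/uploads/0/holiday.jpg"),
    )

    # Look the file up in the database instead of scanning directories.
    row = conn.execute(
        "SELECT path FROM uploads WHERE original_name = ?", ("holiday.jpg",)
    ).fetchone()
    print(row[0])  # /var/www/uploads/0/holiday.jpg
    ```

    The indexed primary key (or an index on original_name) is what makes the lookup fast no matter how many files pile up on disk.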

    HTH
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    15
    Rep Power
    0
    Yeah, I'll be using a database and that will store the filename. I just need to figure out the best way to divide them up in the filesystem. I think I'll divide them up into 1000 directory chunks.
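    That chunking scheme can be sketched in a few lines (the function name and layout are illustrative assumptions): with the database assigning each upload a sequential id, integer division by the chunk size picks the parent directory, so each directory holds at most 1000 files.

    ```python
    CHUNK_SIZE = 1000

    def upload_path(upload_id, filename, root="uploads"):
        """Return a path that groups uploads into 1000-file directories."""
        # ids 0-999 land in uploads/0/, 1000-1999 in uploads/1/, and so on
        return f"{root}/{upload_id // CHUNK_SIZE}/{filename}"

    print(upload_path(0, "a.jpg"))     # uploads/0/a.jpg
    print(upload_path(1500, "b.jpg"))  # uploads/1/b.jpg
    ```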
  6. #4
  7. המבין יבין
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 2001
    Location
    Haifa
    Posts
    2,085
    Rep Power
    1485
    What I do is store the filename in the database, and the file itself is stored under a name that is a random hash. These are kept in a two-level directory structure like this:
    Code:
    /a/a
    /a/b
    /a/c
    /b/a
    /b/b
    /b/c
    Because the filenames are hashes, the files are spread evenly across the directories, so no directory becomes much fuller than the others. And users cannot guess at filenames to try to outsmart the system looking for files. Relying on Referer data is faulty because not all UAs send that data.
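    A sketch of that two-level layout (the helper name is mine): each upload gets a random hex token, and its first two characters pick the two directory levels. Because the token is uniformly random, files spread evenly across the subdirectories, and the names are unguessable.

    ```python
    import secrets

    def hashed_path(root="uploads"):
        """Place a file under root/<c1>/<c2>/<token> using a random token."""
        token = secrets.token_hex(16)  # 32 random hex chars, unguessable
        return f"{root}/{token[0]}/{token[1]}/{token}"

    print(hashed_path())  # e.g. uploads/3/f/3f9a... (random each run)
    ```

    With hex tokens this gives 16 × 16 = 256 leaf directories, each holding roughly 1/256 of all uploads.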

    What filesystem are you using? I hope it's not FAT.
