Hi, this problem has been nagging at me lately. I'm not sure whether the topic is OK for this forum, though.
At my university, there are several FTP servers that share all types of stuff, such as documents, shareware packages, Linux distributions, etc.
I wrote a PHP script that, every once in a while, connects to all of the FTP servers and inserts the directory listing information into a MySQL database. Then, with the aid of another PHP script, you can search the database, which turned out to be really cool and useful.
The DB schema is as follows: it has 4 tables, 'paths', 'files', 'site_info' and 'ftp_data'. The first three each consist of just a unique ID and the info, and 'ftp_data' holds the relation between those three. That way, I use DB space more efficiently, because path and file names are not repeated.
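To make the layout concrete, here is a rough sketch of it. It uses SQLite through Python purely as a self-contained stand-in for MySQL and PHP, and the column names (`path`, `name`, `host`, `site_id`, `path_id`, `file_id`) as well as the helpers `create_schema`, `intern_value` and `add_entry` are assumptions made up for illustration, not the actual definitions.

```python
import sqlite3

def create_schema(conn):
    # Three lookup tables (unique ID + value) and one relation table.
    # All column names are hypothetical; only the table names come from the post.
    conn.executescript("""
        CREATE TABLE paths     (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE files     (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE site_info (id INTEGER PRIMARY KEY, host TEXT UNIQUE NOT NULL);
        CREATE TABLE ftp_data (
            site_id INTEGER NOT NULL REFERENCES site_info(id),
            path_id INTEGER NOT NULL REFERENCES paths(id),
            file_id INTEGER NOT NULL REFERENCES files(id),
            PRIMARY KEY (site_id, path_id, file_id)
        );
    """)

def intern_value(conn, table, column, value):
    """Return the ID of value, storing it only if it is not there yet."""
    # table and column are fixed strings chosen by the caller, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()
    return row[0]

def add_entry(conn, host, path, name):
    """Record that the file `name` exists under `path` on server `host`."""
    ids = (
        intern_value(conn, "site_info", "host", host),
        intern_value(conn, "paths", "path", path),
        intern_value(conn, "files", "name", name),
    )
    conn.execute("INSERT OR IGNORE INTO ftp_data VALUES (?, ?, ?)", ids)
```

With this layout, the same README under /pub/linux on two servers costs one row in 'paths', one in 'files', and two small rows of IDs in 'ftp_data'.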
Then, the query would do a case-insensitive LIKE lookup on the 'paths' and/or 'files' tables, get the IDs and JOIN them with the 'ftp_data' table, and voilà, it works.
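The search itself looks roughly like the sketch below. Again, this is SQLite through Python standing in for MySQL and PHP, and the column names, the `demo_db` helper with its sample rows, and the `search` function are all hypothetical, only there so the example runs on its own.

```python
import sqlite3

# Substring match on file name and/or path, joined back through ftp_data.
SEARCH_SQL = """
    SELECT s.host, p.path, f.name
    FROM ftp_data AS d
    JOIN site_info AS s ON s.id = d.site_id
    JOIN paths     AS p ON p.id = d.path_id
    JOIN files     AS f ON f.id = d.file_id
    WHERE LOWER(f.name) LIKE ? OR LOWER(p.path) LIKE ?
    ORDER BY s.host, p.path, f.name
"""

def demo_db():
    """Build a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE paths     (id INTEGER PRIMARY KEY, path TEXT);
        CREATE TABLE files     (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE site_info (id INTEGER PRIMARY KEY, host TEXT);
        CREATE TABLE ftp_data  (site_id INTEGER, path_id INTEGER, file_id INTEGER);
        INSERT INTO paths     VALUES (1, '/pub/linux'), (2, '/pub/docs');
        INSERT INTO files     VALUES (1, 'README'), (2, 'Slackware.iso');
        INSERT INTO site_info VALUES (1, 'ftp1.example.edu');
        INSERT INTO ftp_data  VALUES (1, 1, 1), (1, 1, 2), (1, 2, 1);
    """)
    return conn

def search(conn, term):
    """Case-insensitive substring search over file names and paths."""
    # The leading '%' means every row has to be examined.
    # Note: '%' or '_' inside term would also act as wildcards here.
    pattern = "%" + term.lower() + "%"
    return conn.execute(SEARCH_SQL, (pattern, pattern)).fetchall()
```

Calling `search(demo_db(), "slack")` returns the single Slackware.iso entry; the pattern starts with a wildcard, so both lookup tables are scanned in full for every search.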
The bad part of all this is that the system is very slow. On a Pentium 100 MHz (well, I know it sucks, but it's what I've got to experiment with), a query takes about 5-10 minutes to complete. Indexing the tables wouldn't help much, as a LIKE substring search can't make use of the indexes.
Maybe someone could come up with a more efficient way of indexing the path and file tables? We can work on the basis that we have a lot of disk space but 'slow' machines. Using MySQL isn't mandatory; maybe a text-based DB would be better?
.pd