
November 2nd, 2012, 08:53 PM
|
|
Registered User
|
|
Join Date: Nov 2012
Posts: 2
Time spent in forums: 34 m 52 sec
Reputation Power: 0
|
|
|
MySQL server setup
I'm relatively new to SQL and server setups.
I am trying to configure a mysql database for tracking data files from various research projects. I'm testing it with just one project and it currently contains references to 30 million files. The table contains 9 fields including datetime, path and filename and a few other fields about the equipment and site the datafile was collected at. Mostly varchar fields.
The table is indexed by datetime as virtually any query I would care to make on the database would include a timeframe of interest. It's relatively quick for single query but I'm trying to run an application to add new files to the database.
Due to the rate at which files get moved around on the data server I am scanning it in its entirety every time the application is run. The trouble is I need to compare each file found on the data server with the table in the database to see if it exists already. If it does, skip file, if not, insert new record.
The problem is, it has to query the database for each file it finds (about 30 million files). When it runs the queries sequentially it takes roughly 15 minutes to update the database. I figured if I split the queries up by year it would be faster. Run 10 queries at a time, one for each year. Well this slowed things a ton. After 45 minutes only 3 million files had been processed.
My question is, why. The table is innodb, each query is run in a separate connection to the database. The table is partitioned by year so in theory each query should only need to look at the particular year partition so there shouldn't be much contention between queries. I ran 'explain partitions' on the query and it does in fact only look at the partition containing the year of interest. I'm just not sure why mysql isn't scaling well with concurrent queries.
The mysql server is running on a 16 core server with over 400GB disk space (across 20+ drives in raid 6) and 48 GB memory.
The script is running on the data server with 16 cores, 20 disks in raid 6 + 64 GB memory. The network has a 40 MB/s transfer rate between servers.
The mysql server is the default mysql 5.1.61 setup. The only setting I've changed is setting innodb_buffer_size = 36G.
Anyone have any thoughts about why this is so slow and how I can speed things up?
|