
    Count all where a <= t and b > t is slow


    Setup:

    SQL Code:
     
    USE test;
     
    CREATE TABLE [dbo].[test](
    	[ID] [INT] IDENTITY(1,1) NOT NULL,
    	[TEST_ID] [INT] NOT NULL,
    	[STARTTIME] [DECIMAL](18, 2) NULL,
    	[ENDTIME] [DECIMAL](18, 2) NOT NULL,
     CONSTRAINT [PK_performance] PRIMARY KEY CLUSTERED 
    (
    	[ID] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
     
    CREATE TABLE [dbo].[runconcurrency](
    	[ID] [INT] IDENTITY(1,1) NOT NULL,
    	[TEST_ID] [INT] NOT NULL,
    	[STARTTIME] [DECIMAL](18, 2) NOT NULL,
    	[CNCY] [INT] NOT NULL,
     CONSTRAINT [PK_runconcurrency] PRIMARY KEY CLUSTERED 
    (
	[ID] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
     
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 1, 3);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 4, 5);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 5, 6);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 4, 7);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 6, 8);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (11, 8, 10);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (12, 2, 3);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (12, 2, 3);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (12, 3, 6);
    INSERT INTO [dbo].[test] (TEST_ID, STARTTIME, ENDTIME) VALUES (12, 4, 7);


    The data above represents actions running in parallel within a TEST_ID, where the start and end times are in seconds from a known point in time. So pictorially something like this:
    TIME........0....1....2....3....4....5....6....7....8....9....10
    ID..TEST_ID
    1...11......0....XxxxxxxxxxX....................................
    2...11......0...................XxxxxX..........................
    3...11......0........................XxxxxX.....................
    4...11......0...................XxxxxxxxxxxxxxxX................
    5...11......0.............................XxxxxxxxxxX...........
    6...11......0.......................................XxxxxxxxxxxX
    7...12......0.........XxxxxX....................................
    8...12......0.........XxxxxX....................................
    9...12......0..............XxxxxxxxxxxxxxxX.....................
    10..12......0...................XxxxxxxxxxxxxxxX................
    ID..TEST_ID
    TIME........0....1....2....3....4....5....6....7....8....9....10

    I need to work out how many actions are running at the same time, within a TEST_ID, to give me a value for concurrency e.g.
    For TEST_ID 11, at TIME=6 there are 2 actions UNDERWAY or STARTING (IDs 4 and 5); ID 3 is excluded because it COMPLETED at TIME=6.
    For TEST_ID 12, at the same TIME=6, there would be 1 action UNDERWAY: ID 10.

    For this I'm using the rules:
    1. I'm only interested in concurrency when an action starts, so take the distinct STARTTIMEs (refer to them as TIME) for each TEST_ID.
    2. For each TIME value count the number of rows which have a STARTTIME and ENDTIME that straddle the value (STARTTIME can be equal to or less than TIME).
    3. Store the value calculated in the 2nd table.

    Overall to have a results set that looks like:
    ID..TEST_ID..STARTTIME..CNCY
    ---------------------------------------------
    1...11.......1..........1
    2...11.......4..........2
    3...11.......5..........2
    4...11.......6..........2
    5...11.......8..........1
    6...12.......2..........2
    7...12.......3..........1
    8...12.......4..........2
    (storing only the distinct STARTTIMEs)
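
    For reference, the three rules above can be collapsed into one set-based statement against the tables from the setup. This is a sketch only; whether it scales to hundreds of thousands of rows is exactly the question below:

    ```sql
    -- Sketch: rules 1-3 as a single set-based INSERT into runconcurrency.
    -- Rule 1: distinct STARTTIMEs per TEST_ID (derived table d).
    -- Rule 2: count rows whose interval straddles each TIME.
    -- Rule 3: store the result.
    INSERT INTO dbo.runconcurrency (TEST_ID, STARTTIME, CNCY)
    SELECT
    	d.TEST_ID
    	, d.TIME
    	, COUNT(*)
    FROM
    	(SELECT DISTINCT TEST_ID, STARTTIME AS 'TIME'
    	 FROM dbo.test) d
    INNER JOIN dbo.test t ON
    	(t.TEST_ID = d.TEST_ID
    	 AND t.STARTTIME <= d.TIME
    	 AND t.ENDTIME > d.TIME)
    GROUP BY d.TEST_ID, d.TIME;
    ```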


    So my question is: what is the best way to work out the CNCY count and store it, when the processing has to deal with multiple TEST_ID values, each with potentially hundreds of thousands of rows in test?

    ATTEMPT 1
    SQL Code:
     
    	USE test;
    	DECLARE @START_ID_TO_PROCESS INT;
    	DECLARE @END_ID_TO_PROCESS INT;
    	SELECT @START_ID_TO_PROCESS=11;
    	SELECT @END_ID_TO_PROCESS=12;
     
    	WITH e1 AS
    		(SELECT DISTINCT
    			STARTTIME AS 'TIME'
    			, TEST_ID
    		FROM test
    		WHERE TEST_ID BETWEEN @START_ID_TO_PROCESS AND @END_ID_TO_PROCESS
    		)
    	SELECT
    		e1.TEST_ID
    		, e1.TIME
    		, COUNT(*) AS 'CNCY'
    		FROM test t
    		INNER JOIN e1 ON
    			(t.TEST_ID = e1.TEST_ID
    			 AND t.STARTTIME <= e1.TIME
    			 AND t.ENDTIME > e1.TIME)
    		GROUP BY e1.TEST_ID, e1.TIME;

    Which gives:
    TEST_ID..TIME..CNCY
    11.......1.00..1
    11.......4.00..2
    11.......5.00..2
    11.......6.00..2
    11.......8.00..1
    12.......2.00..2
    12.......3.00..1
    12.......4.00..2

    This gives the right result, but it became really slow as the number of rows increased.
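
    On the slowness: for every distinct TIME, the join has to locate all rows of the same TEST_ID with STARTTIME <= TIME and then filter on ENDTIME. Without an index leading on (TEST_ID, STARTTIME), that is a scan per TIME value. A covering index along these lines (the name is illustrative, and I'm assuming it is not already among the non-clustered indexes mentioned at the end) would let the join do a range seek instead:

    ```sql
    -- Sketch: covering index so the straddle join can seek on
    -- (TEST_ID, STARTTIME) and read ENDTIME without key lookups.
    CREATE NONCLUSTERED INDEX IX_test_TESTID_STARTTIME
    	ON dbo.test (TEST_ID, STARTTIME)
    	INCLUDE (ENDTIME);
    ```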




    ATTEMPT 2
    Using a cursor, in a stored procedure, and adding in saving the CNCY value. I'm pretty sure this second attempt is really no different from Attempt 1 in how it derives the CNCY value.
    SQL Code:
     
    	USE test;
    	DECLARE @START_ID_TO_PROCESS INT;
    	DECLARE @END_ID_TO_PROCESS INT;
    	SELECT @START_ID_TO_PROCESS=11;
    	SELECT @END_ID_TO_PROCESS=12;
     
    	DECLARE agent_counts_to_process CURSOR FOR
    		SELECT
    			DISTINCT(STARTTIME)
    			, TEST_ID
    			FROM test
    			WHERE TEST_ID BETWEEN @START_ID_TO_PROCESS AND @END_ID_TO_PROCESS
    			ORDER BY TEST_ID, STARTTIME;
     
    	OPEN agent_counts_to_process
    		DECLARE @TIME DECIMAL(18,2);
    		DECLARE @ID_TO_PROCESS INT;
    		FETCH NEXT FROM agent_counts_to_process INTO @TIME, @ID_TO_PROCESS
     
    		WHILE (@@FETCH_STATUS <> -1)
    		BEGIN
    			IF (@@FETCH_STATUS <> -2)
    			BEGIN
    				INSERT INTO runconcurrency (TEST_ID, STARTTIME, CNCY)
    				SELECT
    					@ID_TO_PROCESS
    					, @TIME
    					, COUNT(*)
    				FROM test
    				WHERE TEST_ID = @ID_TO_PROCESS
    				AND STARTTIME <= @TIME
    				AND ENDTIME > @TIME
    				GROUP BY TEST_ID;
    			END
    			FETCH NEXT FROM agent_counts_to_process INTO @TIME, @ID_TO_PROCESS;
    		END
    	CLOSE agent_counts_to_process
    	DEALLOCATE agent_counts_to_process;

    Same result set as Attempt 1. The cursor took 2 hours to process 137,381 rows from test for one TEST_ID, inserting 116,426 rows into runconcurrency.

    Is there a better way to do this? While trying to solve this and other issues, I've added a number of non-clustered indexes to the test table; I'm going to drop them and use the actual execution plan to check which ones are definitely required.
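
    One middle ground between the two attempts, sketched below, is to loop per TEST_ID but keep each iteration set-based, so the 100k+ per-row statements of the cursor become one INSERT per TEST_ID (variable names illustrative; assumes contiguous TEST_ID values, as the BETWEEN filter above already does):

    ```sql
    -- Sketch: one set-based INSERT per TEST_ID instead of one per cursor row.
    DECLARE @TID INT;
    SET @TID = 11;          -- i.e. @START_ID_TO_PROCESS
    WHILE @TID <= 12        -- i.e. @END_ID_TO_PROCESS
    BEGIN
    	INSERT INTO dbo.runconcurrency (TEST_ID, STARTTIME, CNCY)
    	SELECT
    		d.TEST_ID
    		, d.TIME
    		, COUNT(*)
    	FROM
    		(SELECT DISTINCT TEST_ID, STARTTIME AS 'TIME'
    		 FROM dbo.test
    		 WHERE TEST_ID = @TID) d
    	INNER JOIN dbo.test t ON
    		(t.TEST_ID = d.TEST_ID
    		 AND t.STARTTIME <= d.TIME
    		 AND t.ENDTIME > d.TIME)
    	GROUP BY d.TEST_ID, d.TIME;

    	SET @TID = @TID + 1;
    END
    ```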


    EDIT: Using Microsoft SQL Server 2008 R2; added SQL code tags.
    Last edited by efj; November 9th, 2012 at 02:54 PM. Reason: Add version and SQL code format tags
