|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
#1
|
|||
|
|||
|
Query that combines multiple date range records into a single record
I have a particularly knotty problem at the moment that is driving me insane trying to work out a solution, and hope that somebody on the forum might have had a similar issue before that they've been able to resolve.
Working on an HR system with data held on an Oracle 9 database, I have an absences table with the following key fields:
It seems simple enough: SELECT Person_ID, Count(*) FROM Absence_Table GROUP BY Person_ID Except that there's a couple of issues with the data that make it rather harder. Absences may be entered as a single record for the entire period; so Code:
Start_Date=1-Jan-2005 and End_Date=14-Jan-2005 Code:
Start_Date=1-Jan-2005 and End_Date=3-Jan-2005 Start_Date=4-Jan-2005 and End_Date=10-Jan-2005 Start_Date=11-Jan-2005 and End_Date=14-Jan-2005 And just for luck, I can get multiple records covering the same absence period, or that overlap periods of absence; so Code:
Start_Date=1-Jan-2005 and End_Date=14-Jan-2005 Start_Date=4-Jan-2005 and End_Date=10-Jan-2005 Code:
Start_Date=1-Jan-2005 and End_Date=10-Jan-2005 Start_Date=7-Jan-2005 and End_Date=14-Jan-2005 which should obly be counted as one absence. I can't clean up the core data, but I need to work out a select that will filter all these ambiguities. Code:
select * from (
select A1.date_start ,
A1.date_end
from tbl_absences A1
where A1.person_id = 65955
and not exists ( select A2.person_id
from tbl_absences A2
where A2.person_id = A1.person_id
and A2.date_start = A1.date_end + 1 )
and not exists ( select A3.person_id
from owbrun.per_absence_attendances A3
where A3.person_id = A1.person_id
and A3.date_end = A1.date_start - 1 )
MINUS
select A1.date_start,
A2.date_end,
from tbl_absences A1 ,
tbl_absences A2
where A1.person_id = 65955
and A2.person_id = A1.person_id
and A2.date_start = A1.date_end + 1 )
order by date_start ,
date_end
;
Which works well for two consecutive entries, but 3 consecutive entries cause problems. Likewise Code:
select Person_ID
MIN(date_start),
MAX(date_end),
from ( select date_start,
date_end,
date_start - rank() over ( partition by Person_ID
date_end,
order by date_start ) group_col
from tbl_absences )
group by person_id,
group_col
order by person_id,
group_col
;
Filters out double bookings with a matching start date, but can't handle true overlap bookings. Any help at fixing my logic, or suggestions for a totally new approach would be appreciated. |
|
#2
|
|||
|
|||
|
As an update, I've been working on the query and come up with the following SQL that breaks the date ranges down into 'virtual' records fro each individual date in the range, filters out the duplicates, and then recombines the remaining individual date records into rows for uniquely consecutive ranges.
Code:
SELECT person_id ,
reason ,
MIN(ActualDate) start_date ,
MAX(ActualDate) end_date
FROM ( SELECT person_id ,
reason ,
ActualDate ,
ActualDate - rank() OVER ( PARTITION BY person_id ,
reason
ORDER BY ActualDate ) group_col
FROM ( SELECT abs.person_id ,
abst.name Reason ,
DateRange.DateDay + TO_DATE('01-Jan-2000','DD-Mon-YYYY') ActualDate
FROM tbl_absences abs ,
tbl_absence_reasons abst ,
( SELECT ROWNUM - 1 DateDay
FROM sys.all_objects
WHERE ROWNUM <= TO_DATE('31-Dec-2020','DD-Mon-YYYY') - TO_DATE('01-Jan-2000','DD-Mon-YYYY') ) DateRange
WHERE DateRange.DateDay + TO_DATE('01-Jan-2000','DD-Mon-YYYY') >= abs.date_start (+)
AND DateRange.DateDay + TO_DATE('01-Jan-2000','DD-Mon-YYYY') <= abs.date_end (+)
AND abs.person_id = 65955
AND abs.absence_reason_id = abst.absence_reason_id
GROUP BY abs.person_id ,
abst.name ,
DateRange.DateDay ) )
GROUP BY person_id ,
reason ,
group_col
;
The only change from my original request for help is that I've factored in the absence reason code that I hadn't mentioned previously. It works exactly as required, but with one problem. The inner select: Code:
SELECT ROWNUM - 1 DateDay
FROM sys.all_objects
WHERE ROWNUM <= TO_DATE('31-Dec-2020','DD-Mon-YYYY') - TO_DATE('01-Jan-2000','DD-Mon-YYYY') ) DateRange
generates a complete list of day records for every date between 01-Jan-2000 and 31-Dec-2020, which is extremely time consuming, as it occurs for every absence record on a large database; but if I try to replace TO_DATE('01-Jan-2000','DD-Mon-YYYY') and TO_DATE('31-Dec-2020','DD-Mon-YYYY') with the actual start and end dates from the absence record, Oracle complains about abs.date_start being an invalid identifier. I've brain overloaded on the basic query so far, and can't see the wood for the trees any more. Can anybody help me flush in the correct date values so I'm doing a minimum of work with the day generator loop? |
|
#3
|
|||
|
|||
|
Forget my witterings, I'm really brain-burned and the answer was so obvious when I looked at my SQL again:
Code:
SELECT person_id ,
reason ,
MIN(ActualDate) start_date ,
MAX(ActualDate) end_date
FROM ( SELECT person_id ,
reason ,
ActualDate ,
ActualDate - rank() OVER ( PARTITION BY person_id ,
reason
ORDER BY ActualDate ) group_col
FROM ( SELECT abs.person_id ,
abst.name Reason ,
DateRange.DateDay + abs.date_start ActualDate
FROM owbrun.per_absence_attendances abs ,
owbrun.per_absence_attendance_types abst ,
( SELECT ROWNUM - 1 DateDay
FROM sys.all_objects
WHERE ROWNUM <= 365 ) DateRange
WHERE DateRange.DateDay + abs.date_start >= abs.date_start
AND DateRange.DateDay + abs.date_start <= abs.date_end
AND abs.person_id = 65955
AND abs.absence_attendance_type_id = abst.absence_attendance_type_id
GROUP BY abs.person_id ,
abst.name ,
abs.date_start ,
DateRange.DateDay ) )
GROUP BY person_id ,
reason ,
group_col
;
Thanks for taking the time to read, but I'm considering this problem closed now. |
![]() |
| Viewing: Dev Shed Forums > Databases > Oracle Development > Query that combines multiple date range records into a single record |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|