### Thread: Optimizing Query that calculates, groups, and groups again

1. #### Optimizing Query that calculates, groups, and groups again

Seeing if there's any bright optimization ideas out there on a particular query need.

To simplify, imagine that we want to know how many kids in a class are present (P) between 0 and 50 percent of the time, and how many are present 50 to 100 percent of the time. Call the kids with under 50% attendance "bad" and the other ones "good".

The data in the attendance table may look like:

studentid | mark
-------------------
JACK | P
JACK | A
JACK | A
JACK | A
JACK | A
JACK | P
JILL | P
JILL | P
JILL | P
JILL | A

Now, Jack is a "bad" kid because he is absent over half the time, while Jill is a "good kid".

The resulting output should simply look like

array(
'good' => 1,
)

1 bad kid, and one good kid.

While a simple concept, the steps are expensive. The SQL first needs to calculate a percentage for each and every kid (not cheap in itself) and then aggregate the counts of those.

For just the first part - calculating the percentages - I've found that:

SELECT *, ((SELECT COUNT(*) FROM attendance WHERE mark = 'P' AND studentid=a.studentid)/(SELECT COUNT(*) FROM attendance WHERE studentid=a.studentid)) AS percentage FROM attendance AS a ... (join filters) ... GROUP BY studentid

works, but is very expensive (and I have hundreds of thousands of marks). And of course, this would still require either PHP manipulation of the results or more CASE syntax for the second step of aggregation.

Any thoughts on a different strategy? Of course, PHP manipulation from the start is also a possibility, but I'm again finding that solution is taking too long.

Thanks for any input,
Jeremy
2. Code:
```SELECT CASE WHEN percentage >= 50
THEN 'good'
, COUNT(*) AS total
FROM ( SELECT studentid
, 100.0 * COUNT(CASE WHEN mark = 'P'
THEN mark END) /
COUNT(*) AS percentage
FROM attendance
GROUP
BY studentid ) AS t
GROUP
BY category```

• daprezjer agrees
3. Good stuff. Really simple follow up.

There's a portion of the actual script that requires some calculation of db values to do multiple comparison values with the CASE statement. For simplicity sake, let's just say:

CASE
WHEN num/total > 5 OR num/total <=10 THEN 'top'
WHEN num/total > 0 OR num/total <=5 THEN 'bottom'
END

Is there a way to write this script without using num/total twice for each WHEN statement? I've seen something like:
WHEN num/total >5 <= 10 around as a shorthand, though I don't believe I ever got it to work. My (perhaps misplaced) worry is that it has to calculate num/total twice for just the one WHEN statement.
4. Originally Posted by daprezjer
My (perhaps misplaced) worry is that it has to calculate num/total twice for just the one WHEN statement.
i would be very surprised if the optimizer failed to recognize the opportunity to do one calculation instead of two