Our company has a client that needs a specific report generated. They want to structure of the report to be like:
Each user and their first 40 websites they went to:
User1 SiteA Sent Received Total
User1 SiteB Sent Received Total
User1 SiteC Sent Received Total
User2 SiteA Sent Received Total
User2 SiteB Sent Received Total
My issue is on how I can accomplish this in a single query and still be able to limit the output to 40 sites per user reported on. A single query is highly desirable as the client has a lot of users and it is very time consuming to run the query for each user.
Here is the SQL I am using right now:
SELECT ROUND(CAST(SUM(COALESCE(w.sent,0)) as double)/1048576,4) as sent,
ROUND(CAST(SUM(COALESCE(w.rcvd,0)) as double)/1048576,4) as rcvd,
ROUND(CAST(SUM(COALESCE(w.rcvd,0)+COALESCE(w.sent,0)) as double)/1048576,4) AS total,
w.`user` as username,
from webfiltertable w
where tstamp >= ? and tstamp <= ? group by w.hostname, w.`user` order by username asc, total desc limit 40
This SQL works just fine when run for each user individually. But I would like to have a variation of this query that will grab each user and the top 40 sites they have accessed by bandwidth.
Forgot to mention that we are using Apache Hive.