In previous post I’ve tried to present my solution for solving max(count()) problem. The solution was slightly suboptimal and I’ve needed to speed it up a bit, because I’m using it now in some statistical calculations and every millisecond is important.
This is original solution.
CREATE VIEW data_view_source AS SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM ( SELECT dp_id, ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as glob WHERE (dp_id, count) IN (SELECT dp_id, max(count) as max FROM ( SELECT dp_id, ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as minmax GROUP BY dp_id) ORDER BY dp_id;
and after bit of thinking a realizing how dummy I’m
CREATE VIEW data_view_source AS WITH tmp1 AS ( SELECT dp_id, ds_id, count FROM data_view_source_count ) SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM ( SELECT dp_id, ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as glob WHERE (dp_id, count) IN (SELECT dp_id, max(count) as max FROM ( SELECT dp_id, ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as minmax GROUP BY dp_id) ORDER BY dp_id;
Using WITH clause removes duplicate selects and speeds up data_view_source view by cca. 15-30% (for me).