Solving MAX(COUNT()) problem 2 – optimizations

In previous post I’ve tried to present my solution for solving max(count()) problem. The solution was slightly suboptimal and I’ve needed to speed it up a bit, because I’m using it now in some statistical calculations and every millisecond is important.

This is original solution.

CREATE VIEW data_view_source AS
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

and after bit of thinking a realizing how dummy I’m

CREATE VIEW data_view_source AS
   WITH tmp1 AS (
     SELECT dp_id,  ds_id, count FROM data_view_source_count
   )
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

Using WITH clause removes duplicate selects and speeds up data_view_source view by cca. 15-30% (for me).