Home > devel, how-to, rdbms > Solving MAX(COUNT()) problem 2 – optimizations

Solving MAX(COUNT()) problem 2 – optimizations

In previous post I’ve tried to present my solution for solving max(count()) problem.  The solution was slightly suboptimal and I’ve needed to speed it up a bit, because I’m using it now in some statistical calculations and every millisecond is important.

This is original solution.

CREATE VIEW data_view_source AS
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

and after bit of thinking a realizing how dummy I’m

CREATE VIEW data_view_source AS
   WITH tmp1 AS (
     SELECT dp_id,  ds_id, count FROM data_view_source_count
   )
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

Using WITH clause removes duplicate selects and speeds up data_view_source view by cca. 15-30% (for me).

Categories: devel, how-to, rdbms Tags:
  1. No comments yet.
  1. No trackbacks yet.
*