SQL timeline and statistical computations

This is a simple way to generate a time table (a list of consecutive timestamps, i.e. a timeline) in PostgreSQL. Nothing spectacular, but it might help you when trying to do some time-based statistical selects (it can feel like OLAP :-)).


CREATE OR REPLACE FUNCTION gen_time_list(
		tim IN TIMESTAMP,
		tim_stop IN TIMESTAMP,
		step IN INT)
RETURNS TABLE (ts1 TIMESTAMP, ts2 TIMESTAMP)
AS $$
BEGIN
  RETURN QUERY SELECT (TIMESTAMP 'epoch' + h * INTERVAL '1 second') AS ts1,
	(TIMESTAMP 'epoch' + (h + step) * INTERVAL '1 second') AS ts2 FROM
	( SELECT generate_series(EXTRACT(EPOCH FROM DATE_TRUNC('hour', tim))::bigint,
		EXTRACT(EPOCH FROM tim_stop)::bigint,
		step) AS h
	) AS hour_lists;
END;
$$ LANGUAGE plpgsql;

This creates a list of timestamp pairs (ts1, ts2) where ts2 = ts1 + step.


select * FROM gen_time_list(now()::timestamp, (now() - interval '10 days')::timestamp, -3600);

This lists hourly timestamps for the last 10 days (note the negative step, since we walk backwards from now()).

With this, a cross join, and some GROUP BY magic, you can very easily get statistical information about the data you have, e.g. how many new orders we had weekly over the last 6 months. Quick & easy, everything done within the database; that's how I like it 😉
Modifying it for your needs should be simple.
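As a sketch of the weekly-orders idea mentioned above (untested; the orders table and its pubtime column are hypothetical names, not from my real schema):

	SELECT gtl.ts1 AS week_start, count(o.id) AS new_orders
	FROM ( SELECT * FROM gen_time_list(now()::timestamp,
	                                   (now() - interval '6 months')::timestamp,
	                                   -604800) ) gtl   -- one week = 604800 seconds
	LEFT JOIN orders o
	       ON o.pubtime <= gtl.ts1 AND o.pubtime > gtl.ts2
	GROUP BY gtl.ts1
	ORDER BY gtl.ts1 DESC;

The LEFT JOIN keeps weeks with zero orders in the result, which an INNER JOIN would silently drop.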

My Sample:

I have data with multiple labels of different kinds, and with this I get how many new labels have been published per week, grouped by label. Maybe there is a simpler and faster way out there; show me please 😉

   SELECT count(j.id) AS count,
	gtl.ts1 AS ts1, gtl.ts2 as ts2,
	l1.id AS label_id1,
	l2.id AS label_id2
   FROM j
   JOIN jlabel jl1 ON j.id::bigint = jl1.job_id::bigint
   JOIN label l1 ON l1.category = 1 AND l1.id::smallint = jl1.label_id::smallint
   JOIN jlabel jl2 ON j.id::bigint = jl2.job_id::bigint
   JOIN label l2 ON l2.category = 2 AND l2.id::smallint = jl2.label_id::smallint
   CROSS JOIN ( SELECT * FROM gen_time_list(now()::timestamp, (now() - interval '1 year')::timestamp, -86400) ) gtl
        WHERE j.pubtime <= gtl.ts1 AND j.pubtime > gtl.ts2
   GROUP BY l1.id, l2.id, ts1,ts2
   ORDER BY label_id1, label_id2, ts1 DESC, ts2 DESC;

Solving MAX(COUNT()) problem

I've been solving the problem of doing a grouped MAX(COUNT()) in PostgreSQL, and because I haven't found anything really usable out there (correlated sub-queries are definitely not a good idea for thousands of records), I had to find my own solution.

Situation plan

My configuration is a bit too complicated to explain this on, so I'll try to do it on a fictive (not tested) example. It is a system that collects data records from multiple data providers and multiple data sources, where data providers can publish all or some of their data on multiple data sources. Our mission is to select, for each data provider, the best data source, that is, the one on which the data provider has the most data records published. This way we remove duplicate data records.

SQL tables:

  • data_source
CREATE TABLE data_source (
  ds_id SERIAL PRIMARY KEY
);
  • data_provider
CREATE TABLE data_provider (
  dp_id SERIAL PRIMARY KEY
);
  • data
CREATE TABLE data (
  id SERIAL PRIMARY KEY,
  ds_id INT NOT NULL REFERENCES data_source (ds_id),
  dp_id INT NOT NULL REFERENCES data_provider (dp_id),
  record TEXT NOT NULL
);


We will create a view data_view_source_count with the count of data records per data_source and data_provider, another view data_view_source that will contain the best data_source and data_provider combinations, and a final view data_view that will contain only records from one data_source for each data_provider.

CREATE VIEW data_view_source_count AS
     SELECT dp_id, ds_id, count(id) as count
     FROM data
     GROUP BY ds_id, dp_id;
CREATE VIEW data_view_source AS
     SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
     ( SELECT dp_id, ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as glob
     WHERE (dp_id, count) IN
        (SELECT dp_id, max(count) as max FROM
             ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as minmax
                      GROUP BY dp_id)
         ORDER BY dp_id;
CREATE VIEW data_view AS
      SELECT d.* FROM data d
      JOIN data_view_source dvs ON (dvs.dp_id = d.dp_id AND dvs.ds_id = d.ds_id);

And that's all, how easy 😉 And it took me only a few hours to figure out how to do it. Before creating this solution I thought that anything I did would be too slow and I would have to make data_view_source a permanent table regenerated on an hourly basis, but right now this seems to be fast enough to keep it as a view.
In my situation, for ca. 11 thousand active (young enough) records, a select from data_view takes about 2.5x longer (280 ms) than one from data (110 ms). Nothing spectacular, but right now I don't need this to be any faster, so why bother with more optimizations 😉


  • SELECT DISTINCT ON () is not part of the SQL standard, and not every RDBMS implements it or makes it work as expected here. I'm also slightly abusing the way it works in PostgreSQL, because I expect it to throw out every row but the first for each unique data_provider, and thus, for a rightly sorted table, it does what I need. But, again, it is not guaranteed that this will work as expected in other RDBMSes, or in future PostgreSQL versions, because of query-engine optimizations and so on… Maybe later I'll try to replace this with something ANSI compatible (well, and then you will have to find an ANSI compatible RDBMS 😉 ).
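For the record, here is an untested sketch of a standard-SQL replacement for the data_view_source view, using the ROW_NUMBER() window function (part of SQL:2003) instead of DISTINCT ON:

	CREATE VIEW data_view_source AS
	     SELECT dp_id, ds_id FROM
	     ( SELECT dp_id, ds_id,
	              -- rank data sources per provider, best (highest count) first
	              ROW_NUMBER() OVER (PARTITION BY dp_id ORDER BY count DESC) AS rn
	       FROM data_view_source_count ) AS ranked
	     WHERE rn = 1;

This also avoids relying on the ordering of the outer query, since the ranking is explicit in the OVER clause.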

LDAP + SASL authentication

I've found again that simple things don't stay so simple in the open-source world, and you have to do things yourself when you want something. That's the difference: in the world of commercial software you either pay big money or you never get the feature you need (and mostly both ;-)).

I have a slightly unusual and paranoid Subversion svnserve setup (which I'll describe later), and my requirement was to use LDAP for user authentication. Well, I know there is the possibility to use Apache LDAP authentication for this, but running a separate web server only to have this is, IMHO, not a good idea. So my next try was the SASL authentication that svn now supports. A new dilemma arose: because Subversion disallows the PLAIN authentication mechanism, I needed either plain-text passwords in LDAP or to write my own SASL plugin. Fortunately I found this page http://southbrain.com/south/2008/06/writing-a-cyrus-sasl-ldap-auxp.html, with an existing solution and (almost) working code.

Subversion, however, didn't cooperate well with it (i.e. it passes the realm within the username to SASL, etc.), so I had to make some modifications to get it working for me and my existing setup. So here it is: the eldap (easy ldap) auxprop module.

I've made some cleanups to the code, simplified the installation, added some little features, fixed some bugs, and I'm releasing this as a separate module (however, still compatible with the original gienger ldap). Later I'm planning to describe how to modify your LDAP (and some LDAP user management applications) to make it work with this. Until then you have to read the original post about how to do it (or bug me).

I hope it helps you.

WordPress-mu + WPML + SuperCache integration

Seems to be fixed in SuperCache 0.9.5.

This is a very simple patch for the WPML plugin v1.2.1 and the SuperCache plugin.
Download: wordpress+wpml+supercache-0.2.patch

I've had some problems with static caching of multilingual pages on my wordpress-mu based CMS system, running both the WPML and SuperCache plugins. They both work almost fine out of the box, but I experienced problems with non-post pages (like the site index). The web server (serving static content prepared by SuperCache) and SuperCache didn't cooperate well, and after the second language switch I always received the page in the same language. This patch fixes that.


Apply the patch by issuing patch -p0 <supercache+wmpl-0.2.patch in the wp-content directory of your WordPress installation.


WPML rewrites $_SERVER[REQUEST_URI] and this way removes the language identifier from all URLs it serves, e.g.

dob.sk/sk -> dob.sk

But! SuperCache uses the $_SERVER[REQUEST_URI] variable to compute how the cached page should be named, and because of the already mentioned WPML rewrite hack, SuperCache sees only one REQUEST_URI for the same page in different languages (aha!).

The patch does a simple thing: it saves the original REQUEST_URI when the SuperCache plugin loads, and then uses this saved value for the path computations. How easy 😉