Archive

Archive for the ‘rdbms’ Category

Fixing pg_dump invalid memory alloc request size

May 19th, 2012 No comments

I’ve encountered unusual problem while dumping one postgresql database. Every try to run pg_dump resulted in this:


sam@cerberus:~/backup$ pg_dump -v -c js1 >x
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY job.description (create_time, job_id, content, hash, last_check_time) TO stdout;
pg_dump: *** aborted because of error

or this


sam@cerberus:~/backup$ pg_dump -v -c --inserts js1 >x
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_dump: *** aborted because of error

Few weeks ago I’ve encountered almost fatal failure of two(!) disks in my RAID5 array – well really funny situation, and a bit uncomfortable way how to find out why not having monitoring and alarming system is a really bad idea ;)
Fortunately no serious data damage happened, postgresql seemed to recover from this, without any problems and my databases worked fine – until I’ve tried to perform a database dump.

And now comes the important question, how to find out which table rows are incorrect (and pg_filedump says everything is OK) ?
And the answer is, use custom function:


CREATE OR REPLACE FUNCTION
find_bad_row(tableName TEXT)
RETURNS tid
as $find_bad_row$
DECLARE
result tid;
curs REFCURSOR;
row1 RECORD;
row2 RECORD;
tabName TEXT;
count BIGINT := 0;
BEGIN
SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName;

OPEN curs FOR EXECUTE ‘SELECT ctid FROM ‘ || tableName;

count := 1;
FETCH curs INTO row1;

WHILE row1.ctid IS NOT NULL LOOP
result = row1.ctid;

count := count + 1;
FETCH curs INTO row1;

EXECUTE ‘SELECT (each(hstore(‘ || tabName || ‘))).* FROM ‘
|| tableName || ‘ WHERE ctid = $1′ INTO row2
USING row1.ctid;

IF count % 100000 = 0 THEN
RAISE NOTICE ‘rows processed: %’, count;
END IF;
END LOOP;

CLOSE curs;
RETURN row1.ctid;
EXCEPTION
WHEN OTHERS THEN
RAISE NOTICE ‘LAST CTID: %’, result;
RAISE NOTICE ‘%: %’, SQLSTATE, SQLERRM;
RETURN result;
END
$find_bad_row$
LANGUAGE plpgsql;

It goes over all records in given table, expands them one by one – what will results in exception if some expansion occurs. Exception will also contain CTID of last correctly processed row. And next row with higher CTID will be the corrupted one.
Like this:


js1=# select find_bad_row('public.description');
NOTICE: LAST CTID: (78497,6)
NOTICE: XX000: invalid memory alloc request size 18446744073709551613
find_bad_row
--------------
(78497,6)
(1 row)

js1=# select * from job.description where ctid = ‘(78498,1)’;
ERROR: invalid memory alloc request size 18446744073709551613
js1=# delete from job.description where ctid = ‘(78498,1)’;
DELETE 1
js1=# select find_bad_row(‘job.description’);
NOTICE: rows processed: 100000
NOTICE: rows processed: 200000
NOTICE: rows processed: 300000
NOTICE: rows processed: 400000
NOTICE: rows processed: 500000
NOTICE: rows processed: 600000
NOTICE: rows processed: 700000
NOTICE: rows processed: 800000
find_bad_row
————–

(1 row)

 

Note: this function requires hstore postgresql extension – it is part of postgresql distribution, you may need to create it with:


CREATE EXTENSION hstore;

 

Records in this table are not that important, and I can restore them from external source – so I could delete this corrupted row. If you can’t you will have to play directly with data files – like described here – good luck :)

Categories: admin, rdbms, time saver Tags:

Oracle VARNUM/NUMBER encoding in C

April 28th, 2011 No comments

Just a short example C implementation of Oracle VARNUM/NUMBER type encoding, in case you could not use library functions. I didn’t found it anywhere when needed (and afterwards I found it doesn’t helps me at all due to little ‘problem’ in ora :( ) and Oracle documentation is pretty unclear about how-to do it.

 

typedef	struct
{
	uint8_t len;
	uint8_t exp;
	uint8_t man[20];
} VARNUM_t;
static void setVARNUM(VARNUM_t *varnum, int64_t value)
{
	int8_t digits = 0;
	uint64_t u_value = 0;
	/* init */
	memset(varnum, 0, sizeof(*varnum));
	varnum->len	= 1;	//start len
	/* check sign */
	if(value >= 0)
	{
		u_value		= value;
		varnum->exp	= trunc(log(u_value) / log(100)) + 128 + 65;
	}
	else
	{
		u_value		= value * - 1;
		varnum->exp	= trunc(log(u_value) / log(100)) + 128 + 65;
		varnum->exp	= ~varnum->exp;
	}
	/* count value digits */
	digits = trunc(log(u_value) / log(10));
	digits = ((digits / 2) + 1);
	/* mantisa */
	for(; digits >= 0 && varnum->len <= 20; digits--)
        {
                uint64_t v_tmp = 0;
                /* prevent INT overflow for too much digits */
                if(digits > 0)
		{
			uint64_t v_pow = powl(100, digits - 1);
			v_tmp = ((u_value / v_pow) / 100) % 100;
		}
		else
			v_tmp = u_value % 100;
		if(!v_tmp)
			continue;
		/* +1 for positive, subtract 101 for negative */
		v_tmp = (value < 0 ? (101 - v_tmp) : (v_tmp + 1));
                varnum->man[varnum->len - 1] = v_tmp;
		varnum->len++;
	}
	/* terminator byte for negative value */
	if(value < 0 && varnum->len <= 20) 	{
                varnum->man[varnum->len - 1] = 102;
		varnum->len++;
	}
	return;
}

NUMBER is just the same as VARNUM but without len member, thus is one byte shorter.

Note: this is just a quick implementation that works, there are some easy optimizations possible…

Categories: devel, rdbms Tags: ,

dmlquery initial release

September 15th, 2009 No comments

Just released initial release of dmlquery tool, for doing DML queries on SQL database (or anything perl DBI abstraction layer can handle) via formated XML files.

Very useful for XML pipelining. With this and XML/XSLT transformations you can do very powerful thinks, like whole ETL tools for extracting data from one source (database, web…), transform it (using XSLT) and load to database (using dmlquery).

Hope it will help you.Click here for more http://devel.dob.sk/dmlquery/.

Note: Right now it has weak documentation, will try to improve it soon.

Categories: devel, dmlquery, rdbms Tags:

Solving MAX(COUNT()) problem 2 – optimizations

August 19th, 2009 No comments

In previous post I’ve tried to present my solution for solving max(count()) problem.  The solution was slightly suboptimal and I’ve needed to speed it up a bit, because I’m using it now in some statistical calculations and every millisecond is important.

This is original solution.

CREATE VIEW data_view_source AS
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM data_view_source_count ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

and after bit of thinking a realizing how dummy I’m

CREATE VIEW data_view_source AS
   WITH tmp1 AS (
     SELECT dp_id,  ds_id, count FROM data_view_source_count
   )
   SELECT DISTINCT ON (dp_id) dp_id, ds_id FROM
        ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as glob
   WHERE
       (dp_id, count) IN
               (SELECT dp_id, max(count) as max FROM
                       ( SELECT dp_id,  ds_id, count FROM tmp1 ORDER BY dp_id, count DESC) as minmax
                GROUP BY dp_id)
   ORDER BY dp_id;

Using WITH clause removes duplicate selects and speeds up data_view_source view by cca. 15-30% (for me).

Categories: devel, how-to, rdbms Tags:

SQL timeline and statistical computations

August 7th, 2009 No comments

This is simple way how-to generate time table (list of consequencing timestamps, or timeline) in PostgreSQL. Nothing spectacular, but might help you ,when trying to do some time based statistical selects (can feel like OLAP :-) ).

Code:

CREATE OR REPLACE FUNCTION gen_time_list(
		tim IN TIMESTAMP,
		tim_stop IN TIMESTAMP,
		step IN INT)
	RETURNS TABLE(ts1 TIMESTAMP, ts2 TIMESTAMP)
AS $$
BEGIN
  RETURN QUERY SELECT (TIMESTAMP 'epoch' + h * INTERVAL '1 second') AS h,
	(TIMESTAMP 'epoch' + (h + step) * INTERVAL '1 second') AS h_prev FROM
  (
	SELECT generate_series(EXTRACT(EPOCH FROM DATE_TRUNC('hour', tim))::bigint,
		(EXTRACT(EPOCH FROM tim_stop)::bigint),
		step) as h
  ) as hour_lists;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

This creates list of timestamps ts1, ts2 where ts2 = (ts + step).

Example:

select * FROM gen_time_list(now()::timestamp, (now() - interval '10 days')::timestamp, -3600);

Lists all hour timestamps for last 10 days.

With this, cross join and some group by magic you can get very easily statistical information for data you have, like i.e. how many new orders we had weekly in last 6 months. Quick & easy, everything done within database, thats how i like it ;)
Modifying for your needs should be simple.

My Sample:

I’ve data with multiple labels of different kind, and with this i get how many new labels have been published per week grouped by labels. Maybe there is simplier and faster way out there, show me plz ;-)

SELECT
	count(j.id) AS count,
	gtl.ts1 AS ts1, gtl.ts2 as ts2,
	l1.id AS label_id1,
	l2.id AS label_id2
   FROM j
   JOIN jlabel jl1 ON j.id::bigint = jl1.job_id::bigint
   JOIN label l1 ON l1.category = 1 AND l1.id::smallint = jl1.label_id::smallint
   JOIN jlabel jl2 ON j.id::bigint = jl2.job_id::bigint
   JOIN label l2 ON l2.category = 2 AND l2.id::smallint = jl2.label_id::smallint
   CROSS JOIN ( SELECT * FROM gen_time_list(now()::timestamp, (now() - interval '1 year')::timestamp, -86400) ) gtl
        WHERE j.pubtime  gtl.ts2
   GROUP BY l1.id, l2.id, ts1,ts2
   ORDER BY label_id1, label_id2, ts1 DESC, ts2 DESC;
Categories: devel, how-to, rdbms Tags: