I’ve encountered unusual problem while dumping one of the bigger PostgreSQL databases. Every try to run pg_dump resulted in this:
sam@cerberus:~/backup$ pg_dump -v -c js1 >x pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613 pg_dump: The command was: COPY job.description (create_time, job_id, content, hash, last_check_time) TO stdout; pg_dump: *** aborted because of error
sam@cerberus:~/backup$ pg_dump -v -c --inserts js1 >x pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613 pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor pg_dump: *** aborted because of error
Few weeks ago I’ve experienced almost fatal failure of two(!) disks in my RAID5 array – well really ‘funny’ situation, and a bit uncomfortable way how to find out why not having monitoring and alarming system is a really bad idea 😉
Fortunately no serious data damage happened, PostgreSQL seemed to recover from this without any problems and all my databases worked fine – until I’ve tried to perform an iregullar database dump.
And now comes the important question, how to find out which table rows are invalid (and pg_filedump says everything is OK) ?
The answer is, use custom function:
CREATE OR REPLACE FUNCTION find_bad_row(tableName TEXT) RETURNS tid as $find_bad_row$ DECLARE result tid; curs REFCURSOR; row1 RECORD; row2 RECORD; tabName TEXT; count BIGINT := 0; BEGIN SELECT reverse(split_part(reverse($1), '.', 1)) INTO tabName; OPEN curs FOR EXECUTE 'SELECT ctid FROM ' || tableName; count := 1; FETCH curs INTO row1; WHILE row1.ctid IS NOT NULL LOOP result = row1.ctid; count := count + 1; FETCH curs INTO row1; EXECUTE 'SELECT (each(hstore(' || tabName || '))).* FROM ' || tableName || ' WHERE ctid = $1' INTO row2 USING row1.ctid; IF count % 100000 = 0 THEN RAISE NOTICE 'rows processed: %', count; END IF; END LOOP; CLOSE curs; RETURN row1.ctid; EXCEPTION WHEN OTHERS THEN RAISE NOTICE 'LAST CTID: %', result; RAISE NOTICE '%: %', SQLSTATE, SQLERRM; RETURN result; END $find_bad_row$ LANGUAGE plpgsql;
It goes over all records in given table, expands them one by one – what will results in exception if some expansion error occurs. Exception will also contain CTID of last correctly processed row. And next row with higher CTID will be the corrupted one.
js1=# select find_bad_row('public.description'); NOTICE: LAST CTID: (78497,6) NOTICE: XX000: invalid memory alloc request size 18446744073709551613 find_bad_row -------------- (78497,6) (1 row) js1=# select * from job.description where ctid = '(78498,1)'; ERROR: invalid memory alloc request size 18446744073709551613 js1=# delete from job.description where ctid = '(78498,1)'; DELETE 1 js1=# select find_bad_row('job.description'); NOTICE: rows processed: 100000 NOTICE: rows processed: 200000 NOTICE: rows processed: 300000 NOTICE: rows processed: 400000 NOTICE: rows processed: 500000 NOTICE: rows processed: 600000 NOTICE: rows processed: 700000 NOTICE: rows processed: 800000 find_bad_row -------------- (1 row)
Note: this function requires hstore extension – it is part of PostgreSQL distribution, but you may need to create it with:
CREATE EXTENSION hstore;
Records in this table were not that important, because I could restore them from external source. But if you can’t delete them, you will have to play directly with data files – like described here – good luck 🙂