My database crashed a couple of days ago during an upgrade, several seconds after committing a large transaction. We eventually found that the disk had filled up because the transaction had created several gigabytes of data. A day or so later the disk was full again and PostgreSQL crashed because the pg_xlog directory was taking up all of the disk space. I have cleaned up the drive to free some extra space, which allows PostgreSQL to start again, but the xlogs are still increasing. I have two errors in my pg_log:
"WARNING: transaction log file "00000001000000A800000078" could not be archived: too many failures" and
"LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: test ! -f /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078 && cp pg_xlog/00000001000000A800000078 /opt/postgres/remote_pgsql/wal_archive/00000001000000A800000078"
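The archive command above is two steps joined by `&&`: a `test ! -f` that exits non-zero when the destination file already exists, and a `cp` that only runs if the test passes. Either step can produce "exit code 1". A minimal sketch for checking them separately follows; the function name `diagnose_archive` and its output strings are hypothetical, and it only covers a few obvious causes:

```shell
#!/bin/sh
# Hypothetical helper: reports which part of the archive_command would fail.
# Usage: diagnose_archive <source segment path> <archive directory>
diagnose_archive() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src")"

    if [ ! -f "$src" ]; then
        # cp would fail: there is nothing to copy.
        echo "source-missing"
        return 1
    fi
    if [ -f "$dest" ]; then
        # 'test ! -f' exits 1 here, so cp is never attempted.
        echo "destination-exists"
        return 1
    fi
    if [ ! -d "$dest_dir" ] || [ ! -w "$dest_dir" ]; then
        # cp would fail: the archive directory is missing or not writable.
        echo "destination-not-writable"
        return 1
    fi
    echo "ok"
}
```

Run as the postgres user from the data directory, for example `diagnose_archive pg_xlog/00000001000000A800000078 /opt/postgres/remote_pgsql/wal_archive`. A result of `destination-exists` would be consistent with a partial copy left behind when the disk filled up, though that is only one possible cause.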
Postgres version 9.0.3 conf:
wal_level = hot_standby
archive_mode = true
archive_command = 'test ! -f /opt/postgres/remote_pgsql/wal_archive/%f && cp %p /opt/postgres/remote_pgsql/wal_archive/%f' # command to use to archive a logfile segment
archive_timeout = 1800
max_wal_senders = 1
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
default_statistics_target = 50 # pgtune wizard 2010-11-18
maintenance_work_mem = 480MB # pgtune wizard 2010-11-18
constraint_exclusion = on # pgtune wizard 2010-11-18
checkpoint_completion_target = 0.9 # pgtune wizard 2010-11-18
effective_cache_size = 5632MB # pgtune wizard 2010-11-18
work_mem = 48MB # pgtune wizard 2010-11-18
wal_buffers = 8MB # pgtune wizard 2010-11-18
checkpoint_segments = 16 # pgtune wizard 2010-11-18
shared_buffers = 1920MB # pgtune wizard 2010-11-18
max_connections = 80 # pgtune wizard 2010-11-18
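For scale, the PostgreSQL documentation for this release line gives a rough upper bound on the normal number of WAL segment files: (2 + checkpoint_completion_target) * checkpoint_segments + 1. The sketch below turns that into megabytes, assuming the default 16 MB segment size and no wal_keep_segments; the function name `expected_wal_mb` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rough normal upper bound for pg_xlog, in MB.
# Assumes 16 MB WAL segments (the default) and wal_keep_segments unset.
# Usage: expected_wal_mb <checkpoint_segments> <checkpoint_completion_target>
expected_wal_mb() {
    awk -v segs="$1" -v target="$2" 'BEGIN {
        # Documented guideline: (2 + completion_target) * segments + 1 files.
        files = int((2 + target) * segs + 1)
        print files * 16
    }'
}

# With the settings above: expected_wal_mb 16 0.9
```

With checkpoint_segments = 16 and checkpoint_completion_target = 0.9 this gives 47 files, about 752 MB. A pg_xlog directory far beyond that suggests segments are being retained because archiving has not succeeded, rather than normal checkpoint behaviour.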
I've tried stopping Postgres, deleting the 00000001000000A800000078.ready file, and starting Postgres again, but the file appears to be recreated instantly and the error is still in the log file.
I've read about the pg_resetxlog command, but it would mean having to pg_dump our database, which holds a large number of blobs, and then restore it. That is highly problematic because pg_restore has struggled to restore it.
Will setting zero_damaged_pages = true work in 9.0.1, and would this resolve the issue?
Would creating an empty file to replace the offending xlog segment work, and if so, would it need to be a specific size?