Backup Slave Failed

Comments

5 comments

  • Official comment
    Avatar
    Ashraf Sharif

    Hi Jortiz,

    This is a common issue if you are running pg_dump on the hot standby slave, because the table data on the hot standby slave server is modified while a long running query is running. During querying some needed rows might be updated or deleted on primary. As a primary does not know that a query is started on secondary it thinks it can clean up (vacuum) old versions of its rows. Then secondary has to replay this cleanup, and has to forcibly cancel all queries which can use these rows.

    There are a number of workarounds for this issue:

    1) Set "hot_standby_feedback = on" on the slave node. It prevents VACUUM from removing recently-dead rows and so cleanup conflicts do not occur. However, this could bloat the master. 

    2) Set the following on the slave:

    max_standby_archive_delay = 900s
    max_standby_streaming_delay = 900s

    This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.

    3) Pause the replication while the backup is ongoing on the slave and resume it after the backup completes:

    select pg_xlog_replay_pause(); -- suspend
    pg_dump ...; -- your backup query
    select pg_xlog_replay_resume(); --resume

    Regards,
    Ashraf

  • Avatar
    Jortiz

    Thanks for the reply.

    I tried to stop the replication with procedure 3, however an error is reported in the slave host. I investigated this type of error, but until now I can not find the solution, The Postgres version is 10.1.

    ERROR: function pg_xlog_replay_pause () does not exist
    LINE 1: select pg_xlog_replay_pause ();
                    ^
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.

    ********** Error **********

    ERROR: function pg_xlog_replay_pause () does not exist
    SQL state: 42883
    Hint: No function matches the given name and argument types. You might need to add explicit type casts.
    Character: 8

  • Avatar
    Paul Namuag

    Hi Jortiz,

     

    In PG10, there has been function renames. Check this one https://wiki.postgresql.org/wiki/New_in_postgres_10#Renaming_of_.22xlog.22_to_.22wal.22_Globally_.28and_location.2Flsn.29

    So instead, try

    select pg_wal_replay_pause(); -- suspend
    pg_dump ...; -- your backup query
    select pg_wal_replay_resume(); --resume
  • Avatar
    Jortiz

    Perfect, it works correctly. Thank you very much for your help.

  • Avatar
    Paul Namuag

    You're welcome Jortiz

Please sign in to leave a comment.

Powered by Zendesk