PostgreSQL 17 preview - 增强standby节点检查点统计信息

Mon Dec 25, 2023

600 Words|Read in about 3 Min
Tags:

背景

PostgreSQL 物理从库(standby)也有检查点功能(restart checkpoint), 主要是避免wal无限增大, 检查点做完后如果standby重启, 只需要restart checkpoint之后的所有WAL就可以继续恢复.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=12915a58eec962f407a6c38ce2bf08a48dde57b5

PG 17增强了restart checkpoint的统计信息.

Enhance checkpointer restartpoint statistics  
author	Alexander Korotkov <akorotkov@postgresql.org>	  
Sun, 24 Dec 2023 22:52:42 +0000 (00:52 +0200)  
committer	Alexander Korotkov <akorotkov@postgresql.org>	  
Sun, 24 Dec 2023 23:12:36 +0000 (01:12 +0200)  
commit	12915a58eec962f407a6c38ce2bf08a48dde57b5  
tree	157866d53bed7766c4bb41ee7be37b0cc4579fb7	tree  
parent	64e77b496af61ee31189ba69b40e785e11e9967f	commit | diff  
Enhance checkpointer restartpoint statistics  
  
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding  
three new columns: restartpoints_timed, restartpoints_req, and  
restartpoints_done. These additions aim to improve the visibility and  
monitoring of restartpoint processes on replicas.  
  
Previously, it was challenging to differentiate between successful and failed  
restartpoint requests. This limitation arises because restartpoints on replicas  
are dependent on checkpoint records from the primary, and cannot occur more  
frequently than these checkpoints.  
  
The new columns allow for clear distinction and tracking of restartpoint  
requests, their triggers, and successful completions.  This enhancement aids  
database administrators and developers in better understanding and diagnosing  
issues related to restartpoint behavior, particularly in scenarios where  
restartpoint requests may fail.  
  
System catalog is changed.  Catversion is bumped.  
  
Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru  
Author: Anton A. Melnikov  
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov  
+   A restartpoint can be demanded by a schedule or by an external request.  
+   The <structfield>restartpoints_timed</structfield> counter in the  
+   <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>  
+   view counts the first ones while the <structfield>restartpoints_req</structfield>  
+   the second.  
+   A restartpoint is triggered by schedule when a checkpoint record is reached  
+   if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since  
+   the last performed restartpoint or when the previous attempt to perform  
+   the restartpoint has failed. In the last case, the next restartpoint  
+   will be scheduled in 15 seconds.  
+   A restartpoint is triggered by request due to similar reasons like checkpoint  
+   but mostly if WAL size is about to exceed <xref linkend="guc-max-wal-size"/>  
+   However, because of limitations on when a restartpoint can be performed,  
+   <varname>max_wal_size</varname> is often exceeded during recovery,  
+   by up to one checkpoint cycle's worth of WAL.  
  
  
+   The <structfield>restartpoints_done</structfield> counter in the  
+   <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>  
+   view counts the restartpoints that have really been performed.  
+  </para>  
+  
+  <para>  
+   In some cases, when the WAL size on the primary increases quickly,  
+   for instance during massive INSERT,  
+   the <structfield>restartpoints_req</structfield> counter on the standby  
+   may demonstrate a peak growth.  
+   This occurs because requests to create a new restartpoint due to increased  
+   XLOG consumption cannot be performed because the safe checkpoint record  
+   since the last restartpoint has not yet been replayed on the standby.  
+   This behavior is normal and does not lead to an increase in system resource  
+   consumption.  
+   Only the <structfield>restartpoints_done</structfield>  
+   counter among the restartpoint-related ones indicates that noticeable system  
+   resources have been spent.  
+     <row>  
+      <entry role="catalog_table_entry"><para role="column_definition">  
+       <structfield>restartpoints_timed</structfield> <type>bigint</type>  
+      </para>  
+      <para>  
+       Number of scheduled restartpoints due to timeout or after a failed attempt to perform it  
+      </para></entry>  
+     </row>  
+  
+     <row>  
+      <entry role="catalog_table_entry"><para role="column_definition">  
+       <structfield>restartpoints_req</structfield> <type>bigint</type>  
+      </para>  
+      <para>  
+       Number of requested restartpoints  
+      </para></entry>  
+     </row>  
+  
+     <row>  
+      <entry role="catalog_table_entry"><para role="column_definition">  
+       <structfield>restartpoints_done</structfield> <type>bigint</type>  
+      </para>  
+      <para>  
+       Number of restartpoints that have been performed  
+      </para></entry>  
+     </row>  

See Also

Mon Dec 25, 2023

600 Words|Read in about 3 Min
Tags: