Installing Patroni and Stolon and Practicing Failure Scenarios. Maxim Milyutin

Patroni and Stolon are two of the best-known and most advanced solutions for orchestrating PostgreSQL and building highly available leader-follower clusters with automatic failover. However, engineers migrating from older, proven solutions (Corosync & Pacemaker), as well as those coming from other DBMSs, have trouble installing these tools and lack a clear understanding of the role of each component. This master class walks through the typical process of installing Patroni and Stolon clusters on virtual machines (not in containers) and examines how these clusters behave under various infrastructure failures. The whole process is demonstrated on three virtual machines run under Vagrant from prebuilt images. Attendees who wish to can follow along after preparing their environment in advance.


! . Ozon . . Postgres Pro Patroni Stolon. .

-. , Stolon, Patroni . .

, Ansible , Postgres Pro , .

Patroni , , — .

, .

  • PostgreSQL – shared-nothing, .
  • . , .
  • hot standby, . . .
  • :
  • pg_basebackup , , .
  • . standby .
  • pg_rewind, standby.
  • , .

  • 10- PostgreSQL . , , , . , , , . Write amplification, - , , WAL full page images, checkpoint. hit beat . . WAL. « PostgreSQL MySQL» .

  • .

  • , , DDL, sequence, , , . WAL. WAL -. GTID MySQL, CSN MS SQL Server.

  • pg_rewind.

  • Stolon Patroni , , , rolling upgrade Postgres .

, ? , . . - , health checks - .

, , – promote . .

, , , .

, ? , promote . .

, split brain . - , .

, , , .

, . .

? Postgres , . , , , , .

? , , , - .

– . , read only. .

fail. , . , .

. . pg_auto_failover Citus Data.

. , . pg_stat_replication.

, . . , , . primary ( ) , .

, , . , , .

fail. , .

, , .

, . .

, . , , .


, . DCS (Distributed Configuration Store – ). IP , .

DCS – Consul, Etcd, Raft Zookeeper, Zab. Zab – Paxos.

, DCS.

Patroni/ Stolon.

Postgres Postgres .

, Patroni/ Stolon.

  • -, autofailover. - .
  • . PostgreSQL.
  • , Kubernetes.
  • DBaaS (database as a service).
  • – . , - . , - .

(DCS) Etcd

. DCS. . , «» . DCS, , .

? . , Postgres, , DCS , , split , split brain. , fail DCS .

, DCS 3-5-7 , , 3- . ? . net split, , DCS.
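The arithmetic behind sizing a DCS at 3, 5, or 7 members can be sketched in a few lines of Python. This is only an illustration of the majority rule used by Raft-based stores such as Etcd and Consul; the function names are hypothetical and not part of any tool.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still exists."""
    return members - quorum(members)


# Compare odd and even cluster sizes.
for n in (3, 4, 5, 7):
    print(f"{n} members: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

An even-sized cluster tolerates no more failures than the odd size below it (4 members still survive only one loss), which is why odd sizes are the usual choice.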

Etcd RAFT . .

DCS , follower PostgreSQL. RAFT.

. . .

, . follower, . . - RTT fsync.

, follower, . , , . . .

, - .

14 42 .

vagrant status
Current machine states:

node1                     running (virtualbox)
node2                     running (virtualbox)
node3                     running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

. vagrant.

: , . , , . . .

. . , .

Etcd . Etcd , Etcd.




. . . Ansible. . , .

. Etcd .

, term 2. Term – timeline PostgreSQL. term .

etcdctl member list. , () , followers.

sudo pkill -STOP etcd

. , fail , . Etcd , . . .

. . , term.

, , .

«etcdctl cluster-health». , . .

Etcd. , . term follower’.

- . . ? – . Etcd . «comcast». iptables Etcd. , .

? «comcast --device eth1 --packet-loss 100%».

. , . timeline. , -, . , term 4.

. , heartbeat_interval election_timeout. , followers , heartbeat , followers , . follower heartbeat - - -, . .

, , - . , . heartbeat_interval – 100 . , -, . election_timeout – .

. . , , RTT , election_timeout. Election_timeout . Ansible. .
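How heartbeat_interval and election_timeout relate to RTT can be sketched as a small Python helper. It follows the rule of thumb from etcd's tuning guidance: a heartbeat of roughly one round trip and an election timeout of at least ten round trips. The floors at etcd's defaults (100 ms and 1000 ms) and the helper itself are assumptions made for illustration, not values taken from the talk.

```python
def suggest_etcd_timeouts(rtt_ms: float) -> dict:
    """Suggest heartbeat/election values (in ms) from the measured peer RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # Heartbeat about one RTT, but never below etcd's 100 ms default
    # (the floor is an assumption of this sketch).
    heartbeat = max(100, round(rtt_ms))
    # Election timeout: at least 10x RTT and at least 5x the heartbeat,
    # never below the 1000 ms default, capped at etcd's 50 000 ms limit.
    election = min(50_000, max(1000, round(10 * rtt_ms), 5 * heartbeat))
    return {"heartbeat_interval": heartbeat, "election_timeout": election}


for rtt in (1, 200, 600):
    print(rtt, suggest_etcd_timeouts(rtt))
```

On a local network with an RTT near 1 ms the defaults are left alone; once the round trip grows to hundreds of milliseconds, both values have to grow with it, otherwise followers keep timing out and starting elections.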

comcast --device eth1 --stop

: comcast --device eth1 --latency 600. .

latency 600 . 600 – . RTT 200 .

ping . RTT 1 .

. , term . . , - , term. .

, heartbeat_interval election_timeout. , heartbeat , election_timeout 10 . Ansible. . Etcd-config. , . , . . , -. Etcd .

. . follower’.

member list, , , followers .

, , , , - 10 .

- Etcd, . bar. Deadline exceeded – , , . Etcd. timeout . 5 . total_timeout , 10 .

«get», . -. .

. , .

. Election_timeout , heartbeat 100 .

, RAFT - . , : , , . .

. Etcdctl member list. . – follower.

. bash – comcast --device, . . . - sleep . comcast --device eth1 --stop sleep 1.5. done . , , . .

Etcd. , term , , - , term, . Term . . . .

, , Etcd, . 1 . , . . . , , Etcd fsync . , .

. comcast --device eth1 --stop.

. Etcd , .

, , .

, Etcd , , , , .

Patroni Stolon. , .

. netsplit, , DCS. , , Postgres , DCS . , Postgres, .

DCS. , . . . , DCS Patroni Stolon.


. DCS stolon-sentinel, . DCS : election, , stateful .

Postgres’ Stolon-keeper. – stolon-proxy, .

3- , . , , 2- sentinel, . , 2- stolon-proxy. , , 2- Stolon-keeper, postgres-.

41 20

, . . , . , stolon’ . -, . Etcd . . . . – superuser, . . , .

Stolon. stolon.d/test-cluster.conf. , «test.cluster» . , , . Postgres, -. ,

- . , . Superuser, Stolon-keeper . . . , .

«test.cluster»? systemd/system/stolon-keeper@.service. template-, . , - . ? Stolon, 
 . , , - , -, .

Ansible. . . , . . . Stolon-keeper. name=stolon-keeper@test-cluster state=started enabled=on. .

. Test-cluster. , . lock - . , : Stolon-keeper, sentinel proxy . .

sentinel. . , , DCS. . . . sentinel , sentinel . . State=started enabled=on. - . , . , test-cluster. . , - . .


workflow Stolon:

  • «stolonctl init».
  • PostgreSQL pg_hba update.
  • , PostgreSQL , , , . . , Keeper, postmaster. Stolon-keeper PostgreSQL.
  • «automaticPgRestart», postgres- .
  • , . , max_connections, max_locks_per_transaction postgres-. . , , «max_connections» «max_locks_per_transaction». , , , . .
  • – Stolon-keeper. – Stolon-keeper. . , .

, pg_hba. , hba. /opt/stolon/test-cluster. . . Stolon-test-cluster-spec.json. , . . , .


Stolon :

  • – .
  • – PITR, . standby cluster.
  • – existing. , DCS. DCS , , . , «existing».

. initdb, checksums, , pg_rewind . . stolonctl. . .

Keeper, . . Keeper , sentinel, . , initdb, . standby.

. «status». , Keepers, health checks Keepers Postgres, . , , . sentinel.

, . wantedgeneration currentgeneration. Stolon-keeper . sentinel , , , . Keeper . .

. json, . . . Keepers . , , . , . . . , . Etcd .

. : Etcd . , Etcd. , . , , Consul. Consul , . , , , , Stolon-keeper . Postgres, Stolon . , Stolon-keeper. systemd, on abort, kill -9 .

Postgres. kill -9 , . . – . . Stolon-keeper, Ok.

. . - . Postgres . Stolon-keeper . Postgres. .

. fail. Postgres-. , . pgbench.

- , Postgres, ? select , , select.

, checksums, , checksums , . Postgres , . , , checksums , - . Postgres. Patroni/Stolon .

pgbench. . , . 25432. . . Stolon/test-cluster/postgres/pg_hba.conf.

, Stolon superuser, , . , .

. «default», . «pg_hba». «update». json- pgHBA . local all postgres. Postgres trust. – host all postgres trust.
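The JSON patch for pgHBA can be sketched in Python as follows. The snippet only builds and prints a `stolonctl update --patch` command, it does not run it. The store backend name and the client address in the `host` rule are placeholder assumptions, since the exact values used in the demo are not recoverable here; the helper names are hypothetical.

```python
import json
import shlex


def build_pg_hba_patch(rules):
    """Serialize pg_hba.conf rules as a Stolon cluster-spec patch."""
    return json.dumps({"pgHBA": list(rules)})


def stolonctl_update_command(cluster_name, store_backend, patch):
    """Build the argv for `stolonctl update --patch` without executing it."""
    return [
        "stolonctl",
        f"--cluster-name={cluster_name}",
        f"--store-backend={store_backend}",
        "update",
        "--patch",
        patch,
    ]


rules = [
    "local all postgres trust",
    # Placeholder address (assumption); restrict it to your own network.
    "host all postgres 0.0.0.0/0 trust",
]
patch = build_pg_hba_patch(rules)
command = stolonctl_update_command("test-cluster", "etcdv3", patch)
print(shlex.join(command))
```

Building the patch with `json.dumps` rather than by hand avoids quoting mistakes when the rule list grows.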

, . . , Postgres. . Create user postgres superuser. , Postgres . pgbench . HBA user test. Patroni. .

while. 20 , . , . .

Stolon . :

  • sleepInterval – .
  • requestTimeout – deadline PostgreSQL. Deadline DCS – 5 .
  • failInterval – , sentinel , . Sentinel failInterval, , . , , , . . - , . . failInterval .

autofailover Stolon?

1 – fail . Stolon-keeper Postgres . sentinel. , sentinel. . sleepInterval. 10 .

2 – - , , sentinel. , Keeper .

3 – sentinel. Keepers. sleepInterval.

: (λ1 + λ2) * sleepInterval. . .

4 – . DCS. sentinel , .

, , DCS sentinel , failover 25 50 .

fail sentinel’, failover sentinel. sentinel. failover .

, Stolon-proxy Keeper , Keeper read only . Postgres. Postgres Stolon-proxy.

. DCS, , , , .

  • Stolon. Stolon . , DCS . , «deadKeeperRemovalInterval». 48 . , DCS. , . , , WAL. 48 , .

  • , Stolon . . , -, deadlines - Postgres. , dbWaitReadyTimeout deadline . – 60 . checkpoints, deadline .

  • syncTimeout – deadline . 30 . , . .

  • initTimeout – deadline , initdb .

  • -. convergenceTimeout. , Keeper . -. Stolon . - -, Stolon .


Patroni, , , . ? Stolon. Patroni . DCS, , Patroni.

Patroni, . . , DCS time to live . , , . . , - . s
 . Patroni , WAL-, REST API, , . WAL . Proxy – .

. . . 3- Etcd. Postgres Pro HAProxy confd, Etcd .

2- Patroni. Patroni Postgres.

Patroni , . basebackup’ . Patroni , , .

basebackup. , , tablespace.

workflow Patroni. , bootstrap. , , . . Stolon, , , . bootstrap. .

Patroni? postgresql.conf pg_hba.conf, recovery.conf DCS , Stolon. . .

Patroni postgres-. , , .

, – , Patroni.


– . . . Patroni- REST API endpoints, , , .

HAProxy, healthchecks Patroni.
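What such a health check sees can be sketched in Python. The sketch assumes Patroni's REST API on its default port 8008, where `GET /primary` answers 200 only on the leader and `GET /replica` answers 200 only on a running replica (older Patroni releases expose the leader check as `/master`). The function names and the injectable `fetch` parameter are hypothetical, added so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 when the node does not hold that role
    except (urllib.error.URLError, OSError):
        return None  # node down or cut off from the network


def node_role(host, port=8008, fetch=http_status):
    """Classify a node the way an HTTP health check would."""
    base = f"http://{host}:{port}"
    if fetch(f"{base}/primary") == 200:
        return "primary"
    if fetch(f"{base}/replica") == 200:
        return "replica"
    return "unavailable"
```

For example, `node_role("node1")` would report which role the first virtual machine currently holds; a load balancer applies the same test to decide where to send read-write traffic.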

Patroni callbacks. , . .

HAProxy , DCS HAProxy. HAProxy + confd. consul-template. . .

10- Postgres libpq , , «target_session_attrs», . ? – , target_session_attrs.
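A multi-host connection string with target_session_attrs can be sketched like this. The host names come from the Vagrant setup; the database name, user, and helper function are illustrative assumptions. Only the two values accepted by PostgreSQL 10 are allowed here.

```python
def multi_host_dsn(hosts, dbname, user, port=5432, attrs="read-write"):
    """Build a libpq URI that lists several hosts for client-side failover."""
    if attrs not in ("any", "read-write"):
        raise ValueError("PostgreSQL 10 accepts only 'any' or 'read-write'")
    host_list = ",".join(f"{h}:{port}" for h in hosts)
    return f"postgresql://{user}@{host_list}/{dbname}?target_session_attrs={attrs}"


dsn = multi_host_dsn(["node1", "node2", "node3"], dbname="postgres", user="postgres")
print(dsn)
```

With `read-write`, libpq tries the hosts in order and keeps the first connection that accepts writes, so the client finds the current leader without a proxy in between.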

, , watchdog Postgres, , , Patroni-. ? Postgres , . .

Stolon Patroni , . - , .

, DNS. Consul . DNS . .

IP-. HAProxy + keepalived. vip-manager, DCS, IP- , . , Postgres Pro , , IP-. , kill stop keepalived’, VRRP IP- HAProxy, IP- . , , . vip-manager. vip-manager , switchover, IP . , .

, , . Stolon :

  • ttl – .
  • loop_wait – Patroni-.
  • retry_timeout – DCS PostgreSQL.
  • master_start_timeout – PostgreSQL ( Patroni-).
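How the first three parameters constrain each other can be checked with a short Python sketch. It relies on the rule from Patroni's documentation that `loop_wait + 2 * retry_timeout` should not exceed `ttl`, and uses Patroni's documented defaults (30, 10, 10 seconds) as starting values; the function name is hypothetical, and the values used in the demo may differ.

```python
def check_patroni_timings(ttl=30, loop_wait=10, retry_timeout=10):
    """Return (ok, required_ttl) for the rule loop_wait + 2*retry_timeout <= ttl."""
    required_ttl = loop_wait + 2 * retry_timeout
    return required_ttl <= ttl, required_ttl


ok, required = check_patroni_timings()
print(f"defaults ok: {ok}, minimum ttl: {required}s")

ok, required = check_patroni_timings(ttl=20)
print(f"ttl=20 ok: {ok}, minimum ttl: {required}s")
```

Lowering `ttl` to speed up failover therefore only works if `loop_wait` and `retry_timeout` are lowered with it; otherwise the leader key can expire while a healthy leader is still retrying its DCS update.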

, , . , Patroni- Postgres, DCS. - loop_wait. , .

failover Patroni?

  • , DCS . Patroni- . . – 20-30 .

  • Patroni- REST API, endpoint Patroni WAL-. - 2 . 2 , , . , , WAL-.

  • DCS. - .

  • .

DCS , - , , 5 .

, , .

, , Patroni-. - - .


  • . , , , Postgres. , Postgres WAL-commit .
  • Zalando – watchdog. Patroni- - , : , .
  • HAProxy Confd, . . , .
  • Corosync & Pacemaker — ( ) , . . . , , , .

, HAProxy Confd .

, netsplit? HAProxy Patroni . . health check’ Patroni-.

Confd. Confd , DCS.

, HAProxy PgBouncer. PgBouncer DCS. , , Patroni .

  • , Patroni . . , , - DCS . downtime , . wal_keep_segments, .
  • , , . . . , , , . .
  • Patroni? Patroni Stolon , enterprise . :
  • . .
  • . , , . , , failover , - . Max Availability Oracle Data Guard.
  • PostgreSQL Stolon.



Etcd, . , - ?

-, . , , Etcd, Consul mail , . fsync , .

-? , . , , , , ?

, -.

, Postgres , . DCS , .

, .

? ? . , , Consul . Etcd . - ?

Consul Etcd. RAFT. fsync . Postgres DCS , , . , . . , .

, !

Zookeeper? , ? ?

Zookeeper , . Etcd . Stolon , Patroni – .

- Patroni? . - ?

. wal_keep_segments, . . WAL- , . , issue Patroni. , Stolon, , , - .

! -! , . Patroni , . , , .

, . . .

. . . , . WAL-, . , WALs . . , . !

. , - -. , . . . switchover failover, . promote checkpoint, WALs . . , , .

! , - - . , ?

enterprise, Patroni. Stolon. , . . -- Kubernetes, , . Keeper, Sentinel. , , .

Patroni . WAL-, , DCS. DCS . (, , ), DCS, . . issue, Consul . Patroni. Stolon . Kubernetes.

. , Stolon ?


– master-slave Stolon.

, . , standby . – Stolon, . , , standby .

. . .

, ?

, . . , , , .

, .

. . ?

, .

Patroni . , , , . , . .

, Patroni , . Stolon , Postgres keeper data, .

, Stolon ?

open source, .

, , ?

, issue. .

- , . - . , .

issue. , , , .

-, . .

! . . , HAProxy, . . . . HAProxy "on-marked-down shutdown-sessions", , .

, ? health checks?

, http check REST API.

, -, HAProxy, IP- . – PgBouncer, health checks. HAProxy – , health checks , . , , – Patroni, - .

Patroni Etcd REST API.

, Etcd , Etcd.

Etcd? , , . watchdog, Patroni , , , watchdog reboot.

, watchdog – . watchdog. Patroni PostgreSQL, Patroni. watchdog – , , . .

, .

watchdog -, .. , , Patroni- , reboot. .

watchdog , , , , failover , . .

, . ? .


, Patroni, . . - . watchdog – .

Patroni Etcd , , standby. , watchdog .

. , Patroni , , , . . : watchdog, HAProxy.


Etcd. ?


- ?


? , ?


. . , , ?

Yes. I mentioned this in the point about unstable configurations. Timeouts are the only way to deal with it; these include heartbeat_interval and election_timeout.

