Installing Patroni and Stolon and practicing failures. Maxim Milyutin



Patroni and Stolon are two of the best-known and most advanced solutions for orchestrating PostgreSQL and building high-availability Leader-Followers clusters with automatic failover. However, engineers migrating from older, proven solutions (Corosync & Pacemaker), or coming from other DBMSs, run into difficulties installing these tools and often lack an understanding of the role each component plays. This master class walks through the typical process of installing Patroni and Stolon clusters on virtual machines (not in containers) and examines how these clusters behave under various infrastructure failures. The whole process is demonstrated on three virtual machines run by vagrant from prebuilt images. Listeners who wish to can follow along, having prepared their environment beforehand.



PGConf.Russia



! . Ozon . . Postgres Pro Patroni Stolon. .





-. , Stolon, Patroni . .



, Ansible , Postgres Pro , .



Patroni , , — https://github.com/vitabaks/postgresql_cluster. .





, .



  • PostgreSQL – shared-nothing, .
  • . , .
  • hot standby, . . .
  • :
  • pg_basebackup , , .
  • . standby .
  • pg_rewind, standby.
  • , .




https://eng.uber.com/mysql-migration/



https://github.com/sorintlab/stolon/issues/519



https://github.com/zalando/patroni/issues/538



  • 10- PostgreSQL . , , , . , , , . Write amplification, - , , WAL full page images, checkpoint. hit beat . . WAL. « PostgreSQL MySQL» .



  • .



  • , , DDL, sequence, , , . WAL. WAL -. GTID MySQL, CSN MS SQL Server.



  • pg_rewind.



  • Stolon Patroni , , , rolling upgrade Postgres .







, ? , . . - , health checks - .





, , – promote . .





, , , .





, ? , promote . .





, split brain . - , .



, , , .



, . .





? Postgres , . , , , , .





? , , , - .



– . , read only. .





fail. , . , .





https://github.com/citusdata/pg_auto_failover



https://github.com/citusdata/pg_auto_failover/issues/12#issuecomment-490551255



. . pg_auto_failover Citus Data.



. , . pg_stat_replication.





, . . , , . primary ( ) , .



, , . , , .



fail. , .





, , .





, . .



, . , , .



.





, . DCS (Distributed Configuration System – ). IP , .



DCS – Consul, Etcd, Raft Zookeeper, Zab. Zab – Paxos.



, DCS.



Patroni/ Stolon.



Postgres Postgres .





, Patroni/ Stolon.



  • -, autofailover. - .
  • . PostgreSQL.
  • , Kubernetes.
  • DBaaS (database as a service).
  • – . , - . , - .


(DCS) Etcd





https://raft.github.io/



. DCS. . , «» . DCS, , .



? . , Postgres, , DCS , , split , split brain. , fail DCS .



, DCS 3-5-7 , , 3- . ? . net split, , DCS.
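To make the odd-sized 3-5-7 recommendation concrete, here is a small sketch of the majority arithmetic that Raft-style stores rely on. The function names `quorum_size` and `failures_tolerated` are illustrative only and are not part of etcd or any other DCS tooling.

```shell
# Majority quorum for a cluster of N voting members: floor(N/2) + 1.
quorum_size() {
  local n="$1"
  echo $(( n / 2 + 1 ))
}

# How many members may be lost while a majority can still be formed.
failures_tolerated() {
  local n="$1"
  echo $(( n - (n / 2 + 1) ))
}

for n in 3 4 5 7; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated=$(failures_tolerated "$n")"
done
```

Running it shows that four members tolerate no more failures than three, which is why even cluster sizes add cost without adding resilience.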



Etcd RAFT . .





DCS , follower PostgreSQL. RAFT.



. . .



, . follower, . . - RTT fsync.



, follower, . , , . . .



, - .



14 42 .



vagrant status
Current machine states:

node1                     running (virtualbox)
node2                     running (virtualbox)
node3                     running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.


. vagrant.





: , . , , . . .





. . , .





Etcd . Etcd , Etcd.





config Etcd. , Etcd, , IP , . . ETCD_LISTEN_CLIENT_URLS . ETCD_LISTEN_PEER_URLS .



ETCD_ADVERTISE_CLIENT_URLS ETCD_INITIAL_ADVERTISE_PEER_URLS. . discovery, .



: ETCD_HEARTBEAT_INTERVAL ETCD_ELECTION_TIMEOUT.
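As a rough illustration of how the settings named above fit together, here is a sketch of an etcd environment file for the first node. The member names, the 172.20.20.x host addresses, the cluster token, the file location and the choice of a static member list are all assumptions made for the example; only the variable names come from etcd itself.

```shell
# Hypothetical /etc/etcd/etcd.conf for node1 (names and addresses are assumptions).
ETCD_NAME="node1"
ETCD_DATA_DIR="/var/lib/etcd"

# Where this member listens for clients and for its peers.
ETCD_LISTEN_CLIENT_URLS="http://172.20.20.11:2379,http://127.0.0.1:2379"
ETCD_LISTEN_PEER_URLS="http://172.20.20.11:2380"

# What this member tells clients and peers about how to reach it.
ETCD_ADVERTISE_CLIENT_URLS="http://172.20.20.11:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.20.20.11:2380"

# Static bootstrap (an assumption here): every member is listed up front.
ETCD_INITIAL_CLUSTER="node1=http://172.20.20.11:2380,node2=http://172.20.20.12:2380,node3=http://172.20.20.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-demo"

# Raft timing in milliseconds; these two values are etcd's defaults.
ETCD_HEARTBEAT_INTERVAL="100"
ETCD_ELECTION_TIMEOUT="1000"
```

The other two nodes would use the same file with their own name and address substituted.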





. . . Ansible. . , .





. Etcd .





, term 2. Term – timeline PostgreSQL. term .





etcdctl member list. , () , followers.



sudo pkill -STOP etcd


. , fail , . Etcd , . . .





. . , term.





, , .





«etcdctl cluster-health». , . .





Etcd. , . term follower’.





- . . ? – . Etcd . «comcast». API tables Etcd. , .



? «comcast --device eth1 --packet-loss 100%».





. , . time line. , -, . , term 4.





. , heartbeat_interval election_timeout. , followers , heartbeat , followers , . follower heartbeat - - -, . .



, , - . , . heartbeat_interval – 100 . , -, . election_timeout – .





. . , , RTT , election_timeout. Election_timeout . Ansible. .
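The relation between RTT and the two timeouts can be sketched as a small helper. It follows the rule of thumb from the etcd tuning guide as I understand it (heartbeat close to the peer RTT, election timeout at least ten times the RTT); the function name `suggest_etcd_timeouts` and the decision never to go below etcd's defaults are assumptions of this example.

```shell
# Suggest etcd timing values (milliseconds) from a measured peer RTT.
# Assumed rule of thumb: heartbeat ~ RTT, election timeout >= 10 x RTT.
# The floor of 100 ms / 1000 ms (etcd's defaults) is a choice of this sketch.
suggest_etcd_timeouts() {
  local rtt_ms="$1"
  local heartbeat="$rtt_ms"
  local election=$(( rtt_ms * 10 ))
  if [ "$heartbeat" -lt 100 ]; then heartbeat=100; fi
  if [ "$election" -lt 1000 ]; then election=1000; fi
  echo "ETCD_HEARTBEAT_INTERVAL=$heartbeat"
  echo "ETCD_ELECTION_TIMEOUT=$election"
}

suggest_etcd_timeouts 1     # LAN-like RTT: the defaults are kept
suggest_etcd_timeouts 600   # heavily delayed link
```

The printed lines can be pasted into the etcd environment file or fed to whatever templating the deployment uses.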



comcast --device eth1 --stop



: comcast --device eth1 --latency 600. .





latency 600 . 600 – . RTT 200 .





ping . RTT 1 .





. , term . . , - , term. .





, heartbeat_interval election_timeout. , heartbeat , election_timeout 10 . Ansible. . Etcd-config. , . , . . , -. Etcd .





. . follower’.





member list, , , followers .



, , , , - 10 .





- Etcd, . bar. Deadline exceeded – , , . Etcd. timeout . 5 . total_timeout , 10 .





«get», . -. .





. , .





. Election_timeout , heartbeat 100 .



, RAFT - . , : , , . .





. Etcdctl member list. . – follower.





. bash – comcast --device, . . . - sleep . comcast --device eth1 --stop, sleep 1.5. done . , , . .
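A flapping link of this kind can be imitated with a short loop around comcast. This is only a sketch: the function name `flap_network`, the `COMCAST` override variable, and the choice of 100% packet loss as the injected fault are assumptions of the example, not a record of the exact loop used in the demo.

```shell
# Toggle a network fault on and off to imitate a flapping link.
# Usage: flap_network DEVICE ROUNDS FAULT_SECONDS HEALTHY_SECONDS
# COMCAST may be overridden (for example with "echo comcast") for a dry run.
flap_network() {
  local dev="$1"
  local rounds="$2"
  local fault_s="$3"
  local healthy_s="$4"
  local run="${COMCAST:-comcast}"
  local i=0
  while [ "$i" -lt "$rounds" ]; do
    # Break the link (assumed fault: drop every packet on the device).
    $run --device "$dev" --packet-loss 100%
    sleep "$fault_s"
    # Restore the link and let the cluster breathe for a moment.
    $run --device "$dev" --stop
    sleep "$healthy_s"
    i=$(( i + 1 ))
  done
}
```

For instance, `flap_network eth1 10 1 1.5` would cut and restore eth1 ten times with a 1.5 second healthy window; the one-second fault window is a made-up value.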





Etcd. , term , , - , term, . Term . . . .





, , Etcd, . 1 . , . . . , , Etcd fsync . , .



. comcast --device eth1 --stop.





https://github.com/etcd-io/etcd/blob/master/Documentation/tuning.md#time-parameters

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#example-hardware-configurations



. Etcd , .



, , .



, Etcd , , , , .



Patroni Stolon. , .





. netsplit, , DCS. , , Postgres , DCS . , Postgres, .



DCS. , . . . , DCS Patroni Stolon.





Stolon.





. DCS stolon-sentinel, . DCS : election, , stateful .



Postgres’ Stolon-keeper. – stolon-proxy, .





https://github.com/sorintlab/stolon/issues/313



3- , . , , 2- sentinel, . , 2- stolon-proxy. , , 2- Stolon-keeper, postgres-.



41 20



, . . , . , stolon’ . -, . Etcd . . . . – superuser, . . , .



Stolon. stolon.d/test-cluster.conf. , «test.cluster» . , , . Postgres, -. ,



- . , . Superuser, Stolon-keeper . . . , .



«test.cluster»? system/system/Stolon-keeper@.service. template-, . , - . ? Stolon, 
 . , , - , -, .



Ansible. . . , . . . Stolon-keeper. Name=Stolon-keeper@test-cluster state=started enable=on. .



. Test-cluster. , . lock - . , : Stolon-keeper, sentinel proxy . .



sentinel. . , , DCS. . . . sentinel , sentinel . . State=started enabled=on. - . , . , test-cluster. . , - . .



.





https://postgrespro.ru/docs/postgrespro/12/server-shutdown

https://github.com/sorintlab/stolon/issues/707



workflow Stolon:



  • «stolonctl init».
  • PostgreSQL pg_hba update.
  • , PostgreSQL , , , . . , Keeper, post-master. Stolon-keeper PostgreSQL.
  • «automaticPgRestart», postgres- .
  • , . , max_connections, max_locks_per_transaction postgres-. . , , «max_connections» «max_locks_per_transaction». , , , . .
  • – Stolon-keeper. – Stolon-keeper. . , .


, pg_hba. , hba. /opt/stolon/test-cluster. . . Stolon-test-cluster-spec.json. , . . , .



.





https://github.com/sorintlab/stolon/blob/master/doc/initialization.md

https://github.com/sorintlab/stolon/blob/master/doc/standbycluster.md



Stolon :



  • – .
  • – PITR, . standby cluster.
  • – existing. , DCS. DCS , , . , «existing».


. initdb, checksums, , pg_rewind . . Stolonctl. . .



Keeper, . . Keeper , sentinel, . , initdb, . standby.



. «status». , Keepers, health checks Keepers Postgres, . , , . sentinel.



, . wantedgeneration currentgeneration. Stolon-keeper . sentinel , , , . Keeper . .



. json, . . . Keepers . , , . , . . . , . Etcd .



. : Etcd . , Etcd. , . , , Consul. Consul , . , , , , Stolon-keeper . Postgres, Stolon . , Stolon-keeper. systemd, on abort, kill -9 .



Postgres. kill -9 , . . – . . Stolon-keeper, Ok.



. . - . Postgres . Stolon-keeper . Postgres. .



. fail. Postgres-. , . pgbench.



- , Postgres, ? select , , select.



, checksums, , checksums , . Postgres , . , , checksums , - . Postgres. Patroni/Stolon .



pgbench. . , . 25432. . . Stolon/test-cluster/postgres/pg_hba.conf.



, Stolon superuser, , . , .



. «default», . «pg_hba». «update». json- pgHBA . local all postgres. Postgres trust. – host all postgres 172.20.20.0/24 trust.
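The pg_hba change described here could be applied with a cluster-spec patch along the following lines. The cluster name and the two rules come from the talk; the store backend, the endpoint address and the dry-run fallback are assumptions of this sketch.

```shell
# Cluster name from the talk; backend and endpoint are assumptions.
CLUSTER_NAME="test-cluster"
STORE_BACKEND="etcdv3"
STORE_ENDPOINTS="http://127.0.0.1:2379"

# The two pg_hba rules mentioned above, as a JSON patch for the cluster spec.
PATCH='{"pgHBA": ["local all postgres trust", "host all postgres 172.20.20.0/24 trust"]}'

if command -v stolonctl >/dev/null 2>&1; then
  stolonctl --cluster-name "$CLUSTER_NAME" \
            --store-backend "$STORE_BACKEND" \
            --store-endpoints "$STORE_ENDPOINTS" \
            update --patch "$PATCH"
else
  # Dry run when stolonctl is not installed: show what would be sent.
  echo "stolonctl update --patch $PATCH"
fi
```

Because the rules live in the cluster spec rather than in a hand-edited file, the keepers regenerate pg_hba.conf themselves; `trust` is acceptable only in a throwaway test environment like this one.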



, . . , Postgres. . Create user postgres superuser. , Postgres . pgbench . HBA user test. Patroni. .



while. 20 , . , . .





Stolon . :



  • SleepInterval – .
  • RequestTimeout – deadline PostgreSQL. Deadline DCS – 5 .
  • FailInterval – , sentinel , . Sentinel failInterval, , . , , , . . - , . . failInterval .




autofailover Stolon?



1 – fail . Stolon-keeper Postgres . sentinel. , sentinel. . sleepInterval. 10 .





2 – - , , sentinel. , Keeper .





3 – sentinel. Keepers. sleepInterval.



: (λ1 + λ2) * sleepInterval. . .





4 – . DCS. sentinel , .





, , DCS sentinel , failover 25 50 .



fail sentinel’, failover sentinel. sentinel. failover .





, Stolon-proxy Keeper , Keeper read only . Postgres. Postgres Stolon-proxy.



. DCS, , , , .





  • Stolon. Stolon . , DCS . , «deadKeeperRemovalInterval». 48 . , DCS. , . , , WAL. 48 , .



  • , Stolon . . , -, deadlines - Postgres. , dbWaitReadyTimeout deadline . – 60 . checkpoints, deadline .



  • syncTimeout – deadline . 30 . , . .



  • InitTimeout – deadline , initdb .



  • -. conversion timeout. , Keeper . -. Stolon . - -, Stolon .







Patroni.





Patroni, , , . ? Stolon. Patroni . DCS, , Patroni.





Patroni, . . , DCS time to live . , , . . , - . s
 . Patroni , WAL-, REST API, , . WAL . Proxy – .





. . . 3- Etcd. Postgres Pro HAProxy confd, Etcd .



2- Patroni. Patroni Postgres.





https://patroni.readthedocs.io/en/latest/existing_data.html



Patroni , . basebackup’ . Patroni , , .



basebackup. , , tablespace.





https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-ADMIN



workflow Patroni. , bootstrap. , , . . Stolon, , , . bootstrap. .



Patroni? postgresql.conf pg_hba.conf, recovery.conf DCS , Stolon. . .



Patroni postgres-. , , .



, – , Patroni.





https://github.com/zalando/patroni/blob/master/haproxy.cfg

https://github.com/zalando/patroni/tree/master/extras/confd



.



– . . . Patroni- REST API endpoints, , , .



HAProxy, healthchecks Patroni.
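The same health check that HAProxy performs can be tried by hand. The sketch below assumes Patroni's REST API on its default port 8008 and relies on the `/master` and `/replica` endpoints answering 200 only on a node that currently holds that role (newer Patroni releases also expose `/primary`). The function names and the host name in the usage note are hypothetical.

```shell
# Print the HTTP status a Patroni node returns for a role endpoint.
# curl prints 000 when the node cannot be reached within the time limit.
patroni_status() {
  local host="$1"
  local role="$2"
  curl -s -o /dev/null -m 2 -w '%{http_code}' "http://${host}:8008/${role}" 2>/dev/null || true
}

# Succeeds only if the node answers 200 for the requested role.
has_role() {
  [ "$(patroni_status "$1" "$2")" = "200" ]
}
```

A call such as `has_role node1 master && echo "node1 is the leader"` mirrors what the proxy decides on every check interval.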



Patroni callbacks. , . .



HAProxy , DCS HAProxy. HAProxy + confd. consul-template. . .



10- Postgres libpq , , «target_session_attrs», . ? – , target_session_attrs.
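For the libpq route, a multi-host connection string with `target_session_attrs` might be assembled like this. The host names, port and database are assumptions for the example, and `build_rw_uri` is a hypothetical helper.

```shell
# Build a multi-host libpq URI that only accepts a writable session,
# so the client ends up on whichever listed host is the current primary.
build_rw_uri() {
  local db="$1"
  shift
  local hosts=""
  local h
  for h in "$@"; do
    hosts="${hosts:+$hosts,}$h"
  done
  echo "postgresql://${hosts}/${db}?target_session_attrs=read-write"
}

build_rw_uri postgres node1:5432 node2:5432 node3:5432
```

The result can be passed straight to a client, for example `psql "$(build_rw_uri postgres node1:5432 node2:5432 node3:5432)"`; libpq tries the hosts in order and keeps the first one that accepts writes.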



, , watchdog Postgres, , , Patroni-. ? Postgres , . .



Stolon Patroni , . - , .





https://www.consul.io/docs/guides/forwarding.html

https://learn.hashicorp.com/consul/day-2-operations/advanced-operations/dns-caching

https://pgconf.ru/2019/242817 https://pgconf.ru/2019/242821

https://github.com/cybertec-postgresql/vip-manager



, DNS. Consul . DNS . .



IP-. HAProxy + keepalived. vip-manager, DCS, IP- , . , Postgres Pro , , IP-. , kill stop keepalived’, VRRP IP- HAProxy, IP- . , , . vip-manager. vip-manager , switchover, IP . , .





, , . Stolon :



  • ttl – .
  • loop_wait – Patroni-.
  • retry_timeout – DCS PostgreSQL.
  • master_start_timeout – PostgreSQL ( Patroni-).
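When changing these parameters it helps to check them against each other. The sketch below encodes the relation given in the Patroni documentation as I recall it, `loop_wait + 2 * retry_timeout <= ttl`, with all values in seconds; the function name `patroni_timing_ok` is hypothetical.

```shell
# Check the relation Patroni expects between its timing parameters:
#   loop_wait + 2 * retry_timeout <= ttl   (all values in seconds)
patroni_timing_ok() {
  local ttl="$1"
  local loop_wait="$2"
  local retry_timeout="$3"
  [ $(( loop_wait + 2 * retry_timeout )) -le "$ttl" ]
}

# Patroni's documented defaults: ttl=30, loop_wait=10, retry_timeout=10.
if patroni_timing_ok 30 10 10; then
  echo "defaults are consistent"
fi

# Shrinking ttl alone breaks the relation.
if ! patroni_timing_ok 20 10 10; then
  echo "ttl=20 is too small for loop_wait=10 and retry_timeout=10"
fi
```

Lowering ttl to speed up failover therefore means lowering loop_wait and retry_timeout with it, which in turn makes the cluster more sensitive to short DCS hiccups.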


, , . , Patroni- Postgres, DCS. - loop_wait. , .



failover Patroni?





  • , DCS . Patroni- . . – 20-30 .

  • Patroni- REST API, endpoint Patroni WAL-. - 2 . 2 , , . , , WAL-.




  • DCS. - .




  • .




DCS , - , , 5 .





, , .



, , Patroni-. - - .





https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C@yandex-team.ru

https://github.com/zalando/patroni/blob/master/docs/watchdog.rst



.



  • . , , , Postgres. , Postgres WAL-commit .
  • Zalando – watchdog. Patroni- - , : , .
  • HAProxy Confd, . . , .
  • Corosync & Pacemaker — ( ) , . . . , , , .




, HAProxy Confd .





, netsplit? HAProxy Patroni . . health check’ Patroni-.



Confd. Confd , DCS.





, HAProxy PgBouncer. PgBouncer DCS. , , Patroni .





  • , Patroni . . , , - DCS . downtime , . wal_keep_segments, .
  • , , . . . , , , . .
  • Patroni? Patroni Stolon , enterprise . :
  • . .
  • . , , . , , failover , - . Max Availability Oracle Data Guard.
  • PostgreSQL Stolon.


!



.



Etcd, . , - ?



-, . , , Etcd, Consul mail , . fsync , .



-? , . , , , , ?



, -.



, Postgres , . DCS , .



, .



? ? . , , Consul . Etcd . - ?



Consul Etcd. RAFT. fsync . Postgres DCS , , . , . . , .



, !



Zookeeper? , ? ?



Zookeeper , . Etcd . Stolon , Patroni – .



- Patroni? . - ?



. wal_keep_segments, . . WAL- , . , issue Patroni. , Stolon, , , - .



! -! , . Patroni , . , , .



, . . .



. . . , . WAL-, . , WALs . . , . !



. , - -. , . . . switchover failover, . promote checkpoint, WALs . . , , .



! , - - . , ?



enterprise, Patroni. Stolon. , . . -- Kubernetes, , . Keeper, Sentinel. , , .



Patroni . WAL-, , DCS. DCS . (, , ), DCS, . . issue, Consul . Patroni. Stolon . Kubernetes.



. , Stolon ?



.



– master-slave Stolon.



, . , standby . – Stolon, . , , standby .



. . .



, ?



, . . , , , .



, .



. . ?



, .



Patroni . , , , . , . .



, Patroni , . Stolon , Postgres keeper data, .



, Stolon ?



open source, .



, , ?



, issue. .



- , . - . , .



issue. , , , .



-, . .



! . . , HAProxy, . . . . HAProxy "on-marked-down shutdown-sessions", , .



, ? health checks?



, http check REST API.



, -, HAProxy, IP- . – PgBouncer, health checks. HAProxy – , health checks , . , , – Patroni, - .



Patroni Etcd REST API.



, Etcd , Etcd.



Etcd? , , . watchdog, Patroni , , , watchdog reboot.



, watchdog – . watchdog. Patroni PostgreSQL, Patroni. watchdog – , , . .



, .



watchdog -, .. , , Patroni- , reboot. .



watchdog , , , , failover , . .



, . ? .



, 




, Patroni, . . - . watchdog – .



Patroni Etcd , , standby. , watchdog .



. , Patroni , , , . . : watchdog, HAProxy.



.



Etcd. ?



-.



- ?



-.



? , ?



-.



. . , , ?



Yes. I mentioned this in the point about unstable configurations. Timeouts are the only remedy. In particular, these are heartbeat_interval and election_timeout.



Thank you!





