MySQL PXC 集群死锁分析案例中,如何定位导致的问题?

摘要:前不久一个系统死锁导致部分业务受到影响,今次补上详细的节点日志分析过程。 这个PXC集群有三个节点,分别是 108、109、110,日志信息的ip6地址、节点编号等信息均已做脱敏处理。 以下日志里面,3个节点对应的配置信息是: 108 99
前不久一个系统死锁导致部分业务受到影响,今次补上详细的节点日志分析过程。 这个PXC集群有三个节点,分别是 108、109、110,日志信息的ip6地址、节点编号等信息均已做脱敏处理。 以下日志里面,3个节点对应的配置信息是: 108 99999999-9908 9999:9999:9999:9999::6c 109 99999999-9909 9999:9999:9999:9999::6d 110 99999999-9910 9999:9999:9999:9999::6e 一、节点日志分析109节点 1.1 之前数据库节点一直正常,且无较大的事务,直到 11:06 ~ 11:07 成功完成了1次较大的事务,全局缓存页达到了 500MB,如下: 2024-07-15T03:06:29.228915Z 0 [Note] [MY-000000] [Galera] Created page /mysqldata/mysql/gcache.page.000768 of size 515922640 bytes ... 2024-07-15T03:07:48.922772Z 0 [Note] [MY-000000] [Galera] Deleted page /mysqldata/mysql/gcache.page.000768 ... 1.2 9分钟以后即 11:16 再次发生大事务,全局缓存页达到了 570MB,这次报错了,报错如下: 2024-07-15T03:16:48.358335Z 0 [Note] [MY-000000] [Galera] Created page /mysqldata/mysql/gcache.page.000769 of size 586372048 bytes 1.3 以下报错信息显示,表 tm_xj.rms_device_info_zw_month_bak 的操作,在本节点同步执行 replication 时失败,原因在于该多语句大事务过大,超过了系统设置 max_binlog_cache_size,经查该设置是 512M:max_binlog_cache_size = 512M 2024-07-15T03:17:47.506531Z 2 [ERROR] [MY-010584] [Repl] Replica SQL: Could not execute Write_rows event on table tm_xj.rms_device_info_zw_month_bak; Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534; handler error HA_ERR_RBR_LOGGING_FAILED; the event's source log FIRST, end_log_pos 0, Error_code: MY-001197 2024-07-15T03:17:47.506611Z 2 [Warning] [MY-000000] [WSREP] Event 66115 Write_rows apply failed: 161, seqno 9338777 1.4 以下信息显示,节点仲裁时 Received bogus VOTE message,进而检测到了冲突 Inconsistency detected,然后导致本节点于 11:18 停止服务。 2024-07-15T03:18:34.561764Z 0 [Note] [MY-000000] [Galera] Member 1(pxc-cluster-node-2) initiates vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,892e7c82cf62c56b: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534; 2024-07-15T03:18:34.562059Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 1197, 1534. New vote d659be586fd355cc will be used for further steps. Old Vote: 892e7c82cf62c56b 2024-07-15T03:18:34.562104Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 1/3 d659be586fd355cc: 1/3 Waiting for more votes. 2024-07-15T03:18:34.563801Z 0 [Note] [MY-000000] [Galera] Member 0(pxc-cluster-node-3) responds to vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,0000000000000000: Success 2024-07-15T03:18:34.563836Z 0 [Warning] [MY-000000] [Galera] Received bogus VOTE message: 9338777.0, from node 2a36a321-f834-11ee-bc44-bb66609ee19b, expected > 9338860. Ignoring. 2024-07-15T03:18:34.563857Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 1/3 d659be586fd355cc: 1/3 Waiting for more votes. 2024-07-15T03:18:34.563873Z 0 [Note] [MY-000000] [Galera] Member 2(pxc-cluster-node-1) responds to vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,0000000000000000: Success 2024-07-15T03:18:34.563888Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 2/3 d659be586fd355cc: 1/3 Winner: 0000000000000000 2024-07-15T03:18:34.564007Z 2 [ERROR] [MY-000000] [Galera] Inconsistency detected: Inconsistent by consensus on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.36/percona-xtradb-cluster-galera/galera/src/replicator_smm.cpp:process_apply_error():1462 2024-07-15T03:18:34.565436Z 2 [Note] [MY-000000] [Galera] Closing send monitor... 2024-07-15T03:18:34.565477Z 2 [Note] [MY-000000] [Galera] Closed send monitor. 2024-07-15T03:18:34.565497Z 2 [Note] [MY-000000] [Galera] gcomm: terminating thread 2024-07-15T03:18:34.565530Z 2 [Note] [MY-000000] [Galera] gcomm: joining thread 2024-07-15T03:18:34.567791Z 2 [Note] [MY-000000] [Galera] gcomm: closing backend 2024-07-15T03:18:34.569470Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view (view_id(NON_PRIM,99999999-9910,9) memb { 601c1fce-8a87,0 } joined { } left { } partitioned { 99999999-9910,0 99999999-9908,0 } ) 2024-07-15T03:18:34.569567Z 2 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0 2024-07-15T03:18:34.569592Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view ((empty)) 2024-07-15T03:18:34.570032Z 2 [Note] [MY-000000] [Galera] gcomm: closed 2024-07-15T03:18:34.570113Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 2024-07-15T03:18:34.570222Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100] 2024-07-15T03:18:34.570258Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY. 2024-07-15T03:18:34.570275Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 9338911) 2024-07-15T03:18:34.570294Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE. 2024-07-15T03:18:34.570322Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0] 2024-07-15T03:18:34.570334Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection. 2024-07-15T03:18:34.570343Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 9338911) 2024-07-15T03:18:34.570356Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success 2024-07-15T03:18:34.571771Z 2 [Note] [MY-000000] [Galera] recv_thread() joined. 2024-07-15T03:18:34.571808Z 2 [Note] [MY-000000] [Galera] Closing replication queue. 2024-07-15T03:18:34.571823Z 2 [Note] [MY-000000] [Galera] Closing slave action queue. 2024-07-15T03:18:34.575876Z 14 [Note] [MY-000000] [Galera] ================================================ View: id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338911 status: non-primary protocol_version: 4 capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO final: no own_index: 0 members(1): 0: 601c1fce-f834-11ee-8a87-33f7192b1c5c, pxc-cluster-node-2 ================================================= 2024-07-15T03:18:34.575947Z 14 [Note] [MY-000000] [Galera] Non-primary view 2024-07-15T03:18:34.575967Z 14 [Note] [MY-000000] [WSREP] Server status change synced -> connected 2024-07-15T03:18:34.575983Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T03:18:34.576009Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T03:18:34.576066Z 14 [Note] [MY-000000] [Galera] ================================================ View: id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338911 status: non-primary protocol_version: 4 capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO final: yes own_index: -1 members(0): ================================================= 2024-07-15T03:18:34.576088Z 14 [Note] [MY-000000] [Galera] Non-primary view 2024-07-15T03:18:34.576099Z 14 [Note] [MY-000000] [WSREP] Server status change connected -> disconnected 2024-07-15T03:18:34.576108Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T03:18:34.576121Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T03:18:34.576141Z 14 [Note] [MY-000000] [Galera] Waiting 600 seconds for 8 receivers to finish 2024-07-15T03:18:34.577081Z 13 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577082Z 11 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577186Z 11 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 11 2024-07-15T03:18:34.577148Z 13 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 13 2024-07-15T03:18:34.577235Z 12 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577210Z 16 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577088Z 15 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577335Z 16 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 16 2024-07-15T03:18:34.577356Z 15 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 15 2024-07-15T03:18:34.577365Z 10 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.577389Z 10 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 10 2024-07-15T03:18:34.577284Z 12 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 12 2024-07-15T03:18:34.593783Z 2 [ERROR] [MY-000000] [Galera] Failed to apply write set: gtid: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 server_id: 2a36a321-f834-11ee-bc44-bb66609ee19b client_id: 2946137 trx_id: 710800782 flags: 3 (start_transaction | commit) 2024-07-15T03:18:34.593950Z 2 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:34.593996Z 2 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 2 2024-07-15T03:18:35.268179Z 2865366 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T03:18:35.887809Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed. 2024-07-15T03:18:35.888089Z 14 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 6 2024-07-15T03:18:35.888185Z 14 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0 2024-07-15T03:18:35.888213Z 14 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 14 2024-07-15T03:18:35.901142Z 2865367 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T03:18:36.023349Z 2865368 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T03:18:36.118825Z 0 [Note] [MY-000000] [Galera] Deleted page /mysqldata/mysql/gcache.page.000769 1.5 之后直到 11:59 有人关闭该节点: 2024-07-15T03:59:05.397213Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.36-28.1). 2024-07-15T03:59:05.397292Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN ## 1.6 睡眠 10 秒后节点关闭: 2024-07-15T03:59:15.401674Z 0 [Note] [MY-000000] [WSREP] Shutdown replication 2024-07-15T03:59:15.401759Z 0 [Note] [MY-000000] [WSREP] Waiting for active wsrep applier to exit 2024-07-15T03:59:15.401863Z 0 [Note] [MY-000000] [WSREP] All applier thread terminated. Will now terminate rollback thread 2024-07-15T03:59:15.401919Z 0 [Note] [MY-000000] [WSREP] Waiting for rollback thread to terminate 2024-07-15T03:59:15.401971Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1 2024-07-15T03:59:16.402226Z 0 [Note] [MY-000000] [WSREP] Rollback thread terminated 2024-07-15T03:59:18.406295Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875920 user: 'tm_xj'. 2024-07-15T03:59:18.406692Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875496 user: 'tm_xj'. 2024-07-15T03:59:18.406865Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875376 user: 'root'. 2024-07-15T03:59:18.407021Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875632 user: 'root'. 2024-07-15T03:59:18.407164Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875352 user: 'root'. 2024-07-15T03:59:18.407303Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875648 user: 'root'. 2024-07-15T03:59:18.408120Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875616 user: 'tm_xj'. 2024-07-15T03:59:18.408298Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875848 user: 'root'. 2024-07-15T03:59:18.408436Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874352 user: 'tm_xj'. 2024-07-15T03:59:18.408550Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875912 user: 'root'. 2024-07-15T03:59:18.408652Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875560 user: 'tm_xj'. 2024-07-15T03:59:18.408749Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874336 user: 'tm_xj'. 2024-07-15T03:59:18.408850Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875360 user: 'tm_xj'. 2024-07-15T03:59:18.408951Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875744 user: 'root'. 2024-07-15T03:59:18.409067Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2876000 user: 'tm_xj'. 2024-07-15T03:59:18.409169Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875368 user: 'root'. 2024-07-15T03:59:18.409244Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875928 user: 'root'. 2024-07-15T03:59:18.409344Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872568 user: 'tm_xj'. 2024-07-15T03:59:18.409439Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875720 user: 'tm_xj'. 2024-07-15T03:59:18.409571Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875440 user: 'tm_xj'. 2024-07-15T03:59:18.409674Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875464 user: 'root'. 2024-07-15T03:59:18.409778Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875480 user: 'tm_xj'. 2024-07-15T03:59:18.409889Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872560 user: 'tm_xj'. 2024-07-15T03:59:18.409984Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875737 user: 'root'. 2024-07-15T03:59:18.410252Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875521 user: 'tm_xj'. 2024-07-15T03:59:18.411058Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875353 user: 'root'. 2024-07-15T03:59:18.411180Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875625 user: 'root'. 2024-07-15T03:59:18.411344Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875649 user: 'root'. 2024-07-15T03:59:18.411446Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875569 user: 'root'. 2024-07-15T03:59:18.411550Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872569 user: 'tm_xj'. 2024-07-15T03:59:18.411660Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875441 user: 'tm_xj'. 2024-07-15T03:59:18.411860Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875921 user: 'tm_xj'. 2024-07-15T03:59:18.411963Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875721 user: 'tm_xj'. 2024-07-15T03:59:18.412091Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875929 user: 'root'. 2024-07-15T03:59:18.412289Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875417 user: 'tm_xj'. 2024-07-15T03:59:18.412422Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872561 user: 'tm_xj'. 2024-07-15T03:59:18.412569Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875457 user: 'tm_xj'. 2024-07-15T03:59:18.412723Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875465 user: 'root'. 2024-07-15T03:59:18.412841Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875497 user: 'tm_xj'. 2024-07-15T03:59:18.412946Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875906 user: 'root'. 2024-07-15T03:59:18.413037Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875434 user: 'tm_xj'. 2024-07-15T03:59:18.413203Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875418 user: 'tm_xj'. 2024-07-15T03:59:18.413343Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874354 user: 'tm_xj'. 2024-07-15T03:59:18.413444Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875522 user: 'tm_xj'. 2024-07-15T03:59:18.413583Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875530 user: 'tm_xj'. 2024-07-15T03:59:18.413739Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875850 user: 'root'. 2024-07-15T03:59:18.413871Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875650 user: 'root'. 2024-07-15T03:59:18.414264Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875930 user: 'root'. 2024-07-15T03:59:18.414372Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875634 user: 'root'. 2024-07-15T03:59:18.414464Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875354 user: 'root'. 2024-07-15T03:59:18.414548Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875562 user: 'tm_xj'. 2024-07-15T03:59:18.414702Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875722 user: 'tm_xj'. 2024-07-15T03:59:18.414809Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875370 user: 'root'. 2024-07-15T03:59:18.414906Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875986 user: 'tm_xj'. 2024-07-15T03:59:18.414994Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875730 user: 'tm_xj'. 2024-07-15T03:59:18.415106Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875458 user: 'tm_xj'. 2024-07-15T03:59:18.415233Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875922 user: 'root'. 2024-07-15T03:59:18.415332Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875738 user: 'tm_xj'. 2024-07-15T03:59:18.415430Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872562 user: 'tm_xj'. 2024-07-15T03:59:18.415645Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875466 user: 'root'. 2024-07-15T03:59:18.415737Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875482 user: 'tm_xj'. 2024-07-15T03:59:18.415836Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875490 user: 'tm_xj'. 2024-07-15T03:59:18.415932Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875739 user: 'root'. 2024-07-15T03:59:18.416016Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875435 user: 'tm_xj'. 2024-07-15T03:59:18.416274Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875907 user: 'root'. 2024-07-15T03:59:18.416394Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875731 user: 'tm_xj'. 2024-07-15T03:59:18.416500Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875283 user: 'tm_xj'. 2024-07-15T03:59:18.416620Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872563 user: 'tm_xj'. 2024-07-15T03:59:18.416717Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875555 user: 'tm_xj'. 2024-07-15T03:59:18.416839Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875923 user: 'root'. 2024-07-15T03:59:18.416926Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875563 user: 'root'. 2024-07-15T03:59:18.417035Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875627 user: 'root'. 2024-07-15T03:59:18.417266Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875851 user: 'root'. 2024-07-15T03:59:18.417373Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875355 user: 'root'. 2024-07-15T03:59:18.417483Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875635 user: 'root'. 2024-07-15T03:59:18.417572Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875443 user: 'tm_xj'. 2024-07-15T03:59:18.417706Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875467 user: 'root'. 2024-07-15T03:59:18.417863Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875419 user: 'tm_xj'. 2024-07-15T03:59:18.417966Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875371 user: 'root'. 2024-07-15T03:59:18.418063Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875483 user: 'tm_xj'. 2024-07-15T03:59:18.418155Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875491 user: 'tm_xj'. 2024-07-15T03:59:18.418251Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875651 user: 'root'. 2024-07-15T03:59:18.418393Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875660 user: 'root'. 2024-07-15T03:59:18.418477Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875732 user: 'tm_xj'. 2024-07-15T03:59:18.418641Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875908 user: 'root'. 2024-07-15T03:59:18.418792Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875724 user: 'tm_xj'. 2024-07-15T03:59:18.418901Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875628 user: 'root'. 2024-07-15T03:59:18.418977Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875420 user: 'tm_xj'. 2024-07-15T03:59:18.419081Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875348 user: 'root'. 2024-07-15T03:59:18.419183Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872564 user: 'tm_xj'. 2024-07-15T03:59:18.419273Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875564 user: 'tm_xj'. 2024-07-15T03:59:18.420210Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875372 user: 'root'. 2024-07-15T03:59:18.420323Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875652 user: 'root'. 2024-07-15T03:59:18.420427Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875428 user: 'root'. 2024-07-15T03:59:18.420522Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875740 user: 'root'. 2024-07-15T03:59:18.420608Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875436 user: 'tm_xj'. 2024-07-15T03:59:18.420749Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875556 user: 'tm_xj'. 2024-07-15T03:59:18.420840Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875284 user: 'tm_xj'. 2024-07-15T03:59:18.420947Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875460 user: 'root'. 2024-07-15T03:59:18.421068Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875476 user: 'tm_xj'. 2024-07-15T03:59:18.421161Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875484 user: 'tm_xj'. 2024-07-15T03:59:18.421248Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875492 user: 'tm_xj'. 2024-07-15T03:59:18.421354Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875852 user: 'root'. 2024-07-15T03:59:18.421461Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875725 user: 'tm_xj'. 2024-07-15T03:59:18.421607Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875325 user: 'tm_xj'. 2024-07-15T03:59:18.421737Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875373 user: 'root'. 2024-07-15T03:59:18.421869Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875285 user: 'tm_xj'. 2024-07-15T03:59:18.421971Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875885 user: 'tm_xj'. 2024-07-15T03:59:18.422128Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875629 user: 'root'. 2024-07-15T03:59:18.422275Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875293 user: 'ingp_auth'. 2024-07-15T03:59:18.422411Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875909 user: 'root'. 2024-07-15T03:59:18.422554Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875477 user: 'tm_xj'. 2024-07-15T03:59:18.422668Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875557 user: 'tm_xj'. 2024-07-15T03:59:18.422772Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875421 user: 'tm_xj'. 2024-07-15T03:59:18.422857Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872565 user: 'tm_xj'. 2024-07-15T03:59:18.422942Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875981 user: 'tm_xj'. 2024-07-15T03:59:18.423032Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875741 user: 'root'. 2024-07-15T03:59:18.423167Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875733 user: 'tm_xj'. 2024-07-15T03:59:18.423277Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875349 user: 'root'. 2024-07-15T03:59:18.423371Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875925 user: 'root'. 2024-07-15T03:59:18.423463Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875461 user: 'tm_xj'. 2024-07-15T03:59:18.423567Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875493 user: 'tm_xj'. 2024-07-15T03:59:18.423695Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875653 user: 'root'. 2024-07-15T03:59:18.423801Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875790 user: 'tm_xj'. 2024-07-15T03:59:18.423921Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875854 user: 'root'. 2024-07-15T03:59:18.424036Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875526 user: 'tm_xj'. 2024-07-15T03:59:18.424166Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875734 user: 'root'. 2024-07-15T03:59:18.424257Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875654 user: 'root'. 2024-07-15T03:59:18.424339Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875350 user: 'root'. 2024-07-15T03:59:18.424426Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875438 user: 'tm_xj'. 2024-07-15T03:59:18.424736Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875662 user: 'root'. 2024-07-15T03:59:18.424827Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875918 user: 'tm_xj'. 2024-07-15T03:59:18.425109Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875286 user: 'tm_xj'. 2024-07-15T03:59:18.425408Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875558 user: 'tm_xj'. 2024-07-15T03:59:18.425490Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875566 user: 'tm_xj'. 2024-07-15T03:59:18.425587Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875742 user: 'root'. 2024-07-15T03:59:18.425779Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875726 user: 'tm_xj'. 2024-07-15T03:59:18.425889Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875910 user: 'root'. 2024-07-15T03:59:18.426107Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875462 user: 'root'. 2024-07-15T03:59:18.426193Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875926 user: 'root'. 2024-07-15T03:59:18.426260Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875414 user: 'tm_xj'. 2024-07-15T03:59:18.426359Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875494 user: 'tm_xj'. 2024-07-15T03:59:18.426451Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875735 user: 'tm_xj'. 2024-07-15T03:59:18.426805Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875911 user: 'root'. 2024-07-15T03:59:18.426919Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875719 user: 'tm_xj'. 2024-07-15T03:59:18.427008Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875415 user: 'tm_xj'. 2024-07-15T03:59:18.427700Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875655 user: 'root'. 2024-07-15T03:59:18.427805Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875535 user: 'tm_xj'. 2024-07-15T03:59:18.427934Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875351 user: 'root'. 2024-07-15T03:59:18.428104Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875631 user: 'root'. 2024-07-15T03:59:18.428205Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875439 user: 'tm_xj'. 2024-07-15T03:59:18.428303Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875367 user: 'root'. 2024-07-15T03:59:18.428431Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875455 user: 'tm_xj'. 2024-07-15T03:59:18.428571Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875743 user: 'root'. 2024-07-15T03:59:18.428693Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875855 user: 'root'. 2024-07-15T03:59:18.429126Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875375 user: 'root'. 2024-07-15T03:59:18.429254Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872567 user: 'tm_xj'. 2024-07-15T03:59:18.429359Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875919 user: 'tm_xj'. 2024-07-15T03:59:18.429483Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874343 user: 'tm_xj'. 2024-07-15T03:59:18.429589Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875927 user: 'root'. 2024-07-15T03:59:18.429679Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874335 user: 'tm_xj'. 2024-07-15T03:59:18.429781Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872559 user: 'tm_xj'. 2024-07-15T03:59:18.429873Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875463 user: 'root'. 2024-07-15T03:59:18.429977Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875479 user: 'tm_xj'. 2024-07-15T03:59:18.430072Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875495 user: 'tm_xj'. 2024-07-15T03:59:18.432939Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED 2024-07-15T03:59:18.433005Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0.999859, misses: 1319, in use: 0, in pool: 1319 2024-07-15T03:59:18.437579Z 0 [Note] [MY-000000] [Galera] mon: entered 9338757 oooe fraction 0 oool fraction 0 2024-07-15T03:59:18.441999Z 0 [Note] [MY-000000] [Galera] mon: entered 9338757 oooe fraction 0.0111082 oool fraction 0.000346834 2024-07-15T03:59:18.446209Z 0 [Note] [MY-000000] [Galera] mon: entered 9565290 oooe fraction 0 oool fraction 2.09089e-07 2024-07-15T03:59:18.446249Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0 2024-07-15T03:59:18.446274Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0 2024-07-15T03:59:18.446285Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0 2024-07-15T03:59:18.446298Z 0 [Note] [MY-000000] [Galera] avg deps dist 1.13495 2024-07-15T03:59:18.446310Z 0 [Note] [MY-000000] [Galera] avg cert interval 77.2288 2024-07-15T03:59:18.446321Z 0 [Note] [MY-000000] [Galera] cert index size 3110460 2024-07-15T03:59:18.446399Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed. 2024-07-15T03:59:18.494811Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0 2024-07-15T03:59:18.494926Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0.999251, misses: 152, in use: 0, in pool: 152 2024-07-15T03:59:18.500371Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk... 2024-07-15T03:59:25.158921Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.36-28.1) Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3. 1.7 试图启动失败,看日志是集群状态问题导致的: 【77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.】 2024-07-15T04:01:50.357538Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 2024-07-15T04:01:50.357629Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage. 2024-07-15T04:01:50.357642Z 0 [ERROR] [MY-000000] [WSREP-SST] After waiting for the 100 seconds 2024-07-15T04:01:50.357753Z 0 [ERROR] [MY-000000] [WSREP-SST] (defined by sst-initial-timeout/joiner-timeout variable), 2024-07-15T04:01:50.357787Z 0 [ERROR] [MY-000000] [WSREP-SST] the SST process has not started. 2024-07-15T04:01:50.357869Z 0 [ERROR] [MY-000000] [WSREP-SST] This error could be caused by broken network connectivity between 2024-07-15T04:01:50.357934Z 0 [ERROR] [MY-000000] [WSREP-SST] the donor and the joiner (this node). 2024-07-15T04:01:50.358004Z 0 [ERROR] [MY-000000] [WSREP-SST] Check the network connection and restart the joiner node. 2024-07-15T04:01:50.358085Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1395 2024-07-15T04:01:50.358155Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 2024-07-15T04:01:50.358379Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32 2024-07-15T04:01:50.384925Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '[9999:9999:9999:9999::6d]' --datadir '/mysqldata/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2581362' --mysqld-version '8.0.36-28.1' '' : 32 (Broken pipe) 2024-07-15T04:01:50.385073Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script. 2024-07-15T04:01:50.385104Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe) 2024-07-15T04:01:50.385295Z 3 [Note] [MY-000000] [Galera] Processing SST received 2024-07-15T04:01:50.385333Z 3 [Note] [MY-000000] [Galera] SST request was cancelled 2024-07-15T04:01:50.385355Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required. 2024-07-15T04:01:50.385367Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort() 2024-07-15T04:01:50.385388Z 3 [Note] [MY-000000] [Galera] Closing send monitor... 2024-07-15T04:01:50.385509Z 3 [Note] [MY-000000] [Galera] Closed send monitor. 2024-07-15T04:01:50.385514Z 1 [ERROR] [MY-000000] [Galera] Requesting state transfer failed: -77(File descriptor in bad state) 2024-07-15T04:01:50.385543Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread 2024-07-15T04:01:50.385560Z 1 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required. 2024-07-15T04:01:50.385575Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort() 2024-07-15T04:01:50.385585Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread 2024-07-15T04:01:50.385612Z 1 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated. 2024-07-15T04:01:50.385639Z 1 [Note] [MY-000000] [WSREP] Initiating SST cancellation 2024-07-15T04:01:50.385656Z 1 [Note] [MY-000000] [WSREP] Terminating SST process 二、节点日志分析108节点 ■■ 以下日志显示,3:18分109节点出现问题(问题详见"109节点日志分析"),继而109节点被驱逐出去集群。 2024-07-15T03:18:34.562042Z 0 [Note] [MY-000000] [Galera] Member 1(pxc-cluster-node-2) initiates vote on 2a37cad4-f834-11ee-916e-da042b ca7bc3:9338777,892e7c82cf62c56b: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase thi s mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534; 2024-07-15T03:18:34.562391Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 1197, 1534. New vote d659be586fd355cc w ill be used for further steps. Old Vote: 892e7c82cf62c56b 2024-07-15T03:18:34.562448Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 1/3 d659be586fd355cc: 1/3 Waiting for more votes. 2024-07-15T03:18:34.562568Z 15 [Note] [MY-000000] [Galera] Got vote request for seqno 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 2024-07-15T03:18:34.564109Z 0 [Note] [MY-000000] [Galera] Member 0(pxc-cluster-node-3) responds to vote on 2a37cad4-f834-11ee-916e-da04 2bca7bc3:9338777,0000000000000000: Success 2024-07-15T03:18:34.564159Z 0 [Warning] [MY-000000] [Galera] Received bogus VOTE message: 9338777.0, from node 2a36a321-f834-11ee-bc44- bb66609ee19b, expected > 9338860. Ignoring. 2024-07-15T03:18:34.564188Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 1/3 d659be586fd355cc: 1/3 Waiting for more votes. 2024-07-15T03:18:34.564212Z 0 [Note] [MY-000000] [Galera] Member 2(pxc-cluster-node-1) responds to vote on 2a37cad4-f834-11ee-916e-da04 2bca7bc3:9338777,0000000000000000: Success 2024-07-15T03:18:34.564229Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777: 0000000000000000: 2/3 d659be586fd355cc: 1/3 Winner: 0000000000000000 Winner: 0000000000000000 2024-07-15T03:18:34.564279Z 15 [Note] [MY-000000] [Galera] Vote 0 (success) on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 is consiste nt with group. Continue. 2024-07-15T03:18:34.571056Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable 2024-07-15T03:18:34.571138Z 0 [Note] [MY-000000] [Galera] forgetting 601c1fce-8a87 (tcp://[9999:9999:9999:9999::6d]:4567) 2024-07-15T03:18:34.572400Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary 2024-07-15T03:18:34.578145Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view (view_id(PRIM,99999999-9910,10) memb { 99999999-9910,0 99999999-9908,0 } joined { } left { } partitioned { 601c1fce-8a87,0 } ) 2024-07-15T03:18:34.578202Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk 2024-07-15T03:18:34.579951Z 0 [Note] [MY-000000] [Galera] forgetting 601c1fce-8a87 (tcp://[9999:9999:9999:9999::6d]:4567) 2024-07-15T03:18:34.579977Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 2024-07-15T03:18:34.580032Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID. 2024-07-15T03:18:34.581317Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3 2024-07-15T03:18:34.582510Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3 from 0 (p xc-cluster-node-3) 2024-07-15T03:18:34.582561Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3 from 1 (p xc-cluster-node-1) 2024-07-15T03:18:34.582581Z 0 [Note] [MY-000000] [Galera] Quorum results: version = 6, component = PRIMARY, conf_id = 9, members = 2/2 (primary/total), act_id = 9338911, last_appl. = 9338766, protocols = 2/11/4 (gcs/repl/appl), vote policy= 0, group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3 2024-07-15T03:18:34.582642Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141] 2024-07-15T03:18:34.582782Z 13 [Note] [MY-000000] [Galera] ####### processing CC 9338912, local, ordered 2024-07-15T03:18:34.582833Z 13 [Note] [MY-000000] [Galera] Maybe drain monitors from 9338911 upto current CC event 9338912 upto:9338911 2024-07-15T03:18:34.582853Z 13 [Note] [MY-000000] [Galera] Drain monitors from 9338911 up to 9338911 2024-07-15T03:18:34.582875Z 13 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f 2024-07-15T03:18:34.582890Z 13 [Note] [MY-000000] [Galera] Skipping cert index reset 2024-07-15T03:18:34.582925Z 13 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6) 2024-07-15T03:18:34.582951Z 13 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9338911 -> 9338912 2024-07-15T03:18:34.583030Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed. 2024-07-15T03:18:34.586904Z 13 [Note] [MY-000000] [Galera] ================================================ View: id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338912 status: primary protocol_version: 4 capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREOR DERED, STREAMING, NBO final: no own_index: 1 members(2): 0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3 1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1 ================================================= 2024-07-15T03:18:34.587017Z 13 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T03:18:34.595121Z 13 [Note] [MY-000000] [Galera] Recording CC from group: 9338912 2024-07-15T03:18:34.595181Z 13 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 9338767 2024-07-15T03:18:34.595204Z 13 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 9337022 ■■ 但随后的日志显示,108节点处于正常服务状态。 ■■ 随后的日志显示,04:00有节点重启加入集群: 2024-07-15T04:00:07.309528Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') connection established to 99999999-9909 tcp://[9999:9999:9999:9999::6d]:4567 2024-07-15T04:00:07.573846Z 116353 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep recated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T04:00:07.812276Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable 2024-07-15T04:00:07.812392Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9909 at tcp://[9999:9999:9999:9999::6d]:4567 stable 2024-07-15T04:00:07.813733Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary 2024-07-15T04:00:07.815269Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view (view_id(PRIM,99999999-9910,11) memb { 99999999-9910,0 99999999-9908,0 99999999-9909,0 } joined { } left { } partitioned { } ) 2024-07-15T04:00:07.815311Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk 2024-07-15T04:00:07.817183Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3 2024-07-15T04:00:07.817262Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID. 2024-07-15T04:00:07.821664Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 2024-07-15T04:00:07.822896Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 0 (px c-cluster-node-3) 2024-07-15T04:00:07.822956Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 1 (px c-cluster-node-1) 2024-07-15T04:00:08.310999Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 2 (px c-cluster-node-2) 2024-07-15T04:00:08.311090Z 0 [Note] [MY-000000] [Galera] Quorum results: version = 6, component = PRIMARY, conf_id = 10, members = 2/3 (primary/total), act_id = 9339054, last_appl. = 9338766, protocols = 2/11/4 (gcs/repl/appl), vote policy= 0, group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3 2024-07-15T04:00:08.311280Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173] 2024-07-15T04:00:08.311566Z 13 [Note] [MY-000000] [Galera] ####### processing CC 9339055, local, ordered 2024-07-15T04:00:08.311730Z 13 [Note] [MY-000000] [Galera] Maybe drain monitors from 9339054 upto current CC event 9339055 upto:9339054 2024-07-15T04:00:08.311758Z 13 [Note] [MY-000000] [Galera] Drain monitors from 9339054 up to 9339054 2024-07-15T04:00:08.311777Z 13 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f 2024-07-15T04:00:08.311791Z 13 [Note] [MY-000000] [Galera] Skipping cert index reset 2024-07-15T04:00:08.311804Z 13 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6) 2024-07-15T04:00:08.311820Z 13 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9339054 -> 9339055 2024-07-15T04:00:08.311910Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed. 2024-07-15T04:00:08.315209Z 13 [Note] [MY-000000] [Galera] ================================================ View: id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9339055 status: primary protocol_version: 4 capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORD ERED, STREAMING, NBO final: no own_index: 1 members(3): 0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3 1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1 2: b93005c2-425e-11ef-8b79-73a96136cc10, pxc-cluster-node-2 ================================================= ■■ 但紧接着是较多的锁表等待,根据以往测试情况,每次冲突也是总有线程获胜,而同时必然有线程受害: TRANSACTION 114209457, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT , undo log entries 3 MySQL thread id 16, OS thread handle 139734347618048, query id 6745542 Applying batch of row changes (update) TRANSACTION 114208702, ACTIVE 1338 sec , undo log entries 6 MySQL thread id 114856, OS thread handle 139691406636800, query id 6676148 2409:807c:5a06:1::104:1705 tm_xj wsrep: replicating and certi fying write set commit 2024-07-15T04:00:08.333081Z 16 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED -------- 2024-07-15T04:00:08.333098Z 16 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads: 2024-07-15T04:00:08.333111Z 16 [Note] [MY-000000] [WSREP] Winning thread: THD: 16, mode: high priority, state: exec, conflict: executing, seqno: 9339058 SQL: (null) 2024-07-15T04:00:08.333122Z 16 [Note] [MY-000000] [WSREP] Victim thread: THD: 114856, mode: local, state: exec, conflict: certifying, seqno: -1 SQL: commit TRANSACTION 114209458, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT , undo log entries 3 MySQL thread id 12, OS thread handle 139734349731584, query id 6745554 Applying batch of row changes (update) TRANSACTION 114208878, ACTIVE 994 sec , undo log entries 10 MySQL thread id 115190, OS thread handle 139671641110272, query id 6694341 2409:807c:5a06:1::104:1702 tm_xj wsrep: replicating and certi fying write set commit 2024-07-15T04:00:08.334316Z 12 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED -------- 2024-07-15T04:00:08.334343Z 12 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads: 2024-07-15T04:00:08.334363Z 12 [Note] [MY-000000] [WSREP] Winning thread: THD: 12, mode: high priority, state: exec, conflict: executing, seqno: 9339060 SQL: (null) 2024-07-15T04:00:08.334379Z 12 [Note] [MY-000000] [WSREP] Victim thread: THD: 115190, mode: local, state: exec, conflict: certifying, seqno: -1 SQL: commit 连续发生了188次冲突 [root@node1:0 /u01/mysqld.log]# grep "CONFLICT DETECTED" mysqld.log.108|grep "2024-07-15T04:00"|wc -l 188 最后一次冲突如下: 2024-07-15T04:00:08.729167Z 15 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED -------- 2024-07-15T04:00:08.729202Z 15 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads: 2024-07-15T04:00:08.729221Z 15 [Note] [MY-000000] [WSREP] Winning thread: THD: 15, mode: high priority, state: exec, conflict: executing, seqno: 9339273 SQL: (null) 2024-07-15T04:00:08.729236Z 15 [Note] [MY-000000] [WSREP] Victim thread: THD: 115577, mode: local, state: exec, conflict: must_replay, seqno: 9339387 SQL: commit ■■ 如下日志显示,在这些冲突发生110秒后,108节点试图连接109节点: 2024-07-15T04:01:50.417379Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') turning message relay requesting on, nonlive peers: tcp://[9999:9999:9999:9999::6d]:4567 2024-07-15T04:01:51.894449Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') reconnecting to b93005 c2-8b79 (tcp://[9999:9999:9999:9999::6d]:4567), attempt 0 2024-07-15T04:01:52.574837Z 116695 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep recated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T04:01:53.277070Z 116696 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep recated and will be removed in a future release. Please use caching_sha2_password instead' 2024-07-15T04:01:54.435874Z 116697 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep recated and will be removed in a future release. Please use caching_sha2_password instead' +++++++++++ 以下日志显示,3秒钟没能连接成功,怀疑超时,断定109节点非活动,该节点被驱逐 2024-07-15T04:01:54.847941Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout) 2024-07-15T04:01:54.848077Z 0 [Note] [MY-000000] [Galera] evs::proto(99999999-9908, OPERATIONAL, view_id(REG,99999999-9910,11)) suspecti ng node: 99999999-9909 2024-07-15T04:01:54.848102Z 0 [Note] [MY-000000] [Galera] evs::proto(99999999-9908, OPERATIONAL, view_id(REG,99999999-9910,11)) suspecte d node without join message, declaring inactive 2024-07-15T04:01:55.348302Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 inactive (evs.inactive_timeout) 2024-07-15T04:01:55.851247Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable 2024-07-15T04:01:55.852703Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary 2024-07-15T04:01:55.858696Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node view (view_id(PRIM,99999999-9910,12) memb { 99999999-9910,0 99999999-9908,0 } joined { } left { } partitioned { 99999999-9909,0 } ) 2024-07-15T04:01:55.858739Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk 2024-07-15T04:01:55.860637Z 0 [Note] [MY-000000] [Galera] forgetting 99999999-9909 (tcp://[9999:9999:9999:9999::6d]:4567) 2024-07-15T04:01:55.860717Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 2024-07-15T04:01:55.860857Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID. 2024-07-15T04:01:55.860748Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') turning message relay requesting off ■■ 之后交换状态信息,获取到仲裁结果,确认了2个活动节点:108、110,并清理了109节点 2024-07-15T04:01:55.862455Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: f9e37a10-425e-11ef-86d1-67533f4bae37 2024-07-15T04:01:55.863749Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f9e37a10-425e-11ef-86d1-67533f4bae37 from 0 (px c-cluster-node-3) 2024-07-15T04:01:55.863809Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f9e37a10-425e-11ef-86d1-67533f4bae37 from 1 (px c-cluster-node-1) 2024-07-15T04:01:55.863837Z 0 [Note] [MY-000000] [Galera] Quorum results: version = 6, component = PRIMARY, conf_id = 11, members = 2/2 (primary/total), act_id = 9339389, last_appl. = 9338766, protocols = 2/11/4 (gcs/repl/appl), vote policy= 0, group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3 2024-07-15T04:01:55.864077Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141] 2024-07-15T04:01:55.864321Z 14 [Note] [MY-000000] [Galera] ####### processing CC 9339390, local, ordered 2024-07-15T04:01:55.864384Z 14 [Note] [MY-000000] [Galera] Maybe drain monitors from 9339389 upto current CC event 9339390 upto:9339389 2024-07-15T04:01:55.864411Z 14 [Note] [MY-000000] [Galera] Drain monitors from 9339389 up to 9339389 2024-07-15T04:01:55.864436Z 14 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f 2024-07-15T04:01:55.864456Z 14 [Note] [MY-000000] [Galera] Skipping cert index reset 2024-07-15T04:01:55.864475Z 14 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6) 2024-07-15T04:01:55.864521Z 14 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9339389 -> 9339390 2024-07-15T04:01:55.864632Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed. 2024-07-15T04:01:55.867920Z 14 [Note] [MY-000000] [Galera] ================================================ View: id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9339390 status: primary protocol_version: 4 capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORD ERED, STREAMING, NBO final: no own_index: 1 members(2): 0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3 1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1 ================================================= 2024-07-15T04:01:55.868021Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification. 2024-07-15T04:01:55.874762Z 14 [Note] [MY-000000] [Galera] Recording CC from group: 9339390 2024-07-15T04:01:55.874837Z 14 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 9338767 2024-07-15T04:01:55.874866Z 14 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 9337022 2024-07-15T04:01:58.396276Z 0 [Note] [MY-000000] [Galera] cleaning up 99999999-9909 (tcp://[9999:9999:9999:9999::6d]:4567) ■■ 之后的日志显示,109节点再次加入未成,继续被驱逐出集群。 三、节点日志分析110节点 ■■ 前面的日志同108节点 ■■ 4:03的日志显示,109节点再次加入未成,继续被驱逐出集群。 四、总结 根据现场反馈的信息,109节点于11:18分被驱逐后,系统缓慢,业务阻断,此时数据库任务积压,SQL运行慢,大量的锁表,数据库连接数1100左右,大量的wrsp同步线程,杀掉变成killed状态后怀疑资源并未释放。此时即使停掉2个节点,仅保留1个节点,仍然无法提交事务,进而继续大量锁表,此时只能重启数据库解决问题。 总结以上分析过程,以下按照时间顺序客观描述一些症状、现象,以利于找到问题的根本原因。 1、11:16 节点109 第二次发生大事务,全局缓存页达到了 570MB,报错,导致节点109 11:18 停止服务,被驱逐出集群。 2、11:23 一线人员反馈登录时报服务异常,然后自动跳转回登陆页面,导致登录不上系统。项目经理收到问题反馈后开始排查系统问题。登录ELK系统,查看系统监控发现家客异步导出微服务延迟异常。进一步查看详情,发现数据库出现积压,SQL执行变慢。 3、11:47 其他系统侧人员电话联系反馈该系统接口调用失败,导致相关工单无法办理。 4、12时 系统恢复正常登陆后,其他侧反馈接口正常,但是仍有小部分接口出现异常情况,需多次提交,才能成功。 5、12:03 运维人员后台监控发现出入库异常,"Error updating database. Caus" 运维人员根据异常信息定位到有锁表进程并对锁表进程进行释放 6、12:35左右 出入库接口依然存在无法正常回单问题,经核查发现数据库进程状态虽然变成“kill”,但是并没有释放。然后立即将现场最新情况反馈给公司专家,寻求远程协助。 7、13:30 停掉了两个备库节点,只保留主数据库节点。然后滚动重启相关服务及接口。重启完后,验证系统登陆正常,观察相关接口也正常。 8、14:00 支撑群里面一线装维又反馈回单出现报错如下: Could not commit JDBC transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTra Deadlock found when trying to getlock; try restarting transaction 经与公司专家紧急开会讨论,初步估计是PXC集群异常,虽然已经下线了两个备库节点,但是主节点还在做强一致性同步,导致事务无法提交,从而出现大量锁表的情况。 9、16:15 电话联系系统管理员XXX,申请重启数据库。于16:25完成数据库重启,然后滚动重启相关服务及接口。重启完后,验证系统登陆正常,系统接口也正常。 10、16:44 在支撑群中通知一线人员系统恢复正常,重新进行回单。 一线分析参考: 本次故障是一线异步导出全网数据时,后台会同时更新XX平台割接过来的XXX关系状态,导出数据量过大时,出现了大事务提交,导致备库节点异常,从而整个PXC集群各节点数据同步卡死,最终导致事务提交不了,出现锁表的情况。