MHA 故障切换演练 --- masterha_master_switch

前言

昨天公司要进行线上MHA高可用集群故障切换演练。由于我刚入职公司。公司还没有做过太多的灾难级别的故障演练。

我写了一些步骤:
故障切换
  1. 查看主从延迟状态
  2. 登录MHA中控机,查看演练切换的数据库MHA log文件,使用tail -f 命令来获取切换的log日志信息
  3. 登录master机器,执行 /etc/init.d/mysql stop
  4. 查看MHA中控机日志信息。
  5. 切换成功,登录到 new_master 中查看vip是否已经绑定到网卡上
  6. 登录slave,查看 show slave status 是否已经change master to new_master ip上,查看slave正常与否
  7. 在 MHA 中控机启动 masterha_manager 进程,查看MHA集群启动是否正常。(两节点)
    结构图:
    new_master: (master)
    —> slave: (slave)
master 数据库恢复
  1. 登录到old_master 机器上执行 /etc/init.d/mysql start 是否正常启动服务
  2. 在MHA中控机中获取切换的log 日志信息找出 change master to new_master 信息语句
  3. 在old_master 中执行change master to 语句,变为 new_master的slave
  4. show slave status 查看 old_master 是否正常变为 new_master 的slave
  5. 在MHA 中控机节点,执行 masterha_stop 命令结束 masterha_manager进程。
  6. 在MHA 中控机启动 masterha_manager 进程,查看恢复好的old_master是否正常加入到MHA节点 (三节点)
    结构图:
    new_master: (master)
    —> old_master: (slave)
    —> slave: (slave)
恢复 old_master为 master 角色
  1. 查看主从延迟状态
  2. 登录MHA中控机,查看演练切换的数据库MHA log文件,使用tail -f 命令来获取切换的log日志信息
  3. 登录new_master机器,执行 /etc/init.d/mysql stop
  4. 查看MHA中控机日志信息。
  5. 切换成功,登录到 old_master 中查看vip是否已经绑定到网卡上
  6. 登录slave,查看 show slave status 是否已经change master to old_master ip上,查看slave正常与否
  7. 在 MHA 中控机启动 masterha_manager 进程,查看MHA集群启动是否正常。(两节点)
    结构图:
    master(old_master): (master)
    —> slave(new_master): (slave)
    —> slave: (slave)
但是这么操作虽然能更贴近真实灾难(暂时没有把主从数据延迟考虑进入),需要的时间会更多。领导建议手动操作MHA failover测试能不能切换。

服务器架构

server role
master old_master
slave new_master
slave slave

具体操作

在MHA 中控机中执行命令

1
# masterha_check_repl --conf=/root/dba/mha/conf/babybi_#1.conf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
Wed Mar 27 23:28:28 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 27 23:28:28 2019 - [info] Reading application default configuration from /root/dba/mha/conf/babybi_#1.conf..
Wed Mar 27 23:28:28 2019 - [info] Reading server configuration from /root/dba/mha/conf/babybi_#1.conf..
Wed Mar 27 23:28:28 2019 - [info] MHA::MasterMonitor version 0.56.
Wed Mar 27 23:28:28 2019 - [info] GTID failover mode = 0
Wed Mar 27 23:28:28 2019 - [info] Dead Servers:
Wed Mar 27 23:28:28 2019 - [info] Alive Servers:
Wed Mar 27 23:28:28 2019 - [info] 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:28:28 2019 - [info] 10.25.1.67(10.25.1.67:3306)
Wed Mar 27 23:28:28 2019 - [info] 10.25.1.68(10.25.1.68:3306)
Wed Mar 27 23:28:28 2019 - [info] Alive Slaves:
Wed Mar 27 23:28:28 2019 - [info] 10.25.1.67(10.25.1.67:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed Mar 27 23:28:28 2019 - [info] Replicating from 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:28:28 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Mar 27 23:28:28 2019 - [info] 10.25.1.68(10.25.1.68:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed Mar 27 23:28:28 2019 - [info] Replicating from 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:28:28 2019 - [info] Current Alive Master: 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:28:28 2019 - [info] Checking slave configurations..
Wed Mar 27 23:28:28 2019 - [warning] relay_log_purge=0 is not set on slave 10.25.1.67(10.25.1.67:3306).
Wed Mar 27 23:28:28 2019 - [warning] relay_log_purge=0 is not set on slave 10.25.1.68(10.25.1.68:3306).
Wed Mar 27 23:28:28 2019 - [info] Checking replication filtering settings..
Wed Mar 27 23:28:28 2019 - [info] binlog_do_db= , binlog_ignore_db=
Wed Mar 27 23:28:28 2019 - [info] Replication filtering check ok.
Wed Mar 27 23:28:28 2019 - [info] GTID (with auto-pos) is not supported
Wed Mar 27 23:28:28 2019 - [info] Starting SSH connection tests..
Wed Mar 27 23:28:29 2019 - [info] All SSH connection tests passed successfully.
Wed Mar 27 23:28:29 2019 - [info] Checking MHA Node version..
Wed Mar 27 23:28:30 2019 - [info] Version check ok.
Wed Mar 27 23:28:30 2019 - [info] Checking SSH publickey authentication settings on the current master..
Wed Mar 27 23:28:30 2019 - [info] HealthCheck: SSH to 10.25.1.66 is reachable.
Wed Mar 27 23:28:30 2019 - [info] Master MHA Node version is 0.56.
Wed Mar 27 23:28:30 2019 - [info] Checking recovery script configurations on 10.25.1.66(10.25.1.66:3306)..
Wed Mar 27 23:28:30 2019 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/log/mysql/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.004691
Wed Mar 27 23:28:30 2019 - [info] Connecting to root@10.25.1.66(10.25.1.66:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /log/mysql/binlog/, up to mysql-bin.004691
Wed Mar 27 23:28:31 2019 - [info] Binlog setting check done.
Wed Mar 27 23:28:31 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Mar 27 23:28:31 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.25.1.67 --slave_ip=10.25.1.67 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.29-76.2-log --manager_version=0.56 --relay_log_info=/my/data/percona/relay-log.info --relay_dir=/my/data/percona/ --slave_pass=xxx
Wed Mar 27 23:28:31 2019 - [info] Connecting to root@10.25.1.67(10.25.1.67:22)..
Checking slave recovery environment settings..
Opening /my/data/percona/relay-log.info ... ok.
Relay log found at /log/mysql/relaylog, up to relay-log.014072
Temporary relay log file is /log/mysql/relaylog/relay-log.014072
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Mar 27 23:28:31 2019 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.25.1.68 --slave_ip=10.25.1.68 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.29-76.2-log --manager_version=0.56 --relay_log_info=/my/data/percona/relay-log.info --relay_dir=/my/data/percona/ --slave_pass=xxx
Wed Mar 27 23:28:31 2019 - [info] Connecting to root@10.25.1.68(10.25.1.68:22)..
Checking slave recovery environment settings..
Opening /my/data/percona/relay-log.info ... ok.
Relay log found at /log/mysql/relaylog, up to relay-log.014072
Temporary relay log file is /log/mysql/relaylog/relay-log.014072
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Mar 27 23:28:31 2019 - [info] Slaves settings check done.
Wed Mar 27 23:28:31 2019 - [info]
10.25.1.66(10.25.1.66:3306) (current master)
+--10.25.1.67(10.25.1.67:3306)
+--10.25.1.68(10.25.1.68:3306)

Wed Mar 27 23:28:31 2019 - [info] Checking replication health on 10.25.1.67..
Wed Mar 27 23:28:31 2019 - [info] ok.
Wed Mar 27 23:28:31 2019 - [info] Checking replication health on 10.25.1.68..
Wed Mar 27 23:28:31 2019 - [info] ok.
Wed Mar 27 23:28:31 2019 - [info] Checking master_ip_failover_script status:
Wed Mar 27 23:28:31 2019 - [info] /root/dba/mha/script/master_ip_failover.sh --command=status --ssh_user=root --orig_master_host=10.25.1.66 --orig_master_ip=10.25.1.66 --orig_master_port=3306


IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK
Wed Mar 27 23:28:31 2019 - [info] OK.
Wed Mar 27 23:28:31 2019 - [warning] shutdown_script is not defined.
Wed Mar 27 23:28:31 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

执行 masterha_master_switch 命令

1
# masterha_master_switch --conf=/root/dba/mha/conf/babybi_#1.conf --master_state=alive --new_master_host=10.25.1.67 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
Wed Mar 27 23:34:27 2019 - [info] MHA::MasterRotate version 0.56.
Wed Mar 27 23:34:27 2019 - [info] Starting online master switch..
Wed Mar 27 23:34:27 2019 - [info]
Wed Mar 27 23:34:27 2019 - [info] * Phase 1: Configuration Check Phase..
Wed Mar 27 23:34:27 2019 - [info]
Wed Mar 27 23:34:27 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Mar 27 23:34:27 2019 - [info] Reading application default configuration from /root/dba/mha/conf/babybi_#1.conf..
Wed Mar 27 23:34:27 2019 - [info] Reading server configuration from /root/dba/mha/conf/babybi_#1.conf..
Wed Mar 27 23:34:27 2019 - [info] GTID failover mode = 0
Wed Mar 27 23:34:27 2019 - [info] Current Alive Master: 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:34:27 2019 - [info] Alive Slaves:
Wed Mar 27 23:34:27 2019 - [info] 10.25.1.67(10.25.1.67:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed Mar 27 23:34:27 2019 - [info] Replicating from 10.25.1.66(10.25.1.66:3306)
Wed Mar 27 23:34:27 2019 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Mar 27 23:34:27 2019 - [info] 10.25.1.68(10.25.1.68:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed Mar 27 23:34:27 2019 - [info] Replicating from 10.25.1.66(10.25.1.66:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.25.1.66(10.25.1.66:3306)? (YES/no): ---> 输入: yes
Wed Mar 27 23:34:34 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Mar 27 23:34:34 2019 - [info] ok.
Wed Mar 27 23:34:34 2019 - [info] Checking MHA is not monitoring or doing failover..
Wed Mar 27 23:34:34 2019 - [info] Checking replication health on 10.25.1.67..
Wed Mar 27 23:34:34 2019 - [info] ok.
Wed Mar 27 23:34:34 2019 - [info] Checking replication health on 10.25.1.68..
Wed Mar 27 23:34:34 2019 - [info] ok.
Wed Mar 27 23:34:34 2019 - [info] 10.25.1.67 can be new master.
Wed Mar 27 23:34:34 2019 - [info]
From:
10.25.1.66(10.25.1.66:3306) (current master)
+--10.25.1.67(10.25.1.67:3306)
+--10.25.1.68(10.25.1.68:3306)

To:
10.25.1.67(10.25.1.67:3306) (new master)
+--10.25.1.68(10.25.1.68:3306)
+--10.25.1.66(10.25.1.66:3306)

Starting master switch from 10.25.1.66(10.25.1.66:3306) to 10.25.1.67(10.25.1.67:3306)? (yes/NO): ---> 输入: yes
Wed Mar 27 23:34:37 2019 - [info] Checking whether 10.25.1.67(10.25.1.67:3306) is ok for the new master..
Wed Mar 27 23:34:37 2019 - [info] ok.
Wed Mar 27 23:34:37 2019 - [info] 10.25.1.66(10.25.1.66:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Wed Mar 27 23:34:37 2019 - [info] 10.25.1.66(10.25.1.66:3306): Resetting slave pointing to the dummy host.
Wed Mar 27 23:34:37 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Mar 27 23:34:37 2019 - [info]
Wed Mar 27 23:34:37 2019 - [info] * Phase 2: Rejecting updates Phase..
Wed Mar 27 23:34:37 2019 - [info]
Wed Mar 27 23:34:37 2019 - [info] Executing master ip online change script to disable write on the current master:
Wed Mar 27 23:34:37 2019 - [info] /root/dba/mha/script/master_ip_online_change.sh --command=stop --orig_master_host=10.25.1.66 --orig_master_ip=10.25.1.66 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mhapassword' --new_master_host=10.25.1.67 --new_master_ip=10.25.1.67 --new_master_port=3306 --new_master_user='mha' --new_master_password='mhapassword' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
2019-03-27 23:34:38 set_mysql_read_only successful!
Stopping keepalived: [ OK ]
2019-03-27 23:34:38 stop_keepalived successful!
Wed Mar 27 23:34:38 2019 - [info] ok.
Wed Mar 27 23:34:38 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed Mar 27 23:34:38 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Mar 27 23:34:38 2019 - [info] ok.
Wed Mar 27 23:34:38 2019 - [info] Orig master binlog:pos is mysql-bin.004691:854373586.
Wed Mar 27 23:34:38 2019 - [info] Waiting to execute all relay logs on 10.25.1.67(10.25.1.67:3306)..
Wed Mar 27 23:34:38 2019 - [info] master_pos_wait(mysql-bin.004691:854373586) completed on 10.25.1.67(10.25.1.67:3306). Executed 0 events.
Wed Mar 27 23:34:38 2019 - [info] done.
Wed Mar 27 23:34:38 2019 - [info] Getting new master's binlog name and position..
Wed Mar 27 23:34:38 2019 - [info] mysql-bin.000001:120
Wed Mar 27 23:34:38 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.25.1.67', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Mar 27 23:34:38 2019 - [info] Executing master ip online change script to allow write on the new master:
Wed Mar 27 23:34:38 2019 - [info] /root/dba/mha/script/master_ip_online_change.sh --command=start --orig_master_host=10.25.1.66 --orig_master_ip=10.25.1.66 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mhapassword' --new_master_host=10.25.1.67 --new_master_ip=10.25.1.67 --new_master_port=3306 --new_master_user='mha' --new_master_password='mhapassword' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Starting keepalived: [ OK ]
2019-03-27 23:34:38 start_keepalived successful!
Wed Mar 27 23:34:38 2019 - [info] ok.
Wed Mar 27 23:34:38 2019 - [info] Setting read_only=0 on 10.25.1.67(10.25.1.67:3306)..
Wed Mar 27 23:34:38 2019 - [info] ok.
Wed Mar 27 23:34:38 2019 - [info]
Wed Mar 27 23:34:38 2019 - [info] * Switching slaves in parallel..
Wed Mar 27 23:34:38 2019 - [info]
Wed Mar 27 23:34:38 2019 - [info] -- Slave switch on host 10.25.1.68(10.25.1.68:3306) started, pid: 20673
Wed Mar 27 23:34:38 2019 - [info]
Wed Mar 27 23:34:38 2019 - [info] Log messages from 10.25.1.68 ...
Wed Mar 27 23:34:39 2019 - [info]
Wed Mar 27 23:34:38 2019 - [info] Waiting to execute all relay logs on 10.25.1.68(10.25.1.68:3306)..
Wed Mar 27 23:34:38 2019 - [info] master_pos_wait(mysql-bin.004691:854373586) completed on 10.25.1.68(10.25.1.68:3306). Executed 0 events.
Wed Mar 27 23:34:38 2019 - [info] done.
Wed Mar 27 23:34:38 2019 - [info] Resetting slave 10.25.1.68(10.25.1.68:3306) and starting replication from the new master 10.25.1.67(10.25.1.67:3306)..
Wed Mar 27 23:34:38 2019 - [info] Executed CHANGE MASTER.
Wed Mar 27 23:34:38 2019 - [info] Slave started.
Wed Mar 27 23:34:39 2019 - [info] End of log messages from 10.25.1.68 ...
Wed Mar 27 23:34:39 2019 - [info]
Wed Mar 27 23:34:39 2019 - [info] -- Slave switch on host 10.25.1.68(10.25.1.68:3306) succeeded.
Wed Mar 27 23:34:39 2019 - [info] Unlocking all tables on the orig master:
Wed Mar 27 23:34:39 2019 - [info] Executing UNLOCK TABLES..
Wed Mar 27 23:34:39 2019 - [info] ok.
Wed Mar 27 23:34:39 2019 - [info] Starting orig master as a new slave..
Wed Mar 27 23:34:39 2019 - [info] Resetting slave 10.25.1.66(10.25.1.66:3306) and starting replication from the new master 10.25.1.67(10.25.1.67:3306)..
Wed Mar 27 23:34:39 2019 - [info] Executed CHANGE MASTER.
Wed Mar 27 23:34:39 2019 - [info] Slave started.
Wed Mar 27 23:34:39 2019 - [info] All new slave servers switched successfully.
Wed Mar 27 23:34:39 2019 - [info]
Wed Mar 27 23:34:39 2019 - [info] * Phase 5: New master cleanup phase..
Wed Mar 27 23:34:39 2019 - [info]
Wed Mar 27 23:34:39 2019 - [info] 10.25.1.67: Resetting slave info succeeded.
Wed Mar 27 23:34:39 2019 - [info] Switching master to 10.25.1.67(10.25.1.67:3306) completed successfully.
1
2
3
4
# ps -ef|grep babybi_#1.conf
root 3578 1 0 Jan08 ? 00:42:35 perl /usr/bin/masterha_manager --conf=/root/dba/mha/conf/babybi_#1.conf --ignore_last_failover
root 21082 17518 0 23:35 pts/0 00:00:00 grep --colour=auto babybi_#1.conf
#

到new_master机器中查看VIP有没有绑定过来到网卡上

1
# ip a

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq master bond0 state UP qlen 1000
link/ether 24:6e:96:13:61:30 brd ff:ff:ff:ff:ff:ff
3: em2: mtu 1500 qdisc mq master bond0 state UP qlen 1000
link/ether 24:6e:96:13:61:30 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 24:6e:96:13:61:34 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 24:6e:96:13:61:35 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 1500 qdisc noqueue state UP
link/ether 24:6e:96:13:61:30 brd ff:ff:ff:ff:ff:ff
inet 10.25.1.67/24 brd 10.25.1.255 scope global bond0
inet 10.25.1.203/32 scope global bond0
inet6 fe80::266e:96ff:fe13:6130/64 scope link
valid_lft forever preferred_lft forever

登录到slave机器上

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
mysql> show slave status\G
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 2049252955
Current database: *** NONE ***

*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.25.1.67
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 62528
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 62691
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 62528
Relay_Log_Space: 62858
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1673306
Master_UUID: ca479c32-fa0d-11e8-bc0f-246e96136130
Master_Info_File: /my/data/percona/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

mysql>

正常 CHANGE MASTER TO NEW_MASTER

登录到old_master中查看

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.25.1.67
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 66875
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 67038
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 66875
Relay_Log_Space: 67205
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1673306
Master_UUID: ca479c32-fa0d-11e8-bc0f-246e96136130
Master_Info_File: /my/data/percona/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

mysql>

到new_master上查看

1
2
3
4
5
$ ps -ef|grep keepalived
root 37477 1 0 23:41 ? 00:00:00 /usr/sbin/keepalived -D
root 37478 37477 0 23:41 ? 00:00:00 /usr/sbin/keepalived -D
root 37479 37477 0 23:41 ? 00:00:00 /usr/sbin/keepalived -D
dbctl 38747 35598 0 23:49 pts/0 00:00:00 grep --colour=auto keepalived

在切换回来到old_master提成为master

步骤按照上面操作,就不重复搬砖了。

结果

MHA 高可用集群切换成功,虽然是手动MHA failover测试,并不能代表MHA的高可用。以后还会按照上面缩写的步骤在加上一定的主从数据同步延迟来进行演练。

-------------本文结束感谢您的阅读-------------

本文标题:MHA 故障切换演练 --- masterha_master_switch

文章作者:Wang Jiemin

发布时间:2019年03月28日 - 10:03

最后更新:2019年04月22日 - 11:04

原始链接:https://jiemin.wang/2019/03/28/mha-switch/

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

0%