GreatSQL MGR Cluster Deployment Notes
Introduction
GreatSQL is a free, open-source MySQL branch developed in China, initiated by GreatDB (万里数据库) and donated to the OpenAtom Open Source Foundation. It focuses on improving MGR availability and performance, supports features such as InnoDB parallel query and geo-tagging, is fully compatible with MySQL/Percona Server, targets financial-grade workloads, and runs on multiple architectures (ARM, x86, etc.).
Prerequisites
The planned deployment is 2 data nodes + 1 arbitrator, running GreatSQL 8.0.32-27. Prepare three servers (physical or cloud) with the following roles and specs:
| Role | Hostname | IP Address | Notes |
|---|---|---|---|
| Primary | database-node1 | 192.168.19.210 | Stores data; full-function node |
| Secondary | database-node2 | 192.168.19.214 | Stores data; full-function node |
| Arbitrator | database-arbit | 192.168.19.215 | Stores no data; votes only; low spec |

| Hostname | Cores | Memory | Disk | OS |
|---|---|---|---|---|
| database-node1 | 8 | 16G | 500G | CentOS 7.9 |
| database-node2 | 8 | 16G | 500G | CentOS 7.9 |
| database-arbit | 4 | 8G | 50G | CentOS 7.9 |
Base Environment Setup
Configure hosts Resolution
MGR's internal communication relies on hostnames. Be sure to include the .local entries; otherwise data transfer will fail with Unknown host errors.
vi /etc/hosts
# Append the following:
192.168.19.210 database-node1 database-node1.local node1
192.168.19.214 database-node2 database-node2.local node2
192.168.19.215 database-arbit database-arbit.local node3
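Before moving on, it is worth confirming that every name resolves on every machine. A minimal sanity check, run on each of the three servers:
# Verify that the .local names resolve; a failure here will surface later as MGR recovery errors
for h in database-node1.local database-node2.local database-arbit.local; do
    getent hosts "$h" || echo "FAIL: $h does not resolve"
done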
Disable the Firewall and SELinux
# Stop the firewall
systemctl stop firewalld
systemctl disable firewalld
# Permanently disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
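Disabling the firewall outright is fine on an isolated test network. If policy requires firewalld to stay on, a narrower alternative (a sketch, assuming the default firewalld zone) is to open only the two ports this deployment uses:
# Open the client port and the MGR group-communication port instead of stopping firewalld
firewall-cmd --permanent --add-port=3306/tcp    # MySQL client connections
firewall-cmd --permanent --add-port=33061/tcp   # MGR communication (group_replication_local_address)
firewall-cmd --reload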
Check UUID Uniqueness
If the machines are cloned VMs, auto.cnf must be checked. If the UUIDs are identical, the file must be deleted and the service restarted.
cat /var/lib/mysql/auto.cnf
# If duplicated, run: rm -f /var/lib/mysql/auto.cnf && systemctl restart greatsql
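To compare all three machines in one pass, a loop such as the following can help (an illustrative sketch that assumes passwordless root SSH between the nodes):
# Print each node's server-uuid; all three values must differ
for h in database-node1 database-node2 database-arbit; do
    echo "== $h =="
    ssh root@"$h" grep server-uuid /var/lib/mysql/auto.cnf
done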
Install GreatSQL
For installation instructions, see https://github.com/GreatSQL
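After installing, confirm the binary is the expected version before doing anything else:
# Should report 8.0.32-27 (the exact banner text varies by build)
/usr/sbin/mysqld --version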
Configure GreatSQL
Write the three configuration files below first, then initialize the database. The initialization command is as follows (the --console option prints the temporary password to the screen, so there is no hunting through logs):
/usr/sbin/mysqld --defaults-file=/etc/my.cnf \
--initialize --user=mysql --lower-case-table-names=1 \
--console
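If the temporary root password scrolls past, or (depending on the build) lands in the error log rather than on the console, it can be recovered from the error log configured in my.cnf:
# The initialization message contains the phrase "temporary password"
grep 'temporary password' /var/lib/mysql/error.log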
database-node1 Node Configuration
After installation, do not initialize the database yet; first adjust the configuration file /etc/my.cnf as follows:
[client]
socket = /var/lib/mysql/mysql.sock
[mysql]
loose-skip-binary-as-hex
prompt = "(\\D)[\\u@GreatSQL][\\d]>"
no-auto-rehash
[mysqld]
lower_case_table_names = 1
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
# --- Basic paths ---
user = mysql
port = 3306
server_id = 3306 # Must be unique per machine in the cluster (e.g. 3307, 3308)
basedir = /usr/
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid-file = mysql.pid
character-set-server = UTF8MB4
skip_name_resolve = ON
default_time_zone = "+8:00"
bind_address = "0.0.0.0"
secure_file_priv = /var/lib/mysql
# --- Performance ---
# Buffer pool: 8G (50% of physical RAM)
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 4
# Connections
max_connections = 500
open_files_limit = 65535
table_open_cache = 2048
table_definition_cache = 1024
# Thread cache
thread_cache_size = 64
# --- Buffers ---
sort_buffer_size = 1M
join_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
bulk_insert_buffer_size = 16M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 64M
# --- GreatSQL features ---
# Force invisible auto-generated primary keys (paves the way for MGR; recommended ON)
sql_generate_invisible_primary_key = ON
# Turbo engine left off to save memory
# loose-turbo_memory_limit= 64G
# --- Logging (safety first) ---
log_timestamps = SYSTEM
log_error = error.log
log_error_verbosity = 3
slow_query_log = ON
long_query_time = 0.5 # 0.5s slow-query threshold, convenient for dev/test
log_bin = binlog
binlog_format = ROW
# "Double 1" durability (protects against power loss)
sync_binlog = 1
binlog_expire_logs_seconds = 604800 # keep binlogs for 7 days
gtid_mode = ON
enforce_gtid_consistency = ON
# --- Replication (tuned for 8 CPU cores) ---
relay_log_recovery = ON
replica_parallel_type = LOGICAL_CLOCK
# Parallel replication workers: about half the CPU core count
replica_parallel_workers = 4
binlog_transaction_dependency_tracking = WRITESET
replica_preserve_commit_order = ON
# --- InnoDB ---
innodb_data_file_path = ibdata1:12M:autoextend
# Flush on every commit (protects against power loss)
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 16M
innodb_redo_log_capacity = 2G
# IO capacity (sized for virtual disks)
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT
innodb_print_all_deadlocks = ON
# ==========================================
# GreatSQL MGR Configuration (Node1 - Primary)
# ==========================================
# 1. Basic replication settings
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE # checksum is recommended OFF for MGR
log_slave_updates = ON
# 2. Plugins (the MGR core)
loose-plugin_load_add = 'mysql_clone.so'
loose-plugin_load_add = 'group_replication.so'
# 3. Group name (must be identical on all nodes)
loose-group_replication_group_name = "29fb195d-f266-4897-b56b-bd50b3aa71fb"
# 4. Startup policy (OFF for safety; bootstrap manually)
loose-group_replication_start_on_boot = OFF
loose-group_replication_bootstrap_group = OFF
# 5. Network settings (critical!)
# Local communication address (the IP must be this host's; port 33061 recommended)
loose-group_replication_local_address = "192.168.19.210:33061"
# Seed nodes (all 3 machines, used for mutual discovery)
loose-group_replication_group_seeds = "192.168.19.210:33061,192.168.19.214:33061,192.168.19.215:33061"
# Allowlist (network segments permitted to communicate)
loose-group_replication_ip_allowlist = "192.168.19.0/24"
# 6. Single-primary mode
loose-group_replication_single_primary_mode = ON
loose-group_replication_enforce_update_everywhere_checks = OFF
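Before initializing, the file can be sanity-checked without starting the server. This uses the --validate-config option MySQL introduced in 8.0.16, which this GreatSQL build, being 8.0.32-based, is assumed to carry:
# Exit code 0 means the configuration parsed cleanly
/usr/sbin/mysqld --defaults-file=/etc/my.cnf --user=mysql --validate-config
echo $?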
database-node2 Node Configuration
The configuration file is the same as node1's, except server_id = 3307 and the MGR local address:
[client]
socket = /var/lib/mysql/mysql.sock
[mysql]
loose-skip-binary-as-hex
prompt = "(\\D)[\\u@GreatSQL][\\d]>"
no-auto-rehash
[mysqld]
lower_case_table_names = 1
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
# --- Basic paths ---
user = mysql
port = 3306
server_id = 3307 # Must be unique per machine in the cluster (e.g. 3307, 3308)
basedir = /usr/
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid-file = mysql.pid
character-set-server = UTF8MB4
skip_name_resolve = ON
default_time_zone = "+8:00"
bind_address = "0.0.0.0"
secure_file_priv = /var/lib/mysql
# --- Performance ---
# Buffer pool: 8G (50% of physical RAM)
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 4
# Connections: 500 is plenty for a test environment and saves memory
max_connections = 500
open_files_limit = 65535
table_open_cache = 2048
table_definition_cache = 1024
# Thread cache
thread_cache_size = 64
# --- Buffers (kept small to avoid OOM) ---
sort_buffer_size = 1M
join_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
bulk_insert_buffer_size = 16M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 64M
# --- GreatSQL features ---
# Force invisible auto-generated primary keys (paves the way for MGR; recommended ON)
sql_generate_invisible_primary_key = ON
# Turbo engine left off to save memory
# loose-turbo_memory_limit= 64G
# --- Logging (safety first) ---
log_timestamps = SYSTEM
log_error = error.log
log_error_verbosity = 3
slow_query_log = ON
long_query_time = 0.5 # 0.5s slow-query threshold, convenient for dev/test
log_bin = binlog
binlog_format = ROW
# "Double 1" durability (protects against power loss)
sync_binlog = 1
binlog_expire_logs_seconds = 604800 # keep binlogs for 7 days
gtid_mode = ON
enforce_gtid_consistency = ON
# --- Replication (tuned for 8 CPU cores) ---
relay_log_recovery = ON
replica_parallel_type = LOGICAL_CLOCK
# Parallel replication workers: about half the CPU core count
replica_parallel_workers = 4
binlog_transaction_dependency_tracking = WRITESET
replica_preserve_commit_order = ON
# --- InnoDB ---
innodb_data_file_path = ibdata1:12M:autoextend
# Flush on every commit (protects against power loss)
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 16M
innodb_redo_log_capacity = 2G
# IO capacity (sized for virtual disks)
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT
innodb_print_all_deadlocks = ON
# ==========================================
# GreatSQL MGR Configuration (Node2 - Secondary)
# ==========================================
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
loose-plugin_load_add = 'mysql_clone.so'
loose-plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "29fb195d-f266-4897-b56b-bd50b3aa71fb"
loose-group_replication_start_on_boot = OFF
loose-group_replication_bootstrap_group = OFF
# [Note] This is Node2's IP
loose-group_replication_local_address = "192.168.19.214:33061"
# The seed list stays identical on every node
loose-group_replication_group_seeds = "192.168.19.210:33061,192.168.19.214:33061,192.168.19.215:33061"
loose-group_replication_ip_allowlist = "192.168.19.0/24"
loose-group_replication_single_primary_mode = ON
loose-group_replication_enforce_update_everywhere_checks = OFF
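A quick way to confirm node2's file differs from node1's only where intended (a sketch assuming SSH access to node1):
# Run on node2: only server_id and group_replication_local_address should show up
diff <(ssh root@database-node1 cat /etc/my.cnf) /etc/my.cnf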
database-arbit Node Configuration
The arbitrator node's configuration differs:
[client]
socket = /var/lib/mysql/mysql.sock
[mysqld]
# --- Basic identity ---
server_id = 3 # only needs to be unique within the cluster
user = mysql
port = 3306
basedir = /usr/
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
pid-file = mysql.pid
# --- Core compatibility settings (required) ---
lower_case_table_names = 1
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
skip_name_resolve = ON
character-set-server = UTF8MB4
default_time_zone = "+8:00"
bind_address = "0.0.0.0"
# --- Memory & performance (sized for a 4C/8G machine) ---
# Give the database 4G, leaving 4G for the OS and arbitration logic; very safe
innodb_buffer_pool_size = 4G
innodb_buffer_pool_instances = 2
# Connections: an arbitrator needs few; 200 is plenty
max_connections = 200
open_files_limit = 65535
table_open_cache = 1024
table_definition_cache = 512
# Thread cache
thread_cache_size = 32
# Per-session memory kept small (prevents sudden memory spikes)
sort_buffer_size = 512K
join_buffer_size = 512K
read_buffer_size = 512K
read_rnd_buffer_size = 512K
tmp_table_size = 32M
max_heap_table_size = 32M
max_allowed_packet = 64M
# --- Storage & logging (sized for the small 50G disk) ---
# "Double 1" durability (kept for consistency with the data nodes, even on an arbitrator)
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
# Log files
log_timestamps = SYSTEM
log_error = error.log
slow_query_log = ON
long_query_time = 1
log_bin = binlog
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
# [Key] Keep binlogs for only 3 days to avoid filling the 50G disk
binlog_expire_logs_seconds = 259200
# IO capacity (VM disks are usually modest; set low to avoid stalls)
innodb_io_capacity = 1000
innodb_io_capacity_max = 2000
# --- Replication (tuned for 4 CPU cores) ---
# With only 4 cores, 2 parallel workers is enough; more would be slower
replica_parallel_workers = 2
relay_log_recovery = ON
replica_preserve_commit_order = ON
# --- Arbitrator settings ---
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
# --- MGR settings staged in advance (not started yet, but written out now) ---
loose-plugin_load_add = 'mysql_clone.so'
loose-plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "29fb195d-f266-4897-b56b-bd50b3aa71fb"
loose-group_replication_start_on_boot = OFF
loose-group_replication_local_address = "192.168.19.215:33061" # Node3's IP
loose-group_replication_group_seeds = "192.168.19.210:33061,192.168.19.214:33061,192.168.19.215:33061"
loose-group_replication_bootstrap_group = OFF
loose-group_replication_ip_allowlist = "192.168.19.0/24"
loose-group_replication_single_primary_mode = ON
loose-group_replication_enforce_update_everywhere_checks = OFF
# [Key arbitrator switch]
# Running standalone, this parameter has no effect; when joining the group, it makes this node the arbitrator
loose-group_replication_arbitrator = ON
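Once the service is up on this node, it is worth confirming the arbitrator switch was actually picked up (the variable only exists after group_replication.so loads; the loose- prefix silently ignores it otherwise):
-- Run on database-arbit; expect the value ON
SHOW GLOBAL VARIABLES LIKE 'group_replication_arbitrator';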
Starting the Primary Node
Goal: make Node1 the first PRIMARY.
Log in to the SQL prompt on Node1 and run the following to bootstrap the primary:
-- 1. Create the replication user (used for inter-node communication)
SET SQL_LOG_BIN=0;
CREATE USER 'repl'@'%' IDENTIFIED BY 'ReplPass123!';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, BACKUP_ADMIN, CLONE_ADMIN, GROUP_REPLICATION_ADMIN ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
-- 2. Configure the recovery channel
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='ReplPass123!' FOR CHANNEL 'group_replication_recovery';
-- 3. Bootstrap the group
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF; -- must be switched back off immediately after startup
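Before moving on to Node2, it is worth confirming the group now consists of exactly one member:
-- Run on Node1; expect a single row: database-node1.local / PRIMARY / ONLINE
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;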
Starting the Secondary Node
Goal: use the Clone plugin to pull the data automatically and join the group.
Log in to the SQL prompt on Node2 and run:
-- 1. Temporarily grant root extra privileges (otherwise CLONE fails with error 1227)
-- Situational: run this only if you hit Access denied
GRANT CLONE_ADMIN, BACKUP_ADMIN ON *.* TO 'root'@'%';
-- or 'root'@'localhost', depending on current_user()
-- 2. Set the clone donor
SET GLOBAL clone_valid_donor_list = '192.168.19.210:3306';
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='ReplPass123!' FOR CHANNEL 'group_replication_recovery';
-- 3. Run the clone (the server restarts automatically)
CLONE INSTANCE FROM 'repl'@'192.168.19.210':3306 IDENTIFIED BY 'ReplPass123!';
-- 4. Log in again after the restart and start MGR
START GROUP_REPLICATION;
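The clone can take a while on larger datasets. Its progress is visible from another session on Node2 through the standard clone status tables:
-- STATE moves through 'In Progress' to 'Completed'; ERROR_NO stays 0 on success
SELECT STATE, ERROR_NO, ERROR_MESSAGE FROM performance_schema.clone_status;
-- Per-stage byte counts give a rough progress estimate
SELECT STAGE, STATE, ESTIMATE, DATA FROM performance_schema.clone_progress;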
Starting the Arbitrator Node
Goal: join manually, with no Clone, after first wiping any stale local state.
Log in to the SQL prompt on Node3 and run:
-- 1. Clean up (prevents the "more executed transactions" error)
STOP GROUP_REPLICATION;
RESET MASTER; -- Critical! Wipes all local transaction history, back to a blank slate
-- 2. Create the user manually (nothing was cloned, so it does not exist yet)
SET SQL_LOG_BIN=0;
CREATE USER 'repl'@'%' IDENTIFIED BY 'ReplPass123!';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, BACKUP_ADMIN, CLONE_ADMIN, GROUP_REPLICATION_ADMIN ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
-- 3. Configure the channel and join
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='ReplPass123!' FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;
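If this START GROUP_REPLICATION fails with the "more executed transactions" error described under Common Pitfalls below, check whether the node picked up stray local GTIDs; the SET SQL_LOG_BIN=0 wrapper above exists precisely to keep this set empty:
-- Run on Node3; an empty result means the node is still a clean joiner
SELECT @@GLOBAL.gtid_executed;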
Final Verification
Log in to the SQL prompt on Node1 and check the cluster state:
SELECT * FROM performance_schema.replication_group_members;
If the output looks like the following, the deployment succeeded.
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 3512adbf-fc17-11f0-8a1b-bc2411574dc2 | database-node2.local | 3306 | ONLINE | SECONDARY | 8.0.32 | XCom |
| group_replication_applier | 89eee833-fbfe-11f0-9499-bc2411dee7fd | database-node1.local | 3306 | ONLINE | PRIMARY | 8.0.32 | XCom |
| group_replication_applier | de6dc1c1-fc19-11f0-9e22-bc2411c21b6c | database-arbit | 3306 | ONLINE | ARBITRATOR | 8.0.32 | XCom |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)
Success criteria:
- Node1 is PRIMARY / ONLINE
- Node2 is SECONDARY / ONLINE
- Node3 is ARBITRATOR / ONLINE
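For repeated checks, the same query is easy to run from the shell (a sketch; substitute real credentials):
# One-shot cluster health check, runnable from any node
mysql -uroot -p -e "SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;"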
Failover Testing
Simulating a Node1 Outage
SSH into Node1 and stop the database process.
systemctl stop mysqld
Log in to the SQL prompt on Node2 and check the cluster state:
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;
-- Node1 is now offline
+----------------------+-------------+--------------+
| MEMBER_HOST | MEMBER_ROLE | MEMBER_STATE |
+----------------------+-------------+--------------+
| database-node2.local | PRIMARY | ONLINE |
| database-arbit | ARBITRATOR | ONLINE |
+----------------------+-------------+--------------+
2 rows in set (0.00 sec)
Restart Node1, log in to its SQL prompt, and check the cluster state:
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;
-- Still OFFLINE at this point; no cause for alarm, it just needs manual recovery
+-------------+-------------+--------------+
| MEMBER_HOST | MEMBER_ROLE | MEMBER_STATE |
+-------------+-------------+--------------+
| | | OFFLINE |
+-------------+-------------+--------------+
1 row in set (0.00 sec)
-- Manual recovery
START GROUP_REPLICATION;
-- Check the cluster state again; recovery is in progress
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;
+----------------------+-------------+--------------+
| MEMBER_HOST | MEMBER_ROLE | MEMBER_STATE |
+----------------------+-------------+--------------+
| database-node2.local | PRIMARY | ONLINE |
| database-node1.local | SECONDARY | RECOVERING |
| database-arbit | ARBITRATOR | ONLINE |
+----------------------+-------------+--------------+
3 rows in set (0.00 sec)
-- Node2 has taken over as the primary
Look closely, though: Node1 is in the RECOVERING state. If it stays there for a long time, check the logs; you may see an error like the following:
tail -f /var/lib/mysql/error.log
...
2026-01-28T17:45:30.353449+08:00 77 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'repl@database-node2.local:3306' - retry-time: 60 retries: 1 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection. Error_code: MY-002061
...
This is an SSL/authentication problem; the core of the error is this line: Authentication requires secure connection. Error_code: MY-002061. It comes from caching_sha2_password, the default authentication plugin in MySQL 8.0. When the repl user connects remotely (MGR recovery is exactly such a connection) and no SSL channel has been established, MySQL refuses to transmit the password for safety, unless the client explicitly offers to fetch the server's public key and encrypt the password locally before sending it. Node1's situation: it wants to connect to Node2, but it has neither SSL certificates configured nor the "fetch public key" switch enabled (more on that switch below), so Node2 rejects it.
Solution:
- Log in to the SQL prompt on Node2 and run:
-- Switch the repl user's authentication plugin to the native one
ALTER USER 'repl'@'%' IDENTIFIED WITH mysql_native_password BY 'ReplPass123!';
-- Reload privileges
FLUSH PRIVILEGES;
- Log in to the SQL prompt on Node1 and run:
-- Rejoin the group
STOP GROUP_REPLICATION;
START GROUP_REPLICATION;
- Verify the state:
SELECT MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE FROM performance_schema.replication_group_members;
+----------------------+-------------+--------------+
| MEMBER_HOST | MEMBER_ROLE | MEMBER_STATE |
+----------------------+-------------+--------------+
| database-node2.local | PRIMARY | ONLINE |
| database-node1.local | SECONDARY | ONLINE |
| database-arbit | ARBITRATOR | ONLINE |
+----------------------+-------------+--------------+
3 rows in set (0.00 sec)
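The fix above downgrades repl to mysql_native_password. An alternative that keeps caching_sha2_password is the "fetch public key" switch mentioned earlier: MySQL 8.0 provides the group_replication_recovery_get_public_key system variable for exactly this case. A sketch, run on Node1:
-- Allow the recovery channel to request the donor's RSA public key
STOP GROUP_REPLICATION;
SET GLOBAL group_replication_recovery_get_public_key = ON;
START GROUP_REPLICATION;
Note that exchanging the public key over an unencrypted connection does not protect against man-in-the-middle attacks; configuring SSL for the recovery channel is the more rigorous fix.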
Common Pitfalls
Arbitrator locked up by the MGR plugin
· Symptom: ALTER USER fails with ERROR 3100 ... running replication hook 'before_commit'.
· Cause: the MGR plugin (group_replication.so) is loaded but no group has been formed; in that state the plugin blocks all writes for safety.
· Fix: quick workaround: SET SQL_LOG_BIN=0; then change the password (if the write is not intercepted). Definitive fix: comment out plugin_load_add in my.cnf -> restart -> change the password -> restore the comment -> restart.
MGR startup failure (Error 3092 & Timeout)
· Symptom: START GROUP_REPLICATION hangs and finally fails with Timeout on wait for view; the log reports that local port 33061 cannot be reached.
· Cause: Firewalld or SELinux is blocking MGR's dedicated port 33061.
· Fix: stop Firewalld. Disable SELinux completely: set disabled in /etc/selinux/config and run setenforce 0.
Node expelled (UUID conflict)
· Symptom: a node is expelled the instant it joins; the log reports a duplicate UUID.
· Cause: cloned VMs all carry the same /var/lib/mysql/auto.cnf, so every machine has the same server_uuid.
· Fix: delete the auto.cnf file and restart the database; a fresh UUID is generated automatically.
Clone failure (Error 1227 / 1410, missing privileges)
· Symptom: CLONE INSTANCE fails with Access denied ... CLONE_ADMIN.
· Cause: mysql_clone.so is a dynamic plugin; loading it does not automatically give root the new privileges.
· Fix: grant them manually: GRANT CLONE_ADMIN, BACKUP_ADMIN ON *.* TO 'root'@'%'; and mind which host entry root actually matches (% vs localhost).
Arbitrator cannot join (Error 11526, transactions ahead of the group)
· Symptom: the error This member has more executed transactions than those present in the group.
· Cause: the arbitrator stores no data, but operations such as CREATE USER executed before joining generated local GTIDs, so the node is no longer "clean".
· Fix:
- STOP GROUP_REPLICATION;
- RESET MASTER; (wipes all local transaction history, back to a blank slate)
- START GROUP_REPLICATION;
Node stuck in RECOVERING (DNS resolution failure)
· Symptom: the node stays in RECOVERING after joining; the log reports Unknown MySQL server host 'database-node1.local'.
· Cause: MGR internal communication uses hostnames (.local), but /etc/hosts lacks the matching entries.
· Fix: add the full "IP Hostname Hostname.local" mapping for every cluster node to /etc/hosts.
Case sensitivity (an irreversible setting)
· Warning: lower_case_table_names=1 (case-insensitive) must be configured before initialization.
· Consequence: changing this parameter after initialization means wiping the data directory and re-initializing. For Nacos and most Java applications it usually has to be 1.
A node that went offline does not rejoin the cluster automatically after restarting
The configuration file contains loose-group_replication_start_on_boot = OFF, which means: after a MySQL service restart, the MGR plugin stays idle until given a manual command. This is the conservative, safe choice for production (it keeps a broken node from rejoining and causing trouble).
Log in to the SQL prompt on Node1 and rejoin manually:
START GROUP_REPLICATION;
To make a restarted node rejoin automatically, flip the parameter to ON:
loose-group_replication_start_on_boot = OFF
# change to
loose-group_replication_start_on_boot = ON
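If the node is already running, the same change can also be made from SQL via MySQL 8.0's persisted variables (assumed to behave identically in GreatSQL; note the loose- prefix is only needed in my.cnf):
-- Takes effect now and survives restarts via mysqld-auto.cnf
SET PERSIST group_replication_start_on_boot = ON;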
