I’ve been evaluating MySQL 8.0 for a while and thought to share quick steps to compile the source code for debugging. The steps I’ve shared are for Centos 7 x86_64 platform(or similar)
This blog is to share salient features and brief steps for Installation/Configuration and Monitoring of MySQL Group Replication.
What is Group Replication?
Group Replication is a new shared-nothing multi-master HA which is based on Paxos algorithm. Group Replication would be the part of InnoDB Cluster(currently in Beta) MySQL 5.7.17 GA version comes with Group Replication plugin. Group replication provides HA with automatic failover when the primary crashes.
How it works?
On group replication, all nodes do a group communication with each other where every message/transaction has its own transaction ID, GTID (which was first introduced on MySQL 5.6).
Group Communication
Any read-write (RW) transaction ‘say T1’ is not committed on either of the node until all of the nodes certify that T1 is safe to play (no conflicts) and hence T1 gets committed on all nodes. It should not be assumed similar to semi-synchronous replication where in semi-synchronous the slave node/s just acknowledges the transactions(that its been written on their relay-logs) irrespective of any conflict or not. To maintain the quorum the group should have atleast 3 nodes.
Starting the Group Replication:
Install client,server and MySQL lib packages of MySQL 5.7.17 available for the flavor of OS which you are going to use.
[root@localhost103 server]# cat group.init
SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'192.168.1.103';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'192.168.1.103' IDENTIFIED BY 'rpl123#';
FLUSH PRIVILEGES;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl123#' FOR CHANNEL 'group_replication_recovery';
SET SQL_LOG_BIN=1;
Init file is not part of steps for group replication configuration. I added for my own convenience.
Bootstrapping the group:
Execute on the node, which you want to be the primary OR the initial node.
[root@localhost103 ~]# mysql -S/data/server/d1/mysql.sock
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Error logs for the primary should say:
Plugin group_replication reported: 'This server was declared online within the replication group'
Check on the initial/primary node:
[root@localhost103 ~]# mysql -S/data/server/d1/mysql.sock
mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------------------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------------------------+-------------+--------------+
| group_replication_applier | c68da19c-c439-11e6-b601-0800277caf98 | localhost103.nareshserver.com | 3301 | ONLINE |
+---------------------------+--------------------------------------+-------------------------------+-------------+--------------+
1 rows in set (0.00 sec)
Adding secondary/other nodes:
Make sure you have all of the node’s hostnames and their listener ports in group_replication_group_seeds of master. If not we can add it as
[root@localhost103 ~]# mysql -S/data/server/d1/mysql.sock
mysql> set global group_replication_group_seeds= "192.168.1.103:24901,192.168.1.103:24902,192.168.1.103:24903"
While for nodes we need to add:
[root@localhost103 ~]# mysql -S/data/server/d2/mysql.sock
mysql> START GROUP_REPLICATION;
[root@localhost103 ~]# mysql -S/data/server/d3/mysql.sock
mysql> START GROUP_REPLICATION;
Once added, for all nodes below status query should return as:
Here TRANSACTIONS_COMMITTED_ALL_MEMBERS is the last committed transaction ids on all nodes.
While LAST_CONFLICT_FREE_TRANSACTION is the last conflict free transaction.
When there are no writes happening in any nodes, we could see the stats as
If not configured for multi-source replication, by-default only 2 channels will be seen on group replication as see above, and default desired_delay as 0 if no delayed replication configured.
mysql> select * from performance_schema.replication_applier_status;
+----------------------------+---------------+-----------------+----------------------------+
| CHANNEL_NAME | SERVICE_STATE | REMAINING_DELAY | COUNT_TRANSACTIONS_RETRIES |
+----------------------------+---------------+-----------------+----------------------------+
| group_replication_applier | ON | NULL | 0 |
| group_replication_recovery | OFF | NULL | 0 |
+----------------------------+---------------+-----------------+----------------------------+
2 rows in set (0.00 sec)
If group_replication_applier is OFF that means the applier thread is not running and the node is not replicating.
It shows the stats of group replication. Here VIEW_ID is the ID of the quorum, it shows how many times the quorum got modified after the bootsrtap of group replication.
replication_applier_status_by_coordinator will show no rows/info as we haven’t enabled parallel threads.
mysql> select * from performance_schema.replication_applier_status_by_coordinator \G
Empty set (0.00 sec)
After enabling parallel workers:
mysql> select * from performance_schema.replication_applier_status_by_coordinator \G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
THREAD_ID: 33
SERVICE_STATE: ON
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_recovery
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)
Enabling Parallel Workers
By default( without parallel workers enabled) replication_applier_status_by_worker will always show 2 rows, one for channel group_replication_applier and another group_replication_recovery. If parallel workers are enabled, then the row count for this performance_schema.replication_applier_status_by_worker will be n*2 where n is number of parallel workers.
mysql> stop group_replication; set global slave_preserve_commit_order=1; set global slave_parallel_workers=10; set global slave_parallel_type='LOGICAL_CLOCK'; stop group_replication; start group_replication;
We need to enable slave_preserve_commit_order as well to start parallel workers.
Multi-Master/Multi-Primary
By default not all nodes will be ready to take writes. All nodes except the initial node, will have super_read_only=ON
mysql> show global variables like '%super%';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| super_read_only | ON |
+-----------------+-------+
1 row in set (0.00 sec)
To have enable writes on any node, other than primary node we have to disable super_read_only.
SET GLOBAL super_read_only=OFF;
Its obvious we need not have to restart the MySQL server in this case.
To enable writes on all nodes, and to have a true multi-primary case. We need to set
The moment you add other node, it reads all the binary log events and replicates all changes.
Lets insert some data and start a node without any data, and check if its catches up without any hassle.
If all nodes are in sync, their “TRANSACTIONS_COMMITTED_ALL_MEMBERS” must be same.
Things to consider while adding new node or starting crashed node of the group.
You cannot change group_replication_group_name while group replication is running.
group_replication_group_seeds shows the seeds/peer addresses from my.cnf of the instance but not all of the seeds which were later added OR even removed/crashed.
If you want to add a new node say node3, then either of the existing node/donor of group replication cluster must have all the binary logs from the inception of the group or else you may end up with error
[ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
I know this is expected, as we have to first sync the data from an existing node to new node to be added. But it’s must to specify for the users who are using Galera flavored HA clusters(MariaDB Galera Cluster/Percona XtraDB Cluster) where with the help of SST (State Snapshot Transfer) either rsync or xtrabackup takes place from donor to joiner, which is entirely automated.
When starting the crashed node we might get the error like below:
[ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.
The error logs might also contain the log as
[Note] Plugin group_replication reported: 'To force this member into the group you can use the group_replication_allow_local_disjoint_gtids_join option'
But enabling group_replication_allow_local_disjoint_gtids_join is not a wise step to proceed as it will lead to data inconsistencies.
How to deal with such situation?
Since the data of the node with above error cannot be relied upon, the data is worth nothing. Re-sync the crashed node from one of the donor node of Replication Group.
Some important and unique attributes for group replication which we need to review:
1. All status monitoring views for group replication are on performance_schema.
2. Supports only row based replication. i.e. binlog_format=ROW
3. Supports only InnoDB tables.
4. DDL statements not considered safe for multi-primary environment.
5. Multi-primary mode do not support tables with multi-level foreign key dependencies,
6. specifically tables that have defined CASCADING foreign key constraints.
7. SERIALIZABLE isolation level is not supported in multi-primary groups.
8. Transaction savepoints are not supported.
9. All tables must have a primary key.
10. Auto_increment
* The moment group replication is started auto_increment_increment changes as default 7 and auto_increment_offset changes to its own node number(if the node was added 3rd on group replication, the auto_increment_offset will change to 3)
* group_replication_auto_increment_increment stays as default 7.
11. Group replication with SELINUX
semanage port -a -t mysqld_port_t -p tcp 3306.
12. At max 9 nodes can participate in single group.
13. Need of a load balancer: Either HA Proxy or the new MySQL Router
I believe there are lot of things to cover and to analyze different behaviors(if not bugs) of Group Replication like, Group Replication with InnoDB Compressed Tables, SSL, parallel workers and complex topology with multi-source replication and traditional replication .
This blog is to provide ‘to the point’, to install and configure MySQL router. For in-depth information please check the Router Manual. MySQL Router is a intelligent routing connection or a gateway for MySQL HA environment. Installing and configuring router is pretty simple and straight forward.
The installation and configuration is only focused on Linux platform for now.
[root@nrama12 router]# mysqlrouter --help
MySQL Router v2.0.3 on Linux (64-bit)
Copyright (c) 2015, 2016 Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Start MySQL Router.
Configuration read from the following files in the given order (enclosed in parentheses means not available for reading): /etc/mysqlrouter/mysqlrouter.ini (/root/.mysqlrouter.ini)
Options: -v, –version Display version information and exit. -h, –help Display this help and exit. -c <path>, –config <path> Only read configuration from given file. -a <path>, –extra-config <path> Read this file after configuration files are read from either default locations or from files specified by the –config option.
By default the router config file will look like below
[root@nrama12 router]# cat /etc/mysqlrouter/mysqlrouter.ini
# Copyright (c) 2015, Oracle and/or its affiliates. All
rights reserved.
# MySQL Router configuration file
#
# Documentation is available at
# http://dev.mysql.com/doc/mysql-router/en/
[DEFAULT]
logging_folder = /var/log/mysqlrouter/
plugin_folder = /usr/lib64/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
level = info
# If no plugin is configured which starts a service, keepalive
# will make sure MySQL Router will not immediately exit. It is
# safe to remove once Router is configured.
[keepalive]
interval = 60
Where MySQL instance localhost:3311 is fabric repository/controller And rest all localhost:3312, localhost:3313, localhost:3314, localhost:3315, localhost:3316 are part of fabric group local_grp.
The lookup for fabric group local_grp is like below:
[root@nrama12 Downloads]# cat /etc/mysqlrouter/mysqlrouter.ini
# Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.
# MySQL Router configuration file
#
# Documentation is available at
# http://dev.mysql.com/doc/mysql-router/en/
[DEFAULT]
#Make sure all the folder exists and edit the plugin path
logging_folder = /var/log/mysqlrouter/
plugin_folder = /usr/lib64/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
#You can set it to ‘INFO’, if you don’t want debug logs
level = DEBUG
# If no plugin is configured which starts a service, keepalive
# will make sure MySQL Router will not immediately exit. It is
# safe to remove once Router is configured.
[keepalive]
interval = 60
#Below is for mysqlfabric connection
#address is hostname of fabric controller/repository
#user is user for fabric controller/repository
[fabric_cache:my_cache]
address = localhost
user = admin
#Below config parameters are for router
#bind_address is hostname for router, with unique port
#destination is connection URL for fabric
#mode is to set more either to read-write or read-only
[routing]
bind_address = 127.0.0.1:7002
destinations = fabric+cache://my_cache/group/local_grp
mode = read-write
Start router
[root@nrama12]# mysqlrouter –config=/etc/mysqlrouter/mysqlrouter.ini
Logging to /var/log/mysqlrouter/mysqlrouter.log
Password for [fabric_cache:my_cache], user admin:
^Z[2] Killed mysqlrouter –config=/etc/mysqlrouter/mysqlrouter.ini
[3]+ Stopped mysqlrouter –config=/etc/mysqlrouter/mysqlrouter.ini
[root@nrama12]# bg
[3]+ mysqlrouter –config=/etc/mysqlrouter/mysqlrouter.ini &
Check logs
[root@nrama12 Downloads]# tail -f /var/log/mysqlrouter/mysqlrouter.log
2016-05-01 15:08:42 INFO [7f48adbc6700] [routing] listening on 127.0.0.1:7002; read-write
2016-05-01 15:08:42 INFO [7f48aebc8700] Starting Fabric Cache ‘my_cache’ using MySQL Fabric running on localhost:32275
2016-05-01 15:08:42 INFO [7f48ae3c7700] keepalive started with interval 60
2016-05-01 15:08:42 INFO [7f48ae3c7700] keepalive
2016-05-01 15:08:42 INFO [7f48aebc8700] Connected with Fabric running on 127.0.0.1
2016-05-01 15:09:12 INFO [7f5a0ffe9700] [routing] listening on 127.0.0.1:7002; read-write
2016-05-01 15:09:12 INFO [7f5a10feb700] Starting Fabric Cache ‘my_cache’ using MySQL Fabric running on localhost:32275
2016-05-01 15:09:12 INFO [7f5a107ea700] keepalive started with interval 60
2016-05-01 15:09:12 INFO [7f5a107ea700] keepalive
2016-05-01 15:09:12 INFO [7f5a10feb700] Connected with Fabric running on 127.0.0.1
Try to connect to fabric environment
[root@nrama12 Downloads]# mysql -P7002 -h127.0.0.1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1029
Server version: 5.7.10-community-advanced-log MySQL Community Server
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.
mysql> show global variables like ‘hostname’;
+—————+—————————–+
| Variable_name | Value |
+—————+—————————–+
| hostname | nrama12.hsd1.ca.comcast.net |
+—————+—————————–+
1 row in set (0.01 sec)
mysql> show global variables like ‘port’;
+—————+——-+
| Variable_name | Value |
+—————+——-+
| port | 3312 |
+—————+——-+
1 row in set (0.00 sec)
[root@nrama12 Downloads]# cat /etc/mysqlrouter/mysqlrouter2.ini
# Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.
# MySQL Router configuration file
#
# Documentation is available at
# http://dev.mysql.com/doc/mysql-router/en/
[DEFAULT]
logging_folder = /var/log/mysqlrouter/
plugin_folder = /usr/lib64/mysqlrouter
runtime_folder = /var/run/mysqlrouter
config_folder = /etc/mysqlrouter
[logger]
level = DEBUG
# If no plugin is configured which starts a service, keepalive
# will make sure MySQL Router will not immediately exit. It is
# safe to remove once Router is configured.
[keepalive]
interval = 60
[routing]
bind_address = 127.0.0.1:7002
destinations = localhost:3312,localhost:3313,localhost:3314,localhost:3315,localhost:3316
mode = read-write
[root@nrama12 nareshm]# tail -f /var/log/mysqlrouter/mysqlrouter.log
2016-05-01 17:06:42 INFO [7f5bc236f700] keepalive
2016-05-01 17:06:42 INFO [7f5bc1b6e700] [routing] listening on 127.0.0.1:7002; read-write
2016-05-01 17:07:01 DEBUG [7f5bc0b6c700] Trying server localhost:3312 (index 0)
2016-05-01 17:07:01 DEBUG [7f5bc0b6c700] [routing] [127.0.0.1]:41553 – [::1]:3312
2016-05-01 17:07:17 DEBUG [7f5bc0b6c700] [routing] Routing stopped (up:3815b;down:962b)
2016-05-01 17:07:42 INFO [7f5bc236f700] keepalive
2016-05-01 17:08:42 INFO [7f5bc236f700] keepalive
2016-05-01 17:09:42 INFO [7f5bc236f700] keepalive
2016-05-01 17:10:42 INFO [7f5bc236f700] keepalive
Lets login and check:
[root@nrama12 nareshm]# mysql -P7002 -h127.0.0.1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4885
Server version: 5.7.10-community-advanced-log MySQL Community Server
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.
mysql> show global variables like ‘port’;
+—————+——-+
| Variable_name | Value |
+—————+——-+
| port | 3312 |
+—————+——-+
1 row in set (0.00 sec)
Lets kill the master and see the router behavior.
[root@nrama12 nareshm]# kill -9 4356
[root@nrama12 nareshm]# mysql -P7002 -h127.0.0.1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5288
Server version: 5.7.10-community-advanced-log MySQL Community Server
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.
mysql> show global variables like ‘port’;
+—————+——-+
| Variable_name | Value |
+—————+——-+
| port | 3313 |
+—————+——-+
1 row in set (0.00 sec)
The router starts pointing to mysql instance localhost:3313 . Note: When in simple replication setup, like above case, it will connect in round-robin method. The router does not trigger failover. We can make the best out of it if we use it with mysql HA environments like fabric and/or group replication.
This post it to share a brief and precise information about MySQL 5.7 Multi Source Replication (configuration and monitoring) and not specifically for concepts of MySQL replication.
There have been couple of third-party tools for multi-source replication in MySQL. But in MySQL57, its officially been included as one of the feature of MySQL.
A single source replication usually (with one master and 3 slaves) looks like the figure below:
For checking the complexity of multi-master, we configured 6 MySQL instances, with GTID enabled. (We can have multi-master feature with traditional file position based replication too.)
Below is the config file for a single instance. Rest 5 are all similar.
For every master we have to name a channel and hence the command “Change Master to”
mysql> change master to master_host=’localhost’, master_port=3311, master_user=’repl’,master_password=’repl123#’ for channel ‘master1’; mysql> change master to master_host=’localhost’, master_port=3312, master_user=’repl’,master_password=’repl123#’ for channel ‘master2’; mysql> change master to master_host=’localhost’, master_port=3313, master_user=’repl’,master_password=’repl123#’ for channel ‘master3’; mysql> change master to master_host=’localhost’, master_port=3314, master_user=’repl’,master_password=’repl123#’ for channel ‘master4’; mysql> change master to master_host=’localhost’, master_port=3315, master_user=’repl’,master_password=’repl123#’ for channel ‘master5’;
Starting slave for multiple channels
mysql> start slave for channel ‘master1’; mysql> start slave for channel ‘master2’; mysql> start slave for channel ‘master3’; mysql> start slave for channel ‘master4’; mysql> start slave for channel ‘master5’;
mysql> select * from test1 limit 3\G *************************** 1. row *************************** Title: 1 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:00 *************************** 2. row *************************** Title: 2 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:04 *************************** 3. row *************************** Title: 3 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:05 3 rows in set (0.00 sec)
Checking slave on nrama12:3316 [root@nrama12 mysql]# mysql -u root -S/mysql/mysql6/data/mysql6.socket
mysql> show databases; +——————–+ | Database | +——————–+ | information_schema | | mysql | | performance_schema | | sys | | test1 | +——————–+ 5 rows in set (0.00 sec)
mysql> use test1 Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A
Database changed mysql> select * from test1 limit 3\G *************************** 1. row *************************** Title: 1 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:00 *************************** 2. row *************************** Title: 2 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:04 *************************** 3. row *************************** Title: 3 Col1: raw Col2: Raw123 Date: 2016-03-20 03:30:05 3 rows in set (0.00 sec)
The commands for stopping and starting slave, thread sql_thread and io_thread will be the same if we want to do any changes for all channels. Below are few examples:
This table shows the transaction execution status on the slave server.
mysql> select * from performance_schema.replication_applier_status\G *************************** 1. row *************************** CHANNEL_NAME: master1 SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 *************************** 2. row *************************** CHANNEL_NAME: master2 SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 *************************** 3. row *************************** CHANNEL_NAME: master3 SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 *************************** 4. row *************************** CHANNEL_NAME: master4 SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 *************************** 5. row *************************** CHANNEL_NAME: master5 SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0 5 rows in set (0.01 sec)
Here remaining_delay is waiting for DESIRED_DELAY seconds to pass since the master applied an event, this field contains the number of delay seconds remaining. At other times, this field is NULL.
The service_state is effected only when sql_thread is stopped.
5. replication_applier_status_by_coordinator
For a multi-threaded slave, the slave uses multiple worker threads and a coordinator thread to manage them, and this table shows the status of the coordinator thread.
FYI: We have to enable slave_parallel_workers by increasing the number of parallel workers from 0 to our desired number of workers.
mysql> select * from performance_schema.replication_applier_status_by_coordinator\G Empty set (0.00 sec)
Since default 0 parallel workers or no parallel workers are there the above table will be empty. Lets enable parallel workers.
mysql> set global slave_parallel_type=”LOGICAL_CLOCK”; Query OK, 0 rows affected (0.00 sec)
In our scenario we have 5 channels and have enabled 20 parallel workers. So total number of parallel workers or coordinater threads in a slave we will see as 20*5=100