This article introduces X-Pipe, a Redis multi-data-center replication management system developed by Ctrip's framework department.
Redis is widely used inside Ctrip. According to client-side statistics, Redis across Ctrip serves about 20 million read/write requests per second, of which about 1 million are writes. Many businesses even use Redis as a persistent in-memory database. This makes multi-data-center Redis a strong requirement: first, to improve availability and solve the disaster recovery (DR) problem across data centers; second, to improve access performance, since each data center can then read local data without cross-data-center reads. X-Pipe was born for this.
For ease of description, DC is used to represent the Data Center.
Overall structure
- Console: manages the metadata of multiple data centers and provides a user interface for configuration and DR switching.
- Keeper: caches Redis operation logs and compresses and encrypts cross-data-center transmission.
- Meta Server: manages the status of all keepers within one data center and corrects abnormal states.
Redis data replication problem
The first problem a multi-data-center setup must solve is data replication: how data is transferred from one DC to another. We adopted a pseudo-slave scheme: implement the Redis replication protocol, pretend to be a Redis slave, and let the Redis master push data to this pseudo-slave. We call this pseudo-slave keeper, as shown below:
Advantages of using keeper:
- Reduce master full synchronization
If slaves in a remote data center connect directly to the master, multiple slaves will force the master to perform multiple full synchronizations. Keeper caches the RDB and the replication log, so remote slaves fetch data from keeper instead, improving the stability of the master.
- Reduce multi-data center network traffic
Between two data centers, data needs to be transmitted only once through keeper, and the transmission protocol between keepers can be customized, which makes compression easy to add (not yet supported).
- Reduce full synchronization when network exception
Keeper caches Redis log data on disk, so a large amount of log data can be buffered (Redis keeps its replication backlog in a fixed-size in-memory ring buffer of limited capacity). Even if the network between data centers is abnormal for a long time, the log data can still be transmitted afterwards.
- Security improvement
Data transmission between data centers often travels over the public network, so data security becomes extremely important. Transmission between keepers can also be encrypted (not yet implemented) to improve security.
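The replication handshake that a pseudo-slave such as keeper performs against the master is ordinary Redis protocol (RESP) traffic. A minimal sketch of the encoding, assuming the standard replica handshake commands (the listening port here is only illustrative):

```python
def resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings,
    the wire format a replica uses to talk to a Redis master."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        out.append(f"${len(arg)}\r\n{arg}\r\n".encode())
    return b"".join(out)

# Simplified handshake a pseudo-slave sends after connecting:
handshake = [
    resp_command("PING"),
    resp_command("REPLCONF", "listening-port", "6380"),
    resp_command("PSYNC", "?", "-1"),  # "? -1" asks for a full sync on first contact
]
```

After `PSYNC`, the master replies with `+FULLRESYNC <replid> <offset>`, streams an RDB snapshot, and then pushes the replication log, which keeper persists to disk.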
Data center switchover process
- Check whether the DR switch can be performed
Similar to the prepare phase of the 2PC protocol: verify first that the whole process can proceed smoothly.
- Make the original master read-only
This ensures there is only one writable master during the migration, avoiding possible data loss.
- Promote the new master
- Point the other data centers at the new master data center
Both rollback and retry are provided. Rollback restores the original state; retry lets the DBA repair abnormal conditions manually and then continue the switchover.
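The steps above can be sketched as a 2PC-flavoured procedure. All class and method names below are hypothetical stand-ins, not X-Pipe's actual API:

```python
class Node:
    """Minimal stand-in for a Redis/DC endpoint (illustrative only)."""
    def __init__(self, healthy: bool = True):
        self._healthy = healthy
        self.writable = True
        self.master = None

    def healthy(self): return self._healthy
    def set_read_only(self): self.writable = False
    def set_writable(self): self.writable = True
    def promote(self): self.master, self.writable = None, True
    def replicate_from(self, master): self.master = master


def dr_switch(old_master, new_master, other_dcs):
    """Prepare-check, fence, promote, repoint; roll back on failure."""
    # Step 1: prepare - verify the switch can proceed before touching anything
    if not new_master.healthy() or not all(dc.healthy() for dc in other_dcs):
        raise RuntimeError("prepare check failed, nothing changed")
    # Step 2: fence the old master so only one writable master exists
    old_master.set_read_only()
    try:
        # Step 3: promote the new master
        new_master.promote()
        # Step 4: repoint the other data centers at the new master
        for dc in other_dcs:
            dc.replicate_from(new_master)
    except Exception:
        old_master.set_writable()  # rollback to the original state
        raise


old, new, others = Node(), Node(), [Node(), Node()]
dr_switch(old, new, others)
```

In the real system the retry path resumes from the failed step after manual repair instead of restarting from scratch.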
High availability
High availability of XPipe system
If a keeper fails, data transmission between DCs may be interrupted. To solve this, each keeper has an active node and a standby node; the standby copies data from the active node in real time, and when the active node fails, the standby is promoted to replace it.
The promotion is driven by a third-party node we call MetaServer, which is responsible for keeper state transitions and for storing the data center's meta information. MetaServer itself must also be highly available: each MetaServer is responsible for specific Redis clusters. When a MetaServer node fails, its Redis clusters are taken over by other nodes; when a new node joins, load balancing is performed automatically and some clusters are transferred to the new node.
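A hypothetical sketch of the cluster-to-MetaServer assignment. The real system coordinates this through ZooKeeper (the `zoo1`/`zoo2` containers in the quick start below); the round-robin balancing rule here is only illustrative:

```python
def assign_clusters(clusters, servers):
    """Spread Redis clusters evenly across the live MetaServers.
    Recomputing the assignment when a server joins or fails moves
    some clusters to other nodes (illustrative rule, not X-Pipe's)."""
    servers = sorted(servers)
    assignment = {s: [] for s in servers}
    for i, cluster in enumerate(sorted(clusters)):
        assignment[servers[i % len(servers)]].append(cluster)
    return assignment

clusters = ["cluster1", "cluster2", "cluster3", "cluster4"]
before = assign_clusters(clusters, ["meta-1", "meta-2"])
after = assign_clusters(clusters, ["meta-1", "meta-2", "meta-3"])  # a new node joins
```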
High availability of Redis itself
Redis itself may also fail, and Redis provides the Sentinel mechanism to keep a cluster highly available. However, before Redis 4.0, after a new master is promoted, the other nodes perform a full synchronization when they connect to it. During a full sync the slave is unavailable, and the master has to dump an RDB, reducing its availability; at the same time a large amount of RDB data is transmitted within the cluster, which destabilizes the whole system.
At the time of writing, 4.0 had not yet been released, and the Redis version used inside Ctrip was 2.8.19; upgrading straight to 4.0 would be too large a version jump. We therefore modified Redis 3.0.7 to implement the psync2 protocol and achieve incremental synchronization after failover.
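The core idea of psync2 can be sketched as follows (a simplified model, not the actual Redis source): a promoted master remembers the replication id of its former master (`replid2`) and the offset up to which their histories are shared, so replicas of the old master can continue incrementally instead of doing a full resync.

```python
def can_partial_resync(req_replid, req_offset, master):
    """Simplified psync2 decision. `master` is a dict with:
    replid        - current replication id
    replid2       - replication id inherited from the old master
    second_offset - offset up to which replid2's history is shared
    backlog_start - oldest offset still held in the replication backlog"""
    same_history = (
        req_replid == master["replid"]
        or (req_replid == master["replid2"] and req_offset <= master["second_offset"])
    )
    return same_history and req_offset >= master["backlog_start"]

promoted = {
    "replid": "new-id", "replid2": "old-id",
    "second_offset": 1000, "backlog_start": 200,
}
print(can_partial_resync("old-id", 900, promoted))   # True  -> +CONTINUE
print(can_partial_resync("old-id", 1200, promoted))  # False -> +FULLRESYNC
```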
Test data
Test scheme
The test method is shown in the figure below: data is sent from the client to the master, and the slave notifies the client through a keyspace notification. The measured delay is t1 + t2 + t3.
Test data
First, we measured the delay of a Redis master replicating directly to a slave: 0.2 ms. Adding a layer of keeper between master and slave increases the overall delay by 0.1 ms, to 0.3 ms.
The test was carried out in Ctrip's production environment, where the ping RTT between the two data centers is about 0.61 ms. Going through two layers of keeper across data centers, the measured average delay was about 0.8 ms, and the 99.9th-percentile delay was 2 ms.
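As a rough sanity check (our own arithmetic, not part of the original test), the numbers above are consistent with each other:

```python
cross_dc_rtt = 0.61  # ms, ping between the two data centers
per_keeper = 0.1     # ms added by one keeper layer, from the single-DC test above

# Two keeper layers sit on the cross-DC path.
expected = cross_dc_rtt + 2 * per_keeper
print(round(expected, 2))  # 0.81 ms, close to the measured 0.8 ms average
```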
Docker quick start
1 Start preparation
- Docker must be installed and the Docker daemon started in advance; docker-compose is supported
- Create a directory and start from inside it (recommended)
2 Start docker
- Method 1: Start from the images on Docker Hub
`/bin/bash -c "$(curl -sSL https://raw.githubusercontent.com/ctripcorp/x-pipe/master/redis/dockerPackage/start-xpipe-container.sh)"`
Container distribution after startup
Run the command `docker ps -a`:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1e77491414a9 ctripcorpxpipe/xpipe-console:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:8079->8080/tcp, :::8079->8080/tcp consolejq
6694f9eff0ad ctripcorpxpipe/xpipe-console:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:8081->8080/tcp, :::8081->8080/tcp consoleoy
aa0d109c7aae ctripcorpxpipe/xpipe-meta:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:9747->8080/tcp, :::9747->8080/tcp metajq
0c6cb6dfe51f ctripcorpxpipe/xpipe-meta:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:9748->8080/tcp, :::9748->8080/tcp metaoy
0e0f78ae096d ctripcorpxpipe/xpipe-keeper:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:7080->8080/tcp, :::7080->8080/tcp keeperjq1
16c5fdd14a5e ctripcorpxpipe/xpipe-keeper:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:7081->8080/tcp, :::7081->8080/tcp keeperjq2
1915292f3a7f ctripcorpxpipe/xpipe-keeper:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:7180->8080/tcp, :::7180->8080/tcp keeperoy1
0c885945d8f3 ctripcorpxpipe/xpipe-keeper:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:7181->8080/tcp, :::7181->8080/tcp keeperoy2
15062ab45feb ctripcorpxpipe/xpipe-proxy:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:19079->80/tcp, :::19079->80/tcp, 0.0.0.0:19442->443/tcp, :::19442->443/tcp, 0.0.0.0:8092->8080/tcp, :::8092->8080/tcp proxyjq
ed38daf8e71e ctripcorpxpipe/xpipe-proxy:1.0 "docker-entrypoint.sh" 3 days ago Up 3 days 0.0.0.0:19081->80/tcp, :::19081->80/tcp, 0.0.0.0:19444->443/tcp, :::19444->443/tcp, 0.0.0.0:8091->8080/tcp, :::8091->8080/tcp proxyoy
d0e811ea5d3d zookeeper "/docker-entrypoint.…" 3 days ago Up 3 days 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 8080/tcp zoo1
41381b5bd3a9 zookeeper "/docker-entrypoint.…" 3 days ago Up 3 days 2888/tcp, 3888/tcp, 8080/tcp, 0.0.0.0:2182->2181/tcp, :::2182->2181/tcp zoo2
d5f85ee0360e ctripcorpxpipe/xpipe-mysql:2.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp mysql
ba2b64f10700 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis-6379
4687df7ac486 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:6479->6379/tcp, :::6479->6379/tcp redis-6479
58cfdf41284a redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:6579->6379/tcp, :::6579->6379/tcp redis-6579
d180471bb010 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:6679->6379/tcp, :::6679->6379/tcp redis-6679
4539be899dd3 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:7379->6379/tcp, :::7379->6379/tcp redis-7379
dcf1b8079c1e redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:7479->6379/tcp, :::7479->6379/tcp redis-7479
180a2255d038 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:7579->6379/tcp, :::7579->6379/tcp redis-7579
a877a83287b4 redis:4.0 "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:7679->6379/tcp, :::7679->6379/tcp redis-7679
- Method 2: Build the images locally from the latest code and start from them
3 Verification
- data replication
Test whether data written to the master can be synchronized to the redis in the standby data center:
1. If redis is not installed locally, enter any redis container:
`docker exec -ti redis-6379 bash`
2. Connect to master redis, add data, and exit
`redis-cli -h 172.19.0.10`
`set test1 12345`
3. Connect to the slave redis and exit after obtaining data
`redis-cli -h 172.19.0.13`
`get test1`
4. Example:
song@ubuntu:~/yusong/code/test$ docker exec -ti redis-6379 bash
root@c568933bae57:/data# redis-cli -h 172.19.0.10
172.19.0.10:6379> set test1 12345
OK
172.19.0.10:6379> exit
root@c568933bae57:/data# redis-cli -h 172.19.0.13
172.19.0.13:6379> get test1
"12345"
172.19.0.13:6379> exit
- Delay monitoring
Launch your browser, go to localhost:8079/#/cluster_dc_shards/cluster1, and check whether the status of each redis is green
- Migration function
1. Locate the cluster on the cluster_list page and click Migrate
2. Select the cluster to be migrated and the target data center
3. Perform the migration
4. Check whether the migration is successful
4 Stop
`docker-compose down`
Open source license: Apache 2.0