Recommended in this period is Jingdong App background middleware, millisecond level detection of hot data, millisecond level push to the server cluster memory, greatly reduce the hot key query pressure on the data layer.

Design and practice of Jingdong millisecond hot key detection frame插图

Project Description

Any sudden hot data that cannot be sensed in advance, including but not limited to hot data (such as sudden large number of requests for the same product), hot users (such as malicious crawler brush), hot interface (sudden large number of requests for the same interface), etc. With millisecond precision detection. These hot data, hot users, etc., are then pushed to all server JVM memory to greatly reduce the impact on the back-end data storage layer, and it is up to the user to decide how to allocate and use these hot keys (such as local caching of hot goods, denying access to hot users, fusing hot interfaces or returning default values). These hot data are consistent in the entire server cluster, and services are isolated, and worker performance is strong.

The framework has undergone several pressure tests, and there are two main performance indicators:

1 Detection performance: the 8-core single worker can receive and process 160,000 key detection tasks per second, and the 16-core single worker can process at least 300,000 key detection tasks per second smoothly, and the actual pressure test reaches 370,000. The CPU is supported smoothly and the frame is not abnormal.

2 Push performance: At the same time of high concurrent writing, the current performance of external push is about 100,000 to 120,000 times per second, for example, there are 1,000 servers, a worker generates 100 hot keys per second, then this second will be pushed smoothly 100 * 1000 = 100,000 times, 100,000 times of push will be clearly delivered within 1s. If it is written less, push more, and count by pure push, the framework can stably push 400,000-600,000 times per second, and 800,000 times the limit can hold for a few seconds.

Single machine throughput per second (write + external push) is currently stable at about 700,000.

Core function:

Hot data detection and push to cluster servers

Application scenario:

mysql Hot data local cache
redis Hot data local cache
Blacklist user local cache
Crawler user limit
Interface and user dimension limit
Single-machine interface, user dimension limit
Cluster user dimension traffic limit
Cluster interface dimension limiting traffic

What is a hot key

Hot data that MySQL and other databases will be frequently accessed
Such as the skuId of explosive goods.
redis’ intensively accessed key
Such as the dimensional information of popular goods, skuId, shopId, etc.
Robot, crawler, brush user
Such as the user’s userId, uuid, ip, etc.
An interface address
such as /sku/query or finer dimensions.
User id+ interface information
For example, userId + /sku/query, this represents how often a user accesses an interface.
Server id+ interface information
For example, ip + /sku/query, this represents how often an interface on a server is accessed.
User id+ interface information + specific product
For example, userId + /sku/query + skuId, which represents how often a user accesses an item.

How to solve the Hot key problem

We take the typical scenarios of redis hot key, brush user, current limiting, etc.

redis hot key:

This previous solution is more blooming, more common are:

Upper-level cache, after reading the redis key-value information, it will directly write a copy to the jvm cache, set an expiration time, and set an elimination policy, such as eliminating the first to join when the queue is full. Or use guava cache or caffeine cache for stand-alone local caching, the overall hit rate is low.
Rewrite redis source code to add hot spot detection function, push to jvm when there is a hot key. The main problem is that it is not universal, and there is a certain degree of difficulty.
Rewrite the jar of redis clients such as jedis and letture, and detect hotspot keys through local calculation. If hot keys are hot, they are cached locally and notified to other machines in the cluster.

Brush crawler User:

After daily accumulation, push these blacklists to the jvm memory through the configuration center. There is a problem that the lag cannot be sensed in real time.
Perform real-time calculation through local accumulation, and calculate brushes that exceed the threshold value per unit time. If there are more servers, there is a problem that user requests are scattered, and the local calculation cannot identify the brush.
Other components, such as redis, are introduced for centralized accumulation calculation, and those exceeding the threshold are pulled to the local memory. The problem is that redis needs to be read and written frequently, and there are still redis performance bottlenecks.

Current limit:

- Interface current limiting in the single-machine dimension mostly uses local cumulative counting
- Third-party middleware such as sentinel

for cluster dimensions

Gateway layer, e.g. Nginx+lua

To sum up, we will find that although they can all be attributed to the field of hot key, there is no unified solution, we prefer to have a unified framework, it can solve all the scenarios that have real-time perception of hot keys, preferably no matter what key, what dimension, as long as I concatenate this string, give it to the framework to detect. Set a threshold for determining the hot value (such as 20 occurrences of the string in 2 seconds), then the hot key can enter the application’s jvm memory within milliseconds, and maintain consistency throughout the service cluster, to have all, to delete all.

The framework consists of four main parts

1, etcd cluster

etcd, as a high-performance configuration center, can provide efficient monitoring subscription services with minimal resource consumption. It is used to store rule configurations, ip addresses of each worker, detected hot keys, and manually added hot keys.

2. jar package on client

is the reference jar added to the service. After it is introduced, it can be a convenient way to determine whether a certain key is a hot key. At the same time, the jar completes key reporting, monitoring rule changes in etcd, worker information changes, hot key changes, and local caffeine caching of hot keys.

3. worker cluster

worker is an independently deployed Java program that connects to etcd after startup and periodically reports its ip information for client to obtain an address and connect to. After that, the key to be tested from each client is aggregated and calculated. When the value reaches the rule threshold set in etcd, the hot key is pushed to each client.

4. dashboard console

The console is a Java program with a visual interface, which is also connected to etcd, and then sets the key rules of each APP in the console, such as 20 hot keys appearing in 2 seconds. Then, when the worker detects the hot key, it will send the key to etcd, and the dashboard will also monitor the hot key information and save records in the database. In addition, you can manually add or delete hot keys for each client to monitor.

Design and practice of Jingdong millisecond hot key detection frame插图1

To sum up, it can be seen that the framework does not rely on any customized components, let alone redis. The core is netty connection, the client sends the key to be tested, and then each worker completes the distributed calculation. After calculating the hot key, the system will be able to solve the problem. Push it directly to the client side, very lightweight.

Powerful performance of worker

Print one line every 10 seconds. totalDealCount represents the total number of keys processed. It can be seen that the processing volume is between 2.7 million and 3.1 million every 10 seconds, corresponding to about 300,000 QPS per second.

Design and practice of Jingdong millisecond hot key detection frame插图2

Design and practice of Jingdong millisecond hot key detection frame插图3

Performance is further improved with protobuf serialization. When the second level is above 360,000, it can stabilize at 60% of the CPU, and the pressure test duration is more than 5 hours, without any abnormality. At 300,000, the pressure measurement lasted for more than a few days, and no abnormalities were observed.

Design and practice of Jingdong millisecond hot key detection frame插图4

Design and practice of Jingdong millisecond hot key detection frame插图5

Installation tutorial

Install etcd

1. Download the etcd of the corresponding operating system from the etcd download page.
https://github.com/etcd-io/etcd/releases use 3.4.x and above.

2. Start worker (cluster) Download and compile the code, package the worker as a jar, and start it. For example:

java -jar $JAVA_OPTS worker-0.0.1-SNAPSHOT.jar --etcd.server=${etcdServer}

worker can be configured as follows:

Design and practice of Jingdong millisecond hot key detection frame插图6

etcdServer is the address of the etcd cluster, separated by commas

JAVA_OPTS is configured JVM related, can be configured according to the actual situation

threadCount is the number of threads that process keys, which is calculated by the program if not specified.

workerPath indicates the application for which the worker provides computing services. For example, different appnames of applications need to be separated by different workers to avoid resource competition.

3. Start the console

Download and compile the dashboard project, create the database, and import the db.sql file under resource. Configure the database-related and etcdServer addresses in application.yml.

Start the dashboard project, access ip:8081, you can see the interface.

The node information is the list of currently started workers.

Rule configuration is where rules are set for each app. You need to add an APP when using it for the first time. In the User Management menu, add a new user and set his APP name, such as sample. The newly added user can then log in to the dashboard and set rules for their APP. The default login password is 123456.

Design and practice of Jingdong millisecond hot key detection frame插图7

The figure shows a set of rules. For example, the rule of the hot key starting with as__ is that if threshold-10 times occur within intervose-2 seconds, it is considered to be a hot key, and it will be pushed to the jvm memory and cached for 60 seconds. Prefix-true indicates prefix matching. Then in the application, you can put a set of keys, all with as__ beginning, for detection.

4.client access

Importing the pom dependency of the client.

Initializes HotKey where the application starts, such as

@PostConstruct

public void initHotkey() {

    ClientStarter.Builder builder = new ClientStarter.Builder();
    ClientStarter starter = builder.setAppName("appName").setEtcdServer("http://1.8.8.4:2379,http://1.1.4.4:2379,http://1.1.1.1:2379").build();
    starter.startPipeline();
}

In which setCaffeineSize(int size) can also set the maximum number of local cache, the default is 50,000, setPushPeriod(Long period) set the interval of batch push key, the default is 500ms, the smaller the value, the longer the value is. The more frequently hot keys are reported, the more timely the response is. You are advised to adjust the hot keys according to the actual situation. For example, if a single machine has 10 QPSS per second, the hot keys can be reported once every 0.5 seconds. The minimum value is 1, that is, it is reported once every 1ms.

< Note:

If the original project uses guava, it is necessary to upgrade guava to the following version, otherwise too low guava version may cause jar package conflict. Or remove guava’s maven dependency from your own project, guava upgrade will not affect any of the original logic.

<dependency>
 <groupId>com.google.guava</groupId>
 <artifactId>guava</artifactId>
 <version>28.2-jre</version>
 <scope>compile</scope>
</dependency>

Sometimes it is possible that the project does not directly rely on guava, but it is introduced in a pom, and guava needs to be excluded.

Use

There are four main methods to use

boolean JdHotKeyStore.isHotKey(String key)
Object JdHotKeyStore.get(String key)
void JdHotKeyStore.smartSet(String key,  Object value)
Object JdHotKeyStore.getValue(String  key)

1 boolean isHotKey(String key), this method returns whether the key is a hot key, true if it is, false if it is not, and reports the key to the probe cluster for quantity calculation. This method is usually used to judge only need to determine whether the key is hot, do not need to cache the value of the scenario, such as brush users, interface access frequency.

2 Object get(String key), this method returns the value of the local cache of the key, can be used to determine the hot key, and then obtain the value of the local cache, usually used for redis hot key cache

3 void smartSet(String key, Object value), method to assign a value to the hot key, if it is a hot key, the method will assign a value, non-hot key, do nothing

4 Object getValue(String key), this method is an integration method, equivalent to the integration of isHotKey and get methods, the method directly returns the value of the local cache. If it is a hot key, there are two cases, 1 is returned value, 2 is returned null. Return null because it has not yet been set to a real value, return a non-NULL means that the set method has been called, the local cache value has a value. If the key is not a hot key, null is returned and the key is reported to the detection cluster for quantity detection.