JD App’s second-level 100G log transmission and storage architecture design and practice

2022-09-14

This project is designed for the collection, transmission, and storage of massive logs, on the order of 100 GB per second.

Project Introduction

Unlike traditional suites such as the ELK stack, this project aims to solve the high hardware cost and low performance of handling a large volume of logs (entry/exit parameters, link trace logs) all the way from collection to final retrieval and query.

Compared with the common ELK pipeline of Filebeat collection, MQ transfer, and ES storage, this framework delivers more than a 10x performance improvement and saves more than 70% of disk space. In other words, where the same hardware could previously transmit and store only 100 MB of logs per second, this solution processes 1 GB per second, and the disk space consumed by logs across the whole pipeline drops by more than 70%.

Background

To keep the system robust, make requests traceable, and support troubleshooting after failures, we usually record each user request's entry and exit parameters, the logs of key nodes inside the system (info, error), link trace logs, and so on, and retain them for a period of time.

A log system is basically divided into several modules: collection (e.g. Filebeat, Logstash), transmission (Kafka, TCP), storage (ES, MySQL, Hive, etc.), and query (Kibana or self-built consoles).

Take the traditional solution as an example. When 1 GB of logs is generated, it is first written to the local disk (1 GB of disk), then written to and read back from MQ (2 GB of disk on the MQ side, single replica), then consumed from MQ and written to ES (1 GB on the ES side) — 4 GB of disk in total, accompanied by network bandwidth and server resource consumption of the same magnitude (a single server processes roughly 100 MB per second).

At the scale of the JD App, where 100 GB of logs can be generated per second, the hardware resources consumed are enormous, and the cost had reached an unbearable point.

As mentioned above, this solution increases log throughput by 10x over the Filebeat + MQ + ES pipeline on the same hardware, and cuts full-link disk usage by more than 70%.

Solution Overview

[Figure: overall architecture of the solution]

From this general flow it is easy to see that the traditional pipeline performs many reads and writes, each accompanied by disk I/O (MQ also persists to disk), along with frequent serialization and deserialization and a doubling of network I/O.

We found that writing to the local disk and to MQ is not actually necessary: log data can be written to local memory instead, with a background thread periodically sending it directly to the worker over UDP.

After the worker receives the data, it parses it and writes it to its own in-memory queue; several asynchronous threads then write the queued data to the ClickHouse database in batches.
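The client-side idea — append to memory instead of disk, then ship batches over UDP — can be sketched as follows. This is an illustrative stand-in, not the SDK's actual code; class names and the batching format are assumptions.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical client-side sender: log lines land in an in-memory queue
// (no disk I/O on the hot path); a flush drains the queue and ships the
// whole batch to a worker in one UDP datagram.
class UdpLogShipper {
    private final ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();
    private final InetAddress workerHost;
    private final int workerPort;

    UdpLogShipper(InetAddress workerHost, int workerPort) {
        this.workerHost = workerHost;
        this.workerPort = workerPort;
    }

    void append(String logLine) {
        buffer.add(logLine); // in-memory staging replaces the local log file
    }

    // In the real system this runs periodically on a background thread.
    void flush() throws Exception {
        List<String> batch = new ArrayList<>();
        String line;
        while ((line = buffer.poll()) != null) {
            batch.add(line);
        }
        if (batch.isEmpty()) return;
        byte[] payload = String.join("\n", batch).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, workerHost, workerPort));
        }
    }
}
```

In the real SDK the payload would also be compressed and protobuf-serialized before sending, as described below.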

Log collection system

[Figure: log collection system]

The basic flow: entry/exit parameters are captured through a web filter, and logs printed mid-request are captured through custom log4j/logback appenders; each record is correlated with the traceId generated at the request entry, written to local memory (instead of disk), compressed (cutting string space usage by more than 80%), and sent to the worker over UDP (replacing MQ). The worker receives the data and extracts the index fields; everything except the fields needed for future queries stays compressed until it is stored in the database.
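The 80%+ savings from compression is plausible because log text is highly repetitive. The article does not name the codec used, so as a sketch under that assumption, here is the same step using the JDK's built-in GZIP:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper standing in for the SDK's compression step:
// repetitive log text compresses extremely well with any general-purpose codec.
class LogCompressor {
    static byte[] gzip(String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray(); // typically a small fraction of the input size
    }
}
```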

Configuration Center: stores the IP addresses of the workers and lets each client obtain the worker cluster assigned to its module.

Client: on startup, the client pulls the IP addresses of the worker cluster assigned to its module from the configuration center, then compresses the collected logs and sends them over UDP, polling across the workers.

Worker: each module is allocated a number of worker machines, which report their own IP addresses to the configuration center on startup. After receiving logs from clients, a worker parses out the relevant fields and writes them to the ClickHouse database in batches.
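A minimal sketch of the worker-side buffering described here, with a pluggable sink standing in for the actual batched ClickHouse insert (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical worker-side buffer: parsed log rows land in a queue, and a
// drain loop groups them into batches for a single bulk insert each time.
class BatchWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final Consumer<List<String>> sink;

    BatchWriter(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void submit(String row) {
        queue.add(row); // UDP receiver thread enqueues parsed rows here
    }

    // One drain pass; the real worker runs several of these on async threads.
    void drainOnce() throws InterruptedException {
        List<String> batch = new ArrayList<>(batchSize);
        String first = queue.poll(100, TimeUnit.MILLISECONDS);
        if (first == null) return;
        batch.add(first);
        queue.drainTo(batch, batchSize - 1);
        sink.accept(batch); // e.g. one batched INSERT into ClickHouse
    }
}
```

Batching is the key design choice here: ClickHouse strongly favors few large inserts over many small ones.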

ClickHouse: a powerful database with a high compression ratio, extremely strong write performance, day-based partitioning, and good query speed. It is ideal for logging systems with heavy writes and relatively few queries.

Dashboard: A visual interface that displays data from ClickHouse to users, with multi-condition and multi-dimensional query functions.

[Figure: client-side log aggregation]

Aggregate logs on the client side

(1) Request entry and exit parameters

For an HTTP web application, the easiest way to capture entry and exit parameters is a custom filter. The client SDK defines such a filter, and the integrating party enables it through configuration to start collecting entry and exit parameters.
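As a rough illustration of the filter idea — using a plain function-based handler instead of the real javax.servlet.Filter API so the sketch stays self-contained; all names here are hypothetical, not the SDK's:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical stand-in for the SDK's entry/exit filter: wrap a request
// handler so every invocation records its input and output in memory,
// tagged with the traceId generated at the request entry.
class EntryExitFilter {
    static class LogRecord {
        final String traceId, input, output;
        LogRecord(String traceId, String input, String output) {
            this.traceId = traceId;
            this.input = input;
            this.output = output;
        }
    }

    private final List<LogRecord> buffer = new CopyOnWriteArrayList<>();

    Function<String, String> wrap(String traceId, Function<String, String> handler) {
        return input -> {
            String output = handler.apply(input);              // run the business logic
            buffer.add(new LogRecord(traceId, input, output)); // capture in/out params
            return output;
        };
    }

    List<LogRecord> captured() {
        return buffer;
    }
}
```

In the real SDK the buffer would be the in-memory staging area drained by the UDP sender rather than an unbounded list.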

For non-HTTP requests from other RPC frameworks, a corresponding filter/interceptor is likewise provided to capture the entry and exit parameters.

After capturing the parameters, the SDK compresses large packets (mainly the input parameters), wraps the whole record in a Java object, serializes it with protobuf, and sends it over UDP, polling across its assigned worker cluster.

(2) Key information printed along the link, such as the entry and exit parameters of calls to other systems, plus info and error messages printed by the application itself

The SDK provides custom appenders for log4j, logback, and log4j2. Users declare the custom appender in their own log configuration files (such as logback.xml), so that all info, error, and other logs subsequently printed in code pass through it.

Likewise, this custom appender stages the logs in memory and then sends them to the worker asynchronously over UDP.
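A logback.xml wiring for such a custom appender might look like this sketch (the appender class name is hypothetical; the SDK's actual class is not given in the article):

```xml
<configuration>
  <!-- Hypothetical SDK appender: stages log events in memory, ships them over UDP -->
  <appender name="SDK_UDP" class="com.example.sdk.MemoryUdpAppender"/>
  <root level="INFO">
    <appender-ref ref="SDK_UDP"/>
  </root>
</configuration>
```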

Workers consume logs and store them in the database

The worker side is the focus of tuning: it must receive huge volumes of logs from clients, parse them, and write them to the database, so it needs strong buffering capacity.

Verified through online use and rigorous stress testing, a single docker container of this kind can process 1–50 million lines of raw logs per second. For example, if one user request generates more than 1,000 log lines in total, a single worker can handle 20,000 client QPS, writing upwards of 200 MB per second into ClickHouse.

Multi-Criteria Query Console

Shard the data well — for example, partition by day. Use ClickHouse's PREWHERE clause to improve query performance, and design the index fields carefully, e.g. the time and user PIN commonly used in searches.
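An illustrative ClickHouse schema and query following these guidelines (table and column names are assumptions, not the article's actual schema):

```sql
-- Day-partitioned log table; index/sort key covers the common search fields.
CREATE TABLE app_log
(
    log_time  DateTime,
    trace_id  String,
    pin       String,
    content   String CODEC(ZSTD)        -- bulky payload stays compressed
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(log_time)      -- partition (shard) by day
ORDER BY (pin, log_time);              -- index fields used for search

-- PREWHERE filters on cheap index columns before reading the bulky content column.
SELECT trace_id, content
FROM app_log
PREWHERE pin = 'user123'
  AND log_time BETWEEN '2022-09-14 00:00:00' AND '2022-09-14 23:59:59';
```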

 


Source: Ictcoder — https://ictcoder.com/kyym/jd-apps-second-level-100g-log-transmission-and-storage-architecture-design-and-practice.html
