JD App’s second-level 100G log transmission and storage architecture design and practice


2022-09-14

The project featured in this issue targets the collection, transmission, and storage of massive logs (on the order of 100 GB per second).

Project Introduction

Unlike traditional ELK-style suites, this project aims to solve the high hardware cost and low performance of handling a large volume of logs (entry/exit parameters, link trace logs) all the way from collection to final retrieval and query.

Compared with a common ELK pipeline (Filebeat collection, MQ transfer, ES storage), this framework delivers more than a 10x performance improvement and saves more than 70% of disk. In other words, where the same hardware could transmit and store only 100 MB of logs per second, this solution processes 1 GB per second, and the disk space consumed by logs across the whole link drops by 70%.

Background

To keep the system robust, make requests traceable, and support troubleshooting after failures, we usually save each request's entry and exit parameters, the logs of key nodes inside the system (info, error), link trace logs, and so on, and retain these logs for a period of time.

A log system generally consists of several modules: collection (e.g. Filebeat, Logstash), transmission (Kafka, TCP), storage (ES, MySQL, Hive, etc.), and query (Kibana or a self-built console).

Take the traditional solution as an example: for every 1 GB of logs generated, the logs are written to the local disk (1 GB of disk), written to and read from MQ (2 GB with a single replica), and then consumed from MQ and written to ES (another 1 GB). That is 4 GB of disk in total, accompanied by network bandwidth and server resource usage of the same scale (a single server processes roughly 100 MB per second).

At the scale of the JD App, where 100 GB of logs can be generated per second, the hardware resources consumed are enormous, and the cost had reached an unbearable point.

As mentioned above, under the same hardware configuration this solution increases log throughput tenfold compared with the Filebeat + MQ + ES pipeline, and cuts full-link disk usage by more than 70%.

Brief introduction of the solution

[Figure: overall log pipeline]

From this general process, it is easy to see that the pipeline involves many reads and writes, each accompanied by disk I/O (MQ also persists to disk), frequent serialization and deserialization, and doubled network I/O.

We found that writing to the local disk and to MQ is not actually necessary: we can write log data to local memory, and have a dedicated thread periodically send the logs directly to the worker over UDP.

After receiving the logs, the worker parses them, writes them to its own memory queue, and several asynchronous threads then write the queued data to ClickHouse in batches.
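The client-side half of this idea can be sketched as follows. This is an illustrative skeleton, not the article's actual SDK: log lines go into an in-memory queue instead of a local disk file, and a background thread periodically drains the queue and ships each batch to a worker over UDP. All class and method names here are assumptions.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the client side: log lines are staged in memory (replacing the
// local disk write) and shipped to a worker over UDP (replacing MQ).
public class UdpLogShipper {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100_000);
    private final DatagramSocket socket;
    private final InetAddress workerHost;
    private final int workerPort;

    public UdpLogShipper(String host, int port) throws Exception {
        this.socket = new DatagramSocket();
        this.workerHost = InetAddress.getByName(host);
        this.workerPort = port;
    }

    // Called from application threads; never blocks the business request.
    public boolean offer(String logLine) {
        return buffer.offer(logLine); // drop on overflow rather than block
    }

    // Called periodically by a scheduler thread; joins queued lines into one
    // datagram (the real SDK would also compress and respect UDP size limits).
    public int flushOnce() throws Exception {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch, 500);
        if (batch.isEmpty()) return 0;
        byte[] payload = String.join("\n", batch).getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(payload, payload.length, workerHost, workerPort));
        return batch.size();
    }
}
```

In a real deployment the flusher would run on a `ScheduledExecutorService`, and the datagram payload would be compressed before sending.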

Log collection system

[Figure: log collection architecture]

The basic process: a filter captures each web request's entry and exit parameters, and custom log4j/logback appenders capture the logs printed along the way. Each log is correlated with the traceId generated at the request entry, written to local memory (instead of disk), compressed (cutting string size by more than 80%), and sent to the worker over UDP (replacing MQ). The worker receives the data and extracts the index fields; everything except the index fields needed for later queries stays compressed until it is stored in the database.
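The "compress before sending" step can be illustrated with the JDK's built-in codec. The article does not name the compression algorithm, so GZIP is used here purely as a stand-in; the high ratio comes from how repetitive log text is (timestamps, levels, class names repeat on every line).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Stand-in for the SDK's compression step: shrink a block of log text
// before it is shipped over UDP. GZIP is an assumption, not the article's
// stated codec.
public class LogCompressor {
    public static byte[] compress(String text) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }
}
```

On batches of similar log lines, a general-purpose codec like this routinely exceeds the 80% reduction the article cites.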

Configuration center: stores the worker IP addresses and lets each client obtain the IP list of the worker cluster assigned to its module.

Client: after startup, it pulls the IP list of its module's worker cluster from the configuration center, then compresses the collected logs and sends them to the workers over UDP in round-robin fashion.

Worker: each module is allocated a number of worker machines, which report their own IPs to the configuration center on startup. After receiving logs from clients, a worker parses out the relevant fields and writes them to ClickHouse in batches.

ClickHouse: a powerful database with a high compression ratio, extremely strong write performance, day-based partitioning, and good query speed. It is ideal for log systems with heavy writes and relatively few queries.

Dashboard: a visual interface that presents the ClickHouse data to users, with multi-condition, multi-dimensional query capabilities.
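The worker's buffering layer described above can be sketched like this. It is an illustrative skeleton with assumed names: parsed log rows accumulate in a memory queue and are flushed to storage in batches. The real worker would hand each batch to a ClickHouse batch INSERT; here the sink is a pluggable `Consumer` so the logic stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of the worker's memory queue plus batch flusher. Several of these
// flusher loops would run on asynchronous threads in the real system.
public class BatchWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1_000_000);
    private final int batchSize;
    private final Consumer<List<String>> sink; // stand-in for a ClickHouse batch INSERT

    public BatchWriter(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called by the UDP receive path after parsing a log row.
    public void submit(String row) throws InterruptedException {
        queue.put(row);
    }

    // One pass of a flusher thread: wait briefly for rows, then flush
    // whatever has accumulated as a single batch.
    public int flushOnce(long waitMillis) throws InterruptedException {
        List<String> batch = new ArrayList<>(batchSize);
        String first = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
        if (first == null) return 0;
        batch.add(first);
        queue.drainTo(batch, batchSize - 1);
        sink.accept(batch);
        return batch.size();
    }
}
```

Batching is what makes ClickHouse writes cheap: one large INSERT amortizes the per-request overhead across thousands of rows.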

[Figure: component interaction]

Aggregating logs on the client side

(1) Request entry and exit parameters

If it is an HTTP web application, the easiest way to capture entry and exit parameters is a custom filter. The SDK defines the filter, and the integrating application enables it in its configuration to start collecting entry and exit parameters.
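The shape of such a filter can be sketched as follows. A real SDK would implement `javax.servlet.Filter`; to keep this sketch dependency-free it wraps a plain handler function instead, and all names are illustrative. The key point is the same: record the entry parameters, run the business logic, record the exit parameters, and tag both with one traceId so later log lines can be correlated.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Dependency-free sketch of the entry/exit-parameter filter. In the real SDK
// this logic lives in a servlet Filter (or an RPC interceptor).
public class AccessLogFilter {
    // Stand-in for the in-memory staging buffer the SDK writes to.
    public static final ConcurrentLinkedQueue<String> COLLECTED = new ConcurrentLinkedQueue<>();

    public static String handle(Map<String, String> params,
                                Function<Map<String, String>, String> handler) {
        String traceId = UUID.randomUUID().toString();
        COLLECTED.add(traceId + " IN  " + params);   // entry parameters
        String response = handler.apply(params);     // actual business logic
        COLLECTED.add(traceId + " OUT " + response); // exit parameters
        return response;
    }
}
```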

For non-HTTP requests from other RPC applications, a corresponding interceptor is provided to capture the entry and exit parameters.

After obtaining the entry and exit parameters, the SDK compresses the large fields (mainly the request parameters), wraps the whole record in a Java object, serializes it with Protobuf, and sends it over UDP to its worker cluster in round-robin fashion.

(2) Key information printed along the link, such as the entry and exit parameters of calls to other systems, and the info and error messages printed by the application itself

The SDK provides custom appenders for log4j, logback, and log4j2. Users register the custom appender in their log configuration file (such as logback.xml), so that every info, error, etc. log the application subsequently prints passes through it.

Likewise, this custom appender stages the logs in memory and then sends them to the worker asynchronously over UDP.
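The core idea of the appender is that `append()` only stages the event in memory and returns, so the business thread never waits on the network or disk. In a real SDK this class would extend logback's `AppenderBase` (or log4j2's `AbstractAppender`); that dependency is omitted here and the class name is an assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Skeleton of the custom appender's buffering behavior, without the
// logback/log4j2 base class it would extend in practice.
public class MemoryStagingAppender {
    private final BlockingQueue<String> staged = new ArrayBlockingQueue<>(50_000);

    // Called on the business thread for every log event; must be cheap.
    public void append(String formattedEvent) {
        // If the buffer is full, drop the event instead of blocking the caller.
        staged.offer(formattedEvent);
    }

    // Called by the asynchronous UDP sender thread.
    public String pollStaged() {
        return staged.poll();
    }
}
```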

Workers consume logs and store them in the database

The worker side is the focus of tuning: it must receive a huge volume of logs from clients, parse them, and write them to the database, so it needs strong buffering capacity.

Verified by online use and rigorous stress testing, a single-machine Docker container of this kind can process 1–50 million lines of raw logs per second. For example, if a user request generates more than 1,000 log lines in total, one such worker can handle 20,000 client QPS. Writing to ClickHouse, it can sustain more than 200 MB per second.

Multi-Criteria Query Console

Shard the data well, for example by day. Use ClickHouse's PREWHERE clause to improve query performance. Design the index fields carefully, such as the time and user PIN commonly used in searches.
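Those three pieces of advice can be made concrete with a ClickHouse sketch. The schema below is illustrative only; table and column names are assumptions, not the article's actual DDL.

```sql
-- Illustrative schema: partition by day, order by the indexed query fields.
CREATE TABLE app_log
(
    log_date  Date,
    log_time  DateTime,
    trace_id  String,
    pin       String,                 -- user identifier used as a search key
    content   String CODEC(ZSTD(3))   -- compressed log body
)
ENGINE = MergeTree
PARTITION BY log_date
ORDER BY (log_time, pin);

-- PREWHERE filters on the small indexed columns before reading the large
-- content column, which is what keeps multi-condition queries cheap.
SELECT log_time, trace_id, content
FROM app_log
PREWHERE log_date = today() AND pin = 'some_user'
ORDER BY log_time DESC
LIMIT 100;
```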

 
