Alibaba Open Source: EasyExcel, a Fast, Simple Excel Processing Tool That Avoids OOM

2022-09-02 (last updated 2025-02-24)

This issue recommends EasyExcel, Alibaba's open-source, Java-based Excel parsing tool.


The best-known Java frameworks for parsing Excel are Apache POI and jxl, but both share a serious problem: they consume a great deal of memory. POI has a SAX mode that eases the problem to some extent, but it still has shortcomings. For example, for some Excel versions both the decompression and the storage of the decompressed data happen in memory, so memory consumption remains high. EasyExcel rewrites POI's parsing of Excel files. A 3 MB Excel file that needs about 100 MB of memory when parsed with POI can be handled in a few MB after switching to EasyExcel, and however large the file is, it does not run out of memory.

Latest version

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>easyexcel</artifactId>
    <version>3.0.5</version>
</dependency>

A typical example

  • Reading Excel


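Data object

import java.util.Date;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.Setter;

/**
 * One row of the Excel sheet; each field maps to one column.
 */
@Getter
@Setter
@EqualsAndHashCode
public class DemoData {
    private String string;
    private Date date;
    private Double doubleData;
}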

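Listener

import java.util.List;

import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.read.listener.ReadListener;
import com.alibaba.excel.util.ListUtils;
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;

// Important: DemoDataListener must not be managed by Spring. Create a new instance every time you read an Excel file; if it needs Spring beans, pass them in through the constructor.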
@Slf4j
public class DemoDataListener implements ReadListener<DemoData> {

    /**
     * Save to the database every 100 rows (BATCH_COUNT), then replace the list so that its memory can be reclaimed
     */
    private static final int BATCH_COUNT = 100;
    /**
     * Cached data
     */
    private List<DemoData> cachedDataList = ListUtils.newArrayListWithExpectedSize(BATCH_COUNT);
    /**
     * Assume this is a DAO; if there is business logic, it could also be a service. It is not needed if the data does not have to be stored.
     */
    private DemoDAO demoDAO;

    public DemoDataListener() {
        // This is a demo, so it simply creates a new DAO. With Spring, use the parameterized constructor below
        demoDAO = new DemoDAO();
    }

    /**
     * If Spring is used, use this constructor. Each time you create a listener, pass in the Spring-managed class
     *
     * @param demoDAO the Spring-managed DAO used to store the rows
     */
    public DemoDataListener(DemoDAO demoDAO) {
        this.demoDAO = demoDAO;
    }

    /**
     * Called once for every row that is parsed
     *
     * @param data    one row value. It is the same as {@link AnalysisContext#readRowHolder()}
     * @param context the analysis context
     */
    @Override
    public void invoke(DemoData data, AnalysisContext context) {
        log.info("Parsed one row: {}", JSON.toJSONString(data));
        cachedDataList.add(data);
        // Once BATCH_COUNT is reached, save to the database so that tens of thousands of rows do not pile up in memory and cause an OOM
        if (cachedDataList.size() >= BATCH_COUNT) {
            saveData();
            // After saving, replace the list so that the old batch can be garbage-collected
            cachedDataList = ListUtils.newArrayListWithExpectedSize(BATCH_COUNT);
        }
    }

    /**
     * Called once all the data has been parsed
     *
     * @param context the analysis context
     */
    @Override
    public void doAfterAllAnalysed(AnalysisContext context) {
        // Save here as well, so that the last remaining rows are also stored in the database
        saveData();
        log.info("All data has been parsed!");
    }

    /**
     * Save the cached rows to the database
     */
    private void saveData() {
        log.info("Saving {} rows to the database!", cachedDataList.size());
        demoDAO.save(cachedDataList);
        log.info("Saved to the database successfully!");
    }
}
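
The listener above follows a buffer-and-flush pattern: collect rows, hand them off once the batch is full, and flush whatever is left at the end. The following is a minimal, stdlib-only sketch of that pattern, not EasyExcel code; `BatchingSketch`, `BatchBuffer` and `flushSizes` are hypothetical names used only for illustration. Unlike the listener above, this sketch skips the final flush when the buffer is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Stdlib-only illustration of buffer-and-flush; all names here are hypothetical. */
public class BatchingSketch {

    /** Buffers rows and hands them to a sink in batches of at most batchSize. */
    public static class BatchBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> sink;
        private List<T> buffer;

        public BatchBuffer(int batchSize, Consumer<List<T>> sink) {
            this.batchSize = batchSize;
            this.sink = sink;
            this.buffer = new ArrayList<>(batchSize);
        }

        /** Plays the role of invoke(): called once per parsed row. */
        public void add(T row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Plays the role of doAfterAllAnalysed(): flushes the last partial batch. */
        public void finish() {
            if (!buffer.isEmpty()) {
                flush();
            }
        }

        private void flush() {
            sink.accept(buffer);
            // Start a fresh list so the flushed batch can be garbage-collected
            buffer = new ArrayList<>(batchSize);
        }
    }

    /** Feeds rowCount dummy rows through a buffer and records the size of each flushed batch. */
    public static List<Integer> flushSizes(int rowCount, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        BatchBuffer<Integer> buffer = new BatchBuffer<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < rowCount; i++) {
            buffer.add(i);
        }
        buffer.finish();
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(flushSizes(250, 100)); // prints [100, 100, 50]
    }
}
```

At most one batch of rows is held at any time, so the memory needed for buffering depends on the batch size rather than on the number of rows in the file.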

Persistence layer

import java.util.List;

/**
 * Assume this is your DAO. With Spring, this class should be managed by Spring; if you do not need to store the data, you do not need this class.
 **/
public class DemoDAO {
    public void save(List<DemoData> list) {
        // With MyBatis, avoid calling insert repeatedly; write a mapper method such as batchInsert that inserts all the rows at once
    }
}
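
As a rough illustration of what "insert all the rows at once" can mean, the sketch below builds a single multi-row INSERT statement, which is the kind of SQL a batch-insert mapper method would typically generate. It is plain Java rather than MyBatis, and the table name `demo_data`, the column names, and the class `BatchInsertSketch` are all assumptions made for this example.

```java
import java.util.Collections;

/** Hypothetical sketch: one multi-row INSERT instead of one INSERT per row. */
public class BatchInsertSketch {

    /**
     * Builds a parameterized INSERT covering rowCount rows.
     * The table and column names are made up for illustration.
     */
    public static String buildInsertSql(int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        // One "(?, ?, ?)" group per row, joined with commas
        String placeholders = String.join(", ", Collections.nCopies(rowCount, "(?, ?, ?)"));
        return "INSERT INTO demo_data (string_value, date_value, double_value) VALUES " + placeholders;
    }

    public static void main(String[] args) {
        // prints: INSERT INTO demo_data (string_value, date_value, double_value) VALUES (?, ?, ?), (?, ?, ?)
        System.out.println(buildInsertSql(2));
    }
}
```

Sending one statement per batch saves a database round trip for every row, which is the reason the comment in `DemoDAO` advises against calling insert repeatedly.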

The simplest read example

    /**
     * The simplest read
     * <p>
     * 1. Create the entity class that corresponds to the Excel rows, see {@link DemoData}
     * <p>
     * 2. Excel is read row by row by default, so you need to create a callback listener that receives each row, see {@link DemoDataListener}
     * <p>
     * 3. Read the file
     */
    @Test
    public void simpleRead() {
        // Method 1: JDK 8+, no need to write a separate DemoDataListener
        // since: 3.0.0-beta1
        String fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
        // Specify the class to read into, then read the first sheet; the file stream is closed automatically
        // 3000 rows are read at a time and then passed back, so you can use the data directly
        EasyExcel.read(fileName, DemoData.class, new PageReadListener<DemoData>(dataList -> {
            for (DemoData demoData : dataList) {
                log.info("Read one row: {}", JSON.toJSONString(demoData));
            }
        })).sheet().doRead();

        // Method 2:
        // An anonymous inner class, so there is no need to write a separate DemoDataListener
        fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
        // Specify the class to read into, then read the first sheet; the file stream is closed automatically
        EasyExcel.read(fileName, DemoData.class, new ReadListener<DemoData>() {
            /**
             * The number of rows cached at a time
             */
            public static final int BATCH_COUNT = 100;
            /**
             * Temporary storage
             */
            private List<DemoData> cachedDataList = ListUtils.newArrayListWithExpectedSize(BATCH_COUNT);

            @Override
            public void invoke(DemoData data, AnalysisContext context) {
                cachedDataList.add(data);
                if (cachedDataList.size() >= BATCH_COUNT) {
                    saveData();
                    // After saving, replace the list so that the old batch can be garbage-collected
                    cachedDataList = ListUtils.newArrayListWithExpectedSize(BATCH_COUNT);
                }
            }

            @Override
            public void doAfterAllAnalysed(AnalysisContext context) {
                saveData();
            }

            /**
             * Save the cached rows to the database (simulated here with log output only)
             */
            private void saveData() {
                log.info("Saving {} rows to the database!", cachedDataList.size());
                log.info("Saved to the database successfully!");
            }
        }).sheet().doRead();

        // Important: DemoDataListener must not be managed by Spring. Create a new instance every time you read an Excel file; if it needs Spring beans, pass them in through the constructor
        // Method 3:
        fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
        // Specify the class to read into, then read the first sheet; the file stream is closed automatically
        EasyExcel.read(fileName, DemoData.class, new DemoDataListener()).sheet().doRead();

        // Method 4:
        fileName = TestFileUtil.getPath() + "demo" + File.separator + "demo.xlsx";
        // One reader per file
        ExcelReader excelReader = null;
        try {
            excelReader = EasyExcel.read(fileName, DemoData.class, new DemoDataListener()).build();
            // Build a sheet; you can specify either its name or its index (no)
            ReadSheet readSheet = EasyExcel.readSheet(0).build();
            // Read one sheet
            excelReader.read(readSheet);
        } finally {
            if (excelReader != null) {
                // Don't forget to close the reader here: reading creates temporary files, and the disk will fill up if they are not cleaned up
                excelReader.finish();
            }
        }
    }
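
The comments above stress that the listener must be created anew for every read, with any Spring-managed dependency passed in through its constructor. The reason is that the listener holds per-read state (its row buffer), while the DAO is a shared object. The stdlib-only sketch below illustrates that split; `RowStore`, `RowListener` and `ImportService` are hypothetical stand-ins, not EasyExcel or Spring classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: shared dependency, fresh listener per read. */
public class ListenerPerReadSketch {

    /** Stand-in for a shared, Spring-managed DAO. */
    public static class RowStore {
        public final List<String> saved = new ArrayList<>();

        public void save(List<String> rows) {
            saved.addAll(rows);
        }
    }

    /** Stateful listener: its buffer belongs to exactly one read. */
    public static class RowListener {
        private final RowStore store;
        private final List<String> buffer = new ArrayList<>();

        public RowListener(RowStore store) {
            this.store = store; // shared dependency passed in through the constructor
        }

        public void invoke(String row) {
            buffer.add(row);
        }

        /** Saves the buffered rows and returns how many this read produced. */
        public int doAfterAll() {
            store.save(buffer);
            return buffer.size();
        }
    }

    /** Stand-in for a service bean that runs one import per call. */
    public static class ImportService {
        private final RowStore store;

        public ImportService(RowStore store) {
            this.store = store;
        }

        public int importRows(List<String> rows) {
            // A new listener per read, so no rows leak from one import into the next
            RowListener listener = new RowListener(store);
            for (String row : rows) {
                listener.invoke(row);
            }
            return listener.doAfterAll();
        }
    }

    public static void main(String[] args) {
        ImportService service = new ImportService(new RowStore());
        System.out.println(service.importRows(List.of("a", "b"))); // prints 2
        System.out.println(service.importRows(List.of("c")));      // prints 1
    }
}
```

If a single listener instance were shared between reads, the buffer from one file could still contain rows when the next file starts, and concurrent reads would write into the same list.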

With 64 MB of memory, EasyExcel reads a 75 MB Excel file (460,000 rows, 25 columns) in 20 seconds.


There is also a fast mode, which is quicker but uses slightly more than 100 MB of memory.

For more features, see the project's own documentation.


Source: ICTcoder, https://ictcoder.com/ali-open-source-quick-and-easy-to-avoid-oom-excel-processing-tools/
