博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Poi导出产生OOM解决方案
阅读量:5970 次
发布时间:2019-06-19

本文共 9327 字,大约阅读时间需要 31 分钟。


title: Poi导出产生OOM解决方案 tags:

  • Poi
  • Stream
  • Excel
  • 导出
  • oom categories: ci date: 2017-11-30 22:53:39

背景

  1. 目前生产系统大量的数据的导出,情况不是特别理想,将常出现OOM
  2. 如果不限制程序最大内存可能导致其他应用收到影响 比如稽计任务

思路

考虑出现OOM是如下几个因素

  1. 查询大量数据 需要维护一个大的列表【如果出现大量的数据 超大的List】我们目前正常最大的list大概能达到接近10w
  2. 封装Excel的workBook 创建Sheet和Cell【基本上单行得有接近20个字段】通常的思路就是workbook==>sheet==》cell

从这边可以看出通常我们出现系统OOM的应该是在EXCEL导出出现的概率较高

可以在一个导出完成的系统中查看占比

可以看出我们的业务数据排在第16位 4w+ 大约占用15.3M 而对应的Excel基本上最大的一个类就占用了接近550M

可想而知目前最大的瓶颈在于优化Excel导出导致的OOM【如果限制XMX的话可能会频繁OOM  如果不限制的话可能会使得JVM急速膨胀】

那思路就会有如下

  1. 分解多个Excel 使用多个Excel来进行导出后进行合并 【比较麻烦 并且侵入性太高】打包压缩?
  2. 将Excel导出时写到临时文件中最终把临时文件发出去即可

Poi在新版本支持了我们所说的特性

首先了解一下office03和office07的区别

通常来说office03的拓展名是xls office07的拓展名是xlsx

xls一般支持最大行数为65535【单sheet】而在xlsx中支持达到1048576

当我们导出大量的报表的数据时建议采用xlsx的方式

而在Poi中实现xlsx中有两种方式

分别是

其中

  1. HSSFWorkbook是office03
  2. XSSFWorkbook是我们在office07总最常使用
  3. SXSSFWorkbook这个提供了OOM的解决方案 如上述所说会临时写到临时文件中
/**         * Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy.         *         * This allows to write very large files without running out of memory as only         * a configurable portion of the rows are kept in memory at any one time.         *         * You can provide a template workbook which is used as basis for the written         * data.         *         * See https://poi.apache.org/spreadsheet/how-to.html#sxssf for details.         *         * Please note that there are still things that still may consume a large         * amount of memory based on which features you are using, e.g. merged regions,         * comments, ... are still only stored in memory and thus may require a lot of         * memory if used extensively.         *         * SXSSFWorkbook defaults to using inline strings instead of a shared strings         * table. This is very efficient, since no document content needs to be kept in         * memory, but is also known to produce documents that are incompatible with         * some clients. With shared strings enabled all unique strings in the document         * has to be kept in memory. Depending on your document content this could use         * a lot more resources than with shared strings disabled.         *         * Carefully review your memory budget and compatibility needs before deciding         * whether to enable shared strings or not.         */        public class SXSSFWorkbook implements Workbook {            /**             * Specifies how many rows can be accessed at most via {
@link SXSSFSheet#getRow}. * When a new node is created via {
@link SXSSFSheet#createRow} and the total number * of unflushed records would exceed the specified value, then the * row with the lowest index value is flushed and cannot be accessed * via {
@link SXSSFSheet#getRow} anymore. */ public static final int DEFAULT_WINDOW_SIZE = 100; }复制代码

默认情况下提供100行的访问【换言之 访问窗口为100】

源码

我们确认一下代码中如何实现临时文件的建立

/**         * Streaming version of XSSFSheet implementing the "BigGridDemo" strategy.        */        public class SXSSFSheet implements Sheet        {            /*package*/ final XSSFSheet _sh;            private final SXSSFWorkbook _workbook;            private final TreeMap
_rows=new TreeMap
(); private final SheetDataWriter _writer; private int _randomAccessWindowSize = SXSSFWorkbook.DEFAULT_WINDOW_SIZE; private final AutoSizeColumnTracker _autoSizeColumnTracker; private int outlineLevelRow = 0; private int lastFlushedRowNumber = -1; private boolean allFlushed = false; public SXSSFSheet(SXSSFWorkbook workbook, XSSFSheet xSheet) throws IOException { _workbook = workbook; _sh = xSheet; _writer = workbook.createSheetDataWriter(); setRandomAccessWindowSize(_workbook.getRandomAccessWindowSize()); _autoSizeColumnTracker = new AutoSizeColumnTracker(this); }复制代码

创建Sheet的时候传入对应参数 【默认100】

首先在创建rows或进行判断

@Override        public SXSSFRow createRow(int rownum)        {            int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();            if (rownum < 0 || rownum > maxrow) {                throw new IllegalArgumentException("Invalid row number (" + rownum                        + ") outside allowable range (0.." + maxrow + ")");            }                     // attempt to overwrite a row that is already flushed to disk            if(rownum <= _writer.getLastFlushedRow() ) {                throw new IllegalArgumentException(                        "Attempting to write a row["+rownum+"] " +                        "in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk.");            }                     // attempt to overwrite a existing row in the input template            if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) {                throw new IllegalArgumentException(                        "Attempting to write a row["+rownum+"] " +                                "in the range [0," + _sh.getLastRowNum() + "] that is already written to disk.");            }                     SXSSFRow newRow=new SXSSFRow(this);            _rows.put(rownum,newRow);            allFlushed = false;            if(_randomAccessWindowSize>=0&&_rows.size()>_randomAccessWindowSize)            {                try                {                   flushRows(_randomAccessWindowSize);                }                catch (IOException ioe)                {                    throw new RuntimeException(ioe);                }            }            return newRow;        }        /**         * Specifies how many rows can be accessed at most via getRow().         * The exeeding rows (if any) are flushed to the disk while rows         * with lower index values are flushed first.         */        public void flushRows(int remaining) throws IOException        {            while(_rows.size() > remaining) flushOneRow();            if (remaining == 0) allFlushed = true;        }复制代码

当前行数超过阈值的时候将会将最上面一行刷到磁盘

private void flushOneRow() throws IOException        {            Integer firstRowNum = _rows.firstKey();            if (firstRowNum!=null) {                int rowIndex = firstRowNum.intValue();                SXSSFRow row = _rows.get(firstRowNum);                // Update the best fit column widths for auto-sizing just before the rows are flushed                _autoSizeColumnTracker.updateColumnWidths(row);                _writer.writeRow(rowIndex, row);                _rows.remove(firstRowNum);                lastFlushedRowNumber = rowIndex;            }        }复制代码

这边_writer负责将对应的数据刷到临时文件中

/**         * Initially copied from BigGridDemo "SpreadsheetWriter".         * Unlike the original code which wrote the entire document,         * this class only writes the "sheetData" document fragment         * so that it was renamed to "SheetDataWriter"         */        public class SheetDataWriter {            private static final POILogger logger = POILogFactory.getLogger(SheetDataWriter.class);                         private final File _fd;            private final Writer _out;            private int _rownum;            private int _numberOfFlushedRows;            private int _lowestIndexOfFlushedRows; // meaningful only of _numberOfFlushedRows>0            private int _numberOfCellsOfLastFlushedRow; // meaningful only of _numberOfFlushedRows>0            private int _numberLastFlushedRow = -1; // meaningful only of _numberOfFlushedRows>0                     /**             * Table of strings shared across this workbook.             * If two cells contain the same string, then the cell value is the same index into SharedStringsTable             */            private SharedStringsTable _sharedStringSource;                     public SheetDataWriter() throws IOException {                _fd = createTempFile();                _out = createWriter(_fd);            }                     public SheetDataWriter(SharedStringsTable sharedStringsTable) throws IOException{                this();                this._sharedStringSource = sharedStringsTable;            }            /**             * Create a temp file to write sheet data.             * By default, temp files are created in the default temporary-file directory             * with a prefix "poi-sxssf-sheet" and suffix ".xml".  Subclasses can override             * it and specify a different temp directory or filename or suffix, e.g. .gz             *             * @return temp file to write sheet data             */            public File createTempFile() throws IOException {                return TempFile.createTempFile("poi-sxssf-sheet", ".xml");            }        }复制代码

很明显此处将会在Temp目录下创建一个临时文件

当数据写完的时候如下

public InputStream getWorksheetXMLInputStream() throws IOException        {            // flush all remaining data and close the temp file writer            flushRows(0);            _writer.close();            return _writer.getWorksheetXMLInputStream();        }            public InputStream getWorksheetXMLInputStream() throws IOException {            File fd = getTempFile();            return new FileInputStream(fd);        }复制代码

做了一次导出 大约Excel 大小15m 共计4w行数据

这样再次导出时将会性能OK

经历多次导出后仍然显示如下

当执行GC后数据可以显著回收

转载地址:http://huzox.baihongyu.com/

你可能感兴趣的文章
UVA11825 黑客的攻击 Hackers' Crackdown 状压DP,二进制,子集枚举
查看>>
[工具]各种截图工具和录屏软件
查看>>
最近跑hadoop遇到的一些问题
查看>>
online_judge_1046
查看>>
文件的读取流和书写流
查看>>
ASP.NET Core中的配置
查看>>
Django ORM 中的批量操作
查看>>
执行python文件报错SyntaxError: Non-ASCII character '\xe8' in file, but no encoding declared
查看>>
VMware Workstation Pro下载
查看>>
IOSday01 连线和程序标识
查看>>
Eclipse快捷键
查看>>
SpringMvc之集成Swagger
查看>>
URAL 1721 Two Sides of the Same Coin(二分图匹配,输出匹配对象)
查看>>
【转】iOS实时卡顿监控
查看>>
XCode中安装cocoapods步骤
查看>>
iOS 汉字转拼音
查看>>
动态矩阵控制 MATLAB代码
查看>>
《c程序设计语言》读书笔记-3.4-数字转字符串
查看>>
Pig的安装和简单使用
查看>>
三个获取浏览器URL中参数值的方法
查看>>