title: Poi导出产生OOM解决方案 tags:
- Poi
- Stream
- Excel
- 导出
- oom categories: ci date: 2017-11-30 22:53:39
背景
- 目前生产系统大量的数据的导出,情况不是特别理想,将常出现OOM
- 如果不限制程序最大内存可能导致其他应用收到影响 比如稽计任务
思路
考虑出现OOM是如下几个因素
- 查询大量数据 需要维护一个大的列表【如果出现大量的数据 超大的List】我们目前正常最大的list大概能达到接近10w
- 封装Excel的workBook 创建Sheet和Cell【基本上单行得有接近20个字段】通常的思路就是workbook==>sheet==》cell
从这边可以看出通常我们出现系统OOM的应该是在EXCEL导出出现的概率较高
可以在一个导出完成的系统中查看占比
可以看出我们的业务数据排在第16位 4w+ 大约占用15.3M 而对应的Excel基本上最大的一个类就占用了接近550M
可想而知目前最大的瓶颈在于优化Excel导出导致的OOM【如果限制XMX的话可能会频繁OOM 如果不限制的话可能会使得JVM急速膨胀】
那思路就会有如下
- 分解多个Excel 使用多个Excel来进行导出后进行合并 【比较麻烦 并且侵入性太高】打包压缩?
- 将Excel导出时写到临时文件中最终把临时文件发出去即可
Poi在新版本支持了我们所说的特性
首先了解一下office03和office07的区别
通常来说office03的拓展名是xls office07的拓展名是xlsx
xls一般支持最大行数为65535【单sheet】而在xlsx中支持达到1048576
当我们导出大量的报表的数据时建议采用xlsx的方式
而在Poi中实现xlsx中有两种方式
分别是
其中
- HSSFWorkbook是office03
- XSSFWorkbook是我们在office07总最常使用
- SXSSFWorkbook这个提供了OOM的解决方案 如上述所说会临时写到临时文件中
/** * Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy. * * This allows to write very large files without running out of memory as only * a configurable portion of the rows are kept in memory at any one time. * * You can provide a template workbook which is used as basis for the written * data. * * See https://poi.apache.org/spreadsheet/how-to.html#sxssf for details. * * Please note that there are still things that still may consume a large * amount of memory based on which features you are using, e.g. merged regions, * comments, ... are still only stored in memory and thus may require a lot of * memory if used extensively. * * SXSSFWorkbook defaults to using inline strings instead of a shared strings * table. This is very efficient, since no document content needs to be kept in * memory, but is also known to produce documents that are incompatible with * some clients. With shared strings enabled all unique strings in the document * has to be kept in memory. Depending on your document content this could use * a lot more resources than with shared strings disabled. * * Carefully review your memory budget and compatibility needs before deciding * whether to enable shared strings or not. */ public class SXSSFWorkbook implements Workbook { /** * Specifies how many rows can be accessed at most via { @link SXSSFSheet#getRow}. * When a new node is created via { @link SXSSFSheet#createRow} and the total number * of unflushed records would exceed the specified value, then the * row with the lowest index value is flushed and cannot be accessed * via { @link SXSSFSheet#getRow} anymore. */ public static final int DEFAULT_WINDOW_SIZE = 100; }复制代码
默认情况下提供100行的访问【换言之 访问窗口为100】
源码
我们确认一下代码中如何实现临时文件的建立
/** * Streaming version of XSSFSheet implementing the "BigGridDemo" strategy. */ public class SXSSFSheet implements Sheet { /*package*/ final XSSFSheet _sh; private final SXSSFWorkbook _workbook; private final TreeMap_rows=new TreeMap (); private final SheetDataWriter _writer; private int _randomAccessWindowSize = SXSSFWorkbook.DEFAULT_WINDOW_SIZE; private final AutoSizeColumnTracker _autoSizeColumnTracker; private int outlineLevelRow = 0; private int lastFlushedRowNumber = -1; private boolean allFlushed = false; public SXSSFSheet(SXSSFWorkbook workbook, XSSFSheet xSheet) throws IOException { _workbook = workbook; _sh = xSheet; _writer = workbook.createSheetDataWriter(); setRandomAccessWindowSize(_workbook.getRandomAccessWindowSize()); _autoSizeColumnTracker = new AutoSizeColumnTracker(this); }复制代码
创建Sheet的时候传入对应参数 【默认100】
首先在创建rows或进行判断
@Override public SXSSFRow createRow(int rownum) { int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex(); if (rownum < 0 || rownum > maxrow) { throw new IllegalArgumentException("Invalid row number (" + rownum + ") outside allowable range (0.." + maxrow + ")"); } // attempt to overwrite a row that is already flushed to disk if(rownum <= _writer.getLastFlushedRow() ) { throw new IllegalArgumentException( "Attempting to write a row["+rownum+"] " + "in the range [0," + _writer.getLastFlushedRow() + "] that is already written to disk."); } // attempt to overwrite a existing row in the input template if(_sh.getPhysicalNumberOfRows() > 0 && rownum <= _sh.getLastRowNum() ) { throw new IllegalArgumentException( "Attempting to write a row["+rownum+"] " + "in the range [0," + _sh.getLastRowNum() + "] that is already written to disk."); } SXSSFRow newRow=new SXSSFRow(this); _rows.put(rownum,newRow); allFlushed = false; if(_randomAccessWindowSize>=0&&_rows.size()>_randomAccessWindowSize) { try { flushRows(_randomAccessWindowSize); } catch (IOException ioe) { throw new RuntimeException(ioe); } } return newRow; } /** * Specifies how many rows can be accessed at most via getRow(). * The exeeding rows (if any) are flushed to the disk while rows * with lower index values are flushed first. */ public void flushRows(int remaining) throws IOException { while(_rows.size() > remaining) flushOneRow(); if (remaining == 0) allFlushed = true; }复制代码
当前行数超过阈值的时候将会将最上面一行刷到磁盘
private void flushOneRow() throws IOException { Integer firstRowNum = _rows.firstKey(); if (firstRowNum!=null) { int rowIndex = firstRowNum.intValue(); SXSSFRow row = _rows.get(firstRowNum); // Update the best fit column widths for auto-sizing just before the rows are flushed _autoSizeColumnTracker.updateColumnWidths(row); _writer.writeRow(rowIndex, row); _rows.remove(firstRowNum); lastFlushedRowNumber = rowIndex; } }复制代码
这边_writer负责将对应的数据刷到临时文件中
/** * Initially copied from BigGridDemo "SpreadsheetWriter". * Unlike the original code which wrote the entire document, * this class only writes the "sheetData" document fragment * so that it was renamed to "SheetDataWriter" */ public class SheetDataWriter { private static final POILogger logger = POILogFactory.getLogger(SheetDataWriter.class); private final File _fd; private final Writer _out; private int _rownum; private int _numberOfFlushedRows; private int _lowestIndexOfFlushedRows; // meaningful only of _numberOfFlushedRows>0 private int _numberOfCellsOfLastFlushedRow; // meaningful only of _numberOfFlushedRows>0 private int _numberLastFlushedRow = -1; // meaningful only of _numberOfFlushedRows>0 /** * Table of strings shared across this workbook. * If two cells contain the same string, then the cell value is the same index into SharedStringsTable */ private SharedStringsTable _sharedStringSource; public SheetDataWriter() throws IOException { _fd = createTempFile(); _out = createWriter(_fd); } public SheetDataWriter(SharedStringsTable sharedStringsTable) throws IOException{ this(); this._sharedStringSource = sharedStringsTable; } /** * Create a temp file to write sheet data. * By default, temp files are created in the default temporary-file directory * with a prefix "poi-sxssf-sheet" and suffix ".xml". Subclasses can override * it and specify a different temp directory or filename or suffix, e.g. .gz
* * @return temp file to write sheet data */ public File createTempFile() throws IOException { return TempFile.createTempFile("poi-sxssf-sheet", ".xml"); } }复制代码
很明显此处将会在Temp目录下创建一个临时文件
当数据写完的时候如下
public InputStream getWorksheetXMLInputStream() throws IOException { // flush all remaining data and close the temp file writer flushRows(0); _writer.close(); return _writer.getWorksheetXMLInputStream(); } public InputStream getWorksheetXMLInputStream() throws IOException { File fd = getTempFile(); return new FileInputStream(fd); }复制代码
做了一次导出 大约Excel 大小15m 共计4w行数据
这样再次导出时将会性能OK
经历多次导出后仍然显示如下
当执行GC后数据可以显著回收