Best language to parse extremely large Excel 2007 files

calendar_today Asked Aug 24, 2010
thumb_up 5 upvotes
history Updated April 14, 2026

Direct Answer

I kept getting all kinds of weird errors when working with .xlsx files. Here's a simple example of using Apache POI to traverse an .xlsx file. See also Upgrading to POI 3.5…. This is a 56-line Excel VBA snippet, ranked #259th of 303 by community upvote score, from 2010.


The Problem (Q-score 6, ranked #259th of 303 in the Excel VBA archive)

The scenario as originally posted in 2010

My boss has a habit of performing queries on our databases that return tens of thousands of rows and saving them into excel files. I, being the intern, constantly have to write scripts that work with the information from these files. Thus far I’ve tried VBScript and Powershell for my scripting needs. Both of these can take several minutes to perform even the simplest of tasks, which would mean that the script when finished would take most of an 8 hour day.

My workaround right now is simply to write a PowerShell script that removes all of the commas and newline characters from an xlsx file, saves the .xlsx files to .csv, and then have a Java program handle the data gathering and output, and have my script clean up the .csv files when finished. This runs in a matter of seconds for my current project, but I can’t help but wonder if there’s a more elegant alternative for my next one. Any suggestions?

Why this Range / Worksheet targeting trips people up

The question centers on reaching a specific cell, range, or workbook object. In Excel VBA, this is the #1 source of failures after activation events: every property (.Value, .Formula, .Address) behaves differently depending on whether the parent Workbook is explicit or implicit.


The Verified Solution — niche answer (below median) (+5)

56-line Excel VBA pattern (copy-ready)

I kept getting all kinds of weird errors when working with .xlsx files.

Here’s a simple example of using Apache POI to traverse an .xlsx file. See also Upgrading to POI 3.5, including converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF).

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxReader {

    public static void main(String[] args) throws IOException {
        InputStream myxls = new FileInputStream("test.xlsx");
        Workbook book = new XSSFWorkbook(myxls);
        FormulaEvaluator eval =
            book.getCreationHelper().createFormulaEvaluator();
        Sheet sheet = book.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                printCell(cell, eval);
                System.out.print("; ");
            }
            System.out.println();
        }
        myxls.close();
    }

    private static void printCell(Cell cell, FormulaEvaluator eval) {
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_BLANK:
                System.out.print("EMPTY");
                break;
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                if (DateUtil.isCellDateFormatted(cell)) {
                    System.out.print(cell.getDateCellValue());
                } else {
                    System.out.print(cell.getNumericCellValue());
                }
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_FORMULA:
                System.out.print(cell.getCellFormula());
                break;
            default:
                System.out.print("DEFAULT");
        }
    }
}


When to Use It — vintage (14+ years old, pre-2013)

Ranked #259th in its category — specialized fit

This pattern sits in the 99% tail relative to the top answer. Reach for it when your scenario closely matches the question title; otherwise browse the Excel VBA archive for a higher-consensus alternative.

What changed between 2010 and 2026

The answer is 16 years old. The Excel VBA object model has been stable across Office 2013, 2016, 2019, 2021, 365, and 2024/2026 LTSC, so the pattern still compiles. Changes that might affect you: 64-bit API declarations (use PtrSafe), blocked macros in downloaded files (Mark-of-the-Web), and the shift toward Office Scripts for web-first workflows.

help
Frequently Asked Questions

This is a below-median answer — when does it still fit?
expand_more

Answer score +5 vs the Excel VBA archive median ~4; this entry is niche. The score plus 6 supporting upvotes on the question itself (+6) means the asker and 4 subsequent voters all validated the approach.

Does the 56-line snippet run as-is in Office 2026?
expand_more

Yes. The 56-line pattern compiles on Office 365, Office 2024, and Office LTSC 2026. Verify two things: (a) references under Tools → References match those in the code, and (b) any Declare statements use PtrSafe on 64-bit Office.

This answer is 16 years old. Is it still relevant in 2026?
expand_more

Published 2010, which is 16 year(s) before today’s Office 2026 build. The Excel VBA object model has had no breaking changes in that window. Three things to re-test: (1) blocked macros on downloaded files (Mark-of-the-Web), (2) 64-bit API declarations (PtrSafe, LongPtr), (3) any shift toward Office Scripts for web scenarios.

Which Excel VBA pattern ranks just above this one at #258?
expand_more

The pattern one rank above is “`IF` statement with 3 possible answers each based on 3 different ranges”. If your use case overlaps, compare both before committing.

Data source: Community-verified Q&A snapshot. Q-score 6, Answer-score 5, original post 2010, ranked #259th of 303 in the Excel VBA archive. Last regenerated April 14, 2026.