Is it possible to get an Excel document’s row count without loading the entire document into memory?

I’m working on an application that processes huge Excel 2007 files, and I’m using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file – one “normal” method where the entire document is loaded into memory at once, and one method where iterators are used to read row-by-row.

The problem is that when I’m using the iterator method, I don’t get any document metadata like column widths and row/column count, and I really need this data. I assume this data is stored near the top of the Excel document, so it shouldn’t be necessary to load the whole 10MB file into memory to get access to it.

So, is there a way to get ahold of the row/column count and column widths without loading the entire document into memory first?

Answer

Adding on to what Hubro said: get_highest_row() has been deprecated. Use the max_row and max_column properties instead; they return the row and column count. For example:

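    from openpyxl import load_workbook

    # read_only=True replaces the older use_iterators=True flag
    wb = load_workbook(path, read_only=True)
    sheet = wb.worksheets[0]

    # Both may be None in read-only mode if the file omits its dimensions
    row_count = sheet.max_row
    column_count = sheet.max_column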
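
The question also asks about column widths, which the properties above don't cover. As a rough sketch that doesn't use OpenPyXL at all: an .xlsx file is a zip archive, and in each worksheet's XML the optional `dimension` and `cols` elements come before `sheetData`, so they can be read with a streaming parser that stops before any cell rows. The function name `read_sheet_header` and the default part name `xl/worksheets/sheet1.xml` are assumptions for illustration; the real part name for a given sheet is listed in the workbook's relationship files.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by .xlsx worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_sheet_header(xlsx_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return (dimension_ref, col_widths) without parsing any cell data.

    sheet_part is an assumed default, not something every workbook uses.
    """
    dimension = None   # e.g. "A1:C10", or None if the element is absent
    widths = {}        # (first_col, last_col) -> width, 1-based columns
    with zipfile.ZipFile(xlsx_file) as archive:
        with archive.open(sheet_part) as stream:
            # "start" events fire as soon as the opening tag is read,
            # so attributes are available without reading child content.
            for _event, elem in ET.iterparse(stream, events=("start",)):
                if elem.tag == NS + "dimension":
                    dimension = elem.get("ref")
                elif elem.tag == NS + "col":
                    width = elem.get("width")
                    if width is not None:
                        span = (int(elem.get("min")), int(elem.get("max")))
                        widths[span] = float(width)
                elif elem.tag == NS + "sheetData":
                    break  # cell rows start here; stop before reading them
    return dimension, widths
```

The returned reference is a range such as `A1:C10`, whose end cell gives the last row and column. Both elements are optional and are only as accurate as the application that wrote the file, so treat a missing or suspicious value as "unknown" rather than as an empty sheet.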

User contributions licensed under: CC BY-SA