Being able to define the ranges in a manner similar to excel, i.e. ‘A5:B10’ is important to what I need so reading the entire sheet to a dataframe isn’t very useful.
So what I need to do is read the values from multiple ranges in the Excel sheet to multiple different dataframes.
valuerange1 = ['a5:b10'] valuerange2 = ['z10:z20'] df = pd.DataFrame(values from valuerange) df = pd.DataFrame(values from valuerange1)
or
df = pd.DataFrame(values from ['A5:B10'])
I have searched but either I have done a very poor job of searching or everyone else has gotten around this problem but I really can’t.
Thanks.
Advertisement
Answer
Using openpyxl
Since you have indicated, that you are looking into a very user friendly way to specify the range (like the excel-syntax) and as Charlie Clark already suggested, you can use openpyxl.
The following utility function takes a workbook and a column/row range and returns a pandas DataFrame:
from openpyxl import load_workbook from openpyxl.utils import get_column_interval import re def load_workbook_range(range_string, ws): col_start, col_end = re.findall("[A-Z]+", range_string) data_rows = [] for row in ws[range_string]: data_rows.append([cell.value for cell in row]) return pd.DataFrame(data_rows, columns=get_column_interval(col_start, col_end))
Usage:
wb = load_workbook(filename='excel-sheet.xlsx', read_only=True) ws = wb.active load_workbook_range('B1:C2', ws)
Output:
B C 0 5 6 1 8 9
Pandas only Solution
Given the following data in an excel sheet:
A B C 0 1 2 3 1 4 5 6 2 7 8 9 3 10 11 12
You can load it with the following command:
pd.read_excel('excel-sheet.xlsx')
If you were to limit the data being read, the pandas.read_excel method offers a number of options. Use the parse_cols
, skiprows
and skip_footer
to select the specific subset that you want to load:
pd.read_excel( 'excel-sheet.xlsx', # name of excel sheet names=['B','C'], # new column header skiprows=range(0,1), # list of rows you want to omit at the beginning skip_footer=1, # number of rows you want to skip at the end parse_cols='B:C' # columns to parse (note the excel-like syntax) )
Output:
B C 0 5 6 1 8 9
Some notes:
The API of the read_excel method is not meant to support more complex selections. In case you require a complex filter it is much easier (and cleaner) to load the whole data into a DataFrame and use the excellent slicing and indexing mechanisms provided by pandas.