How can I read a range ('A5:B10') and place these values into a dataframe using openpyxl?

Question

Being able to define ranges in a manner similar to Excel, e.g. 'A5:B10', is important for what I need, so reading the entire sheet into a dataframe isn't very useful. What I need to do is read the values from multiple ranges in the Excel sheet into multiple different dataframes. or I have searched but either I have

Accepted Answer

Using openpyxl

Since you have indicated that you are looking for a user-friendly way to specify the range (like the Excel syntax), and as Charlie Clark already suggested, you can use openpyxl. The following utility function takes a worksheet and a column/row range and returns a pandas DataFrame:

    import re

    import pandas as pd
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_interval

    def load_workbook_range(range_string, ws):
        # Extract the start and end column letters, e.g. 'B' and 'C' from 'B1:C2'
        col_start, col_end = re.findall("[A-Z]+", range_string)

        # Collect the cell values of the range, row by row
        data_rows = []
        for row in ws[range_string]:
            data_rows.append([cell.value for cell in row])

        # Label the columns with their Excel column letters
        return pd.DataFrame(data_rows, columns=get_column_interval(col_start, col_end))

Usage:

    wb = load_workbook(filename='excel-sheet.xlsx', read_only=True)
    ws = wb.active
    load_workbook_range('B1:C2', ws)

Output:

       B  C
    0  5  6
    1  8  9

Pandas only Solution

Given the following data in an Excel sheet:

        A   B   C
    0   1   2   3
    1   4   5   6
    2   7   8   9
    3  10  11  12

You can load it with the following command:

    pd.read_excel('excel-sheet.xlsx')

If you want to limit the data being read, the pandas.read_excel method offers a number of options. Use usecols, skiprows and skipfooter (named parse_cols and skip_footer in older pandas releases) to select the specific subset that you want to load:

    pd.read_excel(
        'excel-sheet.xlsx',    # name of the Excel file
        names=['B', 'C'],      # new column header
        skiprows=range(0, 1),  # list of rows you want to omit at the beginning
        skipfooter=1,          # number of rows you want to skip at the end
        usecols='B:C'          # columns to parse (note the Excel-like syntax)
    )

Output:

       B  C
    0  5  6
    1  8  9

Some notes:

The API of the read_excel method is not meant to support more complex selections. If you require a complex filter, it is much easier (and cleaner) to load the whole data into a DataFrame and use the excellent slicing and indexing mechanisms provided by pandas.
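Since the question asks for several ranges going into several dataframes, here is a minimal sketch of how the openpyxl approach might be extended. The helper name `read_ranges` is hypothetical (it is not part of the answer above or of openpyxl), and the sketch assumes openpyxl 2.6 or newer so that `iter_rows(..., values_only=True)` is available. It builds a small in-memory workbook in place of 'excel-sheet.xlsx' so that it runs without a file on disk.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, range_boundaries


def read_ranges(ws, range_strings):
    """Return a dict mapping each Excel-style range string to a DataFrame.

    Hypothetical helper for illustration only.
    """
    frames = {}
    for range_string in range_strings:
        # range_boundaries('B2:C3') returns (min_col, min_row, max_col, max_row)
        min_col, min_row, max_col, max_row = range_boundaries(range_string)
        rows = ws.iter_rows(
            min_row=min_row, max_row=max_row,
            min_col=min_col, max_col=max_col,
            values_only=True,  # yield plain values instead of cell objects
        )
        # Label the columns with their Excel column letters
        columns = [get_column_letter(i) for i in range(min_col, max_col + 1)]
        frames[range_string] = pd.DataFrame(list(rows), columns=columns)
    return frames


# Assumed sample data: a 4x3 grid of numbers starting at cell A1.
wb = Workbook()
ws = wb.active
for row in [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]:
    ws.append(row)

frames = read_ranges(ws, ["B2:C3", "A1:A4"])
print(frames["B2:C3"])
```

With a real file, `ws` would come from `load_workbook(...)` as in the answer. Returning a dict keyed by the range string is just one design choice; a list in the same order as the input would work equally well.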
