Skip to content
Advertisement

Python Pandas dataframe reading exact specified range in an excel sheet

I have a lot of different table (and other unstructured data in an excel sheet) .. I need to create a dataframe out of range ‘A3:D20’ from ‘Sheet2’ of Excel sheet ‘data’.

All examples that I come across drilldown up to sheet level, but not how to pick it from an exact range.

import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']   #<-- how to specify this?
spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this?

print (spots)

Once I get this, I plan to look up data in column A and find its corresponding value in column B.

Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2') instead, and it is much faster at that stage at least.

Edit 2: For the time being, I have put my data in just one sheet and:

  • removed all other info
  • added column names,
  • applied index_col on my leftmost column
  • then used wb.loc[]

Advertisement

Answer

One way to do this is to use the openpyxl module.

Here’s an example:

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', 
                   read_only=True)

ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Transform into dataframe
import pandas as pd
df = pd.DataFrame(data_rows)
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement