I have a lot of different table (and other unstructured data in an excel sheet) .. I need to create a dataframe out of range ‘A3:D20’ from ‘Sheet2’ of Excel sheet ‘data’.
All examples that I come across drilldown up to sheet level, but not how to pick it from an exact range.
import openpyxl import pandas as pd wb = openpyxl.load_workbook('data.xlsx') sheet = wb.get_sheet_by_name('Sheet2') range = ['A3':'D20'] #<-- how to specify this? spots = pd.DataFrame(sheet.range) #what should be the exact syntax for this? print (spots)
Once I get this, I plan to look up data in column A and find its corresponding value in column B.
Edit 1: I realised that openpyxl takes too long, and so have changed that to pandas.read_excel('data.xlsx','Sheet2')
instead, and it is much faster at that stage at least.
Edit 2: For the time being, I have put my data in just one sheet and:
- removed all other info
- added column names,
- applied
index_col
on my leftmost column - then used
wb.loc[]
Advertisement
Answer
One way to do this is to use the openpyxl module.
Here’s an example:
from openpyxl import load_workbook wb = load_workbook(filename='data.xlsx', read_only=True) ws = wb['Sheet2'] # Read the cell values into a list of lists data_rows = [] for row in ws['A3':'D20']: data_cols = [] for cell in row: data_cols.append(cell.value) data_rows.append(data_cols) # Transform into dataframe import pandas as pd df = pd.DataFrame(data_rows)