I have a lot of different tables (and other unstructured data) in an Excel sheet. I need to create a DataFrame from the range ‘A3:D20’ of ‘Sheet2’ in the Excel workbook ‘data’.
All the examples I come across drill down only to the sheet level; none show how to pick out an exact range.
```python
import openpyxl
import pandas as pd

wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']  #<-- how to specify this?
spots = pd.DataFrame(sheet.range)  #what should be the exact syntax for this?

print(spots)
```
Once I get this, I plan to look up data in column A and find its corresponding value in column B.
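For the lookup step, one option is to build a plain dictionary from the two columns. This is a minimal sketch: the rows below are hypothetical stand-in data, and it assumes the range was read without a header, so pandas labels the columns 0..3 (0 is column A, 1 is column B). If a key appears more than once in column A, the last occurrence wins.

```python
import pandas as pd

# Hypothetical stand-in for the A3:D20 block, read with header=None,
# so the columns are labelled 0..3 (0 = column A, 1 = column B).
df = pd.DataFrame(
    [["apple", 1.2, "x", "y"],
     ["banana", 0.5, "x", "y"],
     ["cherry", 3.0, "x", "y"]]
)

# Map each value in column A to the value in column B on the same row.
a_to_b = dict(zip(df[0], df[1]))

print(a_to_b["banana"])
```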
Edit 1: I realised that openpyxl takes too long, so I have switched to `pandas.read_excel('data.xlsx', 'Sheet2')` instead, and that step at least is much faster.
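With `pandas.read_excel`, the range can be selected at read time: `usecols="A:D"` picks columns A to D, `skiprows=2` drops sheet rows 1 and 2, and `nrows=18` then reads rows 3 through 20. This is a sketch under stated assumptions: the workbook it writes is hypothetical sample data so the snippet runs on its own, `header=None` assumes A3:D20 has no header row, and reading `.xlsx` files this way needs openpyxl installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

# Build a small hypothetical stand-in for data.xlsx so the sketch runs on
# its own; with the real file, skip this part and pass its path instead.
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet2"
for r in range(1, 25):
    for c in range(1, 7):
        ws.cell(row=r, column=c, value=f"R{r}C{c}")
wb.save(path)

# A3:D20 -> columns A..D, skip sheet rows 1-2, then read 18 rows (3..20).
df = pd.read_excel(
    path,
    sheet_name="Sheet2",
    header=None,   # assumption: the range has no header row
    usecols="A:D",
    skiprows=2,
    nrows=18,
)
print(df.shape)
```

If row 3 holds column names, dropping `header=None` makes pandas use that first row read as the header.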
Edit 2: For the time being, I have put my data in just one sheet and:
- removed all other info,
- added column names,
- applied `index_col` on my leftmost column,
- then used `wb.loc[]`.
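The `index_col` + `.loc` approach from Edit 2 can be sketched as follows. The column names and rows are hypothetical, and `set_index` is used here in place of passing `index_col=0` to `read_excel`, so the example needs no file; both make the leftmost column the row index.

```python
import pandas as pd

# Hypothetical data with column names, standing in for the cleaned-up sheet.
df = pd.DataFrame(
    {"Name": ["apple", "banana", "cherry"],
     "Price": [1.2, 0.5, 3.0],
     "Stock": [10, 0, 7]}
)

# Make the leftmost column the index, as index_col=0 would at read time.
df = df.set_index("Name")

price = df.loc["banana", "Price"]  # single cell: row label, column label
row = df.loc["cherry"]             # whole row as a Series
print(price, row["Stock"])
```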
Answer
One way to do this is to use the openpyxl module.
Here’s an example:
```python
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename='data.xlsx', read_only=True)
ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Read-only workbooks keep the file open until closed explicitly
wb.close()

# Transform into dataframe
df = pd.DataFrame(data_rows)
```
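A more compact variant of the same idea uses `Worksheet.iter_rows` with explicit bounds and `values_only=True` (available in recent openpyxl versions), which yields plain value tuples instead of cell objects. This sketch builds a hypothetical in-memory sheet so it runs without `data.xlsx`, and it assumes row 3 of the range holds column names; with the real file, use `load_workbook('data.xlsx', read_only=True)['Sheet2']` instead.

```python
import pandas as pd
from openpyxl import Workbook

# Hypothetical in-memory sheet standing in for Sheet2 of data.xlsx.
wb = Workbook()
ws = wb.active
ws.title = "Sheet2"
header = ["key", "value", "col_c", "col_d"]  # assumed header in row 3
for c, name in enumerate(header, start=1):
    ws.cell(row=3, column=c, value=name)
for r in range(4, 21):
    ws.cell(row=r, column=1, value=f"k{r}")
    ws.cell(row=r, column=2, value=r * 10)
    ws.cell(row=r, column=3, value="c")
    ws.cell(row=r, column=4, value="d")

# A3:D20 expressed as row/column bounds; values_only yields plain tuples.
rows = list(ws.iter_rows(min_row=3, max_row=20, min_col=1, max_col=4,
                         values_only=True))

# First tuple becomes the column names, the rest become the data.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.head())
```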