After doing some research, I understand that pandas can give better performance than openpyxl. I’m trying to convert this code from openpyxl to pandas:
```python
def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
    sheet = self.workbook["VendorInfo"]
    for row in sheet.iter_rows(min_row=2):
        if row[0].value == sf_id:
            res.address = self.check_data(row[1].value, res.address)
            if self.check_data(row[2].value) and self.check_data(row[3].value):
                res.loc = Point(row[3].value, row[2].value)
            res.phone = self.check_data(row[4].value, res.manager_phone)
            logo = self.download_file(row[5].value)
            cover = self.download_file(row[6].value)
            if logo:
                res.logo.save(self.check_data(row[5].value), logo, save=False)
            if cover:
                res.cover.save(self.check_data(row[6].value), cover, save=False)
    return RestaurantSchema(**res.__dict__)
```
Can anyone suggest how to convert this code?
```python
df = pd.read_excel(self.path, index_col=None)
print(f'>>>{df}')
```
Answer
Here is the equivalent code using pandas:
```python
import pandas as pd


def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
    sheet = self.workbook["VendorInfo"]
    # Build a DataFrame from the sheet's cell values
    df = pd.DataFrame(sheet.values)
    # df[1:] skips the header row
    for i, row in df[1:].iterrows():
        if row[0] == sf_id:
            res.address = self.check_data(row[1], res.address)
            if self.check_data(row[2]) and self.check_data(row[3]):
                res.loc = Point(row[3], row[2])
            res.phone = self.check_data(row[4], res.manager_phone)
            logo = self.download_file(row[5])
            cover = self.download_file(row[6])
            if logo:
                res.logo.save(self.check_data(row[5]), logo, save=False)
            if cover:
                res.cover.save(self.check_data(row[6]), cover, save=False)
    return RestaurantSchema(**res.__dict__)
```
This code builds a dataframe from the values of the “VendorInfo” sheet, which is still loaded through openpyxl, and then uses the iterrows() method to iterate over the rows, skipping the header row with df[1:]. The rest of the code is mostly the same as the original, except that it uses the [] operator to access the values in each row instead of the value attribute.
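If speed is the goal, iterating row by row with iterrows() is unlikely to gain much over openpyxl. A common alternative is to select the matching rows with a boolean mask and loop only over those. Below is a minimal sketch of that idea, not a drop-in replacement: the column names and the helper `find_vendor_rows` are hypothetical (the real sheet's headers are not shown in the question), and a small in-memory dataframe stands in for the sheet so the example runs on its own.

```python
import pandas as pd

# Hypothetical column names; the real "VendorInfo" headers are not shown in the question.
COLUMNS = ["sf_id", "address", "lat", "lng", "phone", "logo", "cover"]


def find_vendor_rows(df: pd.DataFrame, sf_id: int) -> pd.DataFrame:
    """Return the rows whose first column equals sf_id, using a boolean mask."""
    return df[df.iloc[:, 0] == sf_id]


# In the real method the frame would presumably be loaded with something like:
#   df = pd.read_excel(self.path, sheet_name="VendorInfo")
# Here sample data stands in for the sheet (assumption, for illustration only).
df = pd.DataFrame(
    [
        [101, "1 Main St", 35.7, 51.4, "555-0100", "a.png", "b.png"],
        [102, "2 Side St", 35.8, 51.5, "555-0101", None, None],
    ],
    columns=COLUMNS,
)

# Only the matching rows are visited, instead of every row in the sheet.
matches = find_vendor_rows(df, 102)
for row in matches.itertuples(index=False):
    print(row.address, row.lat, row.lng)
```

Inside the loop you would apply the same check_data, Point and download_file calls as in the original method, reading the values from `row` by attribute name.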
I hope this helps!