I am trying to import a file from xlsx into a Python Pandas dataframe. I would like to prevent fields/columns being interpreted as integers and thus losing leading zeros or other desired heterogenous formatting.
So for an Excel sheet with 100 columns, I would do the following using a dict comprehension with range(99).
JavaScript
x
7
1
import pandas as pd
2
filename = 'C:DemoFile.xlsx'
3
4
fields = {col: str for col in range(99)}
5
6
df = pd.read_excel(filename, sheetname=0, converters=fields)
7
These import files do have a varying number of columns all the time, and I am looking to handle this differently than changing the range manually all the time.
Does somebody have any further suggestions or alternatives for reading Excel files into a dataframe and treating all fields as strings by default?
Many thanks!
Advertisement
Answer
Try this:
JavaScript
1
4
1
xl = pd.ExcelFile(r'C:DemoFile.xlsx')
2
ncols = xl.book.sheet_by_index(0).ncols
3
df = xl.parse(0, converters={i : str for i in range(ncols)})
4
UPDATE:
JavaScript
1
6
1
In [261]: type(xl)
2
Out[261]: pandas.io.excel.ExcelFile
3
4
In [262]: type(xl.book)
5
Out[262]: xlrd.book.Book
6