I have been trying to create an Excel file with several sheets from Delta tables; however, some of my column names include _x10e6_, which is apparently translated to ღ. I have tried using encoding='unicode_escape' and encoding='utf-8' without luck. I cannot use xlsxwriter because I am appending to an existing file.
Does anybody know how I can keep _x10e6_ in the column names?
A part of the code I am using:
goldDF = (spark.read.format('csv')
    .option('header', True)
    .option('delimiter', ',')
    .load(file_path))
pandaDF = goldDF.toPandas()
with pd.ExcelWriter('/tmp/output.xlsx', mode='a', engine='openpyxl') as writer:
    pandaDF.to_excel(writer, sheet_name=name, index=False)
dbutils.fs.cp("file:/tmp/output.xlsx", "/mnt/delta_lake_path")
Answer
The issue is that _x10e6_ is an XML escape code for the code point 0x10E6, which equates to the Unicode character "ღ". To store that text literally, Excel would escape it as _x005F_x10e6_. It looks like openpyxl doesn't do this, so you could do it manually by using the string _x005F_x10e6_ in your column names, or (for anyone reading this who isn't constrained by the append-mode requirement) switch the engine to xlsxwriter, which will escape it automatically.
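One way to do the manual escaping is a small helper that protects any literal _xHHHH_ sequence before writing. This is a sketch, not part of pandas or openpyxl; escape_ooxml is a hypothetical name, and it assumes the problematic names match the four-hex-digit _xHHHH_ pattern:

```python
import re

def escape_ooxml(name: str) -> str:
    """Escape any literal "_xHHHH_" sequence so Excel keeps the text as-is.

    OOXML treats "_xHHHH_" as a character escape; prefixing the underscore
    with "_x005F_" tells Excel to store the following text literally.
    """
    return re.sub(r"_(x[0-9A-Fa-f]{4}_)", r"_x005F_\1", name)

# Applied to the DataFrame from the question before writing, e.g.:
# pandaDF.columns = [escape_ooxml(c) for c in pandaDF.columns]
```

Names without an _xHHHH_ pattern pass through unchanged, so it should be safe to run over every column.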