I have a dataframe as follows:
Index A B C D E F 1 0 0 C 0 E 0 2 A 0 0 0 0 F 3 0 0 0 0 E 0 4 0 0 C D 0 0 5 A B 0 0 0 0
Basically I would like to write the dataframe to a txt file, such that every row consists of the index and the subsequent column name only, excluding the zeroes.
For example:
txt file 1 C E 2 A F 3 E 4 C D 5 A B
The dataset is quite big, about 1k rows, 16k columns. Is there any way I can do this using a function in Pandas?
Advertisement
Answer
Take a matrix vector multiplication between the boolean matrix generated by “is this entry "0"
or not” and the columns of the dataframe, and write it to a text file with to_csv
(thanks to @Andreas’ answer!):
df.ne("0").dot(df.columns + " ").str.rstrip().to_csv("text_file.txt")
where we right strip the spaces at the end due to the added " "
to the last entries.
If you don’t want the name Index
appearing in the text file, you can chain a rename_axis(index=None)
to get rid of it i.e.,
df.ne("0").dot(df.columns + " ").str.rstrip().rename_axis(index=None)
and then to_csv
as above.