Tag: python

Problem with pd.wide_to_long specifications

I have a dataframe with the columns id, xx_04-Feb-94, yyy_04-Feb-94, z_04-Feb-94, xx_22-Mar-94, yyy_22-Mar-94 and z_22-Mar-94, and rows for the ids 123, 456 and 789, with values inside the table filled out. I would like to pivot the data from wide to long. The desired output has the columns id, date, xx, yyy and z, and looks as follows: 123 04-Feb-94 123 22-M…
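For readers landing on this excerpt, here is a minimal sketch of one way such a reshape can be done with `pd.wide_to_long`. The cell values are made up for illustration (the excerpt does not show them), and the choice of `sep="_"` with `suffix=r".+"` is an assumption based on the column names quoted above, not necessarily what the original answer used.

```python
import pandas as pd

# Illustrative data only: the column names follow the question,
# the numbers are invented.
df = pd.DataFrame({
    "id": [123, 456, 789],
    "xx_04-Feb-94": [1, 2, 3],
    "yyy_04-Feb-94": [4, 5, 6],
    "z_04-Feb-94": [7, 8, 9],
    "xx_22-Mar-94": [10, 20, 30],
    "yyy_22-Mar-94": [40, 50, 60],
    "z_22-Mar-94": [70, 80, 90],
})

# The part before "_" is the stub name, the part after it becomes the
# "date" column. The default suffix pattern only matches digits, so a
# broader pattern is needed for suffixes such as "04-Feb-94".
long_df = (
    pd.wide_to_long(
        df,
        stubnames=["xx", "yyy", "z"],
        i="id",
        j="date",
        sep="_",
        suffix=r".+",
    )
    .reset_index()
    .sort_values(["id", "date"])
    .reset_index(drop=True)
)

print(long_df)
```

The `date` column stays a string here; `pd.to_datetime(long_df["date"], format="%d-%b-%y")` could convert it if real dates are needed.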

Loop through files in folder, read and group files

I am trying to read a CSV file, resample it, and save it under a different name. So far I got this: But I get an error due to the syntax being wrong. Any ideas? Answer One error I could see is that you use backslashes (“\”) in your path. The backslash serves as the escape character in Python strings, which mea…
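The backslash point in the answer can be illustrated with a short sketch. The path `C:\new_data\table.csv` and the `_resampled` suffix are hypothetical, chosen only because `\n` and `\t` happen to be valid escape sequences; the question's actual path and code are not shown in the excerpt.

```python
from pathlib import PureWindowsPath

# In a normal string literal, "\n" and "\t" are read as newline and tab,
# so this is not the path it appears to be.
broken = "C:\new_data\table.csv"

# Three common ways to write the intended path (hypothetical example path):
raw = r"C:\new_data\table.csv"        # raw string: backslashes kept literally
escaped = "C:\\new_data\\table.csv"   # each backslash escaped
forward = "C:/new_data/table.csv"     # forward slashes also work on Windows

# pathlib treats both separators the same for Windows paths and makes it
# easy to derive a new file name for the saved output.
source = PureWindowsPath(forward)
target = source.with_name(source.stem + "_resampled" + source.suffix)

print(repr(broken))
print(target)
```

When looping over a folder, `pathlib.Path(folder).glob("*.csv")` yields path objects that can be passed straight to `pd.read_csv`, which avoids building path strings by hand.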

Caching a PySpark Dataframe

Suppose we have a PySpark dataframe df with ~10M rows, and let its columns be [col_a, col_b]. Which would be faster: or Would caching df_test make sense here? Answer It won’t make much difference. It is just one loop, where you can skip the cache, like below. Here Spark loads the data into memory once. If you …