I have a column of arrays made of numbers, e.g. [0,80,160,220], and would like to create a column of arrays of the differences between adjacent terms, e.g. [80,80,60].
Does anyone have an idea how to approach this in Python or PySpark? I’m thinking of something iterative (the i-th term minus the (i-1)-th term, starting at the second term), but I’m really stuck on how to code that. Thanks!
Answer
Edit:
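import pandas as pd

d = [0, 80, 160, 220]
df = pd.DataFrame(d, columns=['col_list'])
# diff() subtracts the previous row's value; the first row has no predecessor, so it is NaN
df['col_new'] = df['col_list'].diff()
print(df)

# output
#    col_list  col_new
# 0         0      NaN
# 1        80     80.0
# 2       160     80.0
# 3       220     60.0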
Also, if you want to drop the row containing NaN, you can do:
# dropna() returns a new DataFrame; df itself is left unchanged
print(df.dropna(subset=['col_new']))

# output
#    col_list  col_new
# 1        80     80.0
# 2       160     80.0
# 3       220     60.0
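Note that the snippets above treat the numbers as one flat column. If each cell of the column really holds a whole array, as the question describes, one possible approach is to compute the differences per array. Here is a minimal sketch in plain Python; the function name `adjacent_differences` and the sample `column` data are made up for illustration and are not from the original post.

```python
def adjacent_differences(values):
    """Return the differences between each element and the one before it."""
    # zip pairs every element with its successor: (v0, v1), (v1, v2), ...
    # An input with fewer than two elements yields an empty list.
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical column of arrays, modelled here as a list of lists.
column = [[0, 80, 160, 220], [5, 7, 12]]

# Apply the helper to every array in the column.
new_column = [adjacent_differences(row) for row in column]
print(new_column)  # [[80, 80, 60], [2, 5]]
```

With pandas, the same helper could probably be applied to a column of lists with something like df['col_new'] = df['col_list'].apply(adjacent_differences), and in PySpark it could likely be wrapped in a UDF that returns an array type. Neither variant has been tested here.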