I have a dataframe with some NaN values in my s_x
columns. If NaN
values exist in them, I want them to be in the last columns.
Example: Given values in the s_x
columns of [Nan, 1, Nan, 2]
I want the values to shift left over the columns to result in [1, 2, NaN, NaN]
Example 2:
My current solution is very slow as I:
- iterate over the rows
- transform the
s_x
values into a list - remove
NaN
values - left-pad the list with
NaN
values - write it back into the dataframe
How can I improve on the function below? The order of values (low to high) needs to remain the same. Every value is found only once in the s_x
columns of a row.
I know that “leaving the pandas-logic” by parsing to a list and back is problematic concerning performance and was thinking of trying to do it with a lambda function, but didn’t get anywhere with it.
My current code as a minimal working example:
import pandas as pd import numpy as np def shift_values(df, leading_chars): """Shifts all values in columns with common leading chars to the left if there are NaN values. Example: Given a row of [NaN, 1, NaN, 2] the values are shifted to [1, 2, NaN, NaN] """ cols = [c for c in list(df.columns) if c[:len(leading_chars)] == leading_chars] for index, row in df.iterrows(): # create list without NaN values values = [v for v in row[cols] if not pd.isna(v)] # pad with NaN to get correct number of values again values += [np.nan] * (len(cols) - len(values)) # overwrite row values with modified list for i, c in enumerate(cols): row[c] = values[i] # overwrite row in the dataframe df.iloc[index] = row return df mylist = [["key", "s_1", "s_2", "s_3", "s_4"], [1, np.nan, 1, 2, np.nan], [1, 10, 20, 25, np.nan], [1, 10, np.nan, 25, np.nan] ] df = pd.DataFrame(mylist[1:], columns=mylist[0]) print("______ PREVIOUS ______") print(df.head()) df = shift_values(df, 's_') print("______ RESULT ______") print(df.head())
Advertisement
Answer
Try:
df = df.transform(sorted, key=pd.isna, axis=1) print(df)
Prints:
key s_1 s_2 s_3 s_4 0 1.0 1.0 2.0 NaN NaN 1 1.0 10.0 20.0 25.0 NaN 2 1.0 10.0 25.0 NaN NaN
EDIT: If columns are not next to each other:
x = df.filter(regex=r"^s_") df.loc[:, x.columns] = df.loc[:, x.columns].transform( sorted, key=pd.isna, axis=1 ) print(df)