I have a DataFrame like the one below, and I want to add another column whose value is repeated until a certain condition is met.
sample_df = pd.DataFrame(data={
    'id': ['A', 'B', 'C'],
    'n': [1, 2, 3],
    'v': [10, 13, 8],
    'z': [5, 3, 6],
    'g': [8, 8, 10]
})
additional_rows=
Now I want to add another column that contains additional information about the DataFrame. For instance, I want to repeat 'Yes' until id is B, 'No' for the rows below B up to D, and 'Maybe' from D to E.
The output I am expecting is as follows:
sample_df = pd.DataFrame(data={
    'id': ['A', 'B', 'C', 'G', 'D', 'E'],
    'n': [1, 2, 3, 5, 5, 9],
    'v': [10, 13, 8, 8, 4, 3],
    'z': [5, 3, 6, 9, 9, 8],
    'New Info': ['Yes', 'Yes', 'No', 'No', 'Maybe', 'Maybe']
})
sample_df

  id  n   v  z New Info
0  A  1  10  5      Yes
1  B  2  13  3      Yes
2  C  3   8  6       No
3  G  5   8  9       No
4  D  5   4  9    Maybe
5  E  9   3  8    Maybe
How can I achieve this in Python?
Answer
You can use np.select to return results based on a list of conditions; it checks them in order and takes the first one that matches. Since you were talking more about positional conditions, I used sample_df.index:
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(data={
    'id': ['A', 'B', 'C', 'G', 'D', 'E'],
    'n': [1, 2, 3, 5, 5, 9],
    'v': [10, 13, 8, 8, 4, 3],
    'z': [5, 3, 6, 9, 9, 8]
})

# Rows 0-1 get 'Yes', rows 2-3 get 'No', all remaining rows get 'Maybe'.
sample_df['New Info'] = np.select(
    [sample_df.index < 2, sample_df.index < 4],
    ['Yes', 'No'],
    'Maybe'
)
sample_df

Out[1]:
  id  n   v  z New Info
0  A  1  10  5      Yes
1  B  2  13  3      Yes
2  C  3   8  6       No
3  G  5   8  9       No
4  D  5   4  9    Maybe
5  E  9   3  8    Maybe
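If you would rather key the labels on the id values than on row positions, one possible alternative (not part of the approach above) is to mark the id where each label starts and forward-fill the gaps. This is only a sketch: the starts dictionary is a hypothetical helper I made up for illustration, and it assumes the rows are already in the order you want and that the first row's id appears in the mapping.

```python
import pandas as pd

sample_df = pd.DataFrame(data={
    'id': ['A', 'B', 'C', 'G', 'D', 'E'],
    'n': [1, 2, 3, 5, 5, 9],
    'v': [10, 13, 8, 8, 4, 3],
    'z': [5, 3, 6, 9, 9, 8]
})

# Hypothetical mapping (an assumption for this sketch):
# the id at which each label begins.
starts = {'A': 'Yes', 'C': 'No', 'D': 'Maybe'}

# map() leaves NaN for ids that do not start a new block;
# ffill() then repeats the most recent label down the column.
sample_df['New Info'] = sample_df['id'].map(starts).ffill()
```

With the sample data this gives the same 'New Info' column as the np.select version. The difference is that the block boundaries follow the ids, so they stay correct if rows are added inside a block.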