I am trying to label rows based on the value before the decimal point. I have a data frame like the one below:
data  flow
1.5   parallel
1.2   parallel
1.3   parallel
2     sequence
2.5   parallel
2.4   parallel
2.8   parallel
3     sequence
3.2   parallel
3.1   parallel
3.5   parallel
4     sequence
4.1   parallel
4.5   parallel
4.3   parallel
1     sequence
5     sequence
6     sequence
Expected output,
data  flow
1.5   Parallel1
1.2   Parallel1
1.3   Parallel1
2     sequence
2.5   Parallel2
2.4   Parallel2
2.8   Parallel2
3     sequence
3.2   Parallel3
3.1   Parallel3
3.5   Parallel3
4     sequence
4.1   Parallel4
4.5   Parallel4
4.3   Parallel4
1     sequence
5     sequence
6     sequence
How can I achieve this using pandas…
Answer
If data is a string:
df.loc[df['flow'].ne('sequence'), 'flow'] += df['data'].str.extract(r'(\d+)', expand=False)
If it is a float:
df.loc[df['flow'].ne('sequence'), 'flow'] += df['data'].astype(int).astype(str)
Output:
    data       flow
0    1.5  parallel1
1    1.2  parallel1
2    1.3  parallel1
3    2.0   sequence
4    2.5  parallel2
5    2.4  parallel2
6    2.8  parallel2
7    3.0   sequence
8    3.2  parallel3
9    3.1  parallel3
10   3.5  parallel3
11   4.0   sequence
12   4.1  parallel4
13   4.5  parallel4
14   4.3  parallel4
15   1.0   sequence
16   5.0   sequence
17   6.0   sequence
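For anyone who wants to reproduce this, here is a minimal self-contained sketch of both variants. The sample frame is rebuilt from the question's data, and the names `mask`, `df_float` and `df_str` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question; 'data' is stored as float here.
df = pd.DataFrame({
    "data": [1.5, 1.2, 1.3, 2, 2.5, 2.4, 2.8, 3, 3.2, 3.1, 3.5,
             4, 4.1, 4.5, 4.3, 1, 5, 6],
    "flow": (["parallel"] * 3 + ["sequence"]) * 4 + ["sequence"] * 2,
})

# Rows to relabel: everything that is not 'sequence'.
mask = df["flow"].ne("sequence")

# Float variant: truncate to the integer part and append it as text.
df_float = df.copy()
df_float.loc[mask, "flow"] += df_float["data"].astype(int).astype(str)

# String variant: pull the digits before the decimal point with a regex.
df_str = df.copy()
df_str["data"] = df_str["data"].astype(str)
df_str.loc[mask, "flow"] += df_str["data"].str.extract(r"(\d+)", expand=False)

print(df_float)
```

Note that `astype(int)` truncates toward zero, so this assumes the values in `data` are non-negative, as they are in the question.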