I have a log file that has large number of columns and I would like to merge timestamp columns into one as shown in the example below
start time End time process ID. . . 10:02 11:10 A 1 10:32 11:23 B 1 ... .... A 2 ... .... B 2 ... .... C 2
Note: Single ID can have multiple similar processes e.g. multiple A in an ID
time process ID . . . 10:02 A_start 1 11:10 A_end 1 10:32 B_start 1 11:23 B_end 1 .... A_start 2 .... A_end 2 .... B_start 2 .... ... ..
Trying with melt on pandas but it seem I am missing something here
df.melt(id_vars=["ID", "process"], var_name="End time", )
Advertisement
Answer
IIUC, you can use melt
with a bit of reshaping:
out = (df .rename(columns={'start time': 'start', 'End time': 'end'}) .melt(['process', 'ID'], value_name='time') .assign(process=lambda d: d['process']+'_'+d.pop('variable')) )
output:
process ID time 0 A_start 1 10:02 1 B_start 1 10:32 2 A_end 1 11:10 3 B_end 1 11:23
used input:
start time End time process ID 0 10:02 11:10 A 1 1 10:32 11:23 B 1