I am working on an NLP project. My input is:
index  name  lst
0      a     c
0            d
0            e
1            f
1      b     g
I need output like this:
index  name  lst  combine
0      a     c    a c
0            d    a d
0            e    a e
1            f    b f
1      b     g    b g
How can I achieve this?
Answer
You can use groupby + transform('max') to replace the empty cells with the letter of their group: an empty string sorts before any letter, so the maximum of each group is its letter. The rest is a simple string concatenation of the two columns:
df['combine'] = df.groupby('index')['name'].transform('max') + ' ' + df['lst']
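To see why max fills the blanks, here is a minimal sketch of the intermediate step on its own. It assumes the blank cells are empty strings (not NaN); the variable name `filled` is only for illustration.

```python
import pandas as pd

# One letter per group, empty strings elsewhere (assumed representation of the blanks).
df = pd.DataFrame({
    'index': [0, 0, 0, 1, 1],
    'name': ['a', '', '', '', 'b'],
})

# The empty string compares lower than any non-empty string.
assert max('', 'a') == 'a'

# transform('max') computes the maximum per group and broadcasts it
# back to every row of that group, keeping the original row order.
filled = df.groupby('index')['name'].transform('max')
print(filled.tolist())  # ['a', 'a', 'a', 'b', 'b']
```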
Used input:
import pandas as pd

df = pd.DataFrame({'index': [0, 0, 0, 1, 1],
                   'name': ['a', '', '', '', 'b'],
                   'lst': list('cdefg')})
NB. I treated “index” as a regular column here. If it is actually the DataFrame index, use df.index in the groupby instead.
Output:
   index name lst combine
0      0    a   c     a c
1      0        d     a d
2      0        e     a e
3      1        f     b f
4      1    b   g     b g
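For the case mentioned in the NB, where the group labels live in the DataFrame index rather than in a column, a sketch of the same approach could look like this. The frame below is an assumed reconstruction of that situation, not data from the question.

```python
import pandas as pd

# Assumption: the group labels 0/1 are the index, not a column.
df = pd.DataFrame(
    {'name': ['a', '', '', '', 'b'], 'lst': list('cdefg')},
    index=[0, 0, 0, 1, 1],
)

# Group on the index itself, fill each group with its letter,
# then concatenate with the 'lst' column.
df['combine'] = df.groupby(df.index)['name'].transform('max') + ' ' + df['lst']
print(df)
```

The result column holds the same values as in the output above; only the grouping key changes.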