I have a DataFrame with 4 columns and want to merge the first 3 columns into a single column of a new DataFrame.
The data is identical, the order is irrelevant and any duplicates must remain.
import pandas as pd

data = [['tom', 'nick', 'john', 10], ['bob', 'jane', 'nick', 15]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4'])
Desired DataFrame
+-----+-----+
|col_a|col_b|
+-----+-----+
|tom  |10   |
|nick |10   |
|john |10   |
|bob  |15   |
|jane |15   |
|nick |15   |
+-----+-----+
How do I get this done?
Answer
Here is one way to merge the first three columns with the help of NumPy:
import numpy as np

a = df.values  # 2-D array of all values
pd.DataFrame({'col_a': np.ravel(a[:, :3]), 'col_b': np.repeat(a[:, 3], 3)})
  col_a col_b
0   tom    10
1  nick    10
2  john    10
3   bob    15
4  jane    15
5  nick    15
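If you would rather stay within pandas, `DataFrame.melt` may be another option. The following is only a minimal sketch, assuming the same `df` as in the question and that row order really is irrelevant: `melt` returns the values column by column (all of `col1`, then `col2`, then `col3`) rather than row by row. The name `result` is just an illustrative choice.

```python
import pandas as pd

data = [['tom', 'nick', 'john', 10], ['bob', 'jane', 'nick', 15]]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4'])

# Unpivot col1..col3 into a single column; col4 is repeated for each value.
melted = df.melt(id_vars='col4',
                 value_vars=['col1', 'col2', 'col3'],
                 value_name='col_a')

# Keep only the two wanted columns and give col4 its new name.
result = melted.rename(columns={'col4': 'col_b'})[['col_a', 'col_b']]
print(result)
```

Duplicates such as the second `nick` are kept, since nothing here deduplicates. This version should also keep `col_b` as an integer column, because it never passes through a mixed-type array.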