I’m trying to create duplicate rows during a dataframe on conditions.
For example, I have this Dataframe.
students = [ ("a", "Ursula"), ("b", "Hayfa, Martin"), ("c", "Kato"), ("d", "Tanek, Ava, Pyto"), ("e", "Aiko"), ("f", "Hunter"), ("g", "Josiah, Derek, Uma, Nell"), ] df = pd.DataFrame(students, columns=["team", "student"]) print(df)
team student a Ursula b Hayfa, Martin c Kato d Tanek, Ava, Pyto e Aiko f Hunter g Josiah, Derek, Uma, Nell
And I would like to get the following output:
team student name remark a Ursula Ursula b Hayfa, Martin Hayfa with Martin b Hayfa, Martin Martin with Hayfa c Kato Kato d Tanek, Ava, Pyto Tanek with Ava, Pyto d Tanek, Ava, Pyto Ava with Tanek, Pyto d Tanek, Ava, Pyto Pyto with Tanek, Ava e Aiko Aiko f Hunter Hunter g Josiah, Derek, Uma, Nell Josiah with Derek, Uma, Nell g Josiah, Derek, Uma, Nell Derek with Josiah, Uma, Nell g Josiah, Derek, Uma, Nell Uma with Josiah, Derek, Nell g Josiah, Derek, Uma, Nell Nell with Josiah, Derek, Uma
Advertisement
Answer
For pandas 0.25+ is possible use DataFrame.explode
with splitted values by Series.str.split
and for remark
column list comprehension with filtering:
students = df["student"].str.split(", ") df = df.assign(name=students, remark=students).explode("name").reset_index(drop=True) df["remark"] = [ "with " + ", ".join(x for x in r if x != n) if len(r) > 1 else "" for n, r in zip(df["name"], df["remark"]) ] print (df)
And we get the following result:
team student name remark 0 a Ursula Ursula 1 b Hayfa, Martin Hayfa with Martin 2 b Hayfa, Martin Martin with Hayfa 3 c Kato Kato 4 d Tanek, Ava, Pyto Tanek with Ava, Pyto 5 d Tanek, Ava, Pyto Ava with Tanek, Pyto 6 d Tanek, Ava, Pyto Pyto with Tanek, Ava 7 e Aiko Aiko 8 f Hunter Hunter 9 g Josiah, Derek, Uma, Nell Josiah with Derek, Uma, Nell 10 g Josiah, Derek, Uma, Nell Derek with Josiah, Uma, Nell 11 g Josiah, Derek, Uma, Nell Uma with Josiah, Derek, Nell 12 g Josiah, Derek, Uma, Nell Nell with Josiah, Derek, Uma