Skip to content
Advertisement

Pandas – Duplicate Rows and Slice String

I’m trying to create duplicate rows during a dataframe on conditions.

For example, I have this Dataframe.

students = [
    ("a", "Ursula"),
    ("b", "Hayfa, Martin"),
    ("c", "Kato"),
    ("d", "Tanek, Ava, Pyto"),
    ("e", "Aiko"),
    ("f", "Hunter"),
    ("g", "Josiah, Derek, Uma, Nell"),
]
df = pd.DataFrame(students, columns=["team", "student"])
print(df)
team    student
 a      Ursula
 b      Hayfa, Martin
 c      Kato
 d      Tanek, Ava, Pyto
 e      Aiko
 f      Hunter
 g      Josiah, Derek, Uma, Nell

And I would like to get the following output:

  team  student                   name      remark
   a    Ursula                    Ursula    
   b    Hayfa, Martin             Hayfa     with Martin
   b    Hayfa, Martin             Martin    with Hayfa
   c    Kato                      Kato            
   d    Tanek, Ava, Pyto          Tanek     with Ava, Pyto
   d    Tanek, Ava, Pyto          Ava       with Tanek, Pyto
   d    Tanek, Ava, Pyto          Pyto      with Tanek, Ava
   e    Aiko                      Aiko            
   f    Hunter                    Hunter    
   g    Josiah, Derek, Uma, Nell  Josiah    with Derek, Uma, Nell
   g    Josiah, Derek, Uma, Nell  Derek     with Josiah, Uma, Nell
   g    Josiah, Derek, Uma, Nell  Uma       with Josiah, Derek, Nell
   g    Josiah, Derek, Uma, Nell  Nell      with Josiah, Derek, Uma

Advertisement

Answer

For pandas 0.25+ is possible use DataFrame.explode with splitted values by Series.str.split and for remark column list comprehension with filtering:

students = df["student"].str.split(", ")
df = df.assign(name=students, remark=students).explode("name").reset_index(drop=True)
df["remark"] = [
    "with " + ", ".join(x for x in r if x != n) if len(r) > 1 else ""
    for n, r in zip(df["name"], df["remark"])
]

print (df)

And we get the following result:

   team                   student    name                    remark
0     a                    Ursula  Ursula                          
1     b             Hayfa, Martin   Hayfa               with Martin
2     b             Hayfa, Martin  Martin                with Hayfa
3     c                      Kato    Kato                          
4     d          Tanek, Ava, Pyto   Tanek            with Ava, Pyto
5     d          Tanek, Ava, Pyto     Ava          with Tanek, Pyto
6     d          Tanek, Ava, Pyto    Pyto           with Tanek, Ava
7     e                      Aiko    Aiko                          
8     f                    Hunter  Hunter                          
9     g  Josiah, Derek, Uma, Nell  Josiah     with Derek, Uma, Nell
10    g  Josiah, Derek, Uma, Nell   Derek    with Josiah, Uma, Nell
11    g  Josiah, Derek, Uma, Nell     Uma  with Josiah, Derek, Nell
12    g  Josiah, Derek, Uma, Nell    Nell   with Josiah, Derek, Uma
Advertisement