I have a DataFrame like this:
```python
import pandas as pd

df = pd.DataFrame({
    'last_name': {
        0: 'Acosta-Arriola',
        1: 'Afragola',
        2: 'Bertolini',
        3: 'Coyle',
        4: 'Davis',
        10: 'Duntz',
        11: 'Eastman',
        12: 'Fitzgerald',
        13: 'Fitzgerald',
        14: 'Freeman',
        15: 'Freeman',
        16: 'Gambardella',
        17: 'Kelleher',
        18: 'King',
        19: 'Looney',
        20: 'Mccann',
        21: 'Murray',
        22: 'Palmeri',
        23: 'Powers',
        24: 'Vitelli',
        25: 'Wyzykowski'},
    'first_name_or_initial': {
        0: 'Jose',
        1: 'Sarah',
        2: 'Peter',
        3: 'James',
        4: 'Albert',
        10: 'Shawn',
        11: 'Bryan',
        12: 'Richard',
        13: 'Richard',
        14: 'Matthew',
        15: 'Matthew',
        16: 'Vincent',
        17: 'Robert',
        18: 'Thomas',
        19: 'Ray',
        20: 'Joseph',
        21: 'Joshua',
        22: 'Randy',
        23: 'Dennis',
        24: 'Robert',
        25: 'John'},
    'middle_name_or_initial': {
        0: 'Lusi;Luis',
        1: 'R.;B.',
        2: 'M.;Mario',
        3: 'M.;Michael',
        4: 'Chadbourne;C.',
        10: 'R.;Richard',
        11: 'J.;James',
        12: 'M.;J.;Micha',
        13: 'M.;Michael',
        14: 'Christopher;Robert',
        15: 'Christopher;C.',
        16: 'A.;Anthony',
        17: 'S.;Steven',
        18: 'E.;Emory',
        19: 'S.;Scott',
        20: 'M.;Michael',
        21: 'M.;P.',
        22: 'T.;Thomas',
        23: 'E.;Edward',
        24: 'J.;D.',
        25: 'J.;James'},
    'Suffix': {
        0: '', 1: '', 2: '', 3: '', 4: 'Jr.',
        10: '', 11: '', 12: '', 13: '', 14: '', 15: '', 16: '', 17: '',
        18: 'Jr.', 19: 'Jr.', 20: '', 21: '', 22: '', 23: 'Jr.', 24: '', 25: ''},
    'address_1': {
        0: '',
        1: '51 Indigo Trail',
        2: '90 Cherry Street;1295 Great Hill Road;90 Cherry Street',
        3: '51 Canary Court;51 Canary Court;687 Main Street',
        4: '39 Hemenway Street',
        10: '118 Brookside Avenue;9886 171 Street Place',
        11: '616 East Main Street;989 Boston Post Road;38 Mallard Court;1421 Naugatuck Avenue',
        12: '',
        13: '18 Fox Ridge Lane;18 Fox Ridge;18 Fox Ridge Road',
        14: '',
        15: '',
        16: '45 Jakobs Landing',
        17: '171 Williams Road;181 Knob Hill Road',
        18: '31 Millwood Drive;31 Millwood Drive;41 Waverly Park Road;31 Millwood Drive;25 Crouch Road',
        19: '',
        20: '17 Pheasant Run;25 Mcdermott Road;PO Box 510;17 Pheasant Run',
        21: '42 Seymour Street;42 Seymour Stt',
        22: '205 Mccall Road',
        23: '204 Milton Avenue;187 Milton Avenue',
        24: '16 Montgomery Drive',
        25: '457 Hill Street;139 County Line Road'}})
```
I would like to split the `middle_name_or_initial` column on the semicolon delimiter (`;`). After splitting, each row should be repeated once for every resulting value, so a row whose middle name splits into two parts becomes two rows.
For example, this row:
```
Duntz  Shawn  R.;Richard  118 Brookside Avenue;9886 171 Street Place
```
should become:
```
1. Duntz - Shawn - R. - 118 Brookside Avenue;9886 171 Street Place
2. Duntz - Shawn - Richard - 118 Brookside Avenue;9886 171 Street Place
```
Answer
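```python
# split the middle name on ';' so each cell holds a list of parts
df['middle_name_or_initial'] = df['middle_name_or_initial'].str.split(';')

# explode the dataframe: one row per list element, other columns repeated
# (the original index labels are repeated too)
df_new = df.explode('middle_name_or_initial')
```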
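The two steps above can be checked on a small subset of the question's data. This sketch uses two rows copied from the question (index labels 10 and 21), splits with `assign()` so the original `df` keeps its string column, renumbers the index with `reset_index(drop=True)` because `explode` repeats the original labels, and then prints the rows in the `" - "` layout from the question (the `Suffix` column is left out, as in the question's example). The `assign` and `reset_index` choices are my additions, not part of the answer above.

```python
import pandas as pd

# Two rows taken from the question's data (index labels 10 and 21)
df = pd.DataFrame({
    "last_name": {10: "Duntz", 21: "Murray"},
    "first_name_or_initial": {10: "Shawn", 21: "Joshua"},
    "middle_name_or_initial": {10: "R.;Richard", 21: "M.;P."},
    "Suffix": {10: "", 21: ""},
    "address_1": {
        10: "118 Brookside Avenue;9886 171 Street Place",
        21: "42 Seymour Street;42 Seymour Stt",
    },
})

# assign() returns a new DataFrame, so df itself is not modified
df_new = (
    df.assign(middle_name_or_initial=df["middle_name_or_initial"].str.split(";"))
    .explode("middle_name_or_initial")
    .reset_index(drop=True)  # explode repeats index labels; renumber 0..n-1
)

# Format each exploded row like the question's expected output
lines = [
    f"{i + 1}. {r.last_name} - {r.first_name_or_initial} - "
    f"{r.middle_name_or_initial} - {r.address_1}"
    for i, r in enumerate(df_new.itertuples(index=False))
]
print("\n".join(lines))
```

The first two printed lines match the expected output in the question; the `Murray` row likewise becomes one row for `M.` and one for `P.`.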
See the pandas documentation for `DataFrame.explode()` for details.
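`DataFrame.explode()` was added in pandas 0.25.0. On an older pandas, one possible fallback (a swapped-in technique, not part of the answer above) is to build the rows yourself: loop over the records, split the chosen column, and emit one copy of the record per part. `explode_on_delimiter` is a hypothetical helper name; it assumes the column holds plain strings (no `NaN`) and, unlike `explode`, it does not keep the original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "last_name": {10: "Duntz", 21: "Murray"},
    "first_name_or_initial": {10: "Shawn", 21: "Joshua"},
    "middle_name_or_initial": {10: "R.;Richard", 21: "M.;P."},
})


def explode_on_delimiter(frame, column, sep=";"):
    """Hypothetical helper: one output row per sep-separated value in `column`.

    Assumes every value in `column` is a string; the result gets a fresh 0..n-1 index.
    """
    rows = []
    for record in frame.to_dict("records"):
        for part in record[column].split(sep):
            # copy the whole record, replacing only the split column
            rows.append({**record, column: part})
    return pd.DataFrame(rows, columns=frame.columns)


df_new = explode_on_delimiter(df, "middle_name_or_initial")
print(df_new)
```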