I have code that produces the following df as output:
year month day category keywords
0 '2021' '09' '06' 'us' ['afghan, refugees, volunteers']
1 '2021' '09' '05' 'us' ['politics' 'military, drone, strike, kabul']
2 '2021' '09' '06' 'business' ['rto, return, to, office']
3 '2021' '09' '06' 'nyregion' ['nyc, jewish, high, holy, days']
4 '2021' '09' '06' 'world' ['americas' 'mexico, migrants, asylum, border']
5 '2021' '09' '06' 'us' ['TAHOE, CALDORFIRE, WORKERS']
6 '2021' '09' '06' 'nyregion' ['queens, flooding, cleanup']
7 '2021' '09' '05' 'us' ['new, orleans, power, failure, traps, older, residents, in, homes']
8 '2021' '09' '05' 'nyregion' ['biden, flood, new, york, new, jersey']
9 '2021' '09' '06' 'technology' ['freedom, phone, smartphone, conservatives']
10 '2021' '09' '06' 'sports' ['football' 'nfl, preview, nfc, predictions']
11 '2021' '09' '06' 'sports' ['football' 'nfl, preview, afc, predictions']
12 '2021' '09' '06' 'opinion' ['texas, abortion, september, 11']
13 '2021' '09' '06' 'opinion' ['coronavirus, masks, school, board, meetings']
14 '2021' '09' '06' 'opinion' ['south, republicans, vaccines, climate, change']
15 '2021' '09' '06' 'opinion' ['labor, workers, rights']
16 '2021' '09' '05' 'opinion' ['ku, kluxism, trumpism']
17 '2021' '09' '05' 'opinion' ['culture' 'sexually, harassed, pentagon']
18 '2021' '09' '05' 'opinion' ['parenting, college, empty, nest, pandemic']
19 '2021' '09' '04' 'opinion' ['letters' 'coughlin, caregiving']
20 '2021' '08' '24' 'opinion' ['kara, swisher, maggie, haberman, event']
21 '2021' '09' '05' 'opinion' ['labor, day, us, history']
22 '2021' '09' '04' 'opinion' ['drowning, our, future, in, the, past']
23 '2021' '09' '04' 'opinion' ['biden, job, approval, rating']
24 '2021' '09' '05' 'opinion' ['dorothy, day, christian, labor']
25 '2021' '09' '03' 'business' ['goodbye, office, mom']
26 '2021' '09' '06' 'business' ['media' 'burn, out, companies, pandemic']
27 '2021' '08' '30' 'arts' ['music' 'popcast, lorde, solar, power']
28 '2021' '09' '02' 'opinion' ['sway, kara, swisher, julie, cordua, ashton, kutcher']
29 '2021' '08' '12' 'science' ['fauci, kids, and, covid, event']
30 '2021' '09' '05' 'us' ['shooting, lakeland, florida']
31 '2021' '09' '05' 'business' ['media' 'leah, finnegan, gawker']
32 '2021' '09' '06' 'nyregion' ['piping, plovers, bird, rescue']
33 '2021' '09' '05' 'us' ['anti, abortion, movement, texas, law']
34 '2021' '09' '05' 'us' ['politics' 'bernie, sanders, budget, bill']
35 '2021' '09' '05' 'world' ['africa' 'guinea, coup']
36 '2021' '09' '05' 'sports' ['soccer' 'brazil, argentina, suspended']
37 '2021' '09' '06' 'world' ['africa' 'south, africa, jacob, zuma, medical, parole']
38 '2021' '09' '05' 'sports' ['nfl, social, justice']
39 '2021' '09' '02' 'well' ['go, bag, essentials']
40 '2021' '09' '01' 'parenting' ['raising, resilient, kids']
41 '2021' '09' '03' 'books' ['911, anniversary, fiction, literature']
42 '2021' '09' '01' 'arts' ['design' 'german, hygiene, museum']
43 '2021' '09' '03' 'arts' ['music' 'opera, livestreams']
44 '2021' '09' '04' 'style' ['the, return, of, the, dream, honeymoon']
<class 'str'>
I built a for loop to iterate over all the elements in the 'keywords' column and put them separately into a new df called df1. The loop looks like this:
df1 = pd.DataFrame(columns=['word'])

i = 0

for p in df.loc[i, 'keywords']:

    teststr = df.loc[i, 'keywords']

    splitstr = teststr.split()

    u = 0

    for p1 in splitstr:
        dict_1 = {'word': splitstr[u]}
        df1.loc[len(df1)] = dict_1
        u = u + 1

    i = i + 1

print(df1)
The output it produces is:
word
0 ['afghan,
1 refugees,
2 volunteers']
3 ['politics'
4 'military,
5 drone,
6 strike,
7 kabul']
8 ['rto,
9 return,
10 to,
11 office']
12 ['nyc,
13 jewish,
14 high,
15 holy,
16 days']
17 ['americas'
18 'mexico,
19 migrants,
20 asylum,
21 border']
22 ['TAHOE,
23 CALDORFIRE,
24 WORKERS']
25 ['queens,
26 flooding,
27 cleanup']
28 ['new,
29 orleans,
30 power,
31 failure,
32 traps,
33 older,
34 residents,
35 in,
36 homes']
37 ['biden,
38 flood,
39 new,
40 york,
41 new,
42 jersey']
43 ['freedom,
44 phone,
45 smartphone,
46 conservatives']
47 ['football'
48 'nfl,
49 preview,
50 nfc,
51 predictions']
52 ['football'
53 'nfl,
54 preview,
55 afc,
56 predictions']
57 ['texas,
58 abortion,
59 september,
60 11']
61 ['coronavirus,
62 masks,
63 school,
64 board,
65 meetings']
66 ['south,
67 republicans,
68 vaccines,
69 climate,
70 change']
71 ['labor,
72 workers,
73 rights']
74 ['ku,
75 kluxism,
76 trumpism']
77 ['culture'
78 'sexually,
79 harassed,
80 pentagon']
81 ['parenting,
82 college,
83 empty,
84 nest,
85 pandemic']
86 ['letters'
87 'coughlin,
88 caregiving']
89 ['kara,
90 swisher,
91 maggie,
92 haberman,
93 event']
94 ['labor,
95 day,
96 us,
97 history']
98 ['drowning,
99 our,
100 future,
101 in,
102 the,
103 past']
104 ['biden,
105 job,
106 approval,
107 rating']
108 ['dorothy,
109 day,
110 christian,
111 labor']
112 ['goodbye,
113 office,
114 mom']
115 ['media'
116 'burn,
117 out,
118 companies,
119 pandemic']
120 ['music'
121 'popcast,
122 lorde,
123 solar,
124 power']
125 ['sway,
126 kara,
127 swisher,
128 julie,
129 cordua,
130 ashton,
131 kutcher']
132 ['fauci,
133 kids,
134 and,
135 covid,
136 event']
137 ['shooting,
138 lakeland,
139 florida']
140 ['media'
141 'leah,
142 finnegan,
143 gawker']
Although the for loop runs without errors, it does not iterate over all the rows of df and stops more or less in the middle (it doesn't always stop at the same spot).
Do you have an idea why? Thanks in advance
Answer
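I don't think your for loop iterates over all the rows. The expression in for p in df.loc[i, 'keywords'] is evaluated once, while i is still 0, so the loop runs over the characters of the first row's keywords string (it is a str, as your output shows), not over the rows of df. The number of iterations is therefore the length of that string. In your sample it has 32 characters, which is why the output ends with row 31, and why the stopping point moves whenever the first row changes.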
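To see the effect in isolation, here is a minimal sketch that iterates over the first keywords value copied from the sample output above. The names first_keywords and chars are only illustrative and are not part of the original code.

```python
# First 'keywords' value from the sample output; it is a str, not a list.
first_keywords = "['afghan, refugees, volunteers']"

# Iterating over a str yields one character per step,
# which is what the original outer for loop does.
chars = [p for p in first_keywords]

print(chars[:3])   # the first few loop values are single characters
print(len(chars))  # how many times the outer loop body would run
```

The length printed here is the number of rows the original loop manages to process before it stops.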
Do this instead:
for i in range(len(df)):
Also, you can remove i = i + 1 (and the i = 0 line), since range supplies i.
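Putting that together, a corrected loop could look roughly like the sketch below. It assumes df has the default 0..n-1 index (so df.loc[i, ...] works with range) and uses a three-row stand-in for the real df. Collecting the rows in a list (rows, a name introduced here) and building df1 once at the end is a swapped-in choice, not part of the original code. The last line shows an optional loop-free variant using pandas string methods.

```python
import pandas as pd

# Small stand-in for the original df (three rows copied from the sample output).
df = pd.DataFrame({
    "keywords": [
        "['afghan, refugees, volunteers']",
        "['politics' 'military, drone, strike, kabul']",
        "['rto, return, to, office']",
    ]
})

rows = []
for i in range(len(df)):            # one iteration per row of df
    teststr = df.loc[i, "keywords"]
    for word in teststr.split():    # split on whitespace, as in the original loop
        rows.append({"word": word})

# Build the result once instead of growing df1 row by row.
df1 = pd.DataFrame(rows, columns=["word"])
print(df1)

# Optional alternative without an explicit loop: split each string, then
# give every piece its own row.
df1_alt = df["keywords"].str.split().explode().reset_index(drop=True).to_frame("word")
```

Note that the pieces still carry the brackets, quotes and commas from the original strings, exactly as in the question's output; cleaning those up is a separate step.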