I have code that produces the following df as output:
year month day category keywords
0 '2021' '09' '06' 'us' ['afghan, refugees, volunteers']
1 '2021' '09' '05' 'us' ['politics' 'military, drone, strike, kabul']
2 '2021' '09' '06' 'business' ['rto, return, to, office']
3 '2021' '09' '06' 'nyregion' ['nyc, jewish, high, holy, days']
4 '2021' '09' '06' 'world' ['americas' 'mexico, migrants, asylum, border']
5 '2021' '09' '06' 'us' ['TAHOE, CALDORFIRE, WORKERS']
6 '2021' '09' '06' 'nyregion' ['queens, flooding, cleanup']
7 '2021' '09' '05' 'us' ['new, orleans, power, failure, traps, older, residents, in, homes']
8 '2021' '09' '05' 'nyregion' ['biden, flood, new, york, new, jersey']
9 '2021' '09' '06' 'technology' ['freedom, phone, smartphone, conservatives']
10 '2021' '09' '06' 'sports' ['football' 'nfl, preview, nfc, predictions']
11 '2021' '09' '06' 'sports' ['football' 'nfl, preview, afc, predictions']
12 '2021' '09' '06' 'opinion' ['texas, abortion, september, 11']
13 '2021' '09' '06' 'opinion' ['coronavirus, masks, school, board, meetings']
14 '2021' '09' '06' 'opinion' ['south, republicans, vaccines, climate, change']
15 '2021' '09' '06' 'opinion' ['labor, workers, rights']
16 '2021' '09' '05' 'opinion' ['ku, kluxism, trumpism']
17 '2021' '09' '05' 'opinion' ['culture' 'sexually, harassed, pentagon']
18 '2021' '09' '05' 'opinion' ['parenting, college, empty, nest, pandemic']
19 '2021' '09' '04' 'opinion' ['letters' 'coughlin, caregiving']
20 '2021' '08' '24' 'opinion' ['kara, swisher, maggie, haberman, event']
21 '2021' '09' '05' 'opinion' ['labor, day, us, history']
22 '2021' '09' '04' 'opinion' ['drowning, our, future, in, the, past']
23 '2021' '09' '04' 'opinion' ['biden, job, approval, rating']
24 '2021' '09' '05' 'opinion' ['dorothy, day, christian, labor']
25 '2021' '09' '03' 'business' ['goodbye, office, mom']
26 '2021' '09' '06' 'business' ['media' 'burn, out, companies, pandemic']
27 '2021' '08' '30' 'arts' ['music' 'popcast, lorde, solar, power']
28 '2021' '09' '02' 'opinion' ['sway, kara, swisher, julie, cordua, ashton, kutcher']
29 '2021' '08' '12' 'science' ['fauci, kids, and, covid, event']
30 '2021' '09' '05' 'us' ['shooting, lakeland, florida']
31 '2021' '09' '05' 'business' ['media' 'leah, finnegan, gawker']
32 '2021' '09' '06' 'nyregion' ['piping, plovers, bird, rescue']
33 '2021' '09' '05' 'us' ['anti, abortion, movement, texas, law']
34 '2021' '09' '05' 'us' ['politics' 'bernie, sanders, budget, bill']
35 '2021' '09' '05' 'world' ['africa' 'guinea, coup']
36 '2021' '09' '05' 'sports' ['soccer' 'brazil, argentina, suspended']
37 '2021' '09' '06' 'world' ['africa' 'south, africa, jacob, zuma, medical, parole']
38 '2021' '09' '05' 'sports' ['nfl, social, justice']
39 '2021' '09' '02' 'well' ['go, bag, essentials']
40 '2021' '09' '01' 'parenting' ['raising, resilient, kids']
41 '2021' '09' '03' 'books' ['911, anniversary, fiction, literature']
42 '2021' '09' '01' 'arts' ['design' 'german, hygiene, museum']
43 '2021' '09' '03' 'arts' ['music' 'opera, livestreams']
44 '2021' '09' '04' 'style' ['the, return, of, the, dream, honeymoon']
<class 'str'>
I built a for loop to iterate over all the elements in the 'keywords' column and put them separately into a new df called df1. The loop looks like this:
df1 = pd.DataFrame(columns=['word'])
i = 0
for p in df.loc[i, 'keywords']:
    teststr = df.loc[i, 'keywords']
    splitstr = teststr.split()
    u = 0
    for p1 in splitstr:
        dict_1 = {'word': splitstr[u]}
        df1.loc[len(df1)] = dict_1
        u = u + 1
    i = i + 1
print(df1)
The output it produces is:
word
0 ['afghan,
1 refugees,
2 volunteers']
3 ['politics'
4 'military,
5 drone,
6 strike,
7 kabul']
8 ['rto,
9 return,
10 to,
11 office']
12 ['nyc,
13 jewish,
14 high,
15 holy,
16 days']
17 ['americas'
18 'mexico,
19 migrants,
20 asylum,
21 border']
22 ['TAHOE,
23 CALDORFIRE,
24 WORKERS']
25 ['queens,
26 flooding,
27 cleanup']
28 ['new,
29 orleans,
30 power,
31 failure,
32 traps,
33 older,
34 residents,
35 in,
36 homes']
37 ['biden,
38 flood,
39 new,
40 york,
41 new,
42 jersey']
43 ['freedom,
44 phone,
45 smartphone,
46 conservatives']
47 ['football'
48 'nfl,
49 preview,
50 nfc,
51 predictions']
52 ['football'
53 'nfl,
54 preview,
55 afc,
56 predictions']
57 ['texas,
58 abortion,
59 september,
60 11']
61 ['coronavirus,
62 masks,
63 school,
64 board,
65 meetings']
66 ['south,
67 republicans,
68 vaccines,
69 climate,
70 change']
71 ['labor,
72 workers,
73 rights']
74 ['ku,
75 kluxism,
76 trumpism']
77 ['culture'
78 'sexually,
79 harassed,
80 pentagon']
81 ['parenting,
82 college,
83 empty,
84 nest,
85 pandemic']
86 ['letters'
87 'coughlin,
88 caregiving']
89 ['kara,
90 swisher,
91 maggie,
92 haberman,
93 event']
94 ['labor,
95 day,
96 us,
97 history']
98 ['drowning,
99 our,
100 future,
101 in,
102 the,
103 past']
104 ['biden,
105 job,
106 approval,
107 rating']
108 ['dorothy,
109 day,
110 christian,
111 labor']
112 ['goodbye,
113 office,
114 mom']
115 ['media'
116 'burn,
117 out,
118 companies,
119 pandemic']
120 ['music'
121 'popcast,
122 lorde,
123 solar,
124 power']
125 ['sway,
126 kara,
127 swisher,
128 julie,
129 cordua,
130 ashton,
131 kutcher']
132 ['fauci,
133 kids,
134 and,
135 covid,
136 event']
137 ['shooting,
138 lakeland,
139 florida']
140 ['media'
141 'leah,
142 finnegan,
143 gawker']
The loop runs without errors, but it does not iterate over all the rows of df: it stops somewhere around the middle, and not always at the same spot.
Do you have any idea why? Thanks in advance.
Answer
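I don't think your for loop iterates over all the rows. for p in df.loc[i, 'keywords'] loops over the characters of the string in row 0 (i is still 0 when the loop starts), so it runs once per character of that string rather than once per row of df. The first cell shown above, ['afghan, refugees, volunteers'], is 32 characters long, which matches your output ending with the words from row 31. It also explains why the stopping point moves whenever the first row changes.
Do this instead:
for i in range(len(df)):
You can then also remove i = i + 1.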
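For reference, here is a minimal sketch of the corrected loop. It assumes each 'keywords' cell is a plain string, as the <class 'str'> line in the question suggests. The three-row df is a hypothetical sample copied from the first rows above. Collecting the words in a list and building df1 once at the end is my own choice rather than something from the question; appending with df1.loc[len(df1)] inside the loop should work as well.

```python
import pandas as pd

# Hypothetical sample: the first three 'keywords' cells from the question,
# stored as plain strings (not lists).
df = pd.DataFrame({
    "keywords": [
        "['afghan, refugees, volunteers']",
        "['politics' 'military, drone, strike, kabul']",
        "['rto, return, to, office']",
    ]
})

# Why the original loop stopped early: iterating over a string yields its
# characters, so the loop body ran once per character of the first cell.
iterations_in_original = len(df.loc[0, "keywords"])

# Fix: loop over the row positions, so every row is visited exactly once.
# This relies on the default 0..n-1 index; with another index, use .iloc.
words = []
for i in range(len(df)):
    words.extend(df.loc[i, "keywords"].split())

# Build the result once instead of appending row by row.
df1 = pd.DataFrame({"word": words})

print(iterations_in_original)
print(df1)
```

The split still leaves brackets, quotes and commas attached to the words, exactly as in the output in the question; cleaning those up is a separate step.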