I need help with deleting “None” along with extra comma in language
columns that have one or more language
Here is the existing csv:
f = pd.DataFrame({'Movie': ['name1','name2','name3','name4'], 'Year': ['1905', '1905','1906','1907'], 'Id': ['tt0283985', 'tt0283986','tt0284043','tt3402904'], 'language':['Mandarin,None','None,Cantonese','Mandarin,None,Cantonese','None,Cantonese']})
Where f
now looks like:
Movie Year Id language 0 name1 1905 tt0283985 Mandarin,None 1 name2 1905 tt0283986 None,Cantonese 2 name3 1906 tt0284043 Mandarin,None,Cantonese 3 name4 1907 tt3402904 None,Cantonese
And the result should be like this:
Movie Year Id language 0 name1 1905 tt0283985 Mandarian 1 name2 1905 tt0283986 Cantonese 2 name3 1906 tt0284043 Mandarin,Cantonese 3 name4 1907 tt3402904 Cantonese
There are also other columns that have only ‘None’ values in language column, so I can’t just use the replace function in excel, and there’s also a problem of extra “,” after doing that. So I may need help with a new way using pandas or something. Thanks in advance!
Advertisement
Answer
You could achieve it this way,
f["language"] = f.apply( lambda x: ",".join(filter(lambda y: y != "None", x.language.split(","))), axis=1 )
Or much better
f["language"] = f.apply( lambda x: ",".join([y for y in x.language.split(",") if y != "None"]), axis=1 )