I have a list of tweets that was delivered as a CSV file. When I read it, the emoji code points show up as literal strings such as <U+0001F602>, and I can't translate them to their real names (“waffle” or “heart”).
import os

import pandas as pd


def load_csv(csv_name):
    # Build the file path relative to the current working directory.
    path = os.getcwd()
    df = pd.read_csv(path + "/" + csv_name, header=0, index_col=0, parse_dates=True, sep=",", encoding="utf-8")
    return df


csv_name = "tweets_nikekaepernick.csv"
df = load_csv(csv_name)

# Inspect the first tweet.
text = df["tweet_full_text"].iloc[0]
text

Out[]: 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'
Answer
Try it with demoji. You can find more details about demoji here.
code
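import re
import demoji

# Downloads the emoji data on older demoji releases. Recent releases ship the
# data with the package and report this call as deprecated, so it may be unnecessary.
demoji.download_codes()

text = 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'

# Turn each <U+XXXXXXXX> placeholder into a \UXXXXXXXX escape, then decode it.
text_ = re.sub(r'\+|>', '', text).replace('<', '\\').encode().decode('unicode-escape')

# Find the emojis and their names.
demoji.findall(text_)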
result
demoji.findall(text_)
Out[1]: {'💀': 'skull', '😂': 'face with tears of joy'}
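One caveat with the `unicode-escape` round trip above: it also reinterprets any non-ASCII characters and literal backslashes already present in the tweet (for example, "é" comes back as "Ã©"). A minimal sketch of an alternative that rewrites only the placeholders is shown here. The helper name `decode_u_plus` is made up for illustration, and the 4 to 8 hex digit width is an assumption about how the export wrote the code points.

```python
import re

# Matches placeholders such as <U+0001F602> or <U+2764>.
# Assumption: the export always writes 4 to 8 hexadecimal digits.
U_PLUS = re.compile(r"<U\+([0-9A-Fa-f]{4,8})>")


def decode_u_plus(text):
    """Replace each <U+XXXX> placeholder with the character it names."""
    def to_char(match):
        code_point = int(match.group(1), 16)
        # Leave the placeholder untouched if it is outside the Unicode range.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return U_PLUS.sub(to_char, text)


print(decode_u_plus("Café <U+0001F602><U+0001F480>"))
```

The decoded string can then be passed to `demoji.findall` as before. For a whole column, something like `df["tweet_full_text"].map(decode_u_plus)` should work, assuming every value in the column is a string.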
More
If you want to remove the emojis instead, you can try the code below, which is adapted from here:
import re

# Character class covering a few common emoji blocks.
pattern = re.compile("["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+", flags=re.UNICODE)

print(pattern.sub(r'', text_))
>>> Hi
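If the goal is only to drop the emojis, the placeholders can also be removed straight from the raw text, without decoding them first. This is a rough sketch, not part of the original answer: `strip_u_plus` is a hypothetical helper, and it deletes every `<U+XXXX>` placeholder, including any that stand for non-emoji characters.

```python
import re


def strip_u_plus(text):
    """Delete every <U+XXXX> placeholder from the raw text."""
    # Assumption: placeholders always carry 4 to 8 hexadecimal digits.
    return re.sub(r"<U\+[0-9A-Fa-f]{4,8}>", "", text)


# repr() makes the trailing space left behind after "Hi" visible.
print(repr(strip_u_plus("Hi <U+0001F602><U+0001F480>")))
```

Unlike the range-based pattern, this does not depend on which Unicode blocks are listed, but it cannot tell emojis apart from other escaped characters.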
Or, if you want to translate the emojis to their text names, you can try:
import emoji
print(emoji.demojize(text_))

>>> Hi :face_with_tears_of_joy::face_with_tears_of_joy::skull::skull::skull::skull:
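If installing a third-party package is not an option, the standard library's `unicodedata` module can supply names too. The sketch below is an alternative to `emoji.demojize`, not what that function does internally: `name_u_plus` is a hypothetical helper, the `:name:` output style is chosen only to resemble the output above, and the names come from the Unicode character database bundled with your Python version, so they can differ from the names the `emoji` package uses.

```python
import re
import unicodedata

# Assumption: placeholders always carry 4 to 8 hexadecimal digits.
U_PLUS = re.compile(r"<U\+([0-9A-Fa-f]{4,8})>")


def name_u_plus(text):
    """Replace each <U+XXXX> placeholder with a :unicode_name: token."""
    def to_name(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            return match.group(0)
        # An empty default avoids a ValueError for unnamed code points.
        name = unicodedata.name(chr(code_point), "")
        if not name:
            return match.group(0)
        return ":" + name.lower().replace(" ", "_") + ":"

    return U_PLUS.sub(to_name, text)


print(name_u_plus("Hi <U+0001F602><U+0001F480>"))
# Hi :face_with_tears_of_joy::skull:
```

Code points that this Python build has no name for are left as placeholders, which can happen with newer emojis on older Python versions.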