I have a list of tweets that was delivered as a CSV file. But when I read them, the emojis' Unicode code points come through as plain strings, and I can’t translate them to their real names (“waffle” or “heart”).
    import os
    import pandas as pd

    def load_csv(csv_name):
        # Read the CSV from the current working directory
        path = os.path.join(os.getcwd(), csv_name)
        df = pd.read_csv(path, header=0, index_col=0, parse_dates=True, sep=",", encoding="utf-8")
        return df

    csv_name = "tweets_nikekaepernick.csv"
    df = load_csv(csv_name)
    text = df["tweet_full_text"].iloc[0]
    text
    Out[]: 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'
Answer
Try it with demoji. You can get more details about demoji here.
code
    import re
    import demoji

    demoji.download_codes()  # fetches the emoji code table

    text = 'Hi <U+0001F602><U+0001F602><U+0001F480><U+0001F480><U+0001F480><U+0001F480>'

    # Turn each <U+XXXXXXXX> placeholder into a \UXXXXXXXX escape, then decode it
    text_ = re.sub(r'\+|>', '', text).replace('<', '\\').encode().decode('unicode-escape')

    # Find the emojis
    demoji.findall(text_)
result
    demoji.findall(text_)
    Out[1]: {'💀': 'skull', '😂': 'face with tears of joy'}
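The `unicode-escape` round trip works for this sample, but it can garble text that already contains non-ASCII characters (accented letters, for instance). Here is a minimal alternative sketch, assuming every placeholder has the form `<U+XXXX>` with 4 to 8 hexadecimal digits. The names `PLACEHOLDER`, `_to_char` and `decode_placeholders` are made up for illustration:

```python
import re

# Matches placeholders such as <U+0001F602>; assumes 4 to 8 hex digits.
PLACEHOLDER = re.compile(r"<U\+([0-9A-Fa-f]{4,8})>")

def _to_char(match):
    code_point = int(match.group(1), 16)
    # Leave the placeholder untouched if it is not a valid code point.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)

def decode_placeholders(text):
    """Replace each <U+XXXX> placeholder with the character it encodes."""
    return PLACEHOLDER.sub(_to_char, text)

sample = "Café <U+0001F602><U+0001F480>"
decoded = decode_placeholders(sample)
# ascii() shows the code points even if the terminal cannot render emojis.
print(ascii(decoded))
```

The decoded string can then be passed to `demoji.findall` in the same way as `text_`.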
More
If you want to remove the emojis instead, you can try the code below, which is adapted from here:
    pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+",
        flags=re.UNICODE,
    )
    print(pattern.sub(r'', text_))
    # prints: Hi
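The ranges in that pattern cover only some emoji blocks. If the goal is just to strip emojis from the raw CSV text, one option is to delete the `<U+...>` placeholders before any decoding. This is a sketch under the assumption that all emojis arrive as such placeholders; note that it would also remove any non-emoji character exported in the same form:

```python
import re

raw = "Hi <U+0001F602><U+0001F602><U+0001F480>"

# Drop every <U+XXXX> placeholder, then trim the leftover whitespace.
cleaned = re.sub(r"<U\+[0-9A-Fa-f]+>", "", raw).strip()
print(repr(cleaned))  # 'Hi'
```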
Or, if you want to translate your emojis to str, you can try:
    import emoji

    print(emoji.demojize(text_))
    # prints: Hi :face_with_tears_of_joy::face_with_tears_of_joy::skull::skull::skull::skull:
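If installing a third-party package is not an option, the standard library's `unicodedata` module can supply the official Unicode names. This is a rough sketch, assuming the text has already been converted to real characters and treating everything at or above U+1F000 as an emoji (a simplification that misses emojis in lower blocks). `name_emojis` is a hypothetical helper:

```python
import unicodedata

def name_emojis(text):
    """Map each character at or above U+1F000 to its lowercase Unicode name."""
    names = {}
    for char in text:
        if ord(char) >= 0x1F000:
            # Fall back to "unknown" for code points without a name.
            names[char] = unicodedata.name(char, "unknown").lower()
    return names

text_ = "Hi \U0001F602\U0001F602\U0001F480"
names = name_emojis(text_)
print(sorted(names.values()))  # ['face with tears of joy', 'skull']
```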