
Can’t convert this text to normal format in Python?

I am web-scraping some pages and got text like this: “735 𝚆𝚒𝚕𝚕𝚒𝚊𝚖 𝚃 𝙼𝚘𝚛𝚛𝚒𝚜𝚜𝚎𝚢 𝙱𝚕𝚟𝚍, 𝙳𝚘𝚛𝚌𝚑𝚎𝚜𝚝𝚎𝚛, 𝙼𝙰 02122 Dorchester MA 02121”. How do I convert it to normal text in Python?


Answer

You can run it through Unicode normalization:

import unicodedata

# NFKD maps compatibility characters (here, mathematical monospace letters) to their plain equivalents.
unicodedata.normalize('NFKD', '735 𝚆𝚒𝚕𝚕𝚒𝚊𝚖 𝚃 𝙼𝚘𝚛𝚛𝚒𝚜𝚜𝚎𝚢 𝙱𝚕𝚟𝚍, 𝙳𝚘𝚛𝚌𝚑𝚎𝚜𝚝𝚎𝚛, 𝙼𝙰 02122')

# '735 William T Morrissey Blvd, Dorchester, MA 02122'
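Why this works: the letters in the scraped string are not ASCII but code points from Unicode's Mathematical Alphanumeric Symbols block, and the compatibility forms (NFKD, NFKC) fold them back to plain letters. Here is a minimal sketch of that as a reusable function. The names `to_plain` and `to_monospace` are my own, not from the question, and `to_monospace` exists only to build sample input from escapes-free ASCII. I use NFKC here instead of NFKD as an assumption about what most scrapers want: both give the same result for this text, but NFKC also recomposes accented letters afterwards.

```python
import unicodedata


def to_plain(text: str) -> str:
    """Fold compatibility characters (styled letters, ligatures, etc.) to plain ones."""
    return unicodedata.normalize("NFKC", text)


def to_monospace(text: str) -> str:
    """Hypothetical helper: restyle ASCII letters as mathematical monospace letters.

    Only used to build sample input; U+1D670 is MONOSPACE CAPITAL A and
    U+1D68A is MONOSPACE SMALL A.
    """
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D670 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D68A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, spaces and punctuation stay as they are
    return "".join(out)


styled = to_monospace("735 William T Morrissey Blvd")

# Inspect what the first styled letter really is.
print(unicodedata.name(styled[4]))  # MATHEMATICAL MONOSPACE CAPITAL W

# Normalization brings it back to plain ASCII.
print(to_plain(styled))  # 735 William T Morrissey Blvd
```

Note that compatibility normalization is lossy and also rewrites other characters (for example, the ligature "ﬁ" becomes "fi"), so apply it only to fields where you want that folding.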

