
How to get rid of b' and all the \x00, \x**-style escapes in a bunch of strings in Python 3.6?

I have strings as below:

content =
"b'MAJOR CONRAD A. PREEDOM\n2354 Fairchild Dr., Suite 6H-126\nUSAF Academy, CO 
\xe2\x80\x93 Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star
 DA-40 (USAF T-52)\n2004 \xe2\x80\x93 2007\n442 Hours/45 Flight Lead Hours in 
McDonnell Douglas F-15E Strike Eagle\n2003 \xe2\x80\x93 2004\n19 Hours in 
Northrop AT-38B\n2000 \xe2\x80\x93 2003\n1,311 Flight Hours/1051 Instructor 
Pilot Hours in Cessna T-37B\n1999 \xe2\x80\x93 2000\n26 Flight Hours in Northrop
 T-38A\n1995 PA \xe2\x80\x93 1999\nDistinguished Graduate, United States Air 
Force Academy, CO \xe2\x80\x93 1998\nOmega Rho Honor Society for Operations 
Research, United States Air Force Academy, CO \xe2\x80\x93 1998\nAIR FORCE AWARDS 
AND DECORATIONS\nMeritorious Service Medal\nAir Force Commendation Medal\nAir 
Force Achievement Medal\nAir Force Outstanding Unit Award\nAir Force Organizational
 Excellence Award\nCombat Readiness Medal\nNational Defense Service Medal\nGlobal
 War on Terrorism Service Medal\nKorean Defense Service Medal\nAF Longevity 
Service\nSmall Arms Expert Marksmanship Ribbon (Pistol)\nAF Training Ribbon'"

I want to get rid of the b' prefix and every \x escape followed by two hex digits, such as \xe2 and \x80, but I don't know how. I tried:

content.decode("utf-8", errors="ignore")

But because content is already a str, I can't decode it. So I tried the line below to convert it to bytes, strip the parts I don't want, and convert it back to a string, but it does not work at all.

new_content =content.encode("ascii").decode("utf-8", errors="ignore")

When I run the code below on a real bytes literal, the b' and \x** parts disappear, so I tried everything I could think of, but I do not know how to turn my string into a bytes object like this one. I can convert content to bytes, but that doesn't get rid of the unwanted parts.

b'\x80abc sadad dkfbkafaf /n   n \x80dajhbahsdsabj'.decode("utf-8", errors="ignore")

Do you have any idea how I can strip the b' prefix and all of the \x** escapes from content?


Answer

You have a str value that contains the string representation of a bytes value, which itself is a UTF-8-encoded string. Use ast.literal_eval to get the actual bytes value, then decode it.

>>> import ast
>>> print(ast.literal_eval(content).decode())
MAJOR CONRAD A. PREEDOM
2354 Fairchild Dr., Suite 6H-126
USAF Academy, CO – Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star DA-40 (USAF T-52)
2004 – 2007
[etc]
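
For a bunch of such strings, the same idea can be wrapped in a small helper. This is only a sketch: the name bytes_repr_to_text and the short sample value are made up for illustration, and it assumes every affected string has the b'...' shape shown in the question. It also shows why the encode("ascii").decode(...) attempt changes nothing: the backslash escapes are ordinary ASCII characters inside the str, so the round trip returns the same text.

```python
import ast

# Hypothetical short sample in the same shape as `content`: a str holding
# the repr of a bytes object (the backslashes are literal characters).
sample = "b'CO \\xe2\\x80\\x93 Present\\n2004 \\xe2\\x80\\x93 2007'"

# Encoding to ASCII and decoding again is a plain round trip, because the
# escapes are just text here; nothing is removed.
assert sample.encode("ascii").decode("utf-8", errors="ignore") == sample


def bytes_repr_to_text(value, encoding="utf-8"):
    """Turn "b'...'" text back into decoded text; return other input unchanged."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # not a Python literal, so leave it alone
    if isinstance(parsed, bytes):
        return parsed.decode(encoding)
    return value  # a literal, but not bytes


text = bytes_repr_to_text(sample)
print(text)
```

Applied to a list, this would be something like [bytes_repr_to_text(s) for s in strings]. Strings of this shape typically come from calling str() on a bytes object somewhere upstream; if that is the case here, decoding the bytes at that point instead would avoid the problem entirely. Note that the helper does not catch UnicodeDecodeError, so bytes that are not valid in the chosen encoding will still raise.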
User contributions licensed under: CC BY-SA