I have strings like the one below:
content =
"b'MAJOR CONRAD A. PREEDOM\n2354 Fairchild Dr., Suite 6H-126\nUSAF Academy, CO
\xe2\x80\x93 Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star
DA-40 (USAF T-52)\n2004 \xe2\x80\x93 2007\n442 Hours/45 Flight Lead Hours in
McDonnell Douglas F-15E Strike Eagle\n2003 \xe2\x80\x93 2004\n19 Hours in
Northrop AT-38B\n2000 \xe2\x80\x93 2003\n1,311 Flight Hours/1051 Instructor
Pilot Hours in Cessna T-37B\n1999 \xe2\x80\x93 2000\n26 Flight Hours in Northrop
T-38A\n1995 PA \xe2\x80\x93 1999\nDistinguished Graduate, United States Air
Force Academy, CO \xe2\x80\x93 1998\nOmega Rho Honor Society for Operations
Research, United States Air Force Academy, CO \xe2\x80\x93 1998\nAIR FORCE AWARDS
AND DECORATIONS\nMeritorious Service Medal\nAir Force Commendation Medal\nAir
Force Achievement Medal\nAir Force Outstanding Unit Award\nAir Force Organizational
Excellence Award\nCombat Readiness Medal\nNational Defense Service Medal\nGlobal
War on Terrorism Service Medal\nKorean Defense Service Medal\nAF Longevity
Service\nSmall Arms Expert Marksmanship Ribbon (Pistol)\nAF Training Ribbon'"
I want to get rid of the leading b' and every escape sequence made of \x followed by two hex digits, such as \xe2, \x80 and so on, but I don't know how. I tried:
content.decode("utf-8", errors="ignore")
But because content is already a str, I can't decode it. So I tried encoding it to bytes, dropping the parts I don't want, and decoding back to a string, but it does not work at all:
new_content = content.encode("ascii").decode("utf-8", errors="ignore")
When I run the code below on a real bytes literal, the b' prefix and the \x** sequences disappear. I have tried everything I can think of, but I do not know how to turn my string into a bytes object like this one. I can convert content to bytes, but that doesn't remove the unwanted parts.
b'\x80abc sadad dkfbkafaf \n n \x80dajhbahsdsabj'.decode("utf-8", errors="ignore")
Do you have any idea how I can strip the b' and all the \x** sequences from content?
Answer
You have a str value that contains the string representation of a bytes value, which itself is a UTF-8-encoded string. Use ast.literal_eval to get the actual bytes value, then decode it.
>>> import ast
>>> print(ast.literal_eval(content).decode())
MAJOR CONRAD A. PREEDOM
2354 Fairchild Dr., Suite 6H-126
USAF Academy, CO – Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star DA-40 (USAF T-52)
2004 – 2007
[etc]
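
The same approach can be wrapped in a small helper. This is only a sketch: the function name bytes_repr_to_text and the shortened sample string are made up for illustration and are not part of the question. It assumes the input really is the repr of a bytes literal.

```python
import ast

# Hypothetical shortened sample: a str holding the repr of a bytes object.
# The raw string keeps the backslash sequences as literal characters.
content = r"b'2004 \xe2\x80\x93 2007\nAF Training Ribbon'"

def bytes_repr_to_text(s, encoding="utf-8"):
    """Turn the repr of a bytes literal (held in a str) into decoded text."""
    # literal_eval only parses Python literals; it does not run arbitrary code.
    value = ast.literal_eval(s)
    if not isinstance(value, bytes):
        raise TypeError("expected a bytes literal, got " + type(value).__name__)
    return value.decode(encoding)

text = bytes_repr_to_text(content)
print(text)

# Optional: if the non-ASCII characters (here the en dash) should be dropped
# rather than decoded, strip them after decoding.
ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
print(ascii_only)
```

Decoding keeps the en dash that \xe2\x80\x93 stands for, which is usually what you want. The last step is only needed if the text really must be plain ASCII.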