I have strings as below:
content = "b'MAJOR CONRAD A. PREEDOM\n2354 Fairchild Dr., Suite 6H-126\nUSAF Academy, CO \xe2\x80\x93 Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star DA-40 (USAF T-52)\n2004 \xe2\x80\x93 2007\n442 Hours/45 Flight Lead Hours in McDonnell Douglas F-15E Strike Eagle\n2003 \xe2\x80\x93 2004\n19 Hours in Northrop AT-38B\n2000 \xe2\x80\x93 2003\n1,311 Flight Hours/1051 Instructor Pilot Hours in Cessna T-37B\n1999 \xe2\x80\x93 2000\n26 Flight Hours in Northrop T-38A\n1995 PA \xe2\x80\x93 1999\nDistinguished Graduate, United States Air Force Academy, CO \xe2\x80\x93 1998\nOmega Rho Honor Society for Operations Research, United States Air Force Academy, CO \xe2\x80\x93 1998\nAIR FORCE AWARDS AND DECORATIONS\nMeritorious Service Medal\nAir Force Commendation Medal\nAir Force Achievement Medal\nAir Force Outstanding Unit Award\nAir Force Organizational Excellence Award\nCombat Readiness Medal\nNational Defense Service Medal\nGlobal War on Terrorism Service Medal\nKorean Defense Service Medal\nAF Longevity Service\nSmall Arms Expert Marksmanship Ribbon (Pistol)\nAF Training Ribbon'"
I want to get rid of the b' prefix and every escape sequence made of x followed by two characters, such as \xe2 and \x80. I don't know how to do this. I tried:
content.decode("utf-8", errors="ignore")
But because content is already a str, I can't decode it. So I tried the following to convert it to bytes, strip what I don't want, and convert it back to a string, but it does not work at all:
new_content = content.encode("ascii").decode("utf-8", errors="ignore")
When I run the code below, the b' prefix and the x** sequences do disappear, so I tried everything I could think of, but I don't know how to turn my string into a bytes object like this one. I can convert content to bytes, but that doesn't remove the unwanted parts.
b'x80abc sadad dkfbkafaf /n n x80dajhbahsdsabj'.decode("utf-8", errors="ignore")
Do you have any idea how I can strip the b' prefix and all of the x** sequences from content?
Answer
You have a str value that contains the string representation of a bytes value, which itself is a UTF-8-encoded string. Use ast.literal_eval to get the actual bytes value, then decode it.
>>> import ast
>>> print(ast.literal_eval(content).decode())
MAJOR CONRAD A. PREEDOM
2354 Fairchild Dr., Suite 6H-126
USAF Academy, CO – Present 160 Flight Hours/145 Instructor Pilot Hours in Diamond Star DA-40 (USAF T-52)
2004 – 2007
[etc]
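To see why the encode/decode attempt from the question changes nothing while ast.literal_eval works, here is a minimal sketch. The sample string is a shortened, hypothetical stand-in for content, and it assumes the escapes are stored as literal backslash text inside the str (which is what the repr of a bytes object looks like).

```python
import ast

# Hypothetical sample: a str holding the repr of a bytes object.
# The doubled backslashes put literal "\x.." and "\n" text inside the str.
sample = "b'CO \\xe2\\x80\\x93 Present\\n2004 \\xe2\\x80\\x93 2007'"

# Encoding the str only re-expresses the same characters as ASCII bytes,
# so the b' prefix and the \x.. escape text survive the round trip.
unchanged = sample.encode("ascii").decode("utf-8", errors="ignore")

# literal_eval parses the text as a Python literal and returns real bytes.
raw = ast.literal_eval(sample)

# Decoding turns each three-byte sequence E2 80 93 into one en dash.
text = raw.decode("utf-8")
print(text)
```

ast.literal_eval only accepts Python literals, so it is safer than eval for text of unknown origin. It raises ValueError or SyntaxError if the string is not a valid literal.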