How to decode b"\x95\xc3\x8a\xb0\x8ds\x86\x89\x94\x82\x8a\xba"?

Question

[Summary]: The data grabbed from the file is b"\x95\xc3\x8a\xb0\x8ds\x86\x89\x94\x82\x8a\xba". How can I decode these bytes into readable Chinese characters, please?

I extracted some game scripts from an exe file. The file is packed with Enigma Virtual Box and I unpacked it. I can then see the scripts' names correctly, in English, as they are supposed to be. In analyzing these

Accepted Answer

In order to reliably decode bytes, you must know how the bytes were encoded. I will borrow a quote from the Python codecs docs: "Without external information it's impossible to reliably determine which encoding was used for encoding a string."

Without this information, there are ways to try to detect the encoding (chardet seems to be the most widely used). Here's how you could approach that.

    import chardet

    data = b"\x95\xc3\x8a\xb0\x8ds\x86\x89\x94\x82\x8a\xba"
    detected = chardet.detect(data)  # a dict; its "encoding" entry may be None
    decoded = data.decode(detected["encoding"])

The above example, however, does not work in this case because chardet isn't able to detect the encoding of these bytes. At that point, you'll have to either use trial and error or try other libraries.

One method you could use is to simply try every standard encoding, print out the result, and see which encoding makes sense.

    codecs = [
        "ascii", "big5", "big5hkscs", "cp037", "cp273", "cp424", "cp437", "cp500", "cp720",
        "cp737", "cp775", "cp850", "cp852", "cp855", "cp856", "cp857", "cp858", "cp860",
        "cp861", "cp862", "cp863", "cp864", "cp865", "cp866", "cp869", "cp874", "cp875",
        "cp932", "cp949", "cp950", "cp1006", "cp1026", "cp1125", "cp1140", "cp1250",
        "cp1251", "cp1252", "cp1253", "cp1254", "cp1255", "cp1256", "cp1257",
        "cp1258", "cp65001", "euc_jp", "euc_jis_2004", "euc_jisx0213", "euc_kr", "gb2312",
        "gbk", "gb18030", "hz", "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2",
        "iso2022_jp_2004", "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr", "latin_1",
        "iso8859_2", "iso8859_3", "iso8859_4", "iso8859_5", "iso8859_6", "iso8859_7",
        "iso8859_8", "iso8859_9", "iso8859_10", "iso8859_11", "iso8859_13", "iso8859_14",
        "iso8859_15", "iso8859_16", "johab", "koi8_r", "koi8_t", "koi8_u", "kz1048",
        "mac_cyrillic", "mac_greek", "mac_iceland", "mac_latin2", "mac_roman",
        "mac_turkish", "ptcp154", "shift_jis", "shift_jis_2004", "shift_jisx0213",
        "utf_32", "utf_32_be", "utf_32_le", "utf_16", "utf_16_be", "utf_16_le", "utf_7",
        "utf_8", "utf_8_sig",
    ]

    data = b"\x95\xc3\x8a\xb0\x8ds\x86\x89\x94\x82\x8a\xba"

    for codec in codecs:
        try:
            print(f"{codec}, {data.decode(codec)}")
        except (UnicodeDecodeError, LookupError):
            # Skip codecs that reject these bytes or are unavailable on this platform.
            continue

Output

    cp037, nC«^ýËfimb«[
    cp273, nC«¢ýËfimb«¬
    cp437, ò├è░ìsåëöéè║
    cp500, nC«¢ýËfimb«¬
    cp720, ـ├è░së¤éè║
    cp737, Χ├Λ░ΞsΗΚΦΓΛ║
    cp775, Ģ├Ŗ░ŹsåēöéŖ║
    cp850, ò├è░ìsåëöéè║
    cp852, Ľ├Ő░ŹsćëöéŐ║
    cp855, Ћ├і░ЇsєЅћѓі║
    cp856, ץ├ך░םsזיפגך║
    cp857, ò├è░ısåëöéè║
    cp858, ò├è░ìsåëöéè║
    cp860, ò├è░ìsÁÊõéè║
    cp861, þ├è░Þsåëöéè║
    cp862, ץ├ך░םsזיפגך║
    cp863, Ï├è░‗s¶ëËéè║
    cp864, ¼ﺃ├٠┌s│┬½∙├ﻑ
    cp865, ò├è░ìsåëöéè║
    cp866, Х├К░НsЖЙФВК║
    cp875, nCα£δΉfimbας
    cp949, 빩뒺뛱냹봻듆
    cp1006, ﺣﺍsﭦ
    cp1026, nC«¢`Ëfimb«¬
    cp1125, Х├К░НsЖЙФВК║
    cp1140, nC«^ýËfimb«[
    cp1250, •ĂŠ°Ťs†‰”‚Šş
    cp1251, •ГЉ°Ќs†‰”‚Љє
    cp1256, •أٹ°چs†‰”‚ٹ؛
    gbk, 暶姲峴唹攤姾
    gb18030, 暶姲峴唹攤姾
    latin_1, Ã°sº
    iso8859_2, Ă°sş
    iso8859_4, Ã°sē
    iso8859_5, УАsК
    iso8859_7, Γ°sΊ
    iso8859_9, Ã°sº
    iso8859_10, Ã°sš
    iso8859_11, รฐsบ
    iso8859_13, Ć°sŗ
    iso8859_14, ÃḞsẃ
    iso8859_15, Ã°sº
    iso8859_16, Ă°sș
    koi8_r, ∙ц┼╟█s├┴■┌┼╨
    koi8_u, ∙ц┼╟█s├┴■┌┼╨
    kz1048, •ГЉ°Қs†‰”‚Љғ
    mac_cyrillic, Х√К∞НsЖЙФВКЇ
    mac_greek, ïΟäΑçsÜâî²äΚ
    mac_iceland, ï√ä∞çsÜâîÇä∫
    mac_latin2, ē√äįćsÜČĒāäļ
    mac_roman, ï√ä∞çsÜâîÇä∫
    mac_turkish, ï√ä∞çsÜâîÇä∫
    ptcp154, •ГҠ°ҚsҶү”ӮҠә
    shift_jis_2004, 陛寛行̹狽桓
    shift_jisx0213, 陛寛行̹狽桓
    utf_16, 쎕낊玍覆芔몊
    utf_16_be, 闃誰赳蚉钂誺
    utf_16_le, 쎕낊玍覆芔몊

Edit: After running all of the seemingly legible results through Google Translate, I suspect this encoding is UTF-16 big-endian. Here are the results:

    +-----------+---------------+--------------------+--------------------------+
    | Encoding  |  Decoded      |  Language Detected |    English Translation   |
    +-----------+---------------+--------------------+--------------------------+
    | gbk       |  暶姲峴唹攤姾  |  Chinese           |  Jian Xian JiaoTanJiao   |
    | gb18030   |  暶姲峴唹攤姾  |  Chinese           |  Jian Xian Jiao Tan Jiao |
    | utf_16    |  쎕낊玍覆芔몊  |  Korean            |  None                    |
    | utf_16_be |  闃誰赳蚉钂誺  |  Chinese           |  Who is the epiphysis?   |
    | utf_16_le |  쎕낊玍覆芔몊  |  Korean            |  None                    |
    +-----------+---------------+--------------------+--------------------------+
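Reading through dozens of decodings by eye is tedious, so the trial-and-error loop can be paired with a rough filter. The sketch below is not part of the original answer: it assumes the expected text is Chinese and scores each decoding by the fraction of characters that are CJK unified ideographs. The names `CANDIDATES`, `cjk_ratio`, and `rank_decodings` are hypothetical helpers introduced for illustration, and the shortlist of codecs is arbitrary.

```python
import unicodedata

data = b"\x95\xc3\x8a\xb0\x8ds\x86\x89\x94\x82\x8a\xba"

# Hypothetical shortlist; swap in the full list of standard encodings if needed.
CANDIDATES = [
    "gbk", "gb18030", "big5", "utf_16_be", "utf_16_le",
    "shift_jis_2004", "cp437", "latin_1",
]


def cjk_ratio(text):
    """Return the fraction of characters named as CJK unified ideographs."""
    if not text:
        return 0.0
    hits = sum(
        1 for ch in text
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
    )
    return hits / len(text)


def rank_decodings(raw, codecs):
    """Decode raw with each codec and sort the successes by CJK ratio."""
    results = []
    for codec in codecs:
        try:
            text = raw.decode(codec)
        except (UnicodeDecodeError, LookupError):
            # The codec rejected the bytes or is not available here.
            continue
        results.append((cjk_ratio(text), codec, text))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    for ratio, codec, text in rank_decodings(data, CANDIDATES):
        print(f"{ratio:.2f}  {codec}: {text}")
```

This only narrows the field. For these bytes, gbk, gb18030, and utf_16_be each decode to six ideographs, so the score cannot choose between them; someone who reads the language still has to judge which result is meaningful text.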

