Truncating a Unicode string so it fits a maximum size when encoded for wire transfer

Question

Given a Unicode string and these requirements:

- The string must be encoded into some byte-sequence format (e.g. UTF-8 or JSON Unicode escapes).
- The encoded string has a maximum length.

For example, the iPhone push service requires JSON encoding with a maximum total packet size of 256 bytes.

What is the best way to tru…
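The limit applies to the encoded form, not to the character count, and the two can differ a lot. A small sketch (encoded_sizes is a hypothetical helper written only for illustration) measures the same string three ways:

```python
import json


def encoded_sizes(s: str) -> dict:
    """Hypothetical helper: measure one string three different ways."""
    return {
        # Number of code points, which is what len() counts on a str.
        "chars": len(s),
        # Bytes on the wire if the string is sent as UTF-8.
        "utf8_bytes": len(s.encode("utf-8")),
        # json.dumps() defaults to ensure_ascii=True, so non-ASCII characters
        # are written as \uXXXX escapes (six bytes each for these Cyrillic
        # letters), and the output includes the two surrounding quote marks.
        "json_bytes": len(json.dumps(s).encode("ascii")),
    }


print(encoded_sizes("абвгд"))
# {'chars': 5, 'utf8_bytes': 10, 'json_bytes': 32}
```

The same five characters cost 5, 10, or 32 units depending on what is counted, so a 256-byte budget has to be checked against the encoded output.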

Accepted Answer

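```python
def unicode_truncate(s, length, encoding='utf-8'):
    # Encode first, then cut the byte string down to the size limit.
    encoded = s.encode(encoding)[:length]
    # The cut may land inside a multi-byte character; 'ignore' drops the
    # incomplete trailing bytes instead of raising UnicodeDecodeError.
    return encoded.decode(encoding, 'ignore')
```

Here is an example with a Unicode string in which each character takes 2 bytes in UTF-8. Cutting at 5 bytes splits the third character in half, so a strict decode would raise UnicodeDecodeError; with 'ignore', the partial code point is simply dropped:

```python
>>> unicode_truncate(u'абвгд', 5)
u'\u0430\u0431'
```

(Python 3 displays the same result as 'аб'.)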
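Two variations on the same idea, sketched under stated assumptions; the function names are hypothetical. utf8_truncate avoids 'ignore': UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so it steps the cut point back to the start of a code point and then decodes strictly, which means no other bytes can be dropped silently. json_truncate targets the JSON case from the question by binary-searching for the longest prefix whose json.dumps() output fits. It assumes the budget applies to the escaped string alone, whereas a real push payload also spends bytes on the surrounding JSON object.

```python
import json


def utf8_truncate(s: str, max_bytes: int) -> str:
    """Cut s so its UTF-8 form is at most max_bytes, on a code point boundary."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s
    cut = max_bytes
    # Continuation bytes match 0b10xxxxxx. Step back until encoded[cut]
    # starts a code point, so encoded[:cut] ends on a complete character.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    # Strict decode: the prefix is valid UTF-8 by construction.
    return encoded[:cut].decode("utf-8")


def json_truncate(s: str, max_bytes: int) -> str:
    """Longest prefix of s whose json.dumps() output fits in max_bytes."""

    def size(prefix: str) -> int:
        # ensure_ascii=True (the default) makes the output pure ASCII,
        # so its character count equals its byte count.
        return len(json.dumps(prefix))

    # The dumped size never shrinks as the prefix grows, so a binary search
    # finds the longest fitting prefix with O(log n) calls to json.dumps().
    lo, hi = 0, len(s)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if size(s[:mid]) <= max_bytes:
            lo = mid
        else:
            hi = mid - 1
    return s[:lo]


print(utf8_truncate("абвгд", 5))   # аб
print(json_truncate("абвгд", 20))  # абв
```

Both functions cut between code points, so a character built from several code points (for example a letter followed by a combining accent) can still be split, just as with the original function. If max_bytes is below 2, json_truncate returns an empty string even though its encoding, two quote marks, does not fit.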


Answer