python str.format with utf-8 characters that take more than 1 position

Question

I am trying to print Japanese characters in Python, aligned in columns. It seems that Japanese characters have a width equivalent to two spaces, so the alignment doesn't work. Here is the code: The output I obtain is the following: The column on the left is in Spanish, but that's not important. The important thing is that the 3 columns on the

Accepted Answer

In a terminal, it’s common for certain characters to take up two columns and other characters to take up one column. You can find out which characters are which by using the unicodedata Python module, which has an east_asian_width() function.

Here is an example of how you can use it to pad your text:

    import unicodedata

    table = [
        ('decir', 'いう', 'イウ', '言う'),
        ('pequeño', 'すくない', 'スクナイ', '少ない'),
        ('niño', 'こども', 'コドモ', '子供'),
        ('ya [ha hecho X]', 'もう', 'モウ', ''),
    ]

    # Terminal columns used by each East Asian Width class.
    WIDTHS = {
        'F': 2,
        'H': 1,
        'W': 2,
        'N': 1,
        'A': 1, # Not really correct...
        'Na': 1,
    }

    def pad(text, width):
        # Sum the display width of every character in the text.
        text_width = 0
        for ch in text:
            width_class = unicodedata.east_asian_width(ch)
            text_width += WIDTHS[width_class]
        # Text that is already wide enough is returned unchanged.
        if width <= text_width:
            return text
        return text + ' ' * (width - text_width)

    for s, reading1, reading2, kanji in table:
        print('{}{}{}{}'.format(
            pad(s, 20),
            pad(reading1, 10),
            pad(reading2, 10),
            pad(kanji, 10),
        ))

[Screenshot: how this looks on my system (macOS)]

Limitations

The above code does not handle Unicode combining characters. A more complete implementation would perform Unicode text segmentation, and then figure out the width of each grapheme cluster. There are libraries that do this for you, I’m sure.

Language Note

As a note, I don’t think the words “少ない” and “pequeño” are likely equivalents. The Spanish word “pequeño” refers to the size of something, and “少ない” refers to the quantity.

I think it’s more likely that

poco: 少ない
pequeño: 小さい
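Returning to the combining-character limitation described in the answer: the following is a minimal sketch of one way to reduce it, not a full grapheme-cluster implementation. It assumes that nonspacing and enclosing marks (Unicode categories Mn and Me) occupy zero columns, and it keeps the answer's guess that ambiguous-width characters are narrow. The names display_width and pad_display are hypothetical and introduced only for illustration.

```python
import unicodedata

# Assumed mapping from East Asian Width class to terminal columns.
# 'A' (ambiguous) is treated as narrow here; the real width depends
# on the terminal and font.
WIDTHS = {'F': 2, 'W': 2, 'H': 1, 'Na': 1, 'N': 1, 'A': 1}


def display_width(text):
    """Estimate how many terminal columns `text` occupies."""
    total = 0
    for ch in text:
        # Nonspacing (Mn) and enclosing (Me) marks attach to the
        # previous character, so they are counted as zero columns.
        if unicodedata.category(ch) in ('Mn', 'Me'):
            continue
        total += WIDTHS[unicodedata.east_asian_width(ch)]
    return total


def pad_display(text, width):
    """Right-pad `text` with spaces up to `width` terminal columns."""
    return text + ' ' * max(0, width - display_width(text))


# 'pequeño' in decomposed form (n followed by U+0303) has 8 code points
# but should still be measured as 7 columns.
decomposed = unicodedata.normalize('NFD', 'pequeño')
print(pad_display(decomposed, 20) + '|')
print(pad_display('少ない', 20) + '|')
```

If the assumptions hold for your terminal, the two `|` markers should end up in the same column. This sketch still ignores cases such as emoji sequences and zero-width joiners, which is where proper text segmentation or a dedicated third-party package (wcwidth is one that exists for this purpose) would likely do better.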
