I am trying to print Japanese characters in Python, aligned in columns. It seems that Japanese characters have a width equivalent to two spaces, so the alignment doesn’t work.
Here is the code:
def print_kanji(s, k):
    print('{:<20}{:<10}{:<10}{:<10}'.format(s, k['reading'][0], k['reading'][1], k['kanji']))

# 's' is some input string and 'k' is a map which contains readings in the 3 different Japanese alphabets.
The output I obtain is the following:
decir               いう        イウ        言う
pequeño             すくない      スクナイ      少ない
niño                こども       コドモ       子供
ya [ha hecho X]     もう        モウ
The column on the left is Spanish, but that’s not important. The important thing is that the 3 columns on the right are not aligned. I have counted the number of positions and it is correct: the first Japanese column is always 10 ‘positions’ long. The problem is that Japanese characters are 2 positions wide while blanks are only 1.
I have also checked that a blank typed with the Japanese input is two positions wide, so I should be able to fix the problem by replacing the ‘Latin’ space (1 position wide) with the Japanese one.
How can I change the character that format will use to align strings?
EDIT
I have found that the format specification used by str.format has an option called fill. I have tried to set it to the Japanese blank (two positions wide) and the result is worse.
EDIT 2
I have solved it by implementing this function
def get_formatted_kanji(h, k, kn):
    # Every Japanese character counts as 2 positions; pad each string to 10 positions
    h2 = h + ' ' * (10 - 2 * len(h))
    k2 = k + ' ' * (10 - 2 * len(k))
    kn2 = kn + ' ' * (10 - 2 * len(kn))
    return h2 + k2 + kn2

# h, k and kn are the three 'japanese strings' to be formatted in columns
However, is there a better (built-in) way of achieving this?
Answer
In a terminal, it’s common for certain characters to take up two columns and other characters to take up one column. You can find out which characters are which by using the unicodedata Python module, which has an east_asian_width() function.
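To get a feel for what east_asian_width() returns, here is a small sketch that prints the width class of a few sample characters. The sample list and the is_wide helper are my own illustrations, not part of the module:

```python
import unicodedata

# Sample characters: ASCII letter, hiragana, kanji, halfwidth katakana,
# fullwidth Latin letter, and the ideographic (Japanese) space U+3000.
samples = ['a', 'あ', '言', 'ｱ', 'Ａ', '\u3000']

for ch in samples:
    # east_asian_width() returns a short class code such as 'Na', 'W', 'H' or 'F'
    print(repr(ch), unicodedata.east_asian_width(ch))


def is_wide(ch):
    """Hypothetical helper: True if ch is classed as Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ('W', 'F')
```

Characters classed as W (wide) or F (fullwidth) are the ones a terminal usually draws two columns wide.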
Here is an example of how you can use it to pad your text:
import unicodedata

table = [
    ('decir', 'いう', 'イウ', '言う'),
    ('pequeño', 'すくない', 'スクナイ', '少ない'),
    ('niño', 'こども', 'コドモ', '子供'),
    ('ya [ha hecho X]', 'もう', 'モウ', ''),
]

WIDTHS = {
    'F': 2,
    'H': 1,
    'W': 2,
    'N': 1,
    'A': 1,  # Not really correct...
    'Na': 1,
}

def pad(text, width):
    text_width = 0
    for ch in text:
        width_class = unicodedata.east_asian_width(ch)
        text_width += WIDTHS[width_class]
    if width <= text_width:
        return text
    return text + ' ' * (width - text_width)

for s, reading1, reading2, kanji in table:
    print('{}{}{}{}'.format(
        pad(s, 20),
        pad(reading1, 10),
        pad(reading2, 10),
        pad(kanji, 10),
    ))
Here is a screenshot of how this looks on my system (macOS):
Limitations
The above code does not handle Unicode combining characters. A more complete implementation would perform Unicode text segmentation, and then figure out the width of each grapheme cluster. There are libraries that do this for you, I’m sure.
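As a partial step toward that, here is a minimal sketch that at least treats combining marks as zero-width by checking unicodedata.combining(). It is not real grapheme segmentation, and the names display_width and pad_to are just ones I picked for illustration:

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character,
            # so assume they add no columns of their own.
            continue
        # Wide and Fullwidth characters take two columns, everything else one.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width


def pad_to(text, width):
    """Pad text with spaces on the right up to the given column width."""
    return text + ' ' * max(0, width - display_width(text))


# 'niño' written with a separate combining tilde (U+0303) still measures 4 columns
print(display_width('nin\u0303o'))
print(pad_to('いう', 10) + '|')
```

This still gets things like emoji sequences wrong, so a dedicated library remains the safer choice for anything beyond simple text.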
Language Note
As a note, I don’t think the words “少ない” and “pequeño” are likely equivalents. The Spanish word “pequeño” refers to the size of something, and “少ない” refers to the quantity.
I think these pairings are more likely:
- poco: 少ない
- pequeño: 小さい