
Python str.format with UTF-8 characters that take more than 1 position

I am trying to print Japanese characters in Python, aligned in columns. It seems that Japanese characters have a width equivalent to two spaces, so the alignment doesn’t work.

Here is the code:

JavaScript

The output I obtain is the following:

JavaScript

The column on the left is Spanish, but that’s not important. The important thing is that the 3 columns on the right are not aligned. I have counted the number of positions and it is correct: the first Japanese column is always 10 ‘positions’ long. The problem is that Japanese characters are 2 positions wide, while blanks are only 1.
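The mismatch described above can be reproduced with a minimal sketch. The helper `display_width` is hypothetical (it is not part of the original post) and assumes that wide and fullwidth characters occupy two terminal columns while everything else occupies one.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: wide (W) and fullwidth (F) count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

# str.format pads to 10 code points, not to 10 terminal columns.
cell = "{:10}".format("少ない")
print(len(cell))            # 10 code points
print(display_width(cell))  # 13 columns: 3 wide characters + 7 spaces
```

So the count of ‘positions’ is correct, yet the cell is drawn 3 columns wider than a cell holding only narrow characters.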

I have also checked that a blank typed with the Japanese input method is two positions wide, so I should be able to fix the problem by replacing the ‘Latin’ space (1 position wide) with the Japanese one.

How can I change the character that format will use to align strings?

EDIT

I have found that the str.format format specification accepts a fill character. I have tried setting it to the Japanese blank (two positions wide), and the result is worse.
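The code for this attempt is not shown, so the following is only a guess at one way it can go wrong. The sketch assumes the Japanese blank is U+3000 (IDEOGRAPHIC SPACE); `display_width` and `pad_with_wide_fill` are hypothetical helpers. Because str.format still counts code points, a double-width fill only lines up when every character in the cell is double-width too.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"  # assumed to be the two-position "Japanese blank"

def display_width(text):
    """Approximate terminal columns: wide (W) and fullwidth (F) count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def pad_with_wide_fill(text, width=10):
    """Left-align using the format-spec fill character, padded to `width` code points."""
    return "{:{fill}<{width}}".format(text, fill=IDEOGRAPHIC_SPACE, width=width)

all_wide = pad_with_wide_fill("少ない")  # 3 wide characters + 7 wide blanks
mixed = pad_with_wide_fill("abc")       # 3 narrow characters + 7 wide blanks
print(display_width(all_wide))  # 20
print(display_width(mixed))     # 17
```

Both cells contain 10 code points, but they occupy different numbers of columns, so any cell with narrow characters in it ends up misaligned.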

EDIT 2

I have solved it by implementing this function

JavaScript

however, is there a better (built-in) way of achieving this?


Answer

In a terminal, some characters commonly take up two columns while others take up one. You can find out which is which with Python’s unicodedata module, which provides the east_asian_width() function.

Here is an example of how you can use it to pad your text:

JavaScript
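The original code block did not survive on this page. As a stand-in, here is a minimal sketch of the approach, not the answer's exact code. The names `display_width` and `pad_to_width` are hypothetical, and the sketch assumes that characters whose east_asian_width() is "W" or "F" take two columns and all others take one.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: wide (W) and fullwidth (F) count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def pad_to_width(text, width, fill=" "):
    """Left-align text by appending fill until it spans `width` columns."""
    return text + fill * max(0, width - display_width(text))

# Sample rows; the trailing "|" makes the right edge of the table visible.
rows = [("poco", "少ない"), ("pequeño", "小さい")]
for spanish, japanese in rows:
    print(pad_to_width(spanish, 10) + pad_to_width(japanese, 10) + "|")
```

The padding is computed from the displayed width rather than from len(), so each cell ends at the same column as long as the terminal agrees with the width assumption. Text that is already wider than the target is returned unchanged.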

Here is a screenshot of how this looks on my system (macOS):

The same table, with columns lined up visually.

Limitations

The above code does not handle Unicode combining characters. A more complete implementation would perform Unicode text segmentation, and then figure out the width of each grapheme cluster. There are libraries that do this for you, I’m sure.
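As a partial step in that direction, the sketch below skips combining marks when measuring width. It uses unicodedata.combining() and assumes that any character with a non-zero combining class occupies no column of its own. This is still not full grapheme-cluster segmentation: sequences built with zero-width joiners or variation selectors, for example, are not handled. The name `display_width` is hypothetical.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns, treating combining marks as zero-width."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining mark: drawn on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

# NFD splits "ñ" into "n" + U+0303 and "が" into "か" + U+3099.
decomposed_spanish = unicodedata.normalize("NFD", "peque\u00f1o")
decomposed_japanese = unicodedata.normalize("NFD", "\u304c")
print(len(decomposed_spanish), display_width(decomposed_spanish))    # 8 7
print(len(decomposed_japanese), display_width(decomposed_japanese))  # 2 2
```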

Language Note

As a note, I don’t think the words “少ない” and “pequeño” are likely equivalents. The Spanish word “pequeño” refers to the size of something, and “少ない” refers to the quantity.

I think it’s more likely that

  • poco: 少ない
  • pequeño: 小さい
User contributions licensed under: CC BY-SA