I’m trying to convert a list of numbers, which I believe are the bytes of a bitmap image, into an image file saved to disk, or at least into a form that tesseract can use. I’d prefer to be able to view the images, though, to make sure the conversion actually worked. I don’t know the shape of the image, but I think it might be 4 pixels wide by 8 tall.
I have a JSON file containing the character mapping for a font (an image-based font used by a Japanese dictionary), where each character is stored as a bitmap image. For example, one character is:
{ "bitmap": [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0], "code": 46370 }
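One way to test the shape guess before involving any image library: the record has 32 values, each in the range 0 to 255. If each value is one byte and each pixel is one bit, that is 32 × 8 = 256 pixels, which fits a 16 × 16 glyph with two bytes per row. The sketch below assumes that layout (1 bit per pixel, rows packed most significant bit first, 1 = ink), which is also how the answer further down reads the data, and prints the glyph as text art using only the standard library. `glyph_to_text` is a hypothetical helper name.

```python
import json

# One glyph record, copied from the question.
record = json.loads(
    '{"bitmap": [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,'
    '144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0], "code": 46370}'
)


def glyph_to_text(bitmap, width=16, ink="#", blank="."):
    """Render a packed 1-bit-per-pixel bitmap as text, one string per row.

    Assumption: rows are packed MSB-first and each row takes width // 8 bytes.
    """
    bytes_per_row = width // 8
    rows = []
    for start in range(0, len(bitmap), bytes_per_row):
        row_bytes = bitmap[start:start + bytes_per_row]
        # '{:08b}' expands each byte into 8 characters, most significant bit first.
        bits = "".join(f"{b:08b}" for b in row_bytes)
        rows.append("".join(ink if bit == "1" else blank for bit in bits))
    return rows


rows = glyph_to_text(record["bitmap"])
print(f"{len(record['bitmap'])} bytes -> {len(rows)} rows of {len(rows[0])} pixels")
print("\n".join(rows))
```

If the printout looks like a recognizable character, the 16 × 16 assumption is probably right; if it looks like noise, try a different width.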
I’m trying to recover the actual characters these bitmaps represent. My plan is to convert each list of ints into bytes (or an array of bytes), turn those into bitmap image files and save them to disk (the step I’m stuck at), and then OCR the images (either in Python with tesseract, or with Adobe’s OCR if I can put them in a PDF) to find their UTF-8 or Shift-JIS equivalents. If I’m overcomplicating this, I’d also appreciate some more direction!
I’ve referred to the following stackoverflow posts (and a few others) to try and convert the list of ints into an actual image file:
- How do I convert byte array to bitmap image in Python?
- Converting int to bytes in Python 3
- Image from bytes (python)
- PIL: Convert Bytearray to Image
- Convert Numpy array of ASCII codes to string
I’ve also tried this library, and I think I successfully converted the list into a string of bits and then into the library’s own bitmap type, but I can’t figure out how to save the resulting object. Looking at the source, this library’s bitmap class seems kind of useless for what I want to do.
The numbers above supposedly correspond to this picture (which is not greyscale):
I’ve written something that converts a list of ints into either bytes or a “bytearray” (I’ve tried a lot of different things, and I’m not sure which format I actually need), but then I get stuck when I try to save those bytes as a BMP file. Depending on what I try, I get errors like the following:
OSError: cannot identify image file 'out.bmp'
OSError: cannot identify image file <_io.BytesIO object at 0x000001F037F7C5C8>
AttributeError: 'BitMap' object has no attribute 'save'
Or I end up saving a file that can’t be opened because its format isn’t recognized (e.g. if I just open a file and write the bytes to it).
I’m guessing part of the problem is that the data I’m saving has no bitmap headers. But saving some bytes as an image also seems a lot more complicated than I expected, so frankly, I don’t know where to begin.
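Missing headers are indeed a likely culprit: a file on disk needs a format header before a viewer (or `Image.open`) can recognize it. One of the simplest formats with a header is binary PBM ("P4"), which stores 1-bit pixels packed most significant bit first, pads each row to a whole byte, and treats 1 as black. If the glyph is 16 × 16 at 1 bit per pixel (an assumption, not something the JSON states), the 32 bytes can be written unchanged after a short text header. `write_pbm` and the output filename are hypothetical.

```python
from pathlib import Path

# Glyph data from the question.
bitmap = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]


def write_pbm(path, bitmap, width=16, height=16):
    """Write packed 1-bit rows as a binary PBM ("P4") file.

    Assumption: the data is width x height pixels, 1 bit per pixel,
    rows packed MSB-first and padded to whole bytes, with 1 = ink.
    P4 uses exactly that layout (1 = black), so the bytes need no conversion.
    """
    bytes_per_row = (width + 7) // 8
    if len(bitmap) != bytes_per_row * height:
        raise ValueError(f"expected {bytes_per_row * height} bytes, got {len(bitmap)}")
    header = f"P4\n{width} {height}\n".encode("ascii")
    Path(path).write_bytes(header + bytes(bitmap))


write_pbm("46370.pbm", bitmap)
```

Pillow and many image viewers can open PBM files, so this is a quick way to check the result visually.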
I’m also not sure whether the byte array I’m making holds the individual bytes or some representation of the whole list…
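On the bytes question: `bytes(list_of_ints)` produces one byte per list element (and raises `ValueError` if any value is outside 0 to 255), while `bytearray` holds the same bytes but can be modified in place. A quick check in plain Python:

```python
# Glyph data from the question.
test_bytes = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]

data = bytes(test_bytes)          # immutable: one byte per list element
mutable = bytearray(test_bytes)   # the same bytes, but modifiable in place

print(len(data))                  # 32
print(data[2])                    # 26 (indexing gives back an int)
print(data[:4])                   # b'\x00\x00\x1a\x00' (slicing gives bytes)
print(list(data) == test_bytes)   # True (the conversion is lossless)
print(bytes(mutable) == data)     # True
```

Either type works as input to the image functions; the conversion itself is not where the data gets lost.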
Can someone help me save this list of numbers as an image? I don’t know if I actually need to save it as a bitmap.
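A BMP file can also be written without any library, which shows what the missing headers look like. This sketch assumes the same 16 × 16, 1-bit, MSB-first layout with 1 = ink, and writes a 1-bit BMP: a 14-byte file header, a 40-byte BITMAPINFOHEADER, a two-entry palette (index 0 white, index 1 black), and pixel rows padded to 4 bytes and stored bottom-up, as BMP requires for a positive height. `write_bmp_1bit` is a hypothetical helper, and 2835 pixels per meter (about 72 DPI) is an arbitrary choice.

```python
import struct

# Glyph data from the question.
bitmap = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]


def write_bmp_1bit(path, bitmap, width=16, height=16):
    """Write packed 1-bit rows (MSB-first, 1 = ink) as a 1-bit BMP file."""
    src_stride = (width + 7) // 8          # bytes per row in the packed input
    dst_stride = (width + 31) // 32 * 4    # BMP rows are padded to a multiple of 4 bytes
    if len(bitmap) != src_stride * height:
        raise ValueError(f"expected {src_stride * height} bytes, got {len(bitmap)}")
    rows = [
        bytes(bitmap[r * src_stride:(r + 1) * src_stride]).ljust(dst_stride, b"\x00")
        for r in range(height)
    ]
    # A positive height in the header means rows are stored bottom-up.
    pixel_data = b"".join(reversed(rows))
    # Palette entries are (blue, green, red, reserved): index 0 white, index 1 black.
    palette = bytes([255, 255, 255, 0, 0, 0, 0, 0])
    offset = 14 + 40 + len(palette)
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixel_data), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,               # size of this header
        width, height,
        1,                # color planes
        1,                # bits per pixel
        0,                # BI_RGB, i.e. no compression
        len(pixel_data),  # image size in bytes
        2835, 2835,       # about 72 DPI, in pixels per meter
        2,                # colors in the palette
        0,                # all colors are important
    )
    with open(path, "wb") as f:
        f.write(file_header + info_header + palette + pixel_data)


write_bmp_1bit("46370.bmp", bitmap)
```

In practice a library such as Pillow writes these headers for you; the point here is mainly to show why writing the raw bytes alone produced a file nothing could open.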
This is (one version of) my program:
```python
import io
from PIL import Image

test_image = "out.bmp"
test_bytes = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]
actual_bytes = bytes(test_bytes)

def generate_output_image(input_image):
    image = Image.open(io.BytesIO(input_image))
    image.save(test_image)

generate_output_image(actual_bytes)
```
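The `OSError` from this program is expected: `Image.open` parses an encoded image file (such as BMP or PNG) and looks for its header, but these bytes are raw pixel bits. Pillow’s `Image.frombytes` takes raw pixel data plus an explicit mode and size instead. The sketch below assumes a 16 × 16, 1-bit-per-pixel glyph with 1 = ink; because Pillow’s mode `"1"` turns a set bit into white, the bytes are inverted first so the ink comes out black. The 8× enlargement and the output filenames are optional, illustrative choices.

```python
from PIL import Image

# Glyph data from the question.
test_bytes = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]

# Assumption: 16x16 pixels, 1 bit per pixel, rows packed MSB-first, 1 = ink.
WIDTH, HEIGHT = 16, 16

# Mode "1" with the default raw decoder maps a set bit to white (255),
# so flip every bit to draw the ink in black on a white background.
inverted = bytes(255 - b for b in test_bytes)

# frombytes() decodes raw pixel data; unlike Image.open() it needs no file header.
img = Image.frombytes("1", (WIDTH, HEIGHT), inverted)
img.save("out.bmp")  # Pillow writes the BMP headers itself

# Optional: nearest-neighbor upscaling keeps the pixels sharp for viewing.
big = img.resize((WIDTH * 8, HEIGHT * 8), Image.NEAREST)
big.save("out_big.png")
```

The resulting `img` is an ordinary Pillow image, so it can be handed to an OCR wrapper or saved in whatever format the OCR tool prefers.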
Answer
It seems like you’re converting the json output of https://github.com/FooSoft/zero-epwing.
I haven’t been able to figure out how to convert the wide glyphs, but I used this (absolute hack of a) Python script to export the narrow glyphs.
Change font.json to the path for your json file.
It exports BMP files, using each glyph’s code as the filename.
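```python
from PIL import Image
import json

with open('font.json') as json_file:
    data = json.load(json_file)

for font in data['fonts']:
    for glyph in font['narrow']['glyphs']:
        bitmap = glyph['bitmap']
        row = 0
        # 24x24 canvas, initially black; the glyph fills the top-left 16x16 area
        img = Image.new('1', (24, 24))
        pixels = img.load()
        # Each pair of bytes (high, low) is one 16-pixel row
        for high, low in zip(bitmap[::2], bitmap[1::2]):
            # Expand both bytes into 16 bits, most significant bit first
            bits = list(map(int, list('{:08b}'.format(high) + '{:08b}'.format(low))))
            col = 0
            for bit in bits:
                # Inverted so that set bits (ink) are drawn dark on a light background
                pixels[col, row] = not bit
                col += 1
            row += 1
        img.save(f'{glyph["code"]}.bmp')
```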
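If Pillow isn’t needed, the same traversal can write PBM files directly, since PBM’s packed 1-bit rows (1 = black) already match the bitmap data under the script’s assumption of two bytes per 16-pixel row. This also avoids the 24 × 24 canvas, whose extra 8 columns and rows stay black in the script above. `export_narrow_glyphs`, the `glyphs` output folder and the one-glyph sample file are hypothetical; the JSON keys are the ones the script above reads.

```python
import json
from pathlib import Path


def export_narrow_glyphs(json_path, out_dir="glyphs", bytes_per_row=2):
    """Hypothetical helper: write each narrow glyph as a binary PBM file.

    Assumes the layout the script above reads (data['fonts'][i]['narrow']['glyphs'],
    each glyph with 'bitmap' and 'code') and that every bytes_per_row bytes
    form one row, packed MSB-first with 1 = ink.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(json_path) as json_file:
        data = json.load(json_file)
    written = []
    for font in data["fonts"]:
        for glyph in font["narrow"]["glyphs"]:
            bitmap = glyph["bitmap"]
            width = bytes_per_row * 8
            height = len(bitmap) // bytes_per_row
            header = f"P4\n{width} {height}\n".encode("ascii")
            path = out / f"{glyph['code']}.pbm"
            # PBM needs no bit flipping: 1 already means black.
            path.write_bytes(header + bytes(bitmap[:bytes_per_row * height]))
            written.append(path)
    return written


# Usage with a one-glyph sample file (the glyph from the question).
sample = {"fonts": [{"narrow": {"glyphs": [{
    "bitmap": [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,
               144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0],
    "code": 46370,
}]}}]}
Path("sample_font.json").write_text(json.dumps(sample))
paths = export_narrow_glyphs("sample_font.json")
print([p.name for p in paths])
```

This does nothing for the wide glyphs, whose layout the answer leaves open.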
This is what the exported glyphs look like.
Hopefully this is enough to get you started.