
Interpret Python bytecode in C# (with fine control)

For a project idea of mine, I have the following need, which is quite precise:

I would like to be able to execute Python code (precompiled beforehand if necessary) one bytecode instruction at a time. I also need access to the internals of the Python VM (frame stack, data stacks, etc.). Ideally, I would also like to remove a lot of Python’s built-in features and reimplement a few of them my own way (such as file writing).

All of this must be coded in C# (I’m using Unity).

I’m okay with losing a few of Python’s actual features, especially the complicated parts around imports, etc. However, I would like most of the language to stay intact.

I looked a little into IronPython’s code, but it remains very obscure to me and the codebase seems enormous. I began translating Byterun (a Python bytecode interpreter written in Python) to C#, but I am running into a lot of difficulties because Byterun leverages many of Python’s own features to… interpret Python.

I’m not asking for a pre-made solution (unless you have one in mind?), but rather for some advice, places to look, etc. Do you have any ideas about what I should research first?


Answer

I tried to write my own implementation of the Python VM in the distant past. I learned a lot, but never came even close to a fully working implementation. I used the C implementation as a starting point, specifically everything in https://github.com/python/cpython/tree/main/Objects and https://github.com/python/cpython/blob/main/Python/ceval.c (look for switch(opcode)).

Here are some pointers:

Come to grips with the Python object model. Implement an abstract PyObject class with the necessary methods for instantiation, attribute access, indexing and slicing, calling, comparisons, arithmetic operations and representation. Provide concrete implementations for None, booleans, ints, floats, strings, tuples, lists and dictionaries.
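To make the idea concrete, here is a minimal sketch of such an object model. It is written in Python only for brevity; each class maps directly onto a C# abstract class or subclass. The method names (`add`, `eq`, `get_item`, `call`, `repr`) and the classes `PyBool`, `PyInt` and `PyList` are my own illustrative choices, not CPython's actual layout, and only a small part of the interface is shown.

```python
class PyObject:
    """Base class: every operation the VM can dispatch to fails by default."""

    type_name = "object"  # hypothetical field, used only for error messages

    def add(self, other):
        raise TypeError(f"unsupported operand type for +: {self.type_name}")

    def eq(self, other):
        # Default equality is identity, as for plain Python objects.
        return PyBool(self is other)

    def get_item(self, index):
        raise TypeError(f"{self.type_name} object is not subscriptable")

    def call(self, args):
        raise TypeError(f"{self.type_name} object is not callable")

    def repr(self):
        return f"<{self.type_name}>"


class PyBool(PyObject):
    type_name = "bool"

    def __init__(self, value):
        self.value = bool(value)

    def repr(self):
        return "True" if self.value else "False"


class PyInt(PyObject):
    type_name = "int"

    def __init__(self, value):
        self.value = value

    def add(self, other):
        if not isinstance(other, PyInt):
            raise TypeError("unsupported operand type for +: int")
        return PyInt(self.value + other.value)

    def eq(self, other):
        return PyBool(isinstance(other, PyInt) and self.value == other.value)

    def repr(self):
        return str(self.value)


class PyList(PyObject):
    type_name = "list"

    def __init__(self, items):
        self.items = list(items)

    def get_item(self, index):
        if not isinstance(index, PyInt):
            raise TypeError("list indices must be integers")
        return self.items[index.value]

    def repr(self):
        return "[" + ", ".join(item.repr() for item in self.items) + "]"
```

The point of the design is that the VM never inspects concrete types: it calls `left.add(right)` and lets the object either answer or raise.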

Implement the core of your VM: a Frame object that loops over the opcodes and dispatches, using a giant switch statement (following the C implementation here), to the corresponding methods of the PyObject. The frame should maintain a stack of PyObjects for the operands of the opcodes. Depending on the opcode, arguments are popped from and pushed onto this stack. A dict can be used to store and retrieve local variables. Use the Frame object to create a PyObject for function objects.
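A rough sketch of such a frame, again in Python for brevity. The five-opcode instruction set here is a simplified, hypothetical one, not real CPython bytecode (whose opcodes change between versions), and plain Python integers stand in for the PyObject operands. The `step` method runs exactly one instruction, which is what allows the host to pause and inspect the stack between instructions; in C# the `if`/`elif` chain would be the `switch`.

```python
class Frame:
    def __init__(self, instructions, constants):
        self.instructions = instructions  # list of (opname, arg) pairs
        self.constants = constants        # constant pool indexed by LOAD_CONST
        self.stack = []                   # operand (data) stack
        self.locals = {}                  # local variables by name
        self.pc = 0                       # index of the next instruction
        self.return_value = None
        self.finished = False

    def step(self):
        """Execute exactly one instruction, then hand control back."""
        opname, arg = self.instructions[self.pc]
        self.pc += 1
        if opname == "LOAD_CONST":
            self.stack.append(self.constants[arg])
        elif opname == "STORE_NAME":
            self.locals[arg] = self.stack.pop()
        elif opname == "LOAD_NAME":
            self.stack.append(self.locals[arg])
        elif opname == "ADD":
            right = self.stack.pop()  # operands come off in reverse order
            left = self.stack.pop()
            self.stack.append(left + right)
        elif opname == "RETURN":
            self.return_value = self.stack.pop()
            self.finished = True
        else:
            raise ValueError(f"unknown opcode: {opname}")

    def run(self):
        while not self.finished:
            self.step()
        return self.return_value


# Hand-assembled equivalent of:  x = 2 + 3; return x
program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("ADD", None),
    ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"),
    ("RETURN", None),
]

frame = Frame(program, constants=[2, 3])
frame.step()
frame.step()
stack_after_two_steps = list(frame.stack)  # both constants are now on the stack
result = frame.run()                       # run the remaining instructions
```

Keeping all state (`pc`, `stack`, `locals`) on the frame object rather than in local variables of the loop is what makes per-instruction stepping and inspection cheap.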

Get familiar with the idea of a namespace and the way Python builds on the concept of namespaces. Implement a module, a class and an instance object, using the dict to map attribute names to objects.
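As an illustration, the sketch below models modules, classes and instances as thin wrappers around a dict. It is deliberately simplified and the names (`PyModule`, `PyClass`, `PyInstance`, `lookup`, `get_attr`, `set_attr`) are hypothetical: it assumes single inheritance and ignores descriptors, method binding and the real method resolution order.

```python
class PyModule:
    """A module is little more than a named namespace."""

    def __init__(self, name):
        self.name = name
        self.namespace = {}


class PyClass:
    def __init__(self, name, namespace, base=None):
        self.name = name
        self.namespace = dict(namespace)
        self.base = base  # single inheritance only in this sketch

    def lookup(self, attr):
        # Walk up the chain of base classes until the name is found.
        cls = self
        while cls is not None:
            if attr in cls.namespace:
                return cls.namespace[attr]
            cls = cls.base
        raise AttributeError(attr)


class PyInstance:
    def __init__(self, cls):
        self.cls = cls
        self.namespace = {}  # per-instance attributes

    def get_attr(self, attr):
        # Instance namespace first, then the class and its bases.
        if attr in self.namespace:
            return self.namespace[attr]
        return self.cls.lookup(attr)

    def set_attr(self, attr, value):
        # Assignment always writes to the instance, never to the class.
        self.namespace[attr] = value


animal = PyClass("Animal", {"legs": 4, "sound": "..."})
dog = PyClass("Dog", {"sound": "woof"}, base=animal)
rex = PyInstance(dog)
rex.set_attr("name", "Rex")
```

With this in place, `rex.get_attr("sound")` resolves in `Dog`, `rex.get_attr("legs")` falls through to `Animal`, and `rex.get_attr("name")` comes from the instance itself.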

Finally, add as many built-in functions as you think you need to get a useful implementation.
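One possible way to organise this, sketched in Python: keep the built-ins in an ordinary dict that is consulted after the local and global namespaces, which is the order Python itself uses for name lookup. Everything here is illustrative. `Host`, `make_builtins` and `lookup_name` are hypothetical names, and `Host` stands in for the embedding application, so that `print` is routed to it instead of to standard output.

```python
class Host:
    """Stand-in for the embedding application (e.g. the game)."""

    def __init__(self):
        self.output = []

    def write_line(self, text):
        self.output.append(text)


def make_builtins(host):
    def builtin_print(*args):
        # Reimplemented built-in: output goes to the host, not to stdout.
        host.write_line(" ".join(str(a) for a in args))

    def builtin_len(obj):
        return len(obj)

    # Anything not listed here (such as open) simply does not exist
    # for the interpreted program.
    return {"print": builtin_print, "len": builtin_len}


def lookup_name(name, local_ns, global_ns, builtins):
    # Name resolution order: locals, then globals, then built-ins.
    for namespace in (local_ns, global_ns, builtins):
        if name in namespace:
            return namespace[name]
    raise NameError(f"name {name!r} is not defined")


host = Host()
builtins = make_builtins(host)
lookup_name("print", {}, {}, builtins)("score:", 42)
```

Because the table is plain data, removing a feature is a matter of leaving it out, and replacing one (file writing, for instance) means registering your own function under the familiar name.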

I think it is easy to underestimate the amount of work you’re getting yourself into, but … have fun!
