Workings of the Python Interpreter

Python is a very popular programming language. Whenever I go to a research conference or a tech event, everyone seems to be talking about Python and how it can be applied to domains such as data science. Given its popularity and the high demand for it in industry, it is crucial to understand its fundamentals. The first question to ask is:

How does Python work? What is under the hood of Python? The answer is the Python interpreter, which analyzes Python code, compiles it to bytecode, and executes that bytecode.

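The most commonly used Python interpreter is CPython, which is written in C. It does not translate Python code into C code or native machine code; it compiles the source to bytecode and runs that bytecode on its own virtual machine.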
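If you are not sure which interpreter you are running, the standard library can tell you. This is a small sketch; the variable name implementation is only for illustration.

```python
import platform
import sys

# Name of the implementation running this script, for example "CPython" or "PyPy".
implementation = platform.python_implementation()

# sys.implementation carries the same information in lowercase form.
print(implementation, sys.implementation.name, platform.python_version())
```

The function names discussed in the rest of this article apply only when this reports CPython.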

Steps the Python interpreter follows to run Python code

1. Initialization
• The C main function calls Py_Main
2. Compilation
• Parse tree generation
• AST generation
• Bytecode generation
• Bytecode optimization
• Code object generation
3. Execution
• Executing the code object in the interpreter's evaluation loop

Explaining the terms used above 🤔
main – The C entry point of the python executable. It is not a function inside your Python file.

Py_Main – int Py_Main(int argc, wchar_t **argv), the C function that main calls to run the standard interpreter.

Parse Tree Generation – Analyzing the sequence of characters in the Python code, matching it against the language grammar, and building a tree from it. (Do not worry, I explain more about it later in this article.)

AST Generation – Generating an abstract syntax tree, a simplified tree that keeps the structure of the code and drops details such as parentheses and punctuation.

Bytecode Generation – Generating bytecode, a compact sequence of instructions for the Python virtual machine. It is not native machine code.

Bytecode Optimization – Simplifying the generated code, for example by precomputing constant expressions, so the interpreter has less work to do.

Code Object Generation – Bundling the bytecode together with the constants, names, and other metadata it needs into a code object.
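To make the idea of a code object less abstract, here is a minimal sketch that builds one with the built-in compile() function and looks at a few of its attributes. The source string and the "<example>" filename are made up for illustration.

```python
source = "total = price * quantity"

# compile() runs the whole compilation stage and returns a code object.
code = compile(source, "<example>", "exec")

print(type(code).__name__)  # the object type is called "code"
print(type(code.co_code))   # the raw bytecode is stored as a bytes object
print(code.co_names)        # names the bytecode refers to
print(code.co_consts)       # constants the bytecode refers to
```

Nothing is executed here, which is why price and quantity do not need to exist yet.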


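Let’s try to understand each step the interpreter takes to turn Python code into something it can execute. The C function names below come from older CPython 3 releases; newer releases, which switched to a new parser in Python 3.9, have renamed or removed several of them.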

Initialization

  1. Running a command such as python3 codefile.py starts the interpreter executable, whose C main function is the entry point. The interpreter does not look for a main function inside codefile.py; a Python file does not need one.
  2. main calls Py_Main, which processes command-line arguments, program flags, and environment variables.
  3. Py_Main then calls Py_Initialize, which sets up sys.modules and the builtins, __main__, and sys modules.
    This is the fundamental state needed later. It also creates two data structures, PyInterpreterState and PyThreadState, which the interpreter uses from then on.
  4. After step 3, Py_Main hands the file to the PyRun_* family of functions, which call one another in the following sequence:

    PyRun_AnyFileExFlags – Checks whether the input is an interactive terminal; for a regular file it passes control to the next function.
    PyRun_SimpleFileExFlags – Sets up the __main__ namespace in which the code will run. If the file it was given is a compiled .pyc file, it executes that directly instead of compiling.
    PyRun_FileExFlags
    PyParser_ASTFromFileObject – Returns a module object, the root of the AST built from the Python code.
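The result of initialization can be observed from Python itself. The sketch below only inspects state that already exists when a script starts; the list name startup_modules is mine.

```python
import builtins
import sys

# Modules that interpreter start-up is expected to have registered already.
startup_modules = ["sys", "builtins", "__main__"]

for name in startup_modules:
    print(name, name in sys.modules)

# The builtins module is what makes names like len() available everywhere.
print(builtins.len is len)

# Command-line arguments and program flags processed at start-up.
print(sys.argv, sys.flags.optimize)
```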

Compilation

Parse Tree Generation

Continuing after Step 4 from above.

  • PyParser_ASTFromFileObject first calls PyParser_ParseFileObject, which reads the source file and builds a parse tree from it.
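Recent Python versions do not expose the parse tree itself (the old parser module was removed in Python 3.10), but the tokens that feed the parser can be inspected. The sketch below uses the standard tokenize module, which is a Python implementation and not the C tokenizer CPython uses internally, so treat it as an approximation of this step.

```python
import io
import tokenize

source = "answer = 40 + 2\n"

# generate_tokens() expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # Print the token category next to the exact text it covers.
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Each token pairs a category such as NAME, OP, or NUMBER with its text; the parser matches sequences of these against the grammar.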

Abstract Syntax Tree(AST) Generation

  • PyAST_FromNodeObject is then called; it takes the parse tree as an argument and creates an abstract syntax tree from it.
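The standard ast module gives you the same kind of tree from Python code. A minimal sketch, with a made-up one-line program as input:

```python
import ast

source = "answer = 40 + 2"

# ast.parse() returns the root of the tree, an ast.Module node.
tree = ast.parse(source)
print(ast.dump(tree))

# The single statement is an assignment whose value is a binary operation.
assign = tree.body[0]
print(type(assign).__name__, type(assign.value).__name__)
```

Notice that the tree records the assignment and the addition but nothing about spacing or where the "=" character was.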

Byte Code Generation, Optimisation and Forming Code Objects

  • After the abstract syntax tree has been generated, the run_mod function is called, which in turn invokes PyAST_CompileObject. During this call the AST is converted to bytecode, some optimisation is applied, and the result is returned as a code object.
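You can watch this step with compile() and the dis module. The sketch below assumes CPython, where constant expressions are folded at compile time; other implementations may behave differently, and the exact instructions printed vary between versions.

```python
import ast
import dis

tree = ast.parse("seconds_per_day = 24 * 60 * 60")

# Compile the AST into a code object, as the compilation stage does.
code = compile(tree, "<example>", "exec")

# Human-readable listing of the bytecode instructions.
dis.dis(code)

# On CPython the multiplication is already done: the constants hold the product.
print(code.co_consts)
```

The source contains two multiplications, yet the listing should show no multiply instruction, which is the optimisation at work.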

Executing Code Object

  • Now that the code object has been created, it is time to run it. The same run_mod function that called PyAST_CompileObject in the last step now calls PyEval_EvalCode. That call goes through PyEval_EvalCodeEx and the internal _PyEval_EvalCodeWithName, and ends in PyEval_EvalFrameEx, the evaluation loop that executes the bytecode one instruction at a time.
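From Python, the closest equivalent of this step is passing a code object to exec(). In this sketch the dictionary named namespace stands in for the module namespace; both names are mine.

```python
source = "result = sum(range(5))"
code = compile(source, "<example>", "exec")

# The dictionary plays the role of the module's global namespace.
namespace = {}
exec(code, namespace)

print(namespace["result"])  # 10
```

Compiling and executing are separate steps here, just as they are inside run_mod.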

After Executing Code Object – CleanUp Process

After the code object finishes executing, Py_Main calls Py_FinalizeEx, which cleans up memory, waits for running threads to exit, and prepares the interpreter to exit.
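One visible part of finalization is that functions registered with atexit run after the main code is done. The sketch below starts a child interpreter so that its shutdown can be observed; child_source and report are illustrative names.

```python
import subprocess
import sys

# Program for the child interpreter: register a handler, then finish normally.
child_source = """
import atexit

def report():
    print("cleanup handler ran")

atexit.register(report)
print("main code finished")
"""

completed = subprocess.run(
    [sys.executable, "-c", child_source],
    capture_output=True,
    text=True,
    check=True,
)

# The handler's line appears after the main code's line.
output_lines = completed.stdout.splitlines()
print(output_lines)
```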

Josh

Hi, I'm Josh, a Computer Science graduate from California State University, Sacramento. Since finishing my Master's, I've worked with multiple startups across the US and in the UK, primarily as a Python developer. On this website I share my knowledge of Python. If you want to ask me anything about Python, feel free to reach out; I would be happy to help.
