Python Bytecode and Disassembly (Detailed Explanation)

When you write Python code, it is not executed directly by the computer in its original form. Instead, Python first converts your source code (.py files) into an intermediate representation known as bytecode. This bytecode is then executed by the Python Virtual Machine (PVM). Understanding this process gives deeper insight into how Python works internally and can help with debugging, optimization, and advanced programming techniques.

What is Python Bytecode

Bytecode is a low-level, platform-independent representation of your Python program. It is not machine code (like C or C++ compiled output), but a set of instructions designed specifically for the Python interpreter.

When you run a Python program:

The source code is parsed.
It is compiled into bytecode.
The bytecode is executed by the Python Virtual Machine.
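These three steps can be reproduced by hand with the standard library. The following is a minimal sketch, not a description of what the interpreter does internally; the source string, the "<example>" filename, and the variable names are chosen only for illustration.

```python
import ast

source = "result = 2 + 3"

# Step 1: parse the source text into an abstract syntax tree (AST).
tree = ast.parse(source, "<example>", "exec")

# Step 2: compile the AST into a code object, which holds the bytecode.
code_obj = compile(tree, "<example>", "exec")

# Step 3: execute the code object in a fresh namespace.
namespace = {}
exec(code_obj, namespace)

print(type(code_obj).__name__)  # code
print(namespace["result"])      # 5
```

Normally all of this happens automatically when a file is run or a module is imported; doing it step by step only makes the intermediate objects visible.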

For imported modules, this bytecode is usually cached in .pyc files inside the __pycache__ directory. If the source code has not changed, Python loads the cached bytecode instead of recompiling, which shortens import and startup time. It does not make the code itself run faster.

Python Virtual Machine (PVM)

The PVM is responsible for executing bytecode instructions. It acts as an interpreter that reads one bytecode instruction at a time and performs the corresponding operation.

This is why Python is considered both compiled (to bytecode) and interpreted (executed by the PVM).
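The bytecode that the PVM executes is attached to every function as a code object. A small sketch of how to look at it in CPython follows; the add function is only an illustration, and the raw bytes differ between Python versions.

```python
def add(a, b):
    return a + b

# The compiled code object behind the function.
code_obj = add.__code__

# The instructions themselves, stored as a bytes object.
raw_bytecode = code_obj.co_code

# Names of the local variables, with the arguments first.
local_names = code_obj.co_varnames

print(type(raw_bytecode).__name__, len(raw_bytecode))
print(local_names)  # ('a', 'b')
```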

Disassembly in Python

Disassembly refers to converting bytecode back into a human-readable form. Python provides a built-in module called dis that allows you to inspect the bytecode generated from your code.

This is useful for:

Understanding how Python executes code internally
Debugging complex behavior
Comparing performance of different implementations

Example of Disassembly

Consider a simple Python function:

def add(a, b):
    return a + b

Using the dis module:

import dis
dis.dis(add)

On CPython 3.10 and earlier, the output looks something like this (the exact instructions vary between Python versions):

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

Understanding the Instructions

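Each line in the disassembled output represents one bytecode instruction. The leftmost number (2) is the source line, the next number is the instruction's byte offset, and it is followed by the instruction name, its numeric argument, and, in parentheses, the name that argument refers to. The instructions used here are: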

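LOAD_FAST: Pushes the value of a local variable onto the stack
BINARY_ADD: Pops the top two values from the stack, adds them, and pushes the result (Python 3.11 and later use BINARY_OP instead)
RETURN_VALUE: Returns the value on top of the stack to the caller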

Python uses a stack-based execution model, meaning operations are performed using a stack rather than registers.
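The same instructions can also be read programmatically instead of being printed. The sketch below uses dis.get_instructions from the standard library; the instruction names it reports depend on the Python version, so no fixed output is shown.

```python
import dis

def add(a, b):
    return a + b

# dis.get_instructions yields one Instruction object per bytecode instruction.
for instr in dis.get_instructions(add):
    print(instr.offset, instr.opname, instr.argrepr)

# Collect just the instruction names, e.g. to compare two functions.
opnames = [instr.opname for instr in dis.get_instructions(add)]
```

Working with Instruction objects is more convenient than parsing the text printed by dis.dis when the result feeds into another tool.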

Stack-Based Execution Model

In the example:

LOAD_FAST a pushes the value of a onto the stack
LOAD_FAST b pushes the value of b
BINARY_ADD pops both values, adds them, and pushes the result
RETURN_VALUE pops the result and returns it to the caller

This stack mechanism is central to how Python executes bytecode.
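To make the stack behaviour concrete, here is a toy interpreter for just these three instructions. It is a simplified sketch, not how CPython is implemented: the run function and the (name, argument) tuple format are hypothetical, and real bytecode uses numeric opcodes and indexes rather than strings.

```python
def run(instructions, local_vars):
    """Execute a list of (opname, arg) pairs on a simple value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            # Push the value of the named local variable.
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            # Pop the two operands, add them, and push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Pop the top of the stack and hand it back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# The instruction sequence from the add(a, b) example.
program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run(program, {"a": 2, "b": 3}))  # 5
```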

.pyc Files and __pycache__

When a module is imported, Python may generate a compiled bytecode file (a script that is run directly is not cached this way):

Stored in __pycache__
Named like: module.cpython-<version>.pyc
Automatically reused if the source code has not changed

These files:

Improve import and startup time, not execution speed
Are platform-independent, but tied to the Python version that wrote them
Should not be manually edited
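The location and name of a cached file can be checked without importing anything. The sketch below writes a throwaway module.py into a temporary directory and compiles it explicitly with py_compile; the file name and its contents are made up for the example, and the version part of the resulting .pyc name depends on the interpreter in use.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a tiny source file to compile.
    source_path = os.path.join(tmp_dir, "module.py")
    with open(source_path, "w") as f:
        f.write("x = 1\n")

    # Where the import system would look for the cached bytecode.
    expected_pyc = importlib.util.cache_from_source(source_path)

    # Compile explicitly instead of relying on an import to trigger it.
    written_pyc = py_compile.compile(source_path, doraise=True)

    pyc_name = os.path.basename(written_pyc)
    pyc_exists = os.path.exists(written_pyc)

print(pyc_name)
print(pyc_exists)  # True
```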

Why Bytecode Matters

Understanding bytecode helps in:

Performance Optimization
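You can see which instructions a piece of code compiles to and how many there are. Combined with timing measurements, this helps explain why one version is slower than another and how to restructure it.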
Debugging
Helps trace how Python interprets complex expressions.
Security Awareness
Bytecode can be disassembled and reverse-engineered, so distributing only .pyc files does not keep sensitive logic secret.
Advanced Development
Useful in writing compilers, interpreters, or tools like linters and static analyzers.
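As one illustration of what disassembly can reveal about performance, CPython evaluates simple constant expressions at compile time. The sketch below checks this for a made-up seconds_per_day function by looking at the constants stored in its code object; the exact disassembly printed varies by version.

```python
import dis

def seconds_per_day():
    return 24 * 60 * 60

# Print the instructions; no multiplication should appear at run time.
dis.dis(seconds_per_day)

# The product is stored as a single precomputed constant.
folded = 86400 in seconds_per_day.__code__.co_consts
print(folded)
```

Seeing the precomputed constant shows that writing 24 * 60 * 60 for readability costs nothing at run time, so there is no reason to replace it with 86400 by hand.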

Limitations of Bytecode Analysis

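Bytecode is an implementation detail that is specific to the Python version
Instructions can be added, removed, or changed between releases, so disassembly output and .pyc files are not portable across versions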
It is not as low-level as machine code
Not typically needed for everyday programming
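The version dependence is easy to observe. This sketch prints the running interpreter's version, the magic number it writes at the start of .pyc files, and whether two instruction names exist in its opcode table; BINARY_ADD and BINARY_OP are used only as an example of an instruction that was replaced in Python 3.11.

```python
import dis
import importlib.util
import sys

version = f"{sys.version_info.major}.{sys.version_info.minor}"

# Four bytes at the start of every .pyc file; they change whenever the
# bytecode format changes.
magic = importlib.util.MAGIC_NUMBER

# dis.opmap maps instruction names to opcode numbers for this interpreter.
has_binary_add = "BINARY_ADD" in dis.opmap
has_binary_op = "BINARY_OP" in dis.opmap

print(version, magic)
print("BINARY_ADD:", has_binary_add, "BINARY_OP:", has_binary_op)
```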

Conclusion

Python bytecode serves as the bridge between human-readable source code and machine execution. By using tools like the dis module, developers can explore how Python translates and executes code internally. While this level of understanding is not required for basic programming, it becomes highly valuable for advanced debugging, performance tuning, and deeper knowledge of Python’s execution model.