Python - Python Bytecode and Disassembly (Detailed Explanation)
When you write Python code, it is not executed directly by the computer in its original form. Instead, Python first converts your source code (.py files) into an intermediate representation known as bytecode. This bytecode is then executed by the Python Virtual Machine (PVM). Understanding this process gives deeper insight into how Python works internally and can help with debugging, optimization, and advanced programming techniques.
What is Python Bytecode
Bytecode is a low-level, platform-independent representation of your Python program. It is not machine code (like C or C++ compiled output), but a set of instructions designed specifically for the Python interpreter.
When you run a Python program:
- The source code is parsed.
- It is compiled into bytecode.
- The bytecode is executed by the Python Virtual Machine.
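The three steps above can be reproduced by hand with standard-library tools. This is a minimal sketch, not a description of exactly what the interpreter does on startup; the source string and the "<example>" filename are made up for illustration.

```python
import ast

source = "result = 2 + 3"  # illustrative source text

# Step 1: parse the source text into an abstract syntax tree (AST).
tree = ast.parse(source, filename="<example>", mode="exec")

# Step 2: compile the AST into a code object, which carries the bytecode.
code_obj = compile(tree, filename="<example>", mode="exec")

# Step 3: let the interpreter execute that bytecode in a fresh namespace.
namespace = {}
exec(code_obj, namespace)

print(type(code_obj).__name__)   # the compiled unit is a "code" object
print(namespace["result"])       # value computed by running the bytecode
print(len(code_obj.co_code))     # size of the raw bytecode in bytes
```

The raw bytes in co_code are what the later sections disassemble.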
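For imported modules, this bytecode is cached in .pyc files inside the __pycache__ directory; a script that is run directly is compiled in memory and is not cached. The cache lets Python skip recompilation when the source code has not changed, which shortens load time on later runs but does not make the code itself run faster.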
Python Virtual Machine (PVM)
The PVM is responsible for executing bytecode instructions. It acts as an interpreter that reads one bytecode instruction at a time and performs the corresponding operation.
This is why Python is considered both compiled (to bytecode) and interpreted (executed by the PVM).
Disassembly in Python
Disassembly refers to converting bytecode into a human-readable listing of instructions. Python's standard library includes a module called dis that lets you inspect the bytecode generated from your code.
This is useful for:
- Understanding how Python executes code internally
- Debugging complex behavior
- Comparing performance of different implementations
Example of Disassembly
Consider a simple Python function:
def add(a, b):
    return a + b
Using the dis module:
import dis
dis.dis(add)
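On CPython 3.10 and earlier, the output will look something like this (newer releases emit different instructions, for example BINARY_OP in place of BINARY_ADD from 3.11 onward):
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE
The first column is the source line number, the second is the byte offset of the instruction, followed by the instruction name and its argument.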
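Because the exact listing depends on the interpreter version, it can be more convenient to read the instructions programmatically. The sketch below uses dis.get_instructions on the same add function; the instruction names it prints will vary from one Python release to another.

```python
import dis

def add(a, b):
    return a + b

# Each item is a dis.Instruction with fields such as offset, opname and argrepr.
instructions = list(dis.get_instructions(add))
opnames = [instr.opname for instr in instructions]

for instr in instructions:
    print(instr.offset, instr.opname, instr.argrepr)
```

Working with Instruction objects avoids parsing the printed text, which is laid out for people rather than programs.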
Understanding the Instructions
Each line in the disassembled output represents a bytecode instruction:
- LOAD_FAST: Loads a local variable onto the stack
- BINARY_ADD: Adds the top two values on the stack
- RETURN_VALUE: Returns the result from the function
Python uses a stack-based execution model, meaning operations are performed using a stack rather than registers.
Stack-Based Execution Model
In the example:
- LOAD_FAST a pushes the value of a onto the stack
- LOAD_FAST b pushes the value of b
- BINARY_ADD pops both values, adds them, and pushes the result
- RETURN_VALUE returns the final result
This stack mechanism is central to how Python executes bytecode.
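To make the stack behavior concrete, here is a toy interpreter for just the three instructions discussed above. It is a simplified illustration and not how CPython is implemented; the run function and the (opname, argument) program format are invented for this sketch.

```python
def run(program, local_vars):
    """Interpret a list of (opname, argument) pairs using an explicit stack."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            # Push the value of a local variable.
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            # Pop two operands, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Pop the top of the stack and hand it back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# Hand-written equivalent of the bytecode for: return a + b
ADD_PROGRAM = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run(ADD_PROGRAM, {"a": 2, "b": 3}))
```

Note that no instruction names a register or a destination: every operand comes from the stack, and every result goes back onto it.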
.pyc Files and __pycache__
When a module is imported, Python may write a compiled bytecode file. (A script that is run directly is not cached this way.) The file is:
- Stored in __pycache__
- Named like: module.cpython-<version>.pyc
- Automatically reused if the source code has not changed
These files:
- Improve startup time
- Are platform-independent
- Should not be manually edited
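The naming scheme can be checked without waiting for an import to create the cache. The sketch below writes a throwaway module.py into a temporary directory (both names are placeholders) and compiles it explicitly with py_compile; the version tag in the file name depends on the interpreter that runs it.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "module.py")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("VALUE = 42\n")

    # Path where the import system would look for cached bytecode.
    expected_pyc = importlib.util.cache_from_source(source_path)

    # Compile explicitly; with no target given, the same cache path is used.
    actual_pyc = py_compile.compile(source_path, doraise=True)

    pyc_name = os.path.basename(actual_pyc)
    pyc_dir = os.path.basename(os.path.dirname(actual_pyc))
    print(pyc_dir, pyc_name)
```

On a default CPython setup this prints the __pycache__ directory followed by a name of the form module.cpython-<version>.pyc.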
Why Bytecode Matters
Understanding bytecode helps in:
- Performance Optimization: You can see which operations a piece of code actually performs and restructure it accordingly. Bytecode shows what runs, not how long it takes, so confirm any change with a profiler or timeit.
- Debugging: Helps trace how Python interprets complex expressions.
- Security Awareness: Bytecode can be reverse-engineered, so sensitive logic should not rely on code secrecy.
- Advanced Development: Useful in writing compilers, interpreters, or tools like linters and static analyzers.
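The security point is easy to demonstrate. The sketch below compiles a hypothetical licensing.py module (the module, the is_valid function and the key are all invented for illustration), then reads the resulting .pyc back and recovers the embedded string. It assumes the 16-byte .pyc header used by CPython 3.7 and later.

```python
import dis
import marshal
import os
import py_compile
import tempfile
import types

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "licensing.py")   # hypothetical module
    pyc_path = os.path.join(tmp, "licensing.pyc")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("def is_valid(key):\n    return key == 'hunter2'\n")
    py_compile.compile(source_path, cfile=pyc_path, doraise=True)

    with open(pyc_path, "rb") as f:
        f.read(16)                     # skip the header (assumed: CPython 3.7+)
        module_code = marshal.load(f)  # the module's code object

# The function body is stored as a nested code object among the constants.
function_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print(function_code.co_name)
print(function_code.co_consts)   # the "secret" string appears here in plain form
dis.dis(function_code)
```

Shipping only .pyc files therefore hides very little: names, constants and control flow all remain readable.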
Limitations of Bytecode Analysis
- Bytecode is specific to the Python version
- It may change between versions
- It is not as low-level as machine code
- Not typically needed for everyday programming
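The version dependence can be observed directly. This sketch prints a few facts about the running interpreter; the values differ between releases, which is exactly the point, so no particular output is assumed.

```python
import dis
import importlib.util
import sys

version = ".".join(str(part) for part in sys.version_info[:3])
magic = importlib.util.MAGIC_NUMBER   # written at the start of every .pyc file

# Which of these opcode names exist depends on the interpreter version.
availability = {name: name in dis.opmap for name in ("BINARY_ADD", "BINARY_OP")}

print("Python version:", version)
print("pyc magic number:", magic)
print("opcodes defined:", len(dis.opmap))
print("availability:", availability)
```

Any tool that inspects bytecode should check names against dis.opmap rather than assume a fixed instruction set.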
Conclusion
Python bytecode serves as the bridge between human-readable source code and machine execution. By using tools like the dis module, developers can explore how Python translates and executes code internally. While this level of understanding is not required for basic programming, it becomes highly valuable for advanced debugging, performance tuning, and deeper knowledge of Python’s execution model.