Security Risks of Using Pickle for Deserialization in Python
Wenhao Wang
Dev Intern · Leapcell

Deserialization Attacks and Prevention of the Pickle Module in Python
Introduction
Hello everyone! Python programs face a real security risk: deserialization attacks. Before looking at these attacks, we need to understand what serialization and deserialization are.
Conceptually, serialization is the process of converting a data structure or object into a byte stream. Through this conversion, data can be conveniently saved to a file or transmitted over a network. Deserialization, on the other hand, is the reverse process, which converts the byte stream back into the original data structure or object.
In Python, the Pickle module is one of the commonly used tools for implementing serialization and deserialization. It provides a convenient interface that can serialize and save complex Python objects, and when needed later, it can easily deserialize and restore them. However, this convenience also brings potential security risks.
Overview of Deserialization Attacks
Deserialization is not always safe. When we deserialize data from an untrusted source, we expose ourselves to deserialization attacks. Attackers can embed malicious code in the serialized data, and once this data is deserialized, the embedded code is executed. Such attacks can have serious consequences, such as data leakage, system crashes, or even remote control of the system by the attacker.
Overview of the Python Pickle Module
Basic Functions of Pickle
The Pickle module is part of the Python standard library and can be used without additional installation. Its main function is to implement the serialization and deserialization of Python objects. Whether it is a simple basic data type or a complex data structure (such as lists, dictionaries, class instances, etc.), Pickle can convert it into a byte stream for storage or transmission and restore it to the original object form when needed.
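As a small illustration of this round trip, the sketch below pickles a dictionary that mixes basic types with standard-library class instances. The record contents (including the date) are made up for the example and are not from any real system.

```python
import pickle
from datetime import date
from fractions import Fraction

# Hypothetical sample data mixing basic types with standard-library class instances
record = {
    "name": "Leapcell",
    "launched": date(2024, 1, 15),     # class instance (made-up date)
    "ratio": Fraction(3, 4),           # class instance
    "tags": {"python", "serverless"},  # set
    "point": (1, 2),                   # tuple
}

payload = pickle.dumps(record)    # bytes suitable for storage or transmission
restored = pickle.loads(payload)  # safe here only because we created the payload ourselves

print(type(payload).__name__)  # bytes
print(restored == record)      # True: same types and values
print(restored is record)      # False: a new object was rebuilt
```

Note that the tuple, the set, and the class instances all come back with their original types, which is part of what makes Pickle so convenient.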
Working Principle of Pickle
Pickle works in a fairly intuitive way. During serialization, it converts a Python object into a byte stream according to the rules of its protocol. The stream is a sequence of instructions (opcodes) that record the object's type information and data content. During deserialization, Pickle reads the stream and executes those instructions one by one to rebuild the corresponding Python object.
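One way to look inside a pickle byte stream is the standard-library pickletools module, which decodes the stream without executing it. The sketch below is only an inspection aid; the sample dictionary and the choice of protocol 4 are assumptions made for the example.

```python
import pickle
import pickletools

data = {"name": "Leapcell", "age": 29}
payload = pickle.dumps(data, protocol=4)  # protocol fixed so the listing is stable

# genops() decodes the opcodes one by one without running them
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# dis() prints a human-readable listing of the same stream
pickletools.dis(payload)
```

The listing starts with a PROTO opcode and ends with STOP; everything in between tells the unpickler how to rebuild the dictionary step by step.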
Serialization and Deserialization of Pickle
- Serialization:
Pickle provides two main serialization functions: pickle.dump and pickle.dumps. The pickle.dump function writes the serialized object directly to the specified file, while the pickle.dumps function returns a byte stream containing the serialized data.
    import pickle

    # Create an object
    data = {'name': 'Leapcell', 'age': 29, 'city': 'New York'}

    # Serialize the object and write it to a file
    with open('data.pickle', 'wb') as file:
        pickle.dump(data, file)

    # Or return a byte stream
    data_bytes = pickle.dumps(data)
- Deserialization:
There are also two commonly used functions for deserialization: pickle.load and pickle.loads. The pickle.load function reads the byte stream from the specified file and deserializes it, while the pickle.loads function deserializes a byte stream directly.
    import pickle

    # Deserialize the object from the file
    with open('data.pickle', 'rb') as file:
        data = pickle.load(file)

    # Or directly deserialize a byte stream
    # (data_bytes is the value returned by pickle.dumps in the serialization example)
    data = pickle.loads(data_bytes)
Principle of Deserialization Attacks
Attack Mechanism
The core of deserialization attacks is that attackers can inject malicious code into the serialized data. When the target system deserializes these serialized data containing malicious code, the malicious code will be executed, thus achieving the attacker's goal. That is to say, if we do not conduct strict verification and screening of the data source during deserialization, it is equivalent to opening the door for attackers to execute arbitrary code in the system.
What Attackers Can Do
Attackers can use deserialization vulnerabilities to perform a variety of malicious operations, such as executing arbitrary system commands, modifying important data in the system, or stealing sensitive information, etc. These operations may cause serious damage to the security and stability of the system.
Example Code
To more clearly demonstrate the process of deserialization attacks, let's look at a specific example:
    import pickle
    import os

    # Construct malicious code
    class Malicious:
        def __reduce__(self):
            return (os.system, ('echo Hacked!',))

    # Serialize the malicious object
    malicious_data = pickle.dumps(Malicious())

    # Execute malicious code during deserialization
    pickle.loads(malicious_data)
In this example:
- Construct malicious code: We define a class named Malicious whose __reduce__ method returns (os.system, ('echo Hacked!',)). Pickle calls __reduce__ when it serializes the object and records the returned callable and its arguments in the byte stream as the recipe for reconstructing the object.
- Serialize the malicious object: Calling pickle.dumps on an instance of the Malicious class produces the byte stream malicious_data, which contains a reference to os.system together with its argument.
- Deserialize the malicious object: When pickle.loads processes malicious_data, it follows that recipe and calls os.system('echo Hacked!'), so the command runs and prints "Hacked!". The Malicious class itself does not need to exist on the system that deserializes the data.
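To see when each step actually runs, here is a minimal sketch that swaps os.system for the harmless built-in len. The Demo class and the events list are hypothetical names introduced only for this illustration.

```python
import pickle

events = []  # records every call to __reduce__


class Demo:
    # Hypothetical stand-in for the Malicious class, using a harmless callable
    def __reduce__(self):
        events.append("__reduce__ called")
        # (callable, args): pickle stores a reference to `len` plus its argument
        return (len, ("Hacked!",))


payload = pickle.dumps(Demo())
calls_after_dumps = len(events)  # __reduce__ ran once, during serialization

result = pickle.loads(payload)
calls_after_loads = len(events)  # unchanged: loads() does not call __reduce__

print(calls_after_dumps, calls_after_loads)  # 1 1
print(result)                                # 7, the return value of len("Hacked!")
print(b"Demo" in payload)                    # False: the class name is not in the payload
```

In other words, the payload carries only "call this function with these arguments". Whoever loads it runs that call, whether or not they have ever seen the class that produced it.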
How to Prevent Pickle Deserialization Attacks
Principles of Secure Deserialization
The primary principle for preventing deserialization attacks is to never deserialize data from untrusted sources. Deserialize pickle data only when its source is fully trusted and the data cannot have been tampered with in storage or in transit.
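One common way to check that pickle data really comes from a trusted party is to sign it with a shared secret and verify the signature before unpickling. The sketch below uses the standard-library hmac module. The SECRET_KEY value and the helper names sign_and_dump and verify_and_load are hypothetical, and the approach assumes the key itself is stored securely.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice it would come from secure configuration
SECRET_KEY = b"replace-with-a-real-secret"


def sign_and_dump(obj):
    """Return an HMAC-SHA256 signature followed by the pickle bytes."""
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload


def verify_and_load(blob):
    """Unpickle only if the signature matches; otherwise raise ValueError."""
    digest_size = hashlib.sha256().digest_size  # 32 bytes
    signature, payload = blob[:digest_size], blob[digest_size:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign_and_dump({"name": "Leapcell", "age": 29})
print(verify_and_load(blob))  # {'name': 'Leapcell', 'age': 29}

# Flip one bit in the payload to simulate tampering
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify_and_load(tampered)
    tamper_detected = False
except ValueError as exc:
    tamper_detected = True
    print(exc)
```

The signature is checked before pickle.loads is ever called, so a modified payload is rejected without being interpreted. This only proves the data was produced by someone holding the key; it does not make pickle data from unknown parties safe.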
Practical Defense Methods
1. Example of Secure Deserialization Code:
If deserialization with Pickle is unavoidable, the types of objects that can be deserialized can be limited by overriding the find_class method of pickle.Unpickler, thereby restricting the scope of deserialization.
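    import builtins
    import io
    import pickle

    # Built-in types that the unpickler is allowed to resolve
    SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

    # Custom Unpickler to restrict deserializable types
    class RestrictedUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Only allow the listed built-in types
            if module == "builtins" and name in SAFE_BUILTINS:
                return getattr(builtins, name)
            # Forbid every other global
            raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

    def restricted_loads(s):
        """Helper analogous to pickle.loads() that uses the restricted unpickler."""
        return RestrictedUnpickler(io.BytesIO(s)).load()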
In the above code, we define a RestrictedUnpickler class that inherits from pickle.Unpickler and overrides the find_class method. Pickle calls find_class whenever the byte stream asks for a global object such as a class or function, so any reference outside the small allow-list of built-in types raises pickle.UnpicklingError instead of being resolved. This improves the security of deserialization operations.
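A quick way to check how such a restricted unpickler behaves is to feed it one harmless payload and one payload that references a global outside the allow-list. The sketch below repeats the unpickler so that it runs on its own. The Probe class is a hypothetical test payload that uses the harmless built-in len in place of os.system.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


class Probe:
    # Hypothetical test payload; `len` stands in for a dangerous callable
    def __reduce__(self):
        return (len, ("Hacked!",))


# Plain containers and scalars do not reference any global, so they load normally
safe = restricted_loads(pickle.dumps({"name": "Leapcell", "age": 29}))
print(safe)  # {'name': 'Leapcell', 'age': 29}

# A payload that references a global outside the allow-list is rejected
try:
    restricted_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(exc)  # global 'builtins.len' is forbidden
```

The callable is never invoked in the second case, because the error is raised at the moment the unpickler tries to look it up.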
2. Use Other Secure Serialization Modules (such as JSON):
A more secure option is to use the JSON module instead of Pickle for serialization and deserialization. JSON only supports basic data types (strings, numbers, booleans, null, arrays, and objects), and parsing it does not execute code, so it is a much safer choice for data that comes from untrusted sources.
    import json

    # Serialize the object
    data = {'name': 'Leapcell', 'age': 29, 'city': 'New York'}
    data_json = json.dumps(data)

    # Deserialize the object
    data = json.loads(data_json)
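The trade-off is that JSON does not know how to handle custom classes, so the conversion has to be written explicitly. The sketch below shows one possible pattern using a dataclass. The User class is a hypothetical example and not part of any library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    # Hypothetical example class introduced for this illustration
    name: str
    age: int
    city: str


user = User(name="Leapcell", age=29, city="New York")

# json cannot serialize arbitrary objects directly
try:
    json.dumps(user)
    direct_dump_failed = False
except TypeError as exc:
    direct_dump_failed = True
    print(exc)

# Convert explicitly to basic types, then rebuild explicitly on the way back
text = json.dumps(asdict(user))
restored = User(**json.loads(text))
print(restored == user)  # True
```

Because the receiving code decides which class to build and which fields to accept, the incoming data cannot choose what gets called, which is exactly the property Pickle lacks.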
Conclusion
This article introduced the concepts of serialization and deserialization in Python and the role the Pickle module plays in them. It then explained how deserialization attacks work and demonstrated, with a concrete code example, how an attacker can exploit pickle.loads. Finally, we discussed how to prevent Pickle deserialization attacks: deserialize only trusted data, restrict the types that can be deserialized, and prefer safer serialization formats such as JSON. I hope this article gives you a deeper understanding of deserialization attacks and helps you take effective preventive measures in your own code. If you have any questions or suggestions, you are welcome to discuss them in the comment section.