Tests are only as good as the data they run against, and realistic mock data can make all the difference when building and testing applications. Today, we’re diving into Polyfactory, a Python library that generates mock data straight from Python type hints. But what does that mean for developers? You describe a data structure once, as a dataclass, a Pydantic model, or an attrs class, and Polyfactory produces populated instances of it for you. Let’s explore.
Setting Up Your Environment
Before we get started, it’s essential to set up our development environment properly. For this tutorial, you’ll need Python installed, along with Polyfactory, Pydantic, and attrs. The dataclasses module is part of Python’s standard library (3.7 and later), so it needs no separate install. If you haven’t already installed the other packages, you can do so using pip:
pip install polyfactory pydantic attrs
Once you have your packages ready, let’s jump into creating our first mock data factory.
Building Factories for Data Classes
Polyfactory allows us to define factories that generate instances of Python classes. To illustrate this, let’s create a simple data class that models a user profile:
from dataclasses import dataclass

@dataclass
class UserProfile:
    username: str
    email: str
    age: int
With our UserProfile class defined, we can create a factory by subclassing Polyfactory’s DataclassFactory:
from polyfactory.factories import DataclassFactory

class UserProfileFactory(DataclassFactory[UserProfile]):
    __model__ = UserProfile

user = UserProfileFactory.build()
This short class gives us a factory whose build() method returns a UserProfile instance with randomly generated field values. But there’s so much more we can do!
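To make the idea of generating data from type hints concrete, here is a minimal standard-library sketch of the principle. It is not how Polyfactory is implemented. The build_instance helper and the GENERATORS table are hypothetical names invented for this illustration, and the sketch only handles str and int fields.

```python
import random
import string
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class UserProfile:
    username: str
    email: str
    age: int


def random_string(length: int = 8) -> str:
    """Return a random string of ASCII letters."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


# Hypothetical lookup table: one value generator per supported type.
GENERATORS: Dict[type, Callable[[], Any]] = {
    str: random_string,
    int: lambda: random.randint(0, 100),
}


def build_instance(cls: type) -> Any:
    """Create an instance of a dataclass by inspecting its type hints."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")
    hints = get_type_hints(cls)
    # Pick a generator for each field based on its annotated type.
    kwargs = {f.name: GENERATORS[hints[f.name]]() for f in fields(cls)}
    return cls(**kwargs)


profile = build_instance(UserProfile)
print(profile)
```

A real library adds a great deal on top of this: support for many more types, constraints, nested models, and per-field customization.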
Customizing Your Data
One of the standout features of Polyfactory is its ability to customize the generated data. For instance, if we want to ensure that the username generated by our factory always starts with a letter, we can override the default behavior:
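from random import choice, randint
import string

from polyfactory import Use
from polyfactory.factories import DataclassFactory

def custom_username():
    # The first character is always a letter; the remaining seven may be letters or digits.
    return choice(string.ascii_letters) + ''.join(choice(string.ascii_letters + string.digits) for _ in range(7))

class CustomUserProfileFactory(DataclassFactory[UserProfile]):
    __model__ = UserProfile
    username = Use(custom_username)
Now, every time we call CustomUserProfileFactory.build(), it’ll generate a username that meets our criteria, while the other fields are still filled in automatically. This kind of customization is especially useful when we’re trying to simulate real-world scenarios.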
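Because the generator is an ordinary function, its contract can be checked on its own, without any factory involved. The following is a small sketch using only the standard library. It redefines custom_username so the snippet runs standalone, and the sample size of 1000 is an arbitrary choice.

```python
import string
from random import choice


def custom_username() -> str:
    """Return an 8-character username whose first character is a letter."""
    first = choice(string.ascii_letters)
    rest = "".join(choice(string.ascii_letters + string.digits) for _ in range(7))
    return first + rest


# Generate a sample and verify the invariants hold for every value.
samples = [custom_username() for _ in range(1000)]
assert all(name[0] in string.ascii_letters for name in samples)
assert all(len(name) == 8 for name in samples)
```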
Pydantic Models and Validation
Next up, let’s discuss how we can integrate Pydantic models into our mock data pipeline. Pydantic provides data validation, which is particularly helpful when we need to ensure that our data meets specific criteria. Here’s how we can define a Pydantic model:
from pydantic import BaseModel

class UserProfileModel(BaseModel):
    username: str
    email: str
    age: int
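With UserProfileModel defined, we can create a factory for it using ModelFactory, Polyfactory’s factory base class for Pydantic models:
from polyfactory.factories.pydantic_factory import ModelFactory

class UserProfileModelFactory(ModelFactory[UserProfileModel]):
    __model__ = UserProfileModel
Every instance the factory builds is a real UserProfileModel, so the generated data passes through Pydantic’s validation, and any mismatch between the model and the data surfaces early in the development process.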
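To see what validation buys us, here is a sketch that feeds the model one good payload and one deliberately bad one. It assumes Pydantic is installed and uses only BaseModel and ValidationError. The is_valid helper and the sample values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class UserProfileModel(BaseModel):
    username: str
    email: str
    age: int


def is_valid(payload: dict) -> bool:
    """Return True if the payload passes UserProfileModel validation."""
    try:
        UserProfileModel(**payload)
    except ValidationError:
        return False
    return True


good = {"username": "alice", "email": "alice@example.com", "age": 30}
bad = {"username": "bob", "email": "bob@example.com", "age": "not a number"}

print(is_valid(good))
print(is_valid(bad))
```

Note that email: str accepts any string. Checking the address format would need a stricter type, such as Pydantic’s EmailStr, which requires the optional email-validator package.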
Nesting Models
But what if your data structures are more complex and require nested models? Polyfactory shines here as well. Let’s say we want to include an address in our user profile:
@dataclass
class Address:
    street: str
    city: str
    zip_code: str

@dataclass
class ExtendedUserProfile:
    username: str
    email: str
    age: int
    address: Address
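We only need a factory for ExtendedUserProfile. Polyfactory reads the Address type hint and builds the nested object on its own, so no separate address factory has to be wired in:
from polyfactory.factories import DataclassFactory

class ExtendedUserProfileFactory(DataclassFactory[ExtendedUserProfile]):
    __model__ = ExtendedUserProfile
This way, every generated user profile comes with a fully populated address attached. By default the string fields hold random text, so realistic-looking streets and cities require customizing those fields.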
Calculated Fields and Overrides
Let’s also talk about calculated fields. Sometimes, you might want to generate a field based on the values of other fields. For instance, if we want to derive the user’s year of birth from their age, we can add a calculated field like this:
    # Placed inside the body of the profile class:
    @property
    def year_of_birth(self):
        return 2023 - self.age  # assuming the current year is 2023
Adding this property to the model keeps the derived value in sync with age. For the result to be plausible, though, age itself has to fall in a sensible range, and a field override on the factory takes care of that:
from random import randint

from polyfactory import Use
from polyfactory.factories import DataclassFactory

class AdultUserProfileFactory(DataclassFactory[ExtendedUserProfile]):
    __model__ = ExtendedUserProfile
    age = Use(randint, 18, 65)  # ages from 18 to 65, inclusive
This approach keeps our age values realistic and, with them, the birth years calculated from them.
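The relationship between the age range and the calculated field can be tried out with the standard library alone. The sketch below reads the current year from the system clock instead of hard-coding it. The random_adult_age helper is a hypothetical name, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from random import randint


@dataclass
class UserProfile:
    username: str
    email: str
    age: int

    @property
    def year_of_birth(self) -> int:
        """Approximate birth year, derived from age and today's date."""
        return date.today().year - self.age


def random_adult_age() -> int:
    """Return an age between 18 and 65, inclusive."""
    return randint(18, 65)


user = UserProfile("alice", "alice@example.com", random_adult_age())
print(user.age, user.year_of_birth)
```

Because year_of_birth is a property rather than a stored field, it can never drift out of step with age, whichever factory or generator supplied the age.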
Conclusion: Moving Forward with Mock Data Pipelines
Polyfactory makes it straightforward to build mock data pipelines on top of the types you already have. Whether you model your data with Python’s data classes, Pydantic models, or attrs-based classes, you can generate rich mock data tailored to your application’s needs, with customization, validation, and nested structures handled along the way. A good next step is to write a factory for one of your own models and build out from there.
Sam Torres
Digital ethicist and technology critic. Believes in responsible AI development.




