Currently, I am trying to create a pydantic model for a pandas DataFrame. I would like to check that a column is unique, starting from the following:
import pandas as pd
from typing import List
from pydantic import BaseModel


class CustomerRecord(BaseModel):
    id: int
    name: str
    address: str


class CustomerRecordDF(BaseModel):
    __root__: List[CustomerRecord]


df = pd.DataFrame({'id': [1, 2, 3],
                   'name': ['Bob', 'Joe', 'Justin'],
                   'address': ['123 Fake St', '125 Fake St', '123 Fake St']})

df_dict = df.to_dict(orient='records')

CustomerRecordDF.parse_obj(df_dict)
I would now like to run a validation here and have it fail, since address is not unique.
The following does what I need:
from pydantic import root_validator

class CustomerRecordDF(BaseModel):
    __root__: List[CustomerRecord]

    @root_validator(pre=True)
    def unique_values(cls, values):
        root_values = values.get('__root__')
        value_set = set()
        for value in root_values:
            if value['address'] in value_set:
                raise ValueError('Duplicate Address')
            else:
                value_set.add(value['address'])
        return values

CustomerRecordDF.parse_obj(df_dict)

>>> ValidationError: 1 validation error for CustomerRecordDF
__root__
  Duplicate Address (type=value_error)
but I want to be able to reuse this validator for other DataFrames I create, and to apply the same unique check to multiple columns, not just address.
Ideally something like the following:
from pydantic import root_validator

class CustomerRecordDF(BaseModel):
    __root__: List[CustomerRecord]

    _validate_unique_name = root_unique_validator('name')
    _validate_unique_address = root_unique_validator('address')
Answer
You could use an inner function and the allow_reuse argument:
def root_unique_validator(field):
    def validator(cls, values):
        # Use the field arg to validate a specific field
        ...

    return root_validator(pre=True, allow_reuse=True)(validator)
Full example:
import pandas as pd
from typing import List
from pydantic import BaseModel, root_validator


class CustomerRecord(BaseModel):
    id: int
    name: str
    address: str


def root_unique_validator(field):
    def validator(cls, values):
        root_values = values.get("__root__")
        value_set = set()
        for value in root_values:
            if value[field] in value_set:
                raise ValueError(f"Duplicate {field}")
            else:
                value_set.add(value[field])
        return values

    return root_validator(pre=True, allow_reuse=True)(validator)


class CustomerRecordDF(BaseModel):
    __root__: List[CustomerRecord]

    _validate_unique_name = root_unique_validator("name")
    _validate_unique_address = root_unique_validator("address")


df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["Bob", "Joe", "Justin"],
        "address": ["123 Fake St", "125 Fake St", "123 Fake St"],
    }
)

df_dict = df.to_dict(orient="records")

CustomerRecordDF.parse_obj(df_dict)

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for CustomerRecordDF
# __root__
#   Duplicate address (type=value_error)
And if you use a duplicated name:
# Model and validator definitions from the full example above go here

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["Bob", "Joe", "Bob"],
        "address": ["123 Fake St", "125 Fake St", "127 Fake St"],
    }
)

df_dict = df.to_dict(orient="records")

CustomerRecordDF.parse_obj(df_dict)

# Output:
# pydantic.error_wrappers.ValidationError: 1 validation error for CustomerRecordDF
# __root__
#   Duplicate name (type=value_error)
You could also receive more than one field and have a single root validator that checks all of them, which would probably make the allow_reuse argument unnecessary.
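A minimal sketch of that multi-field variant, reusing CustomerRecord and the imports from the full example above (the *fields signature and the _validate_unique_fields attribute name are just illustrative choices, not anything prescribed by pydantic):

def root_unique_validator(*fields):
    # Single validator that checks each of the given columns for duplicates.
    def validator(cls, values):
        root_values = values.get("__root__")
        for field in fields:
            seen = set()
            for value in root_values:
                if value[field] in seen:
                    raise ValueError(f"Duplicate {field}")
                seen.add(value[field])
        return values

    # With only one validator attached to the model, allow_reuse should no
    # longer be needed; if you still attach this factory to several DataFrame
    # models, keep allow_reuse=True, since pydantic may otherwise report a
    # duplicate validator function.
    return root_validator(pre=True)(validator)


class CustomerRecordDF(BaseModel):
    __root__: List[CustomerRecord]

    _validate_unique_fields = root_unique_validator("name", "address")

With the duplicated-address DataFrame above, this should still fail with "Duplicate address", and a duplicated name should fail with "Duplicate name".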