Validation

Model typing¶

It is also possible to provide types for a Model subclass, such that linters and callers know exactly which item type is expected, and what the result of a Model call looks like.

Types are specified when instantiating the Model class:

# This model takes `str` items and always returns `int` values
class SomeTypedModel(Model[str, int]):
    def _predict(self, item):
        return len(item)

Static type checking¶

Setting Model types allows static type checkers to fail if the expected return value of calls to predict have the wrong types.

Consider the above model:

m = SomeTypedModel()
x : int = m("ok")
y : List[int] = m(["ok", "boomer"])
z : int = m(1) # would lead to a typing error with typecheckers (e.g. mypy)

Runtime type validation¶

In addition, whenever the model's predict method is called, the type of the item is validated against the provided type and raises an error if the validation fails:

modelkit.core.model.ItemValidationException if the item fails to validate
modelkit.core.model.ReturnValueValidationException if the return value of the predict fails to validate

Marshalling of item/return values¶

It is possible to specify a pydantic.BaseModel subtype as a type argument for Model classes. This will actually change the structure of the data that is fed into to the _predict method. For example:

class ItemModel(pydantic.BaseModel):
    x: int

class ReturnModel(pydantic.BaseModel):
    x: int

class SomeValidatedModel(Model[ItemModel, ReturnModel]):
    def _predict(self, item):
        # item is guaranteed to be an instance of `ItemModel` even if we feed a dictionary item
        result = {"x": item.x}
        # We can either return a dictionary
        return result
        # or return the pydantic structure
        # return ReturnModel(x = item.x)

m = SomeValidatedModel()
# although we return a dict from the _predict method, return value
# is turned into a `ReturnModel` instance.
y : ReturnModel = m({"x": 1})

This also works with list of items

class SomeValidatedModelBatch(Model[ItemModel, ReturnModel]):
    def _predict_batch(self, items):
        return [{"x": item.x} for item in items]

m = SomeValidatedModelBatch()
y : List[ReturnModel] = m.predict_batch(items=[{"x": 1}, {"x": 2}])

Note

Note that, although we call predict with a dictionary, _predict will see pydantic structures. Importantly, this means that attributes now need to be refered to with natural naming: item.x instead of item["x"]

Disabling validation¶

pydantic validation can take some time, and in some cases the validation may end up taking much more time than the prediction itself.

This occurs generally when:

a Model's payload is large (contains long lists of objects to validate)
a Model's prediction is very simple

To avoid the validation overhead, especially in production scenarios, it is possible to ask modelkit to create models without validation, which will be faster in general. This also still creates pydantic structure and therefore will not break the natural naming inside the predict function.