Tensorflow models¶
modelkit provides different modes to use TF models, and makes it easy to switch between them.
- calling the TF model using the `tensorflow` module
- requesting predictions from TensorFlow Serving synchronously via a REST API
- requesting predictions from TensorFlow Serving asynchronously via a REST API
- requesting predictions from TensorFlow Serving synchronously via gRPC
TensorflowModel class¶
All TensorFlow-based models should derive from the `TensorflowModel` class. This class provides a number of methods that help with loading/serving TF models.
At initialization time, a TensorflowModel has to be provided with definitions of the tensors predicted by the TF model:
- `output_tensor_mapping`: a dict of arbitrary keys to tensor names describing the outputs
- `output_tensor_shapes` and `output_dtypes`: dicts of the shapes and dtypes of these tensors
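For illustration, such definitions might look as follows (the output keys, tensor names, shapes and dtypes below are all hypothetical):

import numpy as np

# Hypothetical definitions for a TF model with two output tensors
output_tensor_mapping = {"score": "output_0", "embedding": "output_1"}  # arbitrary key -> tensor name
output_tensor_shapes = {"score": (1,), "embedding": (512,)}             # per-item output shapes
output_dtypes = {"score": np.float32, "embedding": np.float32}          # output dtypes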
Important
Be careful: `_tensorflow_predict` returns a dict of `np.ndarray` of shape `(len(items), ?)`, whereas `_predict_batch` expects a list of `len(items)` dicts of `np.ndarray`.
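As a rough, standalone illustration of the two layouts (this is not modelkit's actual code, and the `score` key is hypothetical):

import numpy as np

# Dict of stacked arrays, as returned by the TF call for a batch of 3 items
batched = {"score": np.array([[0.1], [0.5], [0.9]], dtype=np.float32)}

# List of one dict per item, as expected on the _predict_batch side
per_item = [{name: array[i] for name, array in batched.items()} for i in range(3)]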
Other convenience methods¶
Post processing¶
After the TF call, `_tensorflow_predict_*` returns a dict of `np.ndarray` of shape `(len(items), ?)`.
These can be further manipulated by reimplementing the `TensorflowModel._post_processing` method, e.g. to reshape, change type, or select a subset of features.
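As a standalone sketch of the kind of transformation this enables (the key names, shapes and dtypes are hypothetical, and the exact signature of the override should be checked against your modelkit version):

import numpy as np

def post_process(predictions):
    # `predictions` is a dict of np.ndarray of shape (len(items), ?)
    return {
        "score": predictions["score"].astype(np.float64).reshape(-1),  # change dtype, reshape
        "embedding": predictions["embedding"][:, :128],                # select a subset of features
    }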
Empty predictions¶
Oftentimes we manipulate the item before feeding it to TF, e.g. doing text cleaning or vectorization. This sometimes results in making the prediction trivial, in which case we need not bother calling TF with anything.
modelkit provides a built-in mechanism to deal with these "empty" examples, and the default implementation of `predict_batch` uses it.
To make use of it, override the _is_empty method:
def _is_empty(self, item) -> bool:
    return item == ""
This will fill in missing values with zeroed arrays when empty strings are found, without calling TF.
To fill in values with another array, also override the `_generate_empty_prediction` method:
def _generate_empty_prediction(self) -> Dict[str, Any]:
    """Function used to fill in values when rebuilding predictions with the mask"""
    return {
        name: np.zeros((1,) + self.output_shapes[name], self.output_dtypes[name])
        for name in self.output_tensor_mapping
    }
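The overall masking mechanism can be pictured with this rough, standalone sketch (it is not modelkit's actual implementation, and the `score` key is hypothetical):

import numpy as np

items = ["some text", "", "more text"]
empty = [item == "" for item in items]              # what _is_empty decides per item

# Only non-empty items are sent to TF; pretend it returned one row per such item
tf_rows = iter(np.array([[0.9], [0.1]], dtype=np.float32))

# Rebuild one prediction per original item, filling empty ones with zeroed arrays
predictions = [
    {"score": np.zeros((1,), dtype=np.float32)} if is_empty else {"score": next(tf_rows)}
    for is_empty in empty
]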
Keras model¶
The `TensorflowModel` class allows you to build a `keras.Model` instance from the underlying saved TensorFlow model via the `get_keras_model()` method.
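For example, assuming `library` is an already-instantiated `ModelLibrary` and `"my_tf_model"` is the name of a configured `TensorflowModel` (both hypothetical here):

# `library` is assumed to be an already-instantiated ModelLibrary
model = library.get("my_tf_model")     # a TensorflowModel instance
keras_model = model.get_keras_model()  # keras.Model built from the underlying SavedModel
keras_model.summary()                  # inspect or reuse it like any Keras model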
TF Serving¶
modelkit provides an easy way to query TensorFlow models served via TF Serving. When TF Serving is configured, the TF models are not run in the main process, but queried over the network.
Running TF serving container locally¶
In order to run a TF Serving Docker container locally, one first needs to download the models and write a configuration file.
This can be achieved by
modelkit tf-serving local-docker --models [PACKAGE]
The CLI creates a configuration file for TensorFlow Serving, with the model locations referred to relative to the container's file system. As a result, the TF Serving container expects the `MODELKIT_ASSETS_DIR` to be bound to the `/config` directory inside the container.
Specifically, the CLI:
- Instantiates a `ModelLibrary` with all configured models in `PACKAGE`
- Downloads all necessary assets in the `MODELKIT_ASSETS_DIR`
- Writes a configuration file under the local `MODELKIT_ASSETS_DIR` with all TF models that are configured
The container can then be started by pointing TF Serving to the generated configuration file with `--model_config_file=/config/config.config`:
docker run \
    --name local-tf-serving \
    -d \
    -p 8500:8500 -p 8501:8501 \
    -v ${MODELKIT_ASSETS_DIR}:/config \
    -t tensorflow/serving \
    --model_config_file=/config/config.config \
    --rest_api_port=8501 \
    --port=8500
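Once the container is running, one way to check that a given model is served is TF Serving's standard REST status endpoint (the model name below is hypothetical):

import requests

# GET /v1/models/<model_name> reports the version status of a served model
status = requests.get("http://localhost:8501/v1/models/my_tf_model")
print(status.json())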
See also:
- the CLI documentation.
- the TensorFlow Serving documentation
- the TensorFlow Serving GitHub repository
Internal TF serving settings¶
Several environment variables control how modelkit requests predictions from TF serving.
- `MODELKIT_TF_SERVING_ENABLE`: controls whether to use TF Serving or to run TF locally as a library
- `MODELKIT_TF_SERVING_HOST`: host to connect to to request TF predictions
- `MODELKIT_TF_SERVING_PORT`: port to connect to to request TF predictions
- `MODELKIT_TF_SERVING_MODE`: can be `grpc` (with `grpc`) or `rest` (with `requests` for `TensorflowModel`, or with `aiohttp` for `AsyncTensorflowModel`)
- `MODELKIT_TF_SERVING_ATTEMPTS`: number of attempts to wait for the TF Serving response
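For instance, they could be set in the environment before the `ModelLibrary` is created; this is a sketch assuming the variables are read when the library settings are instantiated, with values matching the docker command above:

import os

# Assumption: modelkit picks these up from the environment when the library settings are built
os.environ["MODELKIT_TF_SERVING_ENABLE"] = "True"
os.environ["MODELKIT_TF_SERVING_MODE"] = "grpc"
os.environ["MODELKIT_TF_SERVING_HOST"] = "localhost"
os.environ["MODELKIT_TF_SERVING_PORT"] = "8500"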
All of these parameters can be set programmatically (and passed to the ModelLibrary's settings):
lib_serving_grpc = ModelLibrary(
    required_models=...,
    settings=LibrarySettings(
        tf_serving={
            "enable": True,
            "port": 8500,
            "mode": "grpc",
            "host": "localhost",
        }
    ),
    models=...,
)
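The library can then be used as usual, with predictions for the required models transparently routed to TF Serving over gRPC (the model name and item below are hypothetical):

model = lib_serving_grpc.get("my_tf_model")    # hypothetical model name
prediction = model.predict({"text": "hello"})  # served by TF Serving, not the local process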
Using TF Serving during tests¶
modelkit provides a fixture to run TF serving during testing:
@pytest.fixture(scope="session")
def tf_serving(request):
    lib = ModelLibrary(models=..., settings={"lazy_loading": True})
    yield tf_serving_fixture(request, lib, tf_version="2.8.0")
This configures and runs TF Serving for the duration of the test session, provided Docker is available.
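A test can then simply request the fixture, which guarantees the container is up for the session. A minimal sketch, assuming a hypothetical model name and item, and reusing the REST port from the docker command above:

# Hypothetical test: depending on the tf_serving fixture ensures the container is running
def test_my_tf_model(tf_serving):
    lib = ModelLibrary(
        required_models=["my_tf_model"],
        settings={"tf_serving": {"enable": True, "mode": "rest", "host": "localhost", "port": 8501}},
        models=...,
    )
    prediction = lib.get("my_tf_model").predict({"text": "hello"})
    assert prediction is not None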