Scenario
A scenario is the container for implementing a custom task that is to be solved with tfaip. The scenario glues together all components that must or can be implemented, see Figure 1.
The ScenarioBase (green) creates the Model (blue) and Data (purple).
Training (red), Load And Validate (LAV, orange), and Prediction (yellow) access the scenario but are also instantiated by the ScenarioBase so that custom overrides can be implemented.
Scenario Directory
Each scenario should be set up in a directory that comprises the following files:
* scenario.py which defines the basic scenario information
* model.py which defines the model (neural net)
* data.py which defines the input data pipeline of the model
* graphs.py which defines the graph(s) of the model
* params.py (optional but recommended) which stores the parameters of the data, the model, and the scenario. (Note: the data params, if implemented, should always live in a different file than the data itself, otherwise warnings will occur in the data pipeline.)
If tfaip was set up by cloning the repository, the scenario class should be located at tfaip/scenario/XXX/scenario.py where XXX is the scenario name, e.g. atr.
If installed via PyPI (i.e., pip install tfaip), an arbitrary directory can be used for the location of a scenario.
Implementing a Scenario
To implement a ScenarioBase, first set up a directory and the files. In scenario.py (or params.py) implement MyScenarioParams:
@pai_dataclass
@dataclass
class MyScenarioParams(ScenarioBaseParams[MyDataParams, MyModelParams]):
    pass
MyDataParams and MyModelParams extend DataBaseParams and ModelBaseParams, respectively, to define the parameters for the data and the model.
Next, implement the actual scenario:
class MyScenario(ScenarioBase[MyScenarioParams, MyTrainerPipelineParams]):
    pass
The MyTrainerPipelineParams define the input data source for training and extend either TrainerPipelineParamsBase or TrainerPipelineParams.
The derived ListFileScenario already implements the TrainerPipelineParams by assuming a list file as input (see the ListFileScenario section).
Development
The Scenario defines several Generics that are used to instantiate the actual classes of TModel, TData, TScenarioParams, and TTrainerPipelineParams.
The ListFileScenario replaces TTrainerPipelineParams by ListFileTrainerPipelineParams.
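To illustrate the idea of using type arguments for instantiation, here is a minimal sketch in plain Python. It does not reproduce how tfaip resolves its Generics; BaseScenarioSketch, DemoParams, DemoScenario, params_cls, and default_params are hypothetical names chosen for illustration. The sketch only shows one possible way a generic base class can look up the concrete class bound to a type variable and instantiate it.

```python
from typing import Generic, TypeVar, get_args

TParams = TypeVar("TParams")


class BaseScenarioSketch(Generic[TParams]):
    """Hypothetical stand-in for a generic scenario base class (not tfaip code)."""

    @classmethod
    def params_cls(cls) -> type:
        # Inspect the generic bases recorded by Python and return the first
        # concrete class that was supplied as a type argument.
        for base in getattr(cls, "__orig_bases__", ()):
            args = get_args(base)
            if args and isinstance(args[0], type):
                return args[0]
        raise TypeError(f"{cls.__name__} does not bind TParams")

    @classmethod
    def default_params(cls):
        # Instantiate the resolved params class with its default values.
        return cls.params_cls()()


class DemoParams:
    """Hypothetical params class with a single default value."""

    def __init__(self) -> None:
        self.epochs = 1


class DemoScenario(BaseScenarioSketch[DemoParams]):
    pass


params = DemoScenario.default_params()
print(type(params).__name__, params.epochs)
```

With this pattern, a derived class only has to name its params class in the square brackets; the base class can then create instances without any further overrides.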
Additional Modules
The following lists additional methods and functionality of a scenario that can optionally be implemented.
Evaluator
Quite often, defining metrics is difficult with pure TensorFlow operations while it is trivial using Python and NumPy.
Furthermore, some metrics should only be computed after [post-processing](04_data.md).
For this purpose, tfaip offers the Evaluator, which is similar to a keras.Metric but has the advantage that anything can be computed in plain Python.
An Evaluator can optionally be parametrized by EvaluatorParams.
Similar to a keras.Metric, the Evaluator requires overriding two methods, namely update_state and result.
update_state receives a post-processed (un-batched) Sample and should update an internal state.
Finally, result shall return a dictionary of the metrics.
The Evaluator follows the context manager design of Python: an Evaluator is __enter__-ed before the validation and __exit__-ed after the result has been received.
Use this mechanism to clear the internal state.
The Evaluator is attached to a Scenario by overriding the evaluator_cls class method.
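The call order described above can be sketched without tfaip. The following stand-in is an assumption-laden illustration, not the library's Evaluator: CountingEvaluatorSketch and run_validation are hypothetical names, and update_state takes a plain target/output pair instead of a Sample to keep the block self-contained.

```python
from typing import Dict, List


class CountingEvaluatorSketch:
    """Hypothetical stand-in that mimics the enter/update/result/exit cycle."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def __enter__(self) -> "CountingEvaluatorSketch":
        # Reset the state so that every validation run starts from zero.
        self.correct = 0
        self.total = 0
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Nothing to release here; returning None lets exceptions propagate.
        return None

    def update_state(self, target: int, output: int) -> None:
        # Called once per post-processed, un-batched sample.
        self.total += 1
        self.correct += int(target == output)

    def result(self) -> Dict[str, float]:
        return {"eval_acc": self.correct / self.total}


def run_validation(
    evaluator: CountingEvaluatorSketch, targets: List[int], outputs: List[int]
) -> Dict[str, float]:
    # Enter before validation, feed every sample, read the result, then exit.
    with evaluator:
        for target, output in zip(targets, outputs):
            evaluator.update_state(target, output)
        return evaluator.result()


evaluator = CountingEvaluatorSketch()
first = run_validation(evaluator, [1, 2, 3, 4], [1, 2, 0, 4])
second = run_validation(evaluator, [5, 6], [5, 6])
print(first, second)
```

Because the counters are reset in __enter__, the second run is not influenced by the samples of the first run, even though the same evaluator object is reused.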
Example
The full tutorial provides an example:
from typing import Dict
import numpy as np

# Evaluator, Sample, and AnyNumpy are provided by tfaip.
class MNISTEvaluator(Evaluator):
    def __init__(self, params):
        super().__init__(params)
        self.true_count = 0
        self.total_count = 0

    def __enter__(self):
        # Reset the counters before each validation run
        self.true_count = 0
        self.total_count = 0

    def update_state(self, sample: Sample):
        # Called once per post-processed, un-batched sample
        self.total_count += 1
        self.true_count += np.sum(sample.targets['gt'] == sample.outputs['class'])

    def result(self) -> Dict[str, AnyNumpy]:
        return {'eval_acc': self.true_count / self.total_count}
Add this in the Scenario:
@classmethod
def evaluator_cls(cls):
    return MNISTEvaluator
Tensorboard
During training, the metrics computed by result are written to the Tensorboard.
This also allows computing custom data (e.g., images or PR-curves) within the Evaluator.
The model defines how to write arbitrary data to the Tensorboard.
ListFileScenario
The ListFileScenario is an abstract ScenarioBase that already provides some additional functionality when list files are used as the input source.
A list file is a simple text file where each line is the path to a sample, e.g. an image:
path/to/image_001.png
path/to/image_002.png
path/to/image_003.png
...
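As a rough illustration of this file format, the following sketch reads a list file into a Python list. The helper read_list_file and the file name train.lst are hypothetical and not part of tfaip; skipping blank lines is an assumption made here for robustness.

```python
import tempfile
from pathlib import Path
from typing import List


def read_list_file(list_file: Path) -> List[str]:
    """Hypothetical helper: return the non-empty lines of a list file."""
    with open(list_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small list file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    list_file = Path(tmp_dir) / "train.lst"
    list_file.write_text(
        "path/to/image_001.png\npath/to/image_002.png\n\npath/to/image_003.png\n",
        encoding="utf-8",
    )
    paths = read_list_file(list_file)

print(paths)
```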
The following shows how to extend a ListFileScenario:
Assume the new scenario has the model Model with corresponding params ModelParams, the data Data with corresponding DataParams, and works with list files.
The new scenario, here called Scenario, requires its params and the corresponding implementation to be set up.
Note that both classes are empty since in most cases no extra functionality is required.
@pai_dataclass
@dataclass
class ScenarioParams(ScenarioBaseParams[DataParams, ModelParams]):
    pass
class Scenario(ListFileScenario[ScenarioParams]):
    pass