Debugging
Sooner or later, every development effort reaches a point where debugging is essential. Since tfaip uses TensorFlow 2, which supports eager execution, debugging is drastically simplified because each computation of the graph can be traced manually in a debugger. Unfortunately, in some rare circumstances code runs in eager mode but not in graph mode. Usually, the reason is the use of operations that are only allowed in eager mode.
This file shows how to efficiently debug the data pipeline, the model, its graph, loss, and metrics, and how to profile with TensorBoard to detect bottlenecks.
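As a minimal sketch of the difference between eager and graph mode in plain TensorFlow 2 (the names double and modes are illustrative and not part of tfaip), the same Python function can be called directly and through tf.function while recording tf.executing_eagerly():

```python
import tensorflow as tf

modes = []  # records whether each call of the Python body ran eagerly


def double(x):
    # In eager mode this body runs on every call with concrete values.
    # In graph mode it runs only while tracing, on symbolic tensors.
    modes.append(tf.executing_eagerly())
    return x * 2


x = tf.constant([1.0, 2.0])

double(x)               # eager call: a debugger can step through and inspect x
tf.function(double)(x)  # graph call: the body is traced into a graph first

print(modes)  # [True, False]
```

In the eager call every intermediate value is a concrete tensor, which is what makes stepping through with a debugger useful. In the graph call the body only sees symbolic tensors, which is why eager-only operations fail there.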
Data-Pipeline
Debugging of the data-pipeline is usually the first step since this helps to verify the data integrity.
While tf.data.Datasets cannot be debugged easily, the tfaip pipeline based on DataProcessors can.
Do the following:
- Disable all multiprocessing (set run_parallel=False in the pre_proc and post_proc parameters of the DataParams).
- Create a DataPipeline.
- Call pipeline.generate_input_samples() in a with statement (with pipeline.generate_input_samples() as samples:). It returns a generator of samples, which are the (un-batched) input of the tf.data.Dataset.
- Optionally call pipeline.generate_input_batches() in a with statement (with pipeline.generate_input_batches() as batches:) to access the outputs of tf.data.Dataset.as_numpy_iterator(). Note that this makes debugging of the pipeline impossible because the tf.data.Dataset is accessed. Use this only to verify the batched and padded outputs of the dataset, not to debug the data pipeline itself.
Here is an example for the Tutorial:
import unittest

# TutorialScenario and TutorialData come from the tfaip tutorial scenario.


class TestTutorialData(unittest.TestCase):
    def test_data_loading(self):
        trainer_params = TutorialScenario.default_trainer_params()
        data = TutorialData(trainer_params.scenario.data)
        with trainer_params.gen.train_data(data).generate_input_samples(auto_repeat=False) as samples:
            for sample in samples:
                print(sample)  # un-batched, but can be debugged
        # or
        with trainer_params.gen.train_data(data).generate_input_batches(auto_repeat=False) as batches:
            for batch in batches:
                print(batch)  # batched and prepared (inputs, targets) tuple that cannot be debugged; use prints
Note that generate_input_samples() will run infinitely for the train_data which is why auto_repeat=False is set to only generate an epoch of data.
Model
To debug the model, enable eager mode (pass --trainer.force_eager True during training, or --lav.run_eagerly True during LAV).
Now, the full computations of the graph can be followed.
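How tfaip applies these flags internally is not described here. As an analogy in plain TensorFlow (an assumption, not tfaip's implementation), tf.config.run_functions_eagerly(True) makes a tf.function execute as ordinary Python, so eager-only calls such as .numpy() work and values can be inspected. The function name mean_as_float is illustrative:

```python
import tensorflow as tf


@tf.function
def mean_as_float(x):
    # .numpy() is an eager-only call; symbolic graph tensors do not provide it.
    return float(tf.reduce_mean(x).numpy())


# Plain TensorFlow switch (assumed analogue of an eager flag, not tfaip code).
tf.config.run_functions_eagerly(True)
try:
    result = mean_as_float(tf.constant([1.0, 2.0, 3.0]))
finally:
    # Restore graph execution so later code is unaffected.
    tf.config.run_functions_eagerly(False)

print(result)  # 2.0
```

With the switch enabled, a breakpoint inside mean_as_float stops on every call and shows concrete tensor values.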
Graph
During training, additionally pass --scenario.debug_graph_construction.
This evaluates the (prediction) graph once and computes the loss and metrics on real data.
It is recommended to use this flag if any error occurs in the graph during construction.
Loss
Losses can be fully debugged in eager mode.
Metric
Metrics of the model can be fully debugged in eager mode. Metrics defined in the Evaluator can always be debugged since they run in pure Python.
Profiling
Profiling is useful to detect bottlenecks in a scenario that slow down training.
Pass the --trainer.profile True flag to write the full profile of the training (graph mode required) to TensorBoard.
Also have a look at the official documentation.
Optimizing the input pipeline
In many cases, the input pipeline is too slow to generate samples for the model. However, there are several parameters for tweaking:
- First, enable parallel processing of the pipeline by setting run_parallel to True.
- Increase the number of threads for the pipeline: --train.num_processes 16.
- Change the default behaviour for prefetching: --train.prefetch 128.
- Verify that the size of a sample is as small as possible. Python has to pickle the data for parallelization, which can drastically slow down the queue speed. We observed serious problems if the input data size is on the order of more than 50 MB. Consider changing the data type (e.g. uint8 instead of int32).
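The effect of the data type on the pickled size can be checked with a small sketch. It uses only the standard library, with array objects standing in for the sample arrays of a real pipeline. The sample content and the variable names are hypothetical:

```python
import pickle
from array import array

# Hypothetical sample: 100,000 values in the range 0..255 (e.g. image pixels).
values = [i % 256 for i in range(100_000)]

as_int = array("i", values)    # signed int, typically 4 bytes per value
as_uint8 = array("B", values)  # unsigned char, 1 byte per value

# Pickle both variants as a multiprocessing queue would have to.
size_int = len(pickle.dumps(as_int, protocol=pickle.HIGHEST_PROTOCOL))
size_uint8 = len(pickle.dumps(as_uint8, protocol=pickle.HIGHEST_PROTOCOL))

print(f"int:   {size_int} bytes")
print(f"uint8: {size_uint8} bytes")
```

The pickled size scales roughly with the number of bytes per value, so choosing the smallest data type that can hold the values reduces the amount of data each worker has to serialize.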
Optimizing the model
The standard way to increase the throughput of a model is to increase its batch size if the memory of a GPU is not exceeded: --train.batch_size 32.