To help you improve a machine learning model, you can visualize and monitor performance metrics, like accuracy and loss, on datasets such as the training and test sets. Rhino FCP integrates TensorBoard, an industry-standard visualization tool. TensorBoard lets you compare model versions to select the best-performing candidate, debug poor performance by identifying potential hyperparameter issues, and monitor training in real time to decide when to halt a run, for example when the model fails to converge over a long period.
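TensorBoard only displays your metrics. It does not stop a run, so deciding when to halt is up to you. The sketch below shows one kind of rule you might apply while watching a loss curve. It is a minimal plateau check, and the function name and the `patience` and `min_delta` thresholds are hypothetical. They are not part of TensorBoard or Rhino FCP.

```python
def has_plateaued(losses, patience=5, min_delta=1e-3):
    """Return True if the lowest loss in the last `patience` epochs did not
    improve on the earlier best by at least `min_delta` (hypothetical rule)."""
    if len(losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    # Plateau: the recent window never beat the earlier best by min_delta
    return best_recent > best_before - min_delta


improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3]
stalled = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
print(has_plateaued(improving, patience=3))  # False
print(has_plateaued(stalled, patience=3))    # True
```

In practice, the values would be the per-epoch losses you log and see in TensorBoard. The right `patience` depends on how noisy your curves are.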
With TensorBoard you can:
- Track and visualize metrics such as loss and accuracy
- View model graphs
- Examine histograms of weights, biases, or other tensors
- Display images, text, and audio data
You can securely visualize data from a single Rhino FCP client, from multiple clients, and from the server. You can also view aggregated data from multiple training runs.
Prerequisites
To view TensorBoard on Rhino FCP, you need the following:
- Access to the workgroup
- Access to the project
- Permission to view the analytics of code runs
Step 1: Specify What Data to Display in TensorBoard
To specify what you want to appear in TensorBoard, add a few lines to your code. We recommend importing SummaryWriter from the tensorboardX library. It is lightweight, provides only the tools for writing TensorBoard logs, and does not require you to install TensorFlow.
NOTE: If you are coding with NVFlare, you do not need to complete these steps, because NVFlare has built-in support for TensorBoard logs (as well as MLflow and Weights & Biases). NVFlare's implementation sends these logs to an external server. Because Rhino FCP considers the data in these logs potentially sensitive, it stores them locally and protects them with its Secure Access features. For an example of how to specify the data to display using NVFlare, see our example in the Rhino FCP NVFlare user resources (MIMIC CXR NVFLARE, for NVFlare versions 2.3, 2.4, and 2.5).
The following example shows how to use SummaryWriter from tensorboardX in your code.
- Define a writer that points to the folder where you want the logs to be written, like this:

```python
from tensorboardX import SummaryWriter


class Trainer:  # placeholder name: add this to your own training class
    def __init__(self):
        # Add this line: write TensorBoard logs to the /tb-logs folder
        self.tb_writer = SummaryWriter("/tb-logs")
```

- Add the items you want to see in TensorBoard using the summary writer. In the following example, the items are added with the writer's add_scalar function.

```python
class Trainer:  # continued: these methods belong to the same class as __init__
    def model_training_function(self):
        # Log the training loss for each epoch
        for epoch in range(self._epochs):
            # ... train one epoch, accumulating running_loss and images_count ...
            # self.epoch_global is a global epoch counter maintained by your training code
            self.tb_writer.add_scalar("local_train_loss_per_epoch",
                                      running_loss / images_count, self.epoch_global)
        # Optionally flush at the end of training so pending events are written to disk
        self.tb_writer.flush()

    def model_validation_function(self):
        # Log a specific validation metric, e.g., the global model's test loss per round
        # ... compute average_loss for current_round ...
        self.tb_writer.add_scalar("global_model_test_loss_per_round", average_loss, current_round)
```

- Finally, run the code as described in Running a Code Object.
For a more comprehensive guide, see this tensorboardX tutorial.
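The training code above logs `running_loss / images_count` against `self.epoch_global` rather than against the loop variable `epoch`. A plausible reason applies in federated training: the local epoch index restarts at 0 every round, so logging against it alone would place each round's points on the same steps. The sketch below shows one way these two values might be computed. The helpers `epoch_average_loss` and `global_epoch`, the `(mean_batch_loss, batch_size)` input format, and the fixed number of epochs per round are assumptions for illustration. They are not part of tensorboardX or Rhino FCP.

```python
def epoch_average_loss(batch_results):
    """Per-sample average loss for one epoch.

    `batch_results` holds (mean_batch_loss, batch_size) pairs, a hypothetical
    format for what a training loop might record per batch.
    """
    running_loss = 0.0
    images_count = 0
    for mean_batch_loss, batch_size in batch_results:
        running_loss += mean_batch_loss * batch_size  # undo the per-batch mean
        images_count += batch_size
    return running_loss / images_count


def global_epoch(current_round, epochs_per_round, local_epoch):
    """Step value that keeps increasing across federated rounds.

    Assumes every round trains for the same number of local epochs.
    """
    return current_round * epochs_per_round + local_epoch


# Local epoch 3 of rounds 0 and 1 would collide at step 3 without the offset
print(global_epoch(0, 5, 3), global_epoch(1, 5, 3))  # 3 8
```

You could then pass both results to the writer, for example `self.tb_writer.add_scalar("local_train_loss_per_epoch", epoch_average_loss(batches), global_epoch(current_round, epochs_per_round, epoch))`.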
Step 2: Use TensorBoard to View Data from Code Runs
To view the results of a code run in TensorBoard, complete the following steps.
- If you haven't already, create a code object and run it. For more information, see Creating a New Code Object.
- After the code run is completed, select Code Runs from the menu on the left. See Running a Code Object for more information.
- Select the Open Tensorboard button in the upper-right corner of the page. The TensorBoard page opens. It might take a few seconds for the dashboard to appear.
- Choose one of two dashboards:
- Time Series - Displays metrics in each iteration. Choose Time Series if you want to analyze and compare multiple time series data.
- Scalars - Displays information at the end of each epoch. Choose Scalars if you need to monitor individual scalar metrics.
- Select the code runs you want to review from the options on the left. The top-most option allows you to select all of them at once.
- Choose whether you want to see scalars, images, or histograms (or some combination of the three). To see all of them, select All.
- For more information on individual options, see the TensorBoard Guide.
- The connection to Rhino FCP TensorBoard remains available for two hours after your last interaction with it.
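The two-hour window is measured from your most recent interaction, not from when you opened TensorBoard. The following sketch is a hypothetical model of that rule, not Rhino FCP's implementation. Treating the exact two-hour mark as expired is an assumption.

```python
from datetime import datetime, timedelta

# The two-hour window, measured from the most recent interaction
IDLE_TIMEOUT = timedelta(hours=2)


def connection_expires_at(last_interaction: datetime) -> datetime:
    """Time at which the connection lapses if there is no further interaction."""
    return last_interaction + IDLE_TIMEOUT


def is_connection_active(last_interaction: datetime, now: datetime) -> bool:
    # Treating the exact two-hour mark as expired is an assumption of this sketch
    return now < connection_expires_at(last_interaction)


# Opened at 9:00, last interaction at 10:30
last_click = datetime(2024, 1, 1, 10, 30)
# Three hours after opening, but only 1.5 hours after the last interaction
print(is_connection_active(last_click, datetime(2024, 1, 1, 12, 0)))   # True
print(is_connection_active(last_click, datetime(2024, 1, 1, 12, 31)))  # False
```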