Code Objects use numbered inputs to read data and numbered outputs to write data. The numbering corresponds to the order you set when creating the Code Object (see Creating a New Code Object for details). You can easily see these input and output numbers in the dataset selection dropdowns when running your code (see image below):
To ensure seamless access, FCP stores all imported data in a consistent format, regardless of the original file names or folder structure. This allows any code object to read and write data reliably. The following sections provide examples of how to access your data for reading and writing, tailored to different code object types and input/output configurations.
GC, Interactive Containers, and Python Auto-containers
In these Code Object types, inputs are accessible in /input/ and outputs are accessible in /output/. How you access a dataset depends on the configuration of each input or output, as described below:
Default (mandatory) Input
df = pandas.read_csv("/input/0/dataset.csv")
# reads input 0 dataset
Optional Input
If input 1 was not provided, the /input/1 directory will be empty. If it is provided, read your dataset as described below:
df = pandas.read_csv("/input/1/dataset.csv")
# reads input 1 dataset
List Input
Datasets in a list input are accessible in subfolders (named by their uid) within the main input folder. For example, each dataset in list input 0 would be accessible in /input/0/{uid}. If input 2 were set as a list input, you could use the example code below in your Code Object to read all datasets provided in input 2:
import pandas as pd
from pathlib import Path
input_dirs = [p for p in Path("/input/2").iterdir()
              if p.is_dir() and (p / "dataset.csv").exists()]
input_dfs = [pd.read_csv(dir_path / "dataset.csv") for dir_path in input_dirs]
Default (mandatory) Output
df.to_csv("/output/0/dataset.csv", index=False)
# writes output 0 dataset
Optional Output
If output 1 is set as optional, you can leave /output/1 empty, or write to it as described below:
df.to_csv("/output/1/dataset.csv", index=False)
# writes output 1 dataset
List Output
Use a folder within the output to separate your datasets. For example, you could use the code below in your Code Object to write in output 1 a list of 2 datasets:
import os

os.mkdir("/output/1/part1")
df1.to_csv("/output/1/part1/dataset.csv", index=False)
os.mkdir("/output/1/part2")
df2.to_csv("/output/1/part2/dataset.csv", index=False)
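If the number of output datasets isn't fixed, the same pattern can be written as a loop. A minimal sketch, assuming your DataFrames are collected in a list (the names write_list_output, dfs, and base_dir are illustrative, not part of the FCP API):

```python
import os


def write_list_output(dfs, base_dir="/output/1"):
    # Write each DataFrame into its own subfolder (part1, part2, ...),
    # following the one-folder-per-dataset convention for list outputs.
    for i, df in enumerate(dfs, start=1):
        part_dir = os.path.join(base_dir, f"part{i}")
        os.makedirs(part_dir, exist_ok=True)
        df.to_csv(os.path.join(part_dir, "dataset.csv"), index=False)
```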
Snippet Python Auto-container
The Python Snippet Code Object simplifies data handling by automatically loading your datasets into Pandas DataFrames and saving your output as FCP datasets directly from Pandas DataFrames. More detailed information is provided in the sections below.
Default (mandatory) Input
All of the input datasets will be loaded as pandas dataframes into an inputs variable, which will be a doubly nested list. The first level of nesting will be the different inputs of the Code Object, and the second level will be the list of datasets passed into each input.
df = inputs[0][0]
# access a dataset in mandatory input 0
[[df0], [df1], [df2]] = inputs
# access a dataset in mandatory inputs 0, 1, and 2
To maintain compatibility with older code, if you provide only one input dataset, it will automatically be loaded into the df variable (as inputs[0][0]).
Optional Input
[[], [], [df]] = inputs
# ignore inputs 0 and 1 if they were to be optional and not needed by your code
List Input
[[df0], [df1, df2, df3]] = inputs
# access a dataset in input 0 and a list of 3 datasets in input 1. Note that the number of datasets in a list input is arbitrary, so you might want to add code to handle that case.
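Because a list input can hold any number of datasets, defensive unpacking is safer than a fixed-length pattern. A minimal sketch, assuming input 0 is a single mandatory dataset and input 1 is a list input (split_inputs is a hypothetical helper name, not part of the FCP API):

```python
def split_inputs(inputs):
    # inputs[0] holds the single mandatory dataset; inputs[1] is the
    # list input, which may contain any number of DataFrames.
    [df0] = inputs[0]
    list_dfs = inputs[1]
    if not list_dfs:
        raise ValueError("list input 1 contained no datasets")
    return df0, list_dfs
```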
Default (mandatory) Output
All of the output datasets can be provided as pandas dataframes in an outputs variable, which is also a doubly nested list with the first level being the different outputs of the Code Object, and the second level being the list of datasets for each output.
outputs[1][0] = out_df
# save `out_df` into the Output 1 dataset
To maintain compatibility with older code, if you don't specify any outputs, the contents of the df variable will be saved as a single output dataset (outputs[0][0]).
Optional Output
outputs = [[], [df]]
# don't return output 1 if it were to be optional and not needed
List Output
outputs = [[df1], [df2, df3, df4, df5]]
# return 1 dataset in output 0 and 4 datasets in list output 1
NVFlare
NVFlare Clients only support a single list input (multiple or optional inputs are not supported). In contrast to GC list inputs, which are accessible under /input/0/{uid}, NVFlare Clients' data is accessible in /input/datasets/{uid}. NVFlare Clients can't write any outputs (other than logs and TensorBoard logs).
Read Training Datasets (on FL Clients)
Read each dataset using its uid:
df = pandas.read_csv("/input/datasets/{uid}/dataset.csv")
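To iterate over every dataset the client received, you can scan the uid subfolders. A minimal sketch (load_client_datasets and its base parameter are illustrative names, not part of the NVFlare API):

```python
from pathlib import Path

import pandas as pd


def load_client_datasets(base="/input/datasets"):
    # Map each dataset's uid (its subfolder name) to its DataFrame.
    return {
        p.name: pd.read_csv(p / "dataset.csv")
        for p in Path(base).iterdir()
        if p.is_dir() and (p / "dataset.csv").exists()
    }
```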
See full NVFlare code examples in User Resources.
Write Trained Model Parameters (on FL server)
- Write checkpoints to files under the /output/model_parameters/ directory (needs to be created first)
- Write final/best model weights to a file /output/model_parameters.*, for example /output/model_parameters.pt for PyTorch models.
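The two write locations above can be sketched as follows. This uses pickle as a stand-in for a framework-specific save call (e.g. torch.save for PyTorch); save_checkpoint, save_final_model, and the base parameter are hypothetical names:

```python
import pickle
from pathlib import Path


def save_checkpoint(state, round_num, base="/output"):
    # Periodic checkpoints: create /output/model_parameters/ first,
    # then write one file per training round inside it.
    ckpt_dir = Path(base) / "model_parameters"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    with open(ckpt_dir / f"round_{round_num}.pkl", "wb") as f:
        pickle.dump(state, f)


def save_final_model(state, base="/output"):
    # Final/best weights: a single /output/model_parameters.* file.
    with open(Path(base) / "model_parameters.pkl", "wb") as f:
        pickle.dump(state, f)
```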
Run Inference
Same as for GC and its variants (see above), but with a single input and a single output. Specifically:
- Input: /input/dataset.csv or /input/0/dataset.csv
- Output: /output/dataset.csv or /output/0/dataset.csv
Access the Model Weights
NVFlare model weights can be downloaded by users who have the "Manage Code Objects" permission in the project. See the following article for detailed steps: FAQ: How Do I Retrieve Weight Files?