Code Objects use numbered inputs to read data and numbered outputs to write data. The numbering corresponds to the order you set when creating the Code Object (see Creating a New Code Object for details). You can easily see these input and output numbers in the dataset selection dropdowns when running your code (see image below):
To ensure seamless access, FCP stores all imported data in a consistent format, regardless of the original file names or folder structure. This allows any code object to read and write data reliably. The following sections provide examples of how to access your data for reading and writing, tailored to different code object types and input/output configurations.
GC, Interactive Containers, and Python Auto-containers
In these Code Object types, inputs are accessible in /input/ and outputs are accessible in /output/. How you access a dataset depends on the configuration of each input or output, as described below:
Default (mandatory) Input
df = pandas.read_csv("/input/0/dataset.csv")
# reads input 0 dataset
Optional Input
If input 1 was not provided, the /input/1 directory will be empty. If it is provided, read your dataset as described below:
df = pandas.read_csv("/input/1/dataset.csv")
# reads input 1 dataset
List Input
Datasets in a list input are accessible in subfolders (named by their uid) within the main input folder. For example, each dataset in list input 0 would be accessible in /input/0/{uid}. If input 2 were set as a list input, you could use the example code below in your Code Object to read all datasets provided in input 2:
import pandas as pd
from pathlib import Path
input_dirs = [p for p in Path("/input/2").iterdir()
              if p.is_dir() and (p / "dataset.csv").exists()]
input_dfs = [pd.read_csv(dir_path / "dataset.csv") for dir_path in input_dirs]
Default (mandatory) Output
df.to_csv("/output/0/dataset.csv", index=False)
# writes output 0 dataset
Optional Output
If output 1 is set as optional, you can leave /output/1 empty, or write to it as described below:
df.to_csv("/output/1/dataset.csv", index=False)
# writes output 1 dataset
List Output
Use a folder within the output to separate your datasets. For example, you could use the code below in your Code Object to write in output 1 a list of 2 datasets:
import os

os.mkdir("/output/1/part1")
df1.to_csv("/output/1/part1/dataset.csv", index=False)
os.mkdir("/output/1/part2")
df2.to_csv("/output/1/part2/dataset.csv", index=False)
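If the number of output datasets isn't fixed, the same pattern can be written as a loop. A minimal sketch, assuming your DataFrames are collected in a list (the names write_list_output, dfs, and base_dir are illustrative, not part of the FCP API):

```python
import os


def write_list_output(dfs, base_dir="/output/1"):
    # Write each DataFrame into its own subfolder (part1, part2, ...),
    # following the one-folder-per-dataset convention for list outputs.
    for i, df in enumerate(dfs, start=1):
        part_dir = os.path.join(base_dir, f"part{i}")
        os.makedirs(part_dir, exist_ok=True)
        df.to_csv(os.path.join(part_dir, "dataset.csv"), index=False)
```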
Snippet Python Auto-container
The Python Snippet Code Object simplifies data handling by automatically loading your datasets into Pandas DataFrames and saving your output as FCP datasets directly from Pandas DataFrames. More detailed information is provided in the sections below.
Default (mandatory) Input
All of the input datasets will be loaded as pandas dataframes into an inputs variable, which will be a doubly nested list. The first level of nesting will be the different inputs of the Code Object, and the second level will be the list of datasets passed into each input.
df = inputs[0][0]
# access a dataset in mandatory input 0
[[df0], [df1], [df2]] = inputs
# access a dataset in mandatory inputs 0, 1, and 2
To maintain compatibility with older code, if you provide only one input dataset, it will automatically be loaded into the df variable (as inputs[0][0]).
Optional Input
[[], [], [df]] = inputs
# ignore inputs 0 and 1 if they were to be optional and not needed by your code
List Input
[[df0], [df1, df2, df3]] = inputs
# access a dataset in input 0 and a list of 3 datasets in input 1. Note that the number of datasets in a list input is arbitrary, so you might want to add code to handle that case.
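Because a list input can hold any number of datasets, defensive unpacking is safer than a fixed-length pattern. A minimal sketch, assuming input 0 is a single mandatory dataset and input 1 is a list input (split_inputs is a hypothetical helper name, not part of the FCP API):

```python
def split_inputs(inputs):
    # inputs[0] holds the single mandatory dataset; inputs[1] is the
    # list input, which may contain any number of DataFrames.
    [df0] = inputs[0]
    list_dfs = inputs[1]
    if not list_dfs:
        raise ValueError("list input 1 contained no datasets")
    return df0, list_dfs
```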
Default (mandatory) Output
All of the output datasets can be provided as pandas dataframes in an outputs variable, which is also a doubly nested list with the first level being the different outputs of the Code Object, and the second level being the list of datasets for each output.
outputs[1][0] = out_df
# save `out_df` into the Output 1 dataset
To maintain compatibility with older code, if you don't specify any outputs, the contents of the df variable will be saved as a single output dataset (outputs[0][0]).
Optional Output
outputs = [[], [df]]
# don't return output 1 if it were to be optional and not needed
List Output
outputs = [[df1], [df2, df3, df4, df5]]
# return 1 dataset in output 0 and 4 datasets in list output 1
NVFlare
NVFlare Clients only support a single list input (multiple or optional inputs are not supported). In contrast to GC list inputs, which are accessible under /input/0/{uid}, NVFlare Clients' data is accessible in /input/datasets/{uid}. NVFlare Clients can't write any outputs (other than logs and TensorBoard logs).
Read Training Datasets (on FL Clients)
Read each dataset using its uid:
df = pandas.read_csv("/input/datasets/{uid}/dataset.csv")
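To iterate over every dataset the client received, you can scan the uid subfolders. A minimal sketch (load_client_datasets and its base parameter are illustrative names, not part of the NVFlare API):

```python
from pathlib import Path

import pandas as pd


def load_client_datasets(base="/input/datasets"):
    # Map each dataset's uid (its subfolder name) to its DataFrame.
    return {
        p.name: pd.read_csv(p / "dataset.csv")
        for p in Path(base).iterdir()
        if p.is_dir() and (p / "dataset.csv").exists()
    }
```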
See full NVFlare code examples in User Resources.
Write Trained Model Parameters (on FL server)
- Write checkpoints to files under the /output/model_parameters/ directory (needs to be created first)
- Write final/best model weights to a file /output/model_parameters.*, for example /output/model_parameters.pt for PyTorch models.
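The two write locations above can be sketched as follows. This uses pickle as a stand-in for a framework-specific save call (e.g. torch.save for PyTorch); save_checkpoint, save_final_model, and the base parameter are hypothetical names:

```python
import pickle
from pathlib import Path


def save_checkpoint(state, round_num, base="/output"):
    # Periodic checkpoints: create /output/model_parameters/ first,
    # then write one file per training round inside it.
    ckpt_dir = Path(base) / "model_parameters"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    with open(ckpt_dir / f"round_{round_num}.pkl", "wb") as f:
        pickle.dump(state, f)


def save_final_model(state, base="/output"):
    # Final/best weights: a single /output/model_parameters.* file.
    with open(Path(base) / "model_parameters.pkl", "wb") as f:
        pickle.dump(state, f)
```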
Run Inference
Same as for GC and its variants (see above), but with a single input and a single output. Specifically:
- Input: /input/dataset.csv or /input/0/dataset.csv
- Output: /output/dataset.csv or /output/0/dataset.csv
Access the Model Weights
NVFlare model weights can be downloaded by users who have the "Manage Code Objects" permission in the project. See the following article for detailed steps: FAQ: How Do I Retrieve Weight Files?