Running NVFlare Code using the Rhino FCP UI
- Go to the main Projects page and select your project.
- Select Code from the menu on the left to open the Code Objects page.
- Select the Create New Code Object button to open the Create New Code Object page.
Enter your Code Run Details
Fill in the following fields within the new Code Run modal window.
Note: These steps are the same for all NVIDIA FLARE Code Object versions 2.0, 2.2, 2.3, and 2.4.
- Training Datasets: One or more previously imported Datasets to be used as input to your Code. If a Dataset belongs to a Collaborator, the Code will be run by that Collaborator's Rhino Client. In other words, the data never moves outside your Collaborator's Rhino Client. If all of your Datasets reside on your own Rhino Client, make sure to check the Simulated FL checkbox described next to simulate federated learning.
- Simulated FL - One FL Client Per Training Dataset: Check this box to run simulated federated learning. When checked, your Rhino Client will spin up its own federated network, treat each Dataset as a separate "remote client," and perform federated training.
- Validation Datasets (Optional): One or more Datasets that the newly trained federated model will be validated against.
- Output Dataset Name Suffix: A suffix that is appended to the name of each input validation Dataset to form the name of the corresponding output Dataset. Each output Dataset is created during your validation run and then re-imported into the system to display the results of your model validation.
- Federated Server Config Override (Optional): Copy and paste the contents of your config_fed_server.json file here. This will override config_fed_server.json in the config directory of the initial container image you pushed to your workspace's ECR. You can use this to dynamically set values for training, like hyper-parameters.
- Federated Server Client Override (Optional): Copy and paste the contents of your config_fed_client.json file here. This will override config_fed_client.json in the config directory of the initial container image you pushed to your workspace's ECR. You can use this to dynamically set values for training, like hyper-parameters.
- Timeout (seconds): The number of seconds that must elapse before a Code run is killed. This is to avoid zombie tasks that run perpetually within a Rhino Client. The default is 1 hour.
When you have completed adding all your NVFlare Code Run details, click the Run Training button to run your Code.
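As an illustration of using the override fields to change a hyper-parameter, the sketch below takes a config_fed_server.json-style document, changes one value, and prints JSON text that could be pasted into the Federated Server Config Override field. The sample config, the `num_rounds` argument, and the `set_workflow_arg` helper are assumptions made for this example. Your own config file may be structured differently.

```python
import copy
import json

# Illustrative stand-in for the contents of config_fed_server.json.
# The keys below are assumptions for this example, not a required layout.
SAMPLE_SERVER_CONFIG = {
    "format_version": 2,
    "workflows": [
        {"id": "scatter_and_gather", "args": {"num_rounds": 2}},
    ],
}


def set_workflow_arg(config, workflow_id, arg_name, value):
    """Return a copy of config with one workflow argument replaced."""
    updated = copy.deepcopy(config)
    for workflow in updated.get("workflows", []):
        if workflow.get("id") == workflow_id:
            workflow.setdefault("args", {})[arg_name] = value
            return updated
    raise KeyError(f"No workflow with id {workflow_id!r} in config")


# Raise the number of training rounds without touching the original config.
override = set_workflow_arg(
    SAMPLE_SERVER_CONFIG, "scatter_and_gather", "num_rounds", 10
)
override_text = json.dumps(override, indent=2)
print(override_text)
```

Because the helper works on a copy, the config baked into your container image can stay as the baseline while each run pastes in only the variant it needs.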
Running NVFlare Code using the Rhino SDK
Prerequisites
Before starting this process, you should have already:
- Created a Project using the Rhino SDK or UI
- Created 1 or more Datasets using the Rhino SDK or UI
- Created a Code Object using the Rhino SDK or UI
Import your Python Dependencies
import rhino_health as rh
from rhino_health.lib.endpoints.code.code_object_dataclass import (
    CodeObject,
    ModelTrainInput,
)
from rhino_health.lib.endpoints.code_run.code_run_dataclass import (
    CodeRunStatus,
)
import getpass
Note: Remember to change all lines with CHANGE_ME comments above them in all the blocks below!
Log into the Rhino SDK using your FCP Credentials
Your username will be the email address you log into the Rhino FCP platform with.
print("Logging In")
# CHANGE_ME: MY_USERNAME
my_username = "MY_USERNAME"
session = rh.login(username=my_username, password=getpass.getpass())
print("Logged In")
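If you run this script unattended, one option is to read the credentials from the environment and only fall back to an interactive prompt when no password is set. The sketch below is a minimal example of that idea. The `read_credentials` helper and the `RHINO_USERNAME` and `RHINO_PASSWORD` variable names are hypothetical and are not defined by the Rhino SDK.

```python
import getpass
import os


def read_credentials(env=None, prompt=getpass.getpass):
    """Return (username, password), preferring environment variables.

    RHINO_USERNAME and RHINO_PASSWORD are hypothetical names chosen for this
    example; they are not defined by the Rhino SDK.
    """
    env = os.environ if env is None else env
    username = env.get("RHINO_USERNAME")
    if not username:
        raise ValueError("Set RHINO_USERNAME to your FCP login email address")
    # Fall back to an interactive prompt so the password is never hard-coded.
    password = env.get("RHINO_PASSWORD") or prompt()
    return username, password
```

The returned pair can then be passed to the same login call used above, for example `rh.login(username=username, password=password)`.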
Get Supporting FCP Information Needed to Run Your Code
At this point, you will need the name of your Project, the name of each Dataset you would like to use as input, and the name of your previously created Code Object. You can also retrieve each object's UID by following the instructions here: How do I retrieve a Project's, Collaborator's, Data Schema's, Dataset's, Code Object's, or Code Run's UID?
# CHANGE_ME: YOUR_FCP_PROJECT_NAME
project = session.project.get_project_by_name('YOUR_FCP_PROJECT_NAME')

# CHANGE_ME: DATASET_NAME_1 & DATASET_NAME_2 & possibly the version number too
input_dataset1 = session.dataset.get_dataset_by_name("DATASET_NAME_1", project_uid=project.uid, version=1)
input_dataset2 = session.dataset.get_dataset_by_name("DATASET_NAME_2", project_uid=project.uid, version=1)
# Repeat the above for as many Datasets as you would like to use as input to your Code

# CHANGE_ME: DATASET_NAME_3 & DATASET_NAME_4 & possibly the version number too
validation_dataset1 = session.dataset.get_dataset_by_name("DATASET_NAME_3", project_uid=project.uid, version=1)
validation_dataset2 = session.dataset.get_dataset_by_name("DATASET_NAME_4", project_uid=project.uid, version=1)
# Repeat the above for as many Datasets as you would like to validate your model against

# CHANGE_ME: CODE_OBJECT_NAME & possibly the version number too
code_object = session.code.get_code_object_by_name("CODE_OBJECT_NAME", project_uid=project.uid, version=1)
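When you have more than a couple of Datasets, repeating the lookup line by hand becomes error-prone. The sketch below wraps the same `get_dataset_by_name` call used above in a loop and stops early if a name cannot be resolved. The `fetch_datasets_by_name` helper is hypothetical, and the check assumes that a lookup which matches nothing returns `None`.

```python
def fetch_datasets_by_name(session, project_uid, names, version=1):
    """Look up each named Dataset and fail early if any is missing."""
    datasets = []
    for name in names:
        dataset = session.dataset.get_dataset_by_name(
            name, project_uid=project_uid, version=version
        )
        # Assumption: a lookup that matches nothing returns None.
        if dataset is None:
            raise LookupError(f"Dataset {name!r} (version {version}) not found")
        datasets.append(dataset)
    return datasets
```

For example, `fetch_datasets_by_name(session, project.uid, ["DATASET_NAME_1", "DATASET_NAME_2"])` would return the two input Datasets as a list, from which the UIDs can be collected with `[d.uid for d in datasets]`.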
Training & Validating Your Model
To run your model, you will need to supply the Code Object with the input Datasets you would like to train the model with, the validation Datasets you would like to validate the newly trained model against, configurations for both the federated server and clients, whether you are simulating federated learning, a timeout, and a suffix for the validation Dataset names. The suffix is appended to each validation Dataset name so that you can view the results of the model validation once the output has been re-imported into the system as a Dataset.
code_object_params = ModelTrainInput(
    code_object_uid=code_object.uid,
    # CHANGE_ME: Add/delete Dataset variables based on how many you want as training input
    input_dataset_uids=[input_dataset1.uid, input_dataset2.uid],
    # CHANGE_ME: Add/delete Dataset variables based on how many you want for validation
    validation_dataset_uids=[validation_dataset1.uid, validation_dataset2.uid],
    # CHANGE_ME: OUTPUT_SUFFIX
    validation_dataset_inference_suffix="OUTPUT_SUFFIX",
    # CHANGE_ME: Set to True to run simulated federated learning on the same Rhino Client by treating each Dataset as a site
    simulate_federated_learning=False,
    # CHANGE_ME: CONFIG_FED_SERVER - a string of valid JSON
    config_fed_server="CONFIG_FED_SERVER",
    # CHANGE_ME: CONFIG_FED_CLIENT - a string of valid JSON
    config_fed_client="CONFIG_FED_CLIENT",
    # CHANGE_ME: 900 - Set to whatever number of seconds you would like your code to time out after
    timeout_seconds=900,
    secrets_fed_server="",  # Optional - The secrets for the federated server
    secrets_fed_client="",  # Optional - The secrets for the federated client
)

model_run = session.code.train_model(code_object_params)
run_result = model_run.wait_for_completion()
print("Finished running Training & Validation")
print(f"Result status is '{run_result.status.value}', errors={run_result.result_info.get('errors') if run_result.result_info else None}")
# Compare the status enum itself rather than its string value
if run_result.status == CodeRunStatus.COMPLETED:
    print("Saving Model Parameters")
    run_result.save_model_params("/rhino_data/model_parameters.pt")
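The `config_fed_server` and `config_fed_client` arguments above are described as strings of valid JSON. If you keep those configurations in files, a small check before submitting can catch a malformed file locally rather than after the run has started. The sketch below is one way to do that. The `load_json_config_as_string` helper is hypothetical and not part of the Rhino SDK.

```python
import json
from pathlib import Path


def load_json_config_as_string(path):
    """Read a JSON config file and return it as a single-line JSON string.

    Raises ValueError if the file does not contain valid JSON, so a typo is
    caught locally instead of after the run has been submitted.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
    # Re-serialize so the result is a normalized JSON string.
    return json.dumps(parsed)
```

The returned string could then be passed as `config_fed_server` or `config_fed_client` in place of the placeholder values.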
Getting Help
If you have received an error or run into any issues throughout the process, please reach out to support@rhinohealth.com for more assistance.