Introduction
The Federated Datasets (FDs) feature enhances the discoverability of data available via the FCP, enabling users to explore and share datasets in a more accessible manner. Whether you are a data owner or a data scientist, this guide will help you navigate the process of creating and exploring Federated Datasets.
Creating a New Federated Dataset
If you are responsible for data strategy and want to make your organization’s data assets more accessible, follow these steps to create and publish a new Federated Dataset:
- Create a Project:
- Log into Rhino FCP and navigate to your project.
- Import relevant data into the project.
- Prepare your data for publishing:
- The Rhino FCP enables you to run custom code for data preprocessing (e.g. feature extraction, data cleaning, etc.). Read the Models section in the User Guides for more information.
- Create Federated Dataset:
- Select the dataset you wish to publish and click "Create Federated Dataset."
- Define Access and Privacy Settings:
- Set a public name for the FD.
- Set viewing privileges for data analytics. Selecting "Limited" will require users to submit an access request for your approval.
- Specify privacy settings (k-anonymization, differential privacy).
- Optionally provide an alternative contact email address. The default email address will be the one you use to log into the FCP.
- Define a "Datasheet" for the FD with the following sections:
- Motivation
- Composition
- Geography
- Data type
- Modality
- Disease Group
- Body Part
- Collection Process
- Preprocessing / cleaning / labeling
- Uses
- Distribution
- Maintenance
Note: This information will help users find your dataset via keyword search.
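The k-anonymization setting above can be illustrated with a small, self-contained sketch (hypothetical columns and records — this is not FCP code): a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
import pandas as pd

# Hypothetical patient-level records; Age and ZIP are the quasi-identifiers.
df = pd.DataFrame({
    "Age": [34, 34, 34, 51, 51, 51],
    "ZIP": ["02139", "02139", "02139", "94107", "94107", "94107"],
    "Diagnosis": ["A", "B", "A", "C", "A", "B"],
})

def k_anonymity(frame, quasi_identifiers):
    """Smallest group size over the quasi-identifier combination.

    The dataset is k-anonymous for the largest k not exceeding this value.
    """
    return int(frame.groupby(quasi_identifiers).size().min())

print(k_anonymity(df, ["Age", "ZIP"]))  # each (Age, ZIP) group has 3 rows -> 3
```

Higher k means each individual "hides" in a larger group, at the cost of coarser analytics.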
- Publish your Federated Dataset:
- Click "Publish" to make the FD accessible.
- Find your published FD under the "Datasets" section in FCP.
- Managing access:
- Access requests will be delivered by email, and will also be visible on the FCP under the specific dataset.
- To review requests and grant or deny access, navigate to the Access view inside the relevant dataset.
Congratulations! You are now ready to create Federated Datasets on the Rhino Health FCP. If you have any further questions or need assistance, feel free to refer to our comprehensive documentation or reach out to our support team. Happy collaborating!
Exploring an Existing Federated Dataset
As an FCP user interested in exploring datasets for your project, follow these steps to explore an existing Federated Dataset:
- Log into the Rhino FCP
- Navigate to the Datasets view by clicking the Datasets icon.
- Search for Datasets:
- Use the "Search by keyword" feature to find relevant datasets.
- View Datasheet:
- Click on a dataset to view the Datasheet and Data Schema defined by the publisher.
- Explore Analytics:
- An icon indicates that access to the data analytics must be approved by the publisher.
- Click on Analytics to fill and submit an access request. You will need to provide:
- Name of affiliation (Academic Institution, Company)
- Type of affiliation (Academic, Commercial)
- Purpose for querying the data - a brief description of your goals for the publisher's consideration.
- Once you submit the request, a notification will be sent to the publisher. Upon approval, you will be notified and can view Analytics for the dataset.
- Using a Federated Dataset in your project:
- Reach out to the data publisher to gain full access to the data, enabling you to run your code on the data via the Rhino FCP. You can also reach out to us at support@rhinohealth.com and we will be happy to assist!
- Once you and the data publisher have reached an agreement, a project will be created for you with the publisher as a collaborator, and the relevant dataset will be imported into that project.
Congratulations! You are now ready to explore Federated Datasets on the Rhino Health FCP. If you have any further questions or need assistance, feel free to refer to our comprehensive documentation or reach out to our support team. Happy collaborating!
Creating a new Python Code Object using the Rhino FCP UI
Navigate to the Code (formerly Models) Page
Use the left-hand navigation menu to click the Code menu item
Add a New Code Object
Create a new Code Object by clicking on the Create New Code Object button in the top-right corner
[Option 1] Python "Code Snippet": Use this option for basic Python scripts
Fill in the following fields within the Code Object creation modal:
- Code Object Type: Python Code
- Details:
- Name: The name you would like to provide for your Code Object
- (Optional) Description: A description that will help you and others understand the nature of this Code Object
- Data Schemas:
- Input Data Schema: The Data Schema for the Datasets this Code Object will accept as input. Your selection here will affect which Datasets can be selected as input when triggering a Code Run with this Code Object.
- Output Data Schema: The Data Schema for the Datasets that will be created as output when triggering a Code Run with this Code Object. You can also select the option to [Auto-generate Data Schema from Data]. For more information about Auto-generating Data Schemas, please refer to Auto-Generated Data Schemas.
- Container details: Here you can specify parameters that will define your Code Object container:
  - Python and CUDA Versions, or alternatively
  - A pre-existing Container Base Image (from Docker Hub). By default, the FCP will use the python:3.9.7-slim-bullseye Docker image. This means your code will run within a Python 3.9 environment. You are free to select whichever Docker base image makes the most sense for your use case (e.g. an image that supports GPU operations).
- Code: You can provide your code in one of the following formats:
  - Code snippet - Use this option for basic Python scripts. Important information:
    - Code Snippet automatically processes the input Dataset as a Pandas DataFrame, which is ideal for simple data operations like feature extraction and normalization.
    - The code automatically loads the content of your tabular data from dataset.csv into a Pandas DataFrame named df.
    - The output Dataset is created from the same df object.
  - Standalone file - Use this option for more complex code, which can still be run as a single file. Unlike the Code Snippet, no additional "hidden" functionality is included here.
  - Upload file(s) - Use this option when your code includes multiple files. After uploading your files, you will need to define the Entry Point to be used to run your code.
- Requirements: This field enables you to specify Python libraries that are required to run your code. This capability is enabled for all Python Code types except for Code Snippet which has two locked dependencies—Pandas and NumPy. Paste all Python dependencies that are necessary for running your previously specified Python code. These dependencies will automatically be installed by the Rhino FCP when it builds your container in the background. Alternatively, click Upload File to load a requirements file from your local computer. The Rhino FCP supports both Pip and Conda as package management frameworks.
- Container Base Image: This field enables you to specify a base image for your container to be built from. This capability is enabled for all Python Code types except for Code Snippet, which uses a locked base image: python:3.9.7-slim-bullseye.
Click the Create New Code button to finish.
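As an illustration of the Requirements field, a pip-style requirements file simply lists one dependency per line; the packages and version pins below are hypothetical examples, not FCP requirements:

```
pandas==1.5.3
numpy==1.24.4
pydicom==2.4.4
```

Pinning exact versions keeps the container build reproducible across runs.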
Not sure which Python Code Object type to use?
Here is a bit more information on when to use each type:
Code Snippet
Ideal for relatively straightforward scripts that primarily utilize basic Python packages, along with Pandas and NumPy. FCP automatically provides access to your cohort data as a Pandas DataFrame named df. Additionally, it generates the output cohort of your model from this same DataFrame. This setup facilitates elementary Pandas-based operations on your raw input data, such as feature extraction and value normalization.
As an example, consider this Python code snippet:
normalized_df = (df - df.mean()) / df.std()
df = normalized_df
Executing this code snippet yields an output cohort with z-normalized numerical features, all without requiring any additional code.
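You can reproduce this behavior locally to see what the output would contain; the DataFrame below is a toy stand-in for the df the FCP provides:

```python
import pandas as pd

# Toy stand-in for the df that the FCP injects from your input data.
df = pd.DataFrame({"Height": [1.6, 1.7, 1.8], "Weight": [60.0, 70.0, 80.0]})

# The same two lines as the snippet above:
normalized_df = (df - df.mean()) / df.std()
df = normalized_df

# Each column now has mean 0 and (sample) standard deviation 1.
print(df.round(3))
```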
Standalone File
This option grants you the freedom to execute varied Python code with custom dependencies, extending beyond Pandas and NumPy. The Standalone File option lets you specify your code's prerequisites, which will be automatically installed within the image generated by FCP. Moreover, you're required to define the Container Base Image for your container, a crucial step if your code necessitates a specific environment – for instance, to support GPU operations.
Once you have provided all the necessary inputs, click "Create New Code" at the bottom of the dialog box. The FCP will now create a container image in the background, based on your specifications, push it to your workgroup's ECR repository, and create a Code Object which you can then run.
Note: This can take significant time in some cases, depending on your requirements and selected container base image.
Creating a New Python Code Object Version using the Rhino FCP UI
To create a new version, make sure you are on the page containing the original object you would like to create a new version of. In the box where your original object is in the upper right-hand corner, you should see a + New Version button, like the one shown below:
Click that button and a new dialog should appear that will allow you to create a new version of your object.
Creating a New Version in Every Interface within the Rhino FCP UI
Below is a GIF showing all the interfaces within the Rhino FCP application where you can create a new Version. It displays the Schemas, Cohorts, and Models pages and the + New Version button you will press to create that new version of the schema, cohort, or model you are looking to create.
Creating a New Python Code Object using the Rhino SDK
Prerequisites
Before starting this process, you should have already:
- Created a Project using the Rhino SDK or UI
- Created an Input Data Schema using the Rhino SDK or UI
- [Optional] Created an Output Data Schema using the Rhino SDK or UI. If you choose not to create an output Data Schema, the system will auto-generate the Data Schema of the output for your Code
Import your Python Dependencies
import rhino_health as rh
from rhino_health.lib.endpoints.code.code_objects_dataclass import (
    CodeObject,
    CodeObjectCreateInput,
    CodeTypes,
    CodeRunType
)
import getpass
Note: Remember to change all lines with CHANGE_ME comments above them in all the blocks below!
Log into the Rhino SDK using your FCP Credentials
Your username will be the email address you log into the Rhino FCP platform with.
print("Logging In")
# CHANGE_ME: MY_USERNAME
my_username = "MY_USERNAME"
session = rh.login(username=my_username, password=getpass.getpass())
print("Logged In")
Get Supporting FCP Information Needed to Create Your Code Object
At this point, you will need the name of your project, your input Data Schema name, optionally your output Data Schema name, and your Python code. You can also retrieve each object's UUID by following the instructions here: How do I retrieve a UUID for a Project, Collaborator, Schema, Dataset, Code Object, or Code Run?
# CHANGE_ME: YOUR_FCP_PROJECT_NAME
project = session.project.get_project_by_name('YOUR_FCP_PROJECT_NAME')
# CHANGE_ME: INPUT_SCHEMA_NAME & possibly the version number too
input_schema = session.project.get_data_schema_by_name('INPUT_SCHEMA_NAME', project_uid=project.uid, version=1)
# CHANGE_ME: OUTPUT_SCHEMA_NAME & possibly the version number too
output_schema = session.project.get_data_schema_by_name('OUTPUT_SCHEMA_NAME', project_uid=project.uid, version=1)
# CHANGE_ME: PYTHON_CODE
python_code = """PYTHON_CODE"""
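Before creating the Code Object, you may want to sanity-check the python_code payload locally. The following is a rough, unofficial approximation of what the snippet run type does (a toy DataFrame and a hypothetical payload — not the platform's actual runtime):

```python
import pandas as pd

# Hypothetical payload of the kind you would assign to python_code above.
python_code = "df['BMI'] = df['Weight'] / (df['Height'] ** 2)"

# Rough local stand-in: the FCP loads your tabular data into a DataFrame
# named df, executes your code, and builds the output Dataset from df.
df = pd.DataFrame({"Weight": [70.0], "Height": [1.75]})
namespace = {"df": df}
exec(python_code, namespace)
print(namespace["df"])
```

If this runs cleanly on representative sample data, the payload is at least syntactically sound before you submit it.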
Create Your Python Code Object
During the creation of your CodeObject, you will provide it with various pieces of data, similar to how a Code Object is created within the UI. You will give the CodeObject a name, description, input_data_schema_uids, output_data_schema_uids, project_uid, code_object_type, and the config where you will provide your Python code and other information dependent on your code run type.
[Option 1] CodeObject with a default Code Run Type
code_object_params = CodeObjectCreateInput(
    # CHANGE_ME: CODE_OBJECT_NAME
    name = "CODE_OBJECT_NAME",
    # CHANGE_ME: CODE_OBJECT_DESCRIPTION
    description = "CODE_OBJECT_DESCRIPTION",
    input_data_schema_uids = [input_schema.uid],
    # output_schema.uid or None to infer the output Data Schema
    output_data_schema_uids = [output_schema.uid],
    project_uid = project.uid,
    code_object_type = CodeTypes.PYTHON_CODE,
    config = {
        "code_run_type": CodeRunType.DEFAULT,
        "python_code": python_code
    }
)
code_object = session.code.create_code_object(code_object_params)
print(f"Got Code Object '{code_object.name}' with UID {code_object.uid}")
[Option 2] CodeObject with an Auto-Container Snippet Code Run Type
code_object_params = CodeObjectCreateInput(
    # CHANGE_ME: CODE_OBJECT_NAME
    name = "CODE_OBJECT_NAME",
    # CHANGE_ME: CODE_OBJECT_DESCRIPTION
    description = "CODE_OBJECT_DESCRIPTION",
    input_data_schema_uids = [input_schema.uid],
    # output_schema.uid or None to infer the output Data Schema
    output_data_schema_uids = [output_schema.uid],
    project_uid = project.uid,
    code_object_type = CodeTypes.PYTHON_CODE,
    config = {
        "code_run_type": CodeRunType.AUTO_CONTAINER_SNIPPET,
        "python_code": python_code
    }
)
code_object = session.code.create_code_object(code_object_params)
print(f"Got Code Object '{code_object.name}' with UID {code_object.uid}")
[Option 3] CodeObject with an Auto-Container File Code Run Type
code_object_params = CodeObjectCreateInput(
    # CHANGE_ME: CODE_OBJECT_NAME
    name = "CODE_OBJECT_NAME",
    # CHANGE_ME: CODE_OBJECT_DESCRIPTION
    description = "CODE_OBJECT_DESCRIPTION",
    input_data_schema_uids = [input_schema.uid],
    # output_schema.uid or None to infer the output Data Schema
    output_data_schema_uids = [output_schema.uid],
    project_uid = project.uid,
    code_object_type = CodeTypes.PYTHON_CODE,
    config = {
        "code_run_type": CodeRunType.AUTO_CONTAINER_FILE,
        "python_code": python_code
    }
)
code_object = session.code.create_code_object(code_object_params)
print(f"Got Code Object '{code_object.name}' with UID {code_object.uid}")
NOTE: If you have received an error or run into any issues throughout the process, please reach out to support@rhinohealth.com for more assistance.
Welcome to the Rhino Health Federated Computing Platform!
This document will guide you through the process of setting up a new project on the Rhino Health Federated Computing Platform (FCP). By following the step-by-step instructions in this document, you will learn how to:
- Set up a new project
- Prepare data, import it as a Dataset, and explore data metrics.
- Containerize your code and run it using our distributed computing platform.
- Produce visualizations of the results and create a report within the project.
We are excited to see what you will do with the platform, so let’s get started!
Step 1: Getting Started
Important Concepts
- Container: A container is a lightweight and portable software package that encapsulates an application, its dependencies, and its runtime environment. Containers provide a consistent and isolated environment for applications to run, ensuring that they can run consistently across different computing environments. They offer a standardized way to package, deploy, and manage software applications, making it easier to build, ship, and scale applications across different platforms and infrastructures.
- Why is this important? In many cases, you will need to build a container to execute code on the FCP.
- Container Image: A container image is a standalone, executable package that includes everything needed to run a piece of software within a container. It contains the application code, runtime environment, libraries, and dependencies required for the application to function properly. Container images are typically built from a base image and can be easily shared, replicated, and deployed across different containerization platforms, allowing for consistent and reproducible application deployments.
- Why is this important? Container images are what the FCP will execute in order to run your code on your FCP client or remote FCP clients.
- Docker: Docker is an open-source platform that enables developers to automate the deployment and management of applications within containers. It provides a simple and efficient way to package applications and their dependencies into portable container images. Docker allows for easy and consistent deployment across different environments, ensuring that applications run reliably and consistently regardless of the underlying infrastructure. It has become a widely adopted tool in the software development industry, simplifying application deployment and promoting scalability and flexibility.
- Why is this important? The FCP uses Docker to help build your containers in your development environment and then run your containers locally, on your FCP client, or on other remote FCP clients.
- Amazon Elastic Container Registry (ECR): ECR is a fully managed container registry service provided by Amazon Web Services (AWS). It allows users to store, manage, and deploy container images, making it easier to run containerized applications on AWS. It provides secure and scalable storage for Docker container images and supports private repositories, access control, and image lifecycle management. Developers and organizations can use Amazon ECR to build, store, and share container images to streamline their container-based workflows.
- Why is this important? The ECR is where you will push your container images to be run within the FCP.
- Secure File Transfer Protocol (SFTP): SFTP is a network protocol that provides a secure and encrypted method for transferring files between remote systems. SFTP is commonly used as a secure alternative to FTP (File Transfer Protocol) and allows for secure file transfers over SSH (Secure Shell) connections. It ensures data confidentiality and integrity during file transfers, making it suitable for secure file exchange and remote file management.
- Why is this important? SFTP will be used to move data from your local computer to your FCP client, where it will be imported by the system for use by your containerized code.
- Data Schema: A Data Schema is a structure or blueprint that defines the organization, format, and relationships of data within a database or data system. It defines the rules and constraints for how data is organized and represented, including the data types, fields, and their relationships. A Data Schema provides a standardized framework for data storage, retrieval, and manipulation, ensuring the consistency and integrity of the data.
- Why is this important? Data Schemas are used to describe the structure and format of the data you will import into your FCP client as Datasets.
Configuring your Environment
To configure your environment, please follow the steps on this page. Once you have completed those steps, please return here and continue the tutorial.
Tutorial 1: Project Resources
Click here to access the GitHub repository that holds a whole host of resources that will be helpful on your journey with the Rhino FCP. It also hosts all the external files you will need for this tutorial. Download or clone the repository on your local computer. For this tutorial, you will be using the resources inside the user-resources/tutorials/tutorial_1/ folder.
This folder includes:
- containers/ - This folder contains all the containers that you will push to your ECR repository to be used within the FCP during this tutorial.
  - data-prep/ - This folder contains a Python script (dataprep_gc.py), and several additional files that are required to create the Docker container that will run the script on the FCP (to be used in Step 5: Running Python Code with Custom Dependencies via the FCP UI).
  - prediction-model/ - This folder contains code for a federated learning (FL) model. The model utilizes PyTorch and has been wrapped for NVFlare (Nvidia’s FL framework). Additionally, the folder contains the files required to create the Docker container to run the model training on the FCP (to be used in Step 6: Running Federated Training with NVFlare on the FCP UI).
- data/ - This folder contains all the data that will be used in this tutorial. The folder is structured in a way that will become more familiar to you as you become more comfortable using various parts of the system, such as creating custom containers.
  - input/ - This folder is part of the system's structure as you utilize the FCP features mentioned above. It contains the input data for the tutorial.
    - dataset.csv - This file defines the dataset you will use as input for this project. Each row in this file represents a case (meaning a study or patient). For each case, there is a DICOM series UID, which is similar to a file path, and the related metadata as described in the Data Schema.
    - dicom_data/ - This folder contains the DICOM imaging files, specifically chest X-ray (CXR) images, referenced in the dataset.csv file. When using files as input for your project, it is best practice to keep the files in a dedicated folder separate from the dataset.csv file.
- notebooks/ - This folder contains the notebooks you will utilize within the tutorial.
  - Tutorial 1 - Results Analysis Notebook.ipynb - This Jupyter notebook is a step-by-step tutorial for producing code run visualizations using the Rhino Health Python SDK (to be used in Step 7: Producing Visualizations of Your Model Results with the Rhino SDK).
Another important directory is the user-resources/rhino-utils/ folder. This folder contains several utility scripts the Rhino Health team has created to help simplify the process of pushing your containers to the platform, testing your containers locally, and simulating training and inference in a local simulated FL environment. A subset of these scripts will be used in both Step 5: Running Python Code with Custom Dependencies via the FCP UI and Step 6: Running Federated Training with NVFlare on the FCP UI.
Step 2: Preparing Your Data
For your data to be available to the FCP, you will first need to transfer the data located within the data/ folder so that you can utilize it within the FCP UI. If you are new to using SFTP and would like to learn more, please refer to the following support article: How can I move data from my local environment to my Rhino Health client using SFTP?
For this tutorial, you are going to move the following two resources from your local machine to your Rhino Health client:
- user-resources/tutorials/tutorial_1/data/input/dataset.csv
- user-resources/tutorials/tutorial_1/data/input/dicom_data/
Connecting to your SFTP Server for MacOS, Linux & Windows 10+
- Open a terminal or command prompt on your respective operating system, and navigate to the folder user-resources/tutorials/tutorial_1/data/input/.
- Connect to your client via SFTP using the following command:
sftp rhinosftp@RHINO_CLIENT_IP_ADDRESS
  - Note: Ensure you replace RHINO_CLIENT_IP_ADDRESS in the above command with the credentials found in your profile. If you need help finding your SFTP details, check out the following article: How can I find my SFTP Server Name/IP Address, SFTP Username, & SFTP Password?
- Copy the dataset.csv file and the dicom_data/ folder from your local machine into a new folder you create on your Rhino Health client by running the commands below:
sftp> mkdir tutorial_1
sftp> cd tutorial_1
sftp> put dataset.csv
sftp> put -r dicom_data/
sftp> exit
Other Operating Systems
- If you have downloaded and configured your SFTP client, skip to the next step. Otherwise, please follow steps 1 and 2 outlined in the following support article under the heading Connecting to your Rhino Health Client via SFTP from Other Operating Systems.
- Open your SFTP client and connect to your Rhino Client.
- Use the SFTP client to upload your data:
  - On the local machine file system panel, navigate to the folder user-resources/tutorials/tutorial_1/data/input/.
  - On your Rhino Health client file system panel, create a new folder called tutorial_1 and navigate inside of it.
- Drag the dataset.csv file and the dicom_data/ folder from your local machine file system panel to the Rhino Health client file system panel in order to upload them to your Rhino Health client.
Wait until your files have successfully been uploaded to the Rhino Health client before proceeding to the next step.
Step 3: Set Up Your Project on the FCP UI
Creating a New Project within the FCP UI
In this section of the tutorial, you will create a new Project within the Rhino FCP UI that will host your tutorial. If you are interested in learning more about Projects within the context of the Rhino FCP, please follow one of the links to the Projects section of our User Guides.
- Log in at https://dashboard.rhinohealth.com/login. If this is your first time logging in, you will be required to change your initial password and sign the EULA.
- Create a new project by clicking on the Add New Project button in the top-right corner.
- Fill in the following fields within the new modal window:
- Name: Tutorial 1 - YOUR_NAME
- Description: This is my first project on the Rhino Health FCP
- Permissions Policy: Expand this section to explore the various configurable permission policies and personas that are available to you. For this tutorial, you can leave the default Permissions Policy.
- Click the Create Project button to create your project. Once clicked, you will be navigated back to the project screen, where you will see your newly created project.
Importing a New Dataset within the FCP UI
In this section of the tutorial, you will import a new Dataset within your Project that will contain the data you will use with other aspects of your Project. If you are interested in learning more about Datasets within the context of the Rhino FCP, please follow one of the links to the Datasets section of our User Guides.
- Click the Datasets menu item within the left-hand navigation menu.
- Import a new Dataset by clicking on the Import New Dataset button in the top-right corner.
- Fill in the following fields within the new modal window:
- Name: Site 1 Dataset
- Description: Pneumonia Site 1 Dataset
- Select Workgroup: Do not modify. The default option, Current workgroup, is correct for this tutorial
- Data Schema: [Auto-generate Data Schema from Data] - Do not modify
- Tabular Data File Path: /rhino_data/tutorial_1/dataset.csv
- DICOM Data Path: /rhino_data/tutorial_1/dicom_data
- Import method: Do not modify. The default option, Filesystem, is correct for this tutorial
- File Data Path: Do not modify. You have no file data to import in this tutorial since the Data Schema and accompanying dataset.csv only define DICOM data. So you only need to fill in the DICOM Data Path.
- Is Data Deidentified?: Yes
- Finally, click the Import New Dataset button to import your new dataset
- Within the Datasets page, you should now have a new dataset object defined with the message "Importing New Dataset". Once the Dataset has been imported completely, your Datasets page should look similar to the screenshot shown below:
Step 4: Running Simple Python Code via the FCP UI
The FCP provides an easy way to perform simple data operations that require only basic Python code and standard libraries such as NumPy and Pandas. In this step, you will use this functionality to produce a new Dataset with a new derived feature (or Schema Field in the FCP) from your previously imported Dataset.
Creating a new Data Schema with the New Field's Definition in it
- Click the Data Schemas menu item within the left-hand navigation menu.
- Place your mouse anywhere within the bounds of the white box surrounding the Pneumonia Schema you created within the last step; this should reveal a new button in the top-right corner labeled + New Version.
- Click the + New Version button to create a new version of the Data Schema in which you will define the new field you would like to derive.
- Within the new dialog, ensure that the radio button Edit Latest Schema is checked, leaving all other fields untouched.
- Click the Create New Schema Version button; this will take you to the FCP's Data Schema editing tool. Here you should see a tabular format of the Data Schema that you defined in the previous step.
- To add the new field's definition, click the + Add Field button next to the Notes Schema field column.
- Fill in the following inputs within the new Schema Field column:
- Data Schema Field: BMI
- Identifier: Leave blank
- Description: Weight / Height**2
- Role: Do not modify. The default option, input, is correct for this tutorial
- Type: Float
- Type Parameters: Leave blank
- Units: Leave Blank
- May Contain PHI: No
- Permissions: Do not modify. The default option, Default, is correct for this tutorial
- Once you have completed entering all your details for the new BMI field, click Save in the top right corner. You should now have two versions of your Data Schema, and your Data Schemas page should look similar to the screenshot below:
Creating a New Python Code Object within the FCP UI
In this section of the tutorial, you will create a new Python Code Object within your Project to process the Dataset you imported. If you are interested in learning more about Python Code Objects within the context of the Rhino FCP, please follow one of the links to the Python Code sub-section in the Code section of our User Guides.
- Click the Code menu item within the left-hand navigation menu.
- Create a new Code Object by clicking on the Create New Code button in the top-left corner.
- Fill in the following fields within the new modal window:
- Name: My First Code
- Description: Python code for computing BMI
- Type: Python Code
- Input Data Schema: Site 1 Dataset schema (V.0)
- Output Data Schema: Site 1 Dataset schema (V.1)
- Keep Code Snippet selected
- Python Code:
df['BMI'] = df.Weight / (df.Height ** 2)
- Requirements File: This input field will be disabled due to the Code Snippet checkbox being selected. For now, keep it that way, but know that selecting a different option enables this input field.
- Container Base Image: The input field will be disabled due to the Code Snippet checkbox being selected. For now, keep it that way, but know that selecting a different option enables this input field.
- Next, click the Create New Code button to create your new Code Object within your project. Once the Code Object creation is complete, your Code page should now look similar to the screenshot below:
Running your New Python Code within the FCP UI
In this section of the tutorial, you will run the newly created Python Code which will produce a Code Run after running. If you are interested in learning more about Code Runs within the context of the Rhino FCP, please follow one of the links to the Code Runs section of our User Guides.
- Navigate to the Code Object you created in the last step, and click the Run button in the row corresponding to Version 0 of My First Code
-
Fill in the following fields within the new modal window:
- Input Datasets: Site 1 Dataset (V=0)
- Output Dataset Name Suffix: _addBMI
- Timeout in Seconds: Do not modify. The default value will work fine here because the Code runs quickly
- Run Parameters: Leave blank
- Once you have completed entering all the details of your Code Run, click the Run button to send your code to be run on your Rhino Health client.
- To monitor your Code Run's progress, click the Code Runs menu item within the left-hand navigation menu.
- On the Code Runs page, you should now see a new Code Run entitled My First Code with a single row in it showing a status of "Running". Once your Code Run has completed running, you should see a green checkmark and the words "Completed: Success" next to it. Your Code Runs page should look similar to the screenshot shown below:
-
To review the output of your Code Run, click on the link entitled 1 Dataset under the Output Datasets heading within the first row of the My First Code card. This link will take you back to the Datasets page, but now on the Analytics tab. There you will see summary statistics about your newly created Dataset, Site 1 Dataset_addBMI (v0), produced by the latest run of My First Code.
-
A few things of note on the Dataset analytics page:
- The new field, BMI, is present and has been calculated using the code you provided during the creation of My First Code.
- The table under the Data Completeness heading shows that the new field, BMI, has a comparatively low data completeness score, only 64%. This is because it was created using other fields with missing values.
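This behavior is plain NaN propagation: any row missing Height or Weight yields a missing BMI. A small local illustration with pandas (toy data, not the tutorial Dataset):

```python
import numpy as np
import pandas as pd

# Two of the four rows are missing one of the inputs.
df = pd.DataFrame({
    "Height": [1.70, np.nan, 1.80, 1.65],
    "Weight": [70.0, 64.0, np.nan, 58.0],
})
df["BMI"] = df.Weight / (df.Height ** 2)

# Data completeness = share of non-missing values per column.
completeness = df.notna().mean() * 100
print(completeness.tolist())  # Height and Weight: 75.0 each; BMI: 50.0
```

The derived column can never be more complete than its least complete input, which is why BMI scores below Height and Weight here.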
Step 5: Running Python Code with Custom Dependencies via the FCP UI
After discovering new insights about the output Dataset in Step 4: Running Simple Python code via the FCP UI, you would like to improve your data completeness metric by adding a preliminary data imputation step. Additionally, you would like to convert the DICOM CXR images into JPEG versions. As data preparation becomes more and more complex it is often better to collect all the steps in a single script that can be run on the platform, rather than perform a series of smaller steps.
We have provided such a script, named dataprep_gc.py, in the user-resources/tutorials/tutorial_1/containers/data-prep folder. This script has a few additional dependencies, which are listed in the requirements.txt file; go ahead and take a look now. In order to run this code on the FCP, you will need to include these additional dependencies in your Python Code by following the instructions below:
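For a sense of what the imputation step does, here is a minimal pandas sketch of filling missing values before deriving BMI. This is an illustrative median-imputation strategy, not necessarily the one dataprep_gc.py uses; check the script itself for the actual logic (including the DICOM-to-JPEG conversion, which relies on the extra dependencies in requirements.txt):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Height": [1.70, np.nan, 1.80],
    "Weight": [70.0, 64.0, np.nan],
})

# Preliminary imputation: fill missing numeric values with the column median.
# (Illustrative strategy only -- see dataprep_gc.py for the real one.)
cols = ["Height", "Weight"]
df[cols] = df[cols].fillna(df[cols].median())

# With no missing inputs left, the derived BMI is complete for every row.
df["BMI"] = df.Weight / (df.Height ** 2)
print(df.notna().mean().tolist())  # [1.0, 1.0, 1.0] -- 100% completeness
```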
Create a New Data Schema Version that Includes your New Output Schema Fields
- Return to the FCP to create another new version of the original Pneumonia Schema. This new Data Schema will serve as the output of your newly created container. You will do this by repeating Steps 1-6 in Step 4: Running Simple Python Code via the FCP UI.
-
Fill in the following inputs within the new Schema Field column:
- Data Schema Field: JPG file
- Identifier: Leave blank
- Description: JPG representation of the input DICOM image
- Role: Do not modify. The default option, Input, is correct for this tutorial
- Type: Filename
- Type Parameters: Leave blank
- Units: Leave Blank
- May Contain PHI: No
- Permissions: Do not modify. The default option, Default, is correct for this tutorial
-
Once you have completed entering all your details for the new JPG file field, click Save in the top right corner. You should now have three versions of your Data Schema, and your Data Schemas page should look similar to the screenshot below:
Creating a New Python Code Object with Custom Dependencies
- Click the Code menu item within the left-hand navigation menu.
- Create a new Code Object by clicking on the Create New Code button in the top left corner.
-
Fill in the following fields within the new modal window:
- Name: Data Prep
- Description: Data Imputation, BMI Calculation, & DICOM file conversion to JPG
- Type: Python Code
- Input Data Schema: Pneumonia Schema (v0)
- Output Data Schema: Pneumonia Schema (v2)
- Select the Standalone file checkbox
-
Python Code: Paste the entire contents of the dataprep_gc.py script into the input field
- Requirements File: Paste the entire contents of the requirements.txt file into the input field
- Container Base Image: Do not modify. The default option, python:3.9.7-slim-bullseye, is correct for this tutorial
-
Next, click the Create New Code button to create your new Code Object within your project. Once the Code Object creation is complete, your Code page should now look similar to the screenshot below:
Running your New Python Code Object to Prepare the Data
- Navigate to the Code Object you created in the last step, and click the Run button in the row corresponding to Version 0 of Data Prep.
-
Fill in the following fields within the new modal window:
- Input Datasets: Site 1 Dataset (V=0)
- Output Dataset Name Suffix: _complete
- Timeout in Seconds: Do not modify. The default value will work fine here because the Code runs quickly
- Run Parameters: Leave blank
- Once you have completed entering all the details of your Code Run, click the Run button to send your code to be run on your Rhino Health client.
- To monitor your Code Run's progress, click the Code Runs menu item within the left-hand navigation menu.
- On the Code Runs page, you should now see a new Code Run entitled Data Prep with a single row in it showing a status of "Running". Once your Code Run has completed running, you should see a green checkmark and the words "Completed: Success" next to it.
- To review the output of your Code Run, click the Datasets menu item within the left-hand navigation menu.
-
You should now have a new Dataset listed on the page with the name Site 1 Dataset_complete. Your Datasets page should now look similar to the screenshot below:
-
Click on the row representing Version 0 within the Site 1 Dataset_complete Dataset card to view the new Dataset analytics after we have successfully run our Data Prep Python Code. Your Dataset Analytics tab within the Datasets page should now look similar to the screenshot below:
-
A few things of note on the Analytics tab within the Datasets page:
- The Data Prep Code has added 2 new fields to the Dataset - BMI, and JPG file.
- The Code performed data imputation so your Data completeness metrics for all fields should now be 100%.
-
If you navigate to the Data tab, you can directly access the tabular data within the Dataset. Your Dataset Data tab within the Datasets page should now look similar to the screenshot below:
- Permission Note: You are permitted to view this data in this way because you have imported this Dataset from your workgroup’s client. When working with Datasets from other workgroups, the other workgroups will need to grant you explicit permission to be able to access their data at this level. For more information, please refer to Secure Access Lists within the User Guides section.
Step 6: Running Federated Training with NVFlare on the FCP UI
Create a New Data Schema for the Output of your NVFlare Code
- Follow the steps outlined in Creating a New Data Schema within the FCP UI within Step 3: Set Up Your Project on the FCP UI to create a new Data Schema called Pneumonia Results Schema. For this step, you can elect to Create Blank Data Schema with the details outlined below, or simply Upload from File with the provided user-resources/tutorials/tutorial_1/schemas/Pneumonia Output Schema.csv file.
| Field Name | Identifier | Description | Role | Type | Type Parameters | Units | May Contain PHI? | Permissions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SeriesUID | | DICOM Series UID of the CXR | Input | DicomSeriesUID | | | FALSE | Default |
| Height | | Patient Height | Input | ConstrainedFloat | {"gt": 0, "le": 2.3} | m | FALSE | Default |
| Weight | | Patient Weight | Input | PositiveFloat | | kg | FALSE | Default |
| Gender | | Patient Gender | Metadata | Enum | {"choices": ["M", "F"]} | | FALSE | Default |
| Pneumonia | | Whether or not the patient had pneumonia | Output | Boolean | | | FALSE | Default |
| BMI | | Patient BMI | Input | NonNegativeFloat | | kg/m^2 | FALSE | Default |
| JPG file | | CXR JPG image | Input | Filename | | | FALSE | Default |
| Model Score | | Model score on validation set | Output | Float | | | FALSE | Default |
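The Type Parameters column holds JSON constraints such as {"gt": 0, "le": 2.3} (Height must be greater than 0 and at most 2.3) and {"choices": ["M", "F"]} (Gender must be one of the listed values). A plain-Python sketch of how such constraints can be checked (a hypothetical helper for illustration, not FCP code):

```python
def check_field(value, gt=None, le=None, choices=None):
    """Validate one value against schema-style type parameters."""
    if choices is not None and value not in choices:
        return False  # Enum-style constraint: value must be an allowed choice
    if gt is not None and not value > gt:
        return False  # "gt": value must be strictly greater than the bound
    if le is not None and not value <= le:
        return False  # "le": value must be less than or equal to the bound
    return True

print(check_field(1.75, gt=0, le=2.3))       # True: a plausible height in metres
print(check_field(2.5, gt=0, le=2.3))        # False: exceeds the upper bound
print(check_field("M", choices=["M", "F"]))  # True: an allowed Enum choice
```

The FCP performs equivalent validation when you import a Dataset against this schema.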
Building and Pushing your NVFlare Container to Your Workgroup's ECR Repository
- Using a terminal or command prompt, navigate to the user-resources/rhino-utils folder.
- Copy the docker-push.sh script from the user-resources/rhino-utils folder to the user-resources/tutorials/tutorial_1/containers/prediction-model folder:
cp docker-push.sh ../tutorials/tutorial_1/containers/prediction-model
- Make the shell script executable by running the following command:
chmod +x docker-push.sh
- Navigate to the user-resources/tutorials/tutorial_1/containers/prediction-model folder. This folder contains code for a PyTorch classification model that has been adapted to the NVIDIA FLARE framework for federated learning. Learn more about NVIDIA FLARE here.
- Run the docker-push.sh script to build and push the container defined in the prediction-model folder to your Workgroup's ECR repository:
./docker-push.sh WORKGROUP_ECR_REPOSITORY_NAME tutorial-1-prediction-model
- Note: Remember to replace WORKGROUP_ECR_REPOSITORY_NAME with your workgroup ECR repository name from your FCP Profile Page. If you need help finding your workgroup ECR repository name check out the following article: How can I find my ECR Credentials & Workgroup Name?
- Note: This step may take several minutes to complete as the container image is built
Creating a New NVFlare Code Object for the Newly Pushed Container
In this section of the tutorial, you will create a new NVFlare Code Object within your Project to train and validate a model that predicts the likelihood of pneumonia in the Dataset you imported and then preprocessed. If you are interested in learning more about NVFlare Code Objects within the context of the Rhino FCP, please follow one of the links to the NVFlare Code sub-section within the Code section of our User Guides.
- Click the Code menu item within the left-hand navigation menu.
- Create a new Code Object by clicking on the Create New Code button in the top left corner.
-
Fill in the following fields within the new modal window:
- Name: Prediction Model
- Description: PyTorch Classification Model for Predicting Pneumonia within Patients
- Type: NVIDIA FLARE 2.0
- Input Data Schema: Pneumonia Schema (v2)
- Output Data Schema: Pneumonia Results Schema (v0)
-
Container: tutorial-1-prediction-model
- The container should be listed within the drop-down under the section titled Workgroup Images
- If your container is not present, check that the docker-push.sh script from Building and Pushing your NVFlare Container to Your Workgroup's ECR Repository completed without any errors. If you need support, feel free to reach out to support@rhinohealth.com.
- Next, click the Create New Code button to create your new Code Object within your project.
Running your New NVFlare Code Object to Predict Pneumonia
- Navigate to the Code Object you created in the last step, and click the Run button in the row corresponding to Version 0 of the Prediction Model.
-
Fill in the following fields within the new modal window:
- Training Datasets: Site 1 Dataset_complete (V=0)
-
Validation Datasets: Site 1 Dataset_complete (V=0)
- Note: In a real-world scenario you would not select the same Datasets for both training and validation, rather you would split the Datasets into training and validation Datasets first
- Output Dataset Name Suffix: _results
-
Federated Server Config: Do not modify. You will not need to override the server config provided within your tutorial-1-prediction-model container image
- Federated Client Config: Do not modify. You will not need to override the client config provided within your tutorial-1-prediction-model container image
- Timeout in Seconds: Do not modify. The default value will work fine here because the Code runs within the specified time limit
- Run Parameters: Leave blank
-
Once you have completed entering all the details of your Code Run, click the Run button to send your code to be run on your Rhino Health client.
- Training should take between 10-30 minutes to complete (depending on your Rhino Health client’s hardware) - The training is only performing a single epoch, so model performance will likely be less than stellar
- Once training has been completed, the system will automatically run inference with your validation Dataset and your newly trained model to produce a new output Dataset with the results of the validation.
- After the training and validation steps have successfully completed, you should have a new Dataset within the Datasets page entitled Site 1 Dataset_complete_results. Your Datasets page should now look similar to the screenshot below:
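As the note above says, a real project would split the data before training rather than validating on the training set. A common sketch of such a split with pandas (an illustrative 80/20 random split; column names and the fraction are arbitrary):

```python
import pandas as pd

# Toy Dataset with ten cases.
df = pd.DataFrame({
    "SeriesUID": [f"uid-{i}" for i in range(10)],
    "Pneumonia": [i % 2 == 0 for i in range(10)],
})

# Hold out 20% of the rows for validation and train on the rest.
val = df.sample(frac=0.2, random_state=42)
train = df.drop(val.index)

print(len(train), len(val))  # 8 2
```

You could then import the two partitions as separate Datasets and select one for Training Datasets and the other for Validation Datasets.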
Step 7: Producing Visualizations of Your Code Run with the Rhino SDK
In this section of the tutorial, you will create a report to visualize the output of your NVFlare Code's Inference results. If you are interested in learning more about the Rhino SDK, please follow one of the links to the Rhino SDK section of our User Guides.
- If you do not have Python installed with your development environment, please download and install Python here: Python
- If you do not have Jupyter Notebook installed within your development environment, please follow the steps outlined here: Jupyter Notebook Installation
-
Using your terminal or command prompt, navigate to your user-resources/tutorials/tutorial_1/notebooks folder.
- Run the following command to start the Jupyter Notebook, Tutorial 1 - Results Analysis Notebook.ipynb:
jupyter notebook "Tutorial 1 - Results Analysis Notebook.ipynb"
- Follow the step-by-step tutorial for producing Code Run visualizations using the Rhino SDK by running each of the cells contained within the notebook.
- Once you have completed running the entire notebook, switch back over to the FCP. Navigate to the Code Runs page and click the first row, labeled V (for validation), within the Prediction Model card.
- You will be taken to a new page with two tabs: Report and Logs. The previous steps populated the Report tab. You should now see various charts, and your Report tab within the Code Runs page should look similar to the screenshot below:
Congratulations on completing your first tutorial on the Rhino Health FCP!
You should now have a good understanding of:
- How to access the FCP and locate your credentials from your profile page
- How to create a new project
- What a Data Schema is and how to create a new Data Schema
- How to move data from your local development environment to your FCP client and then import it as a new Dataset
- How to create several different Code Objects (Python Code, Generalized Compute & NVFlare) and run them
- How to use the Rhino SDK to create custom reports for visualizing the output of Code Runs
Things to try next
-
Adjust the code to produce different results:
- Change the units for Height and/or Weight
-
Extract new fields from the data
-
Check out the tabular viewer’s advanced features:
-
The tabular viewer (accessible by clicking on a Dataset you have access to, then clicking on the Data tab) includes a few advanced features, such as a fully functional DICOM viewer with annotation capabilities, an auto-generated editable __Notes__ column you can use to append free-text notes to each case, and a viewer for standard image formats such as .jpg and .png.
You can read about these features in the Rhino Health Federated Computing Platform User Manual and test them in your Hello World project.
-
Try to break things, for example:
- Mismatches between the Data Schema and the data in the dataset.csv file will produce validation errors when attempting to import the Dataset. We recommend you try altering the Dataset and Data Schema to see what happens.
- Try adding faulty code to the Python Code Object and running it. This will help you learn how the FCP produces different error messages and how you can use the platform to debug your code.
Thanks again for investing your time in learning how to use the FCP; we can't wait to see what you will do with it! If you need support at any time, feel free to contact support@rhinohealth.com.
Continue to Tutorial #2 →
-
This article explains how to import and export datasets on the Rhino Health Platform to and from the following cloud storage platforms:
- Amazon Web Services (AWS) S3
- Google Cloud Platform's (GCP) Cloud Storage (CS)
- Server Message Block (SMB) network file sharing protocol
Request Mounting Storage to your Rhino Client
Contact Rhino's support team to set up your S3 bucket, GCP CS storage, or SMB shared folder integration and provide the following details.
-
File storage type:
- `s3` for AWS S3 bucket
- `gcs` for Google Cloud Storage
- `smb` for SMB shared folder
- Your file storage path: The path in your cloud storage to make accessible to the Rhino Client.
-
`rhino_data` subfolder: A subfolder for accessing your cloud data within the Rhino Client, such as `my_cloud_storage_folder`. The provided bucket will be mapped to the following path in `rhino_data`:
/rhino_data/external/`file storage type`/my_cloud_storage_folder
- Is read only: If `True`, the Rhino Client can only import data. If `False`, the Rhino Client can also export datasets to your cloud storage.
- Credentials: The certificate/credentials required to access the relevant data in your cloud storage. See how to provide such credentials in a secure manner with AWS and GCP.
Import Datasets
Create a new dataset and point to your cloud storage data using the following paths:
For AWS S3:
/rhino_data/external/s3/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For GCP CS:
/rhino_data/external/gcs/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For SMB:
/rhino_data/external/smb/my_cloud_storage_folder/YOUR_DATA_PATH_NETWORK_SHARE
Note: The integration is available at the workgroup level. Each workgroup can set up their own buckets or network share. Those buckets or network shares are not accessible to other workgroups.
Example: Importing a file from AWS S3
Suppose you want to import a file located in an S3 bucket under the path `my_bucket/some_folder/some_subfolder/dataset.csv`, and the details supplied to Rhino support are:
- File storage path: `my_bucket`
- File storage type: `s3`
- `rhino_data` subfolder: `my_cloud_storage_folder`
- Credentials: `{"aws_access_key_id": <key>, "aws_secret_access_key": <key>}`
When importing this file as a dataset on FCP, the path for this file would be:
/rhino_data/external/s3/my_cloud_storage_folder/some_folder/some_subfolder/dataset.csv
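The mapping rule is mechanical: the configured file storage path (the bucket) is replaced by /rhino_data/external/<file storage type>/<`rhino_data` subfolder>, and the rest of the path is appended unchanged. A small helper illustrating the rule (hypothetical, not part of the Rhino SDK):

```python
def fcp_path(storage_type, subfolder, path_under_bucket):
    """Map a path under the configured bucket to its mount point on the Rhino Client."""
    return f"/rhino_data/external/{storage_type}/{subfolder}/{path_under_bucket}"

print(fcp_path("s3", "my_cloud_storage_folder",
               "some_folder/some_subfolder/dataset.csv"))
# /rhino_data/external/s3/my_cloud_storage_folder/some_folder/some_subfolder/dataset.csv
```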
Export Datasets
To export an existing dataset, follow the steps described in Exporting a Dataset. The Rhino integration with your network storage should be configured as `Is read only` = `False` to allow your Rhino Client to save the exported files in your network storage. (If you are not sure if `Is read only` = `False` in your configuration, please contact the Rhino support team.)
Datasets will be exported to the file storage path set in the integration. For the AWS import example above, datasets would be exported to your AWS S3 bucket named `my_bucket`.