This document guides you through another usage example on the Rhino Health FCP. By following the step-by-step instructions, you will learn how to:
- Interact with your project objects with the FCP Python SDK
- Import Datasets with the FCP Python SDK
- Run simple Python code using a pre-built container image
- Write reusable preprocessing scripts for messy data
- Use an Interactive Container to interact directly with distributed data
We are excited to see what you will do on the platform, so let’s get started!
Step 1: Getting started
Credentials
As in Tutorial #1 - Rhino Health Federated Computing Platform “Hello World” - Basic Usage, you will need your FCP credentials (email and password) to log into https://dashboard.rhinohealth.com/login.
If you have not received those credentials, please contact support@rhinohealth.com.
Tutorial 2 Project Resources
Click here to access all the files you will need for this project. Download or clone the repository to your local computer. For this tutorial, you will be using the resources inside the client-resources/tutorials/tutorial_2/ folder.
This folder includes:
- data/ - This folder contains all the data used in this tutorial. Its structure will become more familiar as you grow comfortable with other parts of the system, such as creating custom containers.
  - input/ - This folder is part of the structure the system expects when you use the features mentioned above. It contains the input data for the tutorial.
    - site1_part1_dataset_data.csv, site2_part1_dataset_data.csv, site3_part1_dataset_data.csv - The files that define the datasets at the three different sites you will import during the tutorial.
    - site1_part2_dataset_data.csv, site2_part2_dataset_data.csv, site3_part2_dataset_data.csv - The files that define the second part of the datasets at the three different sites. This second part is meant to illustrate how easy it is to create reusable data harmonization workflows using the Rhino Health Python SDK.
- notebooks/ - This folder contains the notebooks you will use in the tutorial.
  - Tutorial 2 - Data harmonization.ipynb - A Jupyter notebook containing the code you will run in this tutorial.
- schemas/ - This folder contains the Data Schemas that define the structure and variable types of the tutorial data.
  - Harmonization Schema.csv - The Data Schema you will use for this project.
Important: In a real multi-site project on the FCP, you will not need to collect data from different sites within your own client. Each site will hold its own data on its own Rhino Health client.
Step 2: Preparing your data
Before you can use your data in the FCP UI, you need to transfer the files in the data/ folder to your Rhino Health client. If you are new to using SFTP and would like to learn more, please refer to the following support article: How can I move data from my local environment to my Rhino Health client using SFTP?
For this tutorial, you are going to move the following six resources from your local machine to your Rhino Health client:
client-resources/tutorials/tutorial_2/data/input/site1_part1_dataset_data.csv
client-resources/tutorials/tutorial_2/data/input/site2_part1_dataset_data.csv
client-resources/tutorials/tutorial_2/data/input/site3_part1_dataset_data.csv
client-resources/tutorials/tutorial_2/data/input/site1_part2_dataset_data.csv
client-resources/tutorials/tutorial_2/data/input/site2_part2_dataset_data.csv
client-resources/tutorials/tutorial_2/data/input/site3_part2_dataset_data.csv
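Before uploading, it can help to confirm that all six files are present in your local copy of the repository. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it assumes you run it from the folder that contains client-resources/.

```python
from pathlib import Path

# Folder layout as described in this tutorial; adjust if you cloned elsewhere.
INPUT_DIR = Path("client-resources/tutorials/tutorial_2/data/input")


def expected_files(input_dir):
    """Return the six CSV paths the tutorial uploads (3 sites x 2 parts)."""
    return [
        Path(input_dir) / f"site{site}_part{part}_dataset_data.csv"
        for part in (1, 2)
        for site in (1, 2, 3)
    ]


def missing_files(input_dir):
    """Return the expected CSV paths that are not present on disk."""
    return [path for path in expected_files(input_dir) if not path.is_file()]


if __name__ == "__main__":
    missing = missing_files(INPUT_DIR)
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All six CSV files are present.")
```

If any file is reported missing, re-download or re-clone the repository before continuing.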
Connecting to your SFTP Server for MacOS, Linux and Windows 10+
- Open a terminal or command prompt and navigate to the folder client-resources/tutorials/tutorial_2/data/.
- Connect to your client via SFTP using the following command:
  sftp rhinosftp@RHINO_CLIENT_IP_ADDRESS
  - Note: Be sure to replace RHINO_CLIENT_IP_ADDRESS in the above command with the SFTP server name or IP address found in your profile. If you need help finding your SFTP details, check out the following article: How can I find my SFTP Server Name/IP Address, SFTP Username, & SFTP Password?
- Copy the CSV files from your local machine into a new folder you create on your Rhino Health client by running the commands below:
  sftp> mkdir tutorial_2
  sftp> cd tutorial_2
  sftp> put -r input/
  sftp> exit
Other Operating Systems
- If you have already downloaded and configured your SFTP client, skip to the next step. Otherwise, please follow steps 1 & 2 outlined in the following support article under the heading Connecting to your Rhino Health Client via SFTP from Other Operating Systems.
- Open your SFTP client and connect to your Rhino Health client.
- Use the SFTP client to upload your data:
  - On the local machine file system panel, navigate to the folder client-resources/tutorials/tutorial_2/data/input/.
  - On your Rhino Health client file system panel, create a new folder called tutorial_2 and navigate inside it.
- Drag the CSV files from your local machine file system panel to the Rhino Health client file system panel in order to upload them to your Rhino Health client.
Wait until your files have successfully been uploaded to the Rhino Health client before proceeding to the next step.
Step 3: Set Up Your Project on the FCP UI
Creating a New Project within the FCP UI
- Log in at https://dashboard.rhinohealth.com/login. If this is your first time logging in, you will be required to change your initial password and sign the EULA.
- Create a new project by clicking on the Add New Project button in the top right corner.
- Fill in the following fields within the new modal window:
  - Name: Tutorial 2 - YOUR_NAME
  - Description: This is my second project on the Rhino Health FCP
  - Permission Policy: Expand this section to explore the various configurable permission policies and personas that are available to you. For this tutorial, you can leave the default permission policy.
- Click the Create Project button to create your project. Once clicked, you will be navigated back to the project screen where you will see your newly created project.
Creating a New Data Schema within the FCP UI
- Click on the project you just created on the FCP Projects Page.
- Click the Data Schemas menu item within the left-hand navigation menu.
- Create a new schema by clicking on the Create New Data Schema button in the top right corner.
- Fill in the following fields within the new modal window:
  - Name: Harmonization Schema
  - Description: Data Harmonization Schema
  - Select the Upload From File radio button.
  - Within the newly visible Data Schema CSV File input field, click the Select File button. A filesystem dialog should appear.
  - Navigate to where you placed the client-resources/ folder. Then navigate to tutorials/tutorial_2/schemas/Harmonization Schema.csv and select Open to load the CSV file.
- Finally, click the Create New Data Schema button to create your new Data Schema.
- Within the Data Schemas page, you should now have a new Data Schema object defined, similar to the screenshot shown below.
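If you would like to inspect the schema file locally before or after uploading it, a small standard-library sketch such as the one below prints its header and first few rows. It makes no assumption about which columns the file contains; the function name preview_csv is illustrative and not part of the FCP SDK.

```python
import csv
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return (header, first rows) of a CSV file without reading all of it."""
    # utf-8-sig tolerates a byte-order mark that spreadsheet tools may add.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


if __name__ == "__main__":
    schema_path = Path(
        "client-resources/tutorials/tutorial_2/schemas/Harmonization Schema.csv"
    )
    if schema_path.is_file():
        header, rows = preview_csv(schema_path)
        print("Columns:", header)
        for row in rows:
            print(row)
    else:
        print(f"Schema file not found at {schema_path}")
```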
Step 4: Rhino Health Python SDK
In this step, you will be interacting with the FCP using the programmatic interface, or Python SDK. To get started, you will need the following:
- If you do not have Python installed within your development environment, please download and install Python here: Python
- If you do not have Jupyter Notebook installed within your development environment, please follow the steps outlined here: Jupyter Notebook Installation
- Using your terminal or command prompt, navigate to your client-resources/tutorials/tutorial_2/notebooks folder.
- Run the following command to start the Jupyter Notebook, Tutorial 2 - Data Harmonization.ipynb:
  jupyter notebook "Tutorial 2 - Data Harmonization.ipynb"
- In the FCP GUI, go to the Projects dashboard and copy the UID for your newly created project as shown in the screenshot below. You will need this UID shortly within your Jupyter Notebook.
- Follow the step-by-step tutorial for harmonizing data programmatically using the FCP Python SDK by running each of the cells contained within the notebook.
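To illustrate what a reusable preprocessing script can look like, here is a minimal sketch in plain Python. It does not use the FCP Python SDK and is not the code from the notebook; the column names, aliases, and value mappings are hypothetical stand-ins, since the real ones come from the tutorial CSV files and Harmonization Schema.csv. The point is that one function can be applied unchanged to differently formatted inputs.

```python
import csv
import io

# Hypothetical mapping from site-specific column names to harmonized names.
COLUMN_ALIASES = {
    "sex": "gender",
    "Gender": "gender",
    "Height (cm)": "height_cm",
    "height": "height_cm",
}

# Hypothetical normalization of values for one categorical column.
GENDER_VALUES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def harmonize_row(row):
    """Rename aliased columns and normalize gender values in one record."""
    out = {}
    for key, value in row.items():
        name = COLUMN_ALIASES.get(key.strip(), key.strip())
        out[name] = value.strip() if isinstance(value, str) else value
    if isinstance(out.get("gender"), str):
        out["gender"] = GENDER_VALUES.get(out["gender"].lower(), out["gender"])
    return out


def harmonize_csv(text):
    """Parse CSV text and return a list of harmonized row dictionaries."""
    return [harmonize_row(row) for row in csv.DictReader(io.StringIO(text))]


# The same function is reused for two differently formatted (made-up) sources.
site1 = "sex,Height (cm)\nmale,180\nF,165\n"
site2 = "Gender,height\nm,172\nfemale,158\n"
for name, text in (("site1", site1), ("site2", site2)):
    print(name, harmonize_csv(text))
```

Keeping the site-specific differences in lookup tables rather than in the function body means a new site usually needs only new table entries, not new code.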
Step 5: Interactive Container
Sometimes knowing exactly what data preprocessing/harmonization code to prepare for each site is not trivial, and it would be very helpful to interact directly with the data at each site in order to get a better understanding of what’s going on.
Interactive Containers are an additional type of code supported on the FCP. With this mechanism, your code still runs in a container, but you can open an interactive session with the on-prem client (similar to a remote desktop) in which you have access to pre-installed third-party applications, such as Jupyter Notebook, image analytics software like 3D Slicer, and many more.
In this tutorial, you will build and push an interactive container from a prepared folder. This interactive container will include an installation of Jupyter Notebook, as well as an example notebook you can run on the distributed data in your project. If you are interested in learning more about the Interactive Container Code Objects, please follow one of the links to the Interactive Container Code section of our User Guides.
Creating a New Interactive Container Code Object
- Return to the FCP to create a new Code Object
- Click the Code menu item within the left-hand navigation menu.
- Create a new Code Object by clicking on the Create New Code button in the top right corner.
- Fill in the following fields within the new modal window:
  - Name: jupyter-notebook
  - Description: Demo Jupyter Notebook Interactive Container
  - Type: Interactive Container
  - Input Data Schema: Harmonization Schema (v0)
  - Output Data Schema: Do not modify. The default option [Auto-generate Data Schema from Data] is correct for this tutorial
  - Container: interactive-jupyter-notebook
- Next, click the Create New Code button to create your new Code within your project.
Running your New Interactive Container Code to Use Jupyter Notebook
- Navigate to the Code you created in the last step, and click the Run button in the row corresponding to Version 0 of Jupyter Notebook.
- Fill in the following fields within the new modal window:
  - Input Datasets: Select any Dataset you would like
  - Output Dataset Name Suffix: _output
  - Idle Timeout (seconds): Do not modify. The default value will work fine here
  - Max Duration (seconds): Do not modify. The default value will work fine here
  - Run Parameters (JSON, optional): Leave blank
- Once you have completed entering all the details of your Code Run, click the Run button to send your code to be run on your Rhino Health client.
  - Note: In collaborative projects with remote sites, you will only be able to run Interactive Containers on remote data after the site has granted you permission via the FCP Secure Access mechanism.
Connecting to your Interactive Session of Jupyter Notebook
- Click the Code Runs menu item within the left-hand navigation menu.
- On the Code Runs page, you should now see a new Code Run entitled Jupyter Notebook with a single row of type "IGC" and showing a status of "Running".
- Connect to the active session by clicking the icon.
- A new tab will open with the interactive session. This tab is running your Interactive Container on the remote client:
- To open a Jupyter Notebook, simply click the desktop icon.
- Note: You may see an error message indicating that there is no audio support. You can ignore this message.
- You can now create a new notebook and run it on the remote client. You can access the input Dataset files under the /input folder.
- For a pre-built example, open the notebook that has been included under the Desktop folder.
- Saving your output and importing it into your project on the FCP:
  - Save any files you wish to import back into the FCP under /output:
    - If you wish to import a tabular file, save it as /output/dataset.csv
    - If you wish to import file data, follow these steps:
      - Save your files under /output/file_data/
      - After you have saved all your files, click the desktop icon named Create Output Dataset. This will run a local script that looks for files under /output/file_data/ and populates a new /output/dataset.csv file with the file names so they can be imported as a Dataset.
  - When you terminate the interactive session, the FCP will attempt to import the output Dataset defined in the /output/dataset.csv file. If you did not create this file, the Code Run will show an “Error: failure to import dataset” message.
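As a rough illustration of the tabular case, the sketch below shows one way a notebook inside the interactive session could read CSV files from /input and write a result to /output/dataset.csv. It uses only the Python standard library; the function names are illustrative, and the assumption that the input Dataset files are CSVs somewhere under /input should be checked against what you actually see in the session.

```python
import csv
from pathlib import Path


def read_input_tables(input_dir="/input"):
    """Return {relative path: list of row dicts} for every CSV under input_dir."""
    root = Path(input_dir)
    tables = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            tables[str(path.relative_to(root))] = list(csv.DictReader(handle))
    return tables


def write_output_dataset(rows, output_dir="/output"):
    """Write rows (a non-empty list of dicts) to <output_dir>/dataset.csv."""
    if not rows:
        raise ValueError("rows must contain at least one record")
    out_path = Path(output_dir) / "dataset.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Example use inside the session (left commented so nothing runs on import):
# tables = read_input_tables()
# first_table = next(iter(tables.values()))
# write_output_dataset(first_table)
```

Both directories are parameters so the same functions can be tried on a local folder before running them in the container.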