Adding Data to your Rhino Federated Computing Platform Client

Getting data into your project on the FCP generally requires two steps:

  1. Transferring data into storage that your Rhino Client can access. This step is most commonly done via SFTP, as described below.
  2. Importing your data as a Dataset in your project. Once the data is accessible to the Rhino Client, you can follow the steps in the Creating a new Dataset section to import the data into your project. One exception to this flow is when the data is extracted from a SQL database; in this case, extracting the data and creating the Dataset is done as a single step (described below)

 

Adding Data to your FCP Client via SFTP

The process of moving data from your local environment to your Rhino Health client using SFTP depends on the operating system of your local machine:

 

Connecting to your Rhino Health Client via SFTP from MacOS, Linux & Windows 10+

  1. Open a terminal or command prompt on your respective operating system, and navigate to the folder where the data that you would like to upload are located.
  2. Connect to your Rhino Health Client via SFTP
    • Retrieve your SFTP server IP and credentials from your Rhino Health Federated Computing Platform profile page.  If you need a quick refresher on how to do that, you can follow this support article to help you: How can I find my SFTP Server Name? 
    • Connect to your Rhino Health client via SFTP, by entering the following. Note: Ensure to replace RHINO_CLIENT_IP_ADDRESS in the below command with the credentials found in your profile.
> sftp rhinosftp@RHINO_CLIENT_IP_ADDRESS
  1. Next, you will be prompted to enter your SFTP password, and if everything is entered properly you will be connected to your Rhino Health client and your command line should change to look like the following:
sftp>   

For additional help with common commands within the SFTP terminal interface, check out the section Common Commands when SFTP'ing Data.

 

Connecting to your Rhino Health Client via SFTP from Other Operating Systems

  1. To SFTP data from other Operating Systems to your Rhino Health client, you will need to first install an SFTP client, like Cyberduck, WinSCP, etc.
  2. Once your SFTP client has been installed, you can configure a new SFTP connection.
    1. Retrieve your SFTP server IP and credentials from your Rhino Health Federated Computing Platform profile page.  If you need a quick refresher on how to do that, you can follow this support article to help you: How can I find my SFTP Server Name? 
    2. If you are successfully connected to your SFTP client, you should see two separate panels, one displaying your local machines file system and a second showing your Rhino Health client's filesystem.
  3. Using the STFP client to upload your data:
    • On the local machine file system panel, navigate to the folder where your data is located.
    • On the Rhino Health client file system panel, navigate to where you would like to place the data. 
      • Note: It is recommended that you create folders for your datasets to keep things organized.
    • Drag the files from the local machine file system panel to the Rhino Health client file system panel to upload your data to the Rhino Health client server.

 

Common Commands when SFTP'ing Data

Use the following commands to create folders, navigate to different folders, view folder contents, and upload data:

 

Create a new folder called DIRECTORY_NAME within your Rhino Health client

sftp> mkdir DIRECTORY_NAME

 

Change to the specified directory, DIRECTORY_NAME within your Rhino Health client

sftp> cd DIRECTORY_NAME

 

List all files within the current directory on your Rhino Health client

sftp> ls


Upload the specified file FILE_NAME on your local machine to the current directory of your Rhino Health client

sftp> put FILE_NAME


Upload the folder and all of its contents on your local machine to the current directory of your Rhino Health client

sftp> put -r DIRECTORY_NAME


Exit out of the SFTP connection

sftp> exit

 

 

Adding Data to your FCP Client via the Rhino SDK and SQL

*Note: Importing a cohort from SQL is currently only supported via the Rhino SDK.

Prerequisites

  1. Have an SQL DB that is open to connections from a Rhino Client, with access credentials for read-only access to this DB.
  2. Have a project where you are either part of the project's lead workgroup and the DB is within your site, or where there is a collaborator in the project that has the DB at their site.
  3. Ensure the required site-level permissions for SQL querying (Import and export Datasets, View Dataset analytics) are enabled for the site that has the DB.
  4. Ensure that you have the Rhino SDK installed on your computer (pip install rhino_health).
     

To import a Dataset directly from a SQL DB, follow the following steps:

Import your Python Dependencies

import rhino_health as rh
from rhino_health.lib.endpoints.sql_query.sql_query_dataclass import (
    SQLQueryImportInput,
    SQLQueryInput,
    SQLServerTypes,
    ConnectionDetails,
)
import getpass

 

Remember to change all lines with CHANGE_ME comments above them in all the blocks below!

 

Log into the Rhino SDK using your FCP Credentials

Your username will be the email address you log into the Rhino FCP platform with.  

print("Logging In")

# CHANGE_ME: MY_USERNAME
my_username = "MY_USERNAME"

session = rh.login(username=my_username, password=getpass.getpass())
print("Logged In")

 

Get Supporting FCP Information Needed to Import Your Dataset

At this point, you will need the name of your project to retrieve your Project's UUID and subsequently your Workgroup UUID.  You can also retrieve each object's UUID by following the instructions here: How do I retrieve a Project's, Collaborator's, Data Schema's, Dataset's, Code Object's, or Code Run's UID?

# CHANGE_ME: YOUR_FCP_PROJECT_NAME
project = session.project.get_project_by_name('YOUR_FCP_PROJECT_NAME')

workgroup = session.project.get_collaborating_workgroups(project.uid)[0]

 

Connection Setup

When specifying the connection details, ensure that you provide the server_type using the approved SQLServerTypes enum. This step ensures that your server is supported and compatible with the querying process.  For a complete list of supported & compatible servers, refer to the SDK documentation here: SQL Server Types.  Replace the following variables below:

  • sql_db_user - Your SQL database username. Make sure that the user has read-only permissions
  • sql_db_password - Your SQL database password. For better security, consider using an environment variable  (e.g. os.getenv("DB_PASSWORD")), or using getpass.getpass() to type in the password
  • external_server_url - Your SQL database url & port (i.e. "{url}:{port}")
  • db_name - Your SQL database name
# CHANGE_ME: SQL_DATABASE_USERNAME
sql_db_user = "SQL_DATABASE_USERNAME"

# CHANGE_ME: SQL_DATABASE_PASSWORD
sql_db_password = "SQL_DATABASE_PASSWORD"

# CHANGE_ME: EXTERNAL_SERVER_URL
external_server_url = "EXTERNAL_SERVER_URL"

# CHANGE_ME: DB_NAME
db_name = "DB_NAME" # Replace this with your DB name.
connection_details = ConnectionDetails(
    server_user=sql_db_user,
    password=sql_db_password,
    
    # CHANGE_ME: Replace POSTGRESQL with one of the supported SQL Server Types outlined in the documentation above
    server_type=SQLServerTypes.POSTGRESQL,
    server_url=external_server_url,
    db_name=db_name
)

 

[Optional] Running Exploratory Queries

You can run SQL queries on the remote DB and receive aggregate statistics on the results of the query.

This involves two inputs:

  1. Define the query you want to run (note that the RHP does not limit the SQL code that is run - always connect with a DB user that has read-only permissions)
  2. Define the metrics you would like to calculate on the query results (See the SDK documentation for more information about what Metrics you can calculate: rhino_health.lib.metrics)
# CHANGE_ME: QUERY_STRING, Replace with query you want to run, e.g. "SELECT * FROM <your_table> WHERE <condition>"
starting_query = "QUERY_STRING"

# CHANGE_ME: ["METRICS"], # Define a list of metrics (e.g. [Mean(variable="Height")], etc.) outlined in the documentation above
metric_definitions = ["METRICS"] 

 

Define the Exploratory Query Run Parameters

When defining your SQLQueryInput, your project & workgroup will be used to validate permissions that were set at the project and site level (i.e. k-anonymization value).  For more information on permissions, please refer to Permissions.

query_run_params = SQLQueryInput(
    session=session,
    project=project.uid,
    workgroup=workgroup.uid,
    connection_details=connection_details,
    sql_query=starting_query,
    timeout_seconds=600,
    metric_definitions=metric_definitions
)

 

Run the Exploratory Query

Run the query on your SQL database and get the metric results.

response = session.sql_query.run_sql_query(query_run_params)
print(response.results)

 

Import Query Results as a Dataset in the Rhino FCP

You can run SQL queries on the remote database and then have the results of the query stored as a Dataset on the Rhino FCP allowing further processing, analysis, etc.

This involves two inputs:

  1. The query you want to run (note that the RHP does not limit the SQL code that is run - always connect with a DB user that has read-only permissions)
  2. Data needed for the Dataset creation (e.g. Dataset name)
# CHANGE_ME: QUERY_STRING, Replace with query you want to run to generate the data for the Dataset, e.g. "SELECT * FROM <your_table> WHERE <condition>"
query = "QUERY_STRING"

# CHANGE_ME: COHORT_NAME
cohort_name = "COHORT_NAME"

# CHANGE_ ME: False, Can be either True or False depending on whether the data being queried is de-identifed
is_data_deidentified = False # 

# CHANGE_ME: SCHEMA_NAME & Possibly Version Number too. To auto-generate your schema, set this variable to None
dataschema_uid = session.project.get_data_schema_by_name('SCHEMA_NAME', project_uid=project.uid, version=1).uid

 

Define the Dataset Import Parameters

Same as when defining your SQLQueryInput, when creating your SQLQueryImportInput, your project & workgroup will be used to validate permissions that were set at the project and site level (i.e. k-anonymization value).  For more information on permissions, please refer to Permissions.

import_run_params = SQLQueryImportInput(
    session=session,
    project=project.uid,
    workgroup=workgroup.uid,
    connection_details=connection_details,
    dataset_name=dataset_name,
    data_schema_uid=dataschema_uid
    timeout_seconds=600,
    is_data_deidentified=is_data_deidentified,
    sql_query=query
)

 

Trigger the Query to Import your Dataset

Run the query on your SQL database and the results will be imported into the Rhino FCP as a Dataset.

response = session.sql_query.import_dataset_from_sql_query(import_run_params)
print(response.results)

 

Was this article helpful?
0 out of 0 found this helpful

Articles in this section