This article explains how to import and export datasets on the Rhino Health Platform to and from the following cloud and network storage systems:
- Amazon Web Services (AWS) S3
- Google Cloud Platform's (GCP) Cloud Storage (CS)
- Server Message Block (SMB) network file sharing protocol
Prerequisite
Before you can complete these instructions, you will need to mount the bucket or directory that contains the data you want to access. To do this, follow the steps in Mounting Storage to Your Rhino Client.
Import Datasets
Create a new dataset and point to your cloud storage data using the following paths:
For AWS S3:
/rhino_data/external/s3/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For GCP CS:
/rhino_data/external/gcs/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For SMB:
/rhino_data/external/smb/my_cloud_storage_folder/YOUR_DATA_PATH_NETWORK_SHARE
The integration is available at the workgroup level. Each workgroup can set up its own buckets or network shares, and these are not accessible to other workgroups.
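The path templates above can be assembled programmatically. The following is a minimal Python sketch, not part of the Rhino SDK: `MOUNT_ROOTS` and `build_import_path` are hypothetical names introduced here for illustration, and the mount roots are copied from the templates in this article.

```python
from pathlib import PurePosixPath

# Mount roots copied from the path templates in this article.
MOUNT_ROOTS = {
    "s3": "/rhino_data/external/s3",
    "gcs": "/rhino_data/external/gcs",
    "smb": "/rhino_data/external/smb",
}


def build_import_path(storage_type: str, subfolder: str, data_path: str) -> str:
    """Hypothetical helper: join mount root, rhino_data subfolder, and data path."""
    key = storage_type.lower()
    if key not in MOUNT_ROOTS:
        raise ValueError(f"Unsupported storage type: {storage_type!r}")
    # Strip slashes so a leading "/" in an argument does not reset the join.
    return str(
        PurePosixPath(MOUNT_ROOTS[key], subfolder.strip("/"), data_path.strip("/"))
    )


print(build_import_path("gcs", "my_cloud_storage_folder", "YOUR_DATA_PATH_UNDER_BUCKET"))
```

The resulting string is what you would supply as the data location when creating the dataset.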
Example: Importing a file from AWS S3
Suppose you want to import a file located in an S3 bucket under the path `my_bucket/some_folder/some_subfolder/dataset.csv`, and the details supplied to Rhino support are:
- File storage path: `my_bucket`
- File storage type: `S3`
- `rhino_data` subfolder: `my_cloud_storage_folder`
- Credentials: `{"aws_access_key_id": <key>, "aws_secret_access_key": <key>}`
When importing this file as a dataset on the Rhino Health Platform, the path for this file would be:
/rhino_data/external/s3/my_cloud_storage_folder/some_folder/some_subfolder/dataset.csv
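Note that the bucket name (`my_bucket`) does not appear in the mounted path; it is replaced by the `rhino_data` subfolder. As an illustration only, the translation in this example can be sketched in Python as follows. The function name `mounted_path_for_s3_object` is hypothetical and not part of any Rhino tooling; the values are the ones from the example above.

```python
from pathlib import PurePosixPath


def mounted_path_for_s3_object(
    object_path: str, file_storage_path: str, subfolder: str
) -> str:
    """Hypothetical helper: map a bucket-qualified path to its mounted path."""
    # relative_to() drops the bucket prefix and raises ValueError
    # if object_path is not located under file_storage_path.
    relative = PurePosixPath(object_path).relative_to(file_storage_path)
    return str(PurePosixPath("/rhino_data/external/s3", subfolder, relative))


print(
    mounted_path_for_s3_object(
        "my_bucket/some_folder/some_subfolder/dataset.csv",
        file_storage_path="my_bucket",
        subfolder="my_cloud_storage_folder",
    )
)
```

Passing a path from a different bucket raises a `ValueError`, which catches the common mistake of pointing at data outside the configured integration.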
Export Datasets
To export an existing dataset, follow the steps described in Exporting a Dataset. The Rhino integration with your cloud or network storage must be configured with `Is read only` = `False` so that your Rhino Client can save the exported files there. (If you are not sure whether `Is read only` = `False` in your configuration, please contact the Rhino support team.)
Datasets will be exported to the file storage path set in the integration. For the AWS import example above, datasets would be exported to your AWS S3 bucket named `my_bucket`.