This article explains how to import and export datasets on the Rhino Health Platform to and from the following cloud storage platforms:
- Amazon Web Services (AWS) S3
- Google Cloud Platform's (GCP) Cloud Storage (CS)
- Server Message Block (SMB) network file sharing protocol
Request Storage Mounting to Your Rhino Client
Contact Rhino's support team to set up your S3 bucket, GCP CS storage, or SMB shared folder integration, providing the following details:
- File storage type:
  - `s3` for AWS S3 bucket
  - `gcs` for Google Cloud Storage
  - `smb` for SMB shared folder
- Your file storage path: The path in your cloud storage to make accessible to the Rhino Client.
- `rhino_data` subfolder: A subfolder to access your cloud data within the Rhino Client, such as `my_cloud_storage_folder`. The provided bucket will be mapped to the following path in `rhino_data`:
  /rhino_data/external/`file storage type`/my_cloud_storage_folder
- Is read only: If `True`, the Rhino Client can only import data. If `False`, the Rhino Client can also export datasets to your cloud storage.
- Credentials: The certificate/credentials required to access the relevant data in your cloud storage. See how to provide such credentials in a secure manner with AWS and GCP.
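As a concrete illustration, the details sent to Rhino support for an S3 integration could be summarized as follows. This is only a sketch of the information to gather; the field names and values are our own placeholders, not a Rhino API payload:

```python
# Hypothetical summary of an S3 integration request to Rhino support.
# All names and keys below are illustrative placeholders.
integration_request = {
    "file_storage_type": "s3",          # one of "s3", "gcs", "smb"
    "file_storage_path": "my_bucket",   # bucket/share to expose to the Rhino Client
    "rhino_data_subfolder": "my_cloud_storage_folder",
    "is_read_only": True,               # set False to also allow dataset export
    "credentials": {"aws_access_key_id": "<key>", "aws_secret_access_key": "<key>"},
}
```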
Import Datasets
Create a new dataset and point to your cloud storage data using the following paths:
For AWS S3:
/rhino_data/external/s3/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For GCP CS:
/rhino_data/external/gcs/my_cloud_storage_folder/YOUR_DATA_PATH_UNDER_BUCKET
For SMB:
/rhino_data/external/smb/my_cloud_storage_folder/YOUR_DATA_PATH_NETWORK_SHARE
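If you build these paths programmatically, a small helper like the sketch below can reduce typos. The function and constant are our own illustration of the layout described above, not part of any Rhino SDK:

```python
import posixpath

# Root under which mounted cloud storage appears on the Rhino Client.
EXTERNAL_ROOT = "/rhino_data/external"

def rhino_import_path(storage_type: str, subfolder: str, data_path: str) -> str:
    """Build the Rhino Client path for data under a mounted cloud storage."""
    if storage_type not in {"s3", "gcs", "smb"}:
        raise ValueError(f"Unknown storage type: {storage_type}")
    return posixpath.join(EXTERNAL_ROOT, storage_type, subfolder, data_path)

# -> "/rhino_data/external/s3/my_cloud_storage_folder/some_folder/dataset.csv"
print(rhino_import_path("s3", "my_cloud_storage_folder", "some_folder/dataset.csv"))
```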
Note: The integration is available at the workgroup level. Each workgroup can set up its own buckets or network shares; these are not accessible to other workgroups.
Example: Importing a file from AWS S3
Suppose you want to import a file located in an S3 bucket under the path `my_bucket/some_folder/some_subfolder/dataset.csv`, and the details supplied to Rhino support are:
- File storage path: `my_bucket`
- File storage type: `s3`
- `rhino_data` subfolder: `my_cloud_storage_folder`
- Credentials: `{"aws_access_key_id": <key>, "aws_secret_access_key": <key>}`
When importing this file as a dataset on FCP, the path for this file would be:
/rhino_data/external/s3/my_cloud_storage_folder/some_folder/some_subfolder/dataset.csv
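Before importing, you may want to confirm that the supplied credentials can actually reach the object in S3. A minimal check with boto3, assuming the same placeholder bucket, key, and credentials as in the example above, might look like:

```python
import boto3
from botocore.exceptions import ClientError

# Sanity check: verify the object is reachable with the same credentials
# supplied to Rhino support, before importing it as a dataset.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<key>",        # placeholder, as in the example above
    aws_secret_access_key="<key>",    # placeholder
)

try:
    head = s3.head_object(Bucket="my_bucket", Key="some_folder/some_subfolder/dataset.csv")
    print(f"Object found, {head['ContentLength']} bytes")
except ClientError as err:
    print(f"Cannot access object: {err}")
```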
Export Datasets
To export an existing dataset, follow the steps described in Exporting a Dataset. The Rhino integration with your network storage must be configured with `Is read only` = `False` to allow your Rhino Client to save the exported files to your network storage. (If you are not sure whether `Is read only` = `False` in your configuration, please contact the Rhino support team.)
Datasets are exported to the file storage path set in the integration. For the AWS import example above, datasets would be exported to your AWS S3 bucket named `my_bucket`.
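After an export completes, you can confirm the files landed in the bucket, for example by listing its contents with boto3 (again using the placeholder bucket and credentials from the import example):

```python
import boto3

# Post-export check: list the objects now present in the bucket configured
# as the integration's file storage path.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<key>",        # placeholder
    aws_secret_access_key="<key>",    # placeholder
)

response = s3.list_objects_v2(Bucket="my_bucket")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```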