Introduction
The Federated Datasets (FDs) feature enhances the discoverability of data available via the FCP, enabling users to explore and share datasets in a more accessible manner. Whether you are a data owner or a data scientist, this guide will help you navigate the process of creating and exploring Federated Datasets.
Creating a New Federated Dataset
If you are responsible for data strategy and want to make your organization’s data assets more accessible, follow these steps to create and publish a new Federated Dataset:
- Create a Project:
  - Log in to the Rhino FCP and navigate to your project.
  - Import the relevant data into the project.
- Prepare your data for publishing:
  - The Rhino FCP lets you run custom code to preprocess data (for example, feature extraction or data cleaning). See the Models section in the User Guides for more information.
- Create Federated Dataset:
  - Select the dataset you wish to publish and click "Create Federated Dataset."
- Define Access and Privacy Settings:
  - Enter a public name for the FD.
  - Set viewing privileges for data analytics. Selecting "Limited" will require users to submit an access request for your approval.
  - Specify privacy settings (k-anonymization, differential privacy).
  - Optionally provide an alternative contact email address. The default is the email address you use to log in to the FCP.
  - Define a "Datasheet" for the FD with the following sections:
    - Motivation
    - Composition
    - Geography
    - Data type
    - Modality
    - Disease Group
    - Body Part
    - Collection Process
    - Preprocessing / cleaning / labeling
    - Uses
    - Distribution
    - Maintenance
  Note: This information will help users find your dataset via keyword search.
- Publish your Federated Dataset:
  - Click "Publish" to make the FD accessible.
  - Find your published FD under the "Datasets" section in the FCP.
- Managing access:
  - Access requests are delivered by email and are also visible in the FCP under the specific dataset.
  - To review requests and grant or deny access, navigate to the Access view inside the relevant dataset.
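The privacy settings named in the Define Access and Privacy Settings step (k-anonymization and differential privacy) can be easier to choose with a picture of what they do to analytics results. The following is a minimal sketch of the two general techniques, not the FCP's implementation: the function names, the sample column, and the values of `k` and `epsilon` are all assumptions made for illustration.

```python
import random
from collections import Counter


def k_anonymous_counts(values, k):
    """Count occurrences of each value, suppressing groups smaller than k."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= k}


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count, epsilon, rng):
    """Add Laplace noise to a count; a counting query has sensitivity 1."""
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical sample column, not real data.
body_parts = ["chest"] * 12 + ["brain"] * 7 + ["knee"] * 2

# With k=5, the "knee" group (2 records) is too small to report.
released = k_anonymous_counts(body_parts, k=5)
print(released)

# Differential privacy perturbs each released count with random noise.
rng = random.Random(0)
noisy = {part: dp_count(n, epsilon=1.0, rng=rng) for part, n in released.items()}
print(noisy)
```

In this sketch, a larger `k` hides more small groups and a smaller `epsilon` adds more noise, so both settings trade analytic detail for privacy.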
Congratulations! You are now ready to create Federated Datasets on the Rhino Health FCP. If you have any further questions or need assistance, feel free to refer to our comprehensive documentation or reach out to our support team. Happy collaborating!
Exploring an Existing Federated Dataset
As an FCP user interested in exploring datasets for your project, follow these steps to explore an existing Federated Dataset:
- Log in to the Rhino FCP.
- Navigate to the Datasets view by clicking the Datasets icon.
- Search for Datasets:
  - Use the "Search by keyword" feature to find relevant datasets.
- View Datasheet:
  - Click on a dataset to view the Datasheet and Data Schema defined by the publisher.
- Explore Analytics:
  - An icon indicates when access to the data analytics must be approved by the publisher.
  - Click on Analytics to fill out and submit an access request. You will need to provide:
    - Name of affiliation (Academic Institution, Company)
    - Type of affiliation (Academic, Commercial)
    - Purpose for querying the data: a brief description of your goals for the publisher's consideration.
  - Once you submit the request, a notification is sent to the publisher. Upon approval, you will be notified and can view Analytics for the dataset.
- Using a Federated Dataset in your project:
  - To gain full access to the data and run your code on it via the Rhino FCP, reach out to the data publisher. You can also contact us at support@rhinohealth.com and we will be happy to assist.
  - Once you and the data publisher have reached an agreement, a project will be created for you with the publisher as a collaborator, and the relevant dataset will be imported into that project.
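The Datasheet sections a publisher fills in are what the "Search by keyword" step has to work with. As a rough illustration of why descriptive Datasheet text makes a dataset easier to find, here is a minimal sketch of a case-insensitive keyword match over Datasheet fields. It is not how the FCP search is implemented: the `Datasheet` class, the `search_by_keyword` function, and the catalog entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datasheet:
    """Hypothetical stand-in for a published FD's name and Datasheet sections."""
    name: str
    sections: dict = field(default_factory=dict)


def search_by_keyword(datasheets, keyword):
    """Return names of datasets whose name or any section text contains the keyword."""
    needle = keyword.casefold()
    return [
        sheet.name
        for sheet in datasheets
        if needle in sheet.name.casefold()
        or any(needle in text.casefold() for text in sheet.sections.values())
    ]


# Hypothetical catalog entries, not real datasets.
catalog = [
    Datasheet(
        "Chest X-ray cohort",
        {"Modality": "X-ray", "Body Part": "Chest", "Uses": "Pneumonia detection research"},
    ),
    Datasheet(
        "Brain MRI cohort",
        {"Modality": "MRI", "Body Part": "Brain", "Uses": "Tumor segmentation research"},
    ),
]

matches = search_by_keyword(catalog, "mri")
print(matches)  # ['Brain MRI cohort']
```

A dataset whose Datasheet leaves sections such as Modality or Body Part empty would only match on its name in this sketch, which is the practical reason to fill the sections in.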