Introduction
The Federated Datasets (FDs) feature makes data available via the Rhino Federated Computing Platform (FCP) easier to discover, so users can explore and share datasets more readily. Whether you are a data owner or a data scientist, this guide walks you through creating and exploring Federated Datasets.
Creating a New Federated Dataset
If you are responsible for data strategy and want to make your organization’s data assets more accessible, follow these steps to create and publish a new Federated Dataset:
- Create a Project.
  - Log into the Rhino FCP and create the project.
  - Add permissions for each site that contributes to the Federated Dataset.
    - For each site, select Project > Collaborators > Permission Policy.
    - Ensure that Manage This Site's Datasets is added for the Federated Dataset. For example, if a Federated Dataset consists of 3 datasets from 3 sites, the Manage This Site's Datasets permission must be set for each of the 3 sites. See Specifying a Project's Permissions Policy for more information.
  - Import relevant data into the project.
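The per-site permission requirement above can be pictured as a simple check. The sketch below is illustrative only: the `site_permissions` mapping, the `sites_missing_permission` helper, and the placeholder permission "Some Other Permission" are hypothetical and are not part of the Rhino FCP or its SDK. Only the permission name "Manage This Site's Datasets" comes from this guide.

```python
from __future__ import annotations

# Illustrative only: this data structure and helper are hypothetical,
# not an FCP API. Only the permission name is taken from this guide.
REQUIRED_PERMISSION = "Manage This Site's Datasets"


def sites_missing_permission(site_permissions: dict[str, set[str]]) -> list[str]:
    """Return the sites whose permission policy lacks the required permission."""
    return sorted(
        site
        for site, permissions in site_permissions.items()
        if REQUIRED_PERMISSION not in permissions
    )


# A Federated Dataset built from 3 datasets at 3 sites needs the permission at all 3.
# Site names and "Some Other Permission" are made-up placeholders.
example_policy = {
    "Site A": {"Manage This Site's Datasets", "Some Other Permission"},
    "Site B": {"Some Other Permission"},
    "Site C": {"Manage This Site's Datasets"},
}

print(sites_missing_permission(example_policy))  # ['Site B']
```

In this made-up example, Site B would still need the permission added before the Federated Dataset can be created.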
- Prepare your data for publishing.
  - The Rhino FCP enables you to run custom code for data preprocessing (e.g., feature extraction, data cleaning).
  - If needed, read the Models section in the User Guides for more information.
- Create Federated Dataset.
  - Select the dataset you wish to publish and click "Create Federated Dataset."
  - The Create Federated Dataset page opens.
- Define Access and Privacy Settings.
  - Enter a public name for the FD.
  - Set viewing privileges for data analytics. Selecting "Limited" will require users to submit an access request for your approval.
  - Specify privacy settings (k-anonymization, differential privacy).
  - Optionally provide an alternative contact email address. The default is the email address you use to log into the FCP.
  - Define a "Datasheet" for the FD with the following sections:
    - Motivation
    - Composition
    - Geography
    - Data type
    - Modality
    - Disease Group
    - Body Part
    - Collection Process
    - Preprocessing / cleaning / labeling
    - Uses
    - Distribution
    - Maintenance
  Note: This information will help users find your dataset via keyword search.
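To give a feel for the two privacy settings named in this step, here is a minimal conceptual sketch. It does not show how the FCP implements them. The function names and sample values are hypothetical, the first function is a simplified small-group suppression in the spirit of k-anonymity, and the second is the standard Laplace mechanism for a count with sensitivity 1.

```python
import random
from collections import Counter


def k_threshold_counts(values, k):
    """Count records per category, suppressing categories with fewer than k records."""
    counts = Counter(values)
    return {category: n for category, n in counts.items() if n >= k}


def laplace_noisy_count(true_count, epsilon, rng=random):
    """Return the count plus Laplace noise of scale 1/epsilon (sensitivity 1)."""
    # The difference of two independent exponential draws with rate epsilon
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Made-up sample: three CT records and one MR record.
modalities = ["CT", "CT", "CT", "MR"]
print(k_threshold_counts(modalities, k=3))  # {'CT': 3}

# Smaller epsilon means more noise and stronger privacy.
print(laplace_noisy_count(3, epsilon=1.0, rng=random.Random(0)))
```

The trade-off is the same in both cases: stricter settings (larger k, smaller epsilon) protect individuals better but make the published analytics less precise.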
- Publish your Federated Dataset.
  - Click "Publish" to make the FD accessible.
  - Find your published FD under the "Datasets" section in the FCP.
- Managing access.
  - Access requests will be delivered by email and will also be visible on the FCP under the specific dataset.
  - To review requests and grant or deny access, navigate to the Access view inside the relevant dataset.
Exploring an Existing Federated Dataset
As an FCP user interested in exploring datasets for your project, follow these steps to explore an existing Federated Dataset:
- Log into the Rhino FCP.
- Navigate to the Datasets view by clicking the Datasets icon.
- Search for Datasets.
  - Use the "Search by keyword" feature to find relevant datasets.
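As a rough mental model of keyword search over datasheets, the sketch below matches a keyword against the text of each datasheet section. It is an assumption for illustration, not the FCP's actual search logic. The `search_datasheets` helper and the two sample datasets are hypothetical, and only the section names come from this guide.

```python
from __future__ import annotations


def search_datasheets(datasheets: dict[str, dict[str, str]], keyword: str) -> list[str]:
    """Return names of datasets whose datasheet text contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return sorted(
        name
        for name, sections in datasheets.items()
        if any(needle in text.casefold() for text in sections.values())
    )


# Made-up datasheets keyed by dataset name, then by datasheet section.
example_datasheets = {
    "Dataset 1": {"Modality": "CT", "Body Part": "Chest", "Motivation": "Lung nodule research"},
    "Dataset 2": {"Modality": "MRI", "Body Part": "Brain", "Motivation": "Tumor segmentation"},
}

print(search_datasheets(example_datasheets, "chest"))  # ['Dataset 1']
```

This is why a well-filled datasheet matters to publishers: a dataset can only be found by the words its datasheet contains.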
- View Datasheet.
  - Click on a dataset.
  - The Datasheet and Data Schema defined by the publisher open.
- Explore Analytics.
  - The [icon] icon indicates that access to the data analytics must be approved by the publisher.
  - Click on Analytics to fill out and submit an access request. You will need to provide:
    - Name of affiliation (Academic Institution, Company)
    - Type of affiliation (Academic, Commercial)
    - Purpose for querying the data: a brief description of your goals for the publisher's consideration.
  - Once you submit the request, a notification will be sent to the publisher. Upon approval, you will be notified and can view Analytics for the dataset.
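The request-and-approval flow above can be summarized as a small data model. This is a hypothetical sketch for orientation, not an FCP API. The class and field names are assumptions, while the three pieces of information a requester provides and the two affiliation types are taken from this guide.

```python
from dataclasses import dataclass
from enum import Enum


class AffiliationType(Enum):
    ACADEMIC = "Academic"
    COMMERCIAL = "Commercial"


class RequestStatus(Enum):
    PENDING = "Pending"
    APPROVED = "Approved"
    DENIED = "Denied"


@dataclass
class AccessRequest:
    """Hypothetical model of an analytics access request."""

    affiliation_name: str
    affiliation_type: AffiliationType
    purpose: str
    status: RequestStatus = RequestStatus.PENDING

    def can_view_analytics(self) -> bool:
        # Analytics become visible only after the publisher approves.
        return self.status is RequestStatus.APPROVED


def review(request: AccessRequest, approve: bool) -> AccessRequest:
    """Record the publisher's decision on a request."""
    request.status = RequestStatus.APPROVED if approve else RequestStatus.DENIED
    return request


# Made-up request from an academic user.
request = AccessRequest(
    affiliation_name="Example University",
    affiliation_type=AffiliationType.ACADEMIC,
    purpose="Assess cohort size for a feasibility study",
)
print(request.can_view_analytics())  # False
review(request, approve=True)
print(request.can_view_analytics())  # True
```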
- Using a Federated Dataset in your project.
  - Reach out to the data publisher to gain full access to the data, which enables you to run your code on the data via the Rhino FCP. You can also reach out to us at support@rhinohealth.com and we will be happy to assist!
  - Once you and the data publisher have reached an agreement, a project will be created for you with the publisher as a collaborator, and the relevant dataset will be imported into that project.
You are now ready to explore Federated Datasets on the Rhino Health FCP. If you have any further questions or need assistance, feel free to refer to our comprehensive documentation or reach out to our support team.