What is a Dataset? – Rhino Federated Computing

In the Rhino FCP, a dataset is an immutable representation of a collection of data points associated with a specific project. Each dataset is owned by a Workgroup. The underlying data is stored on that Workgroup's Rhino Client, behind the institutional firewall; only aggregate information, such as summary statistics, is stored in the Rhino Orchestrator.

Each dataset has a unique name and version combination, as well as a unique identifier (UID).
A dataset may include tabular data, file data, or a combination of both. The FCP supports files of any format within a dataset; however, it provides additional built-in functionality for DICOM files.
Each dataset is associated with a Data Schema, which describes how the data within the dataset is organized (e.g., features, data types, units, and schema-level permissions).
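The uniqueness rules above (a unique name and version combination, plus a UID per dataset version) can be illustrated with a minimal in-memory sketch. This is not the Rhino SDK API: `DatasetKey`, `DatasetRegistry`, `register`, and `uid_of` are hypothetical names, and the UID is generated with Python's `uuid4` purely for illustration, since this page does not describe the FCP's actual UID format.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetKey:
    """Hypothetical key: a dataset version is identified by name plus version."""
    name: str
    version: int


class DatasetRegistry:
    """Hypothetical in-memory registry; NOT part of the Rhino SDK."""

    def __init__(self) -> None:
        self._uids: dict[DatasetKey, str] = {}

    def register(self, name: str, version: int) -> str:
        key = DatasetKey(name, version)
        # Reject a second dataset with the same name and version combination.
        if key in self._uids:
            raise ValueError(f"dataset {name!r} version {version} already exists")
        # Assumption: a random UUID stands in for the FCP-assigned UID.
        uid = str(uuid.uuid4())
        self._uids[key] = uid
        return uid

    def uid_of(self, name: str, version: int) -> str:
        # Look up the UID of one specific dataset version.
        return self._uids[DatasetKey(name, version)]


registry = DatasetRegistry()
uid_v1 = registry.register("chest-ct", 1)
uid_v2 = registry.register("chest-ct", 2)  # same name, new version: allowed
# registry.register("chest-ct", 1) would raise ValueError (duplicate combination)
```

Keying on the (name, version) pair rather than on the name alone is what allows several versions of the same dataset to coexist, each with its own UID.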

In the Rhino FCP, you can import a dataset or create a new version of an existing dataset, view the dataset's UID, perform schema-based de-identification of sensitive data, and view analytics on the data.

You can view datasets in tabular form, and you can view and annotate DICOM image data in the built-in OHIF viewer. You can also export datasets, publish and unpublish them, or remove them when they are no longer needed, as well as view or add to secure access lists.

Many of these tasks can also be performed with the Rhino SDK. For more information, see Importing a New Dataset or Dataset Version Using Rhino SDK and SQL.

Key Components of a Dataset

Datasets have the following components.

Name: The name of the defined dataset. All datasets within the platform must have a unique name and version combination.
Description (Optional): The description of the dataset.
Version(s): The version of the dataset. After creating the initial dataset, you can edit the existing dataset configuration and create a new version instead of creating an entirely new dataset.
Data Schema: The data schema that was used when importing the particular version of the dataset.
Date Created: The date the dataset was created.
Number of Rows: The number of rows that were imported from the tabular, DICOM, and/or file data.
Source: The creator of the dataset version.
UID: The unique identifier for a specific dataset version within the Rhino FCP.
Data: The underlying tabular data that defines the cohort. This can be a simple list of DICOM files or filenames, or a CSV-style data table containing multiple variables.
Secure Access List: A list that defines the access you have been granted to a collaborator's remote dataset, or the access to your dataset that you have granted to your collaborators. For more information about collaborators or Secure Access, refer to What is a Collaborator? or What is Secure Access?, respectively.
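The metadata components listed above, together with the rule that datasets are immutable and are changed by creating a new version, can be modeled as a frozen record. This is a hypothetical sketch, not the Rhino SDK's data model: the class `DatasetVersion`, the helper `new_version`, and all field names are illustrative assumptions, the Data Schema is represented only by its name, and the Data and Secure Access List components are omitted.

```python
import uuid
from dataclasses import dataclass, field, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical record mirroring the dataset components (not the SDK's).

    frozen=True makes instances immutable, echoing the description of a
    dataset as an immutable representation of its data.
    """
    name: str
    version: int
    data_schema: str      # assumption: the Data Schema is referenced by name
    source: str           # creator of this dataset version
    num_rows: int         # number of rows imported
    description: Optional[str] = None
    date_created: date = field(default_factory=date.today)
    # Assumption: a random UUID stands in for the FCP-assigned UID.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


def new_version(current: DatasetVersion, **changes) -> DatasetVersion:
    """Hypothetical helper: derive the next version instead of mutating."""
    updates = {
        **changes,
        # These fields always come from the helper, never from the caller.
        "version": current.version + 1,
        "date_created": date.today(),
        "uid": str(uuid.uuid4()),
    }
    return replace(current, **updates)


v1 = DatasetVersion(name="chest-ct", version=1, data_schema="ct-schema",
                    source="alice", num_rows=1200)
v2 = new_version(v1, num_rows=1350)  # v1 is left unchanged
```

Because the record is frozen, an assignment such as `v1.num_rows = 5` raises an error, so the only way to "edit" a dataset in this sketch is to derive a new version with its own UID.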
