A Data Schema is a fundamental aspect of the Rhino Federated Computing Platform (FCP) that defines the structure, attributes, and organization of data within a dataset. It serves as a blueprint, outlining the characteristics of individual data fields and their roles, ensuring a consistent and standardized format for interpreting and analyzing data.
Implicit vs. Explicit Data Schema Specification
Every dataset in the Rhino FCP inherently possesses a Data Schema, even if not explicitly defined by the user. The platform is equipped to automatically infer basic data types, like strings, floats, and files, from the data content. This means that users can start working with their data without the immediate need to specify an explicit Data Schema.
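To make the idea of inference concrete, here is a minimal sketch of how basic types could be guessed from raw column values. It is not the platform's actual inference logic: the function names, the `FILE_EXTENSIONS` tuple, and the rule that a value counts as a file when it ends in a known extension are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical set of extensions treated as file references (an assumption).
FILE_EXTENSIONS = (".dcm", ".png", ".jpg")


def _is_float(text: str) -> bool:
    """Return True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_basic_type(values: Iterable[str]) -> str:
    """Guess 'float', 'file', or 'string' for one column of raw values."""
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "string"  # nothing to go on, so fall back to the most general type
    if all(_is_float(v) for v in non_empty):
        return "float"
    if all(v.lower().endswith(FILE_EXTENSIONS) for v in non_empty):
        return "file"
    return "string"


def infer_schema(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Infer a type for every column, taking column names from the first row."""
    if not rows:
        return {}
    return {
        column: infer_basic_type(row.get(column, "") for row in rows)
        for column in rows[0]
    }


if __name__ == "__main__":
    sample = [
        {"age": "34", "image": "a.png", "site": "north"},
        {"age": "51.5", "image": "b.png", "site": "south"},
    ]
    print(infer_schema(sample))
```

An inferred schema like this is enough to start exploring the data; it carries no validation rules or field descriptions, which is where an explicit Data Schema comes in.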
Advantages of Explicit Data Schema Definition
While an explicit Data Schema is not mandatory, there are cases where defining a Data Schema offers distinct benefits:
- Validation: Explicit Data Schemas enable users to enforce specific validation rules on their data. This ensures that the data adheres to predefined standards, minimizing errors and inconsistencies.
- Interpretability: Clearly defined Data Schemas enhance the interpretability of the data. Knowing the intended role and meaning of each field improves collaboration and understanding among researchers.
- Customization: Explicit Data Schemas empower users to tailor data structures to their specific research needs, accommodating unique requirements and analysis objectives.
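The validation benefit can be illustrated with a short sketch. The rule format below (`type`, `required`, `min`, `max`, `allowed`) is a hypothetical one invented for this example and is not the Rhino FCP schema format; it only shows the kind of check an explicit schema makes possible and an inferred one does not.

```python
from typing import Any, Dict, List


def validate_rows(rows: List[Dict[str, str]], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Check each row against per-field rules and return a list of error messages."""
    errors: List[str] = []
    for index, row in enumerate(rows):
        for field_name, rule in schema.items():
            value = row.get(field_name)
            if value is None or value == "":
                # Empty values are only an error for required fields.
                if rule.get("required", False):
                    errors.append(f"row {index}: missing required field '{field_name}'")
                continue
            if rule.get("type") == "float":
                try:
                    number = float(value)
                except ValueError:
                    errors.append(f"row {index}: '{field_name}' is not a number")
                    continue
                if "min" in rule and number < rule["min"]:
                    errors.append(f"row {index}: '{field_name}' is below {rule['min']}")
                if "max" in rule and number > rule["max"]:
                    errors.append(f"row {index}: '{field_name}' is above {rule['max']}")
            if "allowed" in rule and value not in rule["allowed"]:
                errors.append(f"row {index}: '{field_name}' has unexpected value '{value}'")
    return errors


if __name__ == "__main__":
    # Hypothetical explicit schema: age must be present and within a plausible range.
    schema = {
        "age": {"type": "float", "required": True, "min": 0, "max": 120},
        "sex": {"type": "string", "allowed": ["F", "M"]},
    }
    rows = [
        {"age": "34", "sex": "F"},
        {"age": "-2", "sex": "X"},
        {"sex": "M"},
    ]
    for message in validate_rows(rows, schema):
        print(message)
```

With inference alone, the second row's `age` would simply be accepted as a float; the range and allowed-value rules exist only because the schema states them explicitly.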
Summary
In the Rhino FCP, Data Schemas keep data organized, consistent, and accurate. Because the platform can infer basic data types, users can begin working with their data right away. Explicitly defining a Data Schema adds greater control, validation, and customization when a project needs it.
Key Components of a Data Schema
Attributes
- Name: The name of the defined Data Schema. All Data Schemas within the platform must have a unique name and version combination
- Description (Optional): The description of the Data Schema
- Version(s): The version of the Data Schema. After creating the initial Data Schema, you can edit it to create a new version instead of creating an entirely new Data Schema.
- Date Created: The date the Data Schema version was created
- Number of Fields: The number of fields or features that are defined within the Data Schema version. Note that this number is always one greater than the number of fields you defined when creating the Data Schema. The extra field is the __notes__ field, which is created by default in every Data Schema and can be used however users like
- UUID: The unique identifier for a specific Data Schema version within the Rhino FCP
- Field(s): Contained within a Data Schema, a field is a column of data that is being defined in your dataset. Fields are often referred to as features within the data science community. Several components make up a field within a Data Schema. For more information, please refer to Data Schema Field Attributes. If you are curious about the data types that a field can take on within the FCP, please refer to Supported Data Schema Data Types
- Number of Datasets: The number of datasets that are currently using this Data Schema version to define a dataset's structure
- Creator: The creator of the Data Schema version
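As a rough mental model, the attributes above can be sketched as a small record type. This is an illustration only, not the platform's data model or SDK: the class names, the integer `version`, and the in-memory `SchemaRegistry` are assumptions. The sketch captures two rules from the list: the name and version combination must be unique, and the field count includes the default `__notes__` field.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple
from uuid import uuid4

NOTES_FIELD = "__notes__"  # default field present in every Data Schema


@dataclass
class DataSchemaVersion:
    """Hypothetical record mirroring the attributes of one Data Schema version."""
    name: str
    version: int  # assumption: versions are modeled as integers here
    field_names: List[str]
    description: str = ""
    creator: str = ""
    date_created: date = field(default_factory=date.today)
    schema_uuid: str = field(default_factory=lambda: str(uuid4()))

    def __post_init__(self) -> None:
        # Append the default notes field if the caller did not include it.
        if NOTES_FIELD not in self.field_names:
            self.field_names = [*self.field_names, NOTES_FIELD]

    @property
    def number_of_fields(self) -> int:
        """User-defined fields plus the default notes field."""
        return len(self.field_names)


class SchemaRegistry:
    """Hypothetical store that enforces unique (name, version) combinations."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int], DataSchemaVersion] = {}

    def add(self, schema: DataSchemaVersion) -> None:
        key = (schema.name, schema.version)
        if key in self._by_key:
            raise ValueError(f"Data Schema {schema.name!r} version {schema.version} already exists")
        self._by_key[key] = schema


if __name__ == "__main__":
    registry = SchemaRegistry()
    first = DataSchemaVersion("patients", 1, ["age", "sex"])
    registry.add(first)
    registry.add(DataSchemaVersion("patients", 2, ["age", "sex", "site"]))
    print(first.number_of_fields)  # two defined fields plus __notes__
```

Editing a schema in this model means adding a new `DataSchemaVersion` with the same name and a different version, which matches the way new versions are described above.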
Actions
- Create a New Data Schema or Data Schema Version: Create a new Data Schema or new version of an existing Data Schema
- Auto-Generated Data Schemas: Skip the process of defining a Data Schema and let the system auto-generate a Data Schema for you
- Viewing a Data Schema: View the details of an already-defined Data Schema
- Exporting a Data Schema: Export an existing Data Schema
- Deleting a Data Schema or Data Schema Version: Delete a single version or a whole Data Schema
Interfaces
The screenshots below show how you can interact with Data Schemas within the Rhino FCP.
Main Data Schema Page
The main interface for initiating the creation of a Data Schema.
Creating a New Data Schema
For more information about creating a new Data Schema, please refer to Creating a New Data Schema or Data Schema Version.
Creating a New Blank Data Schema
For more information about creating a new blank Data Schema, please refer to Creating a New Data Schema or Data Schema Version.
Creating a Data Schema by Uploading a File
For more information about creating a Data Schema by uploading a file, please refer to Creating a New Data Schema or Data Schema Version.
Creating a New Data Schema Version
For more information about creating a new Data Schema version, please refer to Creating a New Data Schema or Data Schema Version.
Viewing an Existing Data Schema
Editing an Existing Data Schema
For more information about editing an existing Data Schema, please refer to Creating a New Data Schema or Data Schema Version.