The Harmonization CoPilot simplifies the transformation of electronic health record (EHR) data into the FHIR data standard. This guide explains the steps involved in syntactic and semantic mapping, along with executing the data harmonization process.
The example data files used in this guide can be downloaded from github.com/RhinoHealth/user-resources/tree/main/tutorials if you want to replicate the steps described below.
Overview
FHIR harmonization on the Rhino Federated Computing Platform (FCP) involves translating data into a structured, standardized format through schema transformations (syntactic mapping) and aligning data values with standardized terminologies (semantic mapping). This ensures interoperability across healthcare systems.
- Create Syntactic Mappings to FHIR Schemas: Map source tables to their corresponding FHIR resource schemas.
- Create Semantic Mappings to Custom Vocabularies: Harmonize codes such as race, ethnicity, encounter types, and conditions to standardized FHIR ValueSets.
- Execute the Data Harmonization: Apply the mappings to convert your data into a tabular, FHIRPath-based representation of FHIR data.
- Generate FHIR Resources using a Code Object: Run a Code Object that transforms the tabular representation of your FHIR data into nested JSON objects that are valid FHIR resources.
Step 1: Review FHIR Profiles & Source Data
First, familiarize yourself with the FHIR profiles that the project will create. It helps to learn about the FHIR format for clinical data in general, and then about Profiles: implementations of the FHIR format that are specific to given countries or jurisdictions.
- What FHIR resources am I creating? Before starting any project to transform local data into FHIR resources, it is important to identify the set of FHIR resources to be created.
- What data is required within each FHIR resource? Each FHIR profile marks some data elements as required and leaves others optional. Simplifier.net, where many FHIR developers publish their Profiles, offers an interface for reviewing Profile specifications and the fields they require.
Second, review your source data to understand which data must be transformed into your target FHIR resources. The Rhino FCP offers a useful interface for this exploration:
- Which source tables contain relevant information? In some cases, creating a FHIR resource will require data from only a single source data table. However, it is more common that the data required for a single FHIR resource will be distributed across multiple tables.
- Which source columns contain relevant data? Focus on columns required by the FHIR resource schemas.
- What transformations are required? Identify proprietary codes that require semantic mapping to standard FHIR ValueSets.
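Before building any mappings, it can help to record the answers to these questions as a simple coverage checklist. The sketch below is hypothetical: the element paths, table names, and column names are placeholders, and the real list of required elements must come from your profile's specification.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical checklist for a Patient profile: each element the profile
# requires maps to the (source table, source column) that will populate it,
# or to None if no source column has been found yet.
required_elements: Dict[str, Optional[Tuple[str, str]]] = {
    "Patient.identifier.value": ("patients", "patient_id"),
    "Patient.gender": ("patients", "sex"),
    "Patient.birthDate": ("patients", "dob"),
    "Patient.telecom.value": ("contacts", "phone"),
    "Patient.address.postalCode": None,
}

# Hypothetical elements whose source values are proprietary codes that will
# need a Semantic or Custom Mapping to a FHIR ValueSet.
needs_code_mapping: Set[str] = {"Patient.gender"}


def uncovered(elements: Dict[str, Optional[Tuple[str, str]]]) -> List[str]:
    """Required elements with no source column identified yet."""
    return [name for name, source in elements.items() if source is None]


def source_tables(elements: Dict[str, Optional[Tuple[str, str]]]) -> List[str]:
    """Distinct source tables the profile draws from, sorted by name."""
    return sorted({source[0] for source in elements.values() if source is not None})


def mapping_work(elements: Dict[str, Optional[Tuple[str, str]]], code_elements: Set[str]) -> List[str]:
    """Covered elements that still need a code mapping, in checklist order."""
    return [name for name, source in elements.items()
            if source is not None and name in code_elements]


print("Uncovered:", uncovered(required_elements))
print("Source tables:", source_tables(required_elements))
print("Needs code mapping:", mapping_work(required_elements, needs_code_mapping))
```

The list of source tables corresponds to the Source Data Schemas you will select when creating the Syntactic Mapping, and the code-mapping list corresponds to the Semantic or Custom Mapping transformations you will need.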
Step 2: Create Syntactic Mapping
Tip: If you are creating multiple FHIR profiles, we recommend creating one Syntactic Mapping for each FHIR profile; this makes the mappings easier to edit later.
Syntactic mappings serve as a blueprint for transforming your source data into the FHIR model. This process will create a tabular representation of your desired FHIR data; the actual FHIR data in JSON format will be created in a subsequent step. Follow these steps to create a Syntactic Mapping to FHIR:
- Navigate to Data Mappings → Syntactic Mappings.
- Click Create Syntactic Mapping.
- Select Custom as the target data model.
- Choose Manually Configure to use the graphical interface.
- Select the relevant Source Data Schemas: include every table that contributes any data to the FHIR resource.
- Select the relevant Target Data Schema that represents the FHIR Profile to be created.
Graphical Interface for Mapping
Once the Syntactic Mapping has been created, an interface for mapping source fields to target FHIR fields appears. It has three columns:
- Source Fields: Select the source field(s) used to populate a given FHIR field.
- Target Field: Automatically populated with every field of the selected FHIR profile. If no Source Field is selected for a Target Field, that field remains empty in the resulting FHIR data.
- Transformation: Define how the source field is transformed into compliant FHIR data. See Column Transformations for FHIR below to learn more about Transformations.
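To make the three columns concrete, the sketch below models a Syntactic Mapping as a list of rows and applies it to a pandas DataFrame. It is a hypothetical illustration of the behavior described above (an unselected Source Field leaves its Target Field empty), not the FCP's internal implementation; the class, function, and column names are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

import pandas as pd


@dataclass
class MappingRow:
    """One row of the mapping interface (hypothetical model)."""
    target_field: str
    source_fields: List[str] = field(default_factory=list)  # empty: nothing selected
    transformation: Optional[Callable[[pd.Series], pd.Series]] = None  # None: no transformation


def apply_mapping(source: pd.DataFrame, rows: List[MappingRow]) -> pd.DataFrame:
    """Build a tabular FHIR output with one column per Target Field."""
    out: Dict[str, pd.Series] = {}
    for row in rows:
        if not row.source_fields:
            # No Source Field selected: the Target Field stays empty.
            out[row.target_field] = pd.Series([None] * len(source), index=source.index, dtype=object)
            continue
        # For simplicity, this sketch only uses the first selected Source Field.
        values = source[row.source_fields[0]]
        out[row.target_field] = row.transformation(values) if row.transformation else values
    return pd.DataFrame(out, index=source.index)


patients = pd.DataFrame({"patient_id": ["p1", "p2"], "sex": ["male", "unk"]})
mapping = [
    MappingRow("Patient.identifier.value", ["patient_id"]),  # no transformation
    MappingRow("Patient.gender", ["sex"], lambda s: s.replace({"unk": "unknown"})),
    MappingRow("Patient.birthDate"),  # no Source Field selected
]
print(apply_mapping(patients, mapping))
```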
Column Transformations for FHIR
Transformations are the 'workhorse' of the Harmonization CoPilot and the primary means of standardizing your data into FHIR format.
- No Transformation: In some cases, a source column can be included in a FHIR resource as-is. In this case, select the relevant source field and leave Transformation empty. Examples include placing a patient's identifier into an identifier.value field, or inserting an ICD code into a Condition FHIR resource, since ICD codes are typically already valid codes in the corresponding FHIR ValueSets.
- Semantic Mapping: Maps source values to standardized terms. Recommended for large ValueSets, or for small ValueSets when many source terms must be mapped.
- Custom Mapping: Like Semantic Mapping, Custom Mapping maps source values to standardized terms, but it is recommended for small ValueSets when only a few source terms must be mapped.
  - In the following example, a Custom Mapping maps three source codes to three standard codes:
    - male → Male
    - female → Female
    - unk → Unknown
- Set Value: Assign a constant value across all rows. This transformation is often used for system elements in FHIR (such as Coding.system), which require a constant URI (e.g., https://fhir.hl7.org.uk/CodeSystem/UKCore-AddressKeyType). Because the value is constant, you can select any Source Field for a Set Value transformation without affecting the generated output.
- Convert Date: Reformat date columns into the formats FHIR requires (e.g., YYYY-MM-DD for FHIR date values).
- Stable UUID: Generate consistent identifiers; these are transformed into FHIR-compliant identifiers by the post-processing FCP Code Object (see Step 5).
- VLookUp: Retrieve data from another table based on a foreign key; this approximates a JOIN operation.
  - For example, creating an Encounter FHIR Profile may require populating Encounter.diagnosis.condition with a diagnosis code stored in a table named Diagnoses, with visit_id serving as the foreign key that links the tables. In this case, the VLookUp transformation looks up the diagnosis code in Diagnoses using visit_id.
- Table Level Code: Execute arbitrary Python code across all available tables and rows. This is the appropriate transformation type when generating lists of multiple data elements, which are common in FHIR arrays (e.g., a Patient with multiple phone numbers or an Encounter with multiple diagnoses).
  - In the following example, the goal is to create a list of diagnosis codes for each Encounter. This can be done with a groupby() operation in Pandas, followed by a merge of the result with the original source table:
# Generate a list of unique ICD codes for each VisitID
array_df = my_diagnoses_dataset_schema_v0.groupby('VisitID')['ICDCode'].apply(lambda codes: codes.unique().tolist()).reset_index()
# Left-merge the lists into the encounters table so every encounter keeps its row;
# encounters with no diagnoses get NaN instead of a list
output = my_encounters_table_schema_v0.merge(array_df, on='VisitID', how='left')['ICDCode']
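The sketch below shows roughly what several of the transformations above do to a column, written in pandas. It is an illustration under assumptions, not how the FCP implements them: the tables, columns, and example URI are hypothetical, and the name-based UUID (uuid5) approach is one common way to get stable identifiers, not necessarily the one the Stable UUID transformation uses.

```python
import uuid

import pandas as pd

# Hypothetical source tables.
encounters = pd.DataFrame({
    "visit_id": ["v1", "v2", "v3"],
    "sex": ["male", "female", "unk"],
    "admit_date": ["03/15/2023", "12/01/2022", "07/04/2023"],
})
diagnoses = pd.DataFrame({"visit_id": ["v1", "v3"], "icd_code": ["E11.9", "I10"]})

# Custom Mapping: a small lookup table; values missing from it become NaN.
gender_map = {"male": "Male", "female": "Female", "unk": "Unknown"}
encounters["gender"] = encounters["sex"].map(gender_map)

# Set Value: the same constant on every row, e.g. a code system URI.
encounters["identifier_system"] = "https://example.org/visit-ids"  # hypothetical URI

# Convert Date: parse the source format, then emit FHIR's YYYY-MM-DD date format.
encounters["period_start"] = (
    pd.to_datetime(encounters["admit_date"], format="%m/%d/%Y").dt.strftime("%Y-%m-%d")
)

# Stable UUID: a name-based (version 5) UUID returns the same value for the
# same key on every run. The namespace string is an assumption.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/encounters")
encounters["encounter_uuid"] = encounters["visit_id"].map(
    lambda key: str(uuid.uuid5(NAMESPACE, key))
)

# VLookUp: fetch icd_code from diagnoses by the visit_id key. Like a
# spreadsheet VLOOKUP, this keeps the first match per key; keys with no
# match become NaN, as in a left JOIN.
lookup = diagnoses.drop_duplicates("visit_id").set_index("visit_id")["icd_code"]
encounters["diagnosis_code"] = encounters["visit_id"].map(lookup)

print(encounters)
```

Using map() for the lookup, rather than a merge, guarantees that the encounters table keeps exactly one row per visit even if the lookup table contains duplicate keys.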
Step 3: Create & Review Semantic Mappings
In the context of FHIR, creating Semantic Mappings on the FCP means translating source data values into FHIR ValueSets. Create one Semantic Mapping for each Semantic Mapping transformation specified in the Syntactic Mapping created in Step 2.
- Navigate to Data Mappings → Semantic Mappings.
- Click Create Semantic Mapping.
- Select the Dataset and Field to Map from the relevant source dataset and column, respectively.
- Select Custom Vocabulary as the target.
- Select the relevant FHIR ValueSet to be mapped via the Custom Vocabulary dropdown.
Follow these instructions to review and approve the AI-generated recommendations for semantic mappings.
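Once a Semantic Mapping has been reviewed, applying it amounts to a lookup from each source value to a code, display, and code system, with anything unmapped sent back for review. The sketch below is hypothetical: the source values and helper names are made up, though the codes shown are real codes from the CDC Race & Ethnicity code system (urn:oid:2.16.840.1.113883.6.238), which is used, for example, by the US Core race extension.

```python
from typing import Dict, List, Tuple

import pandas as pd

RACE_SYSTEM = "urn:oid:2.16.840.1.113883.6.238"

# Hypothetical approved mappings: normalized source value -> (code, display).
approved: Dict[str, Tuple[str, str]] = {
    "W": ("2106-3", "White"),
    "WHITE": ("2106-3", "White"),
    "B": ("2054-5", "Black or African American"),
    "AA": ("2054-5", "Black or African American"),
}


def normalize(values: pd.Series) -> pd.Series:
    """Trim whitespace and upper-case so ' w ' and 'W' hit the same entry."""
    return values.str.strip().str.upper()


def apply_semantic_mapping(values: pd.Series, mapping: Dict[str, Tuple[str, str]]) -> pd.DataFrame:
    """Return system/code/display columns; unmapped values stay empty."""
    keys = normalize(values)
    system = keys.map(lambda k: RACE_SYSTEM if k in mapping else None)
    code = keys.map(lambda k: mapping[k][0] if k in mapping else None)
    display = keys.map(lambda k: mapping[k][1] if k in mapping else None)
    return pd.DataFrame({"system": system, "code": code, "display": display})


def unmapped_values(values: pd.Series, mapping: Dict[str, Tuple[str, str]]) -> List[str]:
    """Distinct normalized source values with no approved mapping yet."""
    return sorted(set(normalize(values).dropna()) - set(mapping))


source = pd.Series(["W", " b ", "Other", "AA"])
print(apply_semantic_mapping(source, approved))
print(unmapped_values(source, approved))  # values to send back for review
```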
Step 4: Execute Data Harmonization via the User Interface
Running the Data Harmonization Code Objects applies your syntactic and semantic mappings to the source data and creates the tabular representation of your FHIR data; the FHIR data in JSON format is created in Step 5. Follow these steps to execute the harmonization:
- Navigate to Code.
- Select Run on the relevant Data Harmonization Code Object. Repeat this for every Data Harmonization Code Object related to your FHIR data. The time a Code Run takes to complete depends on the size of your source dataset.
After the Code Runs complete, navigate to the Datasets section to review your newly harmonized datasets. Each harmonized dataset is named after its target schema (e.g., 'patient_fhir_v0' is created from a schema called Patient (FHIR)).
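When reviewing a harmonized dataset, one quick check is how often each target column is populated, with extra attention to columns the profile requires. The sketch below assumes the harmonized data has been loaded into a pandas DataFrame; the column names and the list of required columns are hypothetical.

```python
from typing import List

import pandas as pd


def completeness_report(df: pd.DataFrame, required: List[str]) -> pd.DataFrame:
    """Fraction of non-empty values per column, least populated first."""
    report = pd.DataFrame({"populated_fraction": df.notna().mean()})
    report["required"] = report.index.isin(required)
    return report.sort_values("populated_fraction")


def incomplete_required(report: pd.DataFrame) -> List[str]:
    """Required columns with at least one empty value."""
    mask = report["required"] & (report["populated_fraction"] < 1.0)
    return report[mask].index.tolist()


# Hypothetical harmonized output.
harmonized = pd.DataFrame({
    "Patient.identifier.value": ["p1", "p2", "p3"],
    "Patient.gender": ["male", None, "female"],
    "Patient.birthDate": [None, None, None],
})
report = completeness_report(harmonized, ["Patient.identifier.value", "Patient.gender"])
print(report)
print(incomplete_required(report))
```

A fully empty optional column (like the birth date here) usually means no Source Field was selected for it, while gaps in a required column point to a mapping or source-data problem worth fixing before generating FHIR resources.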
Step 5: Generate FHIR resources via FCP CodeObject
Once your source data has been transformed into a tabular representation of FHIR data (per Step 4), run an FCP Code Object specifically designed to generate valid FHIR data. This Code Object can take multiple harmonized datasets as input and creates a JSON file for each row in the input datasets.
Once the Code Object completes its run, you can inspect the contents of the output dataset, which lists the files created by the Code Run.
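The core of this step is turning each flat row into a nested JSON object. The sketch below shows one way to do that, assuming (hypothetically) that columns are named with dot-separated element paths in which numeric segments are array indices, e.g. name.0.given.0. The FCP's actual column naming and Code Object logic may differ, the file-naming scheme is made up, and the output is not validated against any profile.

```python
import json
import math
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def _is_empty(value: Any) -> bool:
    """Treat None and NaN (how pandas marks missing cells) as empty."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def unflatten(row: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'name.0.family': 'Doe'} into {'name': [{'family': 'Doe'}]}.

    Empty cells are skipped so unmapped fields are left out of the resource.
    Assumes array indices are contiguous and start at 0.
    """
    resource: Dict[str, Any] = {}
    for path, value in row.items():
        if _is_empty(value):
            continue
        parts = path.split(".")
        node: Any = resource
        for i, part in enumerate(parts):
            is_last = i == len(parts) - 1
            # A new container is a list if the next segment is an index, else a dict.
            child = None if is_last else ([] if parts[i + 1].isdigit() else {})
            if part.isdigit():
                idx = int(part)
                while len(node) <= idx:  # grow the list up to this index
                    node.append(None)
                if is_last:
                    node[idx] = value
                else:
                    if node[idx] is None:
                        node[idx] = child
                    node = node[idx]
            elif is_last:
                node[part] = value
            else:
                node = node.setdefault(part, child)
    return resource


def write_resources(rows: Iterable[Dict[str, Any]], out_dir: str) -> List[Path]:
    """Write one JSON file per row, named <resourceType>-<id>.json (hypothetical scheme)."""
    paths = []
    for row in rows:
        resource = unflatten(row)
        path = Path(out_dir) / f"{resource['resourceType']}-{resource['id']}.json"
        path.write_text(json.dumps(resource, indent=2))
        paths.append(path)
    return paths


row = {
    "resourceType": "Patient",
    "id": "p1",
    "gender": "male",
    "name.0.family": "Doe",
    "name.0.given.0": "Jane",
    "birthDate": float("nan"),  # an unmapped Target Field
}
print(json.dumps(unflatten(row), indent=2))
with tempfile.TemporaryDirectory() as out_dir:
    print([p.name for p in write_resources([row], out_dir)])
```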
Step 6: Export Dataset
The final step is to export your dataset, which saves all of the newly generated JSON files to a specified directory on your Rhino client. To do so, click the three dots in the Datasets interface. Please review this article to learn more about exporting FCP datasets.
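After exporting, a quick sanity check on the Rhino client is to confirm that every exported file parses as JSON and carries a resourceType. The sketch below assumes the files sit directly in one directory with a .json extension; the file names in the demo are hypothetical. It does not validate resources against a profile, which requires a FHIR validator such as the official HL7 FHIR Validator.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def summarize_export(export_dir: str) -> Tuple[Counter, List[str]]:
    """Count resources by resourceType and list files that fail basic checks."""
    counts: Counter = Counter()
    problems: List[str] = []
    for path in sorted(Path(export_dir).glob("*.json")):
        try:
            resource = json.loads(path.read_text())
        except json.JSONDecodeError:
            problems.append(path.name)  # not valid JSON
            continue
        resource_type = resource.get("resourceType") if isinstance(resource, dict) else None
        if resource_type:
            counts[resource_type] += 1
        else:
            problems.append(path.name)  # valid JSON, but not a FHIR resource object
    return counts, problems


# Demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as export_dir:
    Path(export_dir, "Patient-p1.json").write_text('{"resourceType": "Patient", "id": "p1"}')
    Path(export_dir, "Encounter-v1.json").write_text('{"resourceType": "Encounter", "id": "v1"}')
    Path(export_dir, "broken.json").write_text("{not json")
    counts, problems = summarize_export(export_dir)
    print(counts, problems)
```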