A Study of Terminology Mapping in the SALUS Project

 

In the SALUS project, clinical conditions in our EHR sources are coded in ICD-9-CM and ICD-10-GM. Safety analysts who carry out post-marketing safety studies want to express their research questions in MedDRA terms, so that possible adverse drug events (ADEs) are indicated as conditions. In addition, SALUS provides tools for reporting ADEs as individual case safety reports through the ICH E2B(R2) standard[1]. E2B(R2) reports also require clinical conditions to be represented in MedDRA terminology. Hence we need mappings between MedDRA and both ICD-9-CM and ICD-10-GM.

To this end, we searched for publicly accessible direct mappings between MedDRA and ICD codes but could not find any. Instead, we used a number of resources that provide mappings between ICD codes and SNOMED CT, and between SNOMED CT and MedDRA:

  • The OMOP project provides mappings of a selected subset of ICD-9-CM and ICD-10-CM codes to SNOMED CT Clinical Findings. OMOP has an objective very similar to that of SALUS: it maps the ICD codes used to code clinical conditions in EHR sources to SNOMED CT codes, and SNOMED CT serves as the pivot terminology through which statistical analysis is carried out.
  • The IMI PROTECT project created an ontology called OntoADR[2], which also presents the correspondence between MedDRA and SNOMED CT codes.
  • The US NLM provides a mapping from SNOMED CT to ICD-10 to support semi-automated generation of ICD-10-CM codes from clinical data encoded in SNOMED CT for reimbursement and statistical purposes. This is a result of the CrossMap Project by IHTSDO and WHO.

First, we represented SNOMED CT clinical findings codes and ICD-9-CM, ICD-10, ICD-10-GM and MedDRA codes as instances of skos:Concept, and the hierarchical relations within these terminologies through skos:broader. We decided to represent the mapping relationships through the SKOS properties “skos:exactMatch”, “skos:broadMatch” and “skos:narrowMatch”.

  • We represented the mappings provided by the OntoADR ontology between SNOMED CT and MedDRA codes through the “skos:exactMatch” property.
  • The mappings provided by the OMOP project are maintained in a database, and there is no indication of the type of each mapping in SKOS terms, i.e. it is not possible to decide automatically whether the mapping type is “skos:exactMatch”, “skos:broadMatch” or “skos:narrowMatch”. In the first attempt we assumed that in the OMOP mappings ICD-9-CM codes are either more specific than or equal to SNOMED CT codes, and created a new property with this semantics, “salusc:exactOrNarrowMatch”.
  • The original CrossMap mappings are expressed in spreadsheets, where SNOMED CT codes are mapped to ICD-10 codes with additional context information represented through custom rules. We tried to interpret this context information automatically, creating direct mappings between SNOMED CT codes and ICD-10 codes through skos:exactMatch when possible, and one-to-many mappings between SNOMED CT codes and ICD-10 codes through salusc:exactOrNarrowMatch by changing the direction of the mapping based on the context information.

Based on these partial mappings, and through rules we implemented, we calculated the mappings from ICD-9-CM to MedDRA and from ICD-10-CM to MedDRA, and made them available from the SALUS Terminology Server.
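
To illustrate how mappings can be derived through a pivot terminology, here is a minimal Python sketch. It is not the SALUS rule set (which is not reproduced in this post). The composition table in compose(), the function names and the ICD-9-CM identifier are assumptions made for illustration; only the SNOMED CT and MedDRA codes are taken from the validation extract further below.

```python
from collections import defaultdict

EXACT = "skos:exactMatch"
EXACT_OR_NARROW = "salusc:exactOrNarrowMatch"


def compose(first, second):
    """Relation assumed for a two-step path source -> pivot -> target.

    Assumed table: exact + exact gives exact; any other mix of exact and
    exactOrNarrow gives exactOrNarrow; everything else is not composed.
    """
    if first == EXACT and second == EXACT:
        return EXACT
    if {first, second} <= {EXACT, EXACT_OR_NARROW}:
        return EXACT_OR_NARROW
    return None


def derive_via_pivot(source_to_pivot, pivot_to_target):
    """Join (source, relation, pivot) triples with (pivot, relation, target) triples."""
    by_pivot = defaultdict(list)
    for pivot, relation, target in pivot_to_target:
        by_pivot[pivot].append((relation, target))

    derived = set()
    for source, first, pivot in source_to_pivot:
        for second, target in by_pivot.get(pivot, []):
            relation = compose(first, second)
            if relation is not None:
                derived.add((source, relation, target))
    return derived


# Hypothetical ICD-9-CM identifier; SNOMED CT and MedDRA codes as in the extract below.
omop_like = [("icd9cm:EXAMPLE-1", EXACT_OR_NARROW, "snomedct:70272006")]
ontoadr_like = [("snomedct:70272006", EXACT, "meddra:10025600")]

derived = derive_via_pivot(omop_like, ontoadr_like)
print(derived)
```

The sketch makes the weak point visible: every derived mapping inherits whatever errors the two partial mappings contain, and the composition table itself is an assumption about the semantics of the input mappings.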

Problematic Patterns with SKOS Mappings

After manually analyzing some of the terminology mappings, we realized that some of them are clinically incorrect. By analyzing the errors, we discovered that most SKOS mappings can be considered transitive and bidirectional after certain inferences (please see this document for a detailed analysis). The mappings may therefore introduce assertions that the mapping creator is not aware of. Furthermore, those assertions may conflict with existing semantic or mapping relations.


Figure 1. Basic Problematic Pattern I

Figure 1 shows the basic problematic pattern. Problematic relations, shown as red dotted lines in the figure, can be inferred from the mapping relations. If such an inferred relation is not stated in (or cannot be inferred from) the concept scheme itself, then inferring it via SKOS mapping relations is considered vocabulary hijacking. In addition, if such an inferred relation contradicts an existing relation, e.g. the semantic relations shown as black dashed lines in Figure 1, it is considered a conflict.

Figure 2. Basic Problematic Pattern II

Furthermore, problematic relations can also be inferred from the combination of semantic relations and mapping relations. In Figure 2, it is possible to infer a relation between A2 and B1 based on the semantic relation between A2 and A1 and the mapping relation between A1 and B1. If this inferred relation contradicts the stated mapping relation between A2 and B1, it is considered a conflict as well.
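
The second pattern can be sketched in Python as follows. This is an illustration under assumptions, not the published N3 rules: it assumes that the semantic relation is skos:broader, that a skos:broader link combined with a skos:exactMatch implies a skos:broadMatch, and that a stated skos:exactMatch or skos:narrowMatch between A2 and B1 clashes with that implied relation. The function name and the identifiers A1, A2, B1 are placeholders.

```python
from collections import defaultdict

BROADER = "skos:broader"
EXACT = "skos:exactMatch"
BROAD = "skos:broadMatch"
NARROW = "skos:narrowMatch"

# Assumption of this sketch: these stated relations clash with an implied broadMatch.
CLASHES_WITH_BROAD = {EXACT, NARROW}


def find_pattern2_conflicts(semantic, mappings):
    """Return (a2, stated_relation, b1, implied_relation) for each conflict.

    semantic: iterable of (a2, "skos:broader", a1) triples within one scheme.
    mappings: iterable of (concept, mapping_relation, concept) triples.
    """
    stated = defaultdict(set)         # (subject, object) -> stated mapping relations
    exact_targets = defaultdict(set)  # subject -> objects it is an exactMatch of
    for subject, relation, obj in mappings:
        stated[(subject, obj)].add(relation)
        if relation == EXACT:
            exact_targets[subject].add(obj)

    conflicts = []
    for a2, relation, a1 in sorted(semantic):
        if relation != BROADER:
            continue
        # a2 has broader concept a1, and a1 exactMatch b1: a2 broadMatch b1 is implied.
        for b1 in sorted(exact_targets.get(a1, ())):
            for stated_relation in sorted(stated.get((a2, b1), ())):
                if stated_relation in CLASHES_WITH_BROAD:
                    conflicts.append((a2, stated_relation, b1, BROAD))
    return conflicts


semantic = [("A2", BROADER, "A1")]
mappings = [("A1", EXACT, "B1"), ("A2", EXACT, "B1")]
conflicts = find_pattern2_conflicts(semantic, mappings)
print(conflicts)
```

Here A2 is stated to be an exact match of B1, while the hierarchy and the mapping of A1 together imply that B1 is broader than A2, so the pair is reported.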

SKOS Mapping Validation Rules

We have developed a set of SKOS mapping validation rules to detect problematic patterns in SKOS mappings, so that mapping creators can review the detected patterns and eventually improve the quality of their mappings. The validation rules are published at this link, accompanied by an explanation document. The rules are expressed in N3 format and executed by the Euler YAP Engine (EYE). Figure 3 shows the 7 basic problematic patterns that the validation rules check; sub-patterns of these basic patterns are defined depending on the conflict categories.

Figure 3. Basic problematic patterns

Validation Results

We applied the validation rules to the mapping files used in the SALUS project. The validation results are shown in Table 1.

Table 1. SKOS Mapping Validation Results


The column ‘Detected Patterns’ in Table 1 shows the number of detected problematic patterns. Many problematic patterns are detected in the CrossMap mappings. This is because of wrong assumptions we made when converting the original mappings, stored in an Excel spreadsheet, to RDF: we assigned either the skos:exactMatch or the salusc:exactOrNarrowMatch property, both of which are transitive. A single problematic mapping can cause multiple problematic patterns, especially when the relations in the source or target vocabularies are complex. This explains why 104,761 problematic patterns are detected from 16,710 skos:exactMatch mappings.
The manually created OntoADR mapping exhibits much better quality; nevertheless, 1,790 problematic patterns are still detected. The OMOP mapping is expressed through a non-SKOS mapping property, salusc:exactOrNarrowMatch, and therefore cannot be validated by our SKOS validation rules. Nevertheless, a manual check by a clinical expert also detected errors.
In the MedDRA-ICD10 mappings, only a small subset uses SKOS mapping properties. Many problematic patterns are also detected in those mappings.

An extract of OntoADR validation result (problematic pattern) is shown below:

{<http://purl.bioontology.org/ontology/MDR/10000358> (Accelerated hypertension) skos:exactMatch <http://purl.bioontology.org/ontology/SNOMEDCT/70272006> (Malignant hypertension).
<http://purl.bioontology.org/ontology/SNOMEDCT/70272006> (Malignant hypertension) skos:exactMatch <http://purl.bioontology.org/ontology/MDR/10025600> (Malignant hypertension)
} a validation:Pattern2VocabularyHijacking.
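
The extract above links two different MedDRA terms to the same SNOMED CT concept through skos:exactMatch, which implies an exact match between the two MedDRA terms themselves. A minimal Python sketch of how such a situation could be detected is given below. It is an illustration only, not the N3 rule behind validation:Pattern2VocabularyHijacking. The function names are hypothetical, the scheme of a concept is assumed to be its URI up to the last slash, and the sketch does not check whether the implied relation is already stated in the vocabulary itself.

```python
from collections import defaultdict
from itertools import combinations


def scheme_of(uri):
    """Assumption: the concept scheme is the URI up to the last '/'."""
    return uri.rsplit("/", 1)[0]


def find_hijacking_candidates(exact_matches):
    """Find pairs of same-scheme concepts that exactMatch one foreign concept.

    exact_matches: iterable of (subject, object) pairs, treated as symmetric.
    Returns sorted (concept_a, pivot, concept_b) tuples.
    """
    neighbours = defaultdict(set)
    for subject, obj in exact_matches:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    findings = set()
    for pivot, linked in neighbours.items():
        for a, b in combinations(sorted(linked), 2):
            same_scheme = scheme_of(a) == scheme_of(b)
            if same_scheme and scheme_of(a) != scheme_of(pivot):
                findings.add((a, pivot, b))
    return sorted(findings)


MDR_ACCELERATED = "http://purl.bioontology.org/ontology/MDR/10000358"
SCT_MALIGNANT = "http://purl.bioontology.org/ontology/SNOMEDCT/70272006"
MDR_MALIGNANT = "http://purl.bioontology.org/ontology/MDR/10025600"

candidates = find_hijacking_candidates([
    (MDR_ACCELERATED, SCT_MALIGNANT),
    (SCT_MALIGNANT, MDR_MALIGNANT),
])
for a, pivot, b in candidates:
    print(a, "and", b, "are both exact matches of", pivot)
```

Each reported triple is only a candidate: a mapping creator or clinical expert still has to decide whether the two terms are truly equivalent or whether one of the mappings should be weakened.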

Summary

Although mappings between different terminologies are available from external resources such as the OMOP and CrossMap projects, these mappings do not clarify the type of relationship each mapping expresses. The problematic patterns we detected in the aforementioned mappings are caused by wrong interpretations of these mappings. Furthermore, it currently seems impossible to assign appropriate SKOS mapping relationships to the listed mappings in an automated way. Our understanding of SKOS terminology mapping is summarized as follows:
  1. It is difficult to interpret the existing mappings as correct SKOS mappings. To make the existing mappings reusable over the semantic web, it is extremely important that the communities who created the mappings also provide them in RDF (e.g. SKOS), using a standard ontology to represent them. With RDF properties, the mapping relations are stated more explicitly than in a text description. It is also important that the mapping owners explicitly state the usage scope of their mappings. On the other hand, it is difficult to have mappings expressed in this way: a mapping is not always a simple 1-1 relation between the source and target vocabulary; 1-n and n-1 mappings also exist. Moreover, some mappings only apply under additional conditions. For example, in the original CrossMap file, around 25% of the mappings are associated with conditions. Such mappings are difficult to express as SKOS mappings and are better treated as mapping rules.
  2. SKOS mappings may inject additional relations into the original vocabularies, which in most cases the mapping creator does not intend. Those inferred relations may conflict with the relations stated in the original vocabularies. In addition, established mappings still require updates when the related terminologies evolve. It is therefore important that the quality of the mappings can be assessed and improved continuously. We have developed a set of SKOS mapping validation rules to assess the quality of SKOS mappings automatically.
  3. Provided the above-mentioned challenges are met, it is possible to create a terminology server that provides terminology mappings. However, a SKOS terminology server still cannot be used as a generic terminology server (in the sense of linking two terminologies via a third terminology, e.g. mapping MedDRA to ICD-10 via SNOMED). This is because the SKOS mapping properties are designed to be non-transitive (except skos:exactMatch). Introducing our own mapping properties, e.g. salusc:exactOrNarrowMatch, carries the risk of hiding and propagating errors that exist in the mappings, as we have already discovered.
In the SALUS project, we will investigate how to repair the mapping relations in the terminology server. The mapping relations skos:exactMatch and salusc:exactOrNarrowMatch in the current mapping files will be replaced with weaker statements, such as skos:related. In addition, we aim to preserve the original semantics of the mapping relations provided by external resources by creating project-specific mapping relationships. Based on this, through rule-based reasoning, our aim is to end up with skos:relatedMatch relations, which will be used by SALUS pilot applications. When needed, the source codes of the clinical statements and the mapping relations will be available to clinical researchers. We are also pursuing other opportunities to create direct mappings between ICD codes and MedDRA through UMLS mappings. The results will be reported separately in our blog.
In addition, we will experiment with including terminology mapping in the conversion rules that convert ORBIS content data entities to SALUS domain entities (expressed with the SALUS reusable ontology). This is therefore not terminology-to-terminology mapping but terminology-to-entity mapping. AGFA will test ICD10GM-to-SNOMED mapping, with the IHD (ischemic heart disease) subset as the use case. The mapping is created manually; it will be expressed as N3 rules and can cope with complex mapping relationships.
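
To make the idea of treating conditional mappings as rules more concrete, here is a minimal Python sketch. It is not the planned N3 implementation and does not reproduce any CrossMap rule. The class, the function, the context key "sex" and all codes are hypothetical and serve only to show that a rule can carry a condition that a plain SKOS mapping triple cannot.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MappingRule:
    """A mapping that only applies when its condition holds for the given context."""
    source: str
    target: str
    condition: Callable[[dict], bool]


def map_code(source: str, context: dict, rules) -> Optional[str]:
    """Return the target of the first rule matching source and context, else None."""
    for rule in rules:
        if rule.source == source and rule.condition(context):
            return rule.target
    return None


# Hypothetical codes and conditions, ordered from most to least specific.
RULES = [
    MappingRule("src:EXAMPLE", "tgt:FEMALE-VARIANT", lambda ctx: ctx.get("sex") == "F"),
    MappingRule("src:EXAMPLE", "tgt:MALE-VARIANT", lambda ctx: ctx.get("sex") == "M"),
    MappingRule("src:EXAMPLE", "tgt:UNSPECIFIED", lambda ctx: True),
]

print(map_code("src:EXAMPLE", {"sex": "F"}, RULES))
print(map_code("src:EXAMPLE", {}, RULES))
```

Because the rules are evaluated in order, a catch-all rule at the end plays the role of a default target when no condition can be checked.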

[1] ICH guideline E2B (R2), Electronic transmission of individual case safety reports – Message specification (ICH ICSR DTD Version 2.1), Final Version 2.3, Document Revision Feb. 1, 2001.

[2] Declerck, G., Bousquet, C., Jaulent, M.-C. Automatic generation of MedDRA terms groupings using an ontology, 24th European Medical Informatics Conference (MIE 2012), August 2012, Pisa