Safeguarding private or confidential (personal or corporate) data is one of the most critical challenges that all technological artefacts face, owing to the numerous technical, legal, organizational and ethical issues that arise. In this sense, data safeguarding is considered one of the essential elements of any technological platform, which must protect the underlying data assets and the affected entities from any potential abuse, harm and neglect. To this end, a range of techniques and measures is employed, from data encryption, anonymization and isolation to reliable access control.
The ICARUS platform adopts the security-by-design principle in order to ensure trust in the aviation data value chain. To this end, data safeguarding is treated as a critical aspect of the platform’s design, tailored to the needs of aviation stakeholders, and is viewed from a three-fold perspective consisting of Access Control, Data Encryption and Anonymization, as depicted in the following figure.
Data Safeguarding with Access Control
Access control is a generic term that denotes the selective restriction of access to critical or valuable resources, encompassing authorization mechanisms (in a narrower view) and authentication mechanisms (in a broader view). In ICARUS, the designed Data Access Control method enables the declarative and deterministic definition of authorisation policies that permit or deny, in real time, access requests to any data asset available in the ICARUS platform. The Data Access Control method manages the whole policy lifecycle with the aim of preventing: (a) unauthorized disclosure of private data assets (confidentiality) and (b) any intentional or accidental unauthorized changes to data assets (integrity).
Attribute-based Access Control (ABAC)
The ICARUS access control mechanism adheres to the Attribute-Based Access Control (ABAC) paradigm, with policies based on the XACML standard. This enables data providers to secure and share their data assets without any prior knowledge of the individual data consumers who may request them, while also providing a proper separation of concerns between policy specification and policy enforcement, which is achieved by enforcing arbitrary attributes in the policies dynamically. In brief, the ICARUS data access control policy lifecycle consists of six phases, as illustrated in the following figure:
Taking into account the XACML data flow defined in (OASIS, 2018), the ICARUS data access control method is built on the following main functional points: the Policy Enforcement Point (PEP), the Policy Decision Point (PDP), the Policy Information Point (PIP), and the Policy Administration Point (PAP), which function together to provide access control decisions and policy enforcement.
ICARUS Data Access Control Workflows
The following figure illustrates the basic workflows of the ICARUS Data Access Control.
Workflow I: At data asset check-in time (Related Policy Lifecycle Phases: I and II)
In the PAP, the data provider defines the policies related to the data asset that is being checked in to the ICARUS platform at that moment (e.g. “no airline will access the data asset”). The PAP checks the consistency of the policy, converts it into an XACML canonical form and stores it in the PIP. The PIP stores the arbitrary attributes of the policies defined by the respective data provider.
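To make Workflow I more concrete, the following Python sketch models a PAP that checks a policy for consistency, serialises it in a canonical form and stores it. It is a minimal illustration under stated assumptions, not the ICARUS implementation: the `Rule` and `Policy` classes, the two consistency rules and the plain dictionary used as the store are all hypothetical, and sorted-key JSON merely stands in for the actual XACML canonical (XML) form.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical, simplified policy model; not the ICARUS schema and not XACML XML.
@dataclass(frozen=True)
class Rule:
    effect: str        # "Permit" or "Deny"
    attribute: str     # consumer attribute to test, e.g. "organisation_type"
    values: tuple      # attribute values for which the rule applies


@dataclass(frozen=True)
class Policy:
    asset_id: str
    rules: tuple


class PolicyAdministrationPoint:
    def __init__(self, store):
        self.store = store  # plain dict standing in for the policy repository

    def check_consistency(self, policy):
        effects = {}
        for rule in policy.rules:
            if rule.effect not in ("Permit", "Deny"):
                raise ValueError(f"unknown effect {rule.effect!r}")
            if not rule.values:
                raise ValueError("a rule must list at least one attribute value")
            # Reject policies that both permit and deny the same attribute value.
            for value in rule.values:
                key = (rule.attribute, value)
                if effects.setdefault(key, rule.effect) != rule.effect:
                    raise ValueError(f"conflicting effects for {key}")

    def to_canonical(self, policy):
        # Sorted-key JSON stands in for the XACML canonical form (assumption).
        return json.dumps(asdict(policy), sort_keys=True)

    def check_in(self, policy):
        # Only a consistent policy is converted and stored.
        self.check_consistency(policy)
        self.store[policy.asset_id] = self.to_canonical(policy)
        return self.store[policy.asset_id]


store = {}
pap = PolicyAdministrationPoint(store)
# The "no airline will access the data asset" example as a single Deny rule.
pap.check_in(Policy("asset-42", (Rule("Deny", "organisation_type", ("airline",)),)))
```

Keeping the check and the conversion in separate methods mirrors the separation between validating a policy and storing it; an inconsistent policy never reaches the store.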
Workflow II: At data query time (Related Policy Lifecycle Phase: III)
When a data consumer implicitly requests access to a data asset, the policies associated with it need to be resolved, and access granted or denied, before the specific data asset is returned in the list of results in the platform. A request for access to the respective data asset is sent to the PEP, which transforms it into an XACML canonical form and forwards it to the PDP. The PDP retrieves from the PIP the policies related to the specific data asset, together with the values of the related attributes. Finally, the PDP evaluates the policy and returns the response context (including the authorization decision) to the PEP. The PEP fulfils the obligations by permitting or denying access to the data asset.
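Workflow II can be sketched in the same spirit. In the minimal Python example below, a PEP turns each candidate search result into a request, a PDP evaluates it against rules and consumer attributes fetched from a PIP, and only permitted assets are returned. All class and attribute names are hypothetical; the deny-overrides combining rule and the treatment of `NotApplicable` as a permit (what XACML calls a permit-biased PEP) are assumptions chosen so that the “no airline” example behaves as described, and obligations other than permit/deny are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str           # "Permit" or "Deny"
    attribute: str        # consumer attribute the rule tests
    values: frozenset     # attribute values for which the rule applies


class PolicyInformationPoint:
    """Hypothetical PIP: policies per asset and attribute values per consumer."""

    def __init__(self, policies, consumer_attributes):
        self.policies = policies                        # asset_id -> list[Rule]
        self.consumer_attributes = consumer_attributes  # consumer_id -> dict

    def rules_for(self, asset_id):
        return self.policies.get(asset_id, [])

    def attributes_of(self, consumer_id):
        return self.consumer_attributes.get(consumer_id, {})


class PolicyDecisionPoint:
    def __init__(self, pip):
        self.pip = pip

    def evaluate(self, request):
        attrs = self.pip.attributes_of(request["subject"])
        applicable = [rule for rule in self.pip.rules_for(request["resource"])
                      if attrs.get(rule.attribute) in rule.values]
        # Deny-overrides: one applicable Deny is enough to deny.
        if any(rule.effect == "Deny" for rule in applicable):
            return "Deny"
        if any(rule.effect == "Permit" for rule in applicable):
            return "Permit"
        return "NotApplicable"


class PolicyEnforcementPoint:
    def __init__(self, pdp, permit_biased=True):
        self.pdp = pdp
        self.permit_biased = permit_biased  # how NotApplicable is enforced (assumption)

    def filter_results(self, consumer_id, asset_ids):
        visible = []
        for asset_id in asset_ids:
            request = {"subject": consumer_id, "resource": asset_id, "action": "read"}
            decision = self.pdp.evaluate(request)
            if decision == "Permit" or (decision == "NotApplicable" and self.permit_biased):
                visible.append(asset_id)
        return visible


pip = PolicyInformationPoint(
    policies={"asset-42": [Rule("Deny", "organisation_type", frozenset({"airline"}))]},
    consumer_attributes={"c1": {"organisation_type": "airline"},
                         "c2": {"organisation_type": "airport"}},
)
pep = PolicyEnforcementPoint(PolicyDecisionPoint(pip))
```

With these definitions, the airline consumer `c1` no longer sees `asset-42` in its results, while the airport consumer `c2` still does; `asset-7`, which has no policy at all, stays visible to both only because of the permit-biased assumption.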
Additional, similar workflows are supported: (i) when a data provider packages the policies defined for a data asset and reuses them for other data assets (Related Policy Lifecycle Phases: IV and II), (ii) when a data provider updates the policies associated with his/her data asset (Related Policy Lifecycle Phases: V and II), and (iii) when the policies related to a data asset are to be deleted, either one by one or in batch.
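The reuse, update and delete workflows essentially operate on a repository of per-asset policies. The sketch below is hypothetical (the `PolicyRepository` class and its methods are not a published ICARUS API) and omits the consistency check of Phase II that would normally run after a reuse or an update. Deep copies are used so that a reused policy can later be updated for one asset without silently changing it for the others.

```python
import copy


class PolicyRepository:
    """Hypothetical per-asset policy store for the reuse, update and delete workflows."""

    def __init__(self):
        self._policies = {}  # asset_id -> list of rule dicts

    def define(self, asset_id, rules):
        self._policies[asset_id] = copy.deepcopy(rules)

    def rules(self, asset_id):
        # Return a copy so callers cannot mutate stored policies by accident.
        return copy.deepcopy(self._policies.get(asset_id))

    def reuse(self, source_asset, target_assets):
        # Package the source policies and attach an independent copy to each target.
        for target in target_assets:
            self._policies[target] = copy.deepcopy(self._policies[source_asset])

    def update(self, asset_id, rules):
        if asset_id not in self._policies:
            raise KeyError(asset_id)
        self._policies[asset_id] = copy.deepcopy(rules)

    def delete_rule(self, asset_id, index):
        # One-by-one deletion of a single rule.
        del self._policies[asset_id][index]

    def delete_all(self, asset_ids):
        # Batch deletion of every policy attached to the given assets.
        for asset_id in asset_ids:
            self._policies.pop(asset_id, None)
```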
Data Safeguarding with Encryption
The purpose of the ICARUS Data Encryption method is to ensure that data assets are securely transmitted (a) from the data providers’ premises to the ICARUS platform and (b) from the ICARUS platform to the data consumer and the ICARUS secure experimentation spaces, as well as (c) securely stored in the ICARUS platform, without any alterations, and that only authorized data consumers (who, in the case of private data, have an active data contract) are able to access and use them. The ICARUS Data Encryption method employs a dual encryption approach in which:
- Symmetric key encryption (using the AES-256 algorithm) is utilised as the most efficient solution for encrypting the data assets, ensuring the high performance of the platform without compromising the security level of the data assets.
- Secure SSL handshakes are performed in order to share the symmetric key between (a) the data provider and the data consumer, and/or (b) the data provider and the data consumer’s secure experimentation space in the ICARUS platform.
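For the key-sharing leg, the snippet below shows how each endpoint could prepare a TLS context with Python’s standard `ssl` module (TLS is the successor of SSL, and current libraries negotiate TLS even where a text says “SSL”). The minimum protocol version and the choice to require client certificates on the provider side (mutual TLS, so that the provider also authenticates the consumer) are assumptions, not documented ICARUS settings; certificate loading and socket handling are left out.

```python
import ssl


def make_key_exchange_context(server_side: bool) -> ssl.SSLContext:
    """TLS context for sharing the symmetric key (provider = server, consumer = client)."""
    purpose = ssl.Purpose.CLIENT_AUTH if server_side else ssl.Purpose.SERVER_AUTH
    ctx = ssl.create_default_context(purpose)
    # Refuse legacy SSL and early TLS versions (assumed policy).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if server_side:
        # Assumption: mutual TLS, so the provider also authenticates the consumer.
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


provider_ctx = make_key_exchange_context(server_side=True)
consumer_ctx = make_key_exchange_context(server_side=False)
```

The handshake itself takes place when a context wraps a connected socket, e.g. `consumer_ctx.wrap_socket(sock, server_hostname=host)` on the consumer side; only after it succeeds would the symmetric key be sent over the encrypted channel.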
ICARUS Encryption-Decryption Workflow
In ICARUS, the encryption-decryption workflow is composed of the following three phases (as also depicted in the following figure):
Phase I. Symmetric Key Encryption
At data check-in time, the data provider decides whether, and which, columns of a data asset will be encrypted prior to being uploaded to the platform (except for the columns holding temporal and spatial, non-business-critical information, which cannot be selected so as to ensure the seamless operation of the platform). Following the data provider’s instructions, the data asset is encrypted at the data provider’s premises with a locally generated symmetric key. The resulting ciphertext is then transmitted to, and stored in encrypted form in, the ICARUS core platform. This ensures that the actual data always remain private (even in the extremely unlikely event that the ICARUS platform were compromised by internal or external attacks), as the platform is unaware of the secret key and cannot decrypt the ciphertext.
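Phase I can be illustrated with the sketch below, which encrypts only the columns selected by the data provider, refuses to encrypt the temporal and spatial columns, and keeps the key local. It assumes the third-party `cryptography` package and AES-256 in GCM mode; the text only specifies AES-256, so the mode, the per-value random nonce, the use of the column name as associated data and the hypothetical `NON_ENCRYPTABLE` column names are illustrative choices.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical names of the temporal/spatial columns that must stay in clear.
NON_ENCRYPTABLE = {"timestamp", "latitude", "longitude"}
NONCE_SIZE = 12  # 96-bit nonce, the usual choice for AES-GCM


def encrypt_columns(rows, columns, key):
    """Encrypt the provider-selected columns of every row at the provider's premises."""
    forbidden = set(columns) & NON_ENCRYPTABLE
    if forbidden:
        raise ValueError(f"columns cannot be encrypted: {sorted(forbidden)}")
    aesgcm = AESGCM(key)
    encrypted_rows = []
    for row in rows:
        enc = dict(row)
        for col in columns:
            nonce = os.urandom(NONCE_SIZE)  # fresh nonce per value; never reuse one with the same key
            # Binding the column name as associated data makes a value moved to another column fail to decrypt.
            ciphertext = aesgcm.encrypt(nonce, str(row[col]).encode("utf-8"), col.encode("utf-8"))
            enc[col] = (nonce + ciphertext).hex()
        encrypted_rows.append(enc)
    return encrypted_rows


def decrypt_value(token, column, key):
    """Consumer-side inverse once the key has been received (values come back as text)."""
    raw = bytes.fromhex(token)
    plaintext = AESGCM(key).decrypt(raw[:NONCE_SIZE], raw[NONCE_SIZE:], column.encode("utf-8"))
    return plaintext.decode("utf-8")


# 256-bit key generated locally; it leaves the provider only via the key exchange.
key = AESGCM.generate_key(bit_length=256)
rows = [{"timestamp": "2020-05-01T10:00Z", "flight": "XY123", "fare": 120}]
uploaded = encrypt_columns(rows, ["fare"], key)
```

GCM also authenticates each value, so a ciphertext altered in transit or at rest fails to decrypt (the `cryptography` package raises `InvalidTag`), which fits the “without any alterations” requirement of the encryption method.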
Phase II. Access to Data Ciphertext
When a data consumer who is confirmed, according to the ICARUS data access control method, to be eligible to access and buy a data asset expresses interest in a specific data asset and requests to download it, the platform cross-checks whether an active contract between the data provider and the data consumer is in place for that data asset. Upon verification, the data consumer is granted access to the purchased part of the data asset ciphertext. If no active contract exists, the data consumer must formulate a purchase request through the ICARUS Data Assets Brokerage framework.
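The contract cross-check of Phase II might look as follows. The sketch assumes that eligibility has already been established by the access control method, and it represents a contract with hypothetical fields (`purchased_rows`, `valid_until`) to express “the part of the data asset ciphertext that has been bought”; the actual contract model of the Data Assets Brokerage framework is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contract:
    provider: str
    consumer: str
    asset_id: str
    purchased_rows: range   # hypothetical: the part of the asset that was bought
    valid_until: date       # hypothetical: last day on which the contract is active


class CiphertextGateway:
    """Hypothetical platform-side check before any ciphertext is released."""

    def __init__(self, contracts, ciphertexts):
        self.contracts = contracts      # list[Contract]
        self.ciphertexts = ciphertexts  # asset_id -> list of encrypted rows

    def active_contract(self, consumer, asset_id, today):
        for contract in self.contracts:
            if (contract.consumer == consumer and contract.asset_id == asset_id
                    and today <= contract.valid_until):
                return contract
        return None

    def download(self, consumer, asset_id, today):
        contract = self.active_contract(consumer, asset_id, today)
        if contract is None:
            raise PermissionError("no active contract: submit a purchase request via the brokerage framework")
        rows = self.ciphertexts[asset_id]
        # Only the purchased part of the ciphertext is handed over.
        return [rows[i] for i in contract.purchased_rows if i < len(rows)]
```

The platform only ever hands out ciphertext here; without the key from the data provider, a successful download reveals nothing about the encrypted columns.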
Phase III. Symmetric Key Decryption
Once the data consumer has gained access to the requested data asset ciphertext from the ICARUS platform and has downloaded it locally, a request for the decryption key of the specific data asset ciphertext is sent to the data provider. Upon successful validation and active contract verification, the data provider shares the symmetric key (for the data asset decryption) with the data consumer via an established secure SSL-enabled connection. Finally, the data consumer uses the key to decrypt the data asset and access the underlying data.
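On the data provider’s side, Phase III reduces to releasing a key only after validating the request and verifying the contract. In the hypothetical sketch below, “validation” is interpreted as checking that the requested asset is one the provider actually holds a key for, the consumer identity is assumed to have been authenticated by the secure connection, and contracts are simple tuples; none of these names come from ICARUS.

```python
from datetime import date


class ProviderKeyService:
    """Hypothetical provider-side service answering Phase III key requests."""

    def __init__(self, keys, contracts):
        self.keys = keys            # asset_id -> symmetric key (bytes); kept only by the provider
        self.contracts = contracts  # iterable of (consumer_id, asset_id, valid_until) tuples

    def request_key(self, consumer_id, asset_id, today):
        # Validation: the request must refer to an asset this provider encrypted.
        if asset_id not in self.keys:
            raise LookupError(f"unknown data asset {asset_id!r}")
        # Active contract verification, performed by the provider itself.
        for c_consumer, c_asset, valid_until in self.contracts:
            if c_consumer == consumer_id and c_asset == asset_id and today <= valid_until:
                return self.keys[asset_id]  # to be sent back over the secure connection
        raise PermissionError("key request rejected: no active contract")
```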
Data Safeguarding with Anonymisation
Data anonymisation is the critical process through which sensitive data are safeguarded in datasets that contain not only personal information but also confidential, business and/or private information. Within this context, the ICARUS Anonymisation method addresses the problem of data privacy protection by providing a customisable process that can be configured according to the nature of the data to be anonymised, as well as to the privacy threat that needs to be eliminated. To this end, it supports a sufficiently generic anonymisation workflow that covers the complete spectrum of aviation-related data across a wide range of aviation data analysis cases.
In the ICARUS Anonymisation method, the data provider holds a key role, as he/she is the only one who deeply comprehends both the privacy concerns and the vulnerabilities of the data. The method is tailored to his/her input on the parameters of the anonymisation workflow, which is configured and executed before a dataset is made available to other ICARUS stakeholders.
ICARUS Anonymisation Workflow
In the ICARUS perspective, the process of Data Anonymisation for safeguarding sensitive data therefore includes the following steps:
Step 1. Definition of the attribute types
Upon selecting the data asset to be anonymised, the data provider is responsible for defining the attribute type of every field included in the data asset. In accordance with data anonymisation techniques, there are four attribute types with respect to privacy: (a) insensitive variables, which can be kept unmodified; (b) identifying variables, which must be removed from the dataset as they pose a high risk of re-identification; (c) quasi-identifying (QID) variables, which cannot be used directly for re-identification but may be used in combination for linkage, and must therefore be transformed; and (d) sensitive variables, which can be kept as-is but can be protected using privacy models. This categorisation serves as the basis for the next step, in which the privacy models are configured.
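The attribute typing of Step 1 can be expressed as a mapping from field names to the four types, applied before any privacy model is configured. In this minimal sketch (with hypothetical passenger-level field names and helper functions), every field must be typed, identifying fields are dropped, and the quasi-identifiers are collected for the following steps.

```python
from enum import Enum


class AttributeType(Enum):
    INSENSITIVE = "insensitive"              # kept unmodified
    IDENTIFYING = "identifying"              # removed from the dataset
    QUASI_IDENTIFYING = "quasi-identifying"  # transformed in Step 3
    SENSITIVE = "sensitive"                  # kept, protected by privacy models


def apply_attribute_types(rows, types):
    """Check that every field is typed and drop the identifying ones."""
    untyped = {field for row in rows for field in row} - set(types)
    if untyped:
        raise ValueError(f"no attribute type defined for: {sorted(untyped)}")
    kept = [field for field, t in types.items() if t is not AttributeType.IDENTIFYING]
    return [{field: row[field] for field in kept if field in row} for row in rows]


def quasi_identifiers(types):
    return [field for field, t in types.items() if t is AttributeType.QUASI_IDENTIFYING]


# Hypothetical fields and their types.
types = {
    "passport_no": AttributeType.IDENTIFYING,
    "age": AttributeType.QUASI_IDENTIFYING,
    "postcode": AttributeType.QUASI_IDENTIFYING,
    "special_assistance": AttributeType.SENSITIVE,
    "ticket_class": AttributeType.INSENSITIVE,
}
rows = [{"passport_no": "X1234567", "age": 34, "postcode": "10115",
         "special_assistance": "wheelchair", "ticket_class": "economy"}]
cleaned = apply_attribute_types(rows, types)
```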
Step 2. Selection and configuration of the privacy models
Having defined the attribute types of the fields of the data asset, the data provider selects and configures the appropriate privacy model(s) that will be used to identify the privacy threat that needs to be eliminated. A variety of privacy models is supported, depending on the nature, structure and actual content of the dataset, which can be grouped into: (a) syntactic privacy models (such as k-Anonymity, l-Diversity, t-Closeness, δ-Disclosure, β-Likeness and δ-Presence), (b) statistical privacy models (such as k-Map, Average Risk, Population Uniqueness and Sample Uniqueness), and (c) semantic privacy models (such as Profitability and Differential Privacy). The data provider selects the appropriate privacy model for each field and can customise it through the corresponding model parameters.
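As an example of the syntactic models, the sketch below checks k-anonymity (every combination of quasi-identifier values is shared by at least k records) and distinct l-diversity (each such group holds at least l different sensitive values). These are textbook definitions written for illustration; production tools implement further variants (e.g. entropy or recursive l-diversity), and the field names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, qids):
    """Group records that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in qids)].append(row)
    return classes


def is_k_anonymous(rows, qids, k):
    # Every equivalence class must contain at least k records.
    return all(len(group) >= k for group in equivalence_classes(rows, qids).values())


def is_distinct_l_diverse(rows, qids, sensitive, l_value):
    # Every equivalence class must contain at least l_value distinct sensitive values.
    return all(len({row[sensitive] for row in group}) >= l_value
               for group in equivalence_classes(rows, qids).values())


rows = [
    {"age": "30-39", "postcode": "101**", "special_assistance": "wheelchair"},
    {"age": "30-39", "postcode": "101**", "special_assistance": "none"},
    {"age": "40-49", "postcode": "102**", "special_assistance": "none"},
    {"age": "40-49", "postcode": "102**", "special_assistance": "none"},
]
```

This small table is 2-anonymous but not 2-diverse: in the second group every record carries the same sensitive value, so membership of the group alone reveals it. This is why syntactic models are often combined.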
Step 3. Selection and configuration of the transformation models
The data provider selects the transformation models that will be used to eliminate the privacy threats identified in the previous step. For each field of the data asset that was classified as a quasi-identifying variable, the corresponding model is selected and configured; transforming these fields also protects the sensitive variables. As with the other models, each transformation model is customisable via a model-specific set of parameters. The supported models include, but are not limited to, value generalisation, random sampling, record, attribute and cell suppression, micro-aggregation and categorisation.
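Two of the listed transformations, value generalisation and record suppression, can be combined to reach k-anonymity, as in the sketch below. The age bands, the postcode truncation and the field names are illustrative assumptions; real workflows usually search over generalisation hierarchies rather than applying one fixed level.

```python
from collections import Counter


def generalise_age(age, width=10):
    """Value generalisation: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise_postcode(code, keep=3):
    """Value generalisation: keep a prefix and mask the remaining characters."""
    return code[:keep] + "*" * (len(code) - keep)


def suppress_small_classes(rows, qids, k):
    """Record suppression: drop records whose quasi-identifier group has fewer than k members."""
    counts = Counter(tuple(row[q] for q in qids) for row in rows)
    return [row for row in rows if counts[tuple(row[q] for q in qids)] >= k]


def anonymise(rows, k):
    # Generalise both quasi-identifiers first, then suppress what is still too rare.
    generalised = [dict(row, age=generalise_age(row["age"]),
                        postcode=generalise_postcode(row["postcode"]))
                   for row in rows]
    return suppress_small_classes(generalised, ["age", "postcode"], k)


rows = [
    {"age": 34, "postcode": "10115", "special_assistance": "wheelchair"},
    {"age": 38, "postcode": "10117", "special_assistance": "none"},
    {"age": 45, "postcode": "20095", "special_assistance": "none"},
]
result = anonymise(rows, k=2)
```

After generalisation the first two records share the group ("30-39", "101**"), while the third remains unique and is suppressed; coarser bands would keep more records at the cost of less precise data.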
Step 4 (optional). Assessment of the re-identification risks
The data provider can optionally explore the outcome of the executed data anonymisation workflow in terms of privacy threat risks. In this assessment, a variety of highly data- and domain-dependent methods are applied, with various statistical comparisons between the input and output data. The data provider is presented with sample-based and population-based risk estimates to confirm that the desired level of privacy risk has been achieved, and can identify the risks related to the quasi-identifiers after the designed workflow has been executed.
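Sample-based estimates of the kind mentioned above can be computed from the sizes of the quasi-identifier groups. The sketch uses common prosecutor-model definitions (highest risk = 1 / smallest group size, average risk = number of groups / number of records, sample uniqueness = share of records that are unique); population-based estimates require statistical models of the underlying population and are not covered. The function and field names are hypothetical, and the input is assumed to be non-empty.

```python
from collections import Counter


def reidentification_risks(rows, qids):
    """Sample-based prosecutor-model estimates from quasi-identifier group sizes."""
    sizes = Counter(tuple(row[q] for q in qids) for row in rows)
    n = len(rows)
    return {
        # Worst case: a record in the smallest group is re-identified with probability 1/size.
        "highest_risk": 1 / min(sizes.values()),
        # Mean of 1/group size over all records, which simplifies to groups / records.
        "average_risk": len(sizes) / n,
        # Share of records that are unique on their quasi-identifiers.
        "sample_uniqueness": sum(1 for s in sizes.values() if s == 1) / n,
    }


rows = [
    {"age": "30-39", "postcode": "101**"},
    {"age": "30-39", "postcode": "101**"},
    {"age": "40-49", "postcode": "200**"},
]
risks = reidentification_risks(rows, ["age", "postcode"])
```

Here the unique third record drives the highest risk to 1.0, which is exactly the kind of residual quasi-identifier risk this optional step is meant to expose before the dataset is shared.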
Blog post prepared by UBITECH.