In the era of big data, data privacy is one of the most difficult challenges, and its complexity and importance are growing with new data regulations such as the European Union’s General Data Protection Regulation (GDPR). To address this rising challenge, a variety of techniques are employed for effective data anonymization. Data anonymization can be viewed as the process of removing an individual’s identifying information from a dataset so that the remaining data cannot be linked to that individual.
Data Sensitivity Levels
In this process, anonymization techniques are applied to the variables of a dataset, each of which must first be assigned to one of four categories: a) insensitive variables, which require no modification; b) identifying variables, which must be removed; c) sensitive variables, which can be kept provided that they are protected by appropriate privacy models; and d) quasi-identifying variables, which must be transformed by appropriate transformation models before they can be included in the final dataset.
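The following minimal Python sketch shows how the four categories might translate into concrete actions on a single record. The attribute names, their categorisation and the chosen transformations (age ranges, full suppression of nationality) are hypothetical choices made for illustration only; they are not taken from the ICARUS implementation.

```python
from enum import Enum


class Category(Enum):
    """The four variable categories described above."""
    INSENSITIVE = "insensitive"      # kept unchanged
    IDENTIFYING = "identifying"      # removed
    SENSITIVE = "sensitive"          # kept, but must be protected by a privacy model
    QUASI_IDENTIFYING = "quasi"      # transformed before release


# Hypothetical categorisation of a hypothetical passenger dataset.
CATEGORIES = {
    "passenger_name": Category.IDENTIFYING,
    "passport_no": Category.IDENTIFYING,
    "age": Category.QUASI_IDENTIFYING,
    "nationality": Category.QUASI_IDENTIFYING,
    "special_assistance": Category.SENSITIVE,
    "flight_no": Category.INSENSITIVE,
}


def generalize_age(age: int, width: int = 10) -> str:
    """Example transformation: replace an exact age by a range, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical per-attribute transformations for the quasi-identifiers.
TRANSFORMS = {
    "age": generalize_age,
    "nationality": lambda _value: "*",  # full suppression
}


def apply_categories(record: dict) -> dict:
    """Drop identifying attributes and transform quasi-identifying ones."""
    out = {}
    for attr, value in record.items():
        category = CATEGORIES[attr]
        if category is Category.IDENTIFYING:
            continue  # identifying variables are removed entirely
        if category is Category.QUASI_IDENTIFYING:
            value = TRANSFORMS[attr](value)
        # Insensitive and sensitive values pass through unchanged here; sensitive
        # ones still need a privacy model checked over the whole table, not per record.
        out[attr] = value
    return out


record = {"passenger_name": "Jane Doe", "passport_no": "X1234567", "age": 34,
          "nationality": "GR", "special_assistance": "wheelchair", "flight_no": "A3 600"}
print(apply_categories(record))
# {'age': '30-39', 'nationality': '*', 'special_assistance': 'wheelchair', 'flight_no': 'A3 600'}
```

Note that the sensitive attribute survives untouched at the record level: its protection is a property of the released table as a whole, which is why privacy models are configured separately from transformations.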
The ICARUS Data Anonymization Process
Within the context of ICARUS, the anonymization method offered to data providers addresses the problem of data privacy protection through a customisable process that can be configured according to the nature of the data to be anonymized and the privacy threat that needs to be eliminated. In this sense, the method enables the data provider to design and execute a highly customisable anonymization workflow that covers the full spectrum of aviation-related data across the wide range of data analysis cases that aviation industry stakeholders need.
Within this process, the data provider can select attributes of the provided dataset and assign each of them to one of the four categories described above. For each attribute, the data provider can select and configure, from an extensive list of available privacy models, the models to be used in the anonymization process, depending on the nature, structure and actual content of the dataset. The data provider can then select the transformation models that will be used to eliminate the privacy threats identified by the applied privacy models. Finally, once the designed anonymization workflow has been executed, the data provider receives an assessment of the re-identification risks.
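The loop described above (a privacy model flags a threat, a transformation model removes it, and a risk assessment is reported) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ICARUS implementation: k-anonymity is used as one well-known example of a privacy model, attribute suppression as one example of a transformation model, and the risk estimate follows the common "prosecutor" convention in which a record belonging to a group of n records with identical quasi-identifiers has a re-identification risk of 1/n. All function names and data are hypothetical.

```python
from collections import Counter


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """k-anonymity: every quasi-identifier combination occurs at least k times."""
    return min(equivalence_classes(rows, quasi_identifiers).values()) >= k


def reidentification_risk(rows, quasi_identifiers):
    """Prosecutor-style risk: a record in a class of size n has risk 1/n."""
    classes = equivalence_classes(rows, quasi_identifiers)
    risks = [1 / classes[tuple(row[q] for q in quasi_identifiers)] for row in rows]
    return {"max": max(risks), "average": sum(risks) / len(risks)}


QIS = ["age", "nationality"]
rows = [  # hypothetical, already age-generalized records
    {"age": "30-39", "nationality": "GR"},
    {"age": "30-39", "nationality": "GR"},
    {"age": "40-49", "nationality": "GR"},
    {"age": "40-49", "nationality": "IT"},
]

# Privacy model step: the last two records are unique, so 2-anonymity fails.
print(is_k_anonymous(rows, QIS, k=2))        # False
print(reidentification_risk(rows, QIS))      # {'max': 1.0, 'average': 0.75}

# Transformation step: suppress nationality, then re-check and re-assess.
suppressed = [{**row, "nationality": "*"} for row in rows]
print(is_k_anonymous(suppressed, QIS, k=2))  # True
print(reidentification_risk(suppressed, QIS))  # {'max': 0.5, 'average': 0.5}
```

Re-running the risk assessment after each transformation is what lets the data provider see whether the chosen models actually reduced the threat they were meant to address.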
Anonymization Challenges
Knowing your Data
One of the major challenges in data anonymization is the need for a deep understanding of the privacy concerns and vulnerabilities of the information included in the dataset. This prior knowledge is a prerequisite for selecting and parameterizing the anonymization process. While a large variety of privacy models can be exploited to identify privacy threats, and a long list of transformation models can be used to eliminate them, not all models are applicable to all cases or contexts. On the contrary, each model is designed for, and applicable to, specific cases and contexts, depending on the nature of the information in the dataset. Hence, selecting the appropriate models requires a deep understanding of the information in the dataset, as well as of the complexities and legal implications of anonymization.
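A small example helps show why a model that works in one context can fail in another. In the hypothetical table below (attribute names and values are illustrative assumptions), k-anonymity with k = 2 is satisfied, yet anyone who knows that a passenger falls in the 40-49 group learns that they required wheelchair assistance, because every record in that group carries the same sensitive value. Distinct l-diversity, used here as one well-known example of a model that also considers sensitive attributes, exposes the problem. The function names are hypothetical.

```python
from collections import defaultdict


def group_by_qi(rows, quasi_identifiers):
    """Group rows into equivalence classes by their quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_of(rows, quasi_identifiers):
    """Largest k for which the table is k-anonymous (smallest class size)."""
    return min(len(g) for g in group_by_qi(rows, quasi_identifiers).values())


def l_of(rows, quasi_identifiers, sensitive):
    """Largest l for distinct l-diversity (fewest distinct sensitive values in a class)."""
    return min(len({r[sensitive] for r in g})
               for g in group_by_qi(rows, quasi_identifiers).values())


rows = [  # hypothetical, already generalized data
    {"age": "30-39", "nationality": "*", "special_assistance": "none"},
    {"age": "30-39", "nationality": "*", "special_assistance": "none"},
    {"age": "40-49", "nationality": "*", "special_assistance": "wheelchair"},
    {"age": "40-49", "nationality": "*", "special_assistance": "wheelchair"},
]
qis = ["age", "nationality"]
print(k_of(rows, qis))                        # 2: every class has two rows
print(l_of(rows, qis, "special_assistance"))  # 1: each class holds a single sensitive value
```

For a dataset without sensitive attributes, the k-anonymity check alone may be adequate; with them, it is not, which is exactly the kind of judgement that depends on knowing the data.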
In the context of ICARUS, the data provider holds a key role in the data anonymization method, which gives the data provider the means to configure and execute an anonymization process tailored to his/her needs. As the owner of the data, the data provider should deeply comprehend the privacy concerns and vulnerabilities of his/her own data and is accountable for selecting the appropriate parameters for the anonymization workflow (to be followed before a dataset is made available to other ICARUS stakeholders). Hence, the ICARUS anonymization method provides the required toolset to the data provider without enforcing the use of specific models or making any assertions regarding the information disclosure risks that may arise from improperly anonymized data. The data provider is expected to have in-depth knowledge of the information included and is responsible for configuring the anonymization workflow based on this knowledge.
Privacy vs Utility
Another challenge is the so-called “privacy vs. utility trade-off”. While there is an extensive list of privacy and transformation models that can be parameterized according to the needs of the data provider, they should be selected and applied wisely. If a dataset is perfectly anonymized, there is no risk of identifying an individual from the data, but the data may also (and probably will) be useless. The anonymized data therefore needs to remain useful as well; however, ensuring anonymity usually requires sacrificing some utility. To achieve the right balance, the anonymization process should be configured by taking into account both the nature of the data in the dataset and the way its anonymized version is expected to be used, so that it yields results of adequate quality.
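The trade-off can be made tangible with a toy Python sketch. It generalizes a hypothetical list of ages into ranges of increasing width and reports two numbers for each width: k, the size of the smallest group of identical values (a simple privacy indicator), and the number of distinct values that survive (a crude utility proxy chosen for illustration; real utility metrics are usually more refined).

```python
from collections import Counter


def generalize(age: int, width: int) -> str:
    """Replace an exact age by a range of the given width, e.g. (34, 10) -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def tradeoff(ages, width):
    """Return (k, distinct): smallest group size and number of distinct released values."""
    classes = Counter(generalize(a, width) for a in ages)
    k = min(classes.values())   # privacy: size of the smallest group
    distinct = len(classes)     # utility proxy: how much detail survives
    return k, distinct


ages = [23, 27, 31, 34, 36, 38, 45, 47, 52, 58]  # hypothetical data
for width in (1, 5, 10, 20):
    k, distinct = tradeoff(ages, width)
    print(f"width={width:>2}  k={k}  distinct values={distinct}")
# width= 1  k=1  distinct values=10
# width= 5  k=1  distinct values=7
# width=10  k=2  distinct values=4
# width=20  k=4  distinct values=2
```

Wider ranges raise k but leave fewer distinct values to analyse; which width is acceptable depends on how the anonymized data will be used.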
Anonymization is one of the three key mechanisms adopted in ICARUS for safeguarding data. For an overview of the overall ICARUS Data Safeguarding approach, read our relevant blog post.
Blog post authored by UBITECH.
Image by Pete Linforth from Pixabay