ICARUS aims to develop a data analytics and sharing platform for organizations in the aviation domain that will provide a secure and trustworthy environment in which stakeholders can sell and purchase datasets, data services, algorithms or intelligence reports. To reach this objective, the ICARUS Methodology was defined, based on the needs of stakeholders in the aviation domain, which were extracted through the requirement analysis presented in the ICARUS deliverable “D1.3 – Updated ICARUS Methodology and MVP”.
The ICARUS Methodology aims to provide a well-structured and meaningful workflow that acts as a guideline for the development of the ICARUS Platform. It consists of a set of interactive phases that illustrate how the different stakeholders interact with ICARUS. These stakeholders are either Asset Providers (i.e. data or application providers), who aim to use ICARUS to sell their assets to other interested parties, or Asset Consumers, whose goal is to explore the ICARUS pool of assets and acquire assets that improve their operations.
Asset Providers’ Perspective
For Asset Providers, the objective is to upload their own assets to ICARUS in order to analyze them or share them with other stakeholders who are interested in purchasing them.
Data Collection Phase
Based on the requirement analysis performed in the first months of the ICARUS project, the Data Collection Phase was defined to satisfy the stakeholders’ need for high-quality data and for end-to-end anonymity and security of data. This phase therefore handles the procedure through which Data Providers upload their data to ICARUS: they set several rules that are applied before the data are uploaded to the platform (Data Uploading) with the help of the ICARUS on-premise (local) environment. These rules involve:
- Data Cleaning: cleaning and filtering of the data from inconsistencies and errors (e.g. removing columns with missing values)
- Data Mapping: mapping the data to the ICARUS Aviation Data Model, enriching the data with aviation concepts and applying unit-of-measurement conversions whenever necessary
- Data Anonymization: removing or filtering sensitive information for protecting the privacy of individuals described in the data (e.g. personal information of passengers)
- Data Encryption: encrypting the data in order to provide an extra layer of security, as the data can be decrypted only by the stakeholders that are entitled to use them
- Data Check-in: registering the data (e.g. data profiling, licensing and access policies) so that the data policy definition complies with the data provider’s IPR
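To make the ordering of these rules concrete, here is a minimal Python sketch of how an on-premise step might apply cleaning, mapping and anonymization to a small table held as a list of dictionaries. The column names, the `COLUMN_MAPPING` table, the `SENSITIVE_COLUMNS` set and all function names are hypothetical; they are not the ICARUS Aviation Data Model or its actual implementation. Anonymization is shown as simple suppression of a column, and encryption is deliberately left out, since it should rely on a vetted cryptographic library rather than hand-written code.

```python
# Hypothetical mapping from a provider's column names to data model concepts;
# the real ICARUS Aviation Data Model is not reproduced here.
COLUMN_MAPPING = {"pax_id": "passenger_id", "name": "passenger_name", "dist_mi": "distance_km"}
# Hypothetical set of columns treated as personal information.
SENSITIVE_COLUMNS = {"passenger_name"}
MILES_TO_KM = 1.609344


def clean(rows):
    """Data Cleaning: drop every column that is missing or empty in any row."""
    if not rows:
        return []
    keep = [c for c in rows[0] if all(r.get(c) not in (None, "") for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]


def map_to_model(rows):
    """Data Mapping: rename columns, then convert miles to kilometres."""
    mapped = []
    for r in rows:
        out = {COLUMN_MAPPING.get(k, k): v for k, v in r.items()}
        if "distance_km" in out:
            out["distance_km"] = round(float(out["distance_km"]) * MILES_TO_KM, 1)
        mapped.append(out)
    return mapped


def anonymize(rows):
    """Data Anonymization: suppress the columns flagged as sensitive."""
    return [{k: v for k, v in r.items() if k not in SENSITIVE_COLUMNS} for r in rows]


def prepare_for_upload(rows):
    """Apply the rules in order; encryption would follow as a separate step."""
    return anonymize(map_to_model(clean(rows)))


sample = [
    {"pax_id": "P1", "name": "Ann", "dist_mi": "100", "gate": ""},
    {"pax_id": "P2", "name": "Bob", "dist_mi": "250", "gate": "A3"},
]
print(prepare_for_upload(sample))
```

Here the `gate` column is dropped because one row leaves it empty, mirroring the "removing columns with missing values" rule.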
A Simple Example
For instance, an airline wants to sell their data through ICARUS to generate additional revenue. Hence, they access the ICARUS platform and choose to upload a sample of a dataset about passengers who took multiple flights to reach a destination. After importing the sample, they set specific data cleaning rules for some columns to remove inconsistencies or errors. Next, they confirm the mapping to the Aviation Data Model that the ICARUS platform has calculated automatically. Then, they hide the passengers’ personal details by applying a generalization technique to the passengers’ unique identifiers. Moreover, they choose to encrypt the columns containing the airline names, the destination airports and the timestamps. They then define a custom proprietary data license; in particular, they set the pricing and set the update policy to “monthly”, as the data will be updated every month. Finally, the data are uploaded and stored in the ICARUS repository with the help of the on-premise environment running on their local machines, which applies the Data Cleaning, Mapping, Anonymization and Encryption rules to the actual dataset.
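The anonymization and licensing choices in this example could be recorded declaratively and applied row by row. The sketch below assumes a very simple generalization scheme (keeping a short prefix of an identifier and masking the rest); the `UploadSettings` class, the column names and the helper functions are hypothetical, and production anonymization would usually follow a formal model such as k-anonymity. The encrypted columns are only recorded here, not encrypted.

```python
from dataclasses import dataclass, field
from typing import Optional


def generalize_id(identifier: str, keep: int = 2) -> str:
    """Keep a short prefix of the identifier and mask the remaining characters."""
    return identifier[:keep] + "*" * max(len(identifier) - keep, 0)


@dataclass
class UploadSettings:
    """Hypothetical record of the choices the airline makes before uploading."""
    generalized_columns: list = field(default_factory=lambda: ["passenger_id"])
    encrypted_columns: list = field(
        default_factory=lambda: ["airline", "destination_airport", "timestamp"])
    license_type: str = "custom proprietary"
    price: Optional[float] = None  # set by the provider; the example gives no value
    update_policy: str = "monthly"


def apply_generalization(rows, settings):
    """Replace every value in the generalized columns with its masked form."""
    return [
        {k: generalize_id(str(v)) if k in settings.generalized_columns else v
         for k, v in row.items()}
        for row in rows
    ]


settings = UploadSettings()
rows = [{"passenger_id": "PX4821", "airline": "XY", "destination_airport": "LCA"}]
print(apply_generalization(rows, settings))
```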
Application Collection Phase
The ICARUS requirement analysis further highlighted the stakeholders’ need for not only data but also applications, such as custom implemented algorithms and analytical reports. Hence, the Application Collection Phase was defined; it refers to the procedure in which a stakeholder (i.e. an Asset Provider) adds an application asset to ICARUS. This phase involves the following iterative steps:
- Application Check-In: the Application Provider defines terms and conditions of sharing and provides all the metadata that are related to the application
- Testing and Assessment: custom implemented algorithms are tested and assessed prior to their inclusion in ICARUS, based on predefined KPIs (e.g. the resources needed to run them)
- Application Asset Review: a member of the ICARUS team, acting as moderator, reviews the application asset and decides whether it is approved or rejected
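Because the steps are iterative, one way to picture the phase is as a small state machine. The states, and the rule that a rejected asset can be checked in again after revision, are assumptions made for illustration; `AppState`, `TRANSITIONS` and `ApplicationAsset` are hypothetical names.

```python
from enum import Enum


class AppState(Enum):
    CHECKED_IN = "checked_in"
    ASSESSED = "assessed"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions (assumed): a rejected asset may be revised and checked in again.
TRANSITIONS = {
    AppState.CHECKED_IN: {AppState.ASSESSED},
    AppState.ASSESSED: {AppState.APPROVED, AppState.REJECTED},
    AppState.REJECTED: {AppState.CHECKED_IN},
    AppState.APPROVED: set(),
}


class ApplicationAsset:
    def __init__(self, name):
        self.name = name
        self.state = AppState.CHECKED_IN
        self.history = [self.state]

    def move_to(self, new_state):
        """Advance to new_state, refusing transitions the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


app = ApplicationAsset("delay-predictor")
for step in (AppState.ASSESSED, AppState.REJECTED, AppState.CHECKED_IN,
             AppState.ASSESSED, AppState.APPROVED):
    app.move_to(step)
print([s.value for s in app.history])
```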
A Simple Example
Let’s consider the following example. An aerospace company has created an efficient algorithm for predicting flight delays and would like to sell it through ICARUS. Therefore, an employee of the company accesses ICARUS and chooses to create an application. He defines the analytics workflow using the Data Analytics Phase described below (or, alternatively, uploads an executable file containing the algorithm), defines a custom proprietary license and sets the pricing. Furthermore, he provides 3 KPIs (87% accuracy, RAM usage not exceeding 3 GB, and 1 hour required to train the algorithm) along with a training and testing dataset about flight delays so that the algorithm can be assessed. Subsequently, the service is tested against both the predefined ICARUS KPIs and the KPIs provided by the company. Afterwards, the ICARUS moderator checks the information provided for the service, as well as its validity and quality based on the results of the assessment. Finally, the ICARUS moderator approves the service, and it is stored in the ICARUS repository.
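The assessment step of this example amounts to comparing measured values with the declared KPIs. The sketch below encodes the three KPIs from the example, reading the 1-hour training KPI as an upper bound; the measured values are invented for illustration, and the `Kpi` and `assess` names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Kpi:
    name: str
    threshold: float
    passes: Callable[[float, float], bool]  # (measured, threshold) -> bool


# KPIs declared by the provider in the example.
PROVIDER_KPIS = [
    Kpi("accuracy", 0.87, operator.ge),       # at least 87% accuracy
    Kpi("peak_ram_gb", 3.0, operator.le),     # not exceeding 3 GB of RAM
    Kpi("training_hours", 1.0, operator.le),  # training within 1 hour (assumed bound)
]


def assess(measured, kpis):
    """Return (all KPIs met, per-KPI pass/fail report)."""
    report = {k.name: k.passes(measured[k.name], k.threshold) for k in kpis}
    return all(report.values()), report


# Hypothetical measurements from running the algorithm on the provider's test set.
measured = {"accuracy": 0.89, "peak_ram_gb": 2.4, "training_hours": 1.3}
ok, report = assess(measured, PROVIDER_KPIS)
print(ok, report)
```

With these invented numbers the training-time KPI fails, which is the kind of result a moderator would weigh during the review.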
Asset Consumers’ Perspective
For Asset Consumers, the objective is to explore the ICARUS open and private assets marketplace in order to obtain valuable assets that can improve their operations.
Asset Exploration and Extraction Phase
To enable the various stakeholders to identify the assets that interest them, the Asset Exploration and Extraction Phase was defined. This phase addresses the stakeholder requirements identified regarding the availability of data and application assets, recommendations of data assets that may interest users, and the export of assets for use outside ICARUS. It involves the following steps:
- Asset Searching: enables the stakeholders to search the ICARUS repository of assets by providing advanced searching functionalities (e.g. dynamic field selection, filter definition, etc.) and displaying the results in a user-friendly way
- Asset Recommendation: suggests additional related data assets based on users’ preferences, their exploration history, and the topics and metadata of the data assets
- Asset Acquisition: a service that supervises the creation and execution of the sharing contracts over data and application assets, in order to facilitate and ensure the secure and trustworthy exchange of assets between asset providers and consumers
- Asset Export: provides efficient mechanisms for downloading the acquired assets in various formats, so that they can be used outside ICARUS if the terms of the contract allow it
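As an illustration of the first two steps, the sketch below filters a small in-memory catalogue by keyword and metadata fields, and then recommends further data assets by the Jaccard similarity of their tags to those already selected. This is a simple content-based technique chosen for illustration, not the recommender that ICARUS uses; the `Asset` class, its tags and the catalogue entries are hypothetical (the titles follow the insurance example below).

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    title: str
    kind: str                      # "data" or "application"
    license: str                   # e.g. "open" or "proprietary"
    tags: set = field(default_factory=set)


def search(catalogue, query, **filters):
    """Return assets whose title or tags contain every query term and whose
    fields equal the given filter values."""
    terms = query.lower().split()
    hits = []
    for asset in catalogue:
        text = asset.title.lower() + " " + " ".join(sorted(asset.tags))
        if all(t in text for t in terms) and all(
            getattr(asset, name) == value for name, value in filters.items()
        ):
            hits.append(asset)
    return hits


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(catalogue, selected, top_n=1):
    """Suggest unselected data assets whose tags best match the selection."""
    profile = set().union(*(a.tags for a in selected))
    scored = [(jaccard(a.tags, profile), a) for a in catalogue
              if a.kind == "data" and a not in selected]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:top_n]]


catalogue = [
    Asset("Accidents in airports", "data", "proprietary", {"accidents", "airports", "safety"}),
    Asset("Airlines total passengers per flight", "data", "proprietary", {"airlines", "passengers", "flights"}),
    Asset("Airports total passengers", "data", "proprietary", {"airports", "passengers"}),
    Asset("Aircraft maintenance records", "data", "proprietary", {"aircraft", "maintenance", "safety", "accidents"}),
    Asset("Airport parking tariffs", "data", "open", {"parking", "tariffs"}),
]

print([a.title for a in search(catalogue, "airports", license="proprietary")])
print([a.title for a in recommend(catalogue, catalogue[:3])])
```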
A Simple Example
For example, an aviation insurance company wants to acquire data about passenger injuries and aircraft accidents from various airports in order to analyze them and offer better contract terms to its client companies. Thus, they search the ICARUS repository for the concept “accidents in airports”. Then, they select three proprietary data assets that appear in the search results (accidents in airports, airlines total passengers per flight, airports total passengers). They also select another data asset, related to aircraft maintenance, that the platform recommended to them along with the search results. Then, the company requests quotations for the data assets from their respective providers; once the providers and the insurance company have both signed the respective data contracts and the price has been paid, the insurance company gains access to the data. Finally, the insurance company can download the data it has acquired and/or analyze them in the ICARUS platform.
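The access rule in this example (both parties sign and the price is paid) can be expressed as a simple predicate over the contract’s state. The `SharingContract` class and its fields are hypothetical, and a real contract service would also track terms, parties and payment records; the `export_allowed` flag stands in for the contract terms that decide whether assets may be downloaded for use outside ICARUS.

```python
from dataclasses import dataclass


@dataclass
class SharingContract:
    """Hypothetical state of a data sharing contract for one asset."""
    asset: str
    provider_signed: bool = False
    consumer_signed: bool = False
    paid: bool = False
    export_allowed: bool = False  # stands in for the contract's export terms

    def access_granted(self) -> bool:
        # Access requires both signatures and a completed payment.
        return self.provider_signed and self.consumer_signed and self.paid

    def can_download(self) -> bool:
        # Downloading for use outside the platform also needs the terms to allow it.
        return self.access_granted() and self.export_allowed


contract = SharingContract("Accidents in airports", export_allowed=True)
contract.provider_signed = True
contract.consumer_signed = True
print(contract.access_granted())  # False: the price has not been paid yet
contract.paid = True
print(contract.access_granted(), contract.can_download())  # True True
```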
Data Analytics Phase
Moreover, the Data Analytics Phase was defined to satisfy the stakeholders’ need for a secure private space in which to analyze the data they are entitled to use, by applying various algorithms and visualization schemes. Hence, Asset Consumers can use ICARUS to extract knowledge from data through this phase, which involves the following steps:
- Resource Allocation: any organization is able to deploy and customize its own secure private space, by selecting the computational resources needed to perform the analysis
- Data Decryption: providing the mechanisms for restoring encrypted data into their original form, if a stakeholder is entitled to use them
- Data Linking: combining multiple data assets that a stakeholder is entitled to use through dedicated data preparation instructions
- Data Analysis: selecting, configuring and applying various algorithms on the data assets according to the stakeholder’s needs (e.g. algorithms for classification, clustering, etc.)
- Data Visualization: providing a set of advanced visualization capabilities using different charts and plots, enabling stakeholders to better understand the data and the patterns that are derived from the analysis of the data assets
- Analytics Workflow Storing: enables stakeholders to create a report that contains all the information of the workflow that was followed in their analysis
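Data Linking often amounts to a join of records that share key values. A minimal sketch, assuming flight and weather records that can be matched on airport code and date (the field names, flight numbers and values are hypothetical), is an inner join over lists of dictionaries:

```python
def link(left, right, keys):
    """Inner-join two lists of records on the given key fields."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            linked.append({**row, **match})  # fields from `right` win on name clashes
    return linked


flights = [
    {"flight": "XY100", "airport": "LCA", "date": "2020-03-01", "delay_min": 12},
    {"flight": "XY200", "airport": "ATH", "date": "2020-03-01", "delay_min": 40},
    {"flight": "XY300", "airport": "LCA", "date": "2020-03-02", "delay_min": 5},
]
weather = [
    {"airport": "LCA", "date": "2020-03-01", "wind_kt": 8},
    {"airport": "ATH", "date": "2020-03-01", "wind_kt": 25},
]

for record in link(flights, weather, keys=("airport", "date")):
    print(record)
```

Flight XY300 has no matching weather record, so the inner join drops it; an outer join would keep it with missing weather fields instead.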
A Simple Example
For instance, an airline company wants to analyze data it owns or has acquired about flight schedules and weather conditions in order to gain insights into how various aspects of the weather affect flight schedules. These data are already stored among the organization’s assets in ICARUS, and the company wants to use the data analytics functionalities of the platform. A data scientist of the company deploys a secure private space with the computational resources needed. To begin his analysis, he selects the data assets he would like to use and defines the expected analytics workflow, selecting a linear regression model from the predefined list of algorithms provided by ICARUS to identify the aspects of the weather that affect flight schedules. Then, he selects the visualization schemes that will present the identified aspects and creates a report containing these analytical steps so that they can be reused in the future.
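To illustrate the idea behind this example, the sketch below fits an ordinary least-squares line of delay against each weather variable separately and reports the coefficient of determination R², so that the variable explaining more of the variation in delays stands out. This is a simplification of a multiple linear regression model, and the toy data, field names and functions are hypothetical rather than ICARUS output.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared


# Toy linked records (hypothetical): weather variables and departure delay.
records = [
    {"wind_kt": 5, "visibility_km": 10, "delay_min": 10},
    {"wind_kt": 10, "visibility_km": 9, "delay_min": 22},
    {"wind_kt": 15, "visibility_km": 10, "delay_min": 30},
    {"wind_kt": 20, "visibility_km": 8, "delay_min": 42},
    {"wind_kt": 25, "visibility_km": 10, "delay_min": 51},
]

delays = [r["delay_min"] for r in records]
for feature in ("wind_kt", "visibility_km"):
    xs = [r[feature] for r in records]
    slope, intercept, r2 = fit_line(xs, delays)
    print(f"{feature}: slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

In this toy data wind speed explains almost all of the variation in delays, while visibility explains very little; with real data, correlated weather variables are a reason to prefer a full multiple regression over separate fits.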
Added Value of ICARUS
In addition to the workflows of the Asset Providers and Consumers, ICARUS offers services for all stakeholders across all methodology phases, in order to satisfy the stakeholder requirements that were identified. These requirements relate to the need for a secure and trustworthy exchange of assets between providers and consumers and for notifying stakeholders of any updates that interest them. Therefore, the Added Value Services phase was defined and involves:
- Asset Sharing: an overarching service that supervises the execution of the sharing contracts over data and application assets, in order to facilitate and ensure the secure and trustworthy exchange of assets between asset providers and consumers
- Notifications: a service that informs stakeholders of updates regarding the data assets and their scheduled analytics jobs (e.g. notifications of a new data asset, a data asset request, or a change in the status of a scheduled analytics job)
- Usage Analytics: a service that is responsible for collecting, aggregating and visualizing the usage of the various assets of ICARUS in order to enable the stakeholders to extract useful insights and statistics (e.g. total number of views per data asset, total number of active users, etc.)
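For the Usage Analytics service, the two statistics named above can be computed from a log of usage events. The event format `(user, action, asset)`, the asset identifiers and the sample events are assumptions for illustration:

```python
from collections import Counter

# Hypothetical usage log: (user, action, asset) tuples.
events = [
    ("u1", "view", "accidents-in-airports"),
    ("u2", "view", "accidents-in-airports"),
    ("u1", "view", "aircraft-maintenance"),
    ("u3", "download", "accidents-in-airports"),
    ("u2", "view", "accidents-in-airports"),
]


def views_per_asset(events):
    """Count 'view' events for each asset."""
    return Counter(asset for _, action, asset in events if action == "view")


def active_users(events):
    """Number of distinct users with at least one event in the log."""
    return len({user for user, _, _ in events})


print(views_per_asset(events).most_common())
print(active_users(events))
```

A real service would typically restrict "active" to a time window (e.g. the last 30 days), which would require timestamps on each event.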
Based on the requirement analysis we performed and the feedback we received from key stakeholders in the aviation domain, we strongly believe that the ICARUS Methodology satisfies the needs of the aviation domain with respect to data security, privacy and trust. At the time of writing, the ICARUS beta platform has been released, and additional functionalities are under development and scheduled for completion by the end of 2020. More details about the final ICARUS Methodology are presented in the ICARUS deliverable “D1.3 – Updated ICARUS Methodology and MVP”.
Blog post prepared by UCY.