ICARUS Architecture (Photo by Dawid Zawiła on Unsplash)

The ICARUS architecture aims to deliver a big-data-enabled platform that serves as a “one-stop shop” marketplace for aviation data and intelligence and offers a trusted and secure “sandboxed” analytics workspace. The architecture was designed through a thorough analysis of the technical requirements, which were elicited from the user requirements and user stories collected in close collaboration between the demonstrator and technical partners of the consortium. During the design process, concerns and decisions were weighed, and the stakeholder requirements were constantly validated against the design, taking the MVP activities into account.

To this end, the architecture of ICARUS is composed of a set of key components built on top of efficient, state-of-the-art big data infrastructures, technologies and tools. The architecture is modular, which provides the flexibility to adapt and connect the various components. These components will be implemented as software modules designed to maximise the benefits of combining multiple technologies and tools in order to realise the aspired offerings of the platform. The design focused mainly on functional decomposition, strict separation of concerns, identification of dependencies and, especially, the data flow. Each component has been designed to deliver specific business services with a clear context, scope and set of features. The architecture offers a scalable and flexible environment in which the components interoperate to support the execution of big data analytics and the sharing of data through secure, transparent and advanced functionalities. It covers the entire lifecycle of the platform, from data preparation and data upload to data exploration, data sharing, data brokerage and data analysis.

The architecture of the ICARUS platform is conceptually divided into three main tiers: the On Premise Environment, the Core ICARUS platform and the Secure and Private Space. Each tier undertakes a set of platform functionalities, depending on the execution environment and context. The ICARUS platform architecture is illustrated in the following figure.

Figure 1: The ICARUS high-level architecture

The On Premise Environment is composed of multiple components that run in the data provider’s environment. Their main purpose is to prepare the data provider’s private or confidential datasets for upload to the platform, following the instructions set by the data provider in the Core ICARUS platform. To facilitate the data preparation, the Master / Worker paradigm is utilised. More specifically, the OnPremise Worker running on the On Premise Environment receives a set of instructions from the Master Controller running on the Core ICARUS platform and performs the corresponding tasks, using the appropriate On Premise Environment component for each task. The Cleanser provides the data cleansing functionalities of the platform, incorporating a set of techniques for simple and more advanced cleansing operations over datasets, covering data validation, data cleansing and missing value handling. The Mapper provides the harmonisation process of the platform, in which the user defines the mapping of the dataset fields to the ICARUS common aviation model in a semi-automatic way; it also lets the user explore the ICARUS common aviation model in order to suggest possible extensions. The Anonymiser provides the data anonymisation process of the platform, which addresses privacy issues and the protection of sensitive information through a variety of anonymisation techniques that prevent the disclosure of private, sensitive or personal information. The Wallet Manager handles all blockchain-related operations in the context of the On Premise Environment: it interacts with the blockchain to report on the validity of smart contracts and informs the Decryption Manager whether a request for data access should be granted or denied, based on the status of the corresponding smart contract. The Encryption Manager provides the encryption process of the platform: it encrypts the data provider’s dataset with the relevant encryption mechanism, producing the encryption key and the encrypted ciphertext. Additionally, it enables dataset sharing through the generation, management and secure transmission of the appropriate decryption keys. The Decryption Manager provides the mechanism for decrypting a dataset on the On Premise Environment when an encrypted dataset is downloaded locally, provided that a valid smart contract exists permitting this operation. It interacts with the Encryption Manager of the data provider via the Key Pair Administrator in order to obtain the decryption key used to decrypt the obtained data asset.
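The Master / Worker flow described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the instruction format, the task names, the field names and the three stand-in functions are hypothetical and are not the actual ICARUS interfaces. Truncated hashing is used only as a simple placeholder for an anonymisation technique, and encryption is left out entirely.

```python
import hashlib
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Params = Dict[str, Any]


def cleanse(rows: List[Row], params: Params) -> List[Row]:
    """Cleanser stand-in: drop rows in which a required field is missing or empty."""
    required = params["required"]
    return [r for r in rows if all(r.get(f) not in (None, "") for f in required)]


def map_fields(rows: List[Row], params: Params) -> List[Row]:
    """Mapper stand-in: rename source fields to (hypothetical) common-model names."""
    mapping = params["mapping"]
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def anonymise(rows: List[Row], params: Params) -> List[Row]:
    """Anonymiser stand-in: replace selected values with a truncated SHA-256 digest."""
    fields = set(params["fields"])

    def mask(value: Any) -> str:
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]

    return [{k: mask(v) if k in fields else v for k, v in r.items()} for r in rows]


# Hypothetical registry: task name in an instruction -> local component.
COMPONENTS: Dict[str, Callable[[List[Row], Params], List[Row]]] = {
    "cleanse": cleanse,
    "map": map_fields,
    "anonymise": anonymise,
}


def run_instructions(rows: List[Row], instructions: List[Dict[str, Any]]) -> List[Row]:
    """Worker stand-in: execute the instructions in order, one component per task."""
    for step in instructions:
        handler = COMPONENTS[step["task"]]
        rows = handler(rows, step.get("params", {}))
    return rows


# Instructions as they might be compiled centrally and sent to the worker.
instructions = [
    {"task": "cleanse", "params": {"required": ["flight", "pax_name"]}},
    {"task": "map", "params": {"mapping": {"flight": "flightNumber",
                                           "pax_name": "passengerName"}}},
    {"task": "anonymise", "params": {"fields": ["passengerName"]}},
]
raw_rows = [
    {"flight": "A3601", "pax_name": "Jane Doe"},
    {"flight": "", "pax_name": "John Roe"},  # dropped by the cleansing step
]
prepared = run_instructions(raw_rows, instructions)
```

The point of the sketch is the division of labour: the instructions are data compiled elsewhere, while the dataset itself is only ever processed locally by the worker.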

The Core ICARUS platform is composed of multiple interconnected components running on the platform’s infrastructure. It performs all core operations of the platform, while also orchestrating and providing the instructions that are executed by the On Premise Environment and the Secure and Private Space. The Master Controller is responsible for compiling a set of instructions for the execution of specific jobs or tasks, as provided by the components of the Core platform, and for delivering these instructions for local execution to the workers running on the On Premise Environment and the Secure and Private Space, namely the OnPremise Worker and the SecureSpace Worker. Furthermore, it transfers the list of selected encrypted datasets to the Secure and Private Space and supports the uploading of the encrypted analysis results from the Secure and Private Space to the Data Handler. The Data Handler provides the services responsible for making data available from and to the platform, as well as among different platform components. It supports the uploading of proprietary and open datasets to the platform’s storage, the downloading of datasets from the platform to the end user’s On Premise Environment and/or to a Secure and Private Space, and the uploading of data generated in a Secure and Private Space back into the core platform’s storage. It interacts with the locally running instance of the Mapper in order to apply the harmonisation instructions to the open data sources, in the same manner as harmonisation is performed on the On Premise Environment for the private and confidential datasets. The Data License and Agreement Manager implements the blockchain functionalities of the ICARUS platform, which hosts a local blockchain node, and provides all the necessary processes for the creation, negotiation, review and acceptance or decline of data sharing agreements in the form of smart contracts. Moreover, it allows users to define IPR-related attributes, pricing terms and policies, as well as the license for the datasets they own in the platform.
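The lifecycle of a data sharing agreement (creation, negotiation, review, acceptance or decline) can be pictured as a small state machine. The sketch below is purely illustrative and runs in memory, with no blockchain involved; the state names, the allowed transitions and the `access_granted` check are assumptions, not the actual smart contract logic of the platform.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    CREATED = "created"
    NEGOTIATION = "negotiation"
    REVIEW = "review"
    ACCEPTED = "accepted"
    DECLINED = "declined"


# Assumed transitions: a review may send the agreement back to negotiation.
ALLOWED: Dict[State, Set[State]] = {
    State.CREATED: {State.NEGOTIATION},
    State.NEGOTIATION: {State.REVIEW},
    State.REVIEW: {State.NEGOTIATION, State.ACCEPTED, State.DECLINED},
    State.ACCEPTED: set(),
    State.DECLINED: set(),
}


class SharingAgreement:
    """In-memory stand-in for a data sharing agreement."""

    def __init__(self, dataset_id: str, provider: str, consumer: str) -> None:
        self.dataset_id = dataset_id
        self.provider = provider
        self.consumer = consumer
        self.state = State.CREATED
        self.history: List[State] = [State.CREATED]

    def advance(self, new_state: State) -> None:
        """Move to new_state, rejecting transitions that are not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)


def access_granted(agreement: SharingAgreement, consumer: str, dataset_id: str) -> bool:
    """Grant access only for an accepted agreement covering this consumer and dataset."""
    return (
        agreement.state is State.ACCEPTED
        and agreement.consumer == consumer
        and agreement.dataset_id == dataset_id
    )


agreement = SharingAgreement("dataset-42", provider="airline-a", consumer="airport-b")
agreement.advance(State.NEGOTIATION)
agreement.advance(State.REVIEW)
before_acceptance = access_granted(agreement, "airport-b", "dataset-42")
agreement.advance(State.ACCEPTED)
after_acceptance = access_granted(agreement, "airport-b", "dataset-42")
```

A check of this kind is what a component such as the Wallet Manager would rely on when it reports whether a data access request should be granted or denied.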

The Key Pair Administrator performs the signalling operations for the exchange of the decryption key between the data provider and the data consumer, while also orchestrating the revocation of any key when needed. The Policy Manager provides the authorisation engine that implements the access control mechanisms within the platform. Its purpose is to provide logical access control that prevents unauthorised access to any type of platform resource, such as data, services, tools and system resources, as well as all other relevant objects. The authorisation engine is based on the ABAC model and incorporates the required XACML-based authorisation policies, which are evaluated in order to reach the access control decisions. The Storage and Indexing component enables effective and efficient storage and maintenance of large, complex and unrelated datasets within the platform, as well as flexible and high-performance indexing of the stored datasets. The Query Explorer provides an intuitive environment for defining data queries, with enhanced functionalities such as dynamic field selection and filter definition. It offers (a) a graphical interface for users to search for datasets and view the search results and (b) a service that translates each search into a query that can be processed by the Storage and Indexing component, while also supporting the retrieval and display of the proper recommendations through the Recommender. The Recommender provides the enhanced recommendation functionalities that support dataset exploration and discoverability, suggesting additional related datasets that can be explored or utilised during the search and query process.
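To make the attribute-based (ABAC) access control of the Policy Manager more concrete, the following is a simplified sketch of how attribute-based rules can be evaluated. It is not XACML and not the platform's policy set: the attribute names, the three rules and the `evaluate` function are hypothetical. The rules are combined in a deny-overrides fashion and, as a simplification, a request that matches no rule is denied rather than reported as not applicable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]


@dataclass(frozen=True)
class Request:
    subject: Attributes   # attributes of the user making the request
    resource: Attributes  # attributes of the dataset, service or tool
    action: str


@dataclass(frozen=True)
class Rule:
    effect: str  # "Permit" or "Deny"
    condition: Callable[[Request], bool]


def evaluate(rules: List[Rule], request: Request) -> str:
    """Deny-overrides: a matching Deny wins; otherwise Permit if any Permit matched."""
    decision = "Deny"  # default when no rule matches (a simplification)
    for rule in rules:
        if rule.condition(request):
            if rule.effect == "Deny":
                return "Deny"
            decision = "Permit"
    return decision


# Hypothetical policy: suspended users are blocked, owners may do anything
# with their own resources, and anyone may read resources marked as open.
POLICY = [
    Rule("Deny", lambda r: r.subject.get("status") == "suspended"),
    Rule("Permit", lambda r: r.resource.get("owner") is not None
         and r.subject.get("organisation") == r.resource["owner"]),
    Rule("Permit", lambda r: r.action == "read"
         and r.resource.get("visibility") == "open"),
]

open_dataset = {"owner": "airline-a", "visibility": "open"}
owner = {"organisation": "airline-a", "status": "active"}
visitor = {"organisation": "airport-b", "status": "active"}
suspended_owner = {"organisation": "airline-a", "status": "suspended"}

owner_update = evaluate(POLICY, Request(owner, open_dataset, "update"))
visitor_read = evaluate(POLICY, Request(visitor, open_dataset, "read"))
visitor_update = evaluate(POLICY, Request(visitor, open_dataset, "update"))
suspended_read = evaluate(POLICY, Request(suspended_owner, open_dataset, "read"))
```

The decision depends only on the attributes of the subject, the resource and the action, which is what distinguishes this model from a fixed list of per-user permissions.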

The Analytics and Visualisation Workbench provides the environment where the users of the platform design, execute and monitor data analytics workflows, and where the visualisations and dashboards are displayed. Users select an algorithm from the extended list of supported algorithms and set the corresponding parameters according to their needs, in order to formulate the instructions that will be executed within the Secure and Private Space with the use of the Master Controller and the SecureSpace Worker. The Workbench also offers the advanced visualisation capabilities of the platform, with a variety of visualisations that can be combined into dynamic dashboards according to the user’s needs. Through the Analytics and Visualisation Workbench, users create an ICARUS application, which contains the list of datasets selected for analysis, as well as the algorithm along with the corresponding parameters, and store it in the BDA Application Catalogue. The BDA Application Catalogue implements a repository of ICARUS applications, so that they can be stored, retrieved, modified and loaded in the Analytics and Visualisation Workbench by the users at any time. The Notifications Manager is responsible for providing updated information to the users, in the form of notifications, regarding the addition or update of datasets or the status of scheduled analytics jobs. The Usage Analytics component is responsible for providing the tools that collect, analyse and visualise the usage of the various services and assets of the platform. By recording user behaviour at various levels, it extracts useful insights and statistics and provides usage information to both the users and the platform administrator. The Resource Orchestrator enables the provisioning and management of the Secure and Private Space. More specifically, the Resource Orchestrator connects to the virtualised infrastructure (e.g. OpenStack) in order to monitor and manage the available resources, to allocate and release resources in the corresponding virtual machines, and to deploy and manage the containerised applications or services running on the virtual machines. Finally, the Resource Orchestrator performs enhanced service discovery and monitoring, as well as health checks on the services or applications running on the virtual machines.

The Secure and Private Space contains a set of interconnected components that constitute the advanced secure and trusted analytics execution environment of the platform. The designed analytics workflow, in the form of an ICARUS application, is translated into a set of instructions that are executed by the responsible deployed components. The SecureSpace Worker running on the Secure and Private Space receives a set of instructions from the Master Controller running on the Core ICARUS platform and performs the specified jobs using the components running on the Secure and Private Space. The locally running instance of the Decryption Manager undertakes the decryption of the dataset on the data consumer side: it performs the data consumer’s identity verification, requests the decryption key exchange and eventually decrypts the encrypted dataset via the dedicated decryption mechanism on the Secure and Private Space. The Jobs Scheduler and Execution Engine is the component in charge of initiating and executing the analytics jobs provided by the Analytics and Visualisation Workbench, as well as of managing the resources available to the Execution Cluster nodes in the context of a Secure and Private Space. The Execution Cluster is the cluster-computing framework of the platform, deployed within the Secure and Private Space, that offers the powerful processing engine enabling the execution of the data analysis. The results of the analysis are passed to the Encryption Manager in order to be encrypted before they are securely transmitted to and stored in the Core ICARUS platform.
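The order of operations inside the Secure and Private Space (decrypt the dataset, run the analysis, encrypt the results before they leave) can be summarised in a few lines. This is only a sketch of the flow: `run_secure_job` and its callable parameters are hypothetical, the analysis is a trivial average, and the placeholder codec uses base64, which is an encoding and provides no confidentiality whatsoever. A real deployment would plug in the platform's actual decryption and encryption mechanisms.

```python
import base64
import json
from statistics import mean
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def run_secure_job(
    encrypted_dataset: bytes,
    decrypt: Callable[[bytes], bytes],
    analyse: Callable[[Rows], Dict[str, Any]],
    encrypt: Callable[[bytes], bytes],
) -> bytes:
    """Decrypt inside the space, analyse, and return only encrypted results."""
    rows = json.loads(decrypt(encrypted_dataset).decode("utf-8"))
    result = analyse(rows)
    return encrypt(json.dumps(result).encode("utf-8"))


# Placeholders only: base64 is NOT encryption, it just marks where the
# real decryption and encryption mechanisms would be called.
def placeholder_encrypt(plaintext: bytes) -> bytes:
    return base64.b64encode(plaintext)


def placeholder_decrypt(ciphertext: bytes) -> bytes:
    return base64.b64decode(ciphertext)


def average_delay(rows: Rows) -> Dict[str, Any]:
    """Toy analytics job standing in for an algorithm chosen by the user."""
    return {"average_delay_minutes": mean([r["delay_minutes"] for r in rows])}


dataset = [{"delay_minutes": 10.0}, {"delay_minutes": 20.0}, {"delay_minutes": 30.0}]
encrypted_input = placeholder_encrypt(json.dumps(dataset).encode("utf-8"))

encrypted_output = run_secure_job(
    encrypted_input, placeholder_decrypt, average_delay, placeholder_encrypt
)
decoded_result = json.loads(placeholder_decrypt(encrypted_output).decode("utf-8"))
```

Passing the decryption and encryption steps in as parameters keeps the flow independent of any particular cryptographic mechanism, which is left unspecified here on purpose.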

 

Blog post prepared by UBITECH.

 
