In general, Data Analytics is a loosely defined collection of computational methods and algorithms used to examine data sets and extract meaningful insights from them. The term Data Analytics is often seen together with, or even used interchangeably with, the term Data Science. A commonly drawn distinction between the two, however, is that Data Science is responsible for asking questions, whereas Data Analytics provides the processes and techniques that answer them.
In ICARUS, Data Analytics is viewed from three major perspectives: descriptive, predictive and prescriptive (Deka 2016). In brief, descriptive analytics refers to methods that attempt to describe raw data and extract useful information that humans can interpret. In a way, its purpose is to describe the past (what has happened and why), and as such, it is closely related to Data Mining (Han, Kamber, and Pei 2011). Predictive analytics, on the other hand, originates from AI (Artificial Intelligence) theories and aims to forecast future events or trends based on patterns discovered in the given dataset. The last category, prescriptive analytics, is a relatively new field that goes beyond descriptive and predictive analytics by recommending particular courses of action that lead towards a solution. Prescriptive analytics uses a combination of computational intelligence techniques, tools and procedures, applied to input from different data sets, in order to take advantage of predictions and provide useful advice.
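To make the three perspectives concrete, the following minimal sketch applies each of them to the same toy dataset. It is an illustration only and not part of the ICARUS platform: the delay figures, the function names (`describe`, `predict`, `prescribe`) and the choice of a straight-line fit as the predictive model are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical departure delays in minutes, grouped by hour of day.
# These numbers are invented for illustration only.
delays_by_hour = {6: [2, 4, 3], 12: [10, 12, 11], 18: [25, 30, 35]}


def describe(data):
    """Descriptive: summarise what has happened (mean delay per hour)."""
    return {hour: mean(values) for hour, values in data.items()}


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


def predict(data, hour):
    """Predictive: forecast the delay for an unseen hour from the fitted trend."""
    slope, intercept = fit_line(list(describe(data).items()))
    return slope * hour + intercept


def prescribe(data, candidate_hours):
    """Prescriptive: recommend the candidate hour with the lowest predicted delay."""
    return min(candidate_hours, key=lambda hour: predict(data, hour))


summary = describe(delays_by_hour)                    # what happened
forecast = predict(delays_by_hour, 9)                 # what is likely to happen
recommendation = prescribe(delays_by_hour, [9, 15, 21])  # what to do about it
```

Each step builds on the previous one: the forecast is fitted on the descriptive summary, and the recommendation is chosen by comparing forecasts. A real prescriptive system would typically optimise over many more constraints than a single predicted value.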
The algorithms under consideration in ICARUS that address the descriptive, predictive and prescriptive analytics problems are divided into the following axes:
- Axis I: Basic Analytics, including a number of diagnostic algorithms and statistical methods, useful to extract insights from data that help the analyst understand the underlying behaviour and foresee possible patterns.
- Axis II: Machine Learning Algorithms, containing the most widely accepted techniques from the field of machine learning, such as decision trees, support vector machines or random forests. These algorithms can be employed for descriptive, predictive and prescriptive analysis.
- Axis III: Deep Learning, consisting of advanced neural network algorithms, such as convolutional or recurrent neural networks, designed to process big data efficiently by using multiple (“deep”) internal processing layers. These networks are often regarded as the next evolution of machine learning.
- Axis IV: Visual Analytics, belonging exclusively to the descriptive analytics framework and aiming to offer tangible insights through visuals. No algorithms are required for this process, as the outcome is usually a graphical representation of computations on selected data features and their relations. Nevertheless, the conclusions derived from this representation can be of critical value to both the business user and the analyst, especially when the dataset at hand is characterized by high dimensionality and large volume.
In ICARUS, over 70 research papers and web sources on data analytics in aviation across the IATA NEXTT Aircraft – Passenger – Baggage – Cargo Journeys were studied, and over 10 real-life aviation analytics use cases were identified. For example, the most prominent research problem related to the aircraft journey, which affects a number of aviation stakeholders, seems to be the prediction of flight delays. Another popular research topic, interlinked with delay times in air transport, is accurate taxi-out time prediction, which is a significant precondition for improving the efficiency of the departure process at an airport, as well as for reducing congestion and excessive emission of greenhouse gases. Overall, the use of data analytics in aviation is rising, with the aviation value chain stakeholders implementing their own, isolated solutions. These may improve particular processes, but they fail to capture the whole picture and face a real obstacle when attempting to scale up and combine data sources. The main conclusions deriving from the state-of-the-art analysis performed in ICARUS can be summarized as:
- The infamous “Garbage in – Garbage out” principle reflects the need to feed any data analytics process with appropriate data. Data structures, data veracity, data quality and, most of all, data availability and completeness need to be examined prior to any data analytics application. The data also significantly influence the selection of the most suitable algorithm and their pre-processing is in many cases the most demanding part of the whole process.
- Identifying the right “algorithm” is a multifaceted problem and no predefined answers can be provided a priori. Not all problems can or should be expressed as data analytics problems (solved through machine learning), since approaches that fall under the operations research category may be more suitable for certain problems.
- Each problem can be seen from diverse perspectives depending on the stakeholder who acts as the decision-maker. As in most business to business transactions, the objective of each involved party may differ, in this case resulting in a different formalisation and modelling of the same underlying processes.
- Computational needs, especially in a big data context, should be taken into consideration and appropriate infrastructures should be foreseen when developing a data analytics solution.
- From an industry perspective, data analytics should not be treated as a must-have merely because it is a current trend, especially in a complex environment like the one formed by the interacting aviation stakeholders. This would create unrealistic expectations and would push involved parties to develop rushed solutions, hindering the adoption of a truly beneficial data analytics strategy.
- A successful data analytics approach requires time and the will to experiment with, evaluate and refine the developed models.
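The first conclusion above, that data availability and completeness need to be examined before any analytics is applied, can be illustrated with a small check. This is a minimal sketch under stated assumptions: the record layout, the field names, the flight identifiers and the 0.8 threshold are hypothetical and not taken from ICARUS.

```python
def completeness(records, required_fields):
    """Return, per field, the share of records where it is present and non-empty."""
    total = len(records)
    report = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical flight records with typical gaps: an empty string,
# an explicit None and a missing key.
flights = [
    {"flight": "XX101", "origin": "ATH", "taxi_out_min": 14},
    {"flight": "XX102", "origin": "", "taxi_out_min": 9},
    {"flight": "XX103", "origin": "LCA", "taxi_out_min": None},
    {"flight": "XX104", "origin": "ATH"},
]

report = completeness(flights, ["flight", "origin", "taxi_out_min"])

# Fields whose completeness falls below an (assumed) acceptance threshold.
THRESHOLD = 0.8
needs_attention = [field for field, share in report.items() if share < THRESHOLD]
```

A check of this kind only covers completeness; veracity and structural consistency would need their own rules, which depend on the dataset at hand.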
In ICARUS, the typical data analytics workflow involves several stages, from data ingestion and cleansing to data transformation and dimensionality reduction, up until the actual data analysis and the visualisation of the results. In fact, dimensionality reduction and dataset merging are considered part of the data preparation activities, while data analysis involves the workflow composition, the selection of input datasets and the metadata annotation for the emerging application. The visualisation of the results is typically associated with the chart selection, the chart configuration and the visual representation of the results.
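The stages of this workflow can be sketched as a chain of small functions. The example below is illustrative only and does not reflect the ICARUS implementation: the CSV content, column names and function names are assumptions, dropping constant columns is a deliberately simplistic stand-in for dimensionality reduction, and a text bar chart stands in for chart selection and configuration.

```python
import csv
import io
from statistics import mean

# Hypothetical input; the second row has a missing taxi-out time.
RAW_CSV = """airport,runway_count,taxi_out_min
ATH,2,14
ATH,2,
LCA,2,9
LCA,2,11
"""


def ingest(text):
    """Ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing: drop rows that contain any empty value."""
    return [r for r in rows if all(v != "" for v in r.values())]


def transform(rows):
    """Transformation: cast numeric columns from strings to numbers."""
    return [
        {
            "airport": r["airport"],
            "runway_count": int(r["runway_count"]),
            "taxi_out_min": float(r["taxi_out_min"]),
        }
        for r in rows
    ]


def reduce_dimensions(rows):
    """Dimensionality reduction (simplistic stand-in): drop constant columns."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def analyse(rows):
    """Analysis: mean taxi-out time per airport."""
    groups = {}
    for r in rows:
        groups.setdefault(r["airport"], []).append(r["taxi_out_min"])
    return {airport: mean(values) for airport, values in groups.items()}


def visualise(result):
    """Visualisation: one text bar per airport, one '#' per minute."""
    return [f"{a} {'#' * round(v)} {v:.1f}" for a, v in result.items()]


result = analyse(reduce_dimensions(transform(cleanse(ingest(RAW_CSV)))))
chart = visualise(result)
print("\n".join(chart))
```

Keeping each stage as a separate function mirrors the idea of composing a workflow from interchangeable steps: any stage can be replaced, for example by a proper dimensionality reduction method, without touching the others.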
It should be noted that the ICARUS data analytics approach includes the selection of appropriate algorithms to be offered in the ICARUS platform. For the analysis phase in particular, ICARUS explored several methods and algorithms that cover most of the complex aspects encountered in the aviation industry. These algorithms were selected with the following criteria in mind: a) to adhere to the ICARUS platform requirements and be applicable to aviation-specific tasks, b) to have proven their ability and robustness in the research community through the years, and c) to have been implemented in a commonly used software framework or library, including: Spark MLlib, Scikit-learn, TensorFlow, Keras, H2O, Deeplearning4j, BigDL, PyTorch, Caffe, Caffe2, Apache MXNet, Microsoft CNTK.
All algorithms have been described in detail according to the following template.
In conclusion, the ICARUS data analytics methods and algorithms are summarized in the following table.
Blog post prepared by Suite5.
Deka, Ganesh Chandra. 2016. “Big Data Predictive and Prescriptive Analytics.” In Handbook of Research on Cloud Infrastructures for Big Data Analytics, 30–55. IGI Global.
Han, Jiawei, Micheline Kamber, and Jian Pei. 2011. Data Mining: Concepts and Techniques. Elsevier Science.