FORECASTING THE NUMBER OF PASSENGERS AT JENDERAL AHMAD YANI SEMARANG INTERNATIONAL AIRPORT USING THE HYBRID SINGULAR SPECTRUM ANALYSIS-NEURAL NETWORK (SSA-NN) METHOD

Abstract: Transportation is an important sector supporting the economic growth of a country. The Covid-19 pandemic in 2020 caused a drastic decrease in the number of passengers at Jenderal Ahmad Yani International Airport in Semarang, although from mid-2020 the number of passengers slowly began to increase again. Forecasting is carried out to determine future passenger movements using the Hybrid Singular Spectrum Analysis (SSA)-Neural Network (NN) method. The SSA method is expected to decompose the patterns in the data into trend, seasonal, and noise components, and the NN method is then used to model the nonlinear patterns in the data. Based on data on the number of domestic air passengers at Jenderal Ahmad Yani International Airport Semarang from January 2006 to December 2021, the best model combines the SSA method with a window length of 40 and the NN method with a 6-8-1 network architecture (6 input neurons, 8 hidden neurons, and 1 output neuron) for the trend component, 11-15-1 (11 input neurons, 15 hidden neurons, and 1 output neuron) for the seasonal component, and 10-15-1 (10 input neurons, 15 hidden neurons, and 1 output neuron) for the noise component. The model produces a prediction error, measured by MAPE, of 0.54%, or an accuracy rate of 99.46%.


INTRODUCTION
Indonesia is a country with a large population, and its population growth is accompanied by an increasing need for transportation facilities [1]. Transportation facilities play a very important role for the community because they support all aspects of life and make it easier to carry out activities between regions. One of the transportation facilities that can be used is air transportation [2]. Today, many human activities rely on airplanes. This can be seen from the increasingly busy activity at Jenderal Ahmad Yani International Airport Semarang, which is part of PT Angkasa Pura I (Persero), a company engaged in air transportation services [1]. Aviation services are in great demand by the public because they provide comfort and safety and offer a relatively short travel time compared with other modes of transportation [3].
However, the Covid-19 pandemic resulted in a drastic decrease in the number of passengers. Based on data from the Badan Pusat Statistik (BPS) of Central Java Province, the number of domestic flight passengers at Jenderal Ahmad Yani International Airport in Semarang in 2020 decreased by 46.58%, or 2.3 million passengers, from the previous year [5]. This decrease occurred in line with the stipulation of Government Regulation No. 21 of 2020 concerning Large-Scale Social Restrictions in the context of accelerating the handling of Covid-19 [6]. In 2022, the number of passengers is expected to slowly start increasing, although the change is still not large [6]. Therefore, forecasting is very important for determining the movement of the number of domestic flight passengers at Jenderal Ahmad Yani International Airport, Semarang, in the coming period as a consideration in policy making [6], [7].
Forecasting is a method used to estimate a future value using past data [8]. The number of passengers follows particular movement patterns, which need to be treated separately in forecasting. A forecasting method is therefore needed that can identify the components of the data pattern separately by decomposing the series into sub-patterns, so that better forecasting accuracy can be obtained. A method that can be used to decompose time series data patterns is Singular Spectrum Analysis (SSA) [9]. SSA is a forecasting method that combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamic systems, and signal processing. The main objective of this method is to decompose the original time series into a small number of components that can be identified as patterns such as trend, seasonality, and noise [10].
In addition to the trend, seasonal, and noise components, nonlinear patterns are often found in time series data. Therefore, additional methods are needed that can capture nonlinear patterns in the data, one of which is the Neural Network (NN) method [9], [11]. This research forecasts the number of passengers on domestic flights at Jenderal Ahmad Yani International Airport in Semarang using the hybrid SSA-NN method. The SSA method is expected to decompose the patterns in the number of passengers into trend, seasonal, and noise components, while the NN method is used to model data that have a nonlinear relationship pattern. Combining the two methods can increase the accuracy of the forecast results, because the combination tends to produce better forecasts than a single method [9].
Research on the number of passengers was carried out by [1] to predict the number of airplane passengers at Ahmad Yani International Airport in Semarang using the Holt-Winters Exponential Smoothing method and the Exponential Smoothing Event Based method. That study found that the Holt-Winters Exponential Smoothing method was the best method, with the smallest error based on a Mean Absolute Percentage Error (MAPE) value of 5.644139%. Forecasting the number of passengers has also been carried out by (Larissa et al., 2021) [2] to predict the number of passengers at Soekarno-Hatta Airport. The results showed that the Holt-Winters additive method was the best method based on the smallest MAPE value of 17.465%.
Forecasting using the Hybrid SSA-NN method was carried out by (Suhartono et al., 2019) [9] to predict the value of inflow and outflow fractions of currency in Indonesia by comparing the ARIMAX method with Hybrid SSA-NN. The results showed that the Hybrid SSA-NN method provides better forecasts. Another hybrid model with SSA has also been applied by [11], and the results showed that the hybrid model yielded more accurate forecasts than the single model. Therefore, this study expects that forecasting the number of domestic flight passengers at Jenderal Ahmad Yani International Airport in Semarang using the Hybrid SSA-NN method can provide more accurate forecasting results.

LITERATURE REVIEW

Forecasting
Forecasting is a way to estimate a future value by taking into account past and current data [10]. Forecasting is the basis for long-term planning for a company or agency. Accurate forecasting results increase the chances of achieving a profitable investment. Forecasting also plays an important role in policy making, because whether a policy is effective depends in part on when it is adopted [6], [7].

Singular Spectrum Analysis (SSA)
SSA is a technique for decomposing time series data into several pattern components (trend, seasonal, and noise) so that they are easier to interpret. The SSA method does not require special assumptions, so it can be applied widely. The SSA method is divided into two main stages, namely decomposition and reconstruction [12].

Decomposition
The parameter used in the decomposition stage is the window length ($L$). The parameter $L$ determines the dimensions of the trajectory matrix. The value of $L$ is determined by trial and error. The decomposition stage consists of two steps, namely the embedding step and the Singular Value Decomposition (SVD) step [10].
The embedding step converts the time series into multidimensional data (a matrix). Suppose the time series of length $N$ is expressed as $F = (f_1, f_2, \ldots, f_N)$, where $F$ is a nonzero series with no missing data. The series is converted into a trajectory matrix $X$ of size $L \times K$, with $K = N - L + 1$, whose $j$-th column is the lagged vector $(f_j, f_{j+1}, \ldots, f_{j+L-1})^T$. The parameter $L$ is chosen such that $2 < L < N/2$.
The next step is the Singular Value Decomposition (SVD) of the trajectory matrix $X$. The eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_L)$ of the symmetric matrix $S = XX^T$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_L \geq 0$, are obtained from $|S - \lambda I| = 0$, and $(U_1, U_2, \ldots, U_L)$ are the eigenvectors of $S$ corresponding to these eigenvalues [11]. Let $d$ be the rank of the matrix $X$, that is, the number of nonzero eigenvalues. If the principal components are denoted by $V_i = X^T U_i / \sqrt{\lambda_i}$ for $i = 1, 2, \ldots, d$, then the SVD of the trajectory matrix $X$ can be written as shown in Equation (1),
$$X = X_1 + X_2 + \cdots + X_d \qquad (1)$$
with $X_i = \sqrt{\lambda_i} U_i V_i^T$, where each matrix $X_i$ has rank 1. Therefore, the matrix $X_i$ is an elementary matrix, and the collection $(\sqrt{\lambda_i}, U_i, V_i)$ is called the $i$-th eigentriple of the SVD [13].
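The embedding and SVD steps can be illustrated with a minimal NumPy sketch. The function names `trajectory_matrix` and `eigentriples`, the toy series, and the numerical rank tolerance are assumptions made for illustration and are not part of the original study; the sketch relies on the fact that the singular values of $X$ are the square roots of the eigenvalues of $S = XX^T$.

```python
import numpy as np

def trajectory_matrix(series, L):
    """Embed a series of length N into an L x K matrix, K = N - L + 1."""
    f = np.asarray(series, dtype=float)
    N = f.size
    if not 2 < L < N / 2:
        raise ValueError("window length must satisfy 2 < L < N/2")
    K = N - L + 1
    # Column j is the lagged vector (f_j, ..., f_{j+L-1}).
    return np.column_stack([f[j:j + L] for j in range(K)])

def eigentriples(X, tol=1e-10):
    """Return (sqrt(lambda_i), U_i, V_i) for the d nonzero singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = int(np.sum(s > tol * s[0]))  # numerical rank (tolerance is an assumption)
    return s[:d], U[:, :d], Vt[:d].T

# Toy series (illustrative only): linear trend plus a period-12 cycle.
t = np.arange(48)
toy = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)

X = trajectory_matrix(toy, L=12)
s, U, V = eigentriples(X)

# Elementary matrices X_i = sqrt(lambda_i) U_i V_i^T; their sum is X, as in Equation (1).
X_elem = [s[i] * np.outer(U[:, i], V[:, i]) for i in range(len(s))]
```

Using the SVD of $X$ directly, rather than the eigendecomposition of $S$, is a design choice of this sketch; both give the same eigentriples up to sign.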

Reconstruction
The reconstruction stage creates new time series by grouping and diagonal averaging. The parameter used at this stage is the grouping effect ($r$), which is used to identify the patterns in the data plot. Grouping partitions the set of indices $\{1, 2, \ldots, d\}$ into $m$ mutually exclusive subsets denoted by $I_1, I_2, \ldots, I_m$. Suppose $I = \{i_1, i_2, \ldots, i_p\}$; then the resultant matrix $X_I$ corresponding to group $I$ is defined as $X_I = X_{i_1} + \cdots + X_{i_p}$. This matrix is calculated for $I = I_1, \ldots, I_m$, and the expansion in Equation (1) leads to the decomposition in Equation (2) [14],
$$X = X_{I_1} + X_{I_2} + \cdots + X_{I_m} \qquad (2)$$
The process of selecting the sets $I_1, I_2, \ldots, I_m$ is called eigentriple grouping. If $m = d$ and $I_j = \{j\}$, $j = 1, 2, \ldots, d$, then the corresponding grouping is called elementary [13].
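The last step in the SSA method is to transform each matrix $X_{I_k}$ of the grouped decomposition in Equation (2) into a new series of length $N$. Suppose $Y$ is a matrix of size $L \times K$ with elements $y_{ij}$, $1 \leq i \leq L$, $1 \leq j \leq K$. Let $L^* = \min(L, K)$, $K^* = \max(L, K)$, and $N = L + K - 1$, with $y^*_{ij} = y_{ij}$ if $L < K$ and $y^*_{ij} = y_{ji}$ otherwise. Based on these definitions, diagonal averaging transfers the matrix $Y$ to the series $g_0, g_1, \ldots, g_{N-1}$ using Equation (3) [9],
$$g_k = \begin{cases} \dfrac{1}{k+1} \sum_{l=1}^{k+1} y^*_{l,\,k-l+2}, & 0 \leq k < L^* - 1 \\ \dfrac{1}{L^*} \sum_{l=1}^{L^*} y^*_{l,\,k-l+2}, & L^* - 1 \leq k < K^* \\ \dfrac{1}{N-k} \sum_{l=k-K^*+2}^{N-K^*+1} y^*_{l,\,k-l+2}, & K^* \leq k < N \end{cases} \qquad (3)$$
Equation (3) corresponds to averaging the matrix elements over the anti-diagonals $i + j = k + 2$: $k = 0$ gives $g_0 = y_{11}$, $k = 1$ gives $g_1 = (y_{12} + y_{21})/2$, and so on. Note that if the matrix $Y$ is the trajectory matrix of some series $(h_0, \ldots, h_{N-1})$, then $g_i = h_i$ for all $i$. If the diagonal averaging in Equation (3) is applied to each resultant matrix $X_{I_k}$, a reconstructed series of length $N$ is obtained for each group, and the original series is the sum of these $m$ reconstructed series.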
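Diagonal averaging can be sketched in a few lines of NumPy. The function name `diagonal_average` and the small example matrices are assumptions for illustration; the code uses 0-based indices, so the anti-diagonal $i + j = k + 2$ in the text becomes `i + j == k`.

```python
import numpy as np

def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    Y = np.asarray(Y, dtype=float)
    L, K = Y.shape
    N = L + K - 1
    g = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        # Row i contributes its K entries to positions k = i, ..., i + K - 1.
        g[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return g / counts

# Small worked example: g_0 = 1, g_1 = (2 + 4) / 2 = 3, g_2 = (3 + 5) / 2 = 4, g_3 = 6.
g = diagonal_average(np.array([[1.0, 2.0, 3.0],
                               [4.0, 5.0, 6.0]]))

# A trajectory (Hankel) matrix is mapped back to the series that generated it.
h = np.arange(10.0)
H = np.column_stack([h[j:j + 4] for j in range(7)])
h_back = diagonal_average(H)
```

Accumulating sums and counts row by row handles the three cases of the averaging formula at once, because the count of elements on each anti-diagonal is tracked explicitly.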

Neural Network (NN)
The NN method has been widely developed and can be used for forecasting from past patterns because of its ability to remember and generalize from what it has seen before, its ability to learn, and its tolerance to errors, which make it possible to build a system that is robust to damage and works consistently well [9]. The NN architecture most widely used and applied is the Multilayer Perceptron (MLP), also known as the Feedforward Neural Network (FFNN). In statistical modeling, the FFNN can be viewed as a flexible class of nonlinear functions. For a special form of FFNN with one hidden layer consisting of $q$ neuron units and an output layer consisting of only one neuron unit, the response or output value $\hat{y}$ is calculated by Equation (5) [15]:
$$\hat{y} = f^o\left[\sum_{j=1}^{q} w_j^o f_j^h\left(\sum_{i=1}^{p} w_{ji}^h x_i + b_j^h\right) + b^o\right] \qquad (5)$$
with
$x_i$: input variable $i$, $i = 1, 2, \ldots, p$;
$w_{ji}^h$: weight from input $i$ to neuron $j$ of the hidden layer;
$b_j^h$: bias of neuron $j$ of the hidden layer;
$f_j^h$: activation function of neuron $j$ of the hidden layer;
$w_j^o$: weight from neuron $j$ of the hidden layer to the output layer;
$b^o$: bias of the neuron in the output layer;
$f^o$: activation function in the output layer.
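The forward pass of a single-hidden-layer FFNN with one output neuron can be sketched as follows. The function name `ffnn_forward`, the random weights, and the identity (linear) output activation are assumptions for illustration; the tanh hidden activation follows the choice reported later in this paper, and the 6-8-1 shape is used only as an example size.

```python
import numpy as np

def ffnn_forward(x, W_h, b_h, w_o, b_o, f_h=np.tanh, f_o=lambda z: z):
    """Output of a p-q-1 feedforward network for one input vector x.

    W_h: (q, p) input-to-hidden weights, b_h: (q,) hidden biases,
    w_o: (q,) hidden-to-output weights, b_o: scalar output bias.
    f_o defaults to the identity, which is an assumption of this sketch.
    """
    hidden = f_h(W_h @ x + b_h)      # activations of the q hidden neurons
    return f_o(w_o @ hidden + b_o)   # single output neuron

# Illustrative 6-8-1 network with random (untrained) weights.
rng = np.random.default_rng(0)
p, q = 6, 8
x = rng.normal(size=p)
W_h = rng.normal(size=(q, p))
b_h = rng.normal(size=q)
w_o = rng.normal(size=q)
b_o = 0.3

y_hat = ffnn_forward(x, W_h, b_h, w_o, b_o)
```

In practice the weights would be estimated by training on the lagged component series; this sketch only evaluates the network function for given weights.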

Hybrid Singular Spectrum Analysis-Neural Network (SSA-NN)
The Hybrid SSA-NN method is applied by decomposing the time series into trend, seasonal, and noise patterns. The results of the decomposition are then forecast using the FFNN method. Forecasting is done in aggregate, that is, by summing the components that have the same pattern so that only three main patterns are formed, namely trend, seasonality, and noise. If the resulting noise component is white noise, forecasting it is not necessary. The final forecast is the sum of the forecasts of all components, combined into one time series [9].

Model Goodness Evaluation
The goodness of the model is evaluated to determine how well the model performs in predicting several future periods. The evaluation looks at the accuracy of the forecasting results based on the Mean Absolute Percentage Error (MAPE) value [16]. MAPE is the average absolute percentage error of the forecasts relative to the actual values. The MAPE value can be obtained using Equation (7) [9],
$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\% \qquad (7)$$

YUNITASARI, T., ET AL
with $y_t$: the actual value for period $t$; $\hat{y}_t$: the predicted value for period $t$; $n$: the number of samples.
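The MAPE calculation can be sketched as a short function. The function name `mape` and the three example values are assumptions for illustration and are not data from this study.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Illustrative values: the absolute percentage errors are 10%, 5%, and 0%,
# so the MAPE is approximately 5%.
example = mape([100, 200, 400], [110, 190, 400])
```

The accuracy rate quoted alongside MAPE in this paper corresponds to 100% minus the MAPE value.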

METHODOLOGY
The data used in this study are the number of passengers on domestic flights at Jenderal Ahmad Yani International Airport, Semarang. The data are monthly, cover January 2006 to December 2021, and were obtained from the official website of the Badan Pusat Statistik (BPS) of Central Java Province [5].
The procedure for forecasting the number of passengers at Jenderal Ahmad Yani Semarang International Airport using the Hybrid SSA-NN method is as follows:
1. Describing the data on the number of domestic flight passengers at Jenderal Ahmad Yani International Airport in Semarang from January 2006 to December 2021;
2. Decomposing the data using the SSA model, which includes the stages of embedding, singular value decomposition, grouping, and diagonal averaging;
3. Modeling the decomposed data (trend, seasonal, and noise components) using a neural network architecture;
4. Calculating the accuracy of the SSA-NN model's prediction results;
5. Forecasting the number of domestic flight passengers at Jenderal Ahmad Yani International Airport in Semarang for the next 12 periods.

RESULTS AND DISCUSSION
The characteristics of the data on the number of domestic flight passengers at Jenderal Ahmad Yani Semarang International Airport from January 2006 to December 2021 can be seen in Figure 1. In the period up to 2018, the number of domestic flight passengers tends to increase during holiday seasons such as Eid al-Fitr, Christmas, and New Year's Eve. However, the situation changed in 2019, when the number of passengers began to decline. Angkasa Pura Corporation stated that the decrease in the number of passengers in 2019 was due to high ticket prices, which reduced public interest in purchasing tickets. In addition, the significant decrease in the number of flight passengers in 2020 was caused by the COVID-19 pandemic that reached Indonesia.

Decomposition
The decomposition stage begins with embedding. The embedding process is carried out by determining the value of the window length parameter ($L$), with $2 < L < N/2$, through trial and error. The data used in this study consist of 192 observations, so the value of $L$ lies between 2 and 96. Trial and error is carried out for $L$ = 10, 20, 30, 40, 50, 60, 70, 80, and 90, and the value of $L$ with the smallest MAPE is selected. The results of the trial and error are presented in Table 1. Table 1 shows that the best window length is $L = 40$, which gives the smallest MAPE value of 12.39%. Next, $K = N - L + 1 = 192 - 40 + 1 = 153$ is used to form a trajectory matrix of order $L \times K$. The trajectory (Hankel) matrix $X$ is then arranged and used to obtain the eigentriples by forming the symmetric matrix $S = XX^T$. The resulting symmetric matrix $S$, of order $L \times L$, is used to calculate the eigentriples, and the eigentriples are used to calculate the principal component values, the results of which are shown in Table 2.
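The dimensions quoted above can be checked with a short sketch. The placeholder series (the integers 0 to 191) stands in for the 192 monthly observations, which are not reproduced here; the variable names are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 192                          # number of monthly observations
candidates = range(10, 100, 10)  # L = 10, 20, ..., 90

# Trajectory matrix shape (L, K) for every candidate satisfying 2 < L < N/2.
shapes = {L: (L, N - L + 1) for L in candidates if 2 < L < N / 2}

# Placeholder series, not the passenger data.
series = np.arange(N, dtype=float)

L = 40
X = sliding_window_view(series, L).T  # Hankel trajectory matrix, shape (40, 153)
S = X @ X.T                           # symmetric matrix of order L x L
```

Here `shapes[40]` evaluates to `(40, 153)`, matching $K = 192 - 40 + 1 = 153$, and `S` has shape `(40, 40)`.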

Reconstruction
The reconstruction stage is carried out by grouping the eigentriples into trend, seasonal, and noise components based on the grouping effect parameter ($r$). The value of $r$ is determined from the number of eigentriples that do not reflect noise in the singular value plot. The singular value plot based on the 40 eigentriples is presented in Figure 2. The singular values in Figure 2 show a slowly decreasing pattern for eigenvectors 21 to 40, which indicates that eigenvectors 21 to 40 are grouped as the noise component, so that $r = 20$. The eigenvectors used to group the trend and seasonal components are therefore the first twenty eigenvectors, presented in Figure 3. Based on the reconstructed series plot in Figure 3, the series reconstructed from eigentriples 1, 2, and 3 contain slowly varying components, so eigentriples 1, 2, and 3 are grouped into the trend component. The grouping of eigentriples related to seasonality is based on the similarity of the singular values of consecutive eigentriples. In the reconstructed series plot, similar singular values mean that the series reconstructed from one eigentriple has the same seasonal pattern and period as the series reconstructed from the others. The consecutive eigentriples with similar patterns are eigentriples 4 and 5, eigentriple 6, eigentriples 7 and 8, eigentriple 9, eigentriples 10 and 11, eigentriples 12 and 13, eigentriples 14 and 15, eigentriples 16 and 17, eigentriples 18 and 19, and eigentriple 20, so eigentriples 4 to 20 are grouped into the seasonal component.
The final step in the reconstruction is diagonal averaging. Diagonal averaging is performed on the summed reconstruction results for each component. The diagonal averaging produces three time series patterns, namely trend, seasonal, and noise, as shown in Figure 4. The trend, seasonal, and noise series obtained from the SSA reconstruction are then processed further using the NN method. Applying the NN method to the reconstruction results is expected to improve forecasting accuracy.
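The grouping described above can be sketched on synthetic data. The series below is randomly generated and is not the passenger data; the function name `ssa_components` is an assumption, and reading the seasonal group as eigentriples 4 to 20 is an interpretation of the text. Indices in the code are 0-based, so eigentriples 1 to 3 become `range(0, 3)`.

```python
import numpy as np

def ssa_components(series, L, groups):
    """Reconstruct one series per group of eigentriple indices (0-based)."""
    f = np.asarray(series, dtype=float)
    K = f.size - L + 1
    X = np.column_stack([f[j:j + L] for j in range(K)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    out = {}
    for name, idx in groups.items():
        idx = list(idx)
        X_I = (U[:, idx] * s[idx]) @ Vt[idx]  # sum of the grouped elementary matrices
        # Diagonal averaging: anti-diagonals of X_I are diagonals of the row-flipped matrix.
        out[name] = np.array([X_I[::-1].diagonal(k).mean() for k in range(-L + 1, K)])
    return out

# Synthetic monthly series of length 192 (illustrative only).
rng = np.random.default_rng(42)
t = np.arange(192)
series = 100 + 0.8 * t + 15 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=5, size=192)

groups = {
    "trend": range(0, 3),      # eigentriples 1-3
    "seasonal": range(3, 20),  # eigentriples 4-20
    "noise": range(20, 40),    # eigentriples 21-40
}
components = ssa_components(series, L=40, groups=groups)
total = components["trend"] + components["seasonal"] + components["noise"]
```

Because the three groups together cover all 40 eigentriples, `total` reproduces the input series up to floating-point error, which is a convenient sanity check on any grouping.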

Hybrid Singular Spectrum Analysis-Neural Network (SSA-NN)
The input variables for the NN architecture of each component (trend, seasonal, noise) are determined by observing the ACF and PACF plots of the stationary series. The significant lags in the PACF plot are used as input variables for the NN architecture. The NN architecture is formed using one hidden layer with the tanh activation function. The number of neurons in the hidden layer is determined using the cross-validation method, with the number of hidden neurons tried ranging from 1 to 15. The best NN architecture is determined based on the smallest MAPE value. The training results of the NN for the three components are shown in Table 5.
Based on Table 5, the best architecture for the trend component uses input lags 1, 2, 4, 5, 6, and 13 with 8 neurons in the hidden layer, which produces the smallest MAPE value of 1.75%. The best architecture for the seasonal component uses input lags 1, 5, 6, 9, 10, 12, 13, 14, 15, 16, and 18 with 15 neurons in the hidden layer, which produces the smallest MAPE value of 3.91%. The best architecture for the noise component uses input lags 1, 2, 3, 4, 6, 7, 8, 9, 10, and 12 with 15 neurons in the hidden layer, which produces the smallest MAPE value of 5.44%.
The forecasting results of the three component architectures are then summed to obtain the final Hybrid SSA-NN forecast. The prediction results of the Hybrid SSA-NN method are evaluated on the data on the number of domestic flight passengers at Jenderal Ahmad Yani International Airport in Semarang, with the MAPE value as the criterion for the goodness of the model. The comparison of prediction accuracy can be seen in Table 6. Table 6 shows that prediction using the Hybrid SSA-NN method results in a MAPE value of 0.54%. This value is lower than that of the SSA method. Therefore, the Hybrid SSA-NN method is the better choice for forecasting the number of domestic airline passengers at Jenderal Ahmad Yani International Airport Semarang. The forecasting results of both methods are also presented in Figure 7.
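Turning a set of significant lags into NN inputs can be sketched as follows. The function name `lagged_design` and the placeholder series are assumptions for illustration; the lag set (1, 2, 4, 5, 6, 13) is the one reported for the trend component in this section, which gives the 6 input neurons of the 6-8-1 architecture.

```python
import numpy as np

def lagged_design(series, lags):
    """Build the input matrix and target vector for the chosen lags.

    Row t of the inputs holds (y_{t-lag} for lag in lags); the target is y_t.
    """
    y = np.asarray(series, dtype=float)
    max_lag = max(lags)
    targets = y[max_lag:]
    inputs = np.column_stack([y[max_lag - lag:len(y) - lag] for lag in lags])
    return inputs, targets

# Placeholder series of length 192 (the integers 0..191), not the passenger data.
series = np.arange(192, dtype=float)
trend_lags = (1, 2, 4, 5, 6, 13)

inputs, targets = lagged_design(series, trend_lags)
```

With 192 observations and a maximum lag of 13, this yields 179 training rows with 6 inputs each; the first 13 observations are used only as lagged inputs.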

CONCLUSION
The number of passengers at Jenderal Ahmad Yani Semarang International Airport has fluctuated due to the Covid-19 pandemic. The best model for forecasting is the Hybrid SSA-NN model with a window length of 40 and a network architecture of 6-8-1 (6 input neurons, 8 hidden neurons, and 1 output neuron) for the trend component, 11-15-1 (11 input neurons, 15 hidden neurons, and 1 output neuron) for the seasonal component, and 10-15-1 (10 input neurons, 15 hidden neurons, and 1 output neuron) for the noise component. The model produces a MAPE value of 0.54%, or an accuracy rate of 99.46%, which indicates that the prediction results are very good.