FOURIER SERIES APPLICATION FOR MODELING “CHOCOLATE” KEYWORD SEARCH TRENDS IN GOOGLE TRENDS DATA

In regression modeling, it is common to encounter data with a repeating pattern. To model such data, the approach used must match the characteristics of the data. The Fourier series is one proposed approach because it is well suited to relationship patterns that tend to repeat, such as sine and cosine waves. Fourier series regression is a form of nonparametric regression, which offers good flexibility in modeling. In this study, the Fourier series approach was applied to model search trend data for the keyword "Chocolate" sourced from Google Trends. Generalized Cross-Validation (GCV) is used as the model evaluation criterion. Based on the results of the analysis, the best Fourier series nonparametric regression model is obtained with five oscillations, as indicated by the minimum GCV value.


INTRODUCTION
The Fourier Series is one of the most widely used nonparametric regression approaches. It was introduced by Bilodeau (1992) as a trigonometric polynomial that is flexible enough to adjust to and estimate the regression curve effectively [1], [2]. The Fourier Series tends to give good estimation results when applied to a regression curve with a repeating pattern [3]-[5].
Because the Fourier Series is a nonparametric regression approach, it can also be used to model the relationship between the dependent variable and the independent variable when the form of that relationship is unknown [6]-[8]. This is a distinct feature of nonparametric regression [9]. In nonparametric regression, the data are expected to find their own form of estimation without being influenced by the subjectivity of the researcher [10]-[12].
In the Fourier Series, there is a smoothing parameter generally referred to as the oscillation parameter [1], [13]. This parameter captures repeated changes in data patterns [14]-[16], so determining its optimal value is crucial [17]. If the number of oscillations is too high, the number of parameters to be estimated grows and the resulting regression model is no longer parsimonious [14], [18]. If it is too low, the Fourier series estimate of the regression curve cannot capture the local behavior of the data. Several studies have examined and developed the Fourier series approach. Dani and Adrianingsih (2020) studied nonparametric regression modeling with truncated spline and Fourier series estimators [3]. Octavanny et al. (2021) modeled children ever born in Indonesia using Fourier series nonparametric regression [5]. Furthermore, Dani et al. (2022) reviewed simulation studies and applications of the Fourier Series estimator in nonparametric regression modeling [19].
This study applies the Fourier Series approach to Google Trends data. Google is a very popular and frequently used search engine, so its search data can be used to observe trends in society. Google Trends is an official Google page that captures events in the community based on search keywords.
The search keyword examined in this article is "Chocolate". Chocolate is a processed food or drink made from cocoa beans, and it can also be used as a food ingredient [20]. Chocolate is commonly given as a gift or parcel on days of celebration. Diksa shows that the keyword "Chocolate" appears in Google Trends searches at certain times and in certain seasons [21]. Using Google Trends, [20] also presents predictions of chocolate bread production using the Double Exponential Smoothing method. This research aims to model the search trend for the keyword "Chocolate", a timely topic whose data tend to show repeating patterns. The optimal number of oscillations is selected using Generalized Cross-Validation (GCV).

LITERATURE REVIEW

Nonparametric Regression
Nonparametric regression is used when the form of the relationship between the dependent variable and the independent variable is unknown [7], [22]-[24]. Suppose there are paired data $(x_i, y_i)$, where $i = 1, 2, \dots, n$. The nonparametric regression model is then generally presented as in Equation (1):

$$y_i = f(x_i) + \varepsilon_i \tag{1}$$

where $y_i$ is the response variable, $x_i$ is the predictor variable, and $\varepsilon_i$ is the error, which is assumed to be independent and identically normally distributed with zero mean and variance $\sigma^2$, while $f(x_i)$ is a regression function whose form is unknown [25], [26]. In this study, $f(x_i)$ is modeled as a Fourier Series function.

Fourier Series
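The Fourier Series is a trigonometric polynomial that is flexible enough to adjust to and estimate the regression curve effectively [27], [28]. Its performance depends heavily on determining the oscillation parameter precisely [5], [29], [30]. An illustration of the repeating pattern and oscillation parameters is shown in Figure 1.

Suppose we are given $n$ independent observations, namely data pairs $(x_i, y_i)$ where $i = 1, 2, \dots, n$. The relationship between $x_i$ and $y_i$ is as in Equation (1). Based on Equation (1), the regression curve $f(x_i)$ can be estimated using the Fourier Series estimator with the cosine component as follows:

$$f(x_i) = b x_i + \frac{1}{2} a_0 + \sum_{k=1}^{K} a_k \cos k x_i \tag{2}$$

where $K$ is the number of oscillations. If Equation (2) is substituted into Equation (1), then

$$y_i = b x_i + \frac{1}{2} a_0 + \sum_{k=1}^{K} a_k \cos k x_i + \varepsilon_i \tag{3}$$

Applying Equation (3) for $i = 1, 2, \dots, n$ yields a system that can be written in matrix form as in Equation (4):

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{4}$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} x_1 & \tfrac{1}{2} & \cos x_1 & \cdots & \cos K x_1 \\ x_2 & \tfrac{1}{2} & \cos x_2 & \cdots & \cos K x_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & \tfrac{1}{2} & \cos x_n & \cdots & \cos K x_n \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} b \\ a_0 \\ a_1 \\ \vdots \\ a_K \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

Using Maximum Likelihood Estimation (MLE) [31], an estimate of the parameter vector $\boldsymbol{\beta}$ is obtained, namely:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \tag{5}$$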
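The estimator described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: it assumes the cosine form $f(x) = bx + \frac{1}{2}a_0 + \sum_{k=1}^{K} a_k \cos kx$ with coefficients ordered as $(b, a_0, a_1, \dots, a_K)$, the function names `fourier_design` and `fit_fourier` are hypothetical, and the data are synthetic.

```python
import numpy as np

def fourier_design(x, K):
    """Design matrix with columns [x, 1/2, cos(x), ..., cos(Kx)] (assumed layout)."""
    x = np.asarray(x, dtype=float)
    cols = [x, np.full_like(x, 0.5)]
    cols += [np.cos(k * x) for k in range(1, K + 1)]
    return np.column_stack(cols)

def fit_fourier(x, y, K):
    """Least-squares estimate, which coincides with the MLE under i.i.d. normal errors."""
    X = fourier_design(x, K)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat, X @ beta_hat

# Synthetic illustration (not the Google Trends data):
# true curve has b = 0.3, a_0 = 2.0 (so a_0 / 2 = 1.0) and a_2 = 2.0.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = 0.3 * x + 1.0 + 2.0 * np.cos(2 * x) + rng.normal(scale=0.1, size=x.size)

beta_hat, y_hat = fit_fourier(x, y, K=3)
print(np.round(beta_hat, 2))  # order: b, a_0, a_1, a_2, a_3
```

`np.linalg.lstsq` is used rather than forming $(\mathbf{X}^{T}\mathbf{X})^{-1}$ explicitly because it is numerically more stable while giving the same solution when $\mathbf{X}$ has full column rank.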

Generalized Cross-Validation (GCV)
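The GCV method has several advantages over other methods, such as Cross-Validation (CV) and the Unbiased Risk (UBR) method. Theoretically, the GCV method is asymptotically optimal [32]. In addition, its formula does not contain the variance $\sigma^2$, and it is invariant under transformation [12]. The GCV function for selecting the optimal oscillation parameter is shown in Equation (6):

$$GCV(K) = \frac{MSE(K)}{\left(n^{-1}\,\mathrm{trace}\left[\mathbf{I} - \mathbf{A}(K)\right]\right)^{2}} \tag{6}$$

where

$$MSE(K) = n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}, \qquad \mathbf{A}(K) = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$$

Here $K$ is the number of oscillations, $\hat{y}_i$ is the fitted value, $\mathbf{X}$ is the design matrix of Equation (4) built with $K$ oscillations, and $\mathbf{A}(K)$ is the hat matrix, so that $\hat{\mathbf{y}} = \mathbf{A}(K)\mathbf{y}$.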
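As a hedged sketch of how the GCV criterion can be computed, the Python function below assumes the common form $GCV = MSE / (n^{-1}\,\mathrm{trace}[\mathbf{I} - \mathbf{A}])^2$ with hat matrix $\mathbf{A}$ for a linear-in-parameters model. The function name `gcv` and the synthetic data are illustrative assumptions, not part of the study.

```python
import numpy as np

def gcv(X, y):
    """GCV = MSE / (trace(I - A) / n)^2, where A is the hat matrix of X."""
    n = len(y)
    A = X @ np.linalg.pinv(X)        # projection onto the column space of X
    resid = y - A @ y
    mse = resid @ resid / n
    return mse / ((n - np.trace(A)) / n) ** 2

# Synthetic check with an intercept and one predictor (illustrative only)
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

value = gcv(X, y)
print(value)
```

For a full-rank design with $p$ columns, $\mathrm{trace}(\mathbf{A}) = p$, so this expression reduces to $n \cdot RSS / (n - p)^2$, which makes explicit how GCV penalizes additional parameters.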

METHODOLOGY

Data and Data Source
The data used in this study are secondary data in the form of a weekly series from January 2019 to January 2023. The data are collected by the Google search engine and accessed via the Google Trends page. The research variables are divided into two groups, namely the independent variables and the dependent variable. The independent variables used in this study are time, $t = 1, 2, \dots, n$, and the first lag of the dependent variable, $y_{t-1}$. The first lag is used based on the PACF plot, which cuts off after lag 1.
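The lag selection described above can be illustrated with a short Python sketch. It estimates the partial autocorrelation at lag $k$ as the last coefficient of an AR($k$) model fitted by ordinary least squares, which is one of several standard PACF estimators and not necessarily the one used in the study. The function name `pacf_ols` is hypothetical, and the series is a simulated AR(1) process rather than the Google Trends data.

```python
import numpy as np

def pacf_ols(y, max_lag):
    """Partial autocorrelation at lags 1..max_lag via successive AR(k) OLS fits."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    out = []
    for k in range(1, max_lag + 1):
        # Columns are y_{t-1}, ..., y_{t-k}; the target is y_t for t = k, ..., n-1
        Z = np.column_stack([y[k - j:len(y) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(Z, y[k:], rcond=None)
        out.append(coef[-1])         # last AR(k) coefficient = PACF at lag k
    return np.array(out)

# Simulated AR(1) series with coefficient 0.7 (illustrative only)
rng = np.random.default_rng(2)
n = 500
e = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + e[t]

pacf_vals = pacf_ols(series, max_lag=3)
print(np.round(pacf_vals, 2))  # a cut-off after lag 1 suggests using only y_{t-1}

# Build the regression variables: response y_t, predictors t and y_{t-1}
response = series[1:]
lag1 = series[:-1]
time_index = np.arange(2, n + 1)
```

Using the first lag costs one observation, so the regression is fitted on $n - 1$ rows.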

Steps of Analysis
The stages of modeling using the Fourier series approach are as follows:
1. Explore the data by creating time series graphs.
2. Create a scatter diagram of the dependent variable against each independent variable.
3. Model the data using nonparametric regression with the Fourier series approach. The oscillation parameters tested in this study are limited to $K = 1, 2, 3, 4, 5$.
4. Select the optimal number of oscillations based on the minimum GCV value.
5. Visualize the actual data and the predictions from the best model.
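Steps 3 and 4 above can be sketched in Python as a loop over candidate oscillation counts. Several things here are assumptions made for illustration: the additive cosine form with one linear term per predictor, the rescaling of each predictor to $[0, \pi]$, the helper names `design`, `gcv`, and `to_0_pi`, and the synthetic series with a 52-week cycle that stands in for the Google Trends data.

```python
import numpy as np

def design(predictors, K):
    """Additive cosine basis: a 1/2 column, then [x, cos(x), ..., cos(Kx)] per predictor."""
    cols = [np.full(len(predictors[0]), 0.5)]
    for x in predictors:
        cols.append(x)
        cols += [np.cos(k * x) for k in range(1, K + 1)]
    return np.column_stack(cols)

def gcv(X, y):
    """GCV = MSE / (trace(I - A) / n)^2 with hat matrix A."""
    n = len(y)
    A = X @ np.linalg.pinv(X)
    resid = y - A @ y
    return (resid @ resid / n) / ((n - np.trace(A)) / n) ** 2

def to_0_pi(x):
    """Rescale to [0, pi]; an illustrative choice, not taken from the study."""
    return np.pi * (x - x.min()) / (x.max() - x.min())

# Synthetic weekly series with an annual cycle (not the actual data)
rng = np.random.default_rng(3)
n = 210
week = np.arange(1, n + 1, dtype=float)
series = 50 + 20 * np.cos(2 * np.pi * week / 52) + rng.normal(scale=5, size=n)

y = series[1:]                                   # response y_t
predictors = [to_0_pi(week[1:]), to_0_pi(series[:-1])]  # time and first lag

# Step 3: fit for K = 1..5; Step 4: keep the K with the smallest GCV
gcv_by_K = {K: gcv(design(predictors, K), y) for K in range(1, 6)}
best_K = min(gcv_by_K, key=gcv_by_K.get)
print(gcv_by_K, best_K)
```

The chosen `best_K` depends on the data and on how the predictors are scaled, so the result on this synthetic series says nothing about the value reported in the study.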

RESULTS AND DISCUSSION
In this section, we will describe the analysis results and discussion of modeling the search trend for the keyword "Chocolate" using the Fourier series approach.

Data Exploration
Exploring time series data makes it possible to see patterns of change in the data. The time series graph is presented in Figure 2 and covers each week from January 2019 to January 2023. Based on Figure 2, the search trend for the keyword "Chocolate" shows a recurring pattern. This repeating pattern is thought to be caused by Valentine's Day in February and by Christmas and New Year at the end of each year.

Scatter Diagram
In conducting regression modeling, it is necessary to know in advance the pattern of the relationship between the independent variables and the dependent variable. If the relationship pattern between the independent and dependent variables is unknown, then a nonparametric regression approach can be used. Analysis of data patterns between the dependent variable and each independent variable is shown in Figure 3.

Modeling with Fourier series
In Fourier series nonparametric regression modeling, the first step is to determine the number of oscillations used. Here the number of oscillations is limited to between 1 and 5. The results of the Fourier series nonparametric regression modeling are shown in Table 2. Based on Table 2, the minimum GCV value of 15647.74 is obtained when five oscillations are used, with a coefficient of determination of 60.91%. Note that more than 5 oscillations could be tried, but this directly increases the number of parameters that need to be estimated, so the resulting model would no longer be parsimonious.
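The two quantities discussed above, the coefficient of determination and the growth in the number of parameters, can be sketched in Python as follows. The parameter count assumes an additive cosine form with one shared constant term and, for each predictor, one linear term plus $K$ cosine terms. This layout is an assumption for illustration and the function names are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def n_params(n_predictors, K):
    """Parameter count under the assumed additive cosine form."""
    return 1 + n_predictors * (K + 1)

# With two predictors (time and first lag), each extra oscillation adds two parameters
for K in range(1, 6):
    print(K, n_params(2, K))
```

Under this assumed layout, moving from one to five oscillations with two predictors raises the count from 5 to 13 parameters, which illustrates why the number of oscillations tested is capped.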
The best Fourier series nonparametric regression model, with five oscillations, can be written:

A graphical comparison of the actual data with the predictions from the best Fourier series nonparametric regression model with five oscillations is shown in Figure 4. Based on Figure 4, the predictions from the model tend to follow the actual data, although in some parts the gaps between actual and predicted values are still quite large.

CONCLUSION
Based on the analysis results, the best Fourier series nonparametric regression model is obtained with five oscillations, as indicated by the minimum GCV value, and has a coefficient of determination of 60.91%. The comparison graph of actual and predicted data shows that the predictions from the Fourier series nonparametric regression model tend to follow the actual data.
For further research, we suggest using other Fourier components, for example by combining sine and cosine components. This study could also be improved by incorporating higher lags of Y as independent variables, chosen after an initial inspection of lag plots. Moreover, dummy variables for special events or outliers could be considered.

ACKNOWLEDGMENT
The author is very grateful for the facilities from the Ministry of Education, Culture, Research and Technology (KEMENDIKBUDRISTEK), especially to Mulawarman University.