1 Introduction
Many natural processes evolve nonlinearly, including meteorology [1], solar activity [2], hydrology [3], and DNA sequences [4]. These chaotic systems contain rich dynamic information, and their evolution over time in each dimension forms chaotic time series data. Mining these data yields valuable results, such as predictions of future values and summaries of past behavior, and a growing number of scholars have applied such results in the natural and social sciences.
Time series prediction has traditionally relied on statistical methods, including the auto regression (AR) model, moving average (MA) model, auto regression moving average (ARMA) model, and auto regression integrated moving average (ARIMA) model [5]. The time series is first decomposed into distinct components, the post-decomposition patterns of these components are then analyzed, and historical data is finally used to forecast future values. In essence, these methods reuse the data itself to find a recursive law. However, statistical methods achieve precise results only for linear time series and fit nonlinear data, such as chaotic data, poorly [6].
After Rumelhart et al. proposed the backpropagation algorithm for the multi-layer perceptron and used the sigmoid function for nonlinear mapping, neural networks saw a surge of use in nonlinear fitting [7]. Han et al. proposed a new chaotic time series model and a prediction method based on a recurrent neural network (RNN) to realize chaotic time series prediction [8]. This method achieves long-term prediction through accurate multi-step prediction. In essence, it is an extended adaptive backpropagation algorithm, embodied in a recurrent prediction neural network composed of nonlinear function nodes whose outputs connect only to their own inputs and those of subsequent nodes. In 2007, Herrera et al. used the TaSe fuzzy model and the least squares support vector machine for long-term time series prediction based on an iterative prediction strategy [9]. Wang et al. established an online support vector machine model to predict the concentration of air pollutants [10]. However, these traditional neural networks cannot identify chaotic time series in a high-dimensional state. With the rapid development of deep learning, models with high-dimensional nonlinear fitting capability have been proposed. Sangiorgio et al. verified that the long short-term memory (LSTM) network is robust and predicts chaotic time series well [11]. Li et al. built an LSTM prediction model based on chaos theory and empirical mode decomposition [12]; besides highly reliable prediction results, it also shows a certain generalization ability. Zhang et al. improved the accuracy of recirculation flow rate prediction by reconstructing the data in phase space [13]. Zeng et al. used the strong nonlinear fitting capability of neural networks to fit the chaotic sea clutter system after phase space reconstruction, thereby achieving effective prediction [14]. However, these deep-learning-based predictions rely heavily on phase space parameter estimation: different parameters reconstruct different training data, which in turn affects the training of deep neural networks.
Phase space parameter estimation includes delay time estimation and embedding dimension estimation. Common traditional physical methods for estimating phase space parameters include the CAO method [15], false nearest neighbors (FNN) [16], and the C-C method [17]. In the CAO method, the delay time is first obtained using the mutual information method. Since each phase point has a nearest neighbor, the distance between phase points increases with the embedding dimension; once this distance saturates and no longer changes, the current dimension is taken as the embedding dimension. In the FNN method, as the embedding dimension increases, false nearest neighbors gradually disappear once a certain embedding dimension is reached and the chaotic system is unfolded; the embedding dimension is determined at this point. The C-C method relies on the correlation integral function to estimate the delay time $ \tau $ and the delay time window $ {\tau }_{w}=(m-1)\tau $, computes the corresponding statistics, and determines the delay time from them. The embedding dimension and delay time calculated by conventional physical methods tend to be large, leading to both data wastage and suboptimal utilization of phase space dimensions, which ultimately undermines the accuracy of chaotic time series prediction. Li et al. proposed a neural-network-based method to configure phase space parameters rapidly [18]. This method seeks the most effective predictive parameters by mapping the relationship between input data and phase space reconstruction parameters. However, it does not consider the dimensional contribution of the parameters, resulting in significant randomness.
Aiming at the above two problems, we propose a convolutional neural network-long short-term memory (CNN-LSTM) chaotic time series prediction model based on the incremental attention mechanism. Firstly, to optimize the contribution of each dimension of the reconstructed phase space and enhance the predictive accuracy of label mapping, an incremental attention mechanism is employed to traverse and filter phase space parameters, selecting the parameters that best conform to the dimensional weight criteria (DWC). Then, the reconstructed phase space is fed into a CNN-LSTM network to extract spatio-temporal features and produce predictions. CNN excels at extracting spatial features, while LSTM outperforms at extracting temporal features; compared with a single network, the composite network extracts the temporal and spatial characteristics of chaotic sequences more effectively, demonstrating higher accuracy in chaotic time series prediction. Finally, the proposed model is applied to the Logistic system, the Lorenz system, and the sunspot chaotic time series. The simulation results show that the proposed prediction model exhibits superior performance in terms of both root mean square error (RMSE) and mean absolute error (MAE), demonstrating enhanced predictive accuracy.
Section 1 introduces the theoretical evolution of chaotic time series prediction methods from traditional statistics to deep neural networks. Section 2 outlines data processing, focusing on phase space reconstruction and data preprocessing. In Section 3, a novel chaotic time series prediction algorithm based on the incremental attention mechanism is presented, showcasing improved predictive accuracy in the CNN-LSTM network. Section 4 details simulation experiments conducted on three types of chaotic data: The logistic system, Lorenz system, and the sunspot time series. Results indicate that the proposed phase space reconstruction optimization and CNN-LSTM network outperform other methods.
2 Data preprocessing
To keep the data chaotic, a segment of data at the initial moments must be discarded so that transient behavior is excluded. For chaotic time series prediction, phase space reconstruction of the sequence data is an important method for recovering the state of the system. Therefore, the chaotic time series must be reconstructed in phase space before the state data is fed into the training model. In addition, to reduce the influence of dimension, the data needs to be normalized, and the predicted results on the test set must be inverse-normalized.
2.1 Phase space reconstruction
Phase space reconstruction provides the theoretical support for predicting chaotic sequence data. On the one hand, the reconstructed phase space restores the state of the original dynamical system; on the other hand, it provides training data for the input layer of the neural network. According to the coordinate delay method proposed in Ref. [19], one-dimensional (1D) chaotic data can be transformed into multi-dimensional data, fully reconstructing the dimensionality of the dynamical system. Takens [20] subsequently proved from a mathematical point of view that the dynamical system can be reconstructed under an appropriate embedding dimension, making phase space reconstruction realizable. Once the delay time and embedding dimension are determined, the basic principle of reconstruction is as follows:
Assuming a time series of length N:$\left\{ {{x_1}{\mathrm{,}}\; {x_2}{\mathrm{,}}\; \cdots {\mathrm{,}}\; {x_N}} \right\}$, the reconstructed state vector after calculating the delay time $ \tau $ and embedding dimension $ m $ is
$ \mathrm{\mathbf{X}}\left(i\right)=\left[x_i\mathrm{,}\; x_{i+\tau}\mathrm{,}\; \cdots\mathrm{,}\; x_{i+\left(m-1\right)\tau}\right]\quad i=1\mathrm{,}\; 2\mathrm{,}\; \cdots\mathrm{,}\; M $ (1)
where $ M=N-(m-1)\tau $ is the number of reconstructed phase points. The key to whether the reconstructed phase space is suitable for chaos prediction is the determination of the two parameters, delay time and embedding dimension. In the chaotic prediction network, these two parameters determine not only the vector state at each phase point but also the specific format and quantity of the training data. At the same time, different parameters yield different prediction effects. The reconstructed phase point matrix is expressed as follows:
$ {\bf{X}} = \left[ \begin{array}{cccc} {x_1} & {x_{1 + \tau }} & \cdots & {x_{1 + \left( {m - 1} \right)\tau }} \\ {x_2} & {x_{2 + \tau }} & \cdots & {x_{2 + \left( {m - 1} \right)\tau }} \\ \vdots & \vdots & \ddots & \vdots \\ {x_M} & {x_{M + \tau }} & \cdots & {x_{M + \left( {m - 1} \right)\tau }} \end{array} \right] $ (2)
$ {\bf{Y}} = {\left[ {{x_{2 + \left( {m - 1} \right)\tau }}} \quad {{x_{3 + \left( {m - 1} \right)\tau }}} \quad {{x_{4 + \left( {m - 1} \right)\tau }}} \quad \cdots \quad {{x_{M + 1 + \left( {m - 1} \right)\tau }}} \right]^{\text{T}}} {\mathrm{.}} $ (3)
For the reconstructed phase point matrix, each phase point, i.e., each row of the X matrix, is used as one row of the training set, and the corresponding row of Y is used as the predicted label for that training row. Different phase space constructions generate different training sets, and different prediction performances were shown in Ref. [21].
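As a concrete illustration, the following minimal NumPy sketch (function name is ours) builds the training matrix X and label vector Y of (2) and (3) from a 1D series, given m and $ \tau $. Note that the last phase point has no future label, so M − 1 training pairs are formed.

```python
import numpy as np

def reconstruct_phase_space(series, m, tau):
    # Delay-coordinate embedding per (1)-(3): row i of X is the phase
    # point [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}]; the label in Y is the
    # value one step after the last coordinate of that phase point.
    N = len(series)
    M = N - (m - 1) * tau  # number of phase points per the text
    # Only M - 1 pairs are usable, since the last phase point has no label.
    X = np.array([series[i : i + (m - 1) * tau + 1 : tau] for i in range(M - 1)])
    Y = np.array([series[i + (m - 1) * tau + 1] for i in range(M - 1)])
    return X, Y
```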
2.2 Data normalization
Normalizing the data set is an indispensable step when using deep neural networks to predict data. It not only eliminates the influence of dimensions but also promotes model convergence and improves prediction ability. Therefore, the 1D chaotic data is normalized to the range of 0 to 1, and predictions on the test set are inverse-normalized for validation. The normalization function is defined as follows:
$ x' = \frac{{x - {\text{min}}\left( x \right)}}{{{\text{max}}\left( x \right) - {\text{min}}\left( x \right)}}{\mathrm{.}} $ (4)
After the chaotic series data set is normalized, phase space reconstruction can be performed.
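A minimal sketch of (4), together with the inverse transform applied to test-set predictions mentioned above (helper names are ours; x is assumed to be a NumPy array):

```python
def normalize(x):
    # Min-max scaling to [0, 1] per (4); keep min/max to invert later.
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def denormalize(x_scaled, bounds):
    # Inverse transform applied to predictions on the test set.
    x_min, x_max = bounds
    return x_scaled * (x_max - x_min) + x_min
```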
3 Prediction algorithm model
For chaotic series prediction, different neural networks have different advantages. This paper uses a CNN combined with an LSTM network as the main model: CNN extracts spatial features, and the LSTM network further captures temporal correlations. In addition, since traditional phase space reconstruction considers only the original dimensional features and ignores whether the training data fully suits the prediction network, this paper proposes an incremental attention layer to test the phase space parameters and determine the phase space structure best suited to the prediction model.
The algorithm model proposed in this paper is shown in Fig. 1, which mainly includes the following details:

Figure 1. Overall network structure.
1) Traversal layer: Also called the data preprocessing layer, it mainly performs the preprocessing of 1D chaotic data and the traversal reconstruction of phase space. Because the data is chaotic, its phase space parameters are bounded, so a finite number of phase spaces can be traversed and reconstructed by setting maximum parameter values; this layer provides all possible reconstructed phase spaces.
2) Incremental attention network (IAN): It calculates the contribution weight of each input dimension to the predicted label value and judges whether the reconstructed phase space is the data structure best suited for prediction training.
3) CNN: A convolution operation is performed on the input reconstructed phase space, and spatial features are extracted and passed to the subsequent network layers.
4) LSTM: The feature tensor obtained by the convolution layer serves as the input for the LSTM. At this layer, the combination of spatial features and temporal correlations, referred to as spatio-temporal features, is obtained.
3.1 Incremental attention network
IAN is a phase space parameter estimation network layer based on the incremental attention mechanism. It mainly comprises the incremental attention layer, the parameter estimation layer, and the final phase space reconstruction layer. After the traversal layer reconstructs the finite set of phase space parameters, the parameter estimation layer uses DWC to discriminate each reconstructed phase space, where the dimensional weight, quantified by the attention network, measures the contribution of each dimension of the training data. The incremental attention mechanism quantifies weights using the attention mechanism and requires them to meet DWC.
3.1.1 Incremental attention
Like the human eye, the attention mechanism focuses on obtaining information from key parts, and this information strongly guides the next prediction. Its essence is to pay more attention to the parts that matter, improving the prediction accuracy of the model [22]. In a 1D chaotic prediction network, the attention mechanism can be understood as a dimension weight extractor that extracts the contribution weight of each dimension of the training data to the future value prediction. For window training data of dimension m, in order to extract the contribution weight $ \beta $ of each dimension, each window is used as the input of the attention model, and the activation function of the output layer is the softmax. The output is calculated as follows:
$ \alpha_{i}=\frac{\exp \left(e_{i}\right)}{\displaystyle\sum_{k=1}^{m} \exp \left(e_{k}\right)} $ (5)
where $ {\alpha }_{i} $ represents the contribution of the ith dimension to the predicted value, $ {e}_{i}=\sigma (\mathbf{W}{v}_{i}+\mathbf{b}) $, $ \sigma $ is the sigmoid function, $ \mathbf{W} $ is the weight matrix from the input layer to the hidden layer, and $ \mathbf{b} $ is the bias vector.
The IAN model is shown in Fig. 2: the contribution weights are trained and quantized by the attention network, then dot-multiplied with the input data, and the result is used as training data for re-prediction training. In the experiments of this paper, the key parameters of IAN concern two linear layers: the first has input dimension m and output dimension m; the second takes the weighted input of dimension m and produces an output of dimension 1.

Figure 2. Prediction model of IAN.
In the attention network, v1 to vm represent the values of the 1st to the mth dimensions in each input window. After the chaotic mapping of the fully connected layer and the activation function, the weight contribution $ \boldsymbol{\beta} =\left[{\alpha }_{1}{\mathrm{,}}\;{\alpha }_{2}{\mathrm{,}}\;\cdots {\mathrm{,}}\;{\alpha }_{m}\right] $ of each dimension to the predicted future value is obtained. This mapping is trained on a large amount of data, and the weights of each dimension are then averaged to give the final attention weight. Finally, the weights are multiplied element-wise with the corresponding training data to obtain the vector $ \boldsymbol{\beta}\odot \mathbf{v} $, where $ \odot $ denotes the Hadamard product; the new weighted vector is then used as the input of a new neural network for a final round of chaotic mapping prediction training, which outputs the predicted value. This is the overall IAN process.
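The following sketch illustrates this two-layer structure, assuming a PyTorch implementation (the paper states only that its programs are written in Python); the module and variable names are ours:

```python
import torch
import torch.nn as nn

class IncrementalAttention(nn.Module):
    # Sketch of the attention part of IAN (Fig. 2): a first m -> m linear
    # layer with sigmoid produces scores e_i, softmax turns them into
    # contribution weights alpha per (5), and a second m -> 1 linear layer
    # predicts from the weighted input beta ⊙ v.
    def __init__(self, m):
        super().__init__()
        self.score = nn.Linear(m, m)   # W, b in e_i = sigma(W v_i + b)
        self.head = nn.Linear(m, 1)    # final prediction layer

    def forward(self, v):              # v: (batch, m) window data
        alpha = torch.softmax(torch.sigmoid(self.score(v)), dim=-1)
        weighted = alpha * v           # Hadamard product beta ⊙ v
        return self.head(weighted), alpha
```

In use, the per-window weights alpha returned by the module are averaged over all training windows to obtain the final attention weight vector that is checked against DWC.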
To ensure that each dimension of the input data contributes effectively to the predicted value of the chaotic time series and participates efficiently in training, this paper proposes DWC for IAN input data: a dimension is called appropriate if and only if, as the dimension index of the training data increases from low to high, the corresponding prediction contribution weight also increases from low to high, and the parameters that construct such data are reasonable parameters. Moreover, provided the weights increase and are distributed relatively evenly across dimensions, larger dimension and delay parameters are better. DWC can thus be described as follows:
$ \sum\limits_{i = 1}^m {{\alpha _i}} = 1 $ (6)
$ {\alpha _1} < {\alpha _2} < \cdots < {\alpha _m} $ (7)
where the total dimension of the data is m.
In chaotic time series prediction, the total contribution of all dimensions to the final predicted value is 1. If every dimension participates actively in the contribution, and the contribution value $ {\alpha }_{i} $ increases gradually as the time distance to the predicted value shrinks, the phase space data set constructed in this way achieves the maximum and most effective utilization. At the same time, under DWC, more participating dimensions and a longer time window make the parameters better. Overall, if the contribution weight of each dimension satisfies DWC, the resulting weight is called an incremental attention weight.
3.1.2 Phase space parameter estimation
Since the maximum and minimum values of the phase space parameters are finite, the delay time and embedding dimension can be searched and optimized. For each reconstructed phase space, DWC is used to select eligible phase space parameters on the one hand, and on the other hand the data weighted by the dimension weights is used for re-prediction training, further improving prediction accuracy. The optimization of the phase space parameters proceeds as follows: after initializing the maximum parameter values, all reasonable parameter pairs are traversed and verified. Verification substitutes the reconstructed phase space of each parameter pair into the attention model; IAN quantitatively calculates the contribution weight of each dimension, and if the weights conform to DWC, the parameters are recorded, until the optimal reconstruction parameters are finally determined. The overall algorithm is shown in Table 1.

Table 1. Parameter estimation algorithm.
Algorithm: Parameter estimation algorithm
Input: $\rm{all} \; (m{\mathrm{,}}\; \tau) $
Output: $\rm{best} \; (m{\mathrm{,}}\; \tau) $
1: Initialize: Set $m_0=1{\mathrm{,}}\; \tau_0=0 $
2: for $ m=1{\mathrm{,}}\; 2{\mathrm{,}}\; \cdots{\mathrm{,}}\; m_{{\mathrm{max}} } $ do
3:  for $\tau=0{\mathrm{,}}\;1{\mathrm{,}}\; \cdots{\mathrm{,}}\; \tau_{{\mathrm{max}} } $ do
4:   if $(m{\mathrm{,}}\; \tau) $ fit DWC then save;
5:   else continue;
6:   end if
7:  end for
8: end for
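A sketch of the search in Table 1, reusing the reconstruct_phase_space helper from Section 2.1; train_ian_and_average_weights is a hypothetical stand-in for training IAN on the reconstructed data and averaging the per-dimension weights:

```python
def is_incremental(alpha):
    # DWC check per (6)-(7): the softmax already guarantees the weights
    # sum to 1, so only strict monotonic increase needs to be verified.
    return all(alpha[i] < alpha[i + 1] for i in range(len(alpha) - 1))

def estimate_parameters(series, m_max, tau_max):
    # Traverse all finite (m, tau) pairs and keep those whose mean
    # attention weights satisfy DWC (tau = 0 is degenerate, so we start
    # the search at tau = 1).
    candidates = []
    for m in range(1, m_max + 1):
        for tau in range(1, tau_max + 1):
            X, Y = reconstruct_phase_space(series, m, tau)
            alpha = train_ian_and_average_weights(X, Y)  # assumed helper
            if is_incremental(alpha):
                candidates.append((m, tau))
    return candidates  # the largest qualifying (m, tau) is taken as best
```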
3.2 Convolutional neural network
CNN [23] mainly comprises convolution layers and pooling layers, and can be trained in supervised or unsupervised fashion. Its core is the convolution layer, which uses appropriate convolution kernels to extract features from, and reduce the dimensionality of, the input data. For 1D input data, applications such as machine translation and time series prediction are possible, and the convolution kernel is a 1D vector; for two-dimensional (2D) input data, applications include image recognition and image classification, and the convolution kernel is a 2D matrix. According to the dimension of the convolution kernel, CNNs can be divided into 1D convolution, 2D convolution, and so on. This paper uses a 1D CNN to extract features from time series data; the specific layer configuration is shown in Fig. 3.

Figure 3. One-dimensional convolutional neural network.
The CNN in this paper consists of an input layer, convolution layers, pooling layers, and an output layer. Two convolution layers are used. The first has 1 input channel, 50 convolutional kernels of size 1, and ‘same’ padding; the second has 50 input channels, 50 convolutional kernels of size 1, and likewise ‘same’ padding. The key operations are the convolution layer and the pooling layer, which realize feature extraction and dimensionality reduction. The input layer is a data processing step that performs preprocessing and normalization and feeds in the constructed phase space data. The 1D feature extraction of the convolution layer can be understood as extracting the translational features of the data along a certain direction; its mathematical meaning is shown in
$ {\bf{y}}\left(n\right)={\bf{h}}\left(n\right)*{\bf{u}}\left(n\right)=\displaystyle \sum _{i=0}^{k}{\bf{h}}\left(n-i\right){\bf{u}}\left(i\right) $ (8)
where ${\bf{y}}{\mathrm{,}}\;{\bf{h}}{\mathrm{,}}$ and u are vectors, $n$ is the number of convolutions, and $k$ is the length of u [24].
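A sketch of this convolutional front end with the layer sizes stated above, assuming PyTorch; the ReLU activations are our assumption, as the text does not specify them:

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    # Two Conv1d layers with 50 kernels of size 1 and 'same' padding,
    # matching the configuration described in the text.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 50, kernel_size=1, padding='same')
        self.conv2 = nn.Conv1d(50, 50, kernel_size=1, padding='same')

    def forward(self, x):              # x: (batch, 1, m) phase-space windows
        x = torch.relu(self.conv1(x))  # feature maps y = h * u per (8)
        return torch.relu(self.conv2(x))
```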
3.3 Long short-term memory network
RNN [25] is an extension of the feedforward neural network: it can not only extract the features of the current data but also relearn the features of the previous moment.
Although it effectively extracts the time-dependent features of time series and can thus learn temporal sequences, it suffers from serious gradient problems and is prone to vanishing and exploding gradients. LSTM was developed to overcome this problem: three gates are added to the RNN to resolve the gradient problem in time series, especially in long-term sequence prediction.
The LSTM network structure is shown in Fig. 4, where each unit represents one network cell, and ht is the cell output state at time t.

Figure 4. Network structure of LSTM.
The core of the network consists of three gates: Forget gate, input gate, and output gate. Two stacked LSTM layers are used: the first has an input size of 5 and a hidden state size of 32; the second has an input size of 32 and a hidden state size of 16. LSTM uses the forget gate ft and the input gate it to control the content of the unit state C, and the output gate controls how much of the unit state Ct is passed to the current output value of LSTM.
Forget gate:
$ {{\bf{f}}_t} = \sigma \left( {{{\bf{W}}_f} \cdot \left[ {{{\bf{h}}_{t - 1}}{\mathrm{,}}\;{{\bf{x}}_t}} \right] + {{\bf{b}}_f}} \right) $ (9)
Input gate:
$ {{\bf{i}}_t} = \sigma \left( {{{\bf{W}}_i} \cdot \left[ {{{\bf{h}}_{t - 1}}{\mathrm{,}}\;{{\bf{x}}_t}} \right] + {{\bf{b}}_i}} \right) $ (10)
State gate:
$ {\tilde {\bf{c}}_t} = {\text{tanh}}\left( {{{\bf{W}}_c} \cdot \left[ {{{\bf{h}}_{t - 1}}{\mathrm{,}}\;{{\bf{x}}_t}} \right] + {{\bf{b}}_c}} \right) $
$ {{\bf{c}}_t} = {{\bf{f}}_t} \odot {{\bf{c}}_{t - 1}} + {{\bf{i}}_t} \odot {\tilde {\bf{c}}_t} $ (11)
Output gate:
$ {{\bf{o}}_t} = \sigma \left( {{{\bf{W}}_o} \cdot \left[ {{{\bf{h}}_{t - 1}}{\mathrm{,}}\;{{\bf{x}}_t}} \right] + {{\bf{b}}_o}} \right) $ (12)
where $ {\mathbf{W}}_{f} $, $ {\mathbf{W}}_{i} $, $ {\mathbf{W}}_{c} $, and $ {\mathbf{W}}_{o} $ are the weight matrices of the forget gate, input gate, state, and output gate, respectively; the corresponding b are their bias vectors; and $\sigma $ is the sigmoid activation function. The hidden state output is then obtained as $ {\mathbf{h}}_{t} = {\mathbf{o}}_{t} \odot \mathrm{tanh}\left({\mathbf{c}}_{t}\right) $.
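Putting the pieces together, a sketch of the composite CNN-LSTM predictor with the stacked LSTM sizes stated above (input 5, hidden 32, then 32 to 16), assuming PyTorch; the final regression layer and the treatment of the 50 convolution channels as the LSTM sequence axis are our assumptions:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    # CNN front end (Fig. 3) followed by two stacked LSTM layers (Fig. 4);
    # m is the embedding dimension of the reconstructed phase space.
    def __init__(self, m=5):
        super().__init__()
        self.cnn = ConvFeatureExtractor()  # defined in subsection 3.2 sketch
        self.lstm1 = nn.LSTM(input_size=m, hidden_size=32, batch_first=True)
        self.lstm2 = nn.LSTM(input_size=32, hidden_size=16, batch_first=True)
        self.fc = nn.Linear(16, 1)         # assumed regression head

    def forward(self, x):                  # x: (batch, 1, m)
        feats = self.cnn(x)                # (batch, 50, m)
        out, _ = self.lstm1(feats)         # 50 channels read as a sequence
        out, _ = self.lstm2(out)
        return self.fc(out[:, -1, :])      # predict from the last hidden state
```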
4 Experiments and discussion
The experimental environment of this paper is an RTX 2070 Super GPU with 8 GB of video memory, an AMD Ryzen 5 3600X 6-core 3.80 GHz CPU, and 16 GB of memory; all programs are written in Python 3.7, and the editor is Jupyter Notebook.
To verify the predictive ability of the proposed model, we used the logistic system, the Lorenz system, and the sunspot chaotic time series. The experiments compare along two dimensions. First, regarding network prediction accuracy, with the same phase space structure, the CNN-LSTM network based on the incremental attention mechanism proposed in this paper is compared with LSTM, CNN, SVR, and RNN. Second, with the same prediction network, the phase space parameter estimation method based on the incremental attention mechanism proposed in this paper is compared with the traditional C-C, CAO, and FNN phase space reconstruction methods. Accuracy is compared using RMSE and MAE, defined as follows:
$ {\text{RMSE}}=\sqrt{\frac{1}{h} \displaystyle\sum_{i=1}^{h}\left(y_{i}-\hat{y}_{i}\right)^{2}} $ (13)
$ {\mathrm{MAE}} = \frac{1}{h}\sum\limits_{i = 1}^h {|{y_i} - \widehat {{y_i}}|} $ (14)
where h is the number of predicted samples, and $ {y}_{i} $ and $ {\widehat{y}}_{i} $ are the label value and predicted value, respectively.
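For reference, a minimal NumPy sketch of (13) and (14) (function names are ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))  # per (13)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))          # per (14)
```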
4.1 Prediction of logistic systems
The logistic chaos mapping equation is
$ {x_{n + 1}} = \mu {x_n}\left( {1 - {x_n}} \right). $ (15)
When $ \mu $ lies roughly in the range $ 3.57 < \mu \le 4 $, the dynamical system of the Logistic map shows chaos; the initial value of the system was selected as $ {x}_{0}=0.32 $ with $ \mu =3.8 $. To ensure that the experimental data has left periodicity and entered the chaotic state, we discard the first 500 points in the data preprocessing stage, use the subsequent 3000 points as the training set, and use the following 600 points as the test set. After the data set is preprocessed by the traversal layer, IAN calculates the embedding dimension of the training samples as 2 and the delay time as 5.
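A minimal sketch of this data preparation (function name is ours):

```python
import numpy as np

def logistic_series(n, n_discard=500, x0=0.32, mu=3.8):
    # Iterate the logistic map of (15), discarding the first 500 iterates
    # so the retained samples lie on the chaotic attractor.
    x = x0
    out = []
    for i in range(n_discard + n):
        x = mu * x * (1.0 - x)
        if i >= n_discard:
            out.append(x)
    return np.array(out)

# 3000 training points followed by 600 test points, as in the text
data = logistic_series(3600)
train, test = data[:3000], data[3000:]
```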
The sample delay time and embedding dimension obtained using the IAN, CAO, FNN, and C-C methods are shown in Table 2. The Logistic system is a discrete chaotic system, so the unit of delay time is normalized to 1. Table 2 shows that the Logistic reconstruction parameters determined by the traditional phase space reconstruction methods are too large: the maximum embedding dimension reaches 12, and the maximum delay time is 15. Traditional methods preserve the high-dimensional characteristics of the chaotic system, but they are constrained in prediction accuracy because they cannot meet DWC.

Table 2. Phase space reconstruction parameters of the logistic system.
Method | Embedding dimension | Delay
IAN | 2 | 5
CAO | 5 | 9
FNN | 4 | 15
C-C | 12 | 3
We determined the reconstruction parameters using IAN by searching from small to large values in order to fulfill three requirements: Chaotic system reconstruction, an appropriate sequence window length, and sufficient data features for the network. Table 3 shows the contribution weights of each dimension of the phase space reconstructed by IAN. The maximum dimension that satisfies DWC is in the second row, so the IAN reconstruction phase space is 2D, with corresponding dimension weights of 0.3808 and 0.6192.

Table 3. Dimension contribution weight of the logistic system.
Dimension | Contribution weight | Incremental
1 | [1.0] | Yes
2 | [0.3808, 0.6192] | Yes
3 | [0.2507, 0.4120, 0.3384] | No
4 | [0.2430, 0.1731, 0.3057, 0.2782] | No
Fig. 5 demonstrates the effectiveness of the proposed prediction method with a well-fitted diagram of the test set, using the parameters selected by IAN and the CNN-LSTM network. The prediction not only captures the trend of change but also keeps predicted values within a small error range of the actual values, indicating the method’s efficacy.

Figure 5. Prediction of the logistic system.
Furthermore, Tables 4 and 5 present RMSEs and MAEs of various prediction networks, alongside corresponding phase space parameters under the logistic data set. These results provide quantitative insights into the performance of different prediction models and their parameter configurations.

Table 4. Predicted RMSE error table of the logistic system.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 0.0061 | 0.0084 | 0.0087 | 0.0072
LSTM | 0.0069 | 0.0188 | 0.0084 | 0.0103
CNN | 0.0087 | 0.0138 | 0.0115 | 0.0907
SVR | 0.0499 | 0.0664 | 0.0516 | 0.0873
RNN | 0.0146 | 0.0277 | 0.0437 | 0.0298

Table 5. Predicted MAE error table of the logistic system.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 0.00025 | 0.00034 | 0.00036 | 0.00029
LSTM | 0.00028 | 0.00077 | 0.00034 | 0.00042
CNN | 0.00036 | 0.00056 | 0.00047 | 0.00370
SVR | 0.00204 | 0.00271 | 0.00211 | 0.00356
RNN | 0.00060 | 0.00113 | 0.00178 | 0.00122
Tables 4 and 5 demonstrate that for a single LSTM or CNN network, no matter which phase space reconstruction method is used, the prediction accuracy is inferior to CNN-LSTM, showing that spatio-temporal features are more conducive to prediction than temporal or spatial features alone. In addition, for the five networks CNN-LSTM, LSTM, CNN, SVR, and RNN, the IAN phase space reconstruction method proposed in this paper improves the final prediction accuracy by at least 15.28%, 17.86%, 24.38%, 3.29%, and 4.73%, respectively, compared with the CAO, FNN, and C-C reconstruction methods.
4.2 Prediction of Lorenz systems
The Lorenz chaotic system is expressed as follows:
$ \frac{{{\text{d}}x}}{{{\text{d}}t}} = - a\left( {x - y} \right) $ (16a)
$ \frac{{{\text{d}}y}}{{{\text{d}}t}} = - xz + cx - y $ (16b)
$ \frac{{{\text{d}}z}}{{{\text{d}}t}} = xy - bz{\mathrm{.}} $ (16c)
Let the initial point of the system equations be (1, 1, 1). The parameters a, b, and c are 10, 8/3, and 28, respectively. The training and test sets are selected as in the data preprocessing of subsection 4.1. In this experiment, the x-component sequence of the Lorenz system is selected for prediction analysis. After the data set is preprocessed by the traversal layer, the embedding dimension of the training samples is computed as 3 using the incremental attention layer, with a delay time of 6. The parameters obtained via the IAN, CAO, FNN, and C-C methods are detailed in Table 6 below. It is worth noting that the Lorenz system is a continuous chaotic system, and the unit of delay time is the integration step, which is set to 0.001.
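A sketch of generating the x-component sequence by integrating (16a)-(16c) with step 0.001; the fixed-step RK4 integrator is our choice, as the text does not specify one:

```python
import numpy as np

def lorenz_series(n, dt=1e-3, a=10.0, b=8/3, c=28.0, state=(1.0, 1.0, 1.0)):
    # Right-hand side of (16a)-(16c) with the parameters given in the text.
    def f(s):
        x, y, z = s
        return np.array([-a * (x - y), -x * z + c * x - y, x * y - b * z])

    s = np.array(state, dtype=float)
    xs = np.empty(n)
    for i in range(n):
        # Classic fourth-order Runge-Kutta step of size dt
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = s[0]  # keep the x component used for prediction
    return xs
```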

Table 6. Phase space reconstruction parameters of the Lorenz system.
Method | Embedding dimension | Delay
IAN | 3 | 6
CAO | 7 | 17
FNN | 5 | 17
C-C | 6 | 10
As evidenced by Table 6, the Lorenz phase space reconstruction parameters determined by the traditional physical methods are notably large, with the maximum embedding dimension reaching 7 and the maximum delay time 17. Table 7 shows the contribution weights of each dimension of the phase space reconstructed by IAN. The maximum dimension that satisfies DWC corresponds to the third row, so the IAN reconstruction phase space is three-dimensional (3D), with corresponding dimension weights of 0.1748, 0.2845, and 0.5427. The sample data reconstructed in the phase space is substituted into the prediction model for training; the prediction fitting diagram of the test set is shown in Fig. 6.

Table 7. Dimension contribution weight of the Lorenz system.
Dimension | Contribution weight | Incremental
1 | [1.0] | Yes
2 | [0.01051, 0.98949] | Yes
3 | [0.1748, 0.2845, 0.5427] | Yes
4 | [0.1911, 0.2078, 0.3186, 0.2825] | No
5 | [0.1689, 0.2367, 0.1999, 0.2808, 0.1136] | No

Figure 6. Prediction of the Lorenz system.
Fig. 6 illustrates that the model proposed in this paper achieves superior performance in predicting the Lorenz system: it consistently tracks the changing trend and keeps fluctuations of the error between the real and predicted values small. Tables 8 and 9 present the RMSE and MAE values of various prediction networks and their corresponding phase space parameters under the Lorenz data set. The CNN-LSTM network based on IAN still performs well on the Lorenz sample set: its two prediction errors are the lowest among the 20 combinations of five networks and four phase space reconstruction methods, reduced to 0.1264 and 0.00516, respectively.

Table 8. Predicted RMSE error table of the Lorenz system.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 0.1264 | 0.3306 | 0.2801 | 0.2393
LSTM | 0.1331 | 0.2279 | 0.2511 | 0.1857
CNN | 0.1491 | 0.3062 | 0.4627 | 0.6170
SVR | 1.7839 | 2.1778 | 2.1305 | 1.9869
RNN | 0.1352 | 0.3709 | 0.3106 | 0.3161

Table 9. Predicted MAE error table of the Lorenz system.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 0.00516 | 0.01350 | 0.01144 | 0.00977
LSTM | 0.00543 | 0.00930 | 0.01025 | 0.00758
CNN | 0.00609 | 0.01250 | 0.01889 | 0.02519
SVR | 0.07283 | 0.08891 | 0.08698 | 0.08111
RNN | 0.00552 | 0.01514 | 0.01268 | 0.01290
4.3 Prediction of the sunspot time series
To verify the model’s performance on a real dynamical system, we used the sunspot data from 1749 to 2021 [26]. Since monthly average sunspot data is used, the time unit is one month. After the data set passes through the traversal layer, the embedding dimension of the training samples is calculated as 5, with a delay time of 3 months, using the incremental attention layer. The sample delay time and embedding dimension obtained using the IAN, CAO, FNN, and C-C methods are shown in Table 10. The parameters determined by the traditional physical phase space reconstruction methods are too large for the sunspot time series: the maximum embedding dimension reaches 12, and the maximum delay time is 27. The reconstruction parameters determined by IAN, an embedding dimension of 5 and a delay time of 3, were derived through a progressive search and validation from small to large values. Table 11 shows the contribution weights of each dimension of the phase space reconstructed by IAN; the weights of the five-dimensional space meet DWC.

Table 10. Phase space reconstruction parameters of the sunspot series.
Method | Embedding dimension | Delay (month)
IAN | 5 | 3
CAO | 12 | 14
FNN | 5 | 14
C-C | 3 | 27
The sample data reconstructed in the phase space is substituted into the prediction model for training. The prediction fitting diagram of the test set is shown in Fig. 7. The proposed model achieves good results in predicting the sunspot sequence, and the error between the real and predicted values varies within a small range, indicating that the model proposed in this paper has a good prediction effect.

Figure 7. Prediction of the sunspot series.

Table 11. Dimension contribution weight of the sunspot series.
Dimension | Contribution weight | Incremental
1 | [1.0] | No
2 | [0.4175, 0.5825] | Yes
3 | [0.2307, 0.2431, 0.5263] | Yes
4 | [0.1751, 0.2042, 0.2757, 0.3450] | Yes
5 | [0.0180, 0.0587, 0.1143, 0.2405, 0.5685] | Yes
6 | [0.2109, 0.1194, 0.1376, 0.1675, 0.1984, 0.1662] | No
7 | [0.1173, 0.2157, 0.1516, 0.1115, 0.1474, 0.1319, 0.1245] | No
In addition, the RMSE and MAE values of different prediction networks and the corresponding phase space parameters are evaluated on the sunspot data; they are shown in Tables 12 and 13. For the five networks CNN-LSTM, LSTM, CNN, SVR, and RNN, the IAN phase space reconstruction method proposed in this paper improves the prediction accuracy by at least 15.97%, 0.34%, 1.89%, 0.86%, and 0.51%, respectively, compared with CAO, FNN, and C-C, indicating that the IAN phase space reconstruction method proposed in this paper outperforms the other three reconstruction methods in chaotic prediction.

Table 12. Predicted RMSE error table of the sunspot series.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 25.6003 | 28.7535 | 33.9891 | 30.2471
LSTM | 25.9045 | 25.9927 | 26.5143 | 26.8429
CNN | 25.7932 | 28.1824 | 26.2913 | 26.5868
SVR | 30.2663 | 32.5171 | 32.1185 | 30.5299
RNN | 25.7662 | 25.8984 | 26.5871 | 26.6672

Table 13. Predicted MAE error table of the sunspot series.
Model | IAN | CAO | FNN | C-C
CNN-LSTM | 1.04513 | 1.17386 | 1.38760 | 1.23483
LSTM | 1.05755 | 1.06115 | 1.08244 | 1.09586
CNN | 1.05300 | 1.15054 | 1.07334 | 1.08540
SVR | 1.23562 | 1.32751 | 1.31123 | 1.24638
RNN | 1.05190 | 1.05730 | 1.08541 | 1.08868
5 Conclusion
We proposed a phase space reconstruction method based on DWC and an approach that applies attention weighting to each dimension of the reconstructed phase space, embedding it into a composite CNN-LSTM network.
Initially, this paper presented a phase space reconstruction method based on DWC. The core idea was to use these criteria to traverse and validate the finite phase space parameters, selecting the parameters that best met DWC.
Experiments on three chaotic systems show that the proposed phase space reconstruction method achieves higher prediction accuracy for the same network type than traditional methods such as CAO, FNN, and C-C. Furthermore, the dimension contributions, i.e., attention weights, were applied to each dimension of the phase space data and embedded as an incremental attention layer in the input layer of the CNN-LSTM network, thus constructing a complete chaotic prediction network.
Disclosures
The authors declare no conflicts of interest.