5.3 Forecasting the Next 10 Years of Data
We can modify the prediction function to forecast the next 10 years using the full dataset. The new function, predict_keras_lstm_future(), predicts the next 120 time steps (i.e., 10 years).
predict_keras_lstm_future <- function(data,
                                      epochs = 300,
                                      ...) {

    lstm_prediction <- function(data,
                                epochs,
                                ...) {

        # 5.1.2 Data Setup (MODIFIED)
        df <- data

        # 5.1.3 Preprocessing
        rec_obj <- recipe(value ~ ., df) %>%
            step_sqrt(value) %>%
            step_center(value) %>%
            step_scale(value) %>%
            prep()

        df_processed_tbl <- bake(rec_obj, df)

        # Save the centering/scaling statistics to invert the transform later
        center_history <- rec_obj$steps[[2]]$means['value']
        scale_history  <- rec_obj$steps[[3]]$sds['value']

        # 5.1.4 LSTM Plan
        lag_setting  <- 120 # = nrow(df_tst)
        batch_size   <- 40
        train_length <- 440
        tsteps       <- 1
        epochs       <- epochs

        # 5.1.5 Train Setup (MODIFIED)
        lag_train_tbl <- df_processed_tbl %>%
            mutate(value_lag = lag(value, n = lag_setting)) %>%
            filter(!is.na(value_lag)) %>%
            tail(train_length)

        x_train_vec <- lag_train_tbl$value_lag
        x_train_arr <- array(
            data = x_train_vec, dim = c(length(x_train_vec), 1, 1))

        y_train_vec <- lag_train_tbl$value
        y_train_arr <- array(
            data = y_train_vec, dim = c(length(y_train_vec), 1))

        x_test_vec <- y_train_vec %>% tail(lag_setting)
        x_test_arr <- array(
            data = x_test_vec, dim = c(length(x_test_vec), 1, 1))

        # 5.1.6 LSTM Model
        model <- keras_model_sequential()

        model %>%
            layer_lstm(
                units            = 50,
                input_shape      = c(tsteps, 1),
                batch_size       = batch_size,
                return_sequences = TRUE,
                stateful         = TRUE) %>%
            layer_lstm(
                units            = 50,
                return_sequences = FALSE,
                stateful         = TRUE) %>%
            layer_dense(units = 1)

        model %>%
            compile(loss = 'mae', optimizer = 'adam')

        # 5.1.7 Fitting LSTM
        # Stateful training: states are reset manually after each epoch
        for (i in 1:epochs) {
            model %>%
                fit(x          = x_train_arr,
                    y          = y_train_arr,
                    batch_size = batch_size,
                    epochs     = 1,
                    verbose    = 1,
                    shuffle    = FALSE)

            model %>% reset_states()
            cat('Epoch: ', i, '\n')
        }

        # 5.1.8 Predict and Return Tidy Data (MODIFIED)
        # Make Predictions
        pred_out <- model %>%
            predict(x_test_arr, batch_size = batch_size) %>%
            .[, 1]

        # Make future index using tk_make_future_timeseries()
        idx <- data %>%
            tk_index() %>%
            tk_make_future_timeseries(n_future = lag_setting)

        # Retransform values: un-scale, un-center, then square (inverse of step_sqrt)
        pred_tbl <- tibble(
            index = idx,
            value = (pred_out * scale_history + center_history)^2)

        # Combine actual data with predictions
        tbl_1 <- df %>%
            add_column(key = 'actual')

        tbl_3 <- pred_tbl %>%
            add_column(key = 'predict')

        # Create time_bind_rows() to solve dplyr issue
        time_bind_rows <- function(data_1,
                                   data_2,
                                   index) {
            index_expr <- enquo(index)
            bind_rows(data_1, data_2) %>%
                as_tbl_time(index = !! index_expr)
        }

        ret <- list(tbl_1, tbl_3) %>%
            reduce(time_bind_rows, index = index) %>%
            arrange(key, index) %>%
            mutate(key = as_factor(key))

        return(ret)
    }

    safe_lstm <- possibly(lstm_prediction, otherwise = NA)

    safe_lstm(data, epochs, ...)
}
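One detail worth highlighting is the inverse transform inside pred_tbl: the preprocessing recipe applies step_sqrt(), step_center(), and step_scale() in that order, so the predictions must be un-scaled, un-centered, and then squared, in the reverse order. A minimal sketch of the round trip, using illustrative numbers rather than the sunspot data:

# Round trip of the preprocessing above: sqrt -> center -> scale,
# inverted as (pred * scale + center)^2. Values are illustrative only.
raw    <- c(0, 49, 100, 169)          # hypothetical observations
sqrted <- sqrt(raw)                   # step_sqrt()
center <- mean(sqrted)                # statistic stored by step_center()
scale  <- sd(sqrted)                  # statistic stored by step_scale()
scaled <- (sqrted - center) / scale   # what the network trains on

# Invert in reverse order: un-scale, un-center, then square
recovered <- (scaled * scale + center)^2
all.equal(recovered, raw)             # TRUE: the transform inverts cleanly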
Next, run the predict_keras_lstm_future() function on the sun_spots dataset.
future_sun_spots_tbl <- predict_keras_lstm_future(sun_spots, epochs = 300)
Finally, we visualize the forecast with plot_prediction(), setting id = NULL. We use filter_time() to zoom in on the data from 1900 onward.
future_sun_spots_tbl %>%
    filter_time('1900' ~ 'end') %>%
    plot_prediction(
        id = NULL, alpha = 0.4, size = 1.5) +
    theme(legend.position = 'bottom') +
    ggtitle(
        'Sunspots: Ten Year Forecast',
        subtitle = 'Forecast Horizon: 2013 - 2023')
Conclusion
This article demonstrated the power of stateful LSTM models built with the keras package. Surprisingly, the only feature provided was the lag-120 history of the series, yet the deep learning approach still identified the trend in the data. The backtested models achieved a mean RMSE of 34 with an RMSE standard deviation of 13. Although not shown here, we also benchmarked against an ARIMA model and a prophet model (a time series forecasting model developed by Facebook); the LSTM model performed better, reducing the mean error by more than 30% and the standard deviation by about 40%. This illustrates the benefit of matching the machine learning tool to the application.
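For reference, those two summary statistics are simply the mean and standard deviation of the per-slice RMSE values from the backtest. A minimal sketch, assuming a sample_rmse_tbl with one rmse value per backtest sample as constructed in the earlier sections (the table and column names are assumptions here):

library(dplyr)

# Aggregate backtest accuracy across the rolling-origin samples;
# sample_rmse_tbl is assumed to hold one row per sample with an 'rmse' column.
sample_rmse_tbl %>%
    summarise(
        mean_rmse = mean(rmse),   # reported above as 34
        sd_rmse   = sd(rmse))     # reported above as 13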
Beyond the deep learning approach itself, the article also showed how an ACF plot can be used to decide whether an LSTM model is appropriate for a given time series, and how a time series model's accuracy should be benchmarked through backtesting, a strategy that preserves the temporal ordering of the series and serves as cross-validation for time series data.
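To make those two ideas concrete, here is a minimal sketch, assuming sun_spots is the tbl_time object with a value column used throughout: it checks the autocorrelation at the forecast lag with stats::acf(), and builds time-ordered backtest splits with rsample::rolling_origin() (the window sizes are illustrative):

library(rsample)

# 1. ACF check: strong autocorrelation at the forecast horizon (lag 120)
#    suggests the lagged feature carries predictive signal for the LSTM.
acf_values <- acf(sun_spots$value, lag.max = 600, plot = FALSE)
acf_values$acf[120 + 1]   # autocorrelation at lag 120 (index 1 is lag 0)

# 2. Backtesting plan: rolling_origin() keeps samples in time order,
#    so every split trains on the past and assesses on the future.
rolling_plan <- rolling_origin(
    sun_spots,
    initial    = 100 * 12,   # illustrative: 100 years of monthly data
    assess     = 50 * 12,    # illustrative: 50-year assessment window
    cumulative = FALSE,
    skip       = 22 * 12)    # illustrative gap between split origins
rolling_plan   # one row per train/test split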