Title: | Fast Time Series Modeling for Seasonal Series with Exogenous Variables |
---|---|
Description: | An implementation of sparsity-ranked lasso and related methods for time series data. This methodology is especially useful for large time series with exogenous features and/or complex seasonality. Originally described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7> in the context of variable selection with interactions and/or polynomials, ranked sparsity is a philosophy with methods useful for variable selection in the presence of prior informational asymmetry. This situation exists for time series data with complex seasonality, as shown in Peterson and Cavanaugh (2024) <doi:10.1177/1471082X231225307>, which also describes this package in greater detail. The sparsity-ranked penalization methods for time series implemented in 'fastTS' can fit large/complex/high-frequency time series quickly, even with a high-dimensional exogenous feature set. The method is considerably faster than its competitors, while often producing more accurate predictions. Also included is a long hourly series of arrivals into the University of Iowa Emergency Department with concurrent local temperature. |
Authors: | Ryan Andrew Peterson [aut, cre, cph] |
Maintainer: | Ryan Andrew Peterson <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1.9000 |
Built: | 2024-11-23 04:36:04 UTC |
Source: | https://github.com/petersonr/fastts |
Internal AICc function for lasso models
Internal function for obtaining out-of-sample (oos) results
Internal function for converting time series into model matrix of lags
AICc(fit, eps = 1)
get_oos_results(fits, ytest, Xtest)
get_model_matrix(y, X = NULL, n_lags_max)
fit |
an object with a logLik method |
eps |
minimum degrees of freedom (df) used in the computation |
fits |
a list of fits with different tuning parameters |
ytest |
validation data |
Xtest |
new X data, including lags |
y |
time series vector |
X |
Additional exogenous features |
n_lags_max |
Maximum number of lags to add |
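The bodies of these internal helpers are not shown in this manual. As a rough sketch of what a lag model matrix and an AICc criterion involve, the base-R code below defines two hypothetical functions, make_lag_matrix() and aicc_sketch(). They are not the package's code, and treating eps as a floor on the degrees of freedom is an assumption drawn only from the argument description above.

```r
# Hypothetical helper (NOT fastTS's get_model_matrix): build a design matrix
# whose columns are lags 1..n_lags_max of y, optionally bound to extra columns X.
make_lag_matrix <- function(y, X = NULL, n_lags_max) {
  y <- as.numeric(y)
  n <- length(y)
  stopifnot(n_lags_max >= 1, n_lags_max < n)
  # stats::embed() returns rows (y_t, y_{t-1}, ..., y_{t-k}) for t = k+1, ..., n
  E <- stats::embed(y, n_lags_max + 1)
  lags <- E[, -1, drop = FALSE]
  colnames(lags) <- paste0("lag", seq_len(n_lags_max))
  if (!is.null(X)) {
    X <- as.matrix(X)
    # drop the first n_lags_max rows of X so they line up with the lagged rows
    lags <- cbind(lags, X[-seq_len(n_lags_max), , drop = FALSE])
  }
  list(y = E[, 1], X = lags)
}

# Small-sample corrected AIC; df is assumed to be the number of nonzero
# coefficients, floored at eps (an assumption about what eps does).
aicc_sketch <- function(loglik, df, n, eps = 1) {
  df <- pmax(df, eps)
  -2 * loglik + 2 * df + 2 * df * (df + 1) / (n - df - 1)
}
```

For example, make_lag_matrix(as.numeric(LakeHuron), n_lags_max = 3) returns the response aligned with its first three lags; the first three observations are lost because their lags are unavailable.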
Uses penalized regression to quickly fit time series models with potentially complex seasonal patterns and exogenous variables. Based on methods described in Peterson & Cavanaugh (2024).
fastTS(
  y,
  X = NULL,
  n_lags_max,
  gamma = c(0, 2^(-2:4)),
  ptrain = 0.8,
  pf_eps = 0.01,
  w_endo,
  w_exo,
  weight_type = c("pacf", "parametric"),
  m = NULL,
  r = c(rep(0.1, length(m)), 0.01),
  plot = FALSE,
  ncvreg_args = list(penalty = "lasso", returnX = FALSE, lambda.min = 0.001)
)

## S3 method for class 'fastTS'
plot(x, log.l = TRUE, ...)

## S3 method for class 'fastTS'
coef(object, choose = c("AICc", "BIC"), ...)

## S3 method for class 'fastTS'
print(x, ...)

## S3 method for class 'fastTS'
summary(object, choose = c("AICc", "BIC"), ...)
y |
univariate time series outcome |
X |
matrix of predictors (no intercept) |
n_lags_max |
maximum number of lags to consider |
gamma |
vector of exponents for the penalty weights |
ptrain |
proportion of observations used for training; the remainder is held out as test data |
pf_eps |
penalty factors below this will be set to zero |
w_endo |
optional pre-specified weights for endogenous terms |
w_exo |
optional pre-specified weights for exogenous terms (see Details) |
weight_type |
type of weights to use for endogenous terms |
m |
mode(s) for seasonal lags (used if weight_type = "parametric") |
r |
penalty factors for seasonal + local scaling functions (used if weight_type = "parametric") |
plot |
logical; whether to plot the penalty functions |
ncvreg_args |
additional arguments passed through to ncvreg |
x |
a fastTS object |
log.l |
Should the x-axis (lambda) be logged? |
... |
passed to downstream functions |
object |
a fastTS object |
choose |
which criterion to use for lambda selection (AICc or BIC) |
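The ptrain argument splits the series into a training part and a held-out test part. The sketch below assumes a chronological split in which the first ptrain share of observations is used for training; the helper split_train_test() is hypothetical and only illustrates the idea.

```r
# Hypothetical helper (not part of fastTS): chronological train/test split,
# assuming the first ptrain share of observations forms the training set.
split_train_test <- function(y, ptrain = 0.8) {
  n <- length(y)
  n_train <- floor(ptrain * n)
  stopifnot(n_train >= 1, n_train <= n)
  train_idx <- seq_len(n_train)
  list(
    train_idx = train_idx,
    y_train   = y[train_idx],
    # the test block follows the training block in time (order preserved)
    y_test    = y[setdiff(seq_len(n), train_idx)]
  )
}
```

A chronological split avoids training on observations that come after the ones being evaluated, which a random split would allow.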
The default weights for exogenous features will be chosen based on a similar approach to the adaptive lasso (using bivariate OLS estimates). For lower dimensional X, it's advised to set w_exo = "unpenalized", because this allows for statistical inference on exogenous variable coefficients via the summary function.
By default, a seasonal frequency m need not be specified, and the PACF is used to estimate the weights for endogenous terms. A parametric version is also available, which uses a penalty scaling function that penalizes seasonal and recent lags less, following the penalty scaling functions described in Peterson & Cavanaugh (2024). See the penalty_scaler function for more details, and to plot the penalty function for various values of m and r.
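To make the weighting idea concrete, here is a base-R sketch of PACF-based penalty factors for endogenous terms and adaptive-lasso-style penalty factors for exogenous terms. The form |estimate|^(-gamma), the rescaling to mean 1, and the helper names pacf_penalty_factors() and exo_penalty_factors() are assumptions for illustration, not the package's internal code; only the pf_eps rule mirrors the documented argument.

```r
# Hypothetical: endogenous penalty factors from partial autocorrelations,
# assuming the form |pacf_j|^(-gamma); NOT taken from the fastTS source.
pacf_penalty_factors <- function(y, n_lags_max, gamma = 1, pf_eps = 0.01) {
  # partial autocorrelations at lags 1..n_lags_max (no plot)
  phi <- as.numeric(stats::pacf(y, lag.max = n_lags_max, plot = FALSE)$acf)
  pf <- abs(phi)^(-gamma)   # strong partial autocorrelation -> small penalty
  pf <- pf / mean(pf)       # assumed normalization to mean 1
  pf[pf < pf_eps] <- 0      # documented rule: factors below pf_eps become 0
  pf
}

# Hypothetical: exogenous penalty factors from separate bivariate OLS slopes,
# in the spirit of the adaptive lasso.
exo_penalty_factors <- function(y, X, gamma = 1) {
  X <- as.matrix(X)
  b <- vapply(seq_len(ncol(X)), function(j) {
    stats::coef(stats::lm(y ~ X[, j]))[[2]]   # slope of y on column j alone
  }, numeric(1))
  stats::setNames(abs(b)^(-gamma), colnames(X))
}
```

With gamma = 0 every factor equals 1, which reduces to an ordinary (unweighted) lasso; larger gamma widens the gap between strongly and weakly supported terms, which is why fastTS fits a grid of gamma values.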
A list of class fastTS with elements
fits |
a list of lasso fits |
ncvreg_args |
arguments passed to ncvreg |
gamma |
the (negative) exponent on the penalty weights, one for each fit |
n_lags_max |
the maximum number of lags |
y |
the time series |
X |
the utilized matrix of exogenous features |
oos_results |
results on the test data using the best of the fits |
train_idx |
index of observations used in training data |
weight_type |
the type of weights used for endogenous terms |
m |
the mode(s) for seasonal lags (used if weight_type = "parametric") |
r |
penalty factors for seasonal + local scaling functions |
ptrain |
the proportion used to train the model |
plot: returns x, invisibly.
coef: a vector of model coefficients.
print: returns x, invisibly.
summary: the summary object produced by ncvreg, evaluated at the best tuning parameter combination (by default, the one with the best AICc).
Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.
Peterson, R.A., Cavanaugh, J.E. (2022) Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials. AStA Adv Stat Anal. https://doi.org/10.1007/s10182-021-00431-7
Peterson, R.A., Cavanaugh, J.E. (2024) Fast, effective, and coherent time series modeling using the sparsity-ranked lasso. Statistical Modelling (accepted). https://doi.org/10.48550/arXiv.2211.01492
predict.fastTS
data("LakeHuron")
fit_LH <- fastTS(LakeHuron)
fit_LH
coef(fit_LH)
plot(fit_LH)
Penalty Scaling Function for parametric penalty weights
penalty_scaler(lag, m, r, plot = TRUE, log = TRUE)
lag |
a vector of lags for which to calculate the penalty function |
m |
a vector of seasonality modes |
r |
a vector of length length(m) + 1 giving the penalty factors for each seasonal mode in m followed by the local (time) component |
plot |
logical; whether to plot the penalty function |
log |
logical; whether to return the log of the penalty function |
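The exact scaling formula is given in Peterson & Cavanaugh (2024) and is not reproduced here. Purely as a toy illustration of the idea that seasonal and recent lags should receive smaller penalties, the hypothetical function toy_penalty_scaler() below combines one curve per seasonal mode with one local curve; its linear form is an invented assumption and is NOT the formula used by penalty_scaler.

```r
# Toy penalty scaling (hypothetical; NOT the formula used by penalty_scaler).
# Each seasonal mode m[i] gives a curve equal to 1 at multiples of m[i] that
# grows linearly (rate r[i]) with distance from the nearest multiple; the local
# curve grows linearly with the lag itself (rate r[length(m) + 1]).
# A lag's penalty is the smallest of these curves.
toy_penalty_scaler <- function(lag, m = NULL, r = c(rep(0.1, length(m)), 0.01),
                               log = TRUE) {
  stopifnot(length(r) == length(m) + 1)
  pen <- 1 + r[length(m) + 1] * lag            # local (recency) curve
  for (i in seq_along(m)) {
    d <- abs(lag - m[i] * round(lag / m[i]))   # distance to nearest multiple of m[i]
    pen <- pmin(pen, 1 + r[i] * d)
  }
  if (log) base::log(pen) else pen
}
```

With hourly data and m = 24, for instance, lag 24 gets a smaller toy penalty than lag 23, while very short lags stay cheap through the local curve.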
Predict method for fastTS objects
## S3 method for class 'fastTS'
predict(
  object,
  n_ahead = 1,
  X_test,
  y_test,
  cumulative = FALSE,
  forecast_ahead = FALSE,
  return_intermediate = FALSE,
  ...
)
object |
a fastTS object |
n_ahead |
the look-ahead period for predictions |
X_test |
a matrix of exogenous features for future predictions (optional) |
y_test |
the test series for future predictions (optional) |
cumulative |
if TRUE, returns the cumulative (rolling) sums of the 1-, 2-, 3-, ..., n_ahead-step-ahead predictions |
forecast_ahead |
if TRUE, returns forecasts for the period immediately following the end of the training series |
return_intermediate |
if TRUE, returns the intermediate predictions between the 1st and n_ahead predictions, as a data frame. |
... |
currently unused |
The 'y_test' argument must be supplied if predictions on the test set are desired or if 'n_ahead' < 'nrow(X_test)'. This is because obtaining the 1-step-ahead forecast for, say, the 10th observation in the test set requires the 9th observation of 'y_test'.
Forecasts for the first 'n_ahead' observations after the training set can be obtained by setting 'forecast_ahead' to TRUE, which returns the forecasted values at the end of the training data. It produces the 1-step-ahead prediction, the 2-step-ahead prediction, ..., through the 'n_ahead'-step-ahead prediction. The 'cumulative' argument is similar but returns the cumulative (rolling) sums of the 1-, 2-, 3-, ..., 'n_ahead'-step-ahead predictions.
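The difference between rolling 1-step-ahead predictions on a test series and recursive multi-step forecasts from the end of training can be sketched with a plain AR(p) model. The helpers one_step_test() and recursive_forecast(), and the representation of the model as an intercept b0 plus lag coefficients phi, are hypothetical simplifications, not the internals of predict.fastTS.

```r
# Hypothetical AR(p) illustration (not predict.fastTS). The model is
# y_t = b0 + phi[1] * y_{t-1} + ... + phi[p] * y_{t-p}.

# Rolling 1-step-ahead predictions on a test series: the forecast for test
# observation i uses observed values up to t - 1, including y_test[i - 1].
one_step_test <- function(y_train, y_test, b0, phi) {
  p <- length(phi)
  full <- c(as.numeric(y_train), as.numeric(y_test))
  n0 <- length(y_train)
  vapply(seq_along(y_test), function(i) {
    t <- n0 + i
    b0 + sum(phi * full[(t - 1):(t - p)])   # y_{t-1}, ..., y_{t-p}
  }, numeric(1))
}

# Recursive 1- through n_ahead-step forecasts from the end of training:
# each forecast is fed back in as a lag because no future y is observed.
recursive_forecast <- function(y_train, b0, phi, n_ahead) {
  p <- length(phi)
  hist <- as.numeric(y_train)
  out <- numeric(n_ahead)
  for (h in seq_len(n_ahead)) {
    recent <- rev(utils::tail(hist, p))     # y_{t-1}, ..., y_{t-p}
    out[h] <- b0 + sum(phi * recent)
    hist <- c(hist, out[h])
  }
  out
}
```

Applying cumsum() to the output of recursive_forecast() gives rolling sums analogous to what the 'cumulative' option describes.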
a vector of predictions, or a matrix of 1- through n_ahead predictions.
data("LakeHuron")
fit_LH <- fastTS(LakeHuron)
predict(fit_LH)
A data set containing the 17 columns described below. There are 41640 hourly observations running from 2013 to 2018. The data set is already sorted by time.
uihc_ed_arrivals
a data frame with 17 columns and 41640 rows:
Calendar year
Fiscal year quarter
Integer for month of year
Integer for day of month
Integer for hour of day
Number of arrivals into the ED (outcome)
Date
Indicator for day of week
Hourly concurrent temperature
Christmas Day indicator
Day after Christmas indicator
New Year's Eve indicator
New Year's Day indicator
Thanksgiving Day indicator
Day after Thanksgiving indicator
Independence Day indicator
Hawkeye football game day indicator
UIHC Emergency Department.