sse
Functions to calculate the squared standard error series.
Functions:
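- get_sse_series_init_seq – Compute a series of squared standard errors for a time series as data is discarded from the beginning of the time series, using an initial sequence estimator.
- get_sse_series_window – Compute a series of squared standard errors for a time series as data is discarded from the beginning of the time series, using a window size and kernel.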
get_sse_series_init_seq
get_sse_series_init_seq(
    data: NDArray[float64],
    sequence_estimator: str = "initial_convex",
    min_max_lag_time: int = 3,
    max_max_lag_time: Optional[int] = None,
    smooth_lag_times: bool = False,
    frac_padding: float = 0.1,
) -> Tuple[NDArray[float64], NDArray[float64]]
Compute a series of squared standard errors for a time series as data is discarded from the beginning of the time series. The squared standard error is computed using the sequence estimator specified.
Parameters:
- data (ndarray) – A time series of data with shape (n_samples,).
- sequence_estimator (str, default: 'initial_convex') – The initial sequence estimator to use. Can be "positive", "initial_positive", "initial_monotone", or "initial_convex". The default is "initial_convex". "positive" corresponds to truncating the auto-covariance function at the first negative value, as is done in pymbar. The other methods correspond to the methods described in Geyer, 1992: https://www.jstor.org/stable/2246094.
- min_max_lag_time (int, default: 3) – The lower bound on the maximum lag time used when estimating the statistical inefficiency.
- max_max_lag_time (int, default: None) – The upper bound on the maximum lag time used when calculating the auto-correlation function. If None, the maximum lag time will be the length of the time series.
- smooth_lag_times (bool, default: False) – Whether to smooth out the max lag times by a) converting them to a monotonically decreasing sequence and b) linearly interpolating between points where the sequence changes. This may be useful when the max lag times are noisy.
- frac_padding (float, default: 0.1) – The fraction of the time series, at its end, for which the variance is not calculated. For example, if frac_padding = 0.1, the variance will be calculated for the first 90% of the time series. This helps to avoid noise in the variance when there are few data points.
Returns:
- ndarray – The squared standard error series.
- ndarray – The maximum lag times used.
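To make the idea concrete, here is a minimal NumPy sketch of the "positive" variant: for each truncation point, the auto-covariance sum is cut at the first negative value and the result is divided by the number of remaining samples. It is not the code in red/sse.py. The helper names (`autocovariance`, `sse_positive`, `sse_series_positive`), the use of the biased auto-covariance, and the clamp that keeps the estimate at or above the uncorrelated value are all assumptions made for illustration.

```python
from typing import Tuple

import numpy as np


def autocovariance(x: np.ndarray) -> np.ndarray:
    """Biased sample auto-covariance of x for lags 0..len(x) - 1."""
    n = len(x)
    centered = x - x.mean()
    # Index n - 1 of the "full" correlation corresponds to lag 0.
    return np.correlate(centered, centered, mode="full")[n - 1:] / n


def sse_positive(x: np.ndarray, min_max_lag_time: int = 3) -> Tuple[float, int]:
    """Squared standard error of the mean, truncating at the first negative lag."""
    n = len(x)
    gamma = autocovariance(x)
    negative = np.flatnonzero(gamma < 0)
    first_negative = int(negative[0]) if negative.size else n
    # Use all lags before the first negative one, but never fewer than the
    # lower bound (assumed meaning of min_max_lag_time).
    max_lag = max(first_negative - 1, min(min_max_lag_time, n - 1))
    corrected_var = gamma[0] + 2.0 * gamma[1 : max_lag + 1].sum()
    # Assumption: never report less than the uncorrelated estimate.
    corrected_var = max(corrected_var, gamma[0])
    return float(corrected_var / n), max_lag


def sse_series_positive(
    data: np.ndarray, min_max_lag_time: int = 3, frac_padding: float = 0.1
) -> Tuple[np.ndarray, np.ndarray]:
    """SSE for each truncation point in the first (1 - frac_padding) of the data."""
    n_starts = round(len(data) * (1 - frac_padding))
    sse = np.empty(n_starts)
    max_lags = np.empty(n_starts, dtype=int)
    for start in range(n_starts):
        sse[start], max_lags[start] = sse_positive(data[start:], min_max_lag_time)
    return sse, max_lags


# Usage on a synthetic AR(1) series (illustrative data only).
rng = np.random.default_rng(0)
noise = rng.normal(size=200)
data = np.empty(200)
data[0] = noise[0]
for i in range(1, 200):
    data[i] = 0.8 * data[i - 1] + noise[i]

sse_series, max_lags = sse_series_positive(data)
```

With frac_padding = 0.1 and 200 samples, the two returned arrays have one entry for each of the first 180 truncation points, so the shortest sub-series analysed still contains 21 samples.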
Source code in red/sse.py
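The smoothing described for smooth_lag_times can be pictured with the sketch below. It is one possible reading of the two steps, not the library's implementation: the function name `smooth_max_lag_times` is hypothetical, and using a running minimum to obtain a non-increasing sequence is an assumption.

```python
import numpy as np


def smooth_max_lag_times(max_lag_times) -> np.ndarray:
    """Hypothetical smoothing of a noisy sequence of maximum lag times."""
    lags = np.asarray(max_lag_times, dtype=float)
    if lags.size == 0:
        return lags
    # (a) Force a non-increasing sequence with a running minimum (assumption).
    monotone = np.minimum.accumulate(lags)
    # (b) Keep the first point, the last point, and every point where the
    #     value changes, then interpolate linearly between them.
    change_points = np.flatnonzero(np.diff(monotone) != 0) + 1
    knots = np.unique(np.concatenate(([0], change_points, [lags.size - 1])))
    return np.interp(np.arange(lags.size), knots, monotone[knots])


noisy = [10, 12, 9, 9, 11, 6, 6, 7, 4]
smoothed = smooth_max_lag_times(noisy)
```

In this reading, the plateaus left by the running minimum are replaced by straight-line segments, so the smoothed sequence decreases gradually rather than in steps.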
get_sse_series_window
get_sse_series_window(
    data: NDArray[float64],
    kernel: Callable[[int], NDArray[float64]] = bartlett,
    window_size_fn: Optional[
        Callable[[int], int]
    ] = lambda x: round(x**0.5),
    window_size: Optional[int] = None,
    frac_padding: float = 0.1,
) -> Tuple[NDArray[float64], NDArray[float64]]
Compute a series of squared standard errors for a time series as data is discarded from the beginning of the time series. The squared standard error is computed using the window size and kernel specified.
Parameters:
- data (ndarray) – A time series of data with shape (n_samples,).
- kernel (callable, default: numpy.bartlett) – A function that takes a window size and returns a window function.
- window_size_fn (callable, default: lambda x: round(x**0.5)) – A function that takes the length of the time series and returns the window size to use. If this is not None, window_size must be None.
- window_size (int, default: None) – The size of the window to use, defined in terms of time lags in the forwards direction. If this is not None, window_size_fn must be None.
- frac_padding (float, default: 0.1) – The fraction of the time series, at its end, for which the variance is not calculated. For example, if frac_padding = 0.1, the variance will be calculated for the first 90% of the time series. This helps to avoid noise in the variance when there are few data points.
Returns:
- ndarray – The squared standard error series.
- ndarray – The window sizes used.
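As a rough illustration of a window estimator, the sketch below weights the sample auto-covariance with the forward half of a Bartlett window and divides by the number of remaining samples. It is not the code in red/sse.py. The helper names (`sse_window`, `sse_series_window`) are hypothetical, and two choices are assumptions: a window of size w is built as the forward half of `np.bartlett(2 * w + 1)`, and window_size_fn is applied to the length of each truncated series.

```python
from typing import Callable, Tuple

import numpy as np


def sse_window(x: np.ndarray, window_size: int) -> float:
    """Squared standard error of the mean using Bartlett-weighted auto-covariances."""
    n = len(x)
    centered = x - x.mean()
    # Biased auto-covariance for lags 0..n - 1 (lag 0 sits at index n - 1).
    gamma = np.correlate(centered, centered, mode="full")[n - 1:] / n
    w = min(window_size, n - 1)
    # Forward half of a symmetric triangular window: weight 1 at lag 0,
    # falling linearly to 0 at lag w.
    weights = np.bartlett(2 * w + 1)[w:]
    corrected_var = gamma[0] + 2.0 * np.dot(weights[1:], gamma[1 : w + 1])
    return float(corrected_var / n)


def sse_series_window(
    data: np.ndarray,
    window_size_fn: Callable[[int], int] = lambda n: round(n**0.5),
    frac_padding: float = 0.1,
) -> Tuple[np.ndarray, np.ndarray]:
    """SSE and window size for each truncation point in the first (1 - frac_padding)."""
    n_starts = round(len(data) * (1 - frac_padding))
    sse = np.empty(n_starts)
    window_sizes = np.empty(n_starts, dtype=int)
    for start in range(n_starts):
        remaining = data[start:]
        # Assumption: the window size follows the length of the truncated series.
        window_sizes[start] = min(window_size_fn(len(remaining)), len(remaining) - 1)
        sse[start] = sse_window(remaining, int(window_sizes[start]))
    return sse, window_sizes


# Usage on a synthetic AR(1) series (illustrative data only).
rng = np.random.default_rng(0)
noise = rng.normal(size=200)
data = np.empty(200)
data[0] = noise[0]
for i in range(1, 200):
    data[i] = 0.8 * data[i - 1] + noise[i]

sse_series, window_sizes = sse_series_window(data)
```

A fixed window can be expressed in this sketch by passing a constant function, for example `window_size_fn=lambda n: 10`.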