ess
Functions to calculate the statistical inefficiency and effective sample size.
Functions:
-
convert_sse_series_to_ess_series
–Convert a series of squared standard errors to a series of effective sample sizes.
-
get_ess_series_init_seq
–Compute a series of effective sample sizes for a time series as data is discarded from the beginning, using an initial sequence estimator of the autocorrelation.
-
get_ess_series_window
–Compute a series of effective sample sizes for a time series as data is discarded from the beginning, using a windowed estimate of the squared standard error.
-
statistical_inefficiency_inter_variance
–Compute the statistical inefficiency of a time series by dividing the inter-run variance estimate by the intra-run variance estimate.
-
statistical_inefficiency_lugsail_variance
–Compute the statistical inefficiency of a time series by dividing the lugsail replicated batch means variance estimate by the intra-run variance estimate.
-
ess_inter_variance
–Compute the effective sample size of a time series by dividing the total number of samples by the statistical inefficiency (inter-run variance estimate).
-
ess_lugsail_variance
–Compute the effective sample size of a time series by dividing the total number of samples by the statistical inefficiency (lugsail variance estimate).
convert_sse_series_to_ess_series
convert_sse_series_to_ess_series(
data: NDArray[float64], sse_series: NDArray[float64]
) -> NDArray[float64]
Convert a series of squared standard errors to a series of effective sample sizes.
Parameters:
-
data
(ndarray
) –The time series data from which the uncorrelated variances are obtained.
-
sse_series
(ndarray
) –The squared standard error series.
Returns:
-
ndarray
–The effective sample size series.
Source code in red/ess.py
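The conversion follows from the definition of the squared standard error of the mean, SSE = variance / ESS, which rearranges to ESS = variance / SSE. The following is a minimal sketch of that arithmetic only, assuming the uncorrelated variances are already available as an array. The helper name `sse_to_ess` is hypothetical and this is not the code in red/ess.py.

```python
import numpy as np


def sse_to_ess(uncor_vars, sse_series):
    """Hypothetical helper: ESS = uncorrelated variance / squared standard error."""
    uncor_vars = np.asarray(uncor_vars, dtype=float)
    sse_series = np.asarray(sse_series, dtype=float)
    return uncor_vars / sse_series


# A variance of 4.0 with a squared standard error of 0.4 corresponds to
# 10 effective samples; a variance of 2.0 with 0.5 corresponds to 4.
ess = sse_to_ess([4.0, 2.0], [0.4, 0.5])
```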
get_ess_series_init_seq
get_ess_series_init_seq(
data: NDArray[float64],
sequence_estimator: str = "initial_convex",
min_max_lag_time: int = 3,
max_max_lag_time: Optional[int] = None,
smooth_lag_times: bool = False,
frac_padding: float = 0.1,
) -> Tuple[NDArray[float64], NDArray[float64]]
Compute a series of effective sample sizes for a time series as data is discarded from the beginning of the time series. The autocorrelation is computed using the sequence estimator specified.
Parameters:
-
data
(ndarray
) –A time series of data with shape (n_samples,).
-
sequence_estimator
(str
, default:'initial_convex'
) –The initial sequence estimator to use. Can be "positive", "initial_positive", "initial_monotone", or "initial_convex". The default is "initial_convex". "positive" corresponds to truncating the auto-covariance function at the first negative value, as is done in pymbar. The other methods correspond to the methods described in Geyer, 1992: https://www.jstor.org/stable/2246094.
-
min_max_lag_time
(int
, default:3
) –The lower bound on the maximum lag time used when estimating the statistical inefficiency.
-
max_max_lag_time
(int
, default:None
) –The upper bound on the maximum lag time used when calculating the autocorrelation function. If None, the maximum lag time will be the length of the time series.
-
smooth_lag_times
(bool
, default:False
) –Whether to smooth the max lag times by a) converting them to a monotonically decreasing sequence and b) linearly interpolating between points where the sequence changes. This may be useful when the max lag times are noisy.
-
frac_padding
(float
, default:0.1
) –The fraction of the end of the time series for which the variance is not calculated. For example, if frac_padding = 0.1, the variance will be calculated for the first 90% of the time series. This helps to avoid noise in the variance when there are few data points.
Returns:
-
ndarray
–The effective sample size series.
-
ndarray
–The maximum lag times used.
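To make the "positive" option concrete, here is a simplified sketch of an ESS series in which the autocorrelation sum is truncated at the first negative value and the calculation is repeated as samples are discarded from the start. It is an illustration under stated assumptions, not the implementation in red/ess.py: the function names are hypothetical, the statistical inefficiency is taken as 1 + 2 Σ ρ(t) without any lag weighting, the lag-time bounds and smoothing are omitted, and the series is assumed to be non-constant.

```python
import numpy as np


def autocorrelation(x):
    """Normalised autocorrelation at lags 0..n-1 (biased estimator)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centred = x - x.mean()
    acov = np.correlate(centred, centred, mode="full")[n - 1:] / n
    return acov / acov[0]


def ess_positive(x):
    """ESS with the lag sum truncated at the first negative autocorrelation."""
    rho = autocorrelation(x)
    negative_lags = np.flatnonzero(rho[1:] < 0) + 1
    # Lags 1..max_lag-1 contribute; the first negative lag and beyond do not.
    max_lag = negative_lags[0] if negative_lags.size else rho.size
    statistical_inefficiency = 1.0 + 2.0 * rho[1:max_lag].sum()
    return rho.size / statistical_inefficiency


def ess_series_positive(x, frac_padding=0.1):
    """ESS of x[i:] for each start index i, leaving the final fraction unused."""
    x = np.asarray(x, dtype=float)
    n_starts = round(x.size * (1 - frac_padding))
    return np.array([ess_positive(x[i:]) for i in range(n_starts)])


# An alternating series is anticorrelated at lag 1, so the sum truncates
# immediately and every remaining sample counts as independent.
alternating = np.array([1.0, -1.0] * 50)
ess_series = ess_series_positive(alternating)
```

In this toy case the ESS equals the number of samples remaining after each truncation point. For positively correlated data the ESS would fall below the sample count.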
get_ess_series_window
get_ess_series_window(
data: NDArray[float64],
kernel: Callable[[int], NDArray[float64]] = bartlett,
window_size_fn: Optional[
Callable[[int], int]
] = lambda x: round(x**0.5),
window_size: Optional[int] = None,
) -> Tuple[NDArray[float64], NDArray[float64]]
Compute a series of effective sample sizes for a time series as data is discarded from the beginning of the time series. The squared standard error is computed using the window size and kernel specified.
Parameters:
-
data
(ndarray
) –A time series of data with shape (n_samples,).
-
kernel
(callable
, default:numpy.bartlett
) –A function that takes a window size and returns a window function.
-
window_size_fn
(callable
, default:lambda x: round(x**0.5)
) –A function that takes the length of the time series and returns the window size to use. If this is not None, window_size must be None.
-
window_size
(int
, default:None
) –The size of the window to use, defined in terms of time lags in the forwards direction. If this is not None, window_size_fn must be None.
Returns:
-
ndarray
–The effective sample size series.
-
ndarray
–The window sizes used.
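As a rough illustration of the windowed approach, the sketch below estimates the squared standard error of the mean from a kernel-weighted sum of autocovariances. It is written under assumptions rather than taken from red/ess.py: the name `sse_window` is hypothetical, the kernel is evaluated on 2 * window_size + 1 points so that the forward half covers lags 0 to window_size, the biased autocovariance estimator is used, and window_size is assumed to be smaller than the series length.

```python
import numpy as np


def sse_window(x, kernel=np.bartlett, window_size=None):
    """Sketch: squared standard error from kernel-weighted autocovariances."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if window_size is None:
        window_size = round(n**0.5)  # mirrors the documented default window_size_fn
    centred = x - x.mean()
    acov = np.correlate(centred, centred, mode="full")[n - 1:] / n
    # Symmetric window over lags -w..w; keep the forward half (lags 0..w).
    weights = kernel(2 * window_size + 1)[window_size:]
    # Lag 0 counts once; each positive lag also stands in for its negative twin.
    long_run_var = weights[0] * acov[0] + 2.0 * np.sum(
        weights[1:] * acov[1 : window_size + 1]
    )
    return long_run_var / n


x = np.arange(10.0)
sse_narrow = sse_window(x, window_size=1)  # Bartlett weights [1, 0]: lag 0 only
sse_default = sse_window(x)  # window size round(10**0.5) == 3
```

With a window size of 1 the Bartlett weights reduce the estimate to the naive variance divided by the number of samples. Wider windows admit the positive autocovariances of this trending series and so give a larger squared standard error.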
statistical_inefficiency_inter_variance
Compute the statistical inefficiency of a time series by dividing the inter-run variance estimate by the intra-run variance estimate. More than one run is required.
Parameters:
-
data
(ndarray
) –The time series data. This should have shape (n_runs, n_samples) and must have at least two runs.
Returns:
-
float
–The statistical inefficiency.
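The ratio described above can be sketched as follows. This assumes one common convention, in which the inter-run estimate is the number of samples per run multiplied by the sample variance of the run means, and the intra-run estimate is the mean of the per-run sample variances. The exact scaling and degrees of freedom used in red/ess.py are not shown on this page, so treat those choices and the function name as assumptions.

```python
import numpy as np


def statistical_inefficiency_inter(data):
    """Sketch: inter-run variance estimate divided by intra-run variance estimate."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[0] < 2:
        raise ValueError("data must have shape (n_runs, n_samples) with n_runs >= 2")
    n_samples = data.shape[1]
    # Variance of the run means, scaled up to a per-sample variance.
    inter_var = n_samples * data.mean(axis=1).var(ddof=1)
    # Average of the ordinary sample variances within each run.
    intra_var = data.var(axis=1, ddof=1).mean()
    return inter_var / intra_var


# Two runs with means 1 and 3: inter = 2 * 2 = 4, intra = 2, so the ratio is 2.
g = statistical_inefficiency_inter([[0.0, 2.0], [2.0, 4.0]])
```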
statistical_inefficiency_lugsail_variance
Compute the statistical inefficiency of a time series by dividing the lugsail replicated batch means variance estimate by the intra-run variance estimate. This is applicable to a single run.
Parameters:
-
data
(ndarray
) –The time series data. This should have shape (n_runs, n_samples). A single run is sufficient.
-
n_pow
(float
, default:1 / 3
) –The power to use in the lugsail variance estimate. This should be between 0 and 1. The default is 1/3.
Returns:
-
float
–The statistical inefficiency.
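For orientation, here is a hedged sketch of a lugsail replicated batch means ratio. It assumes the commonly used lugsail form 2 * BM(b) - BM(b // 3), a batch size of b = floor(n_samples ** n_pow), and batch means pooled across runs and centred on the overall mean. These choices, the floor of 3 on the batch size, and the function names are illustrative assumptions, not a description of the code in red/ess.py.

```python
import numpy as np


def replicated_batch_means_variance(data, batch_size):
    """Batch means variance, pooling batches from all runs around the overall mean."""
    n_runs, n_samples = data.shape
    n_batches = n_samples // batch_size
    trimmed = data[:, : n_batches * batch_size]  # drop samples that do not fill a batch
    batch_means = trimmed.reshape(n_runs, n_batches, batch_size).mean(axis=2)
    squared_deviations = (batch_means - trimmed.mean()) ** 2
    return batch_size * squared_deviations.sum() / (n_runs * n_batches - 1)


def statistical_inefficiency_lugsail(data, n_pow=1 / 3):
    """Sketch: lugsail batch means variance divided by the intra-run variance."""
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n_samples = data.shape[1]
    # Assumed batch size rule; the floor of 3 keeps batch_size // 3 at least 1.
    batch_size = max(int(np.floor(n_samples**n_pow)), 3)
    lugsail_var = 2.0 * replicated_batch_means_variance(
        data, batch_size
    ) - replicated_batch_means_variance(data, batch_size // 3)
    intra_var = data.var(axis=1, ddof=1).mean()
    return lugsail_var / intra_var


# One run, two batches with means 0 and 2 around an overall mean of 1:
# 2 * ((0 - 1)**2 + (2 - 1)**2) / (2 - 1) = 4.
bm = replicated_batch_means_variance(np.array([[0.0, 0.0, 2.0, 2.0]]), batch_size=2)
```

Subtracting the small-batch estimate offsets some of the downward bias that batch means estimators show for positively correlated data, which is the motivation for the lugsail construction.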
ess_inter_variance
Compute the effective sample size of a time series by dividing the total number of samples by the statistical inefficiency, where the statistical inefficiency is calculated using the ratio of the inter-run and intra-run variance estimates.
Parameters:
-
data
(ndarray
) –The time series data. This should have shape (n_runs, n_samples) and must have at least two runs.
Returns:
-
float
–The effective sample size.
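The final step is a single division. As a small worked sketch, with a hypothetical helper name and assuming the statistical inefficiency has already been estimated by one of the ratios described on this page:

```python
import numpy as np


def ess_from_inefficiency(data, statistical_inefficiency):
    """Sketch: total number of samples across all runs divided by the inefficiency."""
    data = np.asarray(data)
    return data.size / statistical_inefficiency


# 4 runs of 250 samples with a statistical inefficiency of 10 give
# 1000 / 10 = 100 effective samples.
ess = ess_from_inefficiency(np.zeros((4, 250)), statistical_inefficiency=10.0)
```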
ess_lugsail_variance
Compute the effective sample size of a time series by dividing the total number of samples by the statistical inefficiency, where the statistical inefficiency is calculated using the ratio of the lugsail replicated batch means and intra-run variance estimates.
Parameters:
-
data
(ndarray
) –The time series data. This should have shape (n_runs, n_samples) and must have at least two runs.
-
n_pow
(float
, default:1 / 3
) –The power to use in the lugsail variance estimate. This should be between 0 and 1. The default is 1/3.
Returns:
-
float
–The effective sample size.