westpa.mclib package
Module contents
A package for performing Monte Carlo bootstrap estimates of statistics.
- westpa.mclib.mcbs_correltime(dataset, alpha, n_sets=None)
Calculate the correlation time of the given
dataset, significant to the (1-alpha) level, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8. An appropriate balance between space and speed is chosen based on the size of the input data.Returns 0 for data statistically uncorrelated with (1-alpha) confidence, otherwise the correlation length. (Thus, the appropriate stride for blocking is the result of this function plus one.)
- westpa.mclib.get_bssize(alpha)
Return a bootstrap data set size appropriate for the given confidence level.
- westpa.mclib.mcbs_ci(dataset, estimator, alpha, dlen, n_sets=None, args=None, kwargs=None, sort=<function msort>)
Perform a Monte Carlo bootstrap estimate for the (1-
alpha) confidence interval on the givendatasetwith the givenestimator. This routine is not appropriate for time-correlated data.Returns
(estimate, ci_lb, ci_ub)whereestimateis the application of the givenestimatorto the inputdataset, andci_lbandci_ubare the lower and upper limits, respectively, of the (1-alpha) confidence interval onestimate.estimatoris called asestimator(dataset, *args, **kwargs). Common estimators include:numpy.mean – calculate the confidence interval on the mean of
datasetnumpy.median – calculate a confidence interval on the median of
datasetnumpy.std – calculate a confidence interval on the standard deviation of
datset.
n_setsis the number of synthetic data sets to generate using the givenestimator, which will be chosen using `get_bssize()`_ ifn_setsis not given.sortcan be used to override the sorting routine used to calculate the confidence interval, which should only be necessary for estimators returning vectors rather than scalars.
- westpa.mclib.mcbs_ci_correl(estimator_datasets, estimator, alpha, n_sets=None, args=None, autocorrel_alpha=None, autocorrel_n_sets=None, subsample=None, do_correl=True, mcbs_enable=None, estimator_kwargs={})
Perform a Monte Carlo bootstrap estimate for the (1-
alpha) confidence interval on the givendatasetwith the givenestimator. This routine is appropriate for time-correlated data, using the method described in Huber & Kim, “Weighted-ensemble Brownian dynamics simulations for protein association reactions” (1996), doi:10.1016/S0006-3495(96)79552-8 to determine a statistically-significant correlation time and then reducing the dataset by a factor of that correlation time before running a “classic” Monte Carlo bootstrap.Returns
(estimate, ci_lb, ci_ub, correl_time)whereestimateis the application of the givenestimatorto the inputdataset,ci_lbandci_ubare the lower and upper limits, respectively, of the (1-alpha) confidence interval onestimate, andcorrel_timeis the correlation time of the dataset, significant to (1-autocorrel_alpha).estimatoris called asestimator(dataset, *args, **kwargs). Common estimators include:np.mean – calculate the confidence interval on the mean of
datasetnp.median – calculate a confidence interval on the median of
datasetnp.std – calculate a confidence interval on the standard deviation of
datset.
n_setsis the number of synthetic data sets to generate using the givenestimator, which will be chosen using `get_bssize()`_ ifn_setsis not given.autocorrel_alpha(which defaults toalpha) can be used to adjust the significance level of the autocorrelation calculation. Note that too high a significance level (too low an alpha) for evaluating the significance of autocorrelation values can result in a failure to detect correlation if the autocorrelation function is noisy.The given
subsamplefunction is used, if provided, to subsample the dataset prior to running the full Monte Carlo bootstrap. If none is provided, then a random entry from each correlated block is used as the value for that block. Other reasonable choices includenp.mean,np.median,(lambda x: x[0])or(lambda x: x[-1]). In particular, usingsubsample=np.meanwill converge to the block averaged mean and standard error, while accounting for any non-normality in the distribution of the mean.