Diagnostics
MCMCDiagnosticTools.AutocovMethod
MCMCDiagnosticTools.BDAAutocovMethod
MCMCDiagnosticTools.FFTAutocovMethod
MCMCDiagnosticTools.bfmi
MCMCDiagnosticTools.ess
MCMCDiagnosticTools.ess_rhat
MCMCDiagnosticTools.mcse
MCMCDiagnosticTools.rhat
MCMCDiagnosticTools.rstar
Bayesian fraction of missing information
MCMCDiagnosticTools.bfmi
— Functionbfmi(energy::AbstractVector{<:Real}) -> Real
bfmi(energy::AbstractMatrix{<:Real}; dims::Int=1) -> AbstractVector{<:Real}
Calculate the estimated Bayesian fraction of missing information (BFMI).
When sampling with Hamiltonian Monte Carlo (HMC), BFMI quantifies how well momentum resampling matches the marginal energy distribution.
The current advice is that values smaller than 0.3 indicate poor sampling. However, this threshold is provisional and may change. A BFMI value below the threshold often indicates poor adaptation of sampling parameters or that the target distribution has heavy tails that were not well explored by the Markov chain.
For more information, see Section 6.1 of [Betancourt2018] or [Betancourt2016] for a complete account.
energy
is either a vector of Hamiltonian energies of draws or a matrix of energies of draws for multiple chains. dims
indicates the dimension in energy
that contains the draws. The default dims=1
assumes energy
has the shape draws
or (draws, chains)
. If a different shape is provided, dims
must be set accordingly.
If energy
is a vector, a single BFMI value is returned. Otherwise, a vector of BFMI values for each chain is returned.
Effective sample size and $\widehat{R}$ diagnostic
MCMCDiagnosticTools.ess
— Functioness(data::InferenceData; kwargs...) -> Dataset
ess(data::Dataset; kwargs...) -> Dataset
Calculate the effective sample size (ESS) for each parameter in the data.
ess(
samples::AbstractArray{<:Union{Missing,Real}};
kind=:bulk,
relative::Bool=false,
autocov_method=AutocovMethod(),
split_chains::Int=2,
maxlag::Int=250,
kwargs...
)
Estimate the effective sample size (ESS) of the samples
of shape (draws, [chains[, parameters...]])
with the autocov_method
.
Optionally, the kind
of ESS estimate to be computed can be specified (see below). Some kind
s accept additional kwargs
.
If relative
is true
, the relative ESS is returned, i.e. ess / (draws * chains)
.
split_chains
indicates the number of chains each chain is split into. When split_chains > 1
, then the diagnostics check for within-chain convergence. When d = mod(draws, split_chains) > 0
, i.e. the chains cannot be evenly split, then 1 draw is discarded after each of the first d
splits within each chain. There must be at least 3 draws in each chain after splitting.
maxlag
indicates the maximum lag for which autocovariance is computed and must be greater than 0.
For a given estimand, it is recommended that the ESS is at least 100 * chains
and that $\widehat{R} < 1.01$.[VehtariGelman2021]
See also: AutocovMethod
, FFTAutocovMethod
, BDAAutocovMethod
, rhat
, ess_rhat
, mcse
Kinds of ESS estimates
If kind
isa a Symbol
, it may take one of the following values:
:bulk
: basic ESS computed on rank-normalized draws. This kind diagnoses poor convergence in the bulk of the distribution due to trends or different locations of the chains.:tail
: minimum of the quantile-ESS for the symmetric quantiles wheretail_prob=0.1
is the probability in the tails. This kind diagnoses poor convergence in the tails of the distribution. If this kind is chosen,kwargs
may contain atail_prob
keyword.:basic
: basic ESS, equivalent to specifyingkind=Statistics.mean
.
While Bulk-ESS is conceptually related to basic ESS, it is well-defined even if the chains do not have finite variance.[VehtariGelman2021] For each parameter, rank-normalization proceeds by first ranking the inputs using "tied ranking" and then transforming the ranks to normal quantiles so that the result is standard normally distributed. This transform is monotonic.
Otherwise, kind
specifies one of the following estimators, whose ESS is to be estimated:
Statistics.mean
Statistics.median
Statistics.std
StatsBase.mad
Base.Fix2(Statistics.quantile, p::Real)
MCMCDiagnosticTools.rhat
— Functionrhat(data::InferenceData; kwargs...) -> Dataset
rhat(data::Dataset; kwargs...) -> Dataset
Calculate the $\widehat{R}$ diagnostic for each parameter in the data.
rhat(samples::AbstractArray{Union{Real,Missing}}; kind::Symbol=:rank, split_chains=2)
Compute the $\widehat{R}$ diagnostics for each parameter in samples
of shape (draws, [chains[, parameters...]])
.[VehtariGelman2021]
kind
indicates the kind of $\widehat{R}$ to compute (see below).
split_chains
indicates the number of chains each chain is split into. When split_chains > 1
, then the diagnostics check for within-chain convergence. When d = mod(draws, split_chains) > 0
, i.e. the chains cannot be evenly split, then 1 draw is discarded after each of the first d
splits within each chain.
Kinds of $\widehat{R}$
The following kind
s are supported:
:rank
: maximum of $\widehat{R}$ withkind=:bulk
andkind=:tail
.:bulk
: basic $\widehat{R}$ computed on rank-normalized draws. This kind diagnoses poor convergence in the bulk of the distribution due to trends or different locations of the chains.:tail
: $\widehat{R}$ computed on draws folded around the median and then rank-normalized. This kind diagnoses poor convergence in the tails of the distribution due to different scales of the chains.:basic
: Classic $\widehat{R}$.
MCMCDiagnosticTools.ess_rhat
— Functioness_rhat(data::InferenceData; kwargs...) -> Dataset
ess_rhat(data::Dataset; kwargs...) -> Dataset
Calculate the effective sample size (ESS) and $\widehat{R}$ diagnostic for each parameter in the data.
ess_rhat(
samples::AbstractArray{<:Union{Missing,Real}};
kind::Symbol=:rank,
kwargs...,
) -> NamedTuple{(:ess, :rhat)}
Estimate the effective sample size and $\widehat{R}$ of the samples
of shape (draws, [chains[, parameters...]])
.
When both ESS and $\widehat{R}$ are needed, this method is often more efficient than calling ess
and rhat
separately.
See rhat
for a description of supported kind
s and ess
for a description of kwargs
.
The following autocovariance methods are supported:
MCMCDiagnosticTools.AutocovMethod
— TypeAutocovMethod <: AbstractAutocovMethod
The AutocovMethod
uses a standard algorithm for estimating the mean autocovariance of MCMC chains.
It is is based on the discussion by [VehtariGelman2021] and uses the biased estimator of the autocovariance, as discussed by [Geyer1992].
MCMCDiagnosticTools.FFTAutocovMethod
— TypeFFTAutocovMethod <: AbstractAutocovMethod
The FFTAutocovMethod
uses a standard algorithm for estimating the mean autocovariance of MCMC chains.
The algorithm is the same as the one of AutocovMethod
but this method uses fast Fourier transforms (FFTs) for estimating the autocorrelation.
To be able to use this method, you have to load a package that implements the AbstractFFTs.jl interface such as FFTW.jl or FastTransforms.jl.
MCMCDiagnosticTools.BDAAutocovMethod
— TypeBDAAutocovMethod <: AbstractAutocovMethod
The BDAAutocovMethod
uses a standard algorithm for estimating the mean autocovariance of MCMC chains.
It is is based on the discussion by [VehtariGelman2021]. and uses the variogram estimator of the autocorrelation function discussed by [BDA3].
Monte Carlo standard error
MCMCDiagnosticTools.mcse
— Functionmcse(data::InferenceData; kwargs...) -> Dataset
mcse(data::Dataset; kwargs...) -> Dataset
Calculate the Monte Carlo standard error (MCSE) for each parameter in the data.
mcse(samples::AbstractArray{<:Union{Missing,Real}}; kind=Statistics.mean, kwargs...)
Estimate the Monte Carlo standard errors (MCSE) of the estimator kind
applied to samples
of shape (draws, [chains[, parameters...]])
.
See also: ess
Kinds of MCSE estimates
The estimator whose MCSE should be estimated is specified with kind
. kind
must accept a vector of the same eltype
as samples
and return a real estimate.
For the following estimators, the effective sample size ess
and an estimate of the asymptotic variance are used to compute the MCSE, and kwargs
are forwarded to ess
:
Statistics.mean
Statistics.median
Statistics.std
Base.Fix2(Statistics.quantile, p::Real)
For other estimators, the subsampling bootstrap method (SBM)[FlegalJones2011][Flegal2012] is used as a fallback, and the only accepted kwargs
are batch_size
, which indicates the size of the overlapping batches used to estimate the MCSE, defaulting to floor(Int, sqrt(draws * chains))
. Note that SBM tends to underestimate the MCSE, especially for highly autocorrelated chains. One should verify that autocorrelation is low by checking the bulk- and tail-ESS values.
$R^*$ diagnostic
MCMCDiagnosticTools.rstar
— Functionrstar(
rng::Random.AbstractRNG=Random.default_rng(),
classifier,
data::Union{InferenceData,Dataset};
kwargs...,
)
Calculate the $R^*$ diagnostic for the data.
rstar(
rng::Random.AbstractRNG=Random.default_rng(),
classifier,
samples,
chain_indices::AbstractVector{Int};
subset::Real=0.7,
split_chains::Int=2,
verbosity::Int=0,
)
Compute the $R^*$ convergence statistic of the table samples
with the classifier
.
samples
must be either an AbstractMatrix
, an AbstractVector
, or a table (i.e. implements the Tables.jl interface) whose rows are draws and whose columns are parameters.
chain_indices
indicates the chain ids of each row of samples
.
This method supports ragged chains, i.e. chains of nonequal lengths.
rstar(
rng::Random.AbstractRNG=Random.default_rng(),
classifier,
samples::AbstractArray{<:Real};
subset::Real=0.7,
split_chains::Int=2,
verbosity::Int=0,
)
Compute the $R^*$ convergence statistic of the samples
with the classifier
.
samples
is an array of draws with the shape (draws, [chains[, parameters...]])
.`
This implementation is an adaption of algorithms 1 and 2 described by Lambert and Vehtari.
The classifier
has to be a supervised classifier of the MLJ framework (see the MLJ documentation for a list of supported models). It is trained with a subset
of the samples from each chain. Each chain is split into split_chains
separate chains to additionally check for within-chain convergence. The training of the classifier can be inspected by adjusting the verbosity
level.
If the classifier is deterministic, i.e., if it predicts a class, the value of the $R^*$ statistic is returned (algorithm 1). If the classifier is probabilistic, i.e., if it outputs probabilities of classes, the scaled Poisson-binomial distribution of the $R^*$ statistic is returned (algorithm 2).
The correctness of the statistic depends on the convergence of the classifier
used internally in the statistic.
Examples
julia> using MLJBase, MLJIteration, EvoTrees, Statistics, StatisticalMeasures
julia> samples = fill(4.0, 100, 3, 2);
One can compute the distribution of the $R^*$ statistic (algorithm 2) with a probabilistic classifier. For instance, we can use a gradient-boosted trees model with nrounds = 100
sequentially stacked trees and learning rate eta = 0.05
:
julia> model = EvoTreeClassifier(; nrounds=100, eta=0.05);
julia> distribution = rstar(model, samples);
julia> round(mean(distribution); digits=2)
1.0f0
Note, however, that it is recommended to determine nrounds
based on early-stopping. With the MLJ framework, this can be achieved in the following way (see the MLJ documentation for additional explanations):
julia> model = IteratedModel(;
model=EvoTreeClassifier(; eta=0.05),
iteration_parameter=:nrounds,
resampling=Holdout(),
measures=log_loss,
controls=[Step(5), Patience(2), NumberLimit(100)],
retrain=true,
);
julia> distribution = rstar(model, samples);
julia> round(mean(distribution); digits=2)
1.0f0
For deterministic classifiers, a single $R^*$ statistic (algorithm 1) is returned. Deterministic classifiers can also be derived from probabilistic classifiers by e.g. predicting the mode. In MLJ this corresponds to a pipeline of models.
julia> evotree_deterministic = Pipeline(model; operation=predict_mode);
julia> value = rstar(evotree_deterministic, samples);
julia> round(value; digits=2)
1.0
References
Lambert, B., & Vehtari, A. (2020). $R^*$: A robust MCMC convergence diagnostic with uncertainty using decision tree classifiers.
- Betancourt2018Betancourt M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv:1701.02434v2 [stat.ME]
- Betancourt2016Betancourt M. (2016). Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo. arXiv:1604.00695v1 [stat.ME]
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- Geyer1992Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 473-483.
- VehtariGelman2021Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An improved $\widehat {R}$ for assessing convergence of MCMC. Bayesian Analysis. doi: 10.1214/20-BA1221 arXiv: 1903.08008
- BDA3Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.
- FlegalJones2011Flegal JM, Jones GL. (2011) Implementing MCMC: estimating with confidence. Handbook of Markov Chain Monte Carlo. pp. 175-97. pdf
- Flegal2012Flegal JM. (2012) Applicability of subsampling bootstrap methods in Markov chain Monte Carlo. Monte Carlo and Quasi-Monte Carlo Methods 2010. pp. 363-72. doi: 10.1007/978-3-642-27440-4_18