# Stats

## General statistics

ArviZ.hdiFunction
Note

This function is forwarded to Python's arviz.hdi. The docstring of that function is included below.

    Calculate highest density interval (HDI) of array for given probability.

The HDI is the minimum width Bayesian credible interval (BCI).

Parameters
----------
ary: obj
object containing posterior samples.
Any object that can be converted to an :class:arviz.InferenceData object.
Refer to documentation of :func:arviz.convert_to_dataset for details.
hdi_prob: float, optional
Prob for which the highest density interval will be computed. Defaults to
stats.hdi_prob rcParam.
circular: bool, optional
Whether to compute the hdi taking into account x is a circular variable
(in the range [-np.pi, np.pi]) or not. Defaults to False (i.e non-circular variables).
Only works if multimodal is False.
multimodal: bool, optional
If true it may compute more than one hdi if the distribution is multimodal and the
modes are well separated.
skipna: bool, optional
If true ignores nan values when computing the hdi. Defaults to false.
group: str, optional
Specifies which InferenceData group should be used to calculate hdi.
Defaults to 'posterior'
var_names: list, optional
Names of variables to include in the hdi report. Prefix the variables by ~
when you want to exclude them from the report: ["~beta"] instead of ["beta"]
(see :func:arviz.summary for more details).
filter_vars: {None, "like", "regex"}, optional, default=None
If None (default), interpret var_names as the real variables names. If "like",
interpret var_names as substrings of the real variables names. If "regex",
interpret var_names as regular expressions on the real variables names. A la
pandas.filter.
coords: mapping, optional
Specifies the subset over to calculate hdi.
max_modes: int, optional
Specifies the maximum number of modes for multimodal case.
dask_kwargs : dict, optional
Dask related kwargs passed to :func:~arviz.wrap_xarray_ufunc.
kwargs: dict, optional
Additional keywords passed to :func:~arviz.wrap_xarray_ufunc.

Returns
-------
np.ndarray or xarray.Dataset, depending upon input
lower(s) and upper(s) values of the interval(s).

See Also
--------
plot_hdi : Plot highest density intervals for regression data.
xarray.Dataset.quantile : Calculate quantiles of array for given probabilities.

Examples
--------
Calculate the HDI of a Normal random variable:

.. ipython::

In [1]: import arviz as az
...: import numpy as np
...: data = np.random.normal(size=2000)
...: az.hdi(data, hdi_prob=.68)

Calculate the HDI of a dataset:

.. ipython::

In [1]: import arviz as az
...: data = az.load_arviz_data('centered_eight')
...: az.hdi(data)

We can also calculate the HDI of some of the variables of dataset:

.. ipython::

In [1]: az.hdi(data, var_names=["mu", "theta"])

By default, hdi is calculated over the chain and draw dimensions. We can use the
input_core_dims argument of :func:~arviz.wrap_xarray_ufunc to change this. In this example
we calculate the HDI also over the school dimension:

.. ipython::

In [1]: az.hdi(data, var_names="theta", input_core_dims = [["chain","draw", "school"]])

We can also calculate the hdi over a particular selection:

.. ipython::

In [1]: az.hdi(data, coords={"chain":[0, 1, 3]}, input_core_dims = [["draw"]])


source
ArviZ.summaryFunction
summary(
data; group = :posterior, coords dims, kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on any object that can be passed to convert_to_dataset.

Keywords

• coords: Map from named dimension to named indices.
• dims: Map from variable name to names of its dimensions.
• kwargs: Keyword arguments passed to summarystats.
source
StatsBase.summarystatsFunction
summarystats(
data::InferenceData;
group = :posterior,
kwargs...,
) -> Union{Dataset,DataFrames.DataFrame}
summarystats(data::Dataset; kwargs...) -> Union{Dataset,DataFrames.DataFrame}

Compute summary statistics on data.

Arguments

• data::Union{Dataset,InferenceData}: The data on which to compute summary statistics. If data is an InferenceData, only the dataset corresponding to group is used.

Keywords

• var_names: Collection of names of variables as Symbols to include in summary
• include_circ::Bool=false: Whether to include circular statistics
• digits::Int: Number of decimals used to round results. If not provided, numbers are not rounded.
• stat_funcs::Union{Dict{String,Function},Vector{Function}}=nothing: A vector of functions or a dict of functions with function names as keys used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included. The functions will be given one argument, the samples for a variable as an array, The functions should operate on an array, returning a single number. For example, Statistics.mean, or Statistics.var would both work.
• extend::Bool=true: If true, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not nothing.
• hdi_prob::Real=0.94: HDI interval to compute. This is only meaningful when stat_funcs is nothing.
• skipna::Bool=false: If true, ignores NaN values when computing the summary statistics. It does not affect the behaviour of the functions passed to stat_funcs.

Returns

• DataFrames.DataFrame: Summary statistics for each variable. Default statistics are:
• mean
• sd
• hdi_3%
• hdi_97%
• mcse_mean
• mcse_sd
• ess_bulk
• ess_tail
• r_hat (only computed for traces with 2 or more chains)

Examples

using ArviZ
idata = load_example_data("centered_eight")
summarystats(idata; var_names=(:mu, :tau))

Other statistics can be calculated by passing a list of functions or a dictionary with key, function pairs:

using Statistics
function median_sd(x)
med = median(x)
sd = sqrt(mean((x .- med).^2))
return sd
end

func_dict = Dict(
"std" => x -> std(x; corrected = false),
"median_std" => median_sd,
"5%" => x -> quantile(x, 0.05),
"median" => median,
"95%" => x -> quantile(x, 0.95),
)

summarystats(idata; var_names = (:mu, :tau), stat_funcs = func_dict, extend = false)
source
ArviZ.r2_scoreFunction

R² for Bayesian regression models. Only valid for linear models.

Note

This function is forwarded to Python's arviz.r2_score. The docstring of that function is included below.


Parameters
----------
y_true: array-like of shape = (n_outputs,)
Ground truth (correct) target values.
y_pred: array-like of shape = (n_posterior_samples, n_outputs)
Estimated target values.

Returns
-------
Pandas Series with the following indices:
r2: Bayesian R²
r2_std: standard deviation of the Bayesian R².

See Also
--------
plot_lm : Posterior predictive and mean plots for regression-like data.

Examples
--------
Calculate R² for Bayesian regression models :

.. ipython::

In [1]: import arviz as az
...: data = az.load_arviz_data('regression1d')
...: y_true = data.observed_data["y"].values
...: y_pred = data.posterior_predictive.stack(sample=("chain", "draw"))["y"].values.T
...: az.r2_score(y_true, y_pred)


source

## Pareto-smoothed importance sampling

PSIS.PSISResultType
PSISResult

Result of Pareto-smoothed importance sampling (PSIS) using psis.

Properties

• log_weights: un-normalized Pareto-smoothed log weights
• weights: normalized Pareto-smoothed weights (allocates a copy)
• pareto_shape: Pareto $k=ξ$ shape parameter
• nparams: number of parameters in log_weights
• ndraws: number of draws in log_weights
• nchains: number of chains in log_weights
• reff: the ratio of the effective sample size of the unsmoothed importance ratios and the actual sample size.
• ess: estimated effective sample size of estimate of mean using smoothed importance samples (see ess_is)
• log_weights_norm: the logarithm of the normalization constant of log_weights
• tail_length: length of the upper tail of log_weights that was smoothed
• tail_dist: the generalized Pareto distribution that was fit to the tail of log_weights. Note that the tail weights are scaled to have a maximum of 1, so tail_dist * exp(maximum(log_ratios)) is the corresponding fit directly to the tail of log_ratios.

Diagnostic

The pareto_shape parameter $k=ξ$ of the generalized Pareto distribution tail_dist can be used to diagnose reliability and convergence of estimates using the importance weights [VehtariSimpson2021].

• if $k < \frac{1}{3}$, importance sampling is stable, and importance sampling (IS) and PSIS both are reliable.
• if $k ≤ \frac{1}{2}$, then the importance ratio distributon has finite variance, and the central limit theorem holds. As $k$ approaches the upper bound, IS becomes less reliable, while PSIS still works well but with a higher RMSE.
• if $\frac{1}{2} < k ≤ 0.7$, then the variance is infinite, and IS can behave quite poorly. However, PSIS works well in this regime.
• if $0.7 < k ≤ 1$, then it quickly becomes impractical to collect enough importance weights to reliably compute estimates, and importance sampling is not recommended.
• if $k > 1$, then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero, and bias can be large with practical sample sizes.

See paretoshapeplot for a diagnostic plot.

PSIS.psisFunction
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].

While psis computes smoothed log weights out-of-place, psis! smooths them in-place.

Arguments

• log_ratios: an array of logarithms of importance ratios, with one of the following sizes:

• (ndraws,): a vector of draws for a single parameter from a single chain
• (nparams, ndraws): a matrix of draws for a multiple parameter from a single chain
• (nparams, ndraws, nchains): an array of draws for multiple parameters from multiple chains, e.g. as might be generated with Markov chain Monte Carlo.
• reff::Union{Real,AbstractVector}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(ndraws * nchains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo.

Keywords

• improved=false: If true, use the adaptive empirical prior of [Zhang2010]. If false, use the simpler prior of [ZhangStephens2009], which is also used in [VehtariSimpson2021].
• warn=true: If true, warning messages are delivered

Returns

A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and paretoshapeplot for a diagnostic plot.

PSIS.psis!Function
psis(log_ratios, reff = 1.0; kwargs...) -> PSISResult
psis!(log_ratios, reff = 1.0; kwargs...) -> PSISResult

Compute Pareto smoothed importance sampling (PSIS) log weights [VehtariSimpson2021].

While psis computes smoothed log weights out-of-place, psis! smooths them in-place.

Arguments

• log_ratios: an array of logarithms of importance ratios, with one of the following sizes:

• (ndraws,): a vector of draws for a single parameter from a single chain
• (nparams, ndraws): a matrix of draws for a multiple parameter from a single chain
• (nparams, ndraws, nchains): an array of draws for multiple parameters from multiple chains, e.g. as might be generated with Markov chain Monte Carlo.
• reff::Union{Real,AbstractVector}: the ratio(s) of effective sample size of log_ratios and the actual sample size reff = ess/(ndraws * nchains), used to account for autocorrelation, e.g. due to Markov chain Monte Carlo.

Keywords

• improved=false: If true, use the adaptive empirical prior of [Zhang2010]. If false, use the simpler prior of [ZhangStephens2009], which is also used in [VehtariSimpson2021].
• warn=true: If true, warning messages are delivered

Returns

A warning is raised if the Pareto shape parameter $k ≥ 0.7$. See PSISResult for details and paretoshapeplot for a diagnostic plot.

## Model assessment and selection

ArviZ.compareFunction

Compare models based on PSIS-LOO loo or WAIC waic cross-validation.

Note

This function is forwarded to Python's arviz.compare. The docstring of that function is included below.


LOO is leave-one-out (PSIS-LOO loo) cross-validation and
WAIC is the widely applicable information criterion.
Read more theory here - in a paper by some of the leading authorities
on model selection dx.doi.org/10.1111/1467-9868.00353

Parameters
----------
compare_dict: dict of {str: InferenceData or ELPDData}
A dictionary of model names and :class:arviz.InferenceData or ELPDData.
ic: str, optional
Information Criterion (PSIS-LOO loo or WAIC waic) used to compare models. Defaults to
rcParams["stats.information_criterion"].
method: str, optional
Method used to estimate the weights for each model. Available options are:

- 'stacking' : stacking of predictive distributions.
- 'BB-pseudo-BMA' : pseudo-Bayesian Model averaging using Akaike-type
weighting. The weights are stabilized using the Bayesian bootstrap.
- 'pseudo-BMA': pseudo-Bayesian Model averaging using Akaike-type
weighting, without Bootstrap stabilization (not recommended).

For more information read https://arxiv.org/abs/1704.02030
b_samples: int, optional default = 1000
Number of samples taken by the Bayesian bootstrap estimation.
Only useful when method = 'BB-pseudo-BMA'.
Defaults to rcParams["stats.ic_compare_method"].
alpha: float, optional
The shape parameter in the Dirichlet distribution used for the Bayesian bootstrap. Only
useful when method = 'BB-pseudo-BMA'. When alpha=1 (default), the distribution is uniform
on the simplex. A smaller alpha will keeps the final weights more away from 0 and 1.
seed: int or np.random.RandomState instance, optional
If int or RandomState, use it for seeding Bayesian bootstrap. Only
useful when method = 'BB-pseudo-BMA'. Default None the global
:mod:numpy.random state is used.
scale: str, optional
Output scale for IC. Available options are:

- log : (default) log-score (after Vehtari et al. (2017))
- negative_log : -1 * (log-score)
- deviance : -2 * (log-score)

A higher log-score (or a lower deviance) indicates a model with better predictive
accuracy.
var_name: str, optional
If there is more than a single observed variable in the InferenceData, which
should be used as the basis for comparison.

Returns
-------
A DataFrame, ordered from best to worst model (measured by information criteria).
The index reflects the key with which the models are passed to this function. The columns are:
rank: The rank-order of the models. 0 is the best.
IC: Information Criteria (PSIS-LOO loo or WAIC waic).
Higher IC indicates higher out-of-sample predictive fit ("better" model). Default LOO.
If scale is deviance or negative_log smaller IC indicates
higher out-of-sample predictive fit ("better" model).
pIC: Estimated effective number of parameters.
dIC: Relative difference between each IC (PSIS-LOO loo or WAIC waic)
and the lowest IC (PSIS-LOO loo or WAIC waic).
The top-ranked model is always 0.
weight: Relative weight for each model.
This can be loosely interpreted as the probability of each model (among the compared model)
given the data. By default the uncertainty in the weights estimation is considered using
Bayesian bootstrap.
SE: Standard error of the IC estimate.
If method = BB-pseudo-BMA these values are estimated using Bayesian bootstrap.
dSE: Standard error of the difference in IC between each model and the top-ranked model.
It's always 0 for the top-ranked model.
warning: A value of 1 indicates that the computation of the IC may not be reliable.
This could be indication of WAIC/LOO starting to fail see
http://arxiv.org/abs/1507.04544 for details.
scale: Scale used for the IC.

Examples
--------
Compare the centered and non centered models of the eight school problem:

.. ipython::

In [1]: import arviz as az
...: data1 = az.load_arviz_data("non_centered_eight")
...: data2 = az.load_arviz_data("centered_eight")
...: compare_dict = {"non centered": data1, "centered": data2}
...: az.compare(compare_dict)

Compare the models using LOO-CV, returning the IC in log scale and calculating the
weights using the stacking method.

.. ipython::

In [1]: az.compare(compare_dict, ic="loo", method="stacking", scale="log")

See Also
--------
loo : Compute the Pareto Smoothed importance sampling Leave One Out cross-validation.
waic : Compute the widely applicable information criterion.
plot_compare : Summary plot for model comparison.

References
----------
.. [1] Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using
leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017)
see https://doi.org/10.1007/s11222-016-9696-4


source
ArviZ.looFunction

Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).

Note

This function is forwarded to Python's arviz.loo. The docstring of that function is included below.


Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed
importance sampling leave-one-out cross-validation (PSIS-LOO-CV). Also calculates LOO's
standard error and the effective number of parameters. Read more theory here
https://arxiv.org/abs/1507.04544 and here https://arxiv.org/abs/1507.02646

Parameters
----------
data: obj
Any object that can be converted to an :class:arviz.InferenceData object.
Refer to documentation of
:func:arviz.convert_to_dataset for details.
pointwise: bool, optional
If True the pointwise predictive accuracy will be returned. Defaults to
stats.ic_pointwise rcParam.
var_name : str, optional
The name of the variable in log_likelihood groups storing the pointwise log
likelihood data to use for loo computation.
reff: float, optional
Relative MCMC efficiency, ess / n i.e. number of effective samples divided by the number
of actual samples. Computed from trace by default.
scale: str
Output scale for loo. Available options are:

- log : (default) log-score
- negative_log : -1 * log-score
- deviance : -2 * log-score

A higher log-score (or a lower deviance or negative log_score) indicates a model with
better predictive accuracy.

Returns
-------
ELPDData object (inherits from :class:pandas.Series) with the following row/attributes:
loo: approximated expected log pointwise predictive density (elpd)
loo_se: standard error of loo
p_loo: effective number of parameters
shape_warn: bool
True if the estimated shape parameter of
Pareto distribution is greater than 0.7 for one or more samples
loo_i: array of pointwise predictive accuracy, only if pointwise True
pareto_k: array of Pareto shape values, only if pointwise True
loo_scale: scale of the loo results

The returned object has a custom print method that overrides pd.Series method.

See Also
--------
compare : Compare models based on PSIS-LOO loo or WAIC waic cross-validation.
waic : Compute the widely applicable information criterion.
plot_compare : Summary plot for model comparison.
plot_elpd : Plot pointwise elpd differences between two or more models.
plot_khat : Plot Pareto tail indices for diagnosing convergence.

Examples
--------
Calculate LOO of a model:

.. ipython::

In [1]: import arviz as az
...: data = az.load_arviz_data("centered_eight")
...: az.loo(data)

Calculate LOO of a model and return the pointwise values:

.. ipython::

In [2]: data_loo = az.loo(data, pointwise=True)
...: data_loo.loo_i

source
ArviZ.loo_pitFunction

Compute leave one out (PSIS-LOO) probability integral transform (PIT) values.

Note

This function is forwarded to Python's arviz.loo_pit. The docstring of that function is included below.


Parameters
----------
idata: InferenceData
:class:arviz.InferenceData object.
y: array, DataArray or str
Observed data. If str, idata must be present and contain the observed data group
y_hat: array, DataArray or str
Posterior predictive samples for y. It must have the same shape as y plus an
extra dimension at the end of size n_samples (chains and draws stacked). If str or
None, idata must contain the posterior predictive group. If None, y_hat is taken
equal to y, thus, y must be str too.
log_weights: array or DataArray
Smoothed log_weights. It must have the same shape as y_hat
dask_kwargs : dict, optional
Dask related kwargs passed to :func:~arviz.wrap_xarray_ufunc.

Returns
-------
loo_pit: array or DataArray
Value of the LOO-PIT at each observed data point.

See Also
--------
plot_loo_pit : Plot Leave-One-Out probability integral transformation (PIT) predictive checks.
loo : Compute Pareto-smoothed importance sampling leave-one-out
cross-validation (PSIS-LOO-CV).
plot_elpd : Plot pointwise elpd differences between two or more models.
plot_khat : Plot Pareto tail indices for diagnosing convergence.

Examples
--------
Calculate LOO-PIT values using as test quantity the observed values themselves.

.. ipython::

In [1]: import arviz as az
...: data = az.load_arviz_data("centered_eight")
...: az.loo_pit(idata=data, y="obs")

Calculate LOO-PIT values using as test quantity the square of the difference between
each observation and mu. Both y and y_hat inputs will be array-like,
but idata will still be passed in order to calculate the log_weights from
there.

.. ipython::

In [1]: T = data.observed_data.obs - data.posterior.mu.median(dim=("chain", "draw"))
...: T_hat = data.posterior_predictive.obs - data.posterior.mu
...: T_hat = T_hat.stack(__sample__=("chain", "draw"))
...: az.loo_pit(idata=data, y=T**2, y_hat=T_hat**2)


source
ArviZ.waicFunction

Compute the widely applicable information criterion.

Note

This function is forwarded to Python's arviz.waic. The docstring of that function is included below.


Estimates the expected log pointwise predictive density (elpd) using WAIC. Also calculates the
WAIC's standard error and the effective number of parameters.
Read more theory here https://arxiv.org/abs/1507.04544 and here https://arxiv.org/abs/1004.2316

Parameters
----------
data: obj
Any object that can be converted to an :class:arviz.InferenceData object.
Refer to documentation of :func:arviz.convert_to_inference_data for details.
pointwise: bool
If True the pointwise predictive accuracy will be returned. Defaults to
stats.ic_pointwise rcParam.
var_name : str, optional
The name of the variable in log_likelihood groups storing the pointwise log
likelihood data to use for waic computation.
scale: str
Output scale for WAIC. Available options are:

- log : (default) log-score
- negative_log : -1 * log-score
- deviance : -2 * log-score

A higher log-score (or a lower deviance or negative log_score) indicates a model with
better predictive accuracy.
dask_kwargs : dict, optional
Dask related kwargs passed to :func:~arviz.wrap_xarray_ufunc.

Returns
-------
ELPDData object (inherits from :class:pandas.Series) with the following row/attributes:
waic: approximated expected log pointwise predictive density (elpd)
waic_se: standard error of waic
p_waic: effective number parameters
var_warn: bool
True if posterior variance of the log predictive densities exceeds 0.4
waic_i: :class:~xarray.DataArray with the pointwise predictive accuracy,
only if pointwise=True
waic_scale: scale of the reported waic results

The returned object has a custom print method that overrides pd.Series method.

See Also
--------
loo : Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).
compare : Compare models based on PSIS-LOO-CV or WAIC.
plot_compare : Summary plot for model comparison.

Examples
--------
Calculate WAIC of a model:

.. ipython::

In [1]: import arviz as az
...: data = az.load_arviz_data("centered_eight")
...: az.waic(data)

Calculate WAIC of a model and return the pointwise values:

.. ipython::

In [2]: data_waic = az.waic(data, pointwise=True)
...: data_waic.waic_i

source