Data

Inference library converters

ArviZ.from_cmdstan - Function

Convert CmdStan data into an InferenceData object.

Note

This function is forwarded to Python's arviz.from_cmdstan. The docstring of that function is included below.


    For a usage example read the
    :ref:`Creating InferenceData section on from_cmdstan <creating_InferenceData>`

    Parameters
    ----------
    posterior : str or list of str, optional
        List of paths to output.csv files.
    posterior_predictive : str or list of str, optional
        Posterior predictive samples for the fit. If the string ends with ".csv",
        it is assumed to be a file path.
    predictions : str or list of str, optional
        Out-of-sample prediction samples for the fit. If the string ends with ".csv",
        it is assumed to be a file path.
    prior : str or list of str, optional
        List of paths to output.csv files.
    prior_predictive : str or list of str, optional
        Prior predictive samples for the fit. If the string ends with ".csv",
        it is assumed to be a file path.
    observed_data : str, optional
        Observed data used in the sampling. Path to data file in Rdump or JSON format.
    observed_data_var : str or list of str, optional
        Variable(s) used for slicing observed_data. If not defined, all
        data variables are imported.
    constant_data : str, optional
        Constant data used in the sampling. Path to data file in Rdump or JSON format.
    constant_data_var : str or list of str, optional
        Variable(s) used for slicing constant_data. If not defined, all
        data variables are imported.
    predictions_constant_data : str, optional
        Constant data for predictions used in the sampling.
        Path to data file in Rdump or JSON format.
    predictions_constant_data_var : str or list of str, optional
        Variable(s) used for slicing predictions_constant_data.
        If not defined, all data variables are imported.
    log_likelihood : dict of {str: str}, list of str or str, optional
        Pointwise log_likelihood for the data. log_likelihood is extracted from the
        posterior. It is recommended to use this argument as a dictionary whose keys
        are observed variable names and whose values are the variables storing log
        likelihood arrays in the Stan code. In other cases, a dictionary with keys
        equal to its values is used. By default, if a variable ``log_lik`` is
        present in the Stan model, it will be retrieved as pointwise log
        likelihood values. Use ``False`` to avoid this behaviour.
    index_origin : int, optional
        Starting value of integer coordinate values. Defaults to the value in rcParam
        ``data.index_origin``.
    coords : dict of {str: array_like}, optional
        A dictionary containing the values that are used as index. The key
        is the name of the dimension, the values are the index values.
    dims : dict of {str: list of str}, optional
        A mapping from variables to a list of coordinate names for the variable.
    disable_glob : bool
        Don't use glob for string input. This means that all string input is
        assumed to be variable names (samples) or a path (data).
    save_warmup : bool
        Save warmup iterations into InferenceData object, if found in the input files.
        If not defined, use default defined by the rcParams.
    dtypes : dict or str
        A dictionary containing dtype information (int, float) for parameters.
        If input is a string, it is assumed to be a model code or path to model code file.

    Returns
    -------
    InferenceData object
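
The same conversion can be called from Julia; a minimal sketch in which the file paths, variable names, and coordinates are illustrative placeholders:

# Hypothetical CmdStan output CSVs and data file; substitute your own paths.
posterior = ["output_chain1.csv", "output_chain2.csv"]
idata = from_cmdstan(
    posterior;
    observed_data="schools_data.json",      # Rdump or JSON data file (placeholder)
    coords=Dict("school" => collect(1:8)),
    dims=Dict("theta" => ["school"]),
)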
    
ArviZ.from_mcmcchains - Function
from_mcmcchains(posterior::MCMCChains.Chains; kwargs...) -> InferenceData
from_mcmcchains(; kwargs...) -> InferenceData
from_mcmcchains(
    posterior::MCMCChains.Chains,
    posterior_predictive,
    predictions,
    log_likelihood;
    kwargs...
) -> InferenceData

Convert data in an MCMCChains.Chains format into an InferenceData.

Any keyword argument below without an explicitly annotated type above is allowed, so long as it can be passed to convert_to_inference_data.

Arguments

  • posterior::MCMCChains.Chains: Draws from the posterior

Keywords

  • posterior_predictive::Any=nothing: Draws from the posterior predictive distribution or name(s) of predictive variables in posterior
  • predictions: Out-of-sample predictions for the posterior.
  • prior: Draws from the prior
  • prior_predictive: Draws from the prior predictive distribution or name(s) of predictive variables in prior
  • observed_data: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names, and values are the corresponding data.
  • constant_data: Model constants, data included in the model that are not modeled as random variables. Keys are parameter names.
  • predictions_constant_data: Constants relevant to the model predictions (i.e. new x values in a linear regression).
  • log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a named tuple whose keys are observed variable names and whose values are log likelihood arrays. Alternatively, provide the name of variable in posterior containing log likelihoods.
  • library=MCMCChains: Name of library that generated the chains
  • coords: Map from named dimension to named indices
  • dims: Map from variable name to names of its dimensions
  • eltypes: Map from variable names to eltypes. This is primarily used to assign discrete eltypes to discrete variables that were stored in Chains as floats.

Returns

  • InferenceData: The data with groups corresponding to the provided data
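
For example, a hedged sketch assuming `chn` is an MCMCChains.Chains object (e.g. returned by Turing.jl) whose vector-valued parameter `theta` is indexed by school; the parameter, coordinate, and library names are illustrative:

using MCMCChains, ArviZ

# `chn` is an existing MCMCChains.Chains object; names below are illustrative.
idata = from_mcmcchains(
    chn;
    coords=Dict("school" => ["A", "B", "C"]),
    dims=Dict("theta" => ["school"]),
    library="Turing",
)
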
ArviZ.from_samplechains - Function
from_samplechains(
    posterior=nothing;
    prior=nothing,
    library=SampleChains,
    kwargs...,
) -> InferenceData

Convert SampleChains samples to an InferenceData.

Either posterior or prior may be a SampleChains.AbstractChain or SampleChains.MultiChain object.

For descriptions of remaining kwargs, see from_namedtuple.
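
A hedged sketch, assuming `chain` was produced by a SampleChains-based sampler (the library name is illustrative):

# `chain` is an existing SampleChains.AbstractChain or SampleChains.MultiChain.
idata = from_samplechains(chain; library="SampleChainsDynamicHMC")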


IO / Conversion

ArviZ.from_dict - Function

Convert Dictionary data into an InferenceData object.

Note

This function is forwarded to Python's arviz.from_dict. The docstring of that function is included below.


    For a usage example read the
    :ref:`Creating InferenceData section on from_dict <creating_InferenceData>`

    Parameters
    ----------
    posterior : dict
    posterior_predictive : dict
    predictions : dict
    sample_stats : dict
    log_likelihood : dict
        For stats functions, log likelihood data should be stored here.
    prior : dict
    prior_predictive : dict
    observed_data : dict
    constant_data : dict
    predictions_constant_data : dict
    warmup_posterior : dict
    warmup_posterior_predictive : dict
    warmup_predictions : dict
    warmup_log_likelihood : dict
    warmup_sample_stats : dict
    save_warmup : bool
        Save warmup iterations into InferenceData object. If not defined, use default
        defined by the rcParams.
    index_origin : int, optional
    coords : dict[str, iterable]
        A dictionary containing the values that are used as index. The key
        is the name of the dimension, the values are the index values.
    dims : dict[str, List(str)]
        A mapping from variables to a list of coordinate names for the variable.
    pred_dims : dict[str, List(str)]
        A mapping from variables to a list of coordinate names for predictions.
    pred_coords : dict[str, List(str)]
        A mapping from variables to a list of coordinate values for predictions.
    attrs : dict
        A dictionary containing attributes for different groups.
    kwargs : dict
        A dictionary containing group attrs.
        Accepted kwargs are:
        - posterior_attrs, posterior_warmup_attrs : attrs for posterior group
        - sample_stats_attrs, sample_stats_warmup_attrs : attrs for sample_stats group
        - log_likelihood_attrs, log_likelihood_warmup_attrs : attrs for log_likelihood group
        - posterior_predictive_attrs, posterior_predictive_warmup_attrs : attrs for
                posterior_predictive group
        - predictions_attrs, predictions_warmup_attrs : attrs for predictions group
        - prior_attrs : attrs for prior group
        - sample_stats_prior_attrs : attrs for sample_stats_prior group
        - prior_predictive_attrs : attrs for prior_predictive group

    Returns
    -------
    InferenceData object
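
A minimal Julia sketch; the variable names and array shapes are illustrative, with arrays laid out as (chain, draw, ...) as described above:

# Fake draws: 4 chains by 100 draws; `theta` has an extra "school" dimension.
posterior = Dict("mu" => randn(4, 100), "theta" => randn(4, 100, 8))
idata = from_dict(
    posterior;
    observed_data=Dict("y" => randn(8)),
    coords=Dict("school" => collect(1:8)),
    dims=Dict("theta" => ["school"], "y" => ["school"]),
)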
    
ArviZ.from_json - Function

Initialize object from a json file.

Note

This function is forwarded to Python's arviz.from_json. The docstring of that function is included below.


    Will use the faster `ujson` (https://github.com/ultrajson/ultrajson) if it is available.

    Parameters
    ----------
    filename : str
        location of json file

    Returns
    -------
    InferenceData object
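
For example (the filename is a placeholder):

idata = from_json("radon_fit.json")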
    
ArviZ.from_netcdf - Function

Load netcdf file back into an arviz.InferenceData.

Note

This function is forwarded to Python's arviz.from_netcdf. The docstring of that function is included below.


    Parameters
    ----------
    filename : str
        name or path of the file to load trace
    group_kwargs : dict of {str: dict}
        Keyword arguments to be passed into each call of :func:`xarray.open_dataset`.
        The keys of the higher level should be group names or regex matching group
        names, the inner dicts are passed to ``open_dataset``.
        This feature is currently experimental.
    regex : str
        Specifies where regex search should be used to extend the keyword arguments.

    Returns
    -------
        InferenceData object

    Notes
    -----
    By default, the datasets of the InferenceData object will be lazily loaded instead
    of loaded into memory. This behaviour is regulated by the value of
    ``az.rcParams["data.load"]``.
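
For example (the filename is a placeholder):

# Load a previously saved InferenceData.
idata = from_netcdf("radon_fit.nc")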
    
ArviZ.to_netcdf - Function

Save dataset as a netcdf file.

Note

This function is forwarded to Python's arviz.to_netcdf. The docstring of that function is included below.


    WARNING: Only idempotent in case `data` is InferenceData

    Parameters
    ----------
    data : InferenceData, or any object accepted by `convert_to_inference_data`
        Object to be saved
    filename : str
        name or path of the file to save the data to
    group : str (optional)
        In case `data` is not InferenceData, this is the group it will be saved to
    coords : dict (optional)
        See `convert_to_inference_data`
    dims : dict (optional)
        See `convert_to_inference_data`

    Returns
    -------
    str
        filename saved to
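
A hedged round-trip sketch, assuming `idata` is an existing InferenceData object (the filename is a placeholder):

# Write the InferenceData to disk and read it back.
to_netcdf(idata, "radon_fit.nc")
idata_reloaded = from_netcdf("radon_fit.nc")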
    

General functions

ArviZ.concat - Function

Concatenate InferenceData objects.

Note

This function is forwarded to Python's arviz.concat. The docstring of that function is included below.


    Concatenates over `group`, `chain` or `draw`.
    By default concatenates over unique groups.
    To concatenate over `chain` or `draw`, the function
    needs identical groups and variables.

    The `variables` in the `data` group are merged if `dim` is not found.


    Parameters
    ----------
    *args : InferenceData
        Variable length InferenceData list or
        Sequence of InferenceData.
    dim : str, optional
        Dimension over which to concatenate. If None, concatenates over
        unique groups.
    copy : bool
        If True, groups are copied to the new InferenceData object.
        Used only if `dim` is None.
    inplace : bool
        If True, merge args to first object.
    reset_dim : bool
        Valid only if dim is not None.

    Returns
    -------
    InferenceData
        A new InferenceData object by default.
        When `inplace==True`, merges args into the first arg and returns `None`.

    See Also
    --------
    add_groups : Add new groups to InferenceData object.
    extend : Extend InferenceData with groups from another InferenceData.

    Examples
    --------
    Use ``concat`` to concatenate InferenceData objects. By default, this concatenates
    over unique groups. We first create an ``InferenceData`` object:

    .. ipython::

        In [1]: import arviz as az
           ...: import numpy as np
           ...: data = {
           ...:     "a": np.random.normal(size=(4, 100, 3)),
           ...:     "b": np.random.normal(size=(4, 100)),
           ...: }
           ...: coords = {"a_dim": ["x", "y", "z"]}
           ...: dataA = az.from_dict(data, coords=coords, dims={"a": ["a_dim"]})
           ...: dataA

    We have created an ``InferenceData`` object with default group 'posterior'. Now, we will
    create another ``InferenceData`` object:

    .. ipython::

        In [1]: dataB = az.from_dict(prior=data, coords=coords, dims={"a": ["a_dim"]})
           ...: dataB

    We have created another ``InferenceData`` object with group 'prior'. Now, we will concatenate
    these two ``InferenceData`` objects:

    .. ipython::

        In [1]: az.concat(dataA, dataB)

    Now, we will concatenate over chain (or draw). It requires identical groups and variables.
    Here we are concatenating two identical ``InferenceData`` objects over dimension chain:

    .. ipython::

        In [1]: az.concat(dataA, dataA, dim="chain")

    It will create an ``InferenceData`` with the original group 'posterior'. In a similar
    way, we can also concatenate over draws.
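
The same operations are available from Julia; a hedged sketch assuming `idata_a` and `idata_b` are InferenceData objects analogous to `dataA` and `dataB` above:

concat(idata_a, idata_b)               # merges the distinct groups of both objects
concat(idata_a, idata_a; dim="chain")  # requires identical groups and variables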

    
ArviZ.extract_dataset - Function

Extract an InferenceData group or a subset of it as an xarray.Dataset.

Note

This function is forwarded to Python's arviz.extract_dataset. The docstring of that function is included below.


    Parameters
    ----------
    idata : InferenceData or InferenceData_like
        InferenceData from which to extract the data.
    group : str, optional
        Which InferenceData data group to extract data from.
    combined : bool, optional
        Combine ``chain`` and ``draw`` dimensions into ``sample``. Won't work if
        a dimension named ``sample`` already exists.
    var_names : str or list of str, optional
        Variables to be extracted. Prefix a variable with `~` to exclude it
        from the extracted dataset.
    filter_vars : {None, "like", "regex"}, optional
        If `None` (default), interpret var_names as the real variable names. If "like",
        interpret var_names as substrings of the real variable names. If "regex",
        interpret var_names as regular expressions on the real variable names. A la
        `pandas.filter`.
        As with plotting, sometimes it's easier to subset by saying what to exclude
        instead of what to include.
    num_samples : int, optional
        Extract only a subset of the samples. Only valid if ``combined=True``
    rng : bool, int, numpy.Generator, optional
        Shuffle the samples, only valid if ``combined=True``. By default,
        samples are shuffled if ``num_samples`` is not ``None``, and are left
        in the same order otherwise. This ensures that subsetting the samples doesn't return
        only samples from a single chain and consecutive draws.

    Returns
    -------
    xarray.Dataset

    Examples
    --------
    The default behaviour is to return the posterior group after stacking the chain and
    draw dimensions.

    .. jupyter-execute::

        import arviz as az
        idata = az.load_arviz_data("centered_eight")
        az.extract_dataset(idata)

    You can also indicate a subset to be returned, both in variables and in samples:

    .. jupyter-execute::

        az.extract_dataset(idata, var_names="theta", num_samples=100)

    To keep the chain and draw dimensions, use ``combined=False``.

    .. jupyter-execute::

        az.extract_dataset(idata, group="prior", combined=False)
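
From Julia, an equivalent call using the bundled example data might look like this (a sketch; the keyword values are illustrative):

idata = load_example_data("centered_eight")
post = extract_dataset(idata; var_names="theta", num_samples=100)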

    

Example data

ArviZ.load_example_data - Function
load_example_data(name; kwargs...) -> InferenceData
load_example_data() -> Dict{String,AbstractFileMetadata}

Load a local or remote pre-made dataset.

kwargs are forwarded to from_netcdf.

Pass no parameters to get a Dict listing all available datasets.

Data files are handled by DataDeps.jl. A file is downloaded only when it is requested and then cached for future use.

Examples

julia> keys(load_example_data())
KeySet for a Dict{String, ArviZ.AbstractFileMetadata} with 9 entries. Keys:
  "centered_eight"
  "radon"
  "glycan_torsion_angles"
  "rugby"
  "non_centered_eight"
  "regression10d"
  "classification1d"
  "classification10d"
  "regression1d"

julia> load_example_data("centered_eight")
InferenceData with groups:
  > posterior
  > posterior_predictive
  > sample_stats
  > prior
  > observed_data