InferenceData

Type definition

InferenceObjects.InferenceData - Type
InferenceData{group_names,group_types}

Container for inference data storage using DimensionalData.

This object implements the InferenceData schema.

Internally, groups are stored in a NamedTuple, which can be accessed using parent(::InferenceData).

Constructors

InferenceData(groups::NamedTuple)
InferenceData(; groups...)

Construct an InferenceData from either a NamedTuple of groups or groups passed as keyword arguments.

Groups must be Dataset objects.

Rather than constructing an InferenceData directly, prefer the exported from_xyz functions (such as from_dict and from_namedtuple) or convert_to_inference_data.
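As a minimal sketch of the two constructor forms described above: the variable names mu and theta and the array sizes are illustrative assumptions, and convert_to_dataset is used here only as one way to obtain Dataset groups.

```julia
using InferenceObjects

# Build two Dataset groups from NamedTuples of arrays shaped (ndraws, nchains[, sizes...]).
# The variable names and sizes are made up for illustration.
posterior = InferenceObjects.convert_to_dataset((; mu=randn(100, 4), theta=randn(100, 4, 3)))
prior = InferenceObjects.convert_to_dataset((; mu=randn(50, 1), theta=randn(50, 1, 3)))

# Keyword form: each keyword names a group.
idata_kw = InferenceData(; posterior=posterior, prior=prior)

# NamedTuple form: the same groups passed as a single NamedTuple.
idata_nt = InferenceData((; posterior=posterior, prior=prior))

# parent returns the NamedTuple in which the groups are stored.
groups = parent(idata_kw)
```

Both forms should yield objects with the same groups. In everyday use the converter functions are usually the more convenient entry point.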


Property interface

Base.getproperty - Function
getproperty(data::InferenceData, name::Symbol) -> Dataset

Get group with the specified name.


Indexing interface

Base.getindex - Function
Base.getindex(data::InferenceData, groups::Symbol; coords...) -> Dataset
Base.getindex(data::InferenceData, groups; coords...) -> InferenceData

Return the specified group as a Dataset (when a single Symbol is given) or a new InferenceData containing the specified groups, in either case sliced to the specified coords.

Each entry in coords maps a dimension name to an index, a DimensionalData.Selector, or an IntervalSets.AbstractInterval.

If one or more groups lack the specified dimension, a warning is raised but can be ignored. All groups that contain the dimension must also contain the specified indices, or an exception will be raised.

Examples

Select data from all groups for just the specified id values.

julia> using InferenceObjects, DimensionalData

julia> idata = from_namedtuple(
           (θ=randn(4, 100, 4), τ=randn(4, 100));
           prior=(θ=randn(4, 100, 4), τ=randn(4, 100)),
           observed_data=(y=randn(4),),
           dims=(θ=[:id], y=[:id]),
           coords=(id=["a", "b", "c", "d"],),
       )
InferenceData with groups:
  > posterior
  > prior
  > observed_data

julia> idata.posterior
Dataset with dimensions:
  Dim{:chain} Sampled 1:4 ForwardOrdered Regular Points,
  Dim{:draw} Sampled 1:100 ForwardOrdered Regular Points,
  Dim{:id} Categorical String[a, b, c, d] ForwardOrdered
and 2 layers:
  :θ Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:id} (4×100×4)
  :τ Float64 dims: Dim{:chain}, Dim{:draw} (4×100)

with metadata Dict{String, Any} with 1 entry:
  "created_at" => "2022-08-11T11:15:21.4"

julia> idata_sel = idata[id=At(["a", "b"])]
InferenceData with groups:
  > posterior
  > prior
  > observed_data

julia> idata_sel.posterior
Dataset with dimensions:
  Dim{:chain} Sampled 1:4 ForwardOrdered Regular Points,
  Dim{:draw} Sampled 1:100 ForwardOrdered Regular Points,
  Dim{:id} Categorical String[a, b] ForwardOrdered
and 2 layers:
  :θ Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:id} (4×100×2)
  :τ Float64 dims: Dim{:chain}, Dim{:draw} (4×100)

with metadata Dict{String, Any} with 1 entry:
  "created_at" => "2022-08-11T11:15:21.4"

Select data from just the observed_data group, returning a Dataset if the indices index more than one element from any of the variables:

julia> idata[:observed_data, id=At(["a"])]
Dataset with dimensions:
  Dim{:id} Categorical String[a] ForwardOrdered
and 1 layer:
  :y Float64 dims: Dim{:id} (1)

with metadata Dict{String, Any} with 1 entry:
  "created_at" => "2022-08-11T11:19:25.982"

Note that if a single index is provided, the behavior is still to slice so that the dimension is preserved.

Base.setindex - Function
Base.setindex(data::InferenceData, group::Dataset, name::Symbol) -> InferenceData

Create a new InferenceData containing the group with the specified name.

If a group with name is already in data, it is replaced.
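The non-mutating behavior described above can be sketched as follows. The variable name x and the array sizes are illustrative assumptions, and convert_to_dataset is used only to create a Dataset to insert.

```julia
using InferenceObjects

idata = from_namedtuple((; x=randn(100, 4)))
prior = InferenceObjects.convert_to_dataset((; x=randn(100, 4)))

# Base.setindex is the non-mutating counterpart of setindex!: it returns a new
# InferenceData and leaves idata itself unchanged.
idata_with_prior = Base.setindex(idata, prior, :prior)

# Reusing an existing group name replaces that group in the returned object.
idata_replaced = Base.setindex(idata_with_prior, prior, :posterior)
```

After these calls, idata still has only its posterior group, while idata_with_prior and idata_replaced each contain both posterior and prior.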


Iteration interface

InferenceData also implements the same iteration interface as its underlying NamedTuple. That is, iterating over an InferenceData iterates over its groups.
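As a small sketch of what this looks like in practice (the variable name x and the sizes are illustrative assumptions), iteration yields the Dataset groups themselves, so group names have to be paired in separately via keys:

```julia
using InferenceObjects

idata = from_namedtuple((; x=randn(100, 4)); prior=(; x=randn(100, 4)))

# Iterating yields the groups (Dataset objects); zip with keys(idata) when the
# group names are needed as well.
for (name, group) in zip(keys(idata), idata)
    println(name, " has variables ", keys(group))
end

# Destructuring works as it does for a NamedTuple.
posterior, prior = idata
```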

General conversion

InferenceObjects.convert_to_inference_data - Function
convert_to_inference_data(obj; group, kwargs...) -> InferenceData

Convert a supported object to an InferenceData object.

If obj converts to a single dataset, group specifies the name of the group under which that dataset is stored in the resulting InferenceData.

See convert_to_dataset

Arguments

  • obj: the object to convert. Basic supported types are:

    • InferenceData: return unchanged
    • Dataset/DimensionalData.AbstractDimStack: add to InferenceData as the only group
    • NamedTuple/AbstractDict: create a Dataset as the only group
    • AbstractArray{<:Real}: create a Dataset as the only group, assigning the variable an arbitrary name if none is set

More specific types may be documented separately.

Keywords

  • group::Symbol = :posterior: If obj converts to a single dataset, assign the resulting dataset to this group.

  • dims: a collection mapping variable names to collections of objects containing dimension names. Acceptable such objects are:

    • Symbol: dimension name
    • Type{<:DimensionalData.Dimension}: dimension type
    • DimensionalData.Dimension: dimension, potentially with indices
    • Nothing: no dimension name provided, dimension name is automatically generated
  • coords: a collection indexable by dimension name specifying the indices of the given dimension. If indices for a dimension in dims are provided, they are used even if the dimension contains its own indices. If a dimension is missing, its indices are automatically generated.

  • kwargs: remaining keywords forwarded to converter functions
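A minimal sketch of how the group, dims, and coords keywords fit together, assuming a NamedTuple input with arrays shaped (ndraws, nchains[, sizes...]). The variable name theta, the dimension name school, and the labels are illustrative assumptions.

```julia
using InferenceObjects

idata = convert_to_inference_data(
    (; theta=randn(100, 4, 3));       # one variable with a trailing dimension of length 3
    group=:prior,                     # store the resulting dataset as the prior group
    dims=(; theta=[:school]),         # name the trailing dimension of theta
    coords=(; school=["A", "B", "C"]),  # labels for the school dimension
)

theta = idata.prior.theta
```

Omitting group would place the dataset in the default posterior group. Omitting dims and coords would leave the trailing dimension with an automatically generated name and indices.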

InferenceObjects.from_dict - Function
from_dict(posterior::AbstractDict; kwargs...) -> InferenceData

Convert a dictionary to an InferenceData.

Arguments

  • posterior: The data to be converted. Its keys must be Symbol or AbstractString, and its values must be arrays.

Keywords

  • posterior_predictive::Any=nothing: Draws from the posterior predictive distribution
  • sample_stats::Any=nothing: Statistics of the posterior sampling process
  • predictions::Any=nothing: Out-of-sample predictions for the posterior.
  • prior::Dict=nothing: Draws from the prior
  • prior_predictive::Any=nothing: Draws from the prior predictive distribution
  • sample_stats_prior::Any=nothing: Statistics of the prior sampling process
  • observed_data::NamedTuple: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names, and values are the corresponding data.
  • constant_data::NamedTuple: Model constants, i.e. data included in the model which is not modeled as a random variable. Keys are parameter names, and values are the corresponding data.
  • predictions_constant_data::NamedTuple: Constants relevant to the model predictions (e.g. new x values in a linear regression).
  • log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a NamedTuple whose keys are observed variable names and whose values are log likelihood arrays.
  • library: Name of library that generated the draws
  • coords: Map from named dimension to named indices
  • dims: Map from variable name to names of its dimensions

Returns

  • InferenceData: The data with groups corresponding to the provided data

Examples

using InferenceObjects
nchains = 2
ndraws = 100

data = Dict(
    :x => rand(ndraws, nchains),
    :y => randn(ndraws, nchains, 2),
    :z => randn(ndraws, nchains, 3, 2),
)
idata = from_dict(data)

InferenceObjects.from_namedtuple - Function
from_namedtuple(posterior::NamedTuple; kwargs...) -> InferenceData
from_namedtuple(posterior::Vector{Vector{<:NamedTuple}}; kwargs...) -> InferenceData
from_namedtuple(
    posterior::NamedTuple,
    sample_stats::Any,
    posterior_predictive::Any,
    predictions::Any,
    log_likelihood::Any;
    kwargs...
) -> InferenceData

Convert a NamedTuple or container of NamedTuples to an InferenceData.

If containers are passed, they are flattened into a single NamedTuple with array elements whose first dimensions correspond to the dimensions of the containers.

Arguments

  • posterior: The data to be converted. It may be of the following types:

    • ::NamedTuple: The keys are the variable names and the values are arrays with dimensions (ndraws, nchains[, sizes...]).
    • ::Vector{Vector{<:NamedTuple}}: A vector of length nchains whose elements have length ndraws.

Keywords

  • posterior_predictive::Any=nothing: Draws from the posterior predictive distribution
  • sample_stats::Any=nothing: Statistics of the posterior sampling process
  • predictions::Any=nothing: Out-of-sample predictions for the posterior.
  • prior=nothing: Draws from the prior. Accepts the same types as posterior.
  • prior_predictive::Any=nothing: Draws from the prior predictive distribution
  • sample_stats_prior::Any=nothing: Statistics of the prior sampling process
  • observed_data::NamedTuple: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names, and values are the corresponding data.
  • constant_data::NamedTuple: Model constants, i.e. data included in the model which is not modeled as a random variable. Keys are parameter names, and values are the corresponding data.
  • predictions_constant_data::NamedTuple: Constants relevant to the model predictions (e.g. new x values in a linear regression).
  • log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a NamedTuple whose keys are observed variable names and whose values are log likelihood arrays.
  • library: Name of library that generated the draws
  • coords: Map from named dimension to named indices
  • dims: Map from variable name to names of its dimensions

Returns

  • InferenceData: The data with groups corresponding to the provided data

Note

If a NamedTuple is provided for observed_data, constant_data, or predictions_constant_data, any non-array values (e.g. integers) are converted to 0-dimensional arrays.

Examples

using InferenceObjects
nchains = 2
ndraws = 100

data1 = (
    x=rand(ndraws, nchains), y=randn(ndraws, nchains, 2), z=randn(ndraws, nchains, 3, 2)
)
idata1 = from_namedtuple(data1)

data2 = [[(x=rand(), y=randn(2), z=randn(3, 2)) for _ in 1:ndraws] for _ in 1:nchains];
idata2 = from_namedtuple(data2)

General functions

Base.cat - Function
cat(data::InferenceData...; [groups=keys(data[1]),] dims) -> InferenceData

Concatenate InferenceData objects along the specified dimension dims.

Only the groups in groups are concatenated. Remaining groups are merged into the new InferenceData object.

Examples

Here is how we can concatenate all groups of two InferenceData objects along the existing chain dimension:

julia> coords = (; a_dim=["x", "y", "z"]);

julia> dims = (; a=[:a_dim]);

julia> data = Dict(:a => randn(100, 4, 3), :b => randn(100, 4));

julia> idata = from_dict(data; coords=coords, dims=dims)
InferenceData with groups:
  > posterior

julia> idata_cat1 = cat(idata, idata; dims=:chain)
InferenceData with groups:
  > posterior

julia> idata_cat1.posterior
╭─────────────────╮
│ 100×8×3 Dataset │
├─────────────────┴──────────────────────────────────── dims ┐
  ↓ draw ,
  → chain,
  ↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered
├──────────────────────────────────────────────────── layers ┤
  :a eltype: Float64 dims: draw, chain, a_dim size: 100×8×3
  :b eltype: Float64 dims: draw, chain size: 100×8
├────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "created_at" => "2024-03-11T14:10:48.434"

Alternatively, we can concatenate along a new run dimension, which will be created.

julia> idata_cat2 = cat(idata, idata; dims=:run)
InferenceData with groups:
  > posterior

julia> idata_cat2.posterior
╭───────────────────╮
│ 100×4×3×2 Dataset │
├───────────────────┴─────────────────────────────────── dims ┐
  ↓ draw ,
  → chain,
  ↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered,
  ⬔ run
├─────────────────────────────────────────────────────────────┴ layers ┐
  :a eltype: Float64 dims: draw, chain, a_dim, run size: 100×4×3×2
  :b eltype: Float64 dims: draw, chain, run size: 100×4×2
├──────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "created_at" => "2024-03-11T14:10:48.434"

We can also concatenate only a subset of groups and merge the rest, which is useful when some groups are present only in some of the InferenceData objects or will be identical in all of them:

julia> observed_data = Dict(:y => randn(10));

julia> idata2 = from_dict(data; observed_data=observed_data, coords=coords, dims=dims)
InferenceData with groups:
  > posterior
  > observed_data

julia> idata_cat3 = cat(idata, idata2; groups=(:posterior,), dims=:run)
InferenceData with groups:
  > posterior
  > observed_data

julia> idata_cat3.posterior
╭───────────────────╮
│ 100×4×3×2 Dataset │
├───────────────────┴─────────────────────────────────── dims ┐
  ↓ draw ,
  → chain,
  ↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered,
  ⬔ run
├─────────────────────────────────────────────────────────────┴ layers ┐
  :a eltype: Float64 dims: draw, chain, a_dim, run size: 100×4×3×2
  :b eltype: Float64 dims: draw, chain, run size: 100×4×2
├──────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "created_at" => "2024-03-11T14:10:48.434"

julia> idata_cat3.observed_data
╭────────────────────╮
│ 10-element Dataset │
├────────────── dims ┤
  ↓ y_dim_1
├────────────────────┴─────────────── layers ┐
  :y eltype: Float64 dims: y_dim_1 size: 10
├────────────────────────────────────────────┴ metadata ┐
  Dict{String, Any} with 1 entry:
  "created_at" => "2024-03-11T14:10:53.539"

Base.merge - Function
merge(data::InferenceData...) -> InferenceData

Merge InferenceData objects.

The result contains all groups from all of the provided InferenceData objects. If a group appears more than once, the one that occurs last is kept.

See also: cat

Examples

Here we merge an InferenceData containing only a posterior group with one containing only a prior group to create a new one containing both groups.

julia> idata1 = from_dict(Dict(:a => randn(100, 4, 3), :b => randn(100, 4)))
InferenceData with groups:
  > posterior

julia> idata2 = from_dict(; prior=Dict(:a => randn(100, 1, 3), :c => randn(100, 1)))
InferenceData with groups:
  > prior

julia> idata_merged = merge(idata1, idata2)
InferenceData with groups:
  > posterior
  > prior

I/O extensions

The following types of storage are provided via extensions.

NetCDF I/O using NCDatasets.jl

InferenceObjects.from_netcdf - Function
from_netcdf(path::AbstractString; kwargs...) -> InferenceData

Load an InferenceData from an unopened NetCDF file.

Remaining kwargs are passed to NCDatasets.NCDataset. This method loads data eagerly. To instead load data lazily, pass an opened NCDataset to from_netcdf.

Note

This method requires that NCDatasets is loaded before it can be used.

Examples

julia> using InferenceObjects, NCDatasets

julia> idata = from_netcdf("centered_eight.nc")
InferenceData with groups:
  > posterior
  > posterior_predictive
  > sample_stats
  > prior
  > observed_data

from_netcdf(ds::NCDatasets.NCDataset; load_mode) -> InferenceData

Load an InferenceData from an opened NetCDF file.

load_mode defaults to :lazy, which avoids reading variables into memory. Operations on these arrays will be slow. load_mode can also be :eager, which copies all variables into memory. It is then safe to close ds. If load_mode is :lazy and ds is closed after constructing InferenceData, using the variable arrays will have undefined behavior.
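The eager mode can be sketched as a short round trip. This assumes NCDatasets is installed; the temporary file name, the variable name x, and the array size are illustrative assumptions.

```julia
using InferenceObjects, NCDatasets

# Write a small example file to a temporary location (the file name is arbitrary).
path = to_netcdf(from_namedtuple((; x=randn(100, 4))), joinpath(mktempdir(), "example.nc"))

ds = NCDataset(path, "r")
# Copy every variable into memory while the file is still open.
idata_eager = from_netcdf(ds; load_mode=:eager)
close(ds)

# The arrays no longer depend on ds, so they remain usable after closing it.
x_size = size(idata_eager.posterior.x)
```

With the default load_mode=:lazy, the same access after close(ds) would fall under the undefined behavior described above.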

Examples

Here is how we might lazily load an InferenceData from a web-hosted NetCDF file.

julia> using HTTP, InferenceObjects, NCDatasets

julia> resp = HTTP.get("https://github.com/arviz-devs/arviz_example_data/blob/main/data/centered_eight.nc?raw=true");

julia> ds = NCDataset("centered_eight", "r"; memory = resp.body);

julia> idata = from_netcdf(ds)
InferenceData with groups:
  > posterior
  > posterior_predictive
  > sample_stats
  > prior
  > observed_data

julia> idata_copy = copy(idata); # disconnect from the loaded dataset

julia> close(ds);

InferenceObjects.to_netcdf - Function
to_netcdf(data, dest::AbstractString; group::Symbol=:posterior, kwargs...)
to_netcdf(data, dest::NCDatasets.NCDataset; group::Symbol=:posterior)

Write data to a NetCDF file.

data is any object that can be converted to an InferenceData using convert_to_inference_data. If data is not an InferenceData, then group specifies which group the data represents.

dest specifies either the path to the NetCDF file or an opened NetCDF file. If dest is a path, remaining kwargs are passed to NCDatasets.NCDataset.

Note

This method requires that NCDatasets is loaded before it can be used.

Examples

julia> using InferenceObjects, NCDatasets

julia> idata = from_namedtuple((; x = randn(4, 100, 3), z = randn(4, 100)))
InferenceData with groups:
  > posterior

julia> to_netcdf(idata, "data.nc")
"data.nc"