InferenceData
InferenceObjects.InferenceData
Base.cat
Base.getindex
Base.getproperty
Base.merge
Base.propertynames
Base.setindex
InferenceObjects.convert_to_inference_data
InferenceObjects.from_dict
InferenceObjects.from_namedtuple
InferenceObjects.from_netcdf
InferenceObjects.to_netcdf
Type definition
InferenceObjects.InferenceData
— Type
InferenceData{group_names,group_types}
Container for inference data storage using DimensionalData.
This object implements the InferenceData schema.
Internally, groups are stored in a NamedTuple, which can be accessed using parent(::InferenceData).
Constructors
InferenceData(groups::NamedTuple)
InferenceData(; groups...)
Construct an InferenceData from either a NamedTuple or keyword arguments of groups.
Groups must be Dataset objects.
Instead of directly creating an InferenceData, use the exported from_xyz functions or convert_to_inference_data.
Property interface
Base.getproperty
— Function
getproperty(data::InferenceData, name::Symbol) -> Dataset
Get the group with the specified name.
Base.propertynames
— Function
propertynames(data::InferenceData) -> Tuple{Symbol}
Get the names of the groups.
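As a minimal sketch of the property interface, the snippet below builds a small InferenceData with from_namedtuple, lists its group names, and then accesses one group by name. The variable name x and the array sizes are made up for illustration.

```julia
using InferenceObjects

# Hypothetical draws: a single variable `x` stored as a 100×4 array.
idata = from_namedtuple((; x=randn(100, 4)); prior=(; x=randn(100, 4)))

# Names of the stored groups, as a tuple of Symbols.
group_names = propertynames(idata)

# Property access returns the group with that name as a Dataset.
posterior = idata.posterior
```

Accessing a group that is not in group_names is expected to throw, as it would for the underlying NamedTuple.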
Indexing interface
Base.getindex
— Function
Base.getindex(data::InferenceData, groups::Symbol; coords...) -> Dataset
Base.getindex(data::InferenceData, groups; coords...) -> InferenceData
Return the specified groups sliced to the specified coords: a Dataset when a single group name is given, otherwise a new InferenceData.
coords specifies a dimension name mapping to an index, a DimensionalData.Selector, or an IntervalSets.AbstractInterval.
If one or more groups lack the specified dimension, a warning is raised but can be ignored. All groups that contain the dimension must also contain the specified indices, or an exception will be raised.
Examples
Select data from all groups for just the specified id values.
julia> using InferenceObjects, DimensionalData
julia> idata = from_namedtuple(
(θ=randn(4, 100, 4), τ=randn(4, 100));
prior=(θ=randn(4, 100, 4), τ=randn(4, 100)),
observed_data=(y=randn(4),),
dims=(θ=[:id], y=[:id]),
coords=(id=["a", "b", "c", "d"],),
)
InferenceData with groups:
> posterior
> prior
> observed_data
julia> idata.posterior
Dataset with dimensions:
Dim{:chain} Sampled 1:4 ForwardOrdered Regular Points,
Dim{:draw} Sampled 1:100 ForwardOrdered Regular Points,
Dim{:id} Categorical String[a, b, c, d] ForwardOrdered
and 2 layers:
:θ Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:id} (4×100×4)
:τ Float64 dims: Dim{:chain}, Dim{:draw} (4×100)
with metadata Dict{String, Any} with 1 entry:
"created_at" => "2022-08-11T11:15:21.4"
julia> idata_sel = idata[id=At(["a", "b"])]
InferenceData with groups:
> posterior
> prior
> observed_data
julia> idata_sel.posterior
Dataset with dimensions:
Dim{:chain} Sampled 1:4 ForwardOrdered Regular Points,
Dim{:draw} Sampled 1:100 ForwardOrdered Regular Points,
Dim{:id} Categorical String[a, b] ForwardOrdered
and 2 layers:
:θ Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:id} (4×100×2)
:τ Float64 dims: Dim{:chain}, Dim{:draw} (4×100)
with metadata Dict{String, Any} with 1 entry:
"created_at" => "2022-08-11T11:15:21.4"
Select data from just the observed_data group, returning a Dataset if the indices index more than one element from any of the variables:
julia> idata[:observed_data, id=At(["a"])]
Dataset with dimensions:
Dim{:id} Categorical String[a] ForwardOrdered
and 1 layer:
:y Float64 dims: Dim{:id} (1)
with metadata Dict{String, Any} with 1 entry:
"created_at" => "2022-08-11T11:19:25.982"
Note that if a single index is provided, the behavior is still to slice so that the dimension is preserved.
Base.setindex
— Function
Base.setindex(data::InferenceData, group::Dataset, name::Symbol) -> InferenceData
Create a new InferenceData containing the group with the specified name.
If a group with name is already in data, it is replaced.
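A short sketch of the non-mutating behavior described above: Base.setindex returns a new object and leaves the original untouched. Reusing the posterior Dataset as a prior group is purely for illustration, and the variable name x is hypothetical.

```julia
using InferenceObjects

idata = from_namedtuple((; x=randn(100, 4)))

# Reuse the posterior Dataset as a stand-in for a prior group (illustration only).
idata_new = Base.setindex(idata, idata.posterior, :prior)

propertynames(idata)      # the original still has only the posterior group
propertynames(idata_new)  # the new object also contains :prior
```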
Iteration interface
InferenceData also implements the same iteration interface as its underlying NamedTuple. That is, iterating over an InferenceData iterates over its groups.
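For instance, a loop over an InferenceData visits each group in turn. The sketch below counts the groups this way; the helper count_datasets is a hypothetical name introduced here and is not part of the package.

```julia
using InferenceObjects

idata = from_namedtuple((; x=randn(100, 4)); prior=(; x=randn(100, 4)))

# Hypothetical helper: count the groups by iterating over the container.
# Each element produced by the iteration is expected to be a Dataset.
function count_datasets(data)
    n = 0
    for group in data
        group isa Dataset && (n += 1)
    end
    return n
end

count_datasets(idata)
```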
General conversion
InferenceObjects.convert_to_inference_data
— Function
convert_to_inference_data(obj; group, kwargs...) -> InferenceData
Convert a supported object to an InferenceData object.
If obj converts to a single dataset, group specifies which dataset in the resulting InferenceData that is.
Arguments
obj can be many objects. Basic supported types are:
  InferenceData: return unchanged
  Dataset/DimensionalData.AbstractDimStack: add to InferenceData as the only group
  NamedTuple/AbstractDict: create a Dataset as the only group
  AbstractArray{<:Real}: create a Dataset as the only group, given an arbitrary name, if the name is not set
More specific types may be documented separately.
Keywords
group::Symbol = :posterior: If obj converts to a single dataset, assign the resulting dataset to this group.
dims: a collection mapping variable names to collections of objects containing dimension names. Acceptable such objects are:
  Symbol: dimension name
  Type{<:DimensionalData.Dimension}: dimension type
  DimensionalData.Dimension: dimension, potentially with indices
  Nothing: no dimension name provided, dimension name is automatically generated
coords: a collection indexable by dimension name specifying the indices of the given dimension. If indices for a dimension in dims are provided, they are used even if the dimension contains its own indices. If a dimension is missing, its indices are automatically generated.
kwargs: remaining keywords forwarded to converter functions
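The following sketch shows how the group, dims, and coords keywords might be combined when converting a NamedTuple. The variable name theta, the dimension name school, and its labels are all made up for illustration.

```julia
using InferenceObjects

# Hypothetical draws: `theta` is a 100×4×3 array whose last axis indexes schools.
draws = (; theta=randn(100, 4, 3))

idata = convert_to_inference_data(
    draws;
    group=:prior,                        # store the single dataset as the prior group
    dims=(; theta=[:school]),            # name the extra dimension of theta
    coords=(; school=["a", "b", "c"]),   # labels along the school dimension
)
```

Because draws converts to a single dataset, the result contains only the prior group; without the group keyword it would be stored under posterior.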
InferenceObjects.from_dict
— Function
from_dict(posterior::AbstractDict; kwargs...) -> InferenceData
Convert a dictionary to an InferenceData.
Arguments
posterior: The data to be converted. Its keys must be Symbol or AbstractString, and its values must be arrays.
Keywords
posterior_predictive::Any=nothing: Draws from the posterior predictive distribution
sample_stats::Any=nothing: Statistics of the posterior sampling process
predictions::Any=nothing: Out-of-sample predictions for the posterior.
prior::Dict=nothing: Draws from the prior
prior_predictive::Any=nothing: Draws from the prior predictive distribution
sample_stats_prior::Any=nothing: Statistics of the prior sampling process
observed_data::NamedTuple: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names and values.
constant_data::NamedTuple: Model constants, data included in the model which is not modeled as a random variable. Keys are parameter names and values.
predictions_constant_data::NamedTuple: Constants relevant to the model predictions (e.g. new x values in a linear regression).
log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a NamedTuple whose keys are observed variable names and whose values are log likelihood arrays.
library: Name of library that generated the draws
coords: Map from named dimension to named indices
dims: Map from variable name to names of its dimensions
Returns
InferenceData: The data with groups corresponding to the provided data
Examples
using InferenceObjects
nchains = 2
ndraws = 100
data = Dict(
    # arrays are laid out as (ndraws, nchains[, sizes...]), matching from_namedtuple
    :x => rand(ndraws, nchains),
    :y => randn(ndraws, nchains, 2),
    :z => randn(ndraws, nchains, 3, 2),
)
idata = from_dict(data)
InferenceObjects.from_namedtuple
— Function
from_namedtuple(posterior::NamedTuple; kwargs...) -> InferenceData
from_namedtuple(posterior::Vector{Vector{<:NamedTuple}}; kwargs...) -> InferenceData
from_namedtuple(
    posterior::NamedTuple,
    sample_stats::Any,
    posterior_predictive::Any,
    predictions::Any,
    log_likelihood::Any;
    kwargs...
) -> InferenceData
Convert a NamedTuple or container of NamedTuples to an InferenceData.
If containers are passed, they are flattened into a single NamedTuple with array elements whose first dimensions correspond to the dimensions of the containers.
Arguments
posterior: The data to be converted. It may be of the following types:
  ::NamedTuple: The keys are the variable names and the values are arrays with dimensions (ndraws, nchains[, sizes...]).
  ::Vector{Vector{<:NamedTuple}}: A vector of length nchains whose elements have length ndraws.
Keywords
posterior_predictive::Any=nothing: Draws from the posterior predictive distribution
sample_stats::Any=nothing: Statistics of the posterior sampling process
predictions::Any=nothing: Out-of-sample predictions for the posterior.
prior=nothing: Draws from the prior. Accepts the same types as posterior.
prior_predictive::Any=nothing: Draws from the prior predictive distribution
sample_stats_prior::Any=nothing: Statistics of the prior sampling process
observed_data::NamedTuple: Observed data on which the posterior is conditional. It should only contain data which is modeled as a random variable. Keys are parameter names and values.
constant_data::NamedTuple: Model constants, data included in the model which is not modeled as a random variable. Keys are parameter names and values.
predictions_constant_data::NamedTuple: Constants relevant to the model predictions (e.g. new x values in a linear regression).
log_likelihood: Pointwise log-likelihood for the data. It is recommended to use this argument as a NamedTuple whose keys are observed variable names and whose values are log likelihood arrays.
library: Name of library that generated the draws
coords: Map from named dimension to named indices
dims: Map from variable name to names of its dimensions
Returns
InferenceData: The data with groups corresponding to the provided data
If a NamedTuple is provided for observed_data, constant_data, or predictions_constant_data, any non-array values (e.g. integers) are converted to 0-dimensional arrays.
Examples
using InferenceObjects
nchains = 2
ndraws = 100
data1 = (
x=rand(ndraws, nchains), y=randn(ndraws, nchains, 2), z=randn(ndraws, nchains, 3, 2)
)
idata1 = from_namedtuple(data1)
data2 = [[(x=rand(), y=randn(2), z=randn(3, 2)) for _ in 1:ndraws] for _ in 1:nchains];
idata2 = from_namedtuple(data2)
General functions
Base.cat
— Function
cat(data::InferenceData...; [groups=keys(data[1]),] dims) -> InferenceData
Concatenate InferenceData objects along the specified dimension dims.
Only the groups in groups are concatenated. Remaining groups are merged into the new InferenceData object.
Examples
Here is how we can concatenate all groups of two InferenceData objects along the existing chain dimension:
julia> coords = (; a_dim=["x", "y", "z"]);
julia> dims = (; a=[:a_dim]);
julia> data = Dict(:a => randn(100, 4, 3), :b => randn(100, 4));
julia> idata = from_dict(data; coords=coords, dims=dims)
InferenceData with groups:
> posterior
julia> idata_cat1 = cat(idata, idata; dims=:chain)
InferenceData with groups:
> posterior
julia> idata_cat1.posterior
╭─────────────────╮
│ 100×8×3 Dataset │
├─────────────────┴──────────────────────────────────── dims ┐
↓ draw ,
→ chain,
↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered
├──────────────────────────────────────────────────── layers ┤
:a eltype: Float64 dims: draw, chain, a_dim size: 100×8×3
:b eltype: Float64 dims: draw, chain size: 100×8
├────────────────────────────────────────────────── metadata ┤
Dict{String, Any} with 1 entry:
"created_at" => "2024-03-11T14:10:48.434"
Alternatively, we can concatenate along a new run dimension, which will be created.
julia> idata_cat2 = cat(idata, idata; dims=:run)
InferenceData with groups:
> posterior
julia> idata_cat2.posterior
╭───────────────────╮
│ 100×4×3×2 Dataset │
├───────────────────┴─────────────────────────────────── dims ┐
↓ draw ,
→ chain,
↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered,
⬔ run
├─────────────────────────────────────────────────────────────┴ layers ┐
:a eltype: Float64 dims: draw, chain, a_dim, run size: 100×4×3×2
:b eltype: Float64 dims: draw, chain, run size: 100×4×2
├──────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any} with 1 entry:
"created_at" => "2024-03-11T14:10:48.434"
We can also concatenate only a subset of groups and merge the rest, which is useful when some groups are present only in some of the InferenceData objects or will be identical in all of them:
julia> observed_data = Dict(:y => randn(10));
julia> idata2 = from_dict(data; observed_data=observed_data, coords=coords, dims=dims)
InferenceData with groups:
> posterior
> observed_data
julia> idata_cat3 = cat(idata, idata2; groups=(:posterior,), dims=:run)
InferenceData with groups:
> posterior
> observed_data
julia> idata_cat3.posterior
╭───────────────────╮
│ 100×4×3×2 Dataset │
├───────────────────┴─────────────────────────────────── dims ┐
↓ draw ,
→ chain,
↗ a_dim Categorical{String} ["x", "y", "z"] ForwardOrdered,
⬔ run
├─────────────────────────────────────────────────────────────┴ layers ┐
:a eltype: Float64 dims: draw, chain, a_dim, run size: 100×4×3×2
:b eltype: Float64 dims: draw, chain, run size: 100×4×2
├──────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any} with 1 entry:
"created_at" => "2024-03-11T14:10:48.434"
julia> idata_cat3.observed_data
╭────────────────────╮
│ 10-element Dataset │
├────────────── dims ┤
↓ y_dim_1
├────────────────────┴─────────────── layers ┐
:y eltype: Float64 dims: y_dim_1 size: 10
├────────────────────────────────────────────┴ metadata ┐
Dict{String, Any} with 1 entry:
"created_at" => "2024-03-11T14:10:53.539"
Base.merge
— Function
merge(data::InferenceData...) -> InferenceData
Merge InferenceData objects.
The result contains all groups from every InferenceData in data. If a group appears more than once, the one that occurs last is kept.
See also: cat
Examples
Here we merge an InferenceData containing only a posterior group with one containing only a prior group to create a new one containing both groups.
julia> idata1 = from_dict(Dict(:a => randn(100, 4, 3), :b => randn(100, 4)))
InferenceData with groups:
> posterior
julia> idata2 = from_dict(; prior=Dict(:a => randn(100, 1, 3), :c => randn(100, 1)))
InferenceData with groups:
> prior
julia> idata_merged = merge(idata1, idata2)
InferenceData with groups:
> posterior
> prior
I/O extensions
The following types of storage are provided via extensions.
NetCDF I/O using NCDatasets.jl
InferenceObjects.from_netcdf
— Function
from_netcdf(path::AbstractString; kwargs...) -> InferenceData
Load an InferenceData from an unopened NetCDF file.
Remaining kwargs are passed to NCDatasets.NCDataset. This method loads data eagerly. To instead load data lazily, pass an opened NCDataset to from_netcdf.
This method requires that NCDatasets is loaded before it can be used.
Examples
julia> using InferenceObjects, NCDatasets
julia> idata = from_netcdf("centered_eight.nc")
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
from_netcdf(ds::NCDatasets.NCDataset; load_mode) -> InferenceData
Load an InferenceData from an opened NetCDF file.
load_mode defaults to :lazy, which avoids reading variables into memory. Operations on these arrays will be slow. load_mode can also be :eager, which copies all variables into memory. It is then safe to close ds. If load_mode is :lazy and ds is closed after constructing the InferenceData, using the variable arrays will have undefined behavior.
Examples
Here is how we might lazily load an InferenceData from a web-hosted NetCDF file.
julia> using HTTP, InferenceObjects, NCDatasets
julia> resp = HTTP.get("https://github.com/arviz-devs/arviz_example_data/blob/main/data/centered_eight.nc?raw=true");
julia> ds = NCDataset("centered_eight", "r"; memory = resp.body);
julia> idata = from_netcdf(ds)
InferenceData with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
julia> idata_copy = copy(idata); # disconnect from the loaded dataset
julia> close(ds);
InferenceObjects.to_netcdf
— Function
to_netcdf(data, dest::AbstractString; group::Symbol=:posterior, kwargs...)
to_netcdf(data, dest::NCDatasets.NCDataset; group::Symbol=:posterior)
Write data to a NetCDF file.
data is any type that can be converted to an InferenceData using convert_to_inference_data. If not an InferenceData, then group specifies which group the data represents.
dest specifies either the path to the NetCDF file or an opened NetCDF file. If dest is a path, remaining kwargs are passed to NCDatasets.NCDataset.
This method requires that NCDatasets is loaded before it can be used.
Examples
julia> using InferenceObjects, NCDatasets
julia> idata = from_namedtuple((; x = randn(4, 100, 3), z = randn(4, 100)))
InferenceData with groups:
> posterior
julia> to_netcdf(idata, "data.nc")
"data.nc"