| Title: | Vectorised Probability Distributions |
| Version: | 0.6.0 |
| Description: | Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions. |
| License: | GPL-3 |
| Depends: | R (≥ 4.0.0) |
| Imports: | vctrs (≥ 0.3.0), rlang (≥ 0.4.5), generics, stats, numDeriv, utils, lifecycle, pillar |
| Suggests: | testthat (≥ 2.1.0), covr, mvtnorm, actuar (≥ 2.0.0), evd, ggdist, ggplot2, gk, pkgdown |
| RdMacros: | lifecycle |
| URL: | https://pkg.mitchelloharawild.com/distributional/, https://github.com/mitchelloharawild/distributional |
| BugReports: | https://github.com/mitchelloharawild/distributional/issues |
| Encoding: | UTF-8 |
| Language: | en-GB |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-14 10:04:53 UTC; mitchell |
| Author: | Mitchell O'Hara-Wild |
| Maintainer: | Mitchell O'Hara-Wild <mail@mitchelloharawild.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-14 10:50:03 UTC |
distributional: Vectorised Probability Distributions
Description
Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
Author(s)
Maintainer: Mitchell O'Hara-Wild <mail@mitchelloharawild.com>
See Also
Useful links:
Report bugs at https://github.com/mitchelloharawild/distributional/issues
The cumulative distribution function
Description
Usage
cdf(x, q, ..., log = FALSE)
## S3 method for class 'distribution'
cdf(x, q, ...)
Arguments
x |
The distribution(s). |
q |
The quantile at which the cdf is calculated. |
... |
Additional arguments passed to methods. |
log |
If TRUE, probabilities are returned on the log scale. |
Covariance
Description
A generic function for computing the covariance of an object.
Usage
covariance(x, ...)
Arguments
x |
An object. |
... |
Additional arguments used by methods. |
See Also
covariance.distribution(), variance()
Covariance of a probability distribution
Description
Returns the covariance of the probability distribution. If no covariance method exists for the distribution, the empirical covariance of a random sample is returned instead.
Usage
## S3 method for class 'distribution'
covariance(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
The probability density/mass function
Description
Computes the probability density function for a continuous distribution, or the probability mass function for a discrete distribution.
Usage
## S3 method for class 'distribution'
density(x, at, ..., log = FALSE)
Arguments
x |
The distribution(s). |
at |
The point at which to compute the density/mass. |
... |
Additional arguments passed to methods. |
log |
If TRUE, densities are returned on the log scale. |
The Bernoulli distribution
Description
Bernoulli distributions are used to represent events like coin flips
when there is a single trial that is either successful or unsuccessful.
The Bernoulli distribution is a special case of the Binomial()
distribution with n = 1.
Usage
dist_bernoulli(prob)
Arguments
prob |
The probability of success on each trial, |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_bernoulli.html
In the following, let X be a Bernoulli random variable with parameter
prob = p. Some textbooks also define q = 1 - p, or use
\pi instead of p.
The Bernoulli probability distribution is widely used to model
binary variables, such as 'failure' and 'success'. The most
typical example is the flip of a coin, where p is thought of as the
probability of flipping a head, and q = 1 - p is the
probability of flipping a tail.
Support: \{0, 1\}
Mean: p
Variance: p \cdot (1 - p) = p \cdot q
Probability mass function (p.m.f):
P(X = x) = p^x (1 - p)^{1-x} = p^x q^{1-x}
Cumulative distribution function (c.d.f):
P(X \le x) =
\left \{
\begin{array}{ll}
0 & x < 0 \\
1 - p & 0 \leq x < 1 \\
1 & x \geq 1
\end{array}
\right.
Moment generating function (m.g.f):
E(e^{tX}) = (1 - p) + p e^t
Skewness:
\frac{1 - 2p}{\sqrt{p(1-p)}} = \frac{q - p}{\sqrt{pq}}
Excess Kurtosis:
\frac{1 - 6p(1-p)}{p(1-p)} = \frac{1 - 6pq}{pq}
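The closed forms above can be checked numerically. The following is a minimal sketch that assumes only base R, with stats::dbinom() at size = 1 standing in for the Bernoulli p.m.f.; the variable names are illustrative and are not part of distributional.

```r
# Bernoulli(p) moments computed directly from the p.m.f. and
# compared with the closed-form expressions.
p <- 0.3
x <- c(0, 1)
pmf <- dbinom(x, size = 1, prob = p)        # P(X = 0), P(X = 1)

mu <- sum(x * pmf)                          # mean
sigma2 <- sum((x - mu)^2 * pmf)             # variance
skew <- sum((x - mu)^3 * pmf) / sigma2^1.5  # skewness
exkurt <- sum((x - mu)^4 * pmf) / sigma2^2 - 3  # excess kurtosis

bern_check <- c(
  mean = isTRUE(all.equal(mu, p)),
  variance = isTRUE(all.equal(sigma2, p * (1 - p))),
  skewness = isTRUE(all.equal(skew, (1 - 2 * p) / sqrt(p * (1 - p)))),
  kurtosis = isTRUE(all.equal(exkurt, (1 - 6 * p * (1 - p)) / (p * (1 - p))))
)
print(bern_check)
```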
Examples
dist <- dist_bernoulli(prob = c(0.05, 0.5, 0.3, 0.9, 0.1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Beta distribution
Description
The Beta distribution is a continuous probability distribution defined on the interval [0, 1], commonly used to model probabilities and proportions.
Usage
dist_beta(shape1, shape2)
Arguments
shape1, shape2 |
The non-negative shape parameters of the Beta distribution. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_beta.html
In the following, let X be a Beta random variable with parameters
shape1 = \alpha and shape2 = \beta.
Support: x \in [0, 1]
Mean: \frac{\alpha}{\alpha + \beta}
Variance: \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}
Probability density function (p.d.f):
f(x) = \frac{x^{\alpha - 1}(1-x)^{\beta - 1}}{B(\alpha, \beta)} =
\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1-x)^{\beta - 1}
where B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}
is the Beta function.
Cumulative distribution function (c.d.f):
F(x) = I_x(\alpha, \beta) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)}
where I_x(\alpha, \beta) is the regularized incomplete beta function and
B(x; \alpha, \beta) = \int_0^x t^{\alpha-1}(1-t)^{\beta-1} dt.
Moment generating function (m.g.f):
The moment generating function does not have a simple closed form, but the moments can be calculated as:
E(X^k) = \prod_{r=0}^{k-1} \frac{\alpha + r}{\alpha + \beta + r}
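As a hedged illustration of the moment formula, the sketch below compares the product expression with numerical integration of x^k against stats::dbeta(). It uses base R only; the helper names raw_moment() and moment_formula() are hypothetical.

```r
# Raw moments of a Beta(a, b) distribution: numerical integration
# versus the product formula E(X^k) = prod_{r=0}^{k-1} (a + r) / (a + b + r).
a <- 2
b <- 5

raw_moment <- function(k) {
  integrate(function(x) x^k * dbeta(x, a, b), lower = 0, upper = 1)$value
}
moment_formula <- function(k) {
  r <- 0:(k - 1)
  prod((a + r) / (a + b + r))
}

beta_moments <- sapply(1:3, function(k) {
  c(numeric = raw_moment(k), formula = moment_formula(k))
})
print(beta_moments)

# The variance follows from the first two raw moments.
beta_var <- moment_formula(2) - moment_formula(1)^2
print(beta_var)
```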
Examples
dist <- dist_beta(shape1 = c(0.5, 5, 1, 2, 2), shape2 = c(0.5, 1, 3, 2, 5))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Binomial distribution
Description
Binomial distributions are used to represent situations that can
be thought of as the result of n Bernoulli experiments (here the
n is defined as the size of the experiment). The classical
example is n independent coin flips, where each coin flip has
probability p of success. In this case, the individual probability of
flipping heads or tails is given by the Bernoulli(p) distribution,
and the probability of having x equal results (x heads,
for example), in n trials is given by the Binomial(n, p) distribution.
The equation of the Binomial distribution is directly derived from
the equation of the Bernoulli distribution.
Usage
dist_binomial(size, prob)
Arguments
size |
The number of trials. Must be an integer greater than or equal
to one. When |
prob |
The probability of success on each trial, |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_binomial.html
The Binomial distribution comes up when you are interested in the proportion
of people who do a thing. The Binomial distribution
also comes up in the sign test, sometimes called the Binomial test
(see stats::binom.test()), where you may need the Binomial C.D.F. to
compute p-values.
In the following, let X be a Binomial random variable with parameters
size = n and prob = p. Some textbooks define q = 1 - p,
or use \pi instead of p.
Support: \{0, 1, 2, ..., n\}
Mean: np
Variance: np \cdot (1 - p) = np \cdot q
Probability mass function (p.m.f):
P(X = k) = {n \choose k} p^k (1 - p)^{n-k}
Cumulative distribution function (c.d.f):
P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n \choose i} p^i (1 - p)^{n-i}
Moment generating function (m.g.f):
E(e^{tX}) = (1 - p + p e^t)^n
Skewness:
\frac{1 - 2p}{\sqrt{np(1-p)}}
Excess kurtosis:
\frac{1 - 6p(1-p)}{np(1-p)}
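The c.d.f. sum and the m.g.f. can be reproduced in a few lines. This is a sketch using base R only (stats::pbinom() and stats::dbinom() as references); the variable names are illustrative.

```r
# Binomial(n, p): c.d.f. as a finite sum and the m.g.f., checked against base R.
n <- 10
p <- 0.3
k <- 4.7                                   # non-integer input; the sum stops at floor(k)

i <- 0:floor(k)
cdf_sum <- sum(choose(n, i) * p^i * (1 - p)^(n - i))
cdf_ref <- pbinom(k, size = n, prob = p)

tt <- 0.4                                  # point at which the m.g.f. is evaluated
x <- 0:n
mgf_sum <- sum(exp(tt * x) * dbinom(x, size = n, prob = p))
mgf_formula <- (1 - p + p * exp(tt))^n

print(c(cdf_sum = cdf_sum, cdf_ref = cdf_ref))
print(c(mgf_sum = mgf_sum, mgf_formula = mgf_formula))
```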
Examples
dist <- dist_binomial(size = 1:5, prob = c(0.05, 0.5, 0.3, 0.9, 0.1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Burr distribution
Description
The Burr distribution (Type XII) is a flexible continuous probability distribution often used for modeling income distributions, reliability data, and failure times.
Usage
dist_burr(shape1, shape2, rate = 1, scale = 1/rate)
Arguments
shape1, shape2, scale |
parameters. Must be strictly positive. |
rate |
an alternative way to specify the scale. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_burr.html
In the following, let X be a Burr random variable with parameters
shape1 = \alpha, shape2 = \gamma, and rate = \lambda.
Support: x \in (0, \infty)
Mean: \mu = \lambda^{-1/\alpha} \gamma B(\gamma - 1/\alpha, 1 + 1/\alpha) (for \alpha \gamma > 1)
Variance: \lambda^{-2/\alpha} \gamma B(\gamma - 2/\alpha, 1 + 2/\alpha) - \mu^2 (for \alpha \gamma > 2)
Probability density function (p.d.f):
f(x) = \alpha \gamma \lambda x^{\alpha - 1} (1 + \lambda x^\alpha)^{-\gamma - 1}
Cumulative distribution function (c.d.f):
F(x) = 1 - (1 + \lambda x^\alpha)^{-\gamma}
Quantile function:
F^{-1}(p) = \lambda^{-1/\alpha} ((1 - p)^{-1/\gamma} - 1)^{1/\alpha}
Moment generating function (m.g.f):
Does not exist in closed form.
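To see how the density, c.d.f. and quantile function stated above fit together, the sketch below implements them exactly as written in this section using base R. The function names are hypothetical, and the parameterisation is the one documented here, which is not necessarily the one used internally by the package or by actuar.

```r
# Burr density, c.d.f. and quantile function, transcribed from the formulas above.
alpha <- 2
gam <- 3
lambda <- 1.5

burr_pdf <- function(x) {
  alpha * gam * lambda * x^(alpha - 1) * (1 + lambda * x^alpha)^(-gam - 1)
}
burr_cdf <- function(x) 1 - (1 + lambda * x^alpha)^(-gam)
burr_quantile <- function(p) {
  lambda^(-1 / alpha) * ((1 - p)^(-1 / gam) - 1)^(1 / alpha)
}

# The density should integrate to one over the support (0, Inf).
total_mass <- integrate(burr_pdf, lower = 0, upper = Inf)$value

# The quantile function should invert the c.d.f.
p <- c(0.1, 0.5, 0.9)
round_trip <- burr_cdf(burr_quantile(p))

print(total_mass)
print(round_trip)
```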
Examples
dist <- dist_burr(shape1 = c(1,1,1,2,3,0.5), shape2 = c(1,2,3,1,1,2))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Categorical distribution
Description
Categorical distributions are used to represent events with multiple
outcomes, such as what number appears on the roll of a die. This is also
referred to as the 'generalised Bernoulli' or 'multinoulli' distribution.
The Categorical distribution is a special case of the Multinomial()
distribution with n = 1.
Usage
dist_categorical(prob, outcomes = NULL)
Arguments
prob |
A list of probabilities of observing each outcome category. |
outcomes |
The list of vectors where each value represents each outcome. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_categorical.html
In the following, let X be a Categorical random variable with
probability parameters prob = \{p_1, p_2, \ldots, p_k\}.
The Categorical probability distribution is widely used to model the
occurrence of multiple events. A simple example is the roll of a die, where
p = \{1/6, 1/6, 1/6, 1/6, 1/6, 1/6\} giving equal chance of observing
each number on a six-sided die.
Support: \{1, \ldots, k\}
Mean: Not defined for unordered categories. For ordered categories with
integer outcomes \{1, 2, \ldots, k\}, the mean is:
E(X) = \sum_{i=1}^{k} i \cdot p_i
Variance: Not defined for unordered categories. For ordered categories
with integer outcomes \{1, 2, \ldots, k\}, the variance is:
\text{Var}(X) = \sum_{i=1}^{k} i^2 \cdot p_i - \left(\sum_{i=1}^{k} i \cdot p_i\right)^2
Probability mass function (p.m.f):
P(X = i) = p_i
Cumulative distribution function (c.d.f):
The c.d.f is undefined for unordered categories. For ordered categories
with outcomes x_1 < x_2 < \ldots < x_k, the c.d.f is:
P(X \le x_j) = \sum_{i=1}^{j} p_i
Moment generating function (m.g.f):
E(e^{tX}) = \sum_{i=1}^{k} e^{tx_i} \cdot p_i
Skewness: Approximated numerically for ordered categories.
Kurtosis: Approximated numerically for ordered categories.
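For ordered integer outcomes \{1, \ldots, k\}, the mean, variance and c.d.f. above reduce to simple sums. The following base R sketch works through one probability vector; it does not call distributional, and the variable names are illustrative.

```r
# Categorical distribution on ordered integer outcomes 1..k.
p <- c(0.05, 0.5, 0.15, 0.2, 0.1)   # p_1, ..., p_k, summing to one
i <- seq_along(p)                    # outcomes 1, ..., k

cat_mean <- sum(i * p)               # E(X) = sum of i * p_i
cat_var <- sum(i^2 * p) - cat_mean^2 # Var(X) = E(X^2) - E(X)^2
cat_cdf <- cumsum(p)                 # P(X <= j) for j = 1, ..., k

print(cat_mean)
print(cat_var)
print(cat_cdf)
```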
Examples
dist <- dist_categorical(prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)))
dist
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
# The outcomes aren't ordered, so many statistics are not applicable.
cdf(dist, 0.6)
quantile(dist, 0.7)
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
# Some of these statistics are meaningful for ordered outcomes
dist <- dist_categorical(list(rpois(26, 3)), list(ordered(letters)))
dist
cdf(dist, "m")
quantile(dist, 0.5)
dist <- dist_categorical(
prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)),
outcomes = list(letters[1:5], letters[24:26])
)
generate(dist, 10)
density(dist, "a")
density(dist, "z", log = TRUE)
The Cauchy distribution
Description
The Cauchy distribution is the Student's t distribution with one degree of freedom. The Cauchy distribution does not have a well-defined mean or variance. Cauchy distributions often appear as priors in Bayesian contexts due to their heavy tails.
Usage
dist_cauchy(location, scale)
Arguments
location, scale |
location and scale parameters. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_cauchy.html
In the following, let X be a Cauchy random variable with
location = x_0 and scale = \gamma.
Support: R, the set of all real numbers
Mean: Undefined.
Variance: Undefined.
Probability density function (p.d.f):
f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma} \right)^2 \right]}
Cumulative distribution function (c.d.f):
F(t) = \frac{1}{\pi} \arctan \left( \frac{t - x_0}{\gamma} \right) +
\frac{1}{2}
Moment generating function (m.g.f):
Does not exist.
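The density and c.d.f. above can be compared with base R's stats::dcauchy() and stats::pcauchy(). This is a small sketch with illustrative variable names.

```r
# Cauchy(x0, gam): closed-form p.d.f. and c.d.f. versus base R.
x0 <- -2
gam <- 1.5
q <- c(-10, -2, 0, 3)

pdf_formula <- 1 / (pi * gam * (1 + ((q - x0) / gam)^2))
pdf_ref <- dcauchy(q, location = x0, scale = gam)

cdf_formula <- atan((q - x0) / gam) / pi + 0.5
cdf_ref <- pcauchy(q, location = x0, scale = gam)

print(cbind(pdf_formula, pdf_ref, cdf_formula, cdf_ref))
```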
Examples
dist <- dist_cauchy(location = c(0, 0, 0, -2), scale = c(0.5, 1, 2, 1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The (non-central) Chi-Squared Distribution
Description
Chi-square distributions show up often in frequentist settings as the sampling distribution of test statistics, especially in maximum likelihood estimation settings.
Usage
dist_chisq(df, ncp = 0)
Arguments
df |
Degrees of freedom. Can be any positive real number. |
ncp |
Non-centrality parameter. Can be any non-negative real number. Defaults to 0 (central chi-squared distribution). |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_chisq.html
In the following, let X be a \chi^2 random variable with
df = k and ncp = \lambda.
Support: R^+, the set of positive real numbers
Mean: k + \lambda
Variance: 2(k + 2\lambda)
Probability density function (p.d.f):
For the central chi-squared distribution (\lambda = 0):
f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2}
For the non-central chi-squared distribution (\lambda > 0):
f(x) = \frac{1}{2} e^{-(x+\lambda)/2} \left(\frac{x}{\lambda}\right)^{k/4-1/2} I_{k/2-1}\left(\sqrt{\lambda x}\right)
where I_\nu(z) is the modified Bessel function of the first kind.
Cumulative distribution function (c.d.f):
For the central chi-squared distribution (\lambda = 0):
F(x) = \frac{\gamma(k/2, x/2)}{\Gamma(k/2)} = P(k/2, x/2)
where \gamma(s, x) is the lower incomplete gamma function and
P(s, x) is the regularized gamma function.
For the non-central chi-squared distribution (\lambda > 0):
F(x) = \sum_{j=0}^{\infty} \frac{e^{-\lambda/2} (\lambda/2)^j}{j!} P(k/2 + j, x/2)
This is approximated numerically.
Moment generating function (m.g.f):
For the central chi-squared distribution (\lambda = 0):
E(e^{tX}) = (1 - 2t)^{-k/2}, \quad t < 1/2
For the non-central chi-squared distribution (\lambda > 0):
E(e^{tX}) = \frac{e^{\lambda t / (1 - 2t)}}{(1 - 2t)^{k/2}}, \quad t < 1/2
Skewness:
\gamma_1 = \frac{2^{3/2}(k + 3\lambda)}{(k + 2\lambda)^{3/2}}
For the central case (\lambda = 0), this simplifies to
\sqrt{8/k}.
Excess Kurtosis:
\gamma_2 = \frac{12(k + 4\lambda)}{(k + 2\lambda)^2}
For the central case (\lambda = 0), this simplifies to
12/k.
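The non-central c.d.f. is a Poisson-weighted mixture of regularized gamma functions, which can be sketched in base R as follows. Here stats::pgamma() plays the role of P(s, x), the series is truncated at 100 terms (an arbitrary choice for this illustration), and stats::pchisq() serves as the reference; how the package itself evaluates the c.d.f. is not assumed.

```r
# Non-central chi-squared c.d.f. as a truncated Poisson mixture.
k <- 3          # degrees of freedom
lambda <- 2     # non-centrality parameter
x <- 4

j <- 0:100
w <- dpois(j, lambda / 2)                         # e^{-lambda/2} (lambda/2)^j / j!
cdf_series <- sum(w * pgamma(x / 2, shape = k / 2 + j))
cdf_ref <- pchisq(x, df = k, ncp = lambda)

print(c(cdf_series = cdf_series, cdf_ref = cdf_ref))
```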
Examples
dist <- dist_chisq(df = c(1,2,3,4,6,9))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The degenerate distribution
Description
The degenerate distribution takes a single value which is certain to be observed. It takes a single parameter, which is the value that is observed by the distribution.
Usage
dist_degenerate(x)
Arguments
x |
The value of the distribution (location parameter). Can be any real number. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_degenerate.html
In the following, let X be a degenerate random variable with value
x = k_0.
Support: \{k_0\}, a single point
Mean: \mu = k_0
Variance: \sigma^2 = 0
Probability mass function (p.m.f):
P(X = x) = 1 \textrm{ for } x = k_0
P(X = x) = 0 \textrm{ for } x \neq k_0
Cumulative distribution function (c.d.f):
F(t) = 0 \textrm{ for } t < k_0
F(t) = 1 \textrm{ for } t \ge k_0
Moment generating function (m.g.f):
E(e^{tX}) = e^{k_0 t}
Skewness: Undefined (NA)
Excess Kurtosis: Undefined (NA)
Examples
dist <- dist_degenerate(x = 1:5)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Exponential Distribution
Description
Exponential distributions are frequently used to model waiting times and the time between events in a Poisson process.
Usage
dist_exponential(rate)
Arguments
rate |
vector of rates. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_exponential.html
In the following, let X be an Exponential random variable with
parameter rate = \lambda.
Support: x \in [0, \infty)
Mean: \frac{1}{\lambda}
Variance: \frac{1}{\lambda^2}
Probability density function (p.d.f):
f(x) = \lambda e^{-\lambda x}
Cumulative distribution function (c.d.f):
F(x) = 1 - e^{-\lambda x}
Moment generating function (m.g.f):
E(e^{tX}) = \frac{\lambda}{\lambda - t}, \quad t < \lambda
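A quick numerical check of the m.g.f. and c.d.f., using base R only. The evaluation point s is an arbitrary value below the rate; the names are illustrative.

```r
# Exponential(rate): m.g.f. by numerical integration and c.d.f. versus pexp().
rate <- 2
s <- 0.5                                  # must satisfy s < rate

mgf_numeric <- integrate(
  function(x) exp(s * x) * dexp(x, rate = rate),
  lower = 0, upper = Inf
)$value
mgf_formula <- rate / (rate - s)

x <- c(0.1, 1, 4)
cdf_formula <- 1 - exp(-rate * x)
cdf_ref <- pexp(x, rate = rate)

print(c(mgf_numeric = mgf_numeric, mgf_formula = mgf_formula))
print(cbind(cdf_formula, cdf_ref))
```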
Examples
dist <- dist_exponential(rate = c(2, 1, 2/3))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The F Distribution
Description
The F distribution is commonly used in statistical inference, particularly in the analysis of variance (ANOVA), testing the equality of variances, and in regression analysis. It arises as the ratio of two independent chi-squared random variables, each divided by its degrees of freedom.
Usage
dist_f(df1, df2, ncp = NULL)
Arguments
df1 |
Degrees of freedom for the numerator. Can be any positive number. |
df2 |
Degrees of freedom for the denominator. Can be any positive number. |
ncp |
Non-centrality parameter. If |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_f.html
In the following, let X be an F random variable with numerator
degrees of freedom df1 = d_1 and denominator degrees of freedom
df2 = d_2.
Support: x \in (0, \infty)
Mean:
For the central F distribution (ncp = NULL):
E(X) = \frac{d_2}{d_2 - 2}
for d_2 > 2, otherwise undefined.
For the non-central F distribution with non-centrality parameter
ncp = \lambda:
E(X) = \frac{d_2 (d_1 + \lambda)}{d_1 (d_2 - 2)}
for d_2 > 2, otherwise undefined.
Variance:
For the central F distribution (ncp = NULL):
\text{Var}(X) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}
for d_2 > 4, otherwise undefined.
For the non-central F distribution with non-centrality parameter
ncp = \lambda:
\text{Var}(X) = \frac{2 d_2^2}{d_1^2} \cdot \frac{(d_1 + \lambda)^2 + (d_1 + 2\lambda)(d_2 - 2)}{(d_2 - 2)^2 (d_2 - 4)}
for d_2 > 4, otherwise undefined.
Skewness:
For the central F distribution (ncp = NULL):
\text{Skew}(X) = \frac{(2 d_1 + d_2 - 2) \sqrt{8 (d_2 - 4)}}{(d_2 - 6) \sqrt{d_1 (d_1 + d_2 - 2)}}
for d_2 > 6, otherwise undefined.
For the non-central F distribution, skewness has no simple closed form and is not computed.
Excess Kurtosis:
For the central F distribution (ncp = NULL):
\text{Kurt}(X) = \frac{12[d_1 (5 d_2 - 22)(d_1 + d_2 - 2) + (d_2 - 4)(d_2 - 2)^2]}{d_1 (d_2 - 6)(d_2 - 8)(d_1 + d_2 - 2)}
for d_2 > 8, otherwise undefined.
For the non-central F distribution, kurtosis has no simple closed form and is not computed.
Probability density function (p.d.f):
For the central F distribution (ncp = NULL):
f(x) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x \, B(d_1/2, d_2/2)}
where B(\cdot, \cdot) is the beta function.
For the non-central F distribution, the density involves an infinite series and is approximated numerically.
Cumulative distribution function (c.d.f):
The c.d.f. does not have a simple closed form expression and is approximated numerically using regularized incomplete beta functions and related special functions.
Moment generating function (m.g.f):
The moment generating function for the F distribution does not exist
in general (it diverges for t > 0).
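The non-central mean and variance formulas can be compared with moments obtained by integrating base R's stats::df() density. This is a sketch under the assumption that d_2 > 4 so that both moments exist; the variable names are illustrative.

```r
# Non-central F(d1, d2, lambda): closed-form mean and variance
# versus numerical integration of the density.
d1 <- 5
d2 <- 12
lambda <- 3

mean_formula <- d2 * (d1 + lambda) / (d1 * (d2 - 2))
var_formula <- 2 * d2^2 / d1^2 *
  ((d1 + lambda)^2 + (d1 + 2 * lambda) * (d2 - 2)) /
  ((d2 - 2)^2 * (d2 - 4))

m1 <- integrate(function(x) x * df(x, d1, d2, ncp = lambda), 0, Inf)$value
m2 <- integrate(function(x) x^2 * df(x, d1, d2, ncp = lambda), 0, Inf)$value

print(c(mean_formula = mean_formula, mean_numeric = m1))
print(c(var_formula = var_formula, var_numeric = m2 - m1^2))
```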
Examples
dist <- dist_f(df1 = c(1,2,5,10,100), df2 = c(1,1,2,1,100))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Gamma distribution
Description
Several important distributions are special cases of the Gamma
distribution. When the shape parameter is 1, the Gamma is an
exponential distribution with rate \beta (mean 1/\beta). When
shape = n/2 and rate = 1/2, the Gamma is equivalent to
a chi-squared distribution with n degrees of freedom. Moreover, if
X_1 is Gamma(\alpha_1, \beta) and X_2 is Gamma(\alpha_2, \beta),
independently, then the ratio \frac{X_1}{X_1 + X_2} follows a
Beta(\alpha_1, \alpha_2) distribution. This last property frequently
appears in other distributions, and it has been used extensively in
multivariate methods. More about the Gamma distribution will be added soon.
Usage
dist_gamma(shape, rate = 1/scale, scale = 1/rate)
Arguments
shape, scale |
shape and scale parameters. Must be positive,
|
rate |
an alternative way to specify the scale. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gamma.html
In the following, let X be a Gamma random variable
with parameters
shape = \alpha and
rate = \beta.
Support: x \in (0, \infty)
Mean: \frac{\alpha}{\beta}
Variance: \frac{\alpha}{\beta^2}
Probability density function (p.d.f):
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}
Cumulative distribution function (c.d.f):
F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}
where \gamma(s, x) is the lower incomplete gamma function.
Moment generating function (m.g.f):
E(e^{tX}) = \Big(\frac{\beta}{ \beta - t}\Big)^{\alpha}, \thinspace t < \beta
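The density formula and the two special cases mentioned in the description can be checked against base R. A minimal sketch, with illustrative variable names:

```r
# Gamma(shape, rate): closed-form density versus dgamma(), plus two special cases.
shape <- 2.5
rate <- 1.5
x <- c(0.5, 1, 3)

pdf_formula <- rate^shape / gamma(shape) * x^(shape - 1) * exp(-rate * x)
pdf_ref <- dgamma(x, shape = shape, rate = rate)

# shape = 1 gives an exponential distribution with the same rate.
exp_case <- isTRUE(all.equal(dgamma(x, shape = 1, rate = rate), dexp(x, rate = rate)))

# shape = n / 2 and rate = 1 / 2 give a chi-squared distribution with n degrees of freedom.
n <- 4
chisq_case <- isTRUE(all.equal(dgamma(x, shape = n / 2, rate = 1 / 2), dchisq(x, df = n)))

print(cbind(pdf_formula, pdf_ref))
print(c(exp_case = exp_case, chisq_case = chisq_case))
```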
Examples
dist <- dist_gamma(shape = c(1,2,3,5,9,7.5,0.5), rate = c(0.5,0.5,0.5,1,2,1,1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Geometric Distribution
Description
The Geometric distribution can be thought of as a generalization
of the dist_bernoulli() distribution where we ask: "if I keep flipping a
coin with probability p of heads, what is the probability that I see
k tails before I get my first heads?" The Geometric
distribution is a special case of the Negative Binomial distribution.
Usage
dist_geometric(prob)
Arguments
prob |
probability of success in each trial. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_geometric.html
In the following, let X be a Geometric random variable with
success probability prob = p. Note that there are multiple
parameterizations of the Geometric distribution.
Support: \{0, 1, 2, 3, ...\}
Mean: \frac{1-p}{p}
Variance: \frac{1-p}{p^2}
Probability mass function (p.m.f):
P(X = k) = p(1-p)^k
Cumulative distribution function (c.d.f):
P(X \le k) = 1 - (1-p)^{k+1}
Moment generating function (m.g.f):
E(e^{tX}) = \frac{p}{1 - (1-p)e^t}, \quad t < -\log(1-p)
Skewness:
\frac{2 - p}{\sqrt{1 - p}}
Excess Kurtosis:
6 + \frac{p^2}{1 - p}
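Because the support starts at 0 (counting failures before the first success), the moments above match base R's stats::dgeom(). The sketch below sums the p.m.f. over a long but finite grid, which is an approximation chosen for illustration; names are illustrative.

```r
# Geometric(p) on {0, 1, 2, ...}: moments from the p.m.f. versus closed forms.
p <- 0.3
k <- 0:2000                                 # truncated support; the remaining tail mass is negligible
pmf <- dgeom(k, prob = p)                   # p * (1 - p)^k

mu <- sum(k * pmf)
sigma2 <- sum((k - mu)^2 * pmf)
skew <- sum((k - mu)^3 * pmf) / sigma2^1.5
exkurt <- sum((k - mu)^4 * pmf) / sigma2^2 - 3

geom_check <- c(
  mean = isTRUE(all.equal(mu, (1 - p) / p)),
  variance = isTRUE(all.equal(sigma2, (1 - p) / p^2)),
  skewness = isTRUE(all.equal(skew, (2 - p) / sqrt(1 - p))),
  kurtosis = isTRUE(all.equal(exkurt, 6 + p^2 / (1 - p))),
  cdf = isTRUE(all.equal(pgeom(4, prob = p), 1 - (1 - p)^5))
)
print(geom_check)
```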
Examples
dist <- dist_geometric(prob = c(0.2, 0.5, 0.8))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Generalized Extreme Value Distribution
Description
The Generalized Extreme Value (GEV) distribution is widely used in extreme value theory to model the distribution of maxima (or minima) of samples. The parametric form encompasses the Gumbel, Frechet, and reverse Weibull distributions.
Usage
dist_gev(location, scale, shape)
Arguments
location |
the location parameter |
scale |
the scale parameter |
shape |
the shape parameter |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gev.html
In the following, let X be a GEV random variable with parameters
location = \mu, scale = \sigma, and shape = \xi.
Support:
x \in \mathbb{R} (all real numbers) if \xi = 0
x \geq \mu - \sigma/\xi if \xi > 0
x \leq \mu - \sigma/\xi if \xi < 0
Mean:
E(X) = \begin{cases}
\mu + \sigma \gamma & \text{if } \xi = 0 \\
\mu + \sigma \frac{\Gamma(1-\xi) - 1}{\xi} & \text{if } \xi \neq 0, \xi < 1 \\
\infty & \text{if } \xi \geq 1
\end{cases}
where \gamma \approx 0.5772 is the Euler-Mascheroni constant and
\Gamma(\cdot) is the gamma function.
Median:
\text{Median}(X) = \begin{cases}
\mu - \sigma \log(\log 2) & \text{if } \xi = 0 \\
\mu + \sigma \frac{(\log 2)^{-\xi} - 1}{\xi} & \text{if } \xi \neq 0
\end{cases}
Variance:
\text{Var}(X) = \begin{cases}
\frac{\pi^2 \sigma^2}{6} & \text{if } \xi = 0 \\
\frac{\sigma^2}{\xi^2} [\Gamma(1-2\xi) - \Gamma(1-\xi)^2] & \text{if } \xi \neq 0, \xi < 0.5 \\
\infty & \text{if } \xi \geq 0.5
\end{cases}
Probability density function (p.d.f):
For \xi = 0 (Gumbel):
f(x) = \frac{1}{\sigma} \exp\left(-\frac{x-\mu}{\sigma}\right)
\exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right]
For \xi \neq 0:
f(x) = \frac{1}{\sigma} \left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi-1}
\exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}
where 1 + \xi(x-\mu)/\sigma > 0.
Cumulative distribution function (c.d.f):
For \xi = 0 (Gumbel):
F(x) = \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right]
For \xi \neq 0:
F(x) = \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}
where 1 + \xi(x-\mu)/\sigma > 0.
Quantile function:
For \xi = 0 (Gumbel):
Q(p) = \mu - \sigma \log(-\log p)
For \xi \neq 0:
Q(p) = \mu + \frac{\sigma}{\xi}\left[(-\log p)^{-\xi} - 1\right]
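The c.d.f. and quantile formulas above are inverses of one another, which the following base R sketch illustrates. The functions gev_cdf() and gev_quantile() are hypothetical helpers written for this example (they accept a scalar shape only) and are not the package's implementation.

```r
# GEV c.d.f. and quantile function, transcribed from the formulas above.
gev_cdf <- function(x, mu, sigma, xi) {
  z <- (x - mu) / sigma
  if (xi == 0) {
    exp(-exp(-z))
  } else {
    # Outside the support 1 + xi * z <= 0; clamping to 0 yields
    # F = 0 below the lower bound (xi > 0) and F = 1 above the upper bound (xi < 0).
    exp(-pmax(1 + xi * z, 0)^(-1 / xi))
  }
}
gev_quantile <- function(p, mu, sigma, xi) {
  if (xi == 0) {
    mu - sigma * log(-log(p))
  } else {
    mu + sigma / xi * ((-log(p))^(-xi) - 1)
  }
}

p <- c(0.1, 0.5, 0.95)
shapes <- c(gumbel = 0, frechet = 0.3, weibull = -0.2)

# Largest absolute error of F(Q(p)) - p for each shape.
gev_round_trip <- sapply(shapes, function(xi) {
  max(abs(gev_cdf(gev_quantile(p, 0, 1, xi), 0, 1, xi) - p))
})
print(gev_round_trip)
```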
References
Jenkinson, A. F. (1955) The frequency distribution of the annual maximum (or minimum) of meteorological elements. Quart. J. R. Met. Soc., 81, 158–171.
Examples
# Create GEV distributions with different shape parameters
# Gumbel distribution (shape = 0)
gumbel <- dist_gev(location = 0, scale = 1, shape = 0)
# Frechet distribution (shape > 0, heavy-tailed)
frechet <- dist_gev(location = 0, scale = 1, shape = 0.3)
# Reverse Weibull distribution (shape < 0, bounded above)
weibull <- dist_gev(location = 0, scale = 1, shape = -0.2)
dist <- c(gumbel, frechet, weibull)
dist
# Statistical properties
mean(dist)
median(dist)
variance(dist)
# Generate random samples
generate(dist, 10)
# Evaluate density
density(dist, 2)
density(dist, 2, log = TRUE)
# Evaluate cumulative distribution
cdf(dist, 4)
# Calculate quantiles
quantile(dist, 0.95)
The generalised g-and-h Distribution
Description
The generalised g-and-h distribution is a flexible distribution used to model univariate data, similar to the g-k distribution. It is known for its ability to handle skewness and heavy-tailed behavior.
Usage
dist_gh(A, B, g, h, c = 0.8)
Arguments
A |
Vector of A (location) parameters. |
B |
Vector of B (scale) parameters. Must be positive. |
g |
Vector of g parameters. |
h |
Vector of h parameters. Must be non-negative. |
c |
Vector of c parameters (used for generalised g-and-h). Often fixed at 0.8 which is the default. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gh.html
In the following, let X be a g-and-h random variable with parameters
A = A, B = B, g = g, h = h, and c = c.
Support: (-\infty, \infty)
Mean: Does not have a closed-form expression. Approximated numerically.
Variance: Does not have a closed-form expression. Approximated numerically.
Probability density function (p.d.f):
The g-and-h distribution does not have a closed-form expression for its density. The density is approximated numerically from the quantile function. The distribution is defined through its quantile function:
Q(u) = A + B \left( 1 + c \frac{1 - \exp(-gz(u))}{1 + \exp(-gz(u))} \right) \exp(h z(u)^2/2) z(u)
where z(u) = \Phi^{-1}(u) is the standard normal quantile function.
Cumulative distribution function (c.d.f):
Does not have a closed-form expression. The cumulative distribution function is approximated numerically by inverting the quantile function.
Quantile function:
Q(p) = A + B \left( 1 + c \frac{1 - \exp(-g\Phi^{-1}(p))}{1 + \exp(-g\Phi^{-1}(p))} \right) \exp(h (\Phi^{-1}(p))^2/2) \Phi^{-1}(p)
where \Phi^{-1}(p) is the standard normal quantile function.
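A sketch of the quantile function in base R, together with one possible way to obtain the c.d.f. by numerically inverting it with stats::uniroot(). The helpers gh_quantile() and gh_cdf() are hypothetical; the inversion shown is an assumption made for illustration and may differ from the method used by the package or by gk.

```r
# g-and-h quantile function, transcribed from the formula above.
gh_quantile <- function(p, A, B, g, h, c = 0.8) {
  z <- qnorm(p)
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * exp(h * z^2 / 2) * z
}

# c.d.f. at a single point x, found by solving Q(u) = x for u in (0, 1).
gh_cdf <- function(x, A, B, g, h, c = 0.8) {
  uniroot(
    function(u) gh_quantile(u, A, B, g, h, c) - x,
    interval = c(1e-12, 1 - 1e-12), tol = 1e-10
  )$root
}

# With g = 0 and h = 0 the quantile function reduces to that of a Normal(A, B).
p <- c(0.05, 0.5, 0.9)
normal_case <- gh_quantile(p, A = 3, B = 2, g = 0, h = 0)

# Round trip: F(Q(0.7)) should recover 0.7.
gh_round_trip <- gh_cdf(gh_quantile(0.7, 0, 1, 0, 0.5), 0, 1, 0, 0.5)

print(normal_case)
print(gh_round_trip)
```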
See Also
gk::dgh(), gk::pgh(), gk::qgh(), gk::rgh(), dist_gk()
Examples
dist <- dist_gh(A = 0, B = 1, g = 0, h = 0.5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The g-and-k Distribution
Description
The g-and-k distribution is a flexible distribution often used to model univariate data. It is particularly known for its ability to handle skewness and heavy-tailed behavior.
Usage
dist_gk(A, B, g, k, c = 0.8)
Arguments
A |
Vector of A (location) parameters. |
B |
Vector of B (scale) parameters. Must be positive. |
g |
Vector of g parameters. |
k |
Vector of k parameters. Must be at least -0.5. |
c |
Vector of c parameters. Often fixed at 0.8 which is the default. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gk.html
In the following, let X be a g-k random variable with parameters
A, B, g, k, and c.
Support: (-\infty, \infty)
Mean: Not available in closed form.
Variance: Not available in closed form.
Probability density function (p.d.f):
The g-k distribution does not have a closed-form expression for its density. Instead, it is defined through its quantile function:
Q(u) = A + B \left( 1 + c \frac{1 - \exp(-gz(u))}{1 + \exp(-gz(u))} \right) (1 + z(u)^2)^k z(u)
where z(u) = \Phi^{-1}(u), the standard normal quantile of u.
Cumulative distribution function (c.d.f):
The cumulative distribution function is typically evaluated numerically due to the lack of a closed-form expression.
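Since the distribution is defined through its quantile function, random draws can be produced by inverse transform sampling. The base R sketch below shows the idea; gk_quantile() is a hypothetical helper and the sample size and seed are arbitrary choices for the illustration.

```r
# g-and-k quantile function, transcribed from the formula above.
gk_quantile <- function(p, A, B, g, k, c = 0.8) {
  z <- qnorm(p)
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}

# Inverse transform sampling: push uniform draws through the quantile function.
set.seed(1)
u <- runif(10000)
draws <- gk_quantile(u, A = 0, B = 1, g = 0, k = 0.5)

# The share of draws at or below the 70% quantile should be close to 0.7.
q70 <- gk_quantile(0.7, A = 0, B = 1, g = 0, k = 0.5)
empirical_cdf_at_q70 <- mean(draws <= q70)
print(empirical_cdf_at_q70)
```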
Examples
dist <- dist_gk(A = 0, B = 1, g = 0, k = 0.5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Generalized Pareto Distribution
Description
The Generalized Pareto distribution (GPD) is commonly used to model the tails of distributions, particularly in extreme value theory.
The Pickands–Balkema–De Haan theorem states that for a large class of distributions, the tail (above some threshold) can be approximated by a GPD.
Usage
dist_gpd(location, scale, shape)
Arguments
location |
the location parameter |
scale |
the scale parameter |
shape |
the shape parameter |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gpd.html
In the following, let X be a Generalized Pareto random variable with
parameters location = a, scale = b > 0, and
shape = s.
Support:
x \ge a if s \ge 0,
a \le x \le a - b/s if s < 0
Mean:
E(X) = a + \frac{b}{1 - s} \quad \textrm{for } s < 1
E(X) = \infty for s \ge 1
Variance:
\textrm{Var}(X) = \frac{b^2}{(1-s)^2(1-2s)} \quad \textrm{for } s < 0.5
\textrm{Var}(X) = \infty for s \ge 0.5
Probability density function (p.d.f):
For s = 0:
f(x) = \frac{1}{b}\exp\left(-\frac{x-a}{b}\right) \quad \textrm{for } x \ge a
For s \ne 0:
f(x) = \frac{1}{b}\left(1 + s\frac{x-a}{b}\right)^{-1/s - 1}
where 1 + s(x-a)/b > 0
Cumulative distribution function (c.d.f):
For s = 0:
F(x) = 1 - \exp\left(-\frac{x-a}{b}\right) \quad \textrm{for } x \ge a
For s \ne 0:
F(x) = 1 - \left(1 + s\frac{x-a}{b}\right)^{-1/s}
where 1 + s(x-a)/b > 0
Quantile function:
For s = 0:
Q(p) = a - b\log(1-p)
For s \ne 0:
Q(p) = a + \frac{b}{s}\left[(1-p)^{-s} - 1\right]
Median:
For s = 0:
\textrm{Median}(X) = a + b\log(2)
For s \ne 0:
\textrm{Median}(X) = a + \frac{b}{s}\left(2^s - 1\right)
Skewness and Kurtosis: No closed-form expressions; approximated numerically.
Examples
dist <- dist_gpd(location = 0, scale = 1, shape = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
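The closed-form c.d.f., quantile function and median above can be checked against each other in base R. This is a minimal sketch; gpd_cdf() and gpd_quantile() are hypothetical helper names, not package functions, and they take a scalar shape s.

```r
# Hypothetical helpers (not part of distributional), transcribed from Details.
gpd_cdf <- function(x, a, b, s) {
  z <- (x - a) / b
  if (s == 0) 1 - exp(-z) else 1 - (1 + s * z)^(-1 / s)
}

gpd_quantile <- function(p, a, b, s) {
  if (s == 0) a - b * log(1 - p) else a + b / s * ((1 - p)^(-s) - 1)
}

# The quantile function inverts the c.d.f.
q <- gpd_quantile(0.7, a = 1, b = 2, s = 0.25)
gpd_cdf(q, a = 1, b = 2, s = 0.25)

# The median formula is the quantile function evaluated at p = 0.5
gpd_quantile(0.5, a = 0, b = 1, s = 0)   # equals log(2)
```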
The Gumbel distribution
Description
The Gumbel distribution is a special case of the Generalized Extreme Value
distribution, obtained when the GEV shape parameter \xi is equal to 0.
It may be referred to as a type I extreme value distribution.
Usage
dist_gumbel(alpha, scale)
Arguments
alpha |
location parameter. |
scale |
parameter. Must be strictly positive. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gumbel.html
In the following, let X be a Gumbel random variable with location
parameter alpha = \alpha and scale parameter scale = \sigma.
Support: R, the set of all real numbers.
Mean:
E(X) = \alpha + \sigma\gamma
where \gamma is the Euler-Mascheroni constant,
approximately equal to 0.5772157.
Variance:
\textrm{Var}(X) = \frac{\pi^2 \sigma^2}{6}
Skewness:
\textrm{Skew}(X) = \frac{12\sqrt{6}\zeta(3)}{\pi^3} \approx 1.1395
where \zeta(3) is Apery's constant,
approximately equal to 1.2020569. Note that skewness is independent
of the distribution parameters.
Kurtosis (excess):
\textrm{Kurt}(X) = \frac{12}{5} = 2.4
Note that excess kurtosis is independent of the distribution parameters.
Median:
\textrm{Median}(X) = \alpha - \sigma\ln(\ln 2)
Probability density function (p.d.f):
f(x) = \frac{1}{\sigma} \exp\left[-\frac{x - \alpha}{\sigma}\right]
\exp\left\{-\exp\left[-\frac{x - \alpha}{\sigma}\right]\right\}
for x in R, the set of all real numbers.
Cumulative distribution function (c.d.f):
F(x) = \exp\left\{-\exp\left[-\frac{x - \alpha}{\sigma}\right]\right\}
for x in R, the set of all real numbers.
Quantile function (inverse c.d.f):
F^{-1}(p) = \alpha - \sigma \ln(-\ln p)
for p in (0, 1).
Moment generating function (m.g.f):
E(e^{tX}) = \Gamma(1 - \sigma t) e^{\alpha t}
for \sigma t < 1, where \Gamma is the gamma function.
See Also
actuar::Gumbel, actuar::dgumbel(), actuar::pgumbel(),
actuar::qgumbel(), actuar::rgumbel(), actuar::mgumbel()
Examples
dist <- dist_gumbel(alpha = c(0.5, 1, 1.5, 3), scale = c(2, 2, 3, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
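The closed forms above are easy to evaluate directly. The sketch below uses base R only. The helper names are hypothetical, and the Euler-Mascheroni constant is obtained as -digamma(1).

```r
# Hypothetical helpers (not part of distributional), transcribed from Details.
gumbel_cdf <- function(x, alpha, sigma) exp(-exp(-(x - alpha) / sigma))
gumbel_quantile <- function(p, alpha, sigma) alpha - sigma * log(-log(p))

euler_gamma <- -digamma(1)                 # Euler-Mascheroni constant
gumbel_mean <- function(alpha, sigma) alpha + sigma * euler_gamma
gumbel_var  <- function(sigma) pi^2 * sigma^2 / 6

alpha <- 0.5
sigma <- 2
gumbel_mean(alpha, sigma)
gumbel_var(sigma)
gumbel_quantile(0.5, alpha, sigma)         # median via the quantile function
alpha - sigma * log(log(2))                # median formula, same value
gumbel_cdf(gumbel_quantile(0.7, alpha, sigma), alpha, sigma)
```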
The Hypergeometric distribution
Description
To understand the Hypergeometric distribution, consider a set of
r objects, of which m are of type I and
n are of type II (so r = m + n). A sample of size k (k<r)
is drawn at random without replacement. The number of
type I elements observed in this sample is our random
variable X.
Usage
dist_hypergeometric(m, n, k)
Arguments
m |
The number of type I elements available. |
n |
The number of type II elements available. |
k |
The size of the sample taken. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_hypergeometric.html
In the following, let X be a Hypergeometric random variable with
success probability p = m/(m+n).
Support: x \in \{\max(0, k-n), \dots, \min(k,m)\}
Mean: \frac{km}{m+n} = kp
Variance: \frac{kmn(m+n-k)}{(m+n)^2 (m+n-1)} =
kp(1-p)\left(1 - \frac{k-1}{m+n-1}\right)
Probability mass function (p.m.f):
P(X = x) = \frac{{m \choose x}{n \choose k-x}}{{m+n \choose k}}
Cumulative distribution function (c.d.f):
P(X \le x) = \sum_{i = \max(0, k-n)}^{\lfloor x \rfloor}
\frac{{m \choose i}{n \choose k-i}}{{m+n \choose k}}
Moment generating function (m.g.f):
E(e^{tX}) = \frac{{n \choose k}}{{m+n \choose k}}{}_2F_1(-m, -k; n-k+1; e^t)
where _2F_1 is the hypergeometric function.
Skewness:
\frac{(m+n-2k)(m+n-1)^{1/2}(m+n-2m)}{[kmn(m+n-k)]^{1/2}(m+n-2)}
Examples
dist <- dist_hypergeometric(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
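The moment formulas can be cross-checked by summing over the p.m.f. with stats::dhyper(), which uses the same (m, n, k) parameterisation. This is an illustrative sketch; the variable names are arbitrary, and the skewness is written with the factor (m + n - 2m) = n - m.

```r
m <- 500   # type I elements
n <- 50    # type II elements
k <- 100   # sample size
N <- m + n

# Moments computed directly from the p.m.f. over the support
x <- max(0, k - n):min(k, m)
pmf <- dhyper(x, m, n, k)
mu <- sum(x * pmf)
sigma2 <- sum((x - mu)^2 * pmf)
skew <- sum((x - mu)^3 * pmf) / sigma2^1.5

# Closed-form expressions
mean_formula <- k * m / N
var_formula <- k * m * n * (N - k) / (N^2 * (N - 1))
skew_formula <- (N - 2 * k) * sqrt(N - 1) * (N - 2 * m) /
  (sqrt(k * m * n * (N - k)) * (N - 2))

c(mu, mean_formula)
c(sigma2, var_formula)
c(skew, skew_formula)
```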
Inflate a value of a probability distribution
Description
Inflated distributions add extra probability mass at a specific value, most commonly zero (zero-inflation). These distributions are useful for modeling data with excess observations at a particular value compared to what the base distribution would predict. Common applications include zero-inflated Poisson or negative binomial models for count data with many zeros.
Usage
dist_inflated(dist, prob, x = 0)
Arguments
dist |
The distribution(s) to inflate. |
prob |
The added probability of observing x. |
x |
The value to inflate. The default of x = 0 is for zero-inflation. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inflated.html
In the following, let Y be an inflated random variable based on
a base distribution X, with inflation value x = c and
inflation probability prob = p.
Support: Same as the base distribution, but with additional
probability mass at c
Mean: (when x is numeric)
E(Y) = p \cdot c + (1-p) \cdot E(X)
Variance: (when x = 0)
\text{Var}(Y) = (1-p) \cdot \text{Var}(X) + p(1-p) \cdot [E(X)]^2
For non-zero inflation values, the variance is not computed in closed form.
Probability mass/density function (p.m.f/p.d.f):
For discrete distributions:
f_Y(y) = \begin{cases}
p + (1-p) \cdot f_X(c) & \text{if } y = c \\
(1-p) \cdot f_X(y) & \text{if } y \neq c
\end{cases}
For continuous distributions:
f_Y(y) = \begin{cases}
p & \text{if } y = c \\
(1-p) \cdot f_X(y) & \text{if } y \neq c
\end{cases}
Cumulative distribution function (c.d.f):
F_Y(q) = \begin{cases}
(1-p) \cdot F_X(q) & \text{if } q < c \\
p + (1-p) \cdot F_X(q) & \text{if } q \geq c
\end{cases}
Quantile function:
The quantile function is computed numerically by inverting the inflated CDF, accounting for the jump in probability at the inflation point.
Examples
# Zero-inflated Poisson
dist <- dist_inflated(dist_poisson(lambda = 2), prob = 0.3, x = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 0)
density(dist, 1)
cdf(dist, 2)
quantile(dist, 0.5)
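For the zero-inflated Poisson used in this example, the formulas above can be reproduced with base R. This is a sketch under the stated formulas; zip_pmf() and zip_cdf() are hypothetical helper names, and the support is truncated at 200 for the numerical sums.

```r
p <- 0.3        # inflation probability
lambda <- 2     # Poisson rate of the base distribution

# Hypothetical helpers: inflated p.m.f. and c.d.f. with inflation value c = 0
zip_pmf <- function(y) {
  ifelse(y == 0, p + (1 - p) * dpois(0, lambda), (1 - p) * dpois(y, lambda))
}
zip_cdf <- function(q) p * (q >= 0) + (1 - p) * ppois(q, lambda)

y <- 0:200
mean_num <- sum(y * zip_pmf(y))
var_num <- sum((y - mean_num)^2 * zip_pmf(y))

c(mean_num, (1 - p) * lambda)                              # E(Y)
c(var_num, (1 - p) * lambda + p * (1 - p) * lambda^2)      # Var(Y), with Var(X) = E(X) = lambda

# Median: smallest y whose c.d.f. reaches 0.5
y[which(zip_cdf(y) >= 0.5)[1]]
```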
The Inverse Exponential distribution
Description
The Inverse Exponential distribution is used to model the reciprocal of exponentially distributed variables.
Usage
dist_inverse_exponential(rate)
Arguments
rate |
an alternative way to specify the scale. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_exponential.html
In the following, let X be an Inverse Exponential random variable
with parameter rate = \lambda.
Support: x > 0
Mean: Does not exist, returns NA
Variance: Does not exist, returns NA
Probability density function (p.d.f):
f(x) = \frac{\lambda}{x^2} e^{-\lambda/x}
Cumulative distribution function (c.d.f):
F(x) = e^{-\lambda/x}
Quantile function (inverse c.d.f):
F^{-1}(p) = -\frac{\lambda}{\log(p)}
Moment generating function (m.g.f):
Does not exist (divergent integral).
Examples
dist <- dist_inverse_exponential(rate = 1:5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Inverse Gamma distribution
Description
The Inverse Gamma distribution is commonly used as a prior distribution in Bayesian statistics, particularly for variance parameters.
Usage
dist_inverse_gamma(shape, rate = 1/scale, scale)
Arguments
shape, scale |
parameters. Must be strictly positive. |
rate |
an alternative way to specify the scale. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_gamma.html
In the following, let X be an Inverse Gamma random variable with
shape parameter shape = \alpha and rate parameter
rate = \beta (equivalently, scale = 1/\beta).
Support: x \in (0, \infty)
Mean: \frac{\beta}{\alpha - 1} for \alpha > 1,
otherwise undefined
Variance: \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)}
for \alpha > 2, otherwise undefined
Probability density function (p.d.f):
f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha - 1}
e^{-\beta/x}
Cumulative distribution function (c.d.f):
F(x) = \frac{\Gamma(\alpha, \beta/x)}{\Gamma(\alpha)} =
Q(\alpha, \beta/x)
where \Gamma(\alpha, z) is the upper incomplete gamma function and
Q is the regularized incomplete gamma function.
Moment generating function (m.g.f):
M_X(t) = \frac{2 (-\beta t)^{\alpha/2}}{\Gamma(\alpha)}
K_\alpha\left(\sqrt{-4\beta t}\right)
for t < 0, where K_\alpha is the modified Bessel function
of the second kind. The MGF does not exist for t \ge 0.
Examples
dist <- dist_inverse_gamma(shape = c(1,2,3,3), rate = c(1,1,1,2))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Inverse Gaussian distribution
Description
The inverse Gaussian distribution (also known as the Wald distribution) is commonly used to model positive-valued data, particularly in contexts involving first passage times and reliability analysis.
Usage
dist_inverse_gaussian(mean, shape)
Arguments
mean, shape |
parameters. Must be strictly positive. Infinite values are supported. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_gaussian.html
In the following, let X be an Inverse Gaussian random variable with
parameters mean = \mu and shape = \lambda.
Support: (0, \infty)
Mean: \mu
Variance: \frac{\mu^3}{\lambda}
Probability density function (p.d.f):
f(x) = \sqrt{\frac{\lambda}{2\pi x^3}}
\exp\left(-\frac{\lambda(x - \mu)^2}{2\mu^2 x}\right)
Cumulative distribution function (c.d.f):
F(x) = \Phi\left(\sqrt{\frac{\lambda}{x}}
\left(\frac{x}{\mu} - 1\right)\right) +
\exp\left(\frac{2\lambda}{\mu}\right)
\Phi\left(-\sqrt{\frac{\lambda}{x}}
\left(\frac{x}{\mu} + 1\right)\right)
where \Phi is the standard normal c.d.f.
Moment generating function (m.g.f):
E(e^{tX}) = \exp\left(\frac{\lambda}{\mu}
\left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right)\right)
for t < \frac{\lambda}{2\mu^2}.
Skewness: 3\sqrt{\frac{\mu}{\lambda}}
Excess Kurtosis: \frac{15\mu}{\lambda}
Quantiles: No closed-form expression, approximated numerically.
Examples
dist <- dist_inverse_gaussian(mean = c(1,1,1,3,3), shape = c(0.2, 1, 3, 0.2, 1))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
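As a sanity check, the closed-form c.d.f. above should agree with a numerical integral of the p.d.f. The base R sketch below does this for one parameter choice; ig_pdf() and ig_cdf() are hypothetical helper names, not package functions.

```r
# Hypothetical helpers (not part of distributional), transcribed from Details.
ig_pdf <- function(x, mu, lambda) {
  sqrt(lambda / (2 * pi * x^3)) * exp(-lambda * (x - mu)^2 / (2 * mu^2 * x))
}

ig_cdf <- function(x, mu, lambda) {
  r <- sqrt(lambda / x)
  pnorm(r * (x / mu - 1)) + exp(2 * lambda / mu) * pnorm(-r * (x / mu + 1))
}

cdf_closed <- ig_cdf(2, mu = 1, lambda = 3)
cdf_numeric <- integrate(ig_pdf, lower = 0, upper = 2, mu = 1, lambda = 3)$value
c(cdf_closed, cdf_numeric)
```

For large values of lambda / mu the factor exp(2 * lambda / mu) can overflow, so a production implementation would typically work on the log scale.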
The Laplace distribution
Description
The Laplace distribution, also known as the double exponential distribution, is a continuous probability distribution that is symmetric around its location parameter.
Usage
dist_laplace(mu, sigma)
Arguments
mu |
The location parameter (mean) of the Laplace distribution. |
sigma |
The positive scale parameter of the Laplace distribution. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_laplace.html
In the following, let X be a Laplace random variable with location
parameter mu = \mu and scale parameter sigma = \sigma.
Support: R, the set of all real numbers
Mean: \mu
Variance: 2\sigma^2
Probability density function (p.d.f):
f(x) = \frac{1}{2\sigma} \exp\left(-\frac{|x - \mu|}{\sigma}\right)
Cumulative distribution function (c.d.f):
F(x) = \begin{cases}
\frac{1}{2} \exp\left(\frac{x - \mu}{\sigma}\right) & \text{if } x < \mu \\
1 - \frac{1}{2} \exp\left(-\frac{x - \mu}{\sigma}\right) & \text{if } x \geq \mu
\end{cases}
Moment generating function (m.g.f):
E(e^{tX}) = \frac{\exp(\mu t)}{1 - \sigma^2 t^2} \text{ for } |t| < \frac{1}{\sigma}
Examples
dist <- dist_laplace(mu = c(0, 2, -1), sigma = c(1, 2, 0.5))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 0)
density(dist, 0, log = TRUE)
cdf(dist, 1)
quantile(dist, 0.7)
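Inverting the piecewise c.d.f. above gives a closed-form quantile function, which is not listed in Details and is derived here for illustration. The helper names below are hypothetical.

```r
# Hypothetical helpers (not part of distributional).
laplace_cdf <- function(x, mu, sigma) {
  ifelse(x < mu,
         0.5 * exp((x - mu) / sigma),
         1 - 0.5 * exp(-(x - mu) / sigma))
}

# Obtained by solving F(x) = p separately on each branch of the c.d.f.
laplace_quantile <- function(p, mu, sigma) {
  ifelse(p < 0.5,
         mu + sigma * log(2 * p),
         mu - sigma * log(2 * (1 - p)))
}

p <- c(0.1, 0.5, 0.7)
q <- laplace_quantile(p, mu = 2, sigma = 2)
laplace_cdf(q, mu = 2, sigma = 2)   # recovers p
```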
The Logarithmic distribution
Description
The Logarithmic distribution is a discrete probability distribution derived from the logarithmic series. It is useful in modeling the abundance of species and other phenomena where the frequency of an event follows a logarithmic pattern.
Usage
dist_logarithmic(prob)
Arguments
prob |
parameter. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_logarithmic.html
In the following, let X be a Logarithmic random variable with
parameter prob = p.
Support: \{1, 2, 3, ...\}
Mean: \frac{-1}{\log(1-p)} \cdot \frac{p}{1-p}
Variance: \frac{-(p^2 + p\log(1-p))}{[(1-p)\log(1-p)]^2}
Probability mass function (p.m.f):
P(X = k) = \frac{-1}{\log(1-p)} \cdot \frac{p^k}{k}
for k = 1, 2, 3, \ldots
Cumulative distribution function (c.d.f):
The c.d.f. does not have a simple closed form. It is computed
using the recurrence relationship
P(X = k+1) = \frac{p \cdot k}{k+1} \cdot P(X = k)
starting from P(X = 1) = \frac{-p}{\log(1-p)}.
Moment generating function (m.g.f):
E(e^{tX}) = \frac{\log(1 - pe^t)}{\log(1-p)}
for pe^t < 1
Examples
dist <- dist_logarithmic(prob = c(0.33, 0.66, 0.99))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
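The recurrence described above can be sketched in a few lines of base R. logarithmic_pmf_upto() is a hypothetical helper, not a package function; it returns P(X = 1), ..., P(X = kmax), and the c.d.f. follows as a cumulative sum.

```r
# Hypothetical helper (not part of distributional): p.m.f. via the recurrence
# P(X = k + 1) = p * k / (k + 1) * P(X = k), starting from P(X = 1).
logarithmic_pmf_upto <- function(kmax, p) {
  pmf <- numeric(kmax)
  pmf[1] <- -p / log(1 - p)
  for (k in seq_len(kmax - 1)) {
    pmf[k + 1] <- p * k / (k + 1) * pmf[k]
  }
  pmf
}

p <- 0.66
pmf <- logarithmic_pmf_upto(4, p)
cdf <- cumsum(pmf)
cdf[4]                                    # P(X <= 4)

# Direct evaluation of the p.m.f. for comparison
-p^(1:4) / ((1:4) * log(1 - p))

# Mean from a long truncated sum versus the closed form
big <- logarithmic_pmf_upto(2000, p)
c(sum(seq_along(big) * big), -1 / log(1 - p) * p / (1 - p))
```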
The Logistic distribution
Description
A continuous distribution on the real line. For binary outcomes,
the model given by P(Y = 1 | X) = F(X \beta), where
F is the Logistic cdf(), is called logistic regression.
Usage
dist_logistic(location, scale)
Arguments
location, scale |
location and scale parameters. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_logistic.html
In the following, let X be a Logistic random variable with
location = \mu and scale = s.
Support: R, the set of all real numbers
Mean: \mu
Variance: s^2 \pi^2 / 3
Probability density function (p.d.f):
f(x) = \frac{e^{-\frac{x - \mu}{s}}}{s \left[1 + e^{-\frac{x - \mu}{s}}\right]^2}
Cumulative distribution function (c.d.f):
F(x) = \frac{1}{1 + e^{-\frac{x - \mu}{s}}}
Moment generating function (m.g.f):
E(e^{tX}) = e^{\mu t} B(1 - st, 1 + st)
for -1 < st < 1, where B(a, b) is the Beta function.
Examples
dist <- dist_logistic(location = c(5,9,9,6,2), scale = c(2,3,4,2,1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
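The c.d.f. above inverts to the logit function, giving a closed-form quantile (derived here; it is not listed in Details). The sketch compares hypothetical helpers against stats::plogis() and stats::qlogis(), which use the same location and scale parameters.

```r
# Hypothetical helpers (not part of distributional).
logistic_cdf <- function(x, mu, s) 1 / (1 + exp(-(x - mu) / s))
logistic_quantile <- function(p, mu, s) mu + s * log(p / (1 - p))

c(logistic_cdf(4, mu = 5, s = 2), plogis(4, location = 5, scale = 2))
c(logistic_quantile(0.7, mu = 5, s = 2), qlogis(0.7, location = 5, scale = 2))
```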
The log-normal distribution
Description
The log-normal distribution is a commonly used transformation of the Normal
distribution. If X follows a log-normal distribution, then \ln{X}
would be characterised by a Normal distribution.
Usage
dist_lognormal(mu = 0, sigma = 1)
Arguments
mu |
The mean (location parameter) of the distribution, which is the mean of the associated Normal distribution. Can be any real number. |
sigma |
The standard deviation (scale parameter) of the associated Normal distribution. Can be any positive number. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_lognormal.html
In the following, let X be a log-normal random variable with
mu = \mu and sigma = \sigma.
Support: R^+, the set of positive real numbers.
Mean: e^{\mu + \sigma^2/2}
Variance: (e^{\sigma^2} - 1) e^{2\mu + \sigma^2}
Skewness: (e^{\sigma^2} + 2) \sqrt{e^{\sigma^2} - 1}
Excess Kurtosis: e^{4\sigma^2} + 2 e^{3\sigma^2} + 3 e^{2\sigma^2} - 6
Probability density function (p.d.f):
f(x) = \frac{1}{x\sqrt{2 \pi \sigma^2}} e^{-(\ln{x} - \mu)^2 / (2 \sigma^2)}
Cumulative distribution function (c.d.f):
F(x) = \Phi\left(\frac{\ln{x} - \mu}{\sigma}\right)
where \Phi is the c.d.f. of the standard Normal distribution.
Moment generating function (m.g.f):
Does not exist in closed form.
Examples
dist <- dist_lognormal(mu = 1:5, sigma = 0.1)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
# A log-normal distribution X is exp(Y), where Y is a Normal distribution of
# the same parameters. So log(X) will produce the Normal distribution Y.
log(dist)
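The relationship to the Normal distribution and the moment formulas above can be checked with base R. This is a sketch with arbitrarily chosen parameters; the variable names are illustrative only.

```r
mu <- 1
sigma <- 0.5

# c.d.f. through the Normal c.d.f. of log(x), compared with stats::plnorm()
ln_cdf <- function(x) pnorm((log(x) - mu) / sigma)
c(ln_cdf(4), plnorm(4, meanlog = mu, sdlog = sigma))

# Closed-form mean and variance
ln_mean <- exp(mu + sigma^2 / 2)
ln_var <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)

# Numerical mean by integrating x * f(x) over the support
num_mean <- integrate(function(x) x * dlnorm(x, mu, sigma), 0, Inf)$value
c(ln_mean, num_mean)
```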
Missing distribution
Description
A placeholder distribution for handling missing values in a vector of distributions.
Usage
dist_missing(length = 1)
Arguments
length |
The number of missing distributions |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_missing.html
The missing distribution represents the absence of distributional
information. It is used as a placeholder when distribution values are
not available or not applicable, similar to how NA is used for missing
scalar values.
Support: Undefined
Mean: \text{NA}
Variance: \text{NA}
Skewness: \text{NA}
Kurtosis: \text{NA}
Probability density function (p.d.f): Undefined
f(x) = \text{NA}
Cumulative distribution function (c.d.f): Undefined
F(t) = \text{NA}
Quantile function: Undefined
Q(p) = \text{NA}
Moment generating function (m.g.f): Undefined
E(e^{tX}) = \text{NA}
All statistical operations on missing distributions return NA values
of appropriate length, propagating the missingness through calculations.
Examples
dist <- dist_missing(3L)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
Create a mixture of distributions
Description
A mixture distribution combines multiple component distributions with specified weights. The resulting distribution can model complex, multimodal data by representing it as a weighted sum of simpler distributions.
Usage
dist_mixture(..., weights = numeric())
Arguments
... |
Distributions to be used in the mixture. Can be any distributional objects. |
weights |
A numeric vector of non-negative weights that sum to 1.
The length must match the number of distributions passed to the ... argument. |
Details
In the following, let X be a mixture random variable composed
of K component distributions F_1, F_2, \ldots, F_K with
corresponding weights w_1, w_2, \ldots, w_K where
\sum_{i=1}^K w_i = 1 and w_i \geq 0 for all i.
Support: The union of the supports of all component distributions
Mean:
For univariate mixtures:
E(X) = \sum_{i=1}^K w_i \mu_i
where \mu_i is the mean of the i-th component distribution.
For multivariate mixtures:
E(\mathbf{X}) = \sum_{i=1}^K w_i \boldsymbol{\mu}_i
where \boldsymbol{\mu}_i is the mean vector of the i-th
component distribution.
Variance:
For univariate mixtures:
\text{Var}(X) = \sum_{i=1}^K w_i (\mu_i^2 + \sigma_i^2) - \left(\sum_{i=1}^K w_i \mu_i\right)^2
where \sigma_i^2 is the variance of the i-th component
distribution.
Covariance:
For multivariate mixtures:
\text{Cov}(\mathbf{X}) = \sum_{i=1}^K w_i \left[ (\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})^T + \boldsymbol{\Sigma}_i \right]
where \bar{\boldsymbol{\mu}} = \sum_{i=1}^K w_i \boldsymbol{\mu}_i
is the overall mean vector and \boldsymbol{\Sigma}_i is the
covariance matrix of the i-th component distribution.
Probability density/mass function (p.d.f/p.m.f):
f(x) = \sum_{i=1}^K w_i f_i(x)
where f_i(x) is the density or mass function of the i-th
component distribution.
Cumulative distribution function (c.d.f):
For univariate mixtures:
F(x) = \sum_{i=1}^K w_i F_i(x)
where F_i(x) is the c.d.f. of the i-th component
distribution.
For multivariate mixtures, the c.d.f. is approximated numerically.
Quantile function:
For univariate mixtures, the quantile function has no closed form
and is computed numerically by inverting the c.d.f. using root-finding
(stats::uniroot()).
For multivariate mixtures, quantiles are not yet implemented.
See Also
stats::uniroot(), vctrs::vec_unique_count()
Examples
# Univariate mixture of two normal distributions
dist <- dist_mixture(dist_normal(0, 1), dist_normal(5, 2), weights = c(0.3, 0.7))
dist
mean(dist)
variance(dist)
density(dist, 2)
cdf(dist, 2)
quantile(dist, 0.5)
generate(dist, 10)
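For the two-component Normal mixture in this example, the univariate formulas in Details can be evaluated by hand. The sketch below uses base R only. mix_cdf() and mix_quantile() are hypothetical helpers, and the uniroot() bracket of (-20, 30) is an assumption chosen to cover both components.

```r
w <- c(0.3, 0.7)      # weights
mu <- c(0, 5)         # component means
sigma <- c(1, 2)      # component standard deviations

mix_mean <- sum(w * mu)
mix_var <- sum(w * (mu^2 + sigma^2)) - mix_mean^2

# c.d.f. as the weighted sum of component c.d.f.s (scalar x)
mix_cdf <- function(x) sum(w * pnorm(x, mu, sigma))

# Quantile by root-finding on the c.d.f.
mix_quantile <- function(p) {
  uniroot(function(x) mix_cdf(x) - p, interval = c(-20, 30), tol = 1e-10)$root
}

mix_mean
mix_var
mix_cdf(2)
mix_quantile(0.5)
```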
The Multinomial distribution
Description
The multinomial distribution is a generalization of the binomial
distribution to multiple categories. It is perhaps easiest to think
that we first extend a dist_bernoulli() distribution to include more
than two categories, resulting in a dist_categorical() distribution.
We then repeat the Categorical experiment several (n)
times.
Usage
dist_multinomial(size, prob)
Arguments
size |
The number of draws from the Categorical distribution. |
prob |
The probability of an event occurring from each draw. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multinomial.html
In the following, let X = (X_1, ..., X_k) be a Multinomial
random variable with event probabilities prob = p. Note that
p is a vector with k elements that sum to one. Assume
that we repeat the Categorical experiment size = n times.
Support: Each X_i is in \{0, 1, 2, ..., n\}.
Mean: The mean of X_i is n p_i.
Variance: The variance of X_i is n p_i (1 - p_i).
For i \neq j, the covariance of X_i and X_j
is -n p_i p_j.
Probability mass function (p.m.f):
P(X_1 = x_1, ..., X_k = x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} p_1^{x_1} \cdot p_2^{x_2} \cdot \ldots \cdot p_k^{x_k}
where \sum_{i=1}^k x_i = n and \sum_{i=1}^k p_i = 1.
Cumulative distribution function (c.d.f):
P(X_1 \le q_1, ..., X_k \le q_k) = \sum_{\substack{x_1, \ldots, x_k \ge 0 \\ x_i \le q_i \text{ for all } i \\ \sum_{i=1}^k x_i = n}} \frac{n!}{x_1! x_2! \cdots x_k!} p_1^{x_1} \cdot p_2^{x_2} \cdot \ldots \cdot p_k^{x_k}
The c.d.f. is computed as a finite sum of the p.m.f. over all integer vectors in the support that satisfy the componentwise inequalities.
Moment generating function (m.g.f):
E(e^{t'X}) = \left(\sum_{i=1}^k p_i e^{t_i}\right)^n
where t = (t_1, ..., t_k) is a vector of the same dimension as X.
Skewness: The skewness of X_i is
\frac{1 - 2p_i}{\sqrt{n p_i (1 - p_i)}}
Excess Kurtosis: The excess kurtosis of X_i is
\frac{1 - 6p_i(1 - p_i)}{n p_i (1 - p_i)}
See Also
stats::dmultinom(), stats::rmultinom()
Examples
dist <- dist_multinomial(size = c(4, 3), prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, list(d = rbind(cbind(1,2,1), cbind(0,2,1))))
density(dist, list(d = rbind(cbind(1,2,1), cbind(0,2,1))), log = TRUE)
cdf(dist, cbind(1,2,1))
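The mean, covariance and p.m.f. formulas above can be written out directly for the first distribution in this example. This is a base R sketch; the variable names are illustrative, and stats::dmultinom() is used only as a reference value.

```r
n <- 4
p <- c(0.3, 0.5, 0.2)

mn_mean <- n * p
# Diagonal: n p_i (1 - p_i); off-diagonal: -n p_i p_j
mn_cov <- n * (diag(p) - outer(p, p))

# p.m.f. at x = (1, 2, 1), written out from the formula
x <- c(1, 2, 1)
pmf_manual <- factorial(n) / prod(factorial(x)) * prod(p^x)

mn_mean
mn_cov
c(pmf_manual, dmultinom(x, size = n, prob = p))
```

Each row of the covariance matrix sums to zero because the counts are constrained to add up to n.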
The multivariate normal distribution
Description
The multivariate normal distribution is a generalization of the univariate normal distribution to higher dimensions. It is widely used in multivariate statistics and describes the joint distribution of multiple correlated continuous random variables.
Usage
dist_multivariate_normal(mu = 0, sigma = diag(1))
Arguments
mu |
A list of numeric vectors for the distribution's mean. |
sigma |
A list of matrices for the distribution's variance-covariance matrix. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multivariate_normal.html
In the following, let \mathbf{X} be a k-dimensional multivariate
normal random variable with mean vector mu = \boldsymbol{\mu} and
variance-covariance matrix sigma = \boldsymbol{\Sigma}.
Support: \mathbf{x} \in \mathbb{R}^k
Mean: \boldsymbol{\mu}
Variance-covariance matrix: \boldsymbol{\Sigma}
Probability density function (p.d.f):
f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\boldsymbol{\Sigma}|^{1/2}}
\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T
\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)
where |\boldsymbol{\Sigma}| is the determinant of
\boldsymbol{\Sigma}.
Cumulative distribution function (c.d.f):
P(\mathbf{X} \le \mathbf{q}) = P(X_1 \le q_1, \ldots, X_k \le q_k)
The c.d.f. does not have a closed-form expression and is computed numerically.
Moment generating function (m.g.f):
M(\mathbf{t}) = E(e^{\mathbf{t}^T \mathbf{X}}) =
\exp\left(\mathbf{t}^T \boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^T
\boldsymbol{\Sigma} \mathbf{t}\right)
See Also
mvtnorm::dmvnorm(), mvtnorm::pmvnorm(), mvtnorm::qmvnorm(),
mvtnorm::rmvnorm()
Examples
dist <- dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2)))
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, cbind(2, 1))
density(dist, cbind(2, 1), log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7, kind = "equicoordinate")
quantile(dist, 0.7, kind = "marginal")
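The density formula above can be evaluated with a few lines of linear algebra. The sketch below is base R only; mvn_density() is a hypothetical helper for a single point x, not a package function.

```r
# Hypothetical helper (not part of distributional).
mvn_density <- function(x, mu, Sigma) {
  k <- length(mu)
  d <- x - mu
  q <- sum(d * solve(Sigma, d))        # (x - mu)' Sigma^{-1} (x - mu)
  exp(-q / 2) / ((2 * pi)^(k / 2) * sqrt(det(Sigma)))
}

mu <- c(1, 2)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
mvn_density(c(2, 1), mu, Sigma)

# With a diagonal covariance the density factorises into univariate Normals
mvn_density(c(0.3, -1), c(0, 0), diag(c(4, 9)))
dnorm(0.3, 0, 2) * dnorm(-1, 0, 3)
```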
The multivariate t-distribution
Description
The multivariate t-distribution is a generalization of the univariate Student's t-distribution to multiple dimensions. It is commonly used for modeling heavy-tailed multivariate data and in robust statistics.
Usage
dist_multivariate_t(df = 1, mu = 0, sigma = diag(1))
Arguments
df |
A numeric vector of degrees of freedom (must be positive). |
mu |
A list of numeric vectors for the distribution location parameter. |
sigma |
A list of matrices for the distribution scale matrix. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multivariate_t.html
In the following, let \mathbf{X} be a multivariate t random vector
with degrees of freedom df = \nu, location parameter
mu = \boldsymbol{\mu}, and scale matrix
sigma = \boldsymbol{\Sigma}.
Support: \mathbf{x} \in \mathbb{R}^k, where k is the
dimension of the distribution
Mean: \boldsymbol{\mu} for \nu > 1, undefined otherwise
Covariance matrix:
\text{Cov}(\mathbf{X}) = \frac{\nu}{\nu - 2} \boldsymbol{\Sigma}
for \nu > 2, undefined otherwise
Probability density function (p.d.f):
f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + k}{2}\right)}
{\Gamma\left(\frac{\nu}{2}\right) \nu^{k/2} \pi^{k/2}
|\boldsymbol{\Sigma}|^{1/2}}
\left[1 + \frac{1}{\nu}(\mathbf{x} - \boldsymbol{\mu})^T
\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]^{-\frac{\nu + k}{2}}
where k is the dimension of the distribution and \Gamma(\cdot) is
the gamma function.
Cumulative distribution function (c.d.f):
F(\mathbf{t}) = \int_{-\infty}^{t_1} \cdots \int_{-\infty}^{t_k} f(\mathbf{x}) \, d\mathbf{x}
This integral does not have a closed form solution and is approximated numerically.
Quantile function:
The equicoordinate quantile function finds q such that:
P(X_1 \leq q, \ldots, X_k \leq q) = p
This does not have a closed form solution and is approximated numerically.
The marginal quantile function for each dimension i is:
Q_i(p) = \mu_i + \sqrt{\Sigma_{ii}} \cdot t_{\nu}^{-1}(p)
where t_{\nu}^{-1}(p) is the quantile function of the univariate
Student's t-distribution with \nu degrees of freedom, and
\Sigma_{ii} is the i-th diagonal element of sigma.
See Also
mvtnorm::dmvt, mvtnorm::pmvt, mvtnorm::qmvt, mvtnorm::rmvt
Examples
dist <- dist_multivariate_t(
df = 5,
mu = list(c(1, 2)),
sigma = list(matrix(c(4, 2, 2, 3), ncol = 2))
)
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, cbind(2, 1))
density(dist, cbind(2, 1), log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
quantile(dist, 0.7, kind = "marginal")
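The marginal quantile and covariance formulas above only need the univariate Student's t quantile function. The following base R sketch uses the parameters from this example; the helper names are hypothetical.

```r
nu <- 5
mu <- c(1, 2)
Sigma <- matrix(c(4, 2, 2, 3), ncol = 2)

# Hypothetical helpers (not part of distributional).
mvt_marginal_quantile <- function(p, nu, mu, Sigma) {
  mu + sqrt(diag(Sigma)) * qt(p, df = nu)
}

mvt_cov <- function(nu, Sigma) {
  stopifnot(nu > 2)                    # covariance is undefined otherwise
  nu / (nu - 2) * Sigma
}

mvt_marginal_quantile(0.7, nu, mu, Sigma)
mvt_cov(nu, Sigma)
```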
The Negative Binomial distribution
Description
A generalization of the geometric distribution. It is the number
of failures in a sequence of i.i.d. Bernoulli trials before
a specified number of successes (size) occur. The probability of success in
each trial is given by prob.
Usage
dist_negative_binomial(size, prob)
Arguments
size |
The number of successful trials (target number of successes). Must be a positive number. Also called the dispersion parameter. |
prob |
The probability of success in each trial. Must be between 0 and 1. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_negative_binomial.html
In the following, let X be a Negative Binomial random variable with
success probability prob = p and the number of successes size =
r.
Support: \{0, 1, 2, 3, ...\}
Mean: \frac{r(1-p)}{p}
Variance: \frac{r(1-p)}{p^2}
Probability mass function (p.m.f):
P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k
Cumulative distribution function (c.d.f):
F(k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{i + r - 1}{i} p^r (1-p)^i
This can also be expressed in terms of the regularized incomplete beta function, and is computed numerically.
Moment generating function (m.g.f):
E(e^{tX}) = \left(\frac{p}{1-(1-p)e^t}\right)^r, \quad t < -\log(1-p)
Skewness:
\gamma_1 = \frac{2-p}{\sqrt{r(1-p)}}
Excess Kurtosis:
\gamma_2 = \frac{6}{r} + \frac{p^2}{r(1-p)}
Examples
dist <- dist_negative_binomial(size = 10, prob = 0.5)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
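Under the convention stated in the Description (X counts failures before size successes, each trial succeeding with probability prob), the p.m.f. is choose(k + r - 1, k) p^r (1 - p)^k, which is also the convention of stats::dnbinom(). The base R sketch below checks this together with the mean, variance and skewness; the variable names are illustrative, and prob = 0.3 is chosen so that p and 1 - p are distinguishable.

```r
r <- 10     # size: target number of successes
p <- 0.3    # prob: success probability per trial

# p.m.f. written out by hand versus stats::dnbinom()
k <- 0:5
pmf_manual <- choose(k + r - 1, k) * p^r * (1 - p)^k
all.equal(pmf_manual, dnbinom(k, size = r, prob = p))

# Moments from a long truncated sum versus the closed forms
kk <- 0:5000
pmf <- dnbinom(kk, size = r, prob = p)
nb_mean <- sum(kk * pmf)
nb_var <- sum((kk - nb_mean)^2 * pmf)
nb_skew <- sum((kk - nb_mean)^3 * pmf) / nb_var^1.5

c(nb_mean, r * (1 - p) / p)
c(nb_var, r * (1 - p) / p^2)
c(nb_skew, (2 - p) / sqrt(r * (1 - p)))
```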
The Normal distribution
Description
The Normal distribution is ubiquitous in statistics, partially because of the central limit theorem, which states that suitably standardised sums of i.i.d. random variables with finite variance converge in distribution to a Normal. Linear transformations of Normal random variables result in new random variables that are also Normal. If you are taking an intro stats course, you'll likely use the Normal distribution for Z-tests and in simple linear regression. Under regularity conditions, maximum likelihood estimators are asymptotically Normal. The Normal distribution is also called the Gaussian distribution.
Usage
dist_normal(mu = 0, sigma = 1, mean = mu, sd = sigma)
Arguments
mu, mean |
The mean (location parameter) of the distribution. Can be any real number. |
sigma, sd |
The standard deviation (scale parameter) of the distribution.
Can be any positive number. If you would like a Normal distribution with
a given variance, supply its square root as sigma. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_normal.html
In the following, let X be a Normal random variable with mean
mu = \mu and standard deviation sigma = \sigma.
Support: R, the set of all real numbers
Mean: \mu
Variance: \sigma^2
Probability density function (p.d.f):
f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)}
Cumulative distribution function (c.d.f):
F(t) = \int_{-\infty}^t \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)} dx
This integral does not have a closed form solution and is
approximated numerically. The c.d.f. of a standard Normal is closely
related to the "error function":
\Phi(t) = \frac{1}{2}\left[1 + \mathrm{erf}(t / \sqrt{2})\right].
The notation \Phi(t) stands for the c.d.f. of a standard Normal
evaluated at t. Z-tables list the value of \Phi(t) for various t.
Moment generating function (m.g.f):
E(e^{tX}) = e^{\mu t + \sigma^2 t^2 / 2}
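The c.d.f. integral above can be checked numerically. The following is a minimal base R sketch, not package code: the helper name normal_pdf and the parameter values are illustrative assumptions. It integrates the density written out from the p.d.f. formula and compares the result with stats::pnorm().

```r
# Illustrative parameter values (assumptions, not package defaults)
mu <- 1
sigma <- 3
t <- 4

# Normal p.d.f. written out from the formula in the Details
normal_pdf <- function(x) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2)
}

# Numerically integrate the density from -Inf up to t
cdf_numeric <- integrate(normal_pdf, lower = -Inf, upper = t)$value

# Reference values: pnorm() directly, and via the standard Normal c.d.f.
cdf_reference <- pnorm(t, mean = mu, sd = sigma)
cdf_standardised <- pnorm((t - mu) / sigma)

# All three agree up to numerical integration error
all.equal(cdf_numeric, cdf_reference, tolerance = 1e-6)
all.equal(cdf_reference, cdf_standardised)
```

The last comparison illustrates why a single table of \Phi(t) is enough: any Normal probability reduces to the standard Normal after subtracting the mean and dividing by the standard deviation.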
See Also
Examples
dist <- dist_normal(mu = 1:5, sigma = 3)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Pareto Distribution
Description
The Pareto distribution is a power-law probability distribution commonly used in actuarial science to model loss severity and in economics to model income distributions and firm sizes.
Usage
dist_pareto(shape, scale)
Arguments
shape, scale |
parameters. Must be strictly positive. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_pareto.html
In the following, let X be a Pareto random variable with parameters
shape = \alpha and scale = \theta.
Support: (0, \infty)
Mean: \frac{\theta}{\alpha - 1} for \alpha > 1,
undefined otherwise
Variance: \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}
for \alpha > 2, undefined otherwise
Probability density function (p.d.f):
f(x) = \frac{\alpha\theta^\alpha}{(x + \theta)^{\alpha + 1}}
for x > 0, \alpha > 0 and \theta > 0.
Cumulative distribution function (c.d.f):
F(x) = 1 - \left(\frac{\theta}{x + \theta}\right)^\alpha
for x > 0.
Moment generating function (m.g.f):
Does not exist in closed form, but the kth raw moment E[X^k] exists
for -1 < k < \alpha.
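As a sanity check of the formulas above, the following base R sketch writes the p.d.f. and c.d.f. out by hand and verifies them against numerical integration. The function names pareto_pdf and pareto_cdf and the parameter values are illustrative assumptions; the package itself relies on the actuar implementation.

```r
alpha <- 3   # shape (illustrative value, > 1 so that the mean exists)
theta <- 2   # scale (illustrative value)

# p.d.f. and c.d.f. transcribed from the Details
pareto_pdf <- function(x) alpha * theta^alpha / (x + theta)^(alpha + 1)
pareto_cdf <- function(x) 1 - (theta / (x + theta))^alpha

# Integrating the density from 0 to x should reproduce the c.d.f.
x <- 4
cdf_numeric <- integrate(pareto_pdf, lower = 0, upper = x)$value
all.equal(cdf_numeric, pareto_cdf(x), tolerance = 1e-6)

# The mean theta / (alpha - 1) should match the integral of x * f(x)
mean_numeric <- integrate(function(x) x * pareto_pdf(x),
                          lower = 0, upper = Inf)$value
all.equal(mean_numeric, theta / (alpha - 1), tolerance = 1e-4)
```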
Note
There are many different definitions of the Pareto distribution in the literature; see Arnold (2015) or Kleiber and Kotz (2003). This implementation uses the Pareto distribution without a location parameter as described in actuar::Pareto.
References
Kleiber, C. and Kotz, S. (2003), Statistical Size Distributions in Economics and Actuarial Sciences, Wiley.
Klugman, S. A., Panjer, H. H. and Willmot, G. E. (2012), Loss Models, From Data to Decisions, Fourth Edition, Wiley.
See Also
Examples
dist <- dist_pareto(shape = c(10, 3, 2, 1), scale = rep(1, 4))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
Percentile distribution
Description
The Percentile distribution is a non-parametric distribution defined by a set of quantiles at specified percentile values. This distribution is useful for representing empirical distributions or elicited expert knowledge when only percentile information is available. The distribution uses linear interpolation between percentiles and can be used to approximate complex distributions that may not have simple parametric forms.
Usage
dist_percentile(x, percentile)
Arguments
x |
A list of values |
percentile |
A list of percentiles |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_percentile.html
In the following, let X be a Percentile random variable defined by
values x_1, x_2, \ldots, x_n at percentiles
p_1, p_2, \ldots, p_n where 0 \le p_i \le 100.
Support: [\min(x_i), \max(x_i)] if \min(p_i) > 0 or
\max(p_i) < 100, otherwise support is approximated from the
specified percentiles.
Mean: Approximated numerically using spline interpolation and numerical integration:
E(X) \approx \int_0^1 Q(u) du
where Q(u) is a spline function interpolating the percentile values.
Variance: Approximated numerically.
Probability density function (p.d.f): Approximated numerically using kernel density estimation from generated samples.
Cumulative distribution function (c.d.f): Defined by linear interpolation:
F(t) = \begin{cases}
p_1/100 & \text{if } t < x_1 \\
p_i/100 + \frac{(t - x_i)(p_{i+1} - p_i)}{100(x_{i+1} - x_i)} & \text{if } x_i \le t < x_{i+1} \\
p_n/100 & \text{if } t \ge x_n
\end{cases}
Quantile function: Defined by linear interpolation:
Q(u) = x_i + \frac{(100u - p_i)(x_{i+1} - x_i)}{p_{i+1} - p_i}
for p_i/100 \le u \le p_{i+1}/100.
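The two interpolation formulas above can be sketched with stats::approx() from base R. This is an illustration under stated assumptions rather than the package's implementation: the helper names percentile_cdf and percentile_quantile and the three input percentiles are made up for the example.

```r
# Percentiles on the 0-100 scale and the values observed at them (illustrative)
p <- c(10, 50, 90)
x <- qnorm(p / 100)

# c.d.f.: interpolate p/100 as a function of x. rule = 2 holds the end values
# constant outside [x_1, x_n], matching the first and last cases of F(t).
percentile_cdf <- function(t) approx(x, p / 100, xout = t, rule = 2)$y

# Quantile function: interpolate x as a function of p/100
percentile_quantile <- function(u) approx(p / 100, x, xout = u)$y

percentile_cdf(c(-5, 0, 0.5, 5))
percentile_quantile(c(0.1, 0.3, 0.5))

# Within the range of the supplied percentiles the two functions are inverses
percentile_cdf(percentile_quantile(0.3))
```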
Examples
dist <- dist_normal()
percentiles <- seq(0.01, 0.99, by = 0.01)
x <- vapply(percentiles, quantile, double(1L), x = dist)
dist_percentile(list(x), list(percentiles*100))
The Poisson Distribution
Description
The Poisson distribution is frequently used to model counts, in particular the number of events occurring in a fixed interval of time or space when these events occur with a known constant mean rate and independently of the time since the last event. Examples include the number of emails received per hour, the number of decay events per second from a radioactive source, or the number of customers arriving at a store per day.
Usage
dist_poisson(lambda)
Arguments
lambda |
The rate parameter (mean and variance) of the distribution. Can be any positive number. This represents the expected number of events in the given interval. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_poisson.html
In the following, let X be a Poisson random variable with parameter
lambda = \lambda.
Support: \{0, 1, 2, 3, ...\}
Mean: \lambda
Variance: \lambda
Probability mass function (p.m.f):
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
Cumulative distribution function (c.d.f):
P(X \le k) = e^{-\lambda}
\sum_{i = 0}^{\lfloor k \rfloor} \frac{\lambda^i}{i!}
Moment generating function (m.g.f):
E(e^{tX}) = e^{\lambda (e^t - 1)}
Skewness:
\gamma_1 = \frac{1}{\sqrt{\lambda}}
Excess kurtosis:
\gamma_2 = \frac{1}{\lambda}
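The moment formulas above can be confirmed by summing over the p.m.f. directly. The sketch below uses only stats::dpois(); the rate and the truncation point of the sum are illustrative assumptions, chosen so that the omitted tail mass is negligible.

```r
lambda <- 4        # illustrative rate
k <- 0:200         # truncated support; the mass beyond 200 is negligible here
pmf <- dpois(k, lambda)

# Moments computed by direct summation over the p.m.f.
mu <- sum(k * pmf)
sigma2 <- sum((k - mu)^2 * pmf)
skew <- sum((k - mu)^3 * pmf) / sigma2^1.5
ex_kurt <- sum((k - mu)^4 * pmf) / sigma2^2 - 3

# Summation results next to the closed forms from the Details
rbind(
  summation = c(mean = mu, variance = sigma2,
                skewness = skew, excess_kurtosis = ex_kurt),
  formula = c(lambda, lambda, 1 / sqrt(lambda), 1 / lambda)
)
```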
See Also
Examples
dist <- dist_poisson(lambda = c(1, 4, 10))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Poisson-Inverse Gaussian distribution
Description
The Poisson-Inverse Gaussian distribution is a mixed Poisson distribution in which the rate parameter follows an Inverse Gaussian distribution. It is useful for modelling overdispersed count data.
Usage
dist_poisson_inverse_gaussian(mean, shape)
Arguments
mean, shape |
parameters. Must be strictly positive. Infinite values are supported. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_poisson_inverse_gaussian.html
In the following, let X be a Poisson-Inverse Gaussian random variable
with parameters mean = \mu and shape = \phi.
Support: \{0, 1, 2, 3, ...\}
Mean: \mu
Variance: \frac{\mu}{\phi}(\mu^2 + \phi)
Probability mass function (p.m.f):
P(X = x) = \frac{e^{\phi}}{\sqrt{2\pi}}
\left(\frac{\phi}{\mu^2}\right)^{x/2}
\frac{1}{x!}
\int_0^\infty u^{x-1/2}
\exp\left(-\frac{\phi u}{2} - \frac{\phi}{2\mu^2 u}\right) du
for x = 0, 1, 2, \ldots
Cumulative distribution function (c.d.f):
P(X \le x) = \sum_{k=0}^{\lfloor x \rfloor} P(X = k)
The c.d.f does not have a closed form and is approximated numerically.
Moment generating function (m.g.f):
E(e^{tX}) = \exp\left\{\frac{\phi}{\mu}\left[1 - \sqrt{1 - \frac{2\mu^2}{\phi}(e^t - 1)}\right]\right\}
for t \le \log(1 + \phi/(2\mu^2))
See Also
actuar::PoissonInverseGaussian, actuar::dpoisinvgauss(),
actuar::ppoisinvgauss(), actuar::qpoisinvgauss(), actuar::rpoisinvgauss()
Examples
dist <- dist_poisson_inverse_gaussian(mean = rep(0.1, 3), shape = c(0.4, 0.8, 1))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
Sampling distribution
Description
The sampling distribution represents an empirical distribution based on observed samples. It is useful for bootstrapping, representing posterior distributions from Markov Chain Monte Carlo (MCMC) algorithms, or working with any empirical data where the parametric form is unknown. Unlike parametric distributions, the sampling distribution makes no assumptions about the underlying data-generating process and instead uses the sample itself to estimate distributional properties. The distribution can handle both univariate and multivariate samples.
Usage
dist_sample(x)
Arguments
x |
A list of sampled values. For univariate distributions, each element should be a numeric vector. For multivariate distributions, each element should be a matrix where columns represent variables and rows represent observations. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_sample.html
In the following, let X be a random variable with sample
x_1, x_2, \ldots, x_n of size n.
Support: The observed range of the sample
Mean (univariate):
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Mean (multivariate): Computed independently for each variable.
Variance (univariate):
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
Covariance (multivariate): The sample covariance matrix.
Skewness (univariate):
g_1 = \frac{\sqrt{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}} \left(1 - \frac{1}{n}\right)^{3/2}
Probability density function: Approximated numerically using kernel density estimation.
Cumulative distribution function (univariate):
F(q) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \leq q)
where I(\cdot) is the indicator function.
Cumulative distribution function (multivariate):
F(\mathbf{q}) = \frac{1}{n} \sum_{i=1}^{n} I(\mathbf{x}_i \leq \mathbf{q})
where the inequality is applied element-wise.
Quantile function (univariate): The sample quantile, computed using
the specified quantile type (see stats::quantile()).
Quantile function (multivariate): Marginal quantiles are computed independently for each variable.
Random generation: Bootstrap sampling with replacement from the empirical sample.
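For a univariate sample, the quantities above are straightforward to reproduce in base R. The following sketch is illustrative only: the sample, the seed, and the helper name ecdf_manual are assumptions, and it does not call the package.

```r
set.seed(1)          # only to make this sketch reproducible
x <- rnorm(100)      # illustrative sample
n <- length(x)

# Empirical c.d.f.: the proportion of observations at or below q
ecdf_manual <- function(q) mean(x <= q)
ecdf_manual(0.5)
ecdf(x)(0.5)         # base R equivalent, same value

# Skewness, transcribed from the formula in the Details
d <- x - mean(x)
g1 <- sqrt(n) * sum(d^3) / sum(d^2)^(3 / 2) * (1 - 1 / n)^(3 / 2)
g1

# Random generation: bootstrap resampling with replacement
resample <- sample(x, size = 10, replace = TRUE)
resample
```

Every bootstrap draw is one of the original observations, so generated values can never fall outside the observed range of the sample.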
See Also
stats::density(), stats::quantile(), stats::cov()
Examples
# Univariate numeric samples
dist <- dist_sample(x = list(rnorm(100), rnorm(100, 10)))
dist
mean(dist)
variance(dist)
skewness(dist)
generate(dist, 10)
density(dist, 1)
# Multivariate numeric samples
dist <- dist_sample(x = list(cbind(rnorm(100), rnorm(100, 10))))
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
generate(dist, 10)
quantile(dist, 0.4) # Returns the marginal quantiles
cdf(dist, matrix(c(0.3,9), nrow = 1))
The (non-central) location-scale Student t Distribution
Description
The Student's T distribution is closely related to the Normal
distribution, but has heavier tails. As the degrees of freedom \nu
increase to \infty, the Student's T converges to a Normal. The T
distribution appears repeatedly throughout classic frequentist
hypothesis testing when comparing group means.
Usage
dist_student_t(df, mu = 0, sigma = 1, ncp = NULL)
Arguments
df |
degrees of freedom ( |
mu |
The location parameter of the distribution.
If |
sigma |
The scale parameter of the distribution. |
ncp |
non-centrality parameter |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_student_t.html
In the following, let X be a location-scale Student's T random variable with
df = \nu, mu = \mu, sigma = \sigma, and
ncp = \delta (non-centrality parameter).
If Z follows a standard Student's T distribution (with df = \nu
and ncp = \delta), then X = \mu + \sigma Z.
Support: R, the set of all real numbers
Mean:
For the central distribution (ncp = 0 or NULL):
E(X) = \mu
for \nu > 1, and undefined otherwise.
For the non-central distribution (ncp \neq 0):
E(X) = \mu + \delta \sqrt{\frac{\nu}{2}} \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)} \sigma
for \nu > 1, and undefined otherwise.
Variance:
For the central distribution (ncp = 0 or NULL):
\mathrm{Var}(X) = \frac{\nu}{\nu - 2} \sigma^2
for \nu > 2. Undefined if \nu \le 1, infinite when 1 < \nu \le 2.
For the non-central distribution (ncp \neq 0):
\mathrm{Var}(X) = \left[\frac{\nu(1+\delta^2)}{\nu-2} - \left(\delta \sqrt{\frac{\nu}{2}} \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}\right)^2\right] \sigma^2
for \nu > 2. Undefined if \nu \le 1, infinite when 1 < \nu \le 2.
Probability density function (p.d.f):
For the central distribution (ncp = 0 or NULL), the standard
t distribution with df = \nu has density:
f_Z(z) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\pi \nu} \Gamma(\nu/2)} \left(1 + \frac{z^2}{\nu} \right)^{- (\nu + 1)/2}
The location-scale version with mu = \mu and sigma = \sigma
has density:
f(x) = \frac{1}{\sigma} f_Z\left(\frac{x - \mu}{\sigma}\right)
For the non-central distribution (ncp \neq 0), the density is
computed numerically via stats::dt().
Cumulative distribution function (c.d.f):
For the central distribution (ncp = 0 or NULL), the cumulative
distribution function is computed numerically via stats::pt(), which
uses the relationship to the incomplete beta function:
F_\nu(t) = \frac{1}{2} I_x\left(\frac{\nu}{2}, \frac{1}{2}\right)
for t \le 0, where x = \nu/(\nu + t^2) and I_x(a,b) is
the regularized incomplete beta function (stats::pbeta()). For t \ge 0:
F_\nu(t) = 1 - \frac{1}{2} I_x\left(\frac{\nu}{2}, \frac{1}{2}\right)
The location-scale version is: F(x) = F_\nu((x - \mu)/\sigma).
For the non-central distribution (ncp \neq 0), the cumulative
distribution function is computed numerically via stats::pt().
Moment generating function (m.g.f):
Does not exist in closed form. Moments are computed using the formulas for mean and variance above where available.
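The incomplete beta relationship and the location-scale construction for the central case can be checked with base R. This is a minimal sketch under illustrative parameter values; the variable names are assumptions, not package internals.

```r
nu <- 5; mu <- 2; sigma <- 3    # illustrative df, location and scale

# Standard central t c.d.f. via the regularized incomplete beta function
t_neg <- -1.3
x_neg <- nu / (nu + t_neg^2)
lower_tail <- 0.5 * pbeta(x_neg, nu / 2, 1 / 2)          # case t <= 0
all.equal(lower_tail, pt(t_neg, df = nu))

t_pos <- 1.3
x_pos <- nu / (nu + t_pos^2)
upper_case <- 1 - 0.5 * pbeta(x_pos, nu / 2, 1 / 2)      # case t >= 0
all.equal(upper_case, pt(t_pos, df = nu))

# Location-scale version: standardise, then rescale the density by 1 / sigma
x <- 4
dens <- dt((x - mu) / sigma, df = nu) / sigma
prob <- pt((x - mu) / sigma, df = nu)

# The location-scale density integrates to the location-scale c.d.f.
prob_numeric <- integrate(function(y) dt((y - mu) / sigma, df = nu) / sigma,
                          lower = -Inf, upper = x)$value
all.equal(prob_numeric, prob, tolerance = 1e-6)
```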
See Also
Examples
dist <- dist_student_t(df = c(1,2,5), mu = c(0,1,2), sigma = c(1,2,3))
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Studentized Range distribution
Description
Tukey's studentized range distribution, used for Tukey's honestly significant differences test in ANOVA.
Usage
dist_studentized_range(nmeans, df, nranges)
Arguments
nmeans |
sample size for range (same for each group). |
df |
degrees of freedom for |
nranges |
number of groups whose maximum range is considered. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_studentized_range.html
In the following, let Q be a Studentized Range random variable with
parameters nmeans = k (number of groups), df = \nu (degrees
of freedom), and nranges = n (number of ranges).
Support: R^+, the set of positive real numbers.
Mean: Approximated numerically.
Variance: Approximated numerically.
Probability density function (p.d.f): The density does not have a closed-form expression and is computed numerically.
Cumulative distribution function (c.d.f): The c.d.f does not have a
simple closed-form expression. For n = 1 (single range), it involves
integration over the joint distribution of the sample range and an
independent chi-square variable. The general form is computed numerically
using algorithms described in the references for stats::ptukey().
Moment generating function (m.g.f): Does not exist in closed form.
See Also
Examples
dist <- dist_studentized_range(nmeans = c(6, 2), df = c(5, 4), nranges = c(1, 1))
dist
cdf(dist, 4)
quantile(dist, 0.7)
Modify a distribution with a transformation
Description
A transformed distribution applies a monotonic transformation to an existing distribution. This is useful for creating derived distributions such as log-normal (exponential transformation of normal), or other custom transformations of base distributions.
The density(), mean(), and variance() methods are approximate as
they are based on numerical derivatives.
Usage
dist_transformed(dist, transform, inverse)
Arguments
dist |
A univariate distribution vector. |
transform |
A function used to transform the distribution. This transformation should be monotonic over the appropriate domain. |
inverse |
The inverse of the |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_transformed.html
Let Y = g(X) where X is the base distribution with
transformation function transform = g and inverse = g^{-1}.
The transformation g must be monotonic over the support of X.
Support: g(S_X) where S_X is the support of X
Mean: Approximated numerically using a second-order Taylor expansion:
E(Y) \approx g(\mu_X) + \frac{1}{2}g''(\mu_X)\sigma_X^2
where \mu_X and \sigma_X^2 are the mean and variance of the
base distribution X, and g'' is the second derivative of the
transformation. The derivative is computed numerically using
numDeriv::hessian().
Variance: Approximated numerically using the delta method:
\mathrm{Var}(Y) \approx [g'(\mu_X)]^2\sigma_X^2 + \frac{1}{2}[g''(\mu_X)\sigma_X^2]^2
where g' is the first derivative (Jacobian) computed numerically
using numDeriv::jacobian().
Probability density function (p.d.f): Using the change of variables formula:
f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|
where f_X is the p.d.f. of the base distribution and the Jacobian
|d/dy \, g^{-1}(y)| is computed numerically using
numDeriv::jacobian().
Cumulative distribution function (c.d.f):
For monotonically increasing g:
F_Y(y) = F_X(g^{-1}(y))
For monotonically decreasing g:
F_Y(y) = 1 - F_X(g^{-1}(y))
where F_X is the c.d.f. of the base distribution.
Quantile function: The inverse of the c.d.f.
For monotonically increasing g:
Q_Y(p) = g(Q_X(p))
For monotonically decreasing g:
Q_Y(p) = g(Q_X(1-p))
where Q_X is the quantile function of the base distribution.
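The formulas above can be traced through by hand for the log-normal case, where g = exp and the exact answers are available from stats::dlnorm() and friends. The sketch below uses only base R; a central finite difference stands in for numDeriv::jacobian(), and all helper names are illustrative assumptions rather than package internals.

```r
# Base distribution X ~ Normal(0, 0.5) with transformation g = exp (illustrative)
mu_x <- 0
sigma_x <- 0.5
g <- exp
g_inv <- log

# d/dy g^{-1}(y) by central finite difference (stand-in for a numerical Jacobian)
d_g_inv <- function(y, h = 1e-6) (g_inv(y + h) - g_inv(y - h)) / (2 * h)

# Change of variables for the density; g is increasing, so F_Y and Q_Y are direct
density_y <- function(y) dnorm(g_inv(y), mu_x, sigma_x) * abs(d_g_inv(y))
cdf_y <- function(y) pnorm(g_inv(y), mu_x, sigma_x)
quantile_y <- function(p) g(qnorm(p, mu_x, sigma_x))

all.equal(density_y(1), dlnorm(1, mu_x, sigma_x), tolerance = 1e-6)
all.equal(cdf_y(4), plnorm(4, mu_x, sigma_x))
all.equal(quantile_y(0.1), qlnorm(0.1, mu_x, sigma_x))

# Second-order Taylor approximation of the mean; for g = exp, g'' = exp
mean_approx <- g(mu_x) + 0.5 * exp(mu_x) * sigma_x^2
mean_exact <- exp(mu_x + sigma_x^2 / 2)    # known log-normal mean
c(approx = mean_approx, exact = mean_exact)
```

The density, c.d.f. and quantile results match the log-normal functions closely, while the Taylor-based mean is only approximate. This is consistent with the note that mean() and variance() are approximations for transformed distributions.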
See Also
numDeriv::jacobian(), numDeriv::hessian()
Examples
# Create a log normal distribution
dist <- dist_transformed(dist_normal(0, 0.5), exp, log)
density(dist, 1) # dlnorm(1, 0, 0.5)
cdf(dist, 4) # plnorm(4, 0, 0.5)
quantile(dist, 0.1) # qlnorm(0.1, 0, 0.5)
generate(dist, 10) # rlnorm(10, 0, 0.5)
Truncate a distribution
Description
Note that the samples are generated using inverse transform sampling, and the means and variances are estimated from samples.
Usage
dist_truncated(dist, lower = -Inf, upper = Inf)
Arguments
dist |
The distribution(s) to truncate. |
lower, upper |
The range of values to keep from a distribution. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_truncated.html
In the following, let X be a truncated random variable with
underlying distribution Y, truncation bounds lower = a and
upper = b, where F_Y(x) is the c.d.f. of Y and
f_Y(x) is the p.d.f. of Y.
Support: [a, b]
Mean: For the general case, the mean is approximated numerically.
For a truncated Normal distribution with underlying mean \mu and
standard deviation \sigma, the mean is:
E(X) = \mu + \frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)} \sigma
where \alpha = (a - \mu)/\sigma, \beta = (b - \mu)/\sigma,
\phi is the standard Normal p.d.f., and \Phi is the
standard Normal c.d.f.
Variance: Approximated numerically for all distributions.
Probability density function (p.d.f):
f(x) = \begin{cases}
\frac{f_Y(x)}{F_Y(b) - F_Y(a)} & \text{if } a \le x \le b \\
0 & \text{otherwise}
\end{cases}
Cumulative distribution function (c.d.f):
F(x) = \begin{cases}
0 & \text{if } x < a \\
\frac{F_Y(x) - F_Y(a)}{F_Y(b) - F_Y(a)} & \text{if } a \le x \le b \\
1 & \text{if } x > b
\end{cases}
Quantile function:
Q(p) = F_Y^{-1}(F_Y(a) + p(F_Y(b) - F_Y(a)))
clamped to the interval [a, b].
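For a Normal distribution truncated below at zero, the closed-form mean and the quantile formula above can be checked against numerical integration in base R. The helper names (trunc_pdf, trunc_cdf, trunc_quantile) are illustrative assumptions and not part of the package.

```r
# Underlying Normal(2, 1) truncated to [0, Inf)
mu <- 2; sigma <- 1
a <- 0; b <- Inf

# Closed-form mean of the truncated Normal
alpha <- (a - mu) / sigma
beta <- (b - mu) / sigma
mean_formula <- mu +
  (dnorm(alpha) - dnorm(beta)) / (pnorm(beta) - pnorm(alpha)) * sigma

# Truncated density and c.d.f. built from the underlying Normal
Fa <- pnorm(a, mu, sigma)
Fb <- pnorm(b, mu, sigma)
trunc_pdf <- function(x) dnorm(x, mu, sigma) / (Fb - Fa)
trunc_cdf <- function(x) (pnorm(x, mu, sigma) - Fa) / (Fb - Fa)

# The mean by numerical integration should match the closed form
mean_numeric <- integrate(function(x) x * trunc_pdf(x),
                          lower = a, upper = b)$value
all.equal(mean_numeric, mean_formula, tolerance = 1e-6)

# Quantile function: map p into [F_Y(a), F_Y(b)], invert, clamp to [a, b]
trunc_quantile <- function(p) {
  q <- qnorm(Fa + p * (Fb - Fa), mu, sigma)
  pmin(pmax(q, a), b)
}
trunc_cdf(trunc_quantile(0.7))    # recovers 0.7
```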
Examples
dist <- dist_truncated(dist_normal(2,1), lower = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
if (requireNamespace("ggdist")) {
  library(ggplot2)
  ggplot() +
    ggdist::stat_dist_halfeye(
      aes(y = c("Normal", "Truncated"),
          dist = c(dist_normal(2, 1), dist_truncated(dist_normal(2, 1), lower = 0)))
    )
}
The Uniform distribution
Description
A distribution with constant density on an interval.
Usage
dist_uniform(min, max)
Arguments
min, max |
lower and upper limits of the distribution. Must be finite. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_uniform.html
In the following, let X be a Uniform random variable with parameters
min = a and max = b.
Support: [a, b]
Mean: \frac{a + b}{2}
Variance: \frac{(b - a)^2}{12}
Probability density function (p.d.f):
f(x) = \frac{1}{b - a}
for x \in [a, b], and f(x) = 0 otherwise.
Cumulative distribution function (c.d.f):
F(x) = \frac{x - a}{b - a}
for x \in [a, b], with F(x) = 0 for x < a
and F(x) = 1 for x > b.
Moment generating function (m.g.f):
E(e^{tX}) = \frac{e^{tb} - e^{ta}}{t(b - a)}
for t \neq 0, and E(e^{tX}) = 1 for t = 0.
Skewness: 0
Excess Kurtosis: -\frac{6}{5}
See Also
Examples
dist <- dist_uniform(min = c(3, -2), max = c(5, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Weibull distribution
Description
Generalisation of the exponential distribution (recovered when the shape parameter equals 1). Often used in survival and time-to-event analyses.
Usage
dist_weibull(shape, scale)
Arguments
shape, scale |
shape and scale parameters. Both must be strictly positive. |
Details
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_weibull.html
In the following, let X be a Weibull random variable with
shape parameter shape = k and scale parameter scale = \lambda.
Support: [0, \infty)
Mean:
E(X) = \lambda \Gamma\left(1 + \frac{1}{k}\right)
where \Gamma is the gamma function.
Variance:
\text{Var}(X) = \lambda^2 \left[\Gamma\left(1 + \frac{2}{k}\right) - \left(\Gamma\left(1 + \frac{1}{k}\right)\right)^2\right]
Probability density function (p.d.f):
f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}, \quad x \ge 0
Cumulative distribution function (c.d.f):
F(x) = 1 - e^{-(x/\lambda)^k}, \quad x \ge 0
Moment generating function (m.g.f):
E(e^{tX}) = \sum_{n=0}^\infty \frac{t^n\lambda^n}{n!} \Gamma\left(1+\frac{n}{k}\right)
Skewness:
\gamma_1 = \frac{\mu_3' - 3\mu\sigma^2 - \mu^3}{\sigma^3}
where \mu = E(X), \sigma^2 = \text{Var}(X), and the third
raw moment is
\mu_3' = E(X^3) = \lambda^3 \Gamma\left(1 + \frac{3}{k}\right)
Excess Kurtosis:
\gamma_2 = \frac{\mu_4' - 4\gamma_1\mu\sigma^3 - 6\mu^2\sigma^2 - \mu^4}{\sigma^4} - 3
where the fourth raw moment is
\mu_4' = E(X^4) = \lambda^4 \Gamma\left(1 + \frac{4}{k}\right)
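The skewness and excess kurtosis follow from the raw moments E(X^n) = \lambda^n \Gamma(1 + n/k). The base R sketch below computes them that way; the function name weibull_shape_stats is a hypothetical helper introduced for illustration. With shape 1 the Weibull reduces to an exponential distribution, whose skewness is 2 and excess kurtosis is 6, which gives a convenient check.

```r
# Hypothetical helper: skewness and excess kurtosis from Weibull raw moments
weibull_shape_stats <- function(k, lambda) {
  raw_moment <- function(n) lambda^n * gamma(1 + n / k)   # E(X^n)
  mu <- raw_moment(1)
  sigma2 <- raw_moment(2) - mu^2
  sigma <- sqrt(sigma2)
  skew <- (raw_moment(3) - 3 * mu * sigma2 - mu^3) / sigma^3
  ex_kurt <- (raw_moment(4) - 4 * skew * mu * sigma^3 -
                6 * mu^2 * sigma2 - mu^4) / sigma^4 - 3
  c(skewness = skew, excess_kurtosis = ex_kurt)
}

# Shape 1 is the exponential distribution: skewness 2, excess kurtosis 6
weibull_shape_stats(k = 1, lambda = 2)

# A general case, compared with numerical integration of the third central moment
k <- 1.5; lambda <- 2
stats_formula <- weibull_shape_stats(k, lambda)
mu <- lambda * gamma(1 + 1 / k)
sigma <- sqrt(lambda^2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k)^2))
skew_numeric <- integrate(
  function(x) (x - mu)^3 * dweibull(x, shape = k, scale = lambda),
  lower = 0, upper = Inf
)$value / sigma^3
all.equal(unname(stats_formula["skewness"]), skew_numeric, tolerance = 1e-4)
```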
See Also
Examples
dist <- dist_weibull(shape = c(0.5, 1, 1.5, 5), scale = rep(1, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
Create a distribution from p/d/q/r style functions
Description
If a distribution is not yet supported, you can vectorise p/d/q/r functions
using this function. dist_wrap() stores the distribution's parameters, and
provides wrappers which call the appropriate p/d/q/r functions.
Using this function to wrap a distribution should only be done if the distribution is not yet available in this package. If you need a distribution which isn't in the package yet, consider making a request at https://github.com/mitchelloharawild/distributional/issues.
Usage
dist_wrap(dist, ..., package = NULL)
Arguments
dist |
The name of the distribution used in the functions (name that is prefixed by p/d/q/r) |
... |
Named arguments used to parameterise the distribution. |
package |
The package from which the distribution is provided. If NULL, the calling environment's search path is used to find the distribution functions. Alternatively, an arbitrary environment can also be provided here. |
Details
The dist_wrap() function provides a generic interface to create distribution
objects from any set of p/d/q/r style functions. The statistical properties
depend on the specific distribution being wrapped.
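To make the idea of wrapping p/d/q/r functions concrete, the sketch below shows one way a function could be looked up from a distribution name and called with stored parameters. It is an illustration in base R only: find_pdqr is a hypothetical helper and is not how dist_wrap() is necessarily implemented.

```r
# Hypothetical helper (not the package's implementation): locate the function
# named <prefix><dist>, either on the search path or exported from a package.
find_pdqr <- function(prefix, dist, package = NULL) {
  fn_name <- paste0(prefix, dist)
  if (is.null(package)) {
    get(fn_name, mode = "function")
  } else {
    getExportedValue(package, fn_name)
  }
}

# Stored parameters, as dist_wrap("norm", mean = 1, sd = 3) would receive them
params <- list(mean = 1, sd = 3)

# Density at 1 and c.d.f. at 4, dispatched by name
dens <- do.call(find_pdqr("d", "norm"), c(list(1), params))
prob <- do.call(find_pdqr("p", "norm", package = "stats"), c(list(4), params))

c(density = dens, cdf = prob)
```

Because the lookup is by name, any family that follows the p/d/q/r naming convention can be wrapped without writing a dedicated distribution class.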
Examples
dist <- dist_wrap("norm", mean = 1:3, sd = c(3, 9, 2))
density(dist, 1) # dnorm()
cdf(dist, 4) # pnorm()
quantile(dist, 0.975) # qnorm()
generate(dist, 10) # rnorm()
library(actuar)
dist <- dist_wrap("invparalogis", package = "actuar", shape = 2, rate = 2)
density(dist, 1) # actuar::dinvparalogis()
cdf(dist, 4) # actuar::pinvparalogis()
quantile(dist, 0.975) # actuar::qinvparalogis()
generate(dist, 10) # actuar::rinvparalogis()
Extract the name of the distribution family
Description
Usage
## S3 method for class 'distribution'
family(object, ...)
Arguments
object |
The distribution(s). |
... |
Additional arguments used by methods. |
Examples
dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(size = c(4, 3),
                   prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
family(dist)
Randomly sample values from a distribution
Description
Generate random samples from probability distributions.
Usage
## S3 method for class 'distribution'
generate(x, times, ...)
Arguments
x |
The distribution(s). |
times |
The number of samples. |
... |
Additional arguments used by methods. |
Check if a distribution is symmetric
Description
Determines whether a probability distribution is symmetric around its center.
Usage
has_symmetry(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
Value
A logical value indicating whether the distribution is symmetric.
Examples
# Normal distribution is symmetric
has_symmetry(dist_normal(mu = 0, sigma = 1))
has_symmetry(dist_normal(mu = 5, sigma = 2))
# Beta distribution symmetry depends on parameters
has_symmetry(dist_beta(shape1 = 2, shape2 = 2)) # symmetric
has_symmetry(dist_beta(shape1 = 2, shape2 = 5)) # not symmetric
Compute highest density regions
Description
Used to extract the highest density region(s) of a distribution at a specified coverage level.
Usage
hdr(x, ...)
Arguments
x |
Object to create hdr from. |
... |
Additional arguments used by methods. |
Highest density regions of probability distributions
Description
This function is highly experimental and will change in the future. In particular, improved functionality for object classes and visualisation tools will be added in a future release.
Computes minimally sized probability intervals (highest density regions).
Usage
## S3 method for class 'distribution'
hdr(x, size = 95, n = 512, ...)
Arguments
x |
The distribution(s). |
size |
The size of the interval (between 0 and 100). |
n |
The resolution used to estimate the distribution's density. |
... |
Additional arguments used by methods. |
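A highest density region can be approximated on a grid by keeping the grid points with the largest density until the requested coverage is reached. The base R sketch below illustrates this idea for a standard Normal; it is an assumption about one common approach and not a description of the package's algorithm. The grid limits are also illustrative choices.

```r
size <- 95     # requested coverage, in percent
n <- 512       # grid resolution

# Evaluate the density on an evenly spaced grid covering almost all of the mass
x <- seq(qnorm(0.0005), qnorm(0.9995), length.out = n)
d <- dnorm(x)
dx <- x[2] - x[1]

# Accumulate probability starting from the highest-density grid cells
ord <- order(d, decreasing = TRUE)
cum_prob <- cumsum(d[ord] * dx)

# Density threshold at which the accumulated probability first reaches the target
threshold <- d[ord][which(cum_prob >= size / 100)[1]]

# For a unimodal density the region is a single interval
in_region <- d >= threshold
hdr_limits <- range(x[in_region])
hdr_limits
```

For a symmetric unimodal distribution such as the Normal, the result is close to the central interval (about -1.96 to 1.96 here). For a multimodal density the points in in_region would form several disjoint runs, and range() would no longer describe the region.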
Compute intervals
Description
Used to extract a specified prediction interval at a particular confidence level from a distribution.
The numeric lower and upper bounds can be extracted from the interval using
<hilo>$lower and <hilo>$upper as shown in the examples below.
Usage
hilo(x, ...)
Arguments
x |
Object to create hilo from. |
... |
Additional arguments used by methods. |
Examples
# 95% interval from a standard normal distribution
interval <- hilo(dist_normal(0, 1), 95)
interval
# Extract the individual quantities with `$lower`, `$upper`, and `$level`
interval$lower
interval$upper
interval$level
Probability intervals of a probability distribution
Description
Returns a hilo central probability interval with probability coverage of
size. By default, the distribution's quantile() will be used to compute
the lower and upper bounds of a centred interval.
Usage
## S3 method for class 'distribution'
hilo(x, size = 95, ...)
Arguments
x |
The distribution(s). |
size |
The size of the interval (between 0 and 100). |
... |
Additional arguments used by methods. |
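A central interval of a given size leaves equal probability in each tail, so its bounds are two quantiles. The base R sketch below illustrates the computation; central_interval is a hypothetical helper written for this example and is not a package function.

```r
# Hypothetical helper: central interval from any quantile function
central_interval <- function(quantile_fn, size = 95) {
  tail_prob <- (1 - size / 100) / 2      # probability left in each tail
  c(lower = quantile_fn(tail_prob), upper = quantile_fn(1 - tail_prob))
}

# 95% interval of a standard Normal
interval <- central_interval(qnorm, size = 95)
interval

# Coverage check: the probability between the bounds is the requested size
pnorm(interval[["upper"]]) - pnorm(interval[["lower"]])

# Works for skewed distributions too, although the interval is then not an HDR
central_interval(function(p) qexp(p, rate = 2), size = 80)
```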
See Also
Test if the object is a distribution
Description
This function returns TRUE for distributions and FALSE for all other objects.
Usage
is_distribution(x)
Arguments
x |
An object. |
Value
TRUE if the object inherits from the distribution class.
Examples
dist <- dist_normal()
is_distribution(dist)
is_distribution("distributional")
Is the object a hdr
Description
Is the object a hdr
Usage
is_hdr(x)
Arguments
x |
An object. |
Is the object a hilo
Description
Is the object a hilo
Usage
is_hilo(x)
Arguments
x |
An object. |
Kurtosis of a probability distribution
Description
Usage
kurtosis(x, ...)
## S3 method for class 'distribution'
kurtosis(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
The (log) likelihood of a sample matching a distribution
Description
Usage
likelihood(x, ...)
## S3 method for class 'distribution'
likelihood(x, sample, ..., log = FALSE)
log_likelihood(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
sample |
A list of sampled values to compare to distribution(s). |
log |
If |
Mean of a probability distribution
Description
Returns the empirical mean of the probability distribution. If the method does not exist, the mean of a random sample will be returned.
Usage
## S3 method for class 'distribution'
mean(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
Median of a probability distribution
Description
Returns the median (50th percentile) of a probability distribution. This is
equivalent to quantile(x, p=0.5).
Usage
## S3 method for class 'distribution'
median(x, na.rm = FALSE, ...)
Arguments
x |
The distribution(s). |
na.rm |
Unused, included for consistency with the generic function. |
... |
Additional arguments used by methods. |
Construct distributions
Description
Allows extension package developers to define a new distribution class compatible with the distributional package.
Usage
new_dist(..., class = NULL, dimnames = NULL)
Arguments
... |
Parameters of the distribution (named). |
class |
The class of the distribution for S3 dispatch. |
dimnames |
The names of the variables in the distribution (optional). |
Construct hdr intervals
Description
Construct hdr intervals
Usage
new_hdr(
  lower = list_of(.ptype = double()),
  upper = list_of(.ptype = double()),
  size = double()
)
Arguments
lower, upper |
A list of numeric vectors specifying the region's lower and upper bounds. |
size |
A numeric vector specifying the coverage size of the region. |
Value
A "hdr" vector
Author(s)
Mitchell O'Hara-Wild
Examples
new_hdr(lower = list(1, c(3,6)), upper = list(10, c(5, 8)), size = c(80, 95))
Construct hilo intervals
Description
Class constructor function to help with manually creating hilo interval objects.
Usage
new_hilo(lower = double(), upper = double(), size = double())
Arguments
lower, upper |
A numeric vector of values for lower and upper limits. |
size |
Size of the interval, a value in [0, 100]. |
Value
A "hilo" vector
Author(s)
Earo Wang & Mitchell O'Hara-Wild
Examples
new_hilo(lower = rnorm(10), upper = rnorm(10) + 5, size = 95)
Construct support regions
Description
Construct support regions
Usage
new_support_region(x = numeric(), limits = list(), closed = list())
Arguments
x |
A list of prototype vectors defining the distribution type. |
limits |
A list of value limits for the distribution. |
closed |
A list of logical(2L) indicating whether the limits are closed. |
Extract the parameters of a distribution
Description
Usage
parameters(x, ...)
## S3 method for class 'distribution'
parameters(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
Examples
dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(size = c(4, 3),
                   prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
parameters(dist)
Distribution Quantiles
Description
Computes the quantiles of a distribution.
Usage
## S3 method for class 'distribution'
quantile(x, p, ..., log = FALSE)
Arguments
x |
The distribution(s). |
p |
The probability of the quantile. |
... |
Additional arguments passed to methods. |
log |
If |
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
Skewness of a probability distribution
Description
Usage
skewness(x, ...)
## S3 method for class 'distribution'
skewness(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
Region of support of a distribution
Description
Usage
support(x, ...)
## S3 method for class 'distribution'
support(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |
Variance
Description
A generic function for computing the variance of an object.
Usage
variance(x, ...)
## S3 method for class 'numeric'
variance(x, ...)
## S3 method for class 'matrix'
variance(x, ...)
## S3 method for class 'numeric'
covariance(x, ...)
Arguments
x |
An object. |
... |
Additional arguments used by methods. |
Details
The implementation of variance() for numeric variables coerces the input to
a vector then uses stats::var() to compute the variance. This means that,
unlike stats::var(), if variance() is passed a matrix or a 2-dimensional
array, it will still return the variance (stats::var() returns the
covariance matrix in that case).
See Also
variance.distribution(), covariance()
Variance of a probability distribution
Description
Returns the empirical variance of the probability distribution. If the method does not exist, the variance of a random sample will be returned.
Usage
## S3 method for class 'distribution'
variance(x, ...)
Arguments
x |
The distribution(s). |
... |
Additional arguments used by methods. |