Mathematical Foundations of Series Systems

Series System Theory

A series system consists of \(m\) independent components, each with lifetime \(T_j\) and survival function \(S_j(t) = P(T_j > t)\). The system fails when any component fails, so the system lifetime is:

\[T_{sys} = \min(T_1, \ldots, T_m)\]

Since the components are independent:

\[S_{sys}(t) = P(T_{sys} > t) = P(T_1 > t, \ldots, T_m > t) = \prod_{j=1}^{m} S_j(t)\]

Taking the negative logarithm gives the cumulative hazard:

\[H_{sys}(t) = -\log S_{sys}(t) = \sum_{j=1}^{m} H_j(t)\]

Differentiating yields the hazard rate:

\[h_{sys}(t) = \frac{d}{dt} H_{sys}(t) = \sum_{j=1}^{m} h_j(t)\]

This is the fundamental result: the system hazard is the sum of component hazards.
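Before turning to the package, the survival-product identity can be checked in base R alone. A minimal Monte Carlo sketch (the Weibull and exponential components here are chosen purely for illustration): the empirical survival of the componentwise minimum should match the product of the closed-form survivals.

```r
# Monte Carlo check of S_sys(t) = S_1(t) * S_2(t) for a two-component
# series system: T1 ~ Weibull(shape 2, scale 100), T2 ~ Exponential(0.01).
set.seed(1)
n <- 1e5
t_sys <- pmin(rweibull(n, shape = 2, scale = 100), rexp(n, rate = 0.01))

t0 <- 50
empirical   <- mean(t_sys > t0)
theoretical <- pweibull(t0, 2, 100, lower.tail = FALSE) *
               pexp(t0, 0.01, lower.tail = FALSE)
c(empirical = empirical, theoretical = theoretical)  # both near 0.472
```

The closed form is \(\exp[-(50/100)^2] \cdot \exp[-0.01 \cdot 50] = e^{-0.75} \approx 0.4724\), and the empirical estimate agrees to Monte Carlo error.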

Key Properties

Let’s verify these relationships numerically.

Property 1: Hazard Sum

library(serieshaz)

sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01),
    dfr_gompertz(a = 0.001, b = 0.05)
))

h_sys <- hazard(sys)
h1 <- component_hazard(sys, 1)
h2 <- component_hazard(sys, 2)
h3 <- component_hazard(sys, 3)

t_vals <- c(10, 50, 100, 200)
for (t0 in t_vals) {
    lhs <- h_sys(t0)
    rhs <- h1(t0) + h2(t0) + h3(t0)
    cat(sprintf("t=%3d: h_sys=%.6f, sum h_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
#> t= 10: h_sys=0.013649, sum h_j=0.013649, diff=0.00e+00
#> t= 50: h_sys=0.032182, sum h_j=0.032182, diff=0.00e+00
#> t=100: h_sys=0.178413, sum h_j=0.178413, diff=0.00e+00
#> t=200: h_sys=22.076466, sum h_j=22.076466, diff=0.00e+00

Property 2: Survival Product

S_sys <- surv(sys)
comp1 <- component(sys, 1)
comp2 <- component(sys, 2)
comp3 <- component(sys, 3)
S1 <- surv(comp1)
S2 <- surv(comp2)
S3 <- surv(comp3)

for (t0 in t_vals) {
    lhs <- S_sys(t0)
    rhs <- S1(t0) * S2(t0) * S3(t0)
    cat(sprintf("t=%3d: S_sys=%.6f, prod S_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
#> t= 10: S_sys=0.884286, prod S_j=0.884286, diff=1.11e-16
#> t= 50: S_sys=0.377702, prod S_j=0.377702, diff=0.00e+00
#> t=100: S_sys=0.007096, prod S_j=0.007096, diff=0.00e+00
#> t=200: S_sys=0.000000, prod S_j=0.000000, diff=1.52e-210

Property 3: Cumulative Hazard Sum

# The system's cumulative hazard is the sum of component cumulative hazards
H_sys <- cum_haz(sys)
H1 <- cum_haz(comp1)
H2 <- cum_haz(comp2)
H3 <- cum_haz(comp3)

for (t0 in t_vals) {
    lhs <- H_sys(t0)
    rhs <- H1(t0) + H2(t0) + H3(t0)
    cat(sprintf("t=%3d: H_sys=%.6f, sum H_j=%.6f, diff=%.2e\n",
                t0, lhs, rhs, abs(lhs - rhs)))
}
#> t= 10: H_sys=0.122974, sum H_j=0.122974, diff=0.00e+00
#> t= 50: H_sys=0.973650, sum H_j=0.973650, diff=0.00e+00
#> t=100: H_sys=4.948263, sum H_j=4.948263, diff=0.00e+00
#> t=200: H_sys=446.509316, sum H_j=446.509316, diff=0.00e+00

Property 4: Density Formula

The system density is:

\[f_{sys}(t) = h_{sys}(t) \cdot S_{sys}(t) = \left[\sum_{j=1}^m h_j(t)\right] \exp\left[-\sum_{j=1}^m H_j(t)\right]\]

f_sys <- density(sys)
for (t0 in t_vals) {
    from_density <- f_sys(t0)
    from_hS <- h_sys(t0) * S_sys(t0)
    cat(sprintf("t=%3d: f(t)=%.8f, h(t)*S(t)=%.8f, diff=%.2e\n",
                t0, from_density, from_hS, abs(from_density - from_hS)))
}
#> t= 10: f(t)=0.01206938, h(t)*S(t)=0.01206938, diff=0.00e+00
#> t= 50: f(t)=0.01215539, h(t)*S(t)=0.01215539, diff=0.00e+00
#> t=100: f(t)=0.00126597, h(t)*S(t)=0.00126597, diff=0.00e+00
#> t=200: f(t)=0.00000000, h(t)*S(t)=0.00000000, diff=0.00e+00

The density() method and the manual \(h(t) \cdot S(t)\) calculation agree to machine precision. Note that at \(t = 200\), both the density and survival are effectively zero — by that time, the combined Weibull + exponential + Gompertz hazard has driven the cumulative hazard so high that virtually all systems have already failed.

The Parameter Layout

The series system stores parameters from all components as a single flat vector. This is critical for optimization — MLE fitting via optim() requires a single parameter vector.

sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),    # 2 params
    dfr_exponential(0.05),                  # 1 param
    dfr_gompertz(a = 0.01, b = 0.1)         # 2 params
))

# Full parameter vector (5 elements)
params(sys)
#> [1] 2e+00 1e+02 5e-02 1e-02 1e-01

# Layout maps global indices to components
param_layout(sys)
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3
#> 
#> [[3]]
#> [1] 4 5

# Component 1 uses par[1:2], component 2 uses par[3], component 3 uses par[4:5]

When an optimizer proposes a new parameter vector par = c(p1, p2, p3, p4, p5), the series system internally slices it: component 1 gets par[1:2], component 2 gets par[3], and component 3 gets par[4:5].
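The slicing step itself is plain vector indexing. A self-contained sketch of the idea, with a layout list mirroring the param_layout() output above:

```r
# Slice a flat parameter vector into per-component pieces using a layout:
# a list mapping each component to its global parameter indices.
par <- c(2, 100, 0.05, 0.01, 0.1)
layout <- list(1:2, 3, 4:5)
comp_pars <- lapply(layout, function(idx) par[idx])
comp_pars[[1]]  # c(2, 100): shape and scale for the Weibull component
comp_pars[[3]]  # c(0.01, 0.1): a and b for the Gompertz component
```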

Analytical vs. Numerical Cumulative Hazard

The cumulative hazard \(H(t) = \int_0^t h(u)\,du\) is needed for survival, CDF, and log-likelihood computations. When all components provide an analytical cum_haz_rate, the series system computes \(H_{sys}(t) = \sum_j H_j(t)\) exactly. Otherwise, it falls back to numerical integration.

# All standard distributions provide analytical cumulative hazard
sys_analytical <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_analytical$cum_haz_rate), "\n")
#> Has analytical cum_haz: TRUE

# A custom dfr_dist without cum_haz_rate forces numerical fallback
custom <- dfr_dist(
    rate = function(t, par, ...) par[1] * t^(par[1] - 1),
    par = c(1.5)
)
sys_numerical <- dfr_dist_series(list(
    custom,
    dfr_exponential(0.05)
))
cat("Has analytical cum_haz:", !is.null(sys_numerical$cum_haz_rate), "\n")
#> Has analytical cum_haz: FALSE

The analytical path is faster and more precise, but the numerical fallback works correctly for any valid hazard function.
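The fallback amounts to numerically integrating the hazard. A base-R sketch using the same power-law hazard as the custom distribution above, which happens to have the closed form \(H(t) = t^{1.5}\) for comparison:

```r
# Numerical cumulative hazard: H(t) = integral of h(u) du from 0 to t,
# computed by adaptive quadrature.
h <- function(u) 1.5 * u^0.5              # the custom hazard, par = 1.5
H_numerical <- function(t) integrate(h, lower = 0, upper = t)$value
H_exact     <- function(t) t^1.5          # closed form, for comparison
abs(H_numerical(10) - H_exact(10))        # agrees to quadrature tolerance
```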

Score Function: Decomposed Gradient

The log-likelihood for a series system with observed data \((t_i, \delta_i)\) is:

\[\ell(\boldsymbol{\theta}) = \sum_{i} \left[\delta_i \log h_{sys}(t_i; \boldsymbol{\theta}) - H_{sys}(t_i; \boldsymbol{\theta})\right]\]

where \(\delta_i = 1\) for exact observations and \(\delta_i = 0\) for right-censored ones. Left-censored observations (coded \(\delta_i = -1\)) contribute \(\log[1 - S_{sys}(t_i; \boldsymbol{\theta})]\) in place of the bracketed term above.

The naive approach

A generic gradient computation would apply numDeriv::grad to the full log-likelihood, perturbing the entire parameter vector \(\boldsymbol{\theta}\) of length \(P\). Each perturbation re-evaluates all \(m\) component hazards at all \(n\) observations, for a cost of \(O(P \cdot n \cdot m)\).

How serieshaz decomposes the gradient

The series structure admits a per-component decomposition. Because \(H_{sys} = \sum_j H_j\) and \(h_{sys} = \sum_j h_j\), the gradient with respect to component \(k\)’s parameters \(\boldsymbol{\theta}_k\) separates into two terms:

\[\frac{\partial \ell}{\partial \boldsymbol{\theta}_k} = \underbrace{ \sum_{i:\,\delta_i=1} \frac{1}{h_{sys}(t_i)} \frac{\partial h_k(t_i)}{\partial \boldsymbol{\theta}_k} }_{\text{hazard derivative term}} \;-\; \underbrace{ \sum_{i} \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k} }_{\text{cumulative hazard derivative term}}\]

The two terms are computed differently:

- Hazard derivative term: numDeriv::jacobian on \(h_k(t, \boldsymbol{\theta}_k)\); cost \(O(p_k \cdot n_{\text{exact}})\) per component.
- Cumulative hazard derivative term, analytical path: the all-censored trick (when the component provides score_fn); cost \(O(n)\) per component.
- Cumulative hazard derivative term, numerical fallback: numDeriv::grad on \(\sum_i H_k(t_i)\); cost \(O(p_k \cdot n)\) per component.

Total cost: \(O(P \cdot n)\), versus \(O(P \cdot n \cdot m)\) for the naive approach — roughly an \(m\)-fold speedup.

The all-censored trick

The key insight for the cumulative hazard derivative is that a component’s own score_fn computes:

\[\text{score}_k(\boldsymbol{\theta}_k) = \sum_i \delta_i \frac{\partial \log h_k(t_i)}{\partial \boldsymbol{\theta}_k} - \sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}\]

If we call score_fn with all observations marked as right-censored (\(\delta_i = 0\)), the first sum vanishes and we get:

\[\text{score}_k\big|_{\delta=0} = -\sum_i \frac{\partial H_k(t_i)}{\partial \boldsymbol{\theta}_k}\]

This extracts the cumulative hazard derivative analytically — no numerical differentiation needed. All built-in distributions (exponential, Weibull, Gompertz, log-logistic) provide score_fn, so this analytical path is the common case.
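A self-contained illustration of the trick for a single exponential component, whose score has the closed form \(\sum_i \delta_i/\lambda - \sum_i t_i\) (since \(\partial \log h / \partial \lambda = 1/\lambda\) and \(\partial H / \partial \lambda = t_i\)):

```r
# For an exponential component, score(lambda) = sum(delta)/lambda - sum(t).
# Marking every observation right-censored (delta = 0) kills the first term,
# leaving exactly -sum(dH/dlambda) = -sum(t).
exp_score <- function(t, delta, lambda) sum(delta) / lambda - sum(t)

set.seed(1)
t <- rexp(20, rate = 0.5)
all_censored <- exp_score(t, delta = rep(0, 20), lambda = 0.5)
all_censored == -sum(t)  # TRUE
```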

Numerical demonstration

sys <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01),
    dfr_gompertz(a = 0.001, b = 0.05)
))

set.seed(42)
times <- sampler(sys)(200)
df <- data.frame(t = times, delta = rep(1L, 200))

par0 <- params(sys)
sc_fn <- score(sys)
ll_fn <- loglik(sys)

# Decomposed score (what serieshaz computes internally)
score_val <- sc_fn(df, par = par0)

# Independent finite-difference verification
eps <- 1e-6
fd_score <- numeric(length(par0))
for (k in seq_along(par0)) {
    par_p <- par_m <- par0
    par_p[k] <- par0[k] + eps
    par_m[k] <- par0[k] - eps
    fd_score[k] <- (ll_fn(df, par = par_p) - ll_fn(df, par = par_m)) / (2 * eps)
}

result <- data.frame(
    Parameter = c("shape", "scale", "rate", "a", "b"),
    Decomposed = round(score_val, 6),
    FiniteDiff = round(fd_score, 6),
    AbsDiff = formatC(abs(score_val - fd_score), format = "e", digits = 1)
)
print(result, row.names = FALSE)
#>  Parameter  Decomposed  FiniteDiff AbsDiff
#>      shape    3.794196    3.794196 3.6e-08
#>      scale    0.058314    0.058314 3.5e-08
#>       rate -649.403393 -649.403384 9.0e-06
#>          a 1793.760220 1793.766367 6.1e-03
#>          b  328.806995  328.806992 2.9e-06

The decomposed score and the finite-difference approximation agree to high precision across all five parameters, spanning three different distribution families.

Score at the MLE

At the maximum likelihood estimate, the score should be approximately zero (the first-order optimality condition). We compare the score magnitude at the true parameters (generally nonzero with finite \(n\)) versus at the MLE:

sys2 <- dfr_dist_series(list(
    dfr_weibull(shape = 2, scale = 100),
    dfr_exponential(0.01)
))

set.seed(42)
df2 <- data.frame(t = sampler(sys2)(300), delta = rep(1L, 300))

sc2 <- score(sys2)
cat("Score at true params:", round(sc2(df2), 4), "\n")
#> Score at true params: 2.7316 -0.0614 378.5898

solver <- fit(sys2)
result <- suppressWarnings(solver(df2, par = c(1.5, 80, 0.02)))

cat("MLE:                ", round(coef(result), 4), "\n")
#> MLE:                 2.4743 109.1372 0.0121
cat("Score at MLE:       ", round(sc2(df2, par = coef(result)), 6), "\n")
#> Score at MLE:        0.054226 -0.001491 0.375314

The score at the MLE is orders of magnitude smaller than at the true parameters, confirming the optimizer has found a stationary point. (The residual is due to finite optimizer tolerance, not a problem with the score computation.)

Decomposed Hessian

The Hessian also exploits the series structure. It has a natural block decomposition into within-component and cross-component terms:

Within-component block (\(k = l\)): \[\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top} = \sum_{i:\,\delta_i=1} \left[ \frac{1}{h_{sys}(t_i)} \frac{\partial^2 h_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top} - \frac{1}{h_{sys}(t_i)^2} \frac{\partial h_k}{\partial \boldsymbol{\theta}_k} \frac{\partial h_k^\top}{\partial \boldsymbol{\theta}_k} \right] - \sum_i \frac{\partial^2 H_k}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_k^\top}\]

Cross-component block (\(k \neq l\)): \[\frac{\partial^2 \ell}{\partial \boldsymbol{\theta}_k \partial \boldsymbol{\theta}_l^\top} = -\sum_{i:\,\delta_i=1} \frac{1}{h_{sys}(t_i)^2} \frac{\partial h_k}{\partial \boldsymbol{\theta}_k} \frac{\partial h_l^\top}{\partial \boldsymbol{\theta}_l}\]

The cross-component blocks are free — they reuse the rate Jacobians already computed for the score. The cumulative hazard Hessian \(\partial^2 H_k / \partial \boldsymbol{\theta}_k^2\) is extracted analytically via the all-censored trick on hess_fn when available. The rate Hessian \(\partial^2 h_k / \partial \boldsymbol{\theta}_k^2\) uses numDeriv::hessian per-component per observation.
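The cross-block formula is easy to verify on a toy case, independently of the package. With two exponential components, \(h_{sys} = \lambda_1 + \lambda_2\) and \(\partial h_k / \partial \lambda_k = 1\), so the cross term reduces to \(-\sum_{i:\,\delta_i=1} 1/(\lambda_1 + \lambda_2)^2\). A base-R finite-difference check:

```r
# Cross-partial of the series log-likelihood for two exponential components.
set.seed(1)
tt <- rexp(50, rate = 0.3)
delta <- rep(1, 50)
ll <- function(p) sum(delta * log(p[1] + p[2])) - sum((p[1] + p[2]) * tt)

p0 <- c(0.1, 0.2)
eps <- 1e-4
fd <- (ll(p0 + c(eps, eps)) - ll(p0 + c(eps, -eps)) -
       ll(p0 + c(-eps, eps)) + ll(p0 + c(-eps, -eps))) / (4 * eps^2)
analytic <- -sum(delta) / sum(p0)^2
c(fd = fd, analytic = analytic)  # both near -555.6
```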

H_fn <- hess_loglik(sys2)
hess_val <- H_fn(df2, par = coef(result))
evals <- round(eigen(hess_val, symmetric = TRUE)$values, 2)
cat("Hessian eigenvalues at MLE:", evals, "\n")
#> Hessian eigenvalues at MLE: -0.01 -8.85 -966824.3
cat("Hessian is symmetric:", all.equal(hess_val, t(hess_val)), "\n")
#> Hessian is symmetric: TRUE

All eigenvalues are negative, confirming the MLE is a local maximum. (For non-identifiable models like exponential-only series, one eigenvalue would be near zero.)

Special Cases

Exponential Series = Exponential(sum of rates)

When all components are exponential with rates \(\lambda_1, \ldots, \lambda_m\), the system hazard is constant: \(h_{sys}(t) = \sum_j \lambda_j\). This is equivalent to a single exponential with rate \(\lambda = \sum_j \lambda_j\).

rates <- c(0.1, 0.2, 0.3)
sys <- dfr_dist_series(lapply(rates, dfr_exponential))
equiv <- dfr_exponential(sum(rates))

h_sys <- hazard(sys)
h_eq  <- hazard(equiv)
S_sys <- surv(sys)
S_eq  <- surv(equiv)

for (t0 in c(1, 5, 10, 50)) {
    cat(sprintf("t=%2d: h_sys=%.4f h_eq=%.4f | S_sys=%.6f S_eq=%.6f\n",
                t0, h_sys(t0), h_eq(t0), S_sys(t0), S_eq(t0)))
}
#> t= 1: h_sys=0.6000 h_eq=0.6000 | S_sys=0.548812 S_eq=0.548812
#> t= 5: h_sys=0.6000 h_eq=0.6000 | S_sys=0.049787 S_eq=0.049787
#> t=10: h_sys=0.6000 h_eq=0.6000 | S_sys=0.002479 S_eq=0.002479
#> t=50: h_sys=0.6000 h_eq=0.6000 | S_sys=0.000000 S_eq=0.000000

Identical Weibull Components

When all \(m\) components are Weibull with the same shape \(k\) and scale \(\lambda\), the system is also Weibull with shape \(k\) and scale \(\lambda / m^{1/k}\):

m <- 3
shape <- 2
scale <- 100

sys <- dfr_dist_series(replicate(
    m, dfr_weibull(shape = shape, scale = scale), simplify = FALSE
))
equiv <- dfr_weibull(shape = shape, scale = scale / m^(1/shape))

S_sys <- surv(sys)
S_eq  <- surv(equiv)

for (t0 in c(10, 30, 50)) {
    cat(sprintf("t=%2d: S_sys=%.6f S_equiv=%.6f diff=%.2e\n",
                t0, S_sys(t0), S_eq(t0), abs(S_sys(t0) - S_eq(t0))))
}
#> t=10: S_sys=0.970446 S_equiv=0.970446 diff=0.00e+00
#> t=30: S_sys=0.763379 S_equiv=0.763379 diff=1.11e-16
#> t=50: S_sys=0.472367 S_equiv=0.472367 diff=5.55e-17

Single Component (Degenerate Case)

A series system with one component is equivalent to that component:

single <- dfr_dist_series(list(dfr_weibull(shape = 2, scale = 100)))
direct <- dfr_weibull(shape = 2, scale = 100)

h1 <- hazard(single)
h2 <- hazard(direct)
S1 <- surv(single)
S2 <- surv(direct)

for (t0 in c(10, 50, 100)) {
    cat(sprintf("t=%3d: h=%.6f/%.6f S=%.6f/%.6f\n",
                t0, h1(t0), h2(t0), S1(t0), S2(t0)))
}
#> t= 10: h=0.002000/0.002000 S=0.990050/0.990050
#> t= 50: h=0.010000/0.010000 S=0.778801/0.778801
#> t=100: h=0.020000/0.020000 S=0.367879/0.367879