Type: Package
Title: Estimating Bivariate Dependency from Marginal Data
Version: 1.1.0
Description: Provides maximum likelihood methods to estimate bivariate dependency (correlation) from marginal summary statistics in multi-study settings. The package supports both binary and continuous variables assumed to follow a bivariate normal distribution, enabling privacy-preserving joint estimation when individual-level data are unavailable. The binary method is fully described in the manuscript by Shang, Tsao and Zhang (2025) <doi:10.48550/arXiv.2505.03995>: "Estimating the Joint Distribution of Two Binary Variables from Their Marginal Summaries".
License: GPL (≥ 3)
Encoding: UTF-8
LazyData: true
Depends: R (≥ 3.5.0)
Imports: stats
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-07-17 19:26:28 UTC; shanglongwen
Author: Longwen Shang [aut, cre], Min Tsao [aut], Xuekui Zhang [aut]
Maintainer: Longwen Shang <shanglongwen0918@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-17 19:40:01 UTC

Example Data: Binary Variables

Description

Simulated dataset for testing the cor_bin() function.

Usage

data(bin_example)

Format

A data frame with 3 columns:

ni

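Sample size of each study.

xi

Number of observations in each study for which the first binary variable equals 1.

yi

Number of observations in each study for which the second binary variable equals 1.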
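To make the format concrete, the following sketch simulates a data frame with the same three columns. It assumes, for illustration only, that the n_i subjects in each study are independent draws from a 2 × 2 joint distribution of (X, Y) with hypothetical cell probabilities; the full table is drawn and then only its margins xi and yi are kept, mirroring the summary-only setting. The name sim_bin and all parameter values are hypothetical, and bin_example itself may have been generated differently.

```r
# Hypothetical joint cell probabilities for (X, Y); these are NOT the values
# used to generate bin_example.
p11 <- 0.20; p10 <- 0.15; p01 <- 0.25; p00 <- 0.40
set.seed(1)
ni <- c(50, 80, 120, 60, 100)            # hypothetical study sizes
sim_bin <- do.call(rbind, lapply(ni, function(n) {
  # Draw the full 2 x 2 table for one study, then keep only its margins.
  cells <- drop(rmultinom(1, size = n, prob = c(p11, p10, p01, p00)))
  data.frame(ni = n,
             xi = cells[1] + cells[2],   # X = 1: cells (1,1) and (1,0)
             yi = cells[1] + cells[3])   # Y = 1: cells (1,1) and (0,1)
}))
sim_bin
```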


Example Data: Continuous Variables

Description

Simulated dataset for testing the cor_cont() function.

Usage

data(cont_example)

Format

A data frame with 5 columns:

Sample_Size

Sample size for each study.

Mean_X

Sample mean of variable X.

Mean_Y

Sample mean of variable Y.

Variance_X

Sample variance of variable X.

Variance_Y

Sample variance of variable Y.
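A data frame with the same five columns can be simulated as follows. This sketch assumes, for illustration only, bivariate normal raw data with hypothetical parameters; each study's raw sample is drawn and then reduced to the summaries that cor_cont() consumes. The names sim_cont, U and n_studies are hypothetical, and cont_example itself may have been generated differently.

```r
set.seed(2)
# Hypothetical bivariate normal parameters (illustration only).
mu <- c(1, -1); sdx <- 2; sdy <- 1.5; rho <- 0.6
Sigma <- matrix(c(sdx^2, rho * sdx * sdy,
                  rho * sdx * sdy, sdy^2), nrow = 2)
U <- chol(Sigma)                          # upper triangular, t(U) %*% U == Sigma
n_studies <- c(30, 45, 60, 80, 100, 150)  # hypothetical study sizes
sim_cont <- do.call(rbind, lapply(n_studies, function(n) {
  z <- matrix(rnorm(2 * n), ncol = 2)               # rows ~ N2(0, I)
  xy <- z %*% U + matrix(mu, n, 2, byrow = TRUE)    # rows ~ N2(mu, Sigma)
  data.frame(Sample_Size = n,
             Mean_X = mean(xy[, 1]), Mean_Y = mean(xy[, 2]),
             Variance_X = var(xy[, 1]), Variance_Y = var(xy[, 2]))
}))
sim_cont
```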


Estimate the Joint Distribution of Two Binary Variables from Marginal Summaries

Description

Performs maximum likelihood estimation (MLE) of the joint distribution of two binary variables using only marginal summary data from multiple studies.
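The likelihood behind this kind of estimator can be sketched as follows. Assume, as an illustration, that the n_i subjects in study i are independent draws from a 2 × 2 table with cell probabilities (p11, p10, p01, p00). The observed margins (x_i, y_i) then have a likelihood obtained by summing the multinomial probability over the unobserved count k of subjects with X = Y = 1, and the study log-likelihoods add across studies. The helpers loglik_study(), to_cells(), negloglik() and fit_joint() are hypothetical; this is a sketch of the idea, not the code inside cor_bin().

```r
# Log-likelihood of one study's margins (x = #{X = 1}, y = #{Y = 1} among n)
# given probs = c(p11, p10, p01, p00). The unobserved count
# k = #{X = 1, Y = 1} is summed out of the multinomial likelihood.
loglik_study <- function(n, x, y, probs) {
  k <- max(0, x + y - n):min(x, y)
  terms <- vapply(k, function(kk)
    dmultinom(c(kk, x - kk, y - kk, n - x - y + kk), prob = probs, log = TRUE),
    numeric(1))
  m <- max(terms)
  if (!is.finite(m)) return(-Inf)
  m + log(sum(exp(terms - m)))            # log-sum-exp over k
}

# Softmax map from 3 free parameters to 4 positive cell probabilities.
to_cells <- function(theta) {
  e <- exp(c(theta, 0))
  e / sum(e)
}

# Negative log-likelihood summed over independent studies.
negloglik <- function(theta, ni, xi, yi) {
  probs <- to_cells(theta)
  -sum(mapply(loglik_study, ni, xi, yi, MoreArgs = list(probs = probs)))
}

fit_joint <- function(ni, xi, yi) {
  opt <- optim(c(0, 0, 0), negloglik, ni = ni, xi = xi, yi = yi)
  p <- to_cells(opt$par)
  c(p1 = p[1] + p[2], p2 = p[1] + p[3], p11 = p[1])
}

# Hypothetical margins from three studies (not the bin_example data).
fit_joint(ni = c(40, 60, 80), xi = c(15, 25, 30), yi = c(20, 28, 41))
```

The softmax parameterization keeps all four cell probabilities positive and summing to one, so a derivative-free optim() call needs no explicit constraints; estimates on the boundary (for example p11 = 0) can only be approached, not reached exactly.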

Usage

cor_bin(ni, xi, yi, ci_method = c("none", "normal", "lr"))

Arguments

ni

Numeric vector. Sample size of each study.

xi

Numeric vector. Number of observations in each study for which variable 1 equals 1.

yi

Numeric vector. Number of observations in each study for which variable 2 equals 1.

ci_method

Character string. Method for confidence interval computation. Options are "none" (default), "normal", or "lr" (likelihood ratio).
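The two interval types can be illustrated generically. This sketch assumes a scalar parameter with a known log-likelihood: the normal interval is estimate ± z · SE, and the likelihood-ratio interval collects the parameter values whose log-likelihood lies within qchisq(level, 1) / 2 of the maximum. The helpers wald_ci() and lr_ci() are hypothetical and are shown on a simple binomial example; for p11 in cor_bin() the log-likelihood would additionally have to be profiled over the marginal probabilities, which this sketch does not do.

```r
# Normal (Wald-type) interval: estimate +/- z * standard error.
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Likelihood-ratio interval for a scalar parameter: all t in (lo, hi) with
# 2 * (ll(mle) - ll(t)) <= qchisq(level, df = 1).
lr_ci <- function(ll, mle, lo, hi, level = 0.95) {
  cut <- ll(mle) - qchisq(level, df = 1) / 2
  f <- function(t) ll(t) - cut            # positive inside the interval
  lower <- if (f(lo) < 0) uniroot(f, c(lo, mle), tol = 1e-10)$root else lo
  upper <- if (f(hi) < 0) uniroot(f, c(mle, hi), tol = 1e-10)$root else hi
  c(lower = lower, upper = upper)
}

# Illustration: binomial proportion, 30 successes in 100 trials.
ll_binom <- function(p) dbinom(30, 100, p, log = TRUE)
wald_ci(0.3, sqrt(0.3 * 0.7 / 100))
lr_ci(ll_binom, mle = 0.3, lo = 1e-8, hi = 1 - 1e-8)
```

Unlike the normal interval, the LR interval need not be symmetric and stays inside the parameter's range, which is why it is often preferred for probabilities near 0 or 1.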

Value

A named list containing the point estimates, the estimated variance and standard error of p11_hat, and a confidence interval (if requested):

p1_hat

Estimated marginal probability for variable 1.

p2_hat

Estimated marginal probability for variable 2.

p11_hat

Estimated joint probability.

var_hat

Estimated variance of p11_hat.

sd_hat

Standard error of p11_hat.

ci

Confidence interval for p11_hat, if requested.
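cor_bin() reports the joint probability p11_hat rather than a correlation. Because the three returned probabilities fully determine the 2 × 2 distribution, they also determine the Pearson (phi) correlation of X and Y through the standard identity below. The helper phi_from_p() is hypothetical and not part of the package; the commented call assumes the list element names documented above.

```r
# Pearson (phi) correlation of two binary variables X and Y, computed from
# p11 = P(X = 1, Y = 1), p1 = P(X = 1) and p2 = P(Y = 1).
phi_from_p <- function(p11, p1, p2) {
  (p11 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))
}

phi_from_p(0.35, 0.5, 0.5)   # 0.4
phi_from_p(0.25, 0.5, 0.5)   # 0: p11 = p1 * p2 means independence

# With a cor_bin() result `fit`, for example:
# phi_from_p(fit$p11_hat, fit$p1_hat, fit$p2_hat)
```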

Examples

data(bin_example)
cor_bin(bin_example$ni, bin_example$xi, bin_example$yi, ci_method = "lr")

Estimate the Bivariate Normal Distribution from Marginal Summaries

Description

Estimates the correlation coefficient ρ (together with the marginal means and standard deviations) of two jointly normally distributed variables using summary-level data from multiple independent studies.
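Why study-level summaries carry information about ρ can be sketched as follows. If each study's subjects are i.i.d. bivariate normal, the study means (x̄_i, ȳ_i) are bivariate normal with the same ρ and covariance Σ / n_i, so the joint spread of the means across studies informs ρ, while the sample variances inform the marginal standard deviations. The functions negloglik_cont() and fit_cont() are hypothetical and are not the code behind cor_cont(). In particular, they treat s2x and s2y as independent chi-square quantities; this is a simplification, since under bivariate normality the joint distribution of the two sample variances depends on ρ.

```r
# Negative log-likelihood of study-level summaries under bivariate normality.
# par = (mu_x, mu_y, log sd_x, log sd_y, atanh rho), so every real vector par
# maps to a valid parameter set.
negloglik_cont <- function(par, n, xbar, ybar, s2x, s2y) {
  mx <- par[1]; my <- par[2]
  sx <- exp(par[3]); sy <- exp(par[4]); rho <- tanh(par[5])
  # Study means: (xbar_i, ybar_i) ~ N2((mx, my), Sigma / n_i).
  vx <- sx^2 / n; vy <- sy^2 / n
  zx <- (xbar - mx) / sqrt(vx); zy <- (ybar - my) / sqrt(vy)
  ll_means <- -log(2 * pi) - 0.5 * log(vx * vy * (1 - rho^2)) -
    (zx^2 - 2 * rho * zx * zy + zy^2) / (2 * (1 - rho^2))
  # Sample variances: (n_i - 1) * s2 / sigma^2 ~ chi-square(n_i - 1), plus the
  # Jacobian log((n_i - 1) / sigma^2). SIMPLIFICATION: s2x and s2y are treated
  # as independent, ignoring their rho-dependent joint distribution.
  ll_vars <- dchisq((n - 1) * s2x / sx^2, df = n - 1, log = TRUE) + log((n - 1) / sx^2) +
    dchisq((n - 1) * s2y / sy^2, df = n - 1, log = TRUE) + log((n - 1) / sy^2)
  -sum(ll_means + ll_vars)
}

fit_cont <- function(n, xbar, ybar, s2x, s2y) {
  # Start at weighted means, pooled SDs and rho = 0.
  start <- c(weighted.mean(xbar, n), weighted.mean(ybar, n),
             0.5 * log(weighted.mean(s2x, n - 1)),
             0.5 * log(weighted.mean(s2y, n - 1)), 0)
  opt <- optim(start, negloglik_cont, n = n, xbar = xbar, ybar = ybar,
               s2x = s2x, s2y = s2y, method = "BFGS")
  c(mu_x = opt$par[1], mu_y = opt$par[2], sd_x = exp(opt$par[3]),
    sd_y = exp(opt$par[4]), rho = tanh(opt$par[5]))
}

# For example (requires the package data):
# with(cont_example, fit_cont(Sample_Size, Mean_X, Mean_Y, Variance_X, Variance_Y))
```

The exp and tanh transforms let an unconstrained optimizer (BFGS) search over all real vectors while keeping the standard deviations positive and ρ inside (-1, 1).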

Usage

cor_cont(
  n,
  xbar,
  ybar,
  s2x = NULL,
  s2y = NULL,
  method = c("proposed", "weighted"),
  ci_method = c("none", "normal", "lr")
)

Arguments

n

Numeric vector. Sample size of each study.

xbar, ybar

Numeric vectors. Sample means of the two variables.

s2x, s2y

Numeric vectors. Sample variances; required for method = "proposed".

method

Character. "proposed" uses the proposed maximum likelihood (MLE) method described in the paper; "weighted" reproduces the weighted-mean-based baseline method, for use when sample variances are not available.

ci_method

Character. Confidence interval type: "none", "normal", or "lr" (likelihood ratio). Only implemented when method = "proposed".
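The exact form of the Baseline weighted method is not given here. One natural weighted-mean-based estimator, shown purely as a hypothetical illustration (weighted_mean_cor() is not a package function and may differ from what method = "weighted" computes), is the sample-size-weighted correlation of the study means. It needs only n, xbar and ybar, which matches the situation in which the weighted option is intended to be used.

```r
# Sample-size-weighted Pearson correlation of study means. Under bivariate
# normality, sqrt(n_i) * (xbar_i - mu_x) and sqrt(n_i) * (ybar_i - mu_y) have
# correlation rho, so weighting squared deviations by n_i targets rho.
weighted_mean_cor <- function(n, xbar, ybar) {
  # cov.wt() normalizes the weights and centers at the weighted means.
  stats::cov.wt(cbind(xbar, ybar), wt = n, cor = TRUE)$cor[1, 2]
}

# Hypothetical study summaries (not cont_example).
weighted_mean_cor(n = c(20, 35, 50, 80),
                  xbar = c(1.20, 0.80, 1.10, 0.95),
                  ybar = c(-0.70, -1.30, -0.90, -1.05))
```

Weights proportional to n_i are used because Var(x̄_i) = σ_x² / n_i: scaling each deviation by √n_i puts all studies on a common variance scale.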

Value

A list with elements

Examples

data(cont_example)
# Example with full summaries
cor_cont(n = cont_example$Sample_Size,
         xbar = cont_example$Mean_X, ybar = cont_example$Mean_Y,
         s2x = cont_example$Variance_X, s2y = cont_example$Variance_Y,
         method = "proposed", ci_method = "lr")

# Only means and sample sizes: weighted-mean baseline method
cor_cont(n = cont_example$Sample_Size,
         xbar = cont_example$Mean_X, ybar = cont_example$Mean_Y,
         method = "weighted")