Rescaled Bootstrap Variance Estimation

Implements the rescaled bootstrap method for variance estimation in survey data, supporting both stratified simple random sampling and multistage complex designs.

Usage

rescaled_bootstrap(
  data,
  y,
  strata,
  N_h = NULL,
  psu = NULL,
  weights = NULL,
  estimator,
  by_strata = TRUE,
  B = 200,
  m_h = NULL,
  seed = NULL,
  verbose = TRUE
)

Arguments

data: A data frame containing the survey data.
y: A character string naming the target variable.
strata: A character string specifying the stratification variable.
N_h: Optional vector of stratum population sizes, used for the finite population correction (FPC). Can be a single value (applied to all strata) or one value per stratum.
psu: Optional character string specifying the Primary Sampling Unit (PSU) variable. Required for multistage complex designs.
weights: Optional character string specifying the sampling weight variable. Required for complex designs with unequal inclusion probabilities.
estimator: A function that computes the statistic of interest, accepting arguments estimator(y, weights) for complex designs or estimator(y) for simple designs.
by_strata: Logical; if TRUE, variances are computed separately by stratum.
B: Integer; number of bootstrap replicates (default = 200).
m_h: Optional vector of bootstrap sample sizes per stratum (PSUs for complex designs). If NULL, defaults to $m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor$.
seed: Optional integer for reproducibility.
verbose: Logical; if TRUE (default), displays a progress bar during bootstrap iterations.
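
As a small illustration of the default bootstrap sample size used when m_h is NULL, the formula above can be evaluated directly. The helper name default_m_h is hypothetical and is not part of the package; it only reproduces the stated formula.

```r
# Hypothetical helper (not exported by the package): evaluates the default
# bootstrap sample size m_h = floor((n_h - 2)^2 / (n_h - 1)) per stratum.
default_m_h <- function(n_h) {
  floor((n_h - 2)^2 / (n_h - 1))
}

# Example stratum sample sizes (or PSU counts for complex designs).
n_h <- c(5, 10, 30)
default_m_h(n_h)
```

Under this formula, strata with n_h of 2 or 3 yield a value of 0, so an explicit m_h is presumably needed for very small strata; how the function handles that case is not stated here.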

Value

A list containing:

variance: Bootstrap variance estimate
boot_estimates: Vector of B bootstrap estimates
B: Number of bootstrap replicates
by_strata: Logical; TRUE if variance is computed separately by stratum.
design: Character string: "two-stage stratified" or "stratified SRS".
strata_info: Data frame with number of observations/PSUs per stratum.
call: The matched function call.

Details

The rescaled bootstrap is a resampling technique designed for complex survey data that preserves stratification and primary sampling unit (PSU) structure, providing consistent variance estimation for both smooth and non-smooth statistics. The methodology is based on Rao and Wu (1988) and Rao et al. (1992).

(1) Stratified Simple Random Sampling design

Consider a finite population divided into $H$ strata, each of size $N_h$, with a sample of size $n_h$ selected independently in each stratum $h$. Suppose interest lies in a parameter $\theta$, estimated from the sample by $\hat{\theta}$. For each bootstrap replicate $b = 1, \ldots, B$ and each stratum $h$:

Draw a bootstrap sample of size $m_h$ with replacement from the $n_h$ sampled units. By default, $m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor$, which equals $n_h - 3$ for $n_h \geq 3$.
Compute rescaled bootstrap values: $$ \tilde{y}_{hj}^{*(b)} = \bar{y}_h + \sqrt{\frac{m_h(1-f_h)}{n_h - 1}} (y_{hj}^{*(b)} - \bar{y}_h), $$ where $y_{hj}^{*(b)}$ is the $j$-th bootstrap observation in stratum $h$, $1 - f_h$ is the FPC with $f_h = n_h / N_h$, and $\bar{y}_h$ is the stratum sample mean.
Compute the statistic of interest $\hat{\theta}^{*(b)}_h$ using rescaled values.

The variance is then estimated by the bootstrap variance, that is, the empirical variance of the $B$ bootstrap estimates.
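
The steps for design (1) can be sketched in base R for a single stratum. This is a minimal illustration under stated assumptions, not the package's implementation: the function name rescaled_boot_stratum is hypothetical, the default $m_h$ is used, and the bootstrap variance is taken to be var() of the replicate estimates (divisor $B - 1$), which the text does not specify.

```r
# Hypothetical single-stratum sketch of the rescaled bootstrap (not package code).
rescaled_boot_stratum <- function(y_h, N_h, estimator = mean, B = 200, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_h <- length(y_h)
  m_h <- floor((n_h - 2)^2 / (n_h - 1))          # default bootstrap sample size
  stopifnot(m_h >= 1)                            # the sketch needs n_h >= 4
  f_h <- n_h / N_h                               # sampling fraction
  scale_h <- sqrt(m_h * (1 - f_h) / (n_h - 1))   # rescaling factor
  y_bar <- mean(y_h)                             # stratum sample mean

  boot <- vapply(seq_len(B), function(b) {
    # Step 1: draw m_h units with replacement from the n_h sampled units.
    y_star <- y_h[sample.int(n_h, size = m_h, replace = TRUE)]
    # Step 2: rescale the bootstrap values around the stratum mean.
    y_tilde <- y_bar + scale_h * (y_star - y_bar)
    # Step 3: compute the statistic on the rescaled values.
    estimator(y_tilde)
  }, numeric(1))

  # Assumption: empirical variance with divisor B - 1.
  list(variance = var(boot), boot_estimates = boot)
}

# Simulated stratum sample, for illustration only.
set.seed(1)
y_h <- rnorm(20, mean = 50, sd = 10)
res <- rescaled_boot_stratum(y_h, N_h = 200, B = 2000, seed = 42)
res$variance
(1 - 20 / 200) * var(y_h) / 20   # textbook variance estimate of the stratum mean
```

For the stratum mean, the rescaling factor is chosen so that the bootstrap variance matches $(1 - f_h) s_h^2 / n_h$ in expectation over the resampling, so the two printed values should be close for large $B$.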

(2) Two-Stage Stratified Sampling design

For designs with PSUs and sampling weights:

Within each stratum $h$, draw $m_h$ PSUs with replacement from the $n_h$ sampled PSUs. By default, $m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor$, which equals $n_h - 3$ for $n_h \geq 3$.
Let $m_{hi}^{(b)}$ denote the number of times PSU $i$ is selected in replicate $b$. Each observation in the $i$-th PSU is assigned a rescaled bootstrap weight: $$ w_{hij}^{*(b)} = \left[ 1 - c_h + c_h \frac{n_h}{m_h} m_{hi}^{(b)} \right] w_{hij}, \qquad c_h = \sqrt{\frac{m_h}{n_h - 1}}, $$ where $w_{hij}$ is the sampling weight associated with individual $j$ in PSU $i$ of stratum $h$.
The statistic $\hat{\theta}^{*(b)}_h$ is computed using the rescaled weights.

The variance is then estimated by the bootstrap variance, that is, the empirical variance of the $B$ bootstrap estimates.
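
The weight adjustment for design (2) can be sketched for one bootstrap replicate as follows. This is an illustrative base R sketch, not the package's code: the function name rescaled_weights and the toy data are hypothetical, the default $m_h$ is assumed, and PSU identifiers are assumed to be unique within each stratum.

```r
# Hypothetical sketch of the rescaled bootstrap weights for ONE replicate.
rescaled_weights <- function(weights, strata, psu, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  w_star <- numeric(length(weights))
  for (h in unique(strata)) {
    in_h <- strata == h
    psus <- unique(psu[in_h])                  # sampled PSUs in stratum h
    n_h <- length(psus)
    m_h <- floor((n_h - 2)^2 / (n_h - 1))      # default number of PSUs to draw
    stopifnot(m_h >= 1)                        # the sketch needs n_h >= 4
    c_h <- sqrt(m_h / (n_h - 1))

    # Draw m_h PSUs with replacement and count how often each one appears.
    draws <- psus[sample.int(n_h, size = m_h, replace = TRUE)]
    m_hi <- as.numeric(table(factor(draws, levels = psus)))

    # Multiplier applied to every observation of a given PSU.
    mult <- 1 - c_h + c_h * (n_h / m_h) * m_hi
    names(mult) <- as.character(psus)
    w_star[in_h] <- mult[as.character(psu[in_h])] * weights[in_h]
  }
  w_star
}

# Toy design: 2 strata, 6 PSUs per stratum, 3 observations per PSU.
d <- data.frame(
  strata = rep(c("A", "B"), each = 18),
  psu    = rep(1:12, each = 3),
  w      = rep(c(10, 20), each = 18)
)
w_star <- rescaled_weights(d$w, d$strata, d$psu, seed = 1)
tapply(w_star, d$strata, sum)
```

Because the multipliers sum to $n_h$ over the PSUs of a stratum, the rescaled weights in this balanced toy design sum to the original stratum totals; a statistic is then recomputed with w_star in place of the original weights for each replicate.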

Multiple estimators

The estimator argument accepts any function with signature f(y, weights) (complex design) or f(y) (simple design), including functions from this package and user-defined ones. When estimator returns a named numeric vector, variances are computed for all outputs simultaneously from the same bootstrap replicates, so the resulting standard errors are directly comparable across indicators.
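
As an illustration of an estimator that returns a named numeric vector, the sketch below defines a user-written function with the f(y, weights) signature. The names weighted_quantile and multi_estimator are hypothetical and not part of the package, and the weighted median uses a simple inverse-CDF rule chosen only for this example.

```r
# Hypothetical helper: weighted quantile via the weighted empirical CDF.
weighted_quantile <- function(y, weights, p) {
  ord <- order(y)
  cum_w <- cumsum(weights[ord]) / sum(weights)
  y[ord][which(cum_w >= p)[1]]   # smallest y whose cumulative weight reaches p
}

# Hypothetical user-defined estimator with signature f(y, weights),
# returning several indicators as a named numeric vector.
multi_estimator <- function(y, weights) {
  c(
    mean   = sum(weights * y) / sum(weights),
    median = weighted_quantile(y, weights, 0.5)
  )
}

multi_estimator(y = c(1, 2, 3, 4), weights = c(1, 1, 1, 1))
```

A function of this shape could be passed as the estimator argument so that the variances of both indicators are computed from the same set of replicates.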

References

Rao J, Wu C (1988). “Resampling inference with complex survey data.” Journal of the American Statistical Association, 83, 231–241.

Rao J, Wu C, Yue K (1992). “Some recent work on resampling methods for complex surveys.” Survey Methodology, 18, 209–217.

Kolenikov S (2010). “Resampling variance estimation for complex survey data.” The Stata Journal, 10, 165–199.

Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.

Examples