Skip to contents

Implements the rescaled bootstrap method for variance estimation in survey data, supporting both stratified simple random sampling and multistage complex designs.

Usage

rescaled_bootstrap(
  data,
  y,
  strata,
  N_h = NULL,
  psu = NULL,
  weights = NULL,
  estimator,
  by_strata = TRUE,
  B = 200,
  m_h = NULL,
  seed = NULL,
  verbose = TRUE
)

Arguments

data

A data frame containing the survey data.

y

A character string specifying the variable name to be used for the target variable.

strata

A character string specifying the stratification variable.

N_h

Optional vector of stratum population sizes, used for the finite population correction (FPC). Can be a single value (applied to all strata) or one value per stratum.

psu

Optional character string specifying the Primary Sampling Unit (PSU) variable. Required for multistage complex designs.

weights

Optional character string specifying the sampling weight variable. Required for complex designs with unequal inclusion probabilities.

estimator

A function that computes the statistic of interest, accepting arguments estimator(y, weights) for complex designs or estimator(y) for simple designs.

by_strata

Logical; if TRUE, variances are computed separately by stratum.

B

Integer; number of bootstrap replicates (default = 200).

m_h

Optional vector of bootstrap sample sizes per stratum (PSUs for complex designs). If NULL, defaults to \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor\).

seed

Optional integer for reproducibility.

verbose

Logical; if TRUE (default), displays a progress bar during bootstrap iterations.

Value

A list containing:

variance

Bootstrap variance estimate

boot_estimates

Vector of B bootstrap estimates

B

Number of bootstrap replicates

by_strata

if variance is computed by stratum or overall

design

if the sampling design is complex or simple

strata_info

Returns information about number of observations/PSUs per stratum

call

The matched function call.

Details

The rescaled bootstrap is a resampling technique designed for complex survey data that preserves stratification and primary sampling unit (PSU) structure, providing consistent variance estimation for both smooth and non-smooth statistics. The methodology is based on Rao and Wu (1988) and Rao et al. (1992) .

(1) Stratified Simple Random Sampling design

Consider a finite population divided into \(H\) strata, each of size \(N_h\), with a sample of size \(n_h\) selected independently in each \(h\) stratum. Suppose to be interested on some \(\theta\) parameter, with \(\hat{\theta}\) sampling estimator. For each \(b\) bootstrap replicate, \(b = \ldots, B\):

  1. Draw a bootstrap sample of size \(m_h\) with replacement from the \(n_h\) sampled units. By default, \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor \approx n_h - 3\).

  2. Compute rescaled bootstrap values: $$ \tilde{y}_{hj}^{*(b)} = \bar{y}_h + \sqrt{\frac{m_h(1-f_h)}{n_h - 1}} (y_{hj}^{*(b)} - \bar{y}_h), $$ where \(y_{hj}^*\) is the bootstrap observation, \(f_h = n_h / N_h\) is the sampling fraction and \(\bar{y}_h\) is the sample stratum mean.

  3. Compute the statistic of interest \(\hat{\theta}^{*(b)}\) using rescaled values.

The bootstrap variance is then given by: $$ \widehat{V}_{boot}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}^{*(b)} - \hat{\bar{\theta}}^{*} \right)^2, \qquad \hat{\bar{\theta}}^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*(b)}. $$

(2) Two-Stage Stratified Sampling design

For designs with PSUs and sampling weights:

  1. Within each stratum \(h\), draw \(m_h\) PSUs with replacement from the \(n_h\) sampled PSUs. By default, \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor \approx n_h - 3\).

  2. Let \(m_{hi}^{(b)}\) denote the number of times PSU \(i\) is selected in replicate \(b\). Each observation in the \(i\)-th PSU is assigned a rescaled bootstrap weight: $$ w_{hij}^{*(b)} = \left[ 1 - c_h + c_h \frac{n_h}{m_h} m_{hi}^{(b)} \right] w_{hij}, \qquad c_h = \sqrt{\frac{m_h}{n_h - 1}}. $$ \(w_{hij}\) is the sampling weight associtaed to individual \(j\) in PSU \(i\) in stratum \(h\)-

  3. The statistic \(\hat{\theta}^{*(b)}\) is computed using the rescaled weights.

The rescaled bootstrap variance estimate is then: $$ \widehat{V}_{boot}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}^{*(b)} - \bar{\theta}^{*} \right)^2. $$

Multiple estimators

The estimator argument accepts any function with signature f(y, weights) (complex design) or f(y) (simple design), including functions from this package and user-defined ones. When estimator returns a named numeric vector, variances are computed for all outputs simultaneously from the same bootstrap replicates, so the resulting standard errors are directly comparable across indicators. For a convenience wrapper that automatically computes all package inequality indicators and their standard errors in a single call, see inequantiles.

References

Rao J, Wu C (1988). “Resampling inference with complex survey data.” Journal of the American Statistical Association, 83, 231–241.

Rao J, Wu C, Yue K (1992). “Some recent work on resampling methods for complex surveys.” Survey methodology, 18, 209–217.

Kolenikov S (2010). “Resampling variance estimation for complex survey data.” The Stata Journal, 10, 165–199.

Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024 .

Examples

# \donttest{
data(synthouse)


# ================================================================
# Example 1: Stratified Simple Random Sampling (SRS)
# ================================================================


# Use NUTS2 as strata
set.seed(123)

# Simulate population sizes per stratum (for FPC)
N_values <- sample(2000:5000, length(unique(synthouse$NUTS2)), replace = TRUE)
names(N_values) <- sort(unique(synthouse$NUTS2))

# Define a simple mean estimator
mean_estimator <- function(y) mean(y, na.rm = TRUE)

# Apply the rescaled bootstrap under stratified SRS
boot_srs <- rescaled_bootstrap(
  data = synthouse,
  y = "eq_income",
  strata = "NUTS2",
  N_h = N_values,
  estimator = mean_estimator,
  by_strata = TRUE,
  B = 50,  # small number for illustration
  seed = 123,
  verbose = FALSE
)

# View results
boot_srs$variance
#>       C01       C02       C03       C04       C05       C06       N01       N02 
#>  499863.9  573118.2  671053.7  545446.5  775512.9  475065.6  341097.8  350873.9 
#>       N03       N04       N05       N06      NE01      NE02      NE03      NE04 
#>  307397.0  232378.7  320968.3  398986.7  509534.9  371716.5  416917.1  551225.8 
#>      NE05      NE06      NO01      NO02      NO03      NO04      NO05      NO06 
#>  447320.5  337993.0  374534.2  305472.6  758880.7  398586.5  782366.0 1053463.7 
#>       S01       S02       S03       S04       S05       S06 
#>  257089.8  416672.4  301026.8  420146.3  342398.1  536383.0 


# ================================================================
# Example 2: Two-stage Complex Design
# ================================================================

# Estimate the QRI estimator sampling variance.
boot_complex <- rescaled_bootstrap(
  data = synthouse,
  y = "eq_income",
  strata = "NUTS2",
  psu = "municipality",
  weights = "weight",
  estimator = qri,
  by_strata = TRUE,
  B = 50,
  seed = 456,
  verbose = FALSE
)

# Display variance and bootstrap estimates
summary(boot_complex$variance)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0003333 0.0006152 0.0007907 0.0008676 0.0010729 0.0016656 

# Strata and PSU summary



# ================================================================
# Example 3: Multiple estimators in a single bootstrap loop
# ================================================================

# Create a function returning a named vector of estimates,
# including package functions and user-defined ones. All indicators share
# the same bootstrap replicates, ensuring directly comparable standard errors.

multi_estimator <- function(y, weights) {
  c(
    w_mean = sum(y * weights) / sum(weights),   # custom: weighted mean
    qri    = qri(y, weights = weights),          # package function
    qsr    = qsr(y, weights = weights)           # package function
  )
}

boot_multi <- rescaled_bootstrap(
  data      = synthouse,
  y         = "eq_income",
  strata    = "NUTS2",
  psu       = "municipality",
  weights   = "weight",
  estimator = multi_estimator,
  by_strata = FALSE,
  B         = 50,
  seed      = 42,
  verbose   = FALSE
)

# One variance per indicator, all from the same replicates
boot_multi$variance
#>       w_mean          qri          qsr 
#> 1.292902e+05 2.151047e-05 3.087562e-02 
# }


# ================================================================
# Note:
# These examples use small B for speed. For actual analysis,
# use B >= 200 for stable estimates.
# ================================================================