Implements the rescaled bootstrap method for variance estimation in survey data, supporting both stratified simple random sampling and multistage complex designs.
Usage
rescaled_bootstrap(
data,
y,
strata,
N_h = NULL,
psu = NULL,
weights = NULL,
estimator,
by_strata = TRUE,
B = 200,
m_h = NULL,
seed = NULL,
verbose = TRUE
)Arguments
- data
A data frame containing the survey data.
- y
A character string specifying the variable name to be used for the target variable.
- strata
A character string specifying the stratification variable.
- N_h
Optional vector of stratum population sizes, used for the finite population correction (FPC). Can be a single value (applied to all strata) or one value per stratum.
- psu
Optional character string specifying the Primary Sampling Unit (PSU) variable. Required for multistage complex designs.
- weights
Optional character string specifying the sampling weight variable. Required for complex designs with unequal inclusion probabilities.
- estimator
A function that computes the statistic of interest, accepting arguments
estimator(y, weights)for complex designs orestimator(y)for simple designs.- by_strata
Logical; if
TRUE, variances are computed separately by stratum.- B
Integer; number of bootstrap replicates (default = 200).
- m_h
Optional vector of bootstrap sample sizes per stratum (PSUs for complex designs). If
NULL, defaults to \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor\).- seed
Optional integer for reproducibility.
- verbose
Logical; if
TRUE(default), displays a progress bar during bootstrap iterations.
Value
A list containing:
- variance
Bootstrap variance estimate
- boot_estimates
Vector of B bootstrap estimates
- B
Number of bootstrap replicates
- by_strata
if variance is computed by stratum or overall
- design
if the sampling design is complex or simple
- strata_info
Returns information about number of observations/PSUs per stratum
- call
The matched function call.
Details
The rescaled bootstrap is a resampling technique designed for complex survey data that preserves stratification and primary sampling unit (PSU) structure, providing consistent variance estimation for both smooth and non-smooth statistics. The methodology is based on Rao and Wu (1988) and Rao et al. (1992) .
(1) Stratified Simple Random Sampling design
Consider a finite population divided into \(H\) strata, each of size \(N_h\), with a sample of size \(n_h\) selected independently in each \(h\) stratum. Suppose to be interested on some \(\theta\) parameter, with \(\hat{\theta}\) sampling estimator. For each \(b\) bootstrap replicate, \(b = \ldots, B\):
Draw a bootstrap sample of size \(m_h\) with replacement from the \(n_h\) sampled units. By default, \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor \approx n_h - 3\).
Compute rescaled bootstrap values: $$ \tilde{y}_{hj}^{*(b)} = \bar{y}_h + \sqrt{\frac{m_h(1-f_h)}{n_h - 1}} (y_{hj}^{*(b)} - \bar{y}_h), $$ where \(y_{hj}^*\) is the bootstrap observation, \(f_h = n_h / N_h\) is the sampling fraction and \(\bar{y}_h\) is the sample stratum mean.
Compute the statistic of interest \(\hat{\theta}^{*(b)}\) using rescaled values.
The bootstrap variance is then given by: $$ \widehat{V}_{boot}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}^{*(b)} - \hat{\bar{\theta}}^{*} \right)^2, \qquad \hat{\bar{\theta}}^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*(b)}. $$
(2) Two-Stage Stratified Sampling design
For designs with PSUs and sampling weights:
Within each stratum \(h\), draw \(m_h\) PSUs with replacement from the \(n_h\) sampled PSUs. By default, \(m_h = \lfloor (n_h - 2)^2 / (n_h - 1) \rfloor \approx n_h - 3\).
Let \(m_{hi}^{(b)}\) denote the number of times PSU \(i\) is selected in replicate \(b\). Each observation in the \(i\)-th PSU is assigned a rescaled bootstrap weight: $$ w_{hij}^{*(b)} = \left[ 1 - c_h + c_h \frac{n_h}{m_h} m_{hi}^{(b)} \right] w_{hij}, \qquad c_h = \sqrt{\frac{m_h}{n_h - 1}}. $$ \(w_{hij}\) is the sampling weight associtaed to individual \(j\) in PSU \(i\) in stratum \(h\)-
The statistic \(\hat{\theta}^{*(b)}\) is computed using the rescaled weights.
The rescaled bootstrap variance estimate is then: $$ \widehat{V}_{boot}(\hat{\theta}) = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}^{*(b)} - \bar{\theta}^{*} \right)^2. $$
Multiple estimators
The estimator argument accepts any function with signature
f(y, weights) (complex design) or f(y) (simple design),
including functions from this package and user-defined ones.
When estimator returns a named numeric vector, variances are
computed for all outputs simultaneously from the same bootstrap replicates,
so the resulting standard errors are directly comparable across indicators.
For a convenience wrapper that automatically computes all package inequality
indicators and their standard errors in a single call, see inequantiles.
References
Rao J, Wu C (1988). “Resampling inference with complex survey data.” Journal of the American Statistical Association, 83, 231–241.
Rao J, Wu C, Yue K (1992). “Some recent work on resampling methods for complex surveys.” Survey methodology, 18, 209–217.
Kolenikov S (2010). “Resampling variance estimation for complex survey data.” The Stata Journal, 10, 165–199.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024 .
Examples
# \donttest{
data(synthouse)
# ================================================================
# Example 1: Stratified Simple Random Sampling (SRS)
# ================================================================
# Use NUTS2 as strata
set.seed(123)
# Simulate population sizes per stratum (for FPC)
N_values <- sample(2000:5000, length(unique(synthouse$NUTS2)), replace = TRUE)
names(N_values) <- sort(unique(synthouse$NUTS2))
# Define a simple mean estimator
mean_estimator <- function(y) mean(y, na.rm = TRUE)
# Apply the rescaled bootstrap under stratified SRS
boot_srs <- rescaled_bootstrap(
data = synthouse,
y = "eq_income",
strata = "NUTS2",
N_h = N_values,
estimator = mean_estimator,
by_strata = TRUE,
B = 50, # small number for illustration
seed = 123,
verbose = FALSE
)
# View results
boot_srs$variance
#> C01 C02 C03 C04 C05 C06 N01 N02
#> 499863.9 573118.2 671053.7 545446.5 775512.9 475065.6 341097.8 350873.9
#> N03 N04 N05 N06 NE01 NE02 NE03 NE04
#> 307397.0 232378.7 320968.3 398986.7 509534.9 371716.5 416917.1 551225.8
#> NE05 NE06 NO01 NO02 NO03 NO04 NO05 NO06
#> 447320.5 337993.0 374534.2 305472.6 758880.7 398586.5 782366.0 1053463.7
#> S01 S02 S03 S04 S05 S06
#> 257089.8 416672.4 301026.8 420146.3 342398.1 536383.0
# ================================================================
# Example 2: Two-stage Complex Design
# ================================================================
# Estimate the QRI estimator sampling variance.
boot_complex <- rescaled_bootstrap(
data = synthouse,
y = "eq_income",
strata = "NUTS2",
psu = "municipality",
weights = "weight",
estimator = qri,
by_strata = TRUE,
B = 50,
seed = 456,
verbose = FALSE
)
# Display variance and bootstrap estimates
summary(boot_complex$variance)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0003333 0.0006152 0.0007907 0.0008676 0.0010729 0.0016656
# Strata and PSU summary
# ================================================================
# Example 3: Multiple estimators in a single bootstrap loop
# ================================================================
# Create a function returning a named vector of estimates,
# including package functions and user-defined ones. All indicators share
# the same bootstrap replicates, ensuring directly comparable standard errors.
multi_estimator <- function(y, weights) {
c(
w_mean = sum(y * weights) / sum(weights), # custom: weighted mean
qri = qri(y, weights = weights), # package function
qsr = qsr(y, weights = weights) # package function
)
}
boot_multi <- rescaled_bootstrap(
data = synthouse,
y = "eq_income",
strata = "NUTS2",
psu = "municipality",
weights = "weight",
estimator = multi_estimator,
by_strata = FALSE,
B = 50,
seed = 42,
verbose = FALSE
)
# One variance per indicator, all from the same replicates
boot_multi$variance
#> w_mean qri qsr
#> 1.292902e+05 2.151047e-05 3.087562e-02
# }
# ================================================================
# Note:
# These examples use small B for speed. For actual analysis,
# use B >= 200 for stable estimates.
# ================================================================