Skip to contents

Computes quantiles for weighted or unweighted data, allowing for sampling weights and several interpolation types. The method extends the standard quantile definitions of Hyndman and Fan (1996) and Harrell and Davis (1982) estimator to the case of complex survey data by incorporating sampling weights into the cumulative distribution function (CDF) and interpolation points, as proposed in Scarpa et al. (2025) .

Usage

csquantile(y, weights = NULL, probs = seq(0, 1, 0.1), type = 4, na.rm = FALSE)

Arguments

y

Numeric vector of observations.

weights

Optional numeric vector of sampling weights (default: NULL for equal weights)

probs

Numeric vector of probabilities (default = seq(0, 1, 0.1)).

type

Quantile estimation algorithm: integer 4-9 or "HD" for Harrell-Davis (default: 4)

na.rm

Logical indicating whether to remove NA values (default: TRUE)

Value

A named numeric vector of estimated quantiles corresponding to probs.

Details

Consider a random sample \(s\) of size \(n\). Let \(Y_1, \ldots, Y_n\) be the sample observations from a finite population, with order statistics \(Y_{(1)} \le \ldots \le Y_{(n)}\) and corresponding sampling weights \(w_1, \ldots, w_n\). Define the cumulative weights \(W_j = \sum_{i \le j} w_i\) and the total weight \(W_n = \sum_{i=1}^n w_i\). The weighted quantile estimator is computed as a linear interpolation between adjacent order statistics:

$$ \widehat{Q}(p) = Y_{(k-1)} + (Y_{(k)} - Y_{(k-1)}) \frac{p - \widehat{r}_{k-1}}{\widehat{r}_k - \widehat{r}_{k-1}}, $$

where \(\widehat{r}_k\) denotes the estimated cumulative distribution function (the “plotting position”), and the order \(k\) is such that \(W_{k-1} - m_{k-1} < W_n p < W_k - m_k\), with \(m_k\) determined by the interpolation method.

The function supports several interpolation rules (types 4–9), extending the quantile definitions in Hyndman and Fan (1996) to incorporate sampling weights.

The table below summarizes the six interpolation types (4–9) extended from Hyndman and Fan (1996) to incorporate sampling weights, as described in Scarpa et al. (2025) .

TypeEstimator \(\widehat{r}_k\)Interpolation \(\widehat{m}_k\)Selection rule for \(k\)
4\(W_k / W_n\)0\(W_{k-1} \le W_n p < W_k\)
5\((W_k - \tfrac{1}{2} w_k) / W_n\)\(w_k / 2\)\(W_{k-1} - \tfrac{w_{k-1}}{2} \le W_n p < W_k - \tfrac{w_k}{2}\)
6\(W_k / (W_n + w_n)\)\(w_n p\)\(W_{k-1} \le (W_n + w_n)p < W_k\)
7\(W_{k-1} / W_{n-1}\)\(w_k - w_n p\)\(W_{k-2} \le W_{n-1}p < W_{k-1}\)
8\((W_k - \tfrac{1}{3}w_k) / (W_n + \tfrac{w_n}{3})\)\(\tfrac{w_k}{3} + \tfrac{w_n}{3}p\)\(W_{k-1} - \tfrac{w_{k-1}}{3} \le (W_n - \tfrac{w_n}{3})p < W_k - \tfrac{w_k}{3}\)
9\((W_k - \tfrac{3}{8}w_k) / (W_n + \tfrac{1}{4}w_n)\)\(\tfrac{3}{8}w_k + \tfrac{w_n}{4}p\)\(W_{k-1} - \tfrac{3w_{k-1}}{8} \le (W_n + \tfrac{w_n}{4})p < W_k - \tfrac{3w_k}{8}\)

For unweighted data, the function returns the standard R quantiles.

The Harrell–Davis estimator ("HD") is also extended to the weighted case as proposed in Kreutzmann (2018), by redefining the weighting coefficients \(\widehat{\mathcal{W}}_j(p)\) for order statistics as:

$$ \widehat{\mathcal{W}}_j(p) = b_{(W_j / W_n)}\{(W_n + w_n)p, W_n - (W_n + w_n)p + w_n\} - b_{(W_{j-1}/W_n)}\{(W_n + w_n)p, W_n - (W_n + w_n)p + w_n\}, $$

where \(b_x(a,b)\) denotes the incomplete beta function.

The resulting quantile estimator is \(\widehat{Q}_{HD}(p) = \sum_{j \in s} \widehat{\mathcal{W}}_j(p) Y_{(j)}.\) For unweighted data, the function returns the Harrell & Davis quantile estimator.

References

Harrell FE, Davis CE (1982). “A new distribution-free quantile estimator.” Biometrika, 69, 635–640.

Hyndman RJ, Fan Y (1996). “Sample quantiles in statistical packages.” The American Statistician, 50, 361–365.

Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024 .

Examples


data(synthouse)

y <- synthouse$eq_income ### Income data

# Compute unweighted quantiles (default: type = 4)
csquantile(y, probs = c(0.25, 0.5, 0.75), type = 6)
#>      25%      50%      75% 
#> 12910.48 20429.20 32529.02 

# Consider the sampling weights
w <- synthouse$weight
csquantile(y, weight = w, probs = c(0.25, 0.5, 0.75), type = 6)
#>      25%      50%      75% 
#> 12353.29 20014.17 32222.93