Computes quantiles for weighted or unweighted data, allowing for sampling weights and several interpolation types. The method extends the standard quantile definitions of Hyndman and Fan (1996) and Harrell and Davis (1982) estimator to the case of complex survey data by incorporating sampling weights into the cumulative distribution function and interpolation points, as proposed in Scarpa et al. (2025) .
Usage
csquantile(y, weights = NULL, probs = seq(0, 1, 0.1), type = 4, na.rm = FALSE)Arguments
- y
Numeric vector of observations.
- weights
Optional numeric vector of sampling weights; if
NULL(default), all observations are equally weighted.- probs
Numeric vector of probabilities (default:
seq(0, 1, 0.1)).- type
Quantile estimation type: integer
4–9or"HD"for Harrell–Davis (default:4).- na.rm
Logical; if
TRUE, missing values are removed before computing. Default:FALSE. Higher-level functions in this package handleNAremoval before callingcsquantile(), so the default is keptFALSEto avoid redundant filtering.
Details
Consider a random sample \(s\) of size \(n\). Let \(y_1, \ldots, y_n\) be the sample observations from a finite population, with order statistics \(y_{(1)} \le \ldots \le y_{(n)}\) and corresponding sampling weights \(w_1, \ldots, w_n\). Define the cumulative weights \(W_j = \sum_{i \le j} w_i\) and the total weight \(W_n = \sum_{i=1}^n w_i\). The weighted quantile estimator is computed as a linear interpolation between adjacent order statistics:
$$ \widehat{Q}(p) = y_{(k-1)} + (y_{(k)} - y_{(k-1)}) \frac{p - \widehat{r}_{k-1}}{\widehat{r}_k - \widehat{r}_{k-1}}, $$
where \(\widehat{r}_k\) denotes the estimated cumulative distribution function (the “plotting position”), and the order \(k\) is such that \(W_{k-1} - m_{k-1} < W_n p < W_k - m_k\), with \(m_k\) determined by the interpolation method.
The table below summarizes the six interpolation types (4–9) extended from Hyndman and Fan (1996) to incorporate sampling weights, as described in Scarpa et al. (2025) .
| Type | Estimator \(\widehat{r}_k\) | Interpolation \(\widehat{m}_k\) | Selection rule for \(k\) |
| 4 | \(W_k / W_n\) | 0 | \(W_{k-1} \le W_n p < W_k\) |
| 5 | \((W_k - \tfrac{1}{2} w_k) / W_n\) | \(w_k / 2\) | \(W_{k-1} - \tfrac{w_{k-1}}{2} \le W_n p < W_k - \tfrac{w_k}{2}\) |
| 6 | \(W_k / (W_n + w_n)\) | \(w_n p\) | \(W_{k-1} \le (W_n + w_n)p < W_k\) |
| 7 | \(W_{k-1} / W_{n-1}\) | \(w_k - w_n p\) | \(W_{k-2} \le W_{n-1}p < W_{k-1}\) |
| 8 | \((W_k - \tfrac{1}{3}w_k) / (W_n + \tfrac{w_n}{3})\) | \(\tfrac{w_k}{3} + \tfrac{w_n}{3}p\) | \(W_{k-1} - \tfrac{w_{k-1}}{3} \le (W_n - \tfrac{w_n}{3})p < W_k - \tfrac{w_k}{3}\) |
| 9 | \((W_k - \tfrac{3}{8}w_k) / (W_n + \tfrac{1}{4}w_n)\) | \(\tfrac{3}{8}w_k + \tfrac{w_n}{4}p\) | \(W_{k-1} - \tfrac{3w_{k-1}}{8} \le (W_n + \tfrac{w_n}{4})p < W_k - \tfrac{3w_k}{8}\) |
The function supports several interpolation rules (types 4–9) and extends the quantile definitions in Hyndman and Fan (1996) to incorporate sampling weights. For unweighted data, the function returns the standard R quantiles.
The Harrell–Davis estimator ("HD") is extended to the weighted case as proposed in Kreutzmann (2018) , by redefining the weighting coefficients \(\widehat{\mathcal{W}}_j(p)\) for order statistics as:
$$ \widehat{\mathcal{W}}_j(p) = b_{(W_j / W_n)}\{(W_n + w_n)p, W_n - (W_n + w_n)p + w_n\} - b_{(W_{j-1}/W_n)}\{(W_n + w_n)p, W_n - (W_n + w_n)p + w_n\}, $$
where \(b_x(a,b)\) denotes the incomplete beta function.
The resulting quantile estimator is \(\widehat{Q}_{HD}(p) = \sum_{j \in s} \widehat{\mathcal{W}}_j(p) y_{(j)}.\) For unweighted data, the function returns the Harrell-Davis quantile estimator.
References
Harrell FE, Davis CE (1982). “A new distribution-free quantile estimator.” Biometrika, 69, 635–640.
Hyndman RJ, Fan Y (1996). “Sample quantiles in statistical packages.” The American Statistician, 50, 361–365.
Kreutzmann AK (2018). “Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions.” AStA Wirtschafts-und Sozialstatistisches Archiv, 12, 245–270.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024 .
Examples
data(synthouse)
y <- synthouse$eq_income
w <- synthouse$weight
# Unweighted quantiles
csquantile(y, probs = c(0.25, 0.5, 0.75), type = 6)
#> 25% 50% 75%
#> 12910.48 20429.20 32529.02
# Weighted quantiles
csquantile(y, weights = w, probs = c(0.25, 0.5, 0.75), type = 6)
#> 25% 50% 75%
#> 12353.29 20014.17 32222.93
# Harrell-Davis estimator
csquantile(y, weights = w, probs = c(0.25, 0.5, 0.75), type = "HD")
#> 25% 50% 75%
#> 12352.14 20017.37 32213.50