Computes the influence function of sample quantiles in a finite-population setting, under either simple random sampling or a complex survey design with sampling weights. See Hampel et al. (1986) for an explanation of the influence function and Deville (1999) for its definition in finite population theory.
Arguments
- y
A numeric vector of data values
- weights
A numeric vector of sampling weights (optional)
- probs
A numeric value specifying the probability for the quantile (e.g., 0.5 for median)
- type
Quantile estimation type: integer 4-9 or "HD" for Harrell-Davis (default: 6)
- na.rm
Logical, should missing values be removed? (default: TRUE)
Value
A numeric vector containing the estimated quantile influence function value for each observation.
Details
Following the definition in Van der Vaart (2000) and Osier (2009), the population influence function of the quantile \(Q(p)\) is:
$$IF(Q(p))_i = \frac{p - \mathbf{1}(y_i \leq Q(p))}{f(Q(p)) \, N},$$
where \(f(Q(p))\) is the population density function evaluated at the quantile and \(N\) is the population size.
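As a worked reading of the formula above: because the indicator takes only the values 0 and 1, each unit receives one of two influence values, depending on which side of the quantile it falls:
$$IF(Q(p))_i = \begin{cases} \dfrac{p - 1}{f(Q(p)) \, N}, & y_i \leq Q(p), \\[1ex] \dfrac{p}{f(Q(p)) \, N}, & y_i > Q(p). \end{cases}$$
For the median (\(p = 0.5\)), the two values are equal in magnitude and opposite in sign.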
In the sample, this is estimated as:
$$\widehat{IF}(Q(p))_i = \frac{p - \mathbf{1}(y_i \leq \widehat{Q}(p))}{\widehat{f}(\widehat{Q}(p)) \, \widehat{N}},$$
where \(\widehat{Q}(p)\) is the weighted sample quantile estimated by csquantile(), and \(\widehat{N} = \sum_{i \in s} w_i\) is the estimated population size.
The density \(\widehat{f}(y)\) is estimated using a Gaussian kernel density function:
$$ \widehat{f}(y) = \frac{1}{\widehat{N} \, h \sqrt{2\pi}} \sum_{j \in s} w_j \exp\!\left\{ -\frac{(y - y_j)^2}{2h^2} \right\}, $$
with bandwidth \(h = 0.79 \cdot IQR \cdot \widehat{N}^{-1/5}\), where \(IQR\) denotes the interquartile range.
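The formulas above can be put together in a few lines of base R. The following is a minimal sketch, not the package's implementation: `if_quantile_sketch()` is a hypothetical helper, the quantile estimate `q_hat` is passed in by the caller rather than computed with csquantile(), and the IQR is assumed to be the unweighted sample interquartile range from `stats::IQR()` (the package may compute it differently).

```r
# Hypothetical helper for illustration only; not part of the package.
if_quantile_sketch <- function(y, p, q_hat, w = rep(1, length(y))) {
  # Estimated population size: sum of the sampling weights
  N_hat <- sum(w)
  # Bandwidth h = 0.79 * IQR * N_hat^(-1/5)
  # Assumption: unweighted sample IQR
  h <- 0.79 * stats::IQR(y) * N_hat^(-1 / 5)
  # Weighted Gaussian kernel density estimate evaluated at q_hat
  f_hat <- sum(w * exp(-(q_hat - y)^2 / (2 * h^2))) /
    (N_hat * h * sqrt(2 * pi))
  # Estimated influence value for each observation
  (p - as.numeric(y <= q_hat)) / (f_hat * N_hat)
}

# Usage with equal weights; the median is taken from stats::quantile()
# as a stand-in for the package's quantile estimator
set.seed(1)
y <- rlnorm(30, 9, 0.7)
q_hat <- unname(stats::quantile(y, probs = 0.5, type = 6))
IF_sketch <- if_quantile_sketch(y, p = 0.5, q_hat = q_hat)
```

With equal weights and \(p = 0.5\), observations at or below the estimated median receive a negative value and those above it a positive value of the same magnitude.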
References
Hampel FR, Ronchetti E, Rousseeuw P, Stahel W (1986). Robust statistics: the approach based on influence functions. John Wiley & Sons.
Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey Methodology, 25, 193–204.
Van der Vaart AW (2000). Asymptotic statistics, volume 3. Cambridge University Press.
Osier G (2009). “Variance estimation for complex indicators of poverty and inequality using linearization techniques.” Survey Research Methods, 3, 167–195.
Examples
# On synthetic data
eq_synth <- rlnorm(30, 9, 0.7)
IF_synth <- if_quantile(y = eq_synth, probs = 0.3)
# On real data
data(synthouse)
eq <- synthouse$eq_income[1:30] ## Take some observations (as example)
w <- synthouse$weight[1:30]
IF_quantile <- if_quantile(y = eq, weights = w, type = 6, probs = 0.5)