Influence Function for Quantiles

Computes the influence function of sample quantiles in a finite-population setting, for both simple random sampling and complex survey designs with sampling weights. See Hampel et al. (1986) for an introduction to the influence function and Deville (1999) for its definition in finite-population theory.

Usage

if_quantile(y, weights = NULL, probs, type = 6, na.rm = TRUE)

Arguments

y: A numeric vector of data values
weights: A numeric vector of sampling weights (optional)
probs: A numeric value specifying the probability for the quantile (e.g., 0.5 for median)
type: Quantile estimation type: integer 4–9 or "HD" for Harrell–Davis (default: 6). See csquantile.
na.rm: Logical, should missing values be removed? (default: TRUE)

Value

A numeric vector containing the estimated influence function values for each observation.

Details

Following Van der Vaart (2000) and Osier (2009), the population influence function of the quantile $Q(p)$ for unit $k$ is defined as:

$$IF(Q(p))_k = \frac{p - \mathbf{1}(y_k \leq Q(p))}{f(Q(p)) \, N},$$

where $f(Q(p))$ is the population density function evaluated at the quantile and $N$ is the population size.

In the sample, this is estimated as:

$$\widehat{IF}(Q(p))_k = \frac{p - \mathbf{1}(y_k \leq \widehat{Q}(p))}{\widehat{f}(\widehat{Q}(p)) \, \widehat{N}},$$

where $\widehat{Q}(p)$ is the weighted sample quantile estimated by csquantile(), and $\widehat{N} = \sum_{i \in s} w_i$ is the estimated population size.
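As a quick consistency check on this estimator (a worked step that follows from the formula above, not a statement about the package's internals), the weighted sum of the estimated influence values can be written in terms of the weighted empirical distribution function $\widehat{F}(y) = \widehat{N}^{-1} \sum_{k \in s} w_k \mathbf{1}(y_k \leq y)$, a symbol introduced here only for illustration:

$$\sum_{k \in s} w_k \, \widehat{IF}(Q(p))_k = \frac{p \, \widehat{N} - \sum_{k \in s} w_k \mathbf{1}(y_k \leq \widehat{Q}(p))}{\widehat{f}(\widehat{Q}(p)) \, \widehat{N}} = \frac{p - \widehat{F}(\widehat{Q}(p))}{\widehat{f}(\widehat{Q}(p))}.$$

Because $\widehat{F}(\widehat{Q}(p))$ is close to $p$ by construction of the quantile, this weighted sum should be close to zero. How close depends on the quantile type, since the interpolating types do not invert $\widehat{F}$ exactly.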

The density $\widehat{f}(y)$ is estimated with a weighted Gaussian kernel density estimator:

$$ \widehat{f}(y) = \frac{1}{\widehat{N} \, h \sqrt{2\pi}} \sum_{j \in s} w_j \exp\!\left\{ -\frac{(y - y_j)^2}{2h^2} \right\}, $$

with bandwidth $h = 0.79 \cdot IQR \cdot \widehat{N}^{-1/5}$, where $IQR$ is the interquartile range of the data.
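The formulas above can be sketched in a few lines of base R. This is an illustrative sketch under stated assumptions, not the package's implementation: `wtd_quantile_sketch()` and `if_quantile_sketch()` are hypothetical names, the quantile is taken as a plain inverse of the weighted empirical distribution function rather than through `csquantile()`, and the IQR is assumed to come from the weighted quartiles.

```r
# Illustrative sketch only; not the package's internal code.

# Hypothetical helper: weighted quantile as the inverse of the weighted ECDF.
# Stand-in for csquantile(); its interpolation rules are NOT reproduced here.
wtd_quantile_sketch <- function(y, w, p) {
  o <- order(y)
  cum_w <- cumsum(w[o]) / sum(w)
  y[o][which(cum_w >= p)[1]]
}

# Hypothetical function: estimated influence function of the p-quantile.
if_quantile_sketch <- function(y, w = rep(1, length(y)), p) {
  N_hat <- sum(w)                         # estimated population size
  q_hat <- wtd_quantile_sketch(y, w, p)   # estimated quantile

  # Assumption: the IQR is computed from the weighted quartiles.
  iqr <- wtd_quantile_sketch(y, w, 0.75) - wtd_quantile_sketch(y, w, 0.25)
  h <- 0.79 * iqr * N_hat^(-1 / 5)        # bandwidth

  # Weighted Gaussian kernel density estimate evaluated at q_hat
  f_hat <- sum(w * exp(-(q_hat - y)^2 / (2 * h^2))) /
    (N_hat * h * sqrt(2 * pi))

  # One influence value per observation
  (p - as.numeric(y <= q_hat)) / (f_hat * N_hat)
}

set.seed(1)
y <- rlnorm(30, 9, 0.7)
IF_sketch <- if_quantile_sketch(y, p = 0.5)
```

For a fixed `p`, the result takes only two distinct values: one for observations at or below the estimated quantile and one for those above it. Results from `if_quantile()` itself may differ numerically, since the quantile type and the IQR convention used here are assumptions.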

References

Hampel FR, Ronchetti E, Rousseeuw P, Stahel W (1986). Robust statistics: the approach based on influence functions. John Wiley & Sons.

Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey methodology, 25, 193–204.

Van der Vaart AW (2000). Asymptotic statistics, volume 3. Cambridge University Press.

Osier G (2009). “Variance estimation for complex indicators of poverty and inequality using linearization techniques.” Survey Research Methods, 3, 167–195.

Examples


# On synthetic data
eq_synth <- rlnorm(30, 9, 0.7)
IF_synth <- if_quantile(y = eq_synth, probs = 0.3)

# On the synthouse data set, with sampling weights
data(synthouse)
eq <- synthouse$eq_income[1:30] # First 30 observations
w <- synthouse$weight[1:30]
IF_quantile <- if_quantile(y = eq, weights = w, type = 6, probs = 0.5)