Computes the Gini coefficient from grouped income data based on linear interpolation of income shares.

Usage

gini_grouped(Y, freq)

Arguments

Y

Numeric vector of total amounts per group. Each element represents the sum of the variable of interest (e.g. income, wealth, consumption) for all individuals or units in that group.

freq

Numeric vector of frequencies (number of individuals or units) per group. Must have the same length as Y.

Value

A numeric value: the Gini coefficient estimated from the grouped data. The Gini coefficient ranges from 0 (perfect equality) to 1 (complete inequality). The estimate assumes perfect equality within each group.

Details

Consider grouped data divided into \(J\) classes with known boundaries, observed frequencies \(f_1, \ldots, f_J\), and total amounts \(Y_1, \ldots, Y_J\). The Gini coefficient is approximated by linear interpolation of the cumulative shares:

$$G \approx 1 - \sum_{j=1}^{J} (s_j + s_{j-1})(u_j - u_{j-1})$$

where:

  • \(p_j = f_j / \sum_{i=1}^{J} f_i\) is the population share of group \(j\);

  • \(c_j = Y_j / \sum_{i=1}^{J} Y_i\) is the share of the variable of interest in group \(j\);

  • \(s_j = \sum_{k=1}^{j} c_k\) is the cumulative share of the variable up to group \(j\);

  • \(u_j = \sum_{k=1}^{j} p_k\) is the cumulative population share up to group \(j\);

  • \(s_0 = u_0 = 0\) by convention.

This formula computes twice the area between the egalitarian line (perfect equality) and the Lorenz curve obtained by linearly interpolating the points \((u_j, s_j)\). Because it assumes that all observations within a group have identical values, it provides a lower-bound estimate of the true Gini coefficient: actual inequality may be larger (Jorda et al. 2021).
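As an illustration, the formula above can be transcribed almost line by line into base R. The helper name `gini_grouped_sketch` is hypothetical and is not the package's implementation; this sketch assumes the groups are used in the order supplied and performs only a minimal length check.

```r
# Hypothetical helper (not the package's code): a direct transcription
# of the formula in Details, taking the groups in the order given.
gini_grouped_sketch <- function(Y, freq) {
  stopifnot(length(Y) == length(freq))
  p <- freq / sum(freq)             # population shares p_j
  c_share <- Y / sum(Y)             # shares of the variable c_j
  u <- cumsum(p)                    # cumulative population shares u_j
  s <- cumsum(c_share)              # cumulative shares s_j
  u_prev <- c(0, u[-length(u)])     # u_{j-1}, with u_0 = 0
  s_prev <- c(0, s[-length(s)])     # s_{j-1}, with s_0 = 0
  1 - sum((s + s_prev) * (u - u_prev))
}

# Same data as in the Examples section
income_freq <- c(1200, 1800, 1500, 800, 400, 20, 10)
income_tot <- c(18800, 16300, 44700, 33900, 21500, 22100, 98300)
g <- gini_grouped_sketch(Y = income_tot, freq = income_freq)
round(g, 4)
#> [1] 0.6202
```

On the data from the Examples section, the sketch agrees with the value reported there.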

Note

Important limitations:

  • This is a lower-bound approximation. The true Gini coefficient may be higher because of within-group inequality.

  • The bias magnitude depends on the number of groups and how they are defined.

  • Groups should ideally be defined to maximize within-group homogeneity.
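The lower-bound behaviour can be illustrated with a small sketch in base R. The individual values below are made up for illustration, and `lorenz_gini` is a hypothetical helper that applies the interpolation formula from Details; treating each individual as a group of size one gives the Gini coefficient of the ungrouped data.

```r
# Hypothetical helper: Gini from linear interpolation of the Lorenz curve,
# with groups taken in the order given (assumed sorted by increasing mean).
lorenz_gini <- function(Y, freq) {
  u <- cumsum(freq) / sum(freq)     # cumulative population shares
  s <- cumsum(Y) / sum(Y)           # cumulative shares of the variable
  1 - sum((s + c(0, s[-length(s)])) * (u - c(0, u[-length(u)])))
}

# Made-up individual incomes, sorted in increasing order
x <- c(5, 8, 12, 20, 35, 60, 110, 250)

# Collapse into four contiguous groups of two individuals each
group <- rep(1:4, each = 2)
Y <- as.numeric(tapply(x, group, sum))        # total amount per group
freq <- as.numeric(tapply(x, group, length))  # number of units per group

g_micro <- lorenz_gini(x, rep(1, length(x)))  # every unit is its own group
g_grouped <- lorenz_gini(Y, freq)             # within-group inequality ignored

round(c(micro = g_micro, grouped = g_grouped), 3)
#>   micro grouped
#>   0.596   0.552
```

In this toy case, grouping lowers the estimate from 0.596 to 0.552, because the differences within each pair of individuals are averaged away.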

References

Jorda V, Sarabia JM, Jäntti M (2021). “Inequality measurement with grouped data: Parametric and non-parametric methods.” Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 964–984.

See also

qri_grouped for computing the quantile ratio index from grouped data.

Other grouped data functions: qri_grouped(), quantile_grouped()

Examples

income_freq <- c(1200, 1800, 1500, 800, 400, 20, 10)
income_tot <- c(18800, 16300, 44700, 33900, 21500, 22100, 98300)

gini_grouped(Y = income_tot, freq = income_freq)
#> [1] 0.6201874