
Computes the Gini coefficient from grouped income data based on linear interpolation of income shares.

Usage

gini_grouped(Y, freq)

Arguments

Y

Numeric vector of total amounts per group (e.g., total income per income class).

freq

Numeric vector of frequencies per group or class.

Value

A numeric value: the Gini coefficient estimated from the grouped data. The Gini coefficient ranges from 0 (perfect equality) to 1 (complete inequality). The estimate assumes that all observations within a group have the same value.

Details

Consider grouped data divided into \(J\) classes with known boundaries, observed frequencies \(f_1, \ldots, f_J\) and total amounts \(Y_1, \ldots, Y_J\). The Gini coefficient is approximated by linear interpolation of cumulative shares, as:

$$G \approx 1 - \sum_{j=1}^{J} (s_j + s_{j-1})(u_j - u_{j-1})$$

where:

  • \(p_j = f_j / \sum_{i=1}^{J} f_i\) is the population share of group \(j\);

  • \(c_j = Y_j / \sum_{i=1}^{J} Y_i\) is the share of the variable of interest in group \(j\);

  • \(s_j = \sum_{k=1}^{j} c_k\) is the cumulative share of the variable up to group \(j\);

  • \(u_j = \sum_{k=1}^{j} p_k\) is the cumulative population share up to group \(j\);

  • \(s_0 = u_0 = 0\) by convention.
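The formula can be checked with a short base-R sketch. The name `gini_grouped_sketch()` is hypothetical and used only for illustration: it is an assumed re-implementation of the formula above, not the package's source code. It prepends \(s_0 = u_0 = 0\), builds the cumulative shares with `cumsum()`, and takes the groups in the order supplied, exactly as the formula indexes them. Applied to the data from the Examples section, it reproduces the value reported there.

```r
# Hypothetical re-implementation of the Details formula (illustration only,
# not the package source).
gini_grouped_sketch <- function(Y, freq) {
  stopifnot(length(Y) == length(freq), all(freq > 0), sum(Y) > 0)
  J <- length(Y)
  u <- c(0, cumsum(freq) / sum(freq))  # u_0 = 0, u_1, ..., u_J
  s <- c(0, cumsum(Y) / sum(Y))        # s_0 = 0, s_1, ..., s_J
  # sum over j = 1..J of (s_j + s_{j-1}) * (u_j - u_{j-1})
  1 - sum((s[-1] + s[-(J + 1)]) * diff(u))
}

income_freq <- c(1200, 1800, 1500, 800, 400, 20, 10)
income_tot  <- c(18800, 16300, 44700, 33900, 21500, 22100, 98300)
gini_grouped_sketch(Y = income_tot, freq = income_freq)
#> [1] 0.6201874
```

A single group, or groups that all share the same mean, give a value of 0, as expected under perfect equality between groups.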

This formula computes twice the area between the line of equality (the egalitarian line) and the Lorenz curve obtained by linearly interpolating the points \((u_j, s_j)\). Because it assumes that all observations within a group have identical values, it gives a lower bound on the true Gini coefficient: actual inequality may be larger (Jorda et al. 2021). The size of the bias depends on the number of groups and on how they are defined.
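Both claims in the paragraph above, the area interpretation and the lower-bound property, can be illustrated on a small set of made-up individual incomes. The helpers `gini_area()` and `gini_individual()` are hypothetical: the first computes twice the area between the diagonal and the interpolated Lorenz curve using the trapezoid rule; the second computes the individual-level Gini from the mean absolute difference (population form, with an \(n^2\) denominator). Collapsing the same six values into fewer groups lowers the estimate, while one observation per group recovers the individual-level value.

```r
# Hypothetical helper: twice the area between the equality line and the
# piecewise-linear Lorenz curve through (0, 0), (u_1, s_1), ..., (u_J, s_J).
gini_area <- function(Y, freq) {
  u <- c(0, cumsum(freq) / sum(freq))
  s <- c(0, cumsum(Y) / sum(Y))
  # Trapezoid rule for the area under the interpolated Lorenz curve
  area_lorenz <- sum(diff(u) * (head(s, -1) + tail(s, -1)) / 2)
  2 * (0.5 - area_lorenz)
}

# Hypothetical helper: individual-level Gini via the mean absolute difference
gini_individual <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

x <- c(1, 2, 3, 4, 10, 20)                     # made-up incomes, sorted
gini_individual(x)                             # no grouping
#> [1] 0.5
gini_area(Y = x, freq = rep(1, 6))             # one observation per group
#> [1] 0.5
gini_area(Y = c(3, 7, 30), freq = c(2, 2, 2))  # three groups of two
#> [1] 0.45
gini_area(Y = c(6, 34), freq = c(3, 3))        # two groups of three
#> [1] 0.35
```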

References

Jorda V, Sarabia JM, Jäntti M (2021). “Inequality measurement with grouped data: Parametric and non-parametric methods.” Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 964–984.

See also

qri_grouped() for computing the quantile ratio index from grouped data.

Other grouped data functions: qri_grouped(), quantile_grouped()

Examples

income_freq <- c(1200, 1800, 1500, 800, 400, 20, 10)
income_tot <- c(18800, 16300, 44700, 33900, 21500, 22100, 98300)

gini_grouped(Y = income_tot, freq = income_freq)
#> [1] 0.6201874