Computes the Gini coefficient from grouped income data by linear interpolation of cumulative income shares.
Value
A numeric value: the Gini coefficient estimated from the grouped data. The Gini coefficient ranges from 0 (perfect equality) to 1 (complete inequality). The estimate assumes that all observations within a group have the same value.
Details
Consider grouped data divided into \(J\) classes with known boundaries, observed frequencies \(f_1, \ldots, f_J\) and total amounts \(Y_1, \ldots, Y_J\). The Gini coefficient is approximated by linear interpolation of cumulative shares:
$$G \approx 1 - \sum_{j=1}^{J} (s_j + s_{j-1})(u_j - u_{j-1})$$
where:
\(p_j = f_j / \sum_{i=1}^{J} f_i\) is the population share of group \(j\);
\(c_j = Y_j / \sum_{i=1}^{J} Y_i\) is the share of the variable of interest in group \(j\);
\(s_j = \sum_{k=1}^{j} c_k\) is the cumulative share of the variable up to group \(j\);
\(u_j = \sum_{k=1}^{j} p_k\) is the cumulative population share up to group \(j\);
\(s_0 = u_0 = 0\) by convention.
This formula computes twice the area between the line of perfect equality and the Lorenz curve obtained by linearly interpolating the points \((u_j, s_j)\). Because it treats all observations within a group as having identical values, it gives a lower-bound estimate of the true Gini coefficient: actual inequality may be larger (Jorda et al. 2021).
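The formula can be sketched in a few lines of code. The following is a minimal Python illustration, not this package's implementation: the name `gini_grouped_sketch` and its arguments `freqs` and `totals` are hypothetical, and sorting the groups by mean amount is an added assumption so that the points \((u_j, s_j)\) trace a Lorenz curve even if the classes are passed out of order.

```python
def gini_grouped_sketch(freqs, totals):
    """Trapezoid-rule Gini from group frequencies f_j and group totals Y_j.

    Hypothetical helper for illustration only.
    """
    if len(freqs) != len(totals):
        raise ValueError("freqs and totals must have the same length")
    if any(f <= 0 for f in freqs):
        raise ValueError("all frequencies must be positive")

    # Assumption: order groups by mean amount (Y_j / f_j), poorest first.
    groups = sorted(zip(freqs, totals), key=lambda g: g[1] / g[0])
    n = sum(f for f, _ in groups)   # total population
    y = sum(t for _, t in groups)   # total amount

    u_prev = s_prev = 0.0           # u_0 = s_0 = 0 by convention
    twice_area = 0.0                # sum of (s_j + s_{j-1}) (u_j - u_{j-1})
    for f, t in groups:
        u = u_prev + f / n          # cumulative population share u_j
        s = s_prev + t / y          # cumulative amount share s_j
        twice_area += (s + s_prev) * (u - u_prev)
        u_prev, s_prev = u, s
    return 1.0 - twice_area


# Three classes: 50, 30 and 20 units holding totals of 100, 150 and 250.
# Cumulative points are (0.5, 0.2), (0.8, 0.5), (1, 1), so
# G = 1 - (0.10 + 0.21 + 0.30), approximately 0.39.
print(round(gini_grouped_sketch([50, 30, 20], [100, 150, 250]), 4))
```

When every group has the same mean, the interpolated Lorenz curve coincides with the line of perfect equality and the sketch returns 0.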
Note
Important limitations:
This is a lower-bound approximation: the true Gini coefficient may be higher because of within-group inequality.
The size of the bias depends on the number of groups and on how they are defined.
Groups should ideally be defined to maximize within-group homogeneity.
References
Jorda V, Sarabia JM, Jäntti M (2021). “Inequality measurement with grouped data: Parametric and non-parametric methods.” Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 964–984.
See also
qri_grouped() for computing the quantile ratio index from grouped data.
Other grouped data functions:
qri_grouped(),
quantile_grouped()