1 Introduction

Although there is broad consensus on the need to broaden the scope of the analysis of Well-being beyond the monetary dimension (see, e.g., the influential report by Stiglitz et al. 2010), there is no equal agreement on how such an ambitious task should be operationalized. It is well known that subjectivity and arbitrariness exist with respect to the choice of the dimensions to be included in the composite index, the normalisation of the variables, and the characterisation of the aggregation function (see, e.g., Ravallion 2012a; Decancq and Lugo 2013; Martinetti and von Jacobi 2012).Footnote 1 The socio-economic literature has highlighted that no unanimous method exists to perform such choices, pointing out numerous theoretical issues (Stiglitz et al. 2010; Ravallion 2011, 2012a; Klugman et al. 2011; Maggino and Nuvolati 2012; Decancq and Lugo 2013) and testing empirical robustness (Kasparian and Rolland 2012; Lefebvre et al. 2010; Saisana et al. 2005; Ravallion 2012b). Yet, although there may be no “absolute cure” for multidimensional evaluations, a good practice could consist in enhancing methodological transparency (Sen and Anand 1997).

While the major focus of the recent literature has been devoted to the choice of the dimensions’ weights, few studies have concentrated on the role played by normalisation in influencing the final results (Lefebvre et al. 2010; Saisana et al. 2005). Our contribution highlights that, in fact, normalisation is a crucial stage where an “early” implicit weighting takes place, which can strongly affect the overall results of the multidimensional analysis. We show that, since no golden rules exist on how a normalisation function should be selected and characterised, different strategies, all acceptable a-priori, can lead to very different weighting structures and, ultimately, opposite results for the composite indicator. Therefore, the unavoidable arbitrariness regarding the choice of the normalisation function, as well as its methodological justification, should be made as transparent as possible.

To illustrate these points, in this paper we will build a composite measure of Social Inclusion for 63 European administrative regions from 2004 to 2012, using data from EUROSTAT. The aggregation framework is a CES function (constant elasticity of substitution), and the selection of variables follows the relevant literature on this topic (stemming from Atkinson et al. 2002).Footnote 2

In this analysis, we adopt a baseline linear aggregation model where the normalised components have equal weights and we look at what happens to the aggregate measure of Social Inclusion when only the normalisation function changes. In particular, we apply the widely used data-driven min–max normalisation strategy, whose parameters depend on the available data. This data-driven function generates implicit trade-offs (between the index’s components) and shadow prices with weak economic justification. We also propose a novel strategy, an expert-based min–max function, whose parameters are grounded on the responses to a survey conducted on a population of 149 professors of Economics or Management at the Ca’ Foscari University of Venice.

Our results indicate that, even within a simple-average framework, changing the normalisation function substantially affects the relative relevance of each component of the aggregate measure. As a consequence, significant differences emerge in the levels and rankings of regional Social Inclusion in Europe, leading to very different policy implications. The data-driven strategy softens the heterogeneities within and between European countries by putting a substantial weight on the longevity variable rather than on educational and economic statuses. As a result, the European regional distribution of Inclusion appears to be uni-modal around the mean. Conversely, the expert-based normalisation emphasises the unemployment and the school-dropouts variables, and returns a bi-modal distribution of Social Inclusion. We, thus, discuss how the different premises of the two strategies characterise the interpretation of the results: the data-driven approach allows for a positive interpretation of the index, while the survey-driven approach allows for a normative one. In other words, if the index’ intrinsic trade-offs are grounded on statistical terms, its results should be interpreted accordingly.

The remainder of the paper is organized as follows. Section 2 briefly describes the concept of Social Inclusion and the data. Section 3 sets a standard framework for multidimensional aggregation and details the baseline model. Section 4 introduces the normalisation strategies, while Sect. 5 discusses the implicit trade-offs resulting from applying the aforementioned normalisation functions to the baseline model. Section 6 details the results of the Social Inclusion indices, and Sect. 7 concludes.

2 Social Inclusion, Definition and Sample Selection

Social inclusion (as its corresponding opposite, social exclusion)Footnote 3 is one of the five priorities selected by the European Commission in the context of the Europe 2020 Strategy. A definition of Exclusion was already drawn in December 1992 by the Commission of the European Communities (European Communities Commission 1992): “Social exclusion is a multidimensional phenomenon stemming from inadequacies or weaknesses in the services offered and policies pursued in these various policy areas. Such insufficiencies and weaknesses often combine to affect both people and regions via cumulative and interdependent processes of such a nature that it would be futile to try to combat exclusion by tackling only one of its dimensions. More clearly than the concept of poverty, (…) it states out the multidimensional nature of the mechanisms whereby individuals and groups are excluded from taking part in the social exchanges, from the component practices and rights of social integration and of identity”.

The Laeken European Council in 2001 developed a set of unanimously agreed indicators that could capture the multifaceted aspects and outcomes of Social Inclusion, thus providing reliable and comparable data to monitor the social and economic conditions of European citizens (European Council 2001), through the Open Method of Coordination. In particular, four basic dimensions have been identified: the level and distribution of income, the performance in the labour market, education and health. For each of them, a set of primary indicators was adopted: income (Poverty rate (after social transfers), Persistent risk-of-poverty rate, Relative median at risk-of-poverty gap, Inequality of income distribution); labour market (Long-term unemployment, Regional cohesion, Persons living in jobless households); education (early school leavers); health (life expectancy at birth, Self-defined health status by income level).Footnote 4

The target of this paper is to build an aggregate index of social exclusion at administrative-regional level in Europe. We choose administrative regions as the main territorial unit of this analysis, with the aim of capturing higher variability than can be inferred from aggregate national data. Data availability is often mentioned as a serious constraint for analyses which focus on a wide set of countries over a long time period (Lefebvre et al. 2010; Martinetti and von Jacobi 2012). In the context of social exclusion at administrative regional level, we are able to gather data for four of the ten aforementioned indicators, one per dimension: poverty rate, long-term unemployment, early school leavers and life expectancy at birth. Our data source is the on-line Eurostat Regional Database 2015, and the longest data interval available for all four variables spans from 2004 to 2012, for 63 administrative regions in five countries (Belgium, Denmark, Germany, Italy and Spain), even though data for Denmark are not available for 2006 and earlier. For other countries (e.g., Greece, France, Czech Republic and Norway), data were either not available for all the indicators, or available for statistical regions but not for administrative regions.Footnote 5

As argued in Lefebvre et al. (2010), “these indicators cover the most relevant concerns of a modern welfare state, also reflecting aspects that people who want to enlarge the concept of GDP to better measure social welfare generally take into account”. The latter paper discusses, as do Atkinson et al. (2004), the limitations of these data and the simplifying assumptions that have to be made when translating a complex multidimensional phenomenon like social exclusion into empirical terms. Table 1 provides a brief definition for our four variables:

Table 1 Variable definitions

The following table and figure report descriptive statistics on the four indicators in our sample (Table 2).

Table 2 Descriptive statistics for four indicators of social inclusion, years 2004–2012

Appendix A includes further descriptive statistics on correlations and data-distribution of the selected variables in the sample-data.

3 Aggregation Framework

Let us consider m dimensions (hereinafter also variables, attributes, components) of Social Inclusion, observed for n regions. For a generic region i we can therefore build the vector \(\mathbf{x}^{i}=(x_{1}^{i},\ldots ,x_{m}^{i})\), while \(\mathbf{X}\in {\mathbb {R}}^{n\times m}\) is the distribution matrix of m attributes for n regions. To retrieve an aggregated measure for region i, we consider the function F defined as:

$$\begin{aligned} {F^i}\left( {v\left( {{\mathbf{{x}}^i}} \right) } \right) = {\left[ {{w_\mathrm{{1}}}{v_\mathrm{{1}}}{{\left( {x_1^i} \right) }^\beta } + \cdots + {w_m}{v_m}{{\left( {x_m^i} \right) }^\beta }} \right] ^{1/\beta }} \end{aligned}$$
(1)

which is often referred to as a constant elasticity of substitution (CES) function, or a generalized mean of order \(\beta \). Its arguments are the elements \(v_{1},\ldots ,v_{m}\), which are transformations of the original variables \(x_{1},\ldots ,x_{m}\) (defined hereafter). The function F is non-decreasing, separable, weakly scale-invariant and homogeneous of degree one in its arguments v; we refer to Blackorby and Donaldson (1982) and Decancq and Lugo (2008, 2009) for an analytic characterization of these properties.

The parameters \(w_{1},\ldots ,w_{m}\), the weights of the normalised dimensions v, are non-negative and sum to one.

Provided that a choice of the m dimensions has been performed, the main methodological task is now the selection of the set of functions \(v_{1},\ldots ,v_{m}\), of the parameters \(w_{1},\ldots ,w_{m}\), as well as of \(\beta \).
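The aggregation in (1) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `ces_aggregate` and the numerical scores are hypothetical, and the limiting case \(\beta \rightarrow 0\) is deliberately left out.

```python
from typing import Sequence


def ces_aggregate(v: Sequence[float], w: Sequence[float], beta: float) -> float:
    """Generalised mean of order beta of normalised values v with weights w, as in Eq. (1)."""
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    if any(wj < 0 for wj in w) or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    if beta == 0:
        # The limiting case beta -> 0 is not covered by this sketch.
        raise ValueError("beta must be non-zero")
    # For beta < 0 every normalised value must be strictly positive.
    return sum(wj * vj ** beta for vj, wj in zip(v, w)) ** (1.0 / beta)


# Hypothetical normalised scores of one region on m = 4 dimensions.
scores = [40.0, 60.0, 80.0, 100.0]
equal_weights = [0.25] * 4

linear_index = ces_aggregate(scores, equal_weights, beta=1.0)     # simple average
harmonic_index = ces_aggregate(scores, equal_weights, beta=-1.0)  # lower beta, less substitutability
print(linear_index, harmonic_index)
```

With \(\beta =1\) the function returns the simple weighted average; a lower \(\beta \) returns a lower index for the same unbalanced profile, since shortfalls in one dimension are compensated less easily.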

3.1 Baseline Linear Aggregation Model

The parameter \(\beta \) in (1) determines the elasticity of substitution \(\upvarepsilon _{\mathrm {k,j}}\) between any pair (\(v_{k}, v_{j}\)). In the CES function, the elasticity between any pair k, j is constant and equal to \(1/(1-\beta )\). The elasticity of substitution determines the percentage change in \(v_{j}/v_{k}\) which would result from a one-percent change in the slope along a level set (the marginal rate of substitution, MRS, along an indifference curve). The parameter \(\beta \) must be lower than one to generate iso-inclusion contours convex to the origin in the two-dimensional region of the space of attributes (Bourguignon and Chakravarty 2003). The smaller \(\beta \) is, the larger the increase in dimension \(v_{j}\) needed to keep the overall index constant after a one-unit decrease in dimension \(v_{k}\).

Since the focus of this paper is on the normalisation choices, let us adopt a standard aggregation framework by setting \(\beta =1\) in (1), therefore obtaining a linear weighted average with linear indifference curves, constant MRS and infinite elasticity of substitution between pairs of normalised dimensions. We also let the weights \(w_{j}\) be equal, i.e., \(w_{1} = \cdots = w_{m} =\,\)1/m \(\,=\,\)1/4 (since \(m=\,\)4 in our case study). The resulting model will be, for a generic region i (time subscripts are omitted) an aggregation function L, as in linear, such as:

$$\begin{aligned} L^{i}\left( {\nu \left( {\mathbf{x}}^{i} \right) } \right) =\frac{1}{m}\nu _{1} \left( {x_{1}^{i} } \right) +\cdots +\frac{1}{m}\nu _{m} \left( {x_{m}^{i} } \right) \end{aligned}$$
(2)

The arbitrary choice of setting equal weights is a widely adopted strategy in the literature of multidimensional measurement. As Hoskins and Mascherini (2009) and Decancq and Lugo (2013) highlight, this approach is often justified with the argument that all the dimensions are equally important (Atkinson et al. 2002) or, conversely, that there is insufficient knowledge for setting a more detailed weighting scheme (sometimes referred to as an “agnostic view”). Although being frequently described as a simple and relatively neutral strategy, “equal weighting” does not mean “no weighting”, because it involves an implicit judgment on the weights being equal, and because it often applies just on the normalised dimensions of the index.Footnote 6

In the following Sections, we will investigate how original attributes contribute to the overall measure, and what characterizes the relationship between attributes within the linear framework. In general, such effects can be retrieved via the partial derivative of the aggregate measure L with respect to variable \(x_{j}\) (region-specific indices are dropped for convenience), as follows:

$$\begin{aligned} \frac{\partial L\left( {v\left( {\mathbf{x}} \right) } \right) }{\partial x_{j} }=w_{j} {v}'_{j} \left( {x_{j} } \right) \end{aligned}$$
(3)

From (3) we can identify two main drivers that determine how the aggregate measure L reacts to small changes in the j-th real-valued dimension \(x_{j}\). First, the higher the weight of the normalised j-th dimension, the higher the marginal variation in L. Second, the steeper the normalisation function, the larger the effect of a change in the j-th dimension on the aggregate measure.

Within the linear aggregation function L, the marginal rate of substitution between a pair of observed-indicators \(x_{j}\) and \(x_{k}\) is:

$$\begin{aligned} MRS_{x_{k},x_{j} } =-\frac{dx_{j} }{dx_{k} }=\frac{\frac{\partial L\left( {v\left( {\mathbf{x}} \right) } \right) }{\partial x_{k} }}{\frac{\partial L\left( {v\left( {\mathbf{x}} \right) } \right) }{\partial x_{j} }}=\frac{w_{k} }{w_{j} }\frac{{v}'_{k} \left( {x_{k} } \right) }{{v}'_{j} \left( {x_{j} } \right) } \end{aligned}$$
(4)

Both the marginal contribution of the j-th attribute and its MRS depend on the shape of the normalisation function \(v_{j}\). If, however, the transformation function is the identity function (\(v_{j}(x_{j}) = x_{j})\), the effect of a change in \(x_{j}\) can be uniquely determined by its weight \(w_{j}\), while the MRS between a pair of dimensions j and k is determined by the ratio between their weights.Footnote 7
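As a worked illustration, consider the case of an affine transformation. The slope \(c_{j}\) and the intercept \(d_{j}\) are hypothetical symbols introduced here only for the example. Under this assumption, (3) and (4) reduce to:

$$\begin{aligned} v_{j}\left( x_{j}\right)&=c_{j}x_{j}+d_{j}\quad \Rightarrow \quad \frac{\partial L\left( v\left( \mathbf{x}\right) \right) }{\partial x_{j}}=w_{j}c_{j}\nonumber \\ MRS_{x_{k},x_{j}}&=\frac{w_{k}c_{k}}{w_{j}c_{j}}=\frac{c_{k}}{c_{j}}\quad \mathrm{if}\;w_{k}=w_{j} \end{aligned}$$

Under equal weights, the trade-off between two raw attributes is therefore governed solely by the relative steepness of their transformations.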

4 Normalisation Framework

Raw variables are usually observed and measured in different units. The component \(v_{j}(x_{j})\) is a weakly monotonic and continuous normalisation function that maps the values of the j-th variable \(x_{j}\) on the closed interval [0, 100], i.e., \(v_{j}(x_{j}) \in \) [0, 100]. Moreover, attributes might be, alternatively, positively or negatively related to the latent phenomenon, i.e., they may have a positive or negative polarity. Hence, in order to ensure comparability and monotonicity of any aggregation function, each variable must be normalised such that better performances in the j-th dimension correspond to non-lower values of \(v_{j}(x_{j})\) and therefore of the aggregated value L. In other words, each normalised variable should have a positive polarity. The normalisation function thus ensures that L is bounded between 0 and 100 when the weights w sum to one. In what follows we will briefly present two normalisation strategies, and we refer the reader to Giovannini et al. (2008) for their detailed description. The first one is “data-driven”, that is, a transformation whose characteristics are entirely determined by the data at hand. The second one, conversely, is defined through the elicitation of explicit value judgements.

4.1 Data-Driven Normalisation Function

The min–max normalisation function is widely used in the literature of multidimensional measures (see, e.g., Cherchye et al. 2007; Silva and Ferreira-Lopes 2013; Pinar et al. 2014; Mazziotta and Pareto 2015), as well as in the Human Development Index (Anand and Sen 1994) and in the OECD Better Life Index (Boarini and D’Ercole 2013).

For two given variables \(x_{+}\) (with positive polarity) and \(x_{-}\) (with negative polarity) observed in region i at a time t, the corresponding generic min–max normalised values \(\nu _{MM+}\) and \(\nu _{MM-}\) are defined as follows:

$$\begin{aligned} \nu _{MM+}^{i,t}\left( x_{+}^{i,t}\right)&=100*\frac{x_{+}^{i,t}-b_{+}\min \left( x_{+}\right) }{b_{+}\max \left( x_{+}\right) -b_{+}\min \left( x_{+}\right) }\nonumber \\ \nu _{MM-}^{i,t}\left( x_{-}^{i,t}\right)&=100*\frac{b_{-}\max \left( x_{-}\right) -x_{-}^{i,t}}{b_{-}\max \left( x_{-}\right) -b_{-}\min \left( x_{-}\right) }\nonumber \\ \nu _{MM+}^{i,t}\left( x_{+}^{i,t}\right)&=0\quad \mathrm{if}\;x_{+}^{i,t}\le b_{+}\min \left( x_{+}\right) \quad \nu _{MM-}^{i,t}\left( x_{-}^{i,t}\right) =0\quad \mathrm{if}\;x_{-}^{i,t}\ge b_{-}\max \left( x_{-}\right) \nonumber \\ \nu _{MM+}^{i,t}\left( x_{+}^{i,t}\right)&=100\quad \mathrm{if}\;x_{+}^{i,t}\ge b_{+}\max \left( x_{+}\right) \quad \nu _{MM-}^{i,t}\left( x_{-}^{i,t}\right) =100\quad \mathrm{if}\;x_{-}^{i,t}\le b_{-}\min \left( x_{-}\right) \end{aligned}$$
(5)

In the generic min–max function, the coefficients \(b_{\pm }\min \) and \(b_{\pm }\max \) are the lowest and highest values to be used as benchmarks for the \(x_{\pm }\) variable observed in region i. Regardless of how the benchmarks are defined, it is straightforward that, for \(x_{+}\), \(b_{+}\max \) corresponds to a more desirable performance than \(b_{+}\min \), while the opposite is true for \(x_{-}\). The min–max strategy rescales indicators into an identical range [0, 100].Footnote 8 E.g., for \(x_{+}\), 0 is given to values lower than or equal to \(b_{+}\min \), while 100 is given to those higher than or equal to \(b_{+}\max \). The values within these benchmarks are proportionally converted into the 0–100 scale. Hence, \(\nu \) is a continuous, piecewise-linear function.
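The generic function in (5) can be sketched as follows. The function name `min_max`, its argument names and the numerical values are illustrative assumptions, not part of the original analysis.

```python
def min_max(x: float, b_min: float, b_max: float, positive: bool = True) -> float:
    """Clamped min-max normalisation on [0, 100], following Eq. (5).

    b_min and b_max are the lower and upper benchmarks; `positive` is the
    polarity of the attribute (True for a "good", False for a "bad").
    """
    if b_max <= b_min:
        raise ValueError("b_max must be strictly greater than b_min")
    score = 100.0 * (x - b_min) / (b_max - b_min)
    if not positive:
        # Negative polarity: 100 * (b_max - x) / (b_max - b_min).
        score = 100.0 - score
    # Values beyond the benchmarks are capped at 0 or 100.
    return max(0.0, min(100.0, score))


# Hypothetical benchmarks and observations.
print(min_max(78.0, 73.0, 83.0))                  # a "good" attribute inside the benchmarks
print(min_max(12.0, 3.0, 9.0, positive=False))    # a "bad" attribute above its upper benchmark
```

The capping step is what makes the function flat outside the benchmarks and linear between them.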

The data-driven min–max normalisation (6) defines the benchmarks min and max as the best and worst observed performance among selected regions (Lefebvre et al. 2010; Silva and Ferreira-Lopes 2013; Murias et al. 2012), and across a time-series, in order to take into account the evolution of indicators and offer time-comparability (Giovannini et al. 2008). In our case study, this corresponds to assigning a value of 0 to the region which reports the worst-observed performance in the period from 2004 to 2012, while assigning a value of 100 to the “best-observed” one.

For each region i where an attribute x is observed at a time t, the corresponding normalised value \(\nu _{dM}^{i,t}(x^{i,t})\), where the subscript dM stands for “data-driven min–max”, is determined as:

$$\begin{aligned} \nu _{dM+}^{i,t}\left( x_{+}^{i,t}\right) = 100*\frac{x_{+}^{i,t}-\min \limits _{t\in T}\min \limits _{i}\left( x_{+}^{t}\right) }{\max \limits _{t\in T}\max \limits _{i}\left( x_{+}^{t}\right) -\min \limits _{t\in T}\min \limits _{i}\left( x_{+}^{t}\right) }\nonumber \\ \mathrm{or}\;\nu _{dM-}^{i,t}\left( x_{-}^{i,t}\right) = 100*\frac{\max \limits _{t\in T}\max \limits _{i}\left( x_{-}^{t}\right) -x_{-}^{i,t}}{\max \limits _{t\in T}\max \limits _{i}\left( x_{-}^{t}\right) -\min \limits _{t\in T}\min \limits _{i}\left( x_{-}^{t}\right) } \end{aligned}$$
(6)

where \(x_{+}\) and \(x_{-}\) have the usual meaning of a “good” and a “bad” attribute, respectively.
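A minimal sketch of the data-driven variant in (6) is given below: the benchmarks are the minimum and maximum pooled over all regions and all years. The panel layout, the function name and the figures are hypothetical and serve only to show the mechanics.

```python
from typing import Dict, Tuple

Panel = Dict[Tuple[str, int], float]  # (region, year) -> observed value


def data_driven_min_max(panel: Panel, positive: bool) -> Panel:
    """Normalise one indicator with benchmarks pooled over regions and years, as in Eq. (6)."""
    lo = min(panel.values())  # min over t and i
    hi = max(panel.values())  # max over t and i
    if hi == lo:
        raise ValueError("the indicator does not vary in the sample")
    span = hi - lo
    if positive:
        return {key: 100.0 * (x - lo) / span for key, x in panel.items()}
    return {key: 100.0 * (hi - x) / span for key, x in panel.items()}


# Hypothetical long-term unemployment rates (a "bad" attribute), in percent.
unemployment: Panel = {
    ("A", 2004): 2.0, ("A", 2005): 4.0,
    ("B", 2004): 10.0, ("B", 2005): 6.0,
}
scores = data_driven_min_max(unemployment, positive=False)
print(scores)
```

Because the benchmarks are taken from the sample itself, adding or removing a single extreme observation changes the normalised value of every other region-year.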

Table 3 displays the data-driven thresholds for our sample of regions.Footnote 9

Table 3 Data-driven benchmarks

4.2 A Novel Expert-Based Normalisation Function

Consistently with what is often debated with respect to the aggregation function, the parameters of the normalisation function can either reflect a predetermined choice by the researcher herself (e.g., through a data-driven strategy), or be elicited from some stakeholders group, e.g., field-experts, members of institutions, citizens (Kim et al. 2015; Decancq and Lugo 2013 produce a recent review of elicitation strategies).

In a simple linear model with \(\beta =1\) and \(w_{j} = 1/m\) for each j-th indicator, a dimension’s relevance is determined by the normalisation function, as visible from (3). As we will discuss in the next section, if the function’s parameters are data-driven, then their implications in terms of dimensions’ weights and MRS are to be interpreted from a mathematical perspective, yet it is harder to determine what they reflect in economic terms (Lefebvre et al. 2010). As an example, in the data-driven min–max, a variable with transformed value equal to “0” just implies its being “the last one”, or “the worst one” observed among the available data, which does not necessarily correspond to an undesirable condition of Well-being. A similar reasoning, with opposite meaning, applies to normalised values of “100”.

An alternative to the data-driven normalisation would require incorporating some value judgments in the normalisation (e.g., goalposts, see Mazziotta and Pareto 2015). This translates to linking the extreme values “0” and “100” with, e.g., a certain definition of desirability, thus making the normalisation independent from the data. When an indicator lies above or below such fixed bounds, further variations do not contribute to the latent variable under study (see e.g., the discussion in Anand and Sen 1994; Klugman et al. 2011; Ravallion 2012b; Lefebvre et al. 2010; Gidwitz et al. 2010). A major example of fixed thresholds is the Human Development Index, which, since 1994, has adopted “goalposts” as minimum and maximum values in the normalisation function. The interpretation behind these fixed thresholds relies on the belief that objective upper and lower bounds can be identified and defined as “subsistence” minimum or “satiation” points, beyond which additional increments would not contribute to the expansion of capabilities.

Contrary to Human Development, the concept of social exclusion has been developed with reference to advanced industrialized economies, such as those of the European Union members. Therefore, rather than on “subsistence”, its focus is on the “unacceptability” and “undesirability” of living conditions, as in an enlarged definition of poverty. Accordingly, a positive threshold for each of our four social-inclusion attributes would refer to a “certainly desirable and favourable condition of Well-being”, to which a normalised value of 100 would correspond. Conversely, a negative threshold would refer to a “certainly undesirable and harmful condition of Well-being”, corresponding to a normalised value of 0.

In order to select the actual thresholds, we chose to elicit expert preferences through a survey, rather than to pre-determine them in a top-down fashion. To the best of our knowledge, this is a strategy rarely applied to the normalisation stage, and mostly adopted for the aggregation phase instead.

Following Chowdhury and Squire (2006) and Hoskins and Mascherini (2009) (who, indeed, both elicit weights on aggregation rather than on normalisation), we intended to involve informed opinions and therefore selected the population of professors and researchers in the Departments of Economics and Management of the Ca’ Foscari University of Venice. Specifically, our population consisted of 149 professors (\(57 + 38\) full or associate professors of Economics and Management, respectively; \(29 + 25\) assistant professors, ricercatore universitario, of Economics and Management, respectively).Footnote 10 As for any expert sample, issues could be raised on our group’s capability of ensuring all values of efficiency, equity and democracy in the elicitation process. As Kim et al. (2015) pointed out, there is no elicitation method that can ensure all the aforementioned values. Moreover, being concerned with democratic representativeness, one could argue that greater citizen participation is required; nevertheless, such a strategy would likely cause a loss of efficiency and quality of the elicitation, together with a lower degree of representativeness (given the resource constraints). Conversely, we selected a narrow population with specific characteristics but with a working experience that is at least partially related to the issues involved in social inclusion. Moreover, thanks to an adequate response rate, we are able to statistically represent it.

The survey was worded in Italian and conducted in electronic form with the QUALTRICS software, a web-based tool that enables users to build custom surveys and distribute them via email.Footnote 11 Participants were invited with an email including a link to take part in the on-line questionnaire on an anonymous basis. Appendix B includes further details on the survey pages and wording.

4.2.1 The Expert-Based Thresholds and the Min–Max Normalisation Function

We implement a min–max normalisation as in (5), where the benchmarks correspond to the median values elicited through the Qualtrics survey (Table 4). In particular, the favourable threshold for life expectancy is set at 83 years, while the negative threshold is 73 years. Early school-leaving’s range lies between 10 % (which corresponds to the Europe 2020 target for members of the European Union) and 20 %. A rate of 9 % (or higher) of long-term unemployment denotes a median certainly undesirable condition, while the positive threshold is determined at 3 %. As for poverty rate, a certainly harmful level has its median value at 20 %, while desirability corresponds to a 5 % (or lower) share of population below the poverty line set by Eurostat.

The interquartile ranges are always relatively small, except for the negative threshold for early-school-leaving (15–25 %). Nevertheless, we are aware that no “true values” exist, with respect to these thresholds. In the words of Mascherini and Hoskins (2008), “the judgment of one of the outline may be correct, and those who share a consensus view may be wrong”.

A quick comparison of Tables 3 and 4 suggests that “certain desirability” and “certain undesirability” largely differ from observed minimum or maximum achievements. Indeed, the lowest observed level of longevity (77.4 years) is considered to be “certainly undesirable” just by a small fraction of respondents (Fig. 10 in Appendix A). Similarly, any rate of long-term unemployment beyond 9 %, or of school dropouts higher than 20 %, or of poverty rate beyond 20 %, is regarded as certainly negative, while the actual observed maximums are considerably higher. A capping on the positive threshold occurs for those regions which report long-term unemployment lower than 3 % or early school leaving rates lower than 10 %.Footnote 12

For each region i where an attribute x is observed at a time t, the corresponding normalised value \(\nu _{sM}^{i,t}(x^{i,t})\), where the subscript sM stands for “survey-driven min–max”, is determined as:

$$\begin{aligned} \nu _{sM+}^{i,t}\left( x_{+}^{i,t}\right)&=100*\frac{x_{+}^{i,t}-\mathrm{lowest\;threshold}}{\mathrm{thresholds}{\hbox {'}}\;\mathrm{range}}\quad \mathrm{or}\nonumber \\ \nu _{sM-}^{i,t}\left( x_{-}^{i,t}\right)&=100*\frac{\mathrm{highest\;threshold}-x_{-}^{i,t}}{\mathrm{thresholds}{\hbox {'}}\;\mathrm{range}}\quad \mathrm{with}\nonumber \\ \nu _{sM+}^{i,t}\left( x_{+}^{i,t}\right)&=0\quad \mathrm{if}\;x_{+}^{i,t}<\mathrm{lowest\;threshold}\quad \mathrm{and}\nonumber \\ \nu _{sM+}^{i,t}\left( x_{+}^{i,t}\right)&=100\quad \mathrm{if}\;x_{+}^{i,t}>\mathrm{highest\;threshold}\nonumber \\ \nu _{sM-}^{i,t}\left( x_{-}^{i,t}\right)&=0\quad \mathrm{if}\;x_{-}^{i,t}>\mathrm{highest\;threshold}\quad \mathrm{and}\nonumber \\ \nu _{sM-}^{i,t}\left( x_{-}^{i,t}\right)&=100\quad \mathrm{if}\;x_{-}^{i,t}<\mathrm{lowest\;threshold} \end{aligned}$$
(7)

where x\(_{\mathrm {+}}\) and x\(_{\mathrm {-}}\) have the usual meaning of a “good” and a “bad” attribute, respectively.

Table 4 Survey-elicited benchmarks

The expert-based normalisation is closer in nature to a “social value function”, in that the rescaling is performed according to how much a value fulfils a “desirability” requirement. Moreover, the normalisation function may become weakly monotonic (instead of being strictly monotonic, as the data-driven min–max is), when the elicited constraints are binding for some observed variable. Indeed, there are regions having attributes with observed performances outside the elicited boundaries, which will receive a normalised value of 100 or 0. Therefore, as we will discuss in the next section, when an attribute’s value lies outside the thresholds, its marginal contribution to the aggregate measure is zero.Footnote 13

5 Implicit Trade-Offs from Normalisation

When implementing the min–max normalisation, both in the data-driven (6) and in the survey-driven (7) setup, in the baseline linear model with “equal weighting” (2), we obtain two aggregation functions: LD (linear, data-driven min–max) and LS (linear, survey-driven min–max). Normalisation benchmarks for the LD and LS models are taken from Tables 3 and 4, respectively. For a generic region i, such aggregation functions take the following form:

$$\begin{aligned} LD^{i}\left( {\nu _{dM} \left( {\mathbf{x}}^{i} \right) } \right)= & {} 100\left( 0.25*\frac{x_{1} -77.5}{84.2-77.5}+0.25*\frac{42.8-x_{2} }{42.8-5.4}\right. \nonumber \\&\left. +0.25*\frac{15.3-x_{3} }{15.3-0.3}+0.25*\frac{44.3-x_{4} }{44.3-5.2} \right) \end{aligned}$$
(8)
$$\begin{aligned} LS^{i}\left( {\nu _{sM} \left( {\mathbf{x}}^{i} \right) } \right)&=100\left( 0.25*\frac{x_{1} -73}{83-73}+0.25*\frac{20-x_{2} }{20-10}\right. \nonumber \\ {}&\left. \quad +0.25*\frac{9-x_{3} }{9-3}+0.25*\frac{20-x_{4} }{20-5} \right) \quad \mathrm{with} \nonumber \\ \nu _{sM} \left( {x_{1} } \right)&=0\quad \mathrm{if}\;x_{1}<73\quad \mathrm{and}\quad \nu _{sM} \left( {x_{1} } \right) =100\quad \mathrm{if}\;x_{1}>83\nonumber \\ \nu _{sM} \left( {x_{2} } \right)&=0\quad \mathrm{if}\;x_{2}>20\quad \mathrm{and}\quad \nu _{sM} \left( {x_{2} } \right) =100\quad \mathrm{if}\;x_{2}<10\nonumber \\ \nu _{sM} \left( {x_{3} } \right)&=0\quad \mathrm{if}\;x_{3}>9\quad \mathrm{and}\quad \nu _{sM} \left( {x_{3} } \right) =100\quad \mathrm{if}\;x_{3}<3\nonumber \\ \nu _{sM} \left( {x_{4} } \right)&=0\quad \mathrm{if}\;x_{4} >20\quad \mathrm{and}\quad \nu _{sM} \left( {x_{4} } \right) =100\quad \mathrm{if}\;x_{4} <5. \end{aligned}$$
(9)
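The two indices in (8) and (9) can be evaluated side by side with a short sketch. The benchmarks are read off the two equations, with \(x_{1}\) to \(x_{4}\) matched to life expectancy, early school leaving, long-term unemployment and poverty rate according to the thresholds reported in Sect. 4.2.1; the dictionary keys, function names and the example region are hypothetical.

```python
from typing import Dict, Tuple

Benchmarks = Dict[str, Tuple[float, float, bool]]  # name -> (lower, upper, positive polarity)

# Benchmarks as they appear in Eq. (8) (data-driven) and Eq. (9) (survey-driven).
DATA_DRIVEN: Benchmarks = {
    "life_expectancy": (77.5, 84.2, True),
    "early_school_leaving": (5.4, 42.8, False),
    "long_term_unemployment": (0.3, 15.3, False),
    "poverty_rate": (5.2, 44.3, False),
}
SURVEY_DRIVEN: Benchmarks = {
    "life_expectancy": (73.0, 83.0, True),
    "early_school_leaving": (10.0, 20.0, False),
    "long_term_unemployment": (3.0, 9.0, False),
    "poverty_rate": (5.0, 20.0, False),
}


def min_max(x: float, lo: float, hi: float, positive: bool) -> float:
    """Min-max score on [0, 100], capped outside the benchmarks."""
    score = 100.0 * (x - lo) / (hi - lo) if positive else 100.0 * (hi - x) / (hi - lo)
    return max(0.0, min(100.0, score))


def social_inclusion(region: Dict[str, float], benchmarks: Benchmarks) -> float:
    """Equal-weight linear aggregate of the normalised indicators, as in Eq. (2)."""
    total = sum(min_max(region[name], lo, hi, pos) for name, (lo, hi, pos) in benchmarks.items())
    return total / len(benchmarks)


# Hypothetical region, not taken from the sample.
region = {
    "life_expectancy": 80.0,
    "early_school_leaving": 15.0,
    "long_term_unemployment": 6.0,
    "poverty_rate": 12.5,
}
ld = social_inclusion(region, DATA_DRIVEN)
ls = social_inclusion(region, SURVEY_DRIVEN)
print(round(ld, 2), round(ls, 2))
```

For in-sample observations the capping never binds in the data-driven case, since the benchmarks are the observed extremes; it can bind in the survey-driven case.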

Before implementing such models on the sample data, it is useful to highlight the implicit economic and statistical mechanisms acting behind these aggregation functions, through the normalisation stage. The most direct way to do so is to investigate the “relative importance” (marginal contribution) that each dimension is given in each of the two aforementioned models of social inclusion. Indeed, the aggregation function is kept constant and it is characterised by equal weighting of the normalised dimensions. Nevertheless, since no such things as normalised-longevity or normalised-unemployment rates exist in reality, it is particularly useful to focus on how observed attributes contribute to the overall measure of social inclusion, and on what characterises the relationship between raw variables within the aggregation framework.

The marginal contribution of each j-th raw-indicator to the overall synthetic measure can be determined by computing the partial derivative as in (3). Given that the selected aggregation model has \(w_{j} = {0.25}\), the magnitude of the marginal contribution is entirely determined by the steepness of the adopted normalisation function. Indeed, for a generic linear aggregation model L, and for any normalisation function v, it holds that:

$$\begin{aligned} \frac{\partial L\left( {v\left( {\mathbf{x}} \right) } \right) }{\partial x_{j} }=0.25\cdot {v}'_{j} \left( {x_{j} } \right) . \end{aligned}$$
(10)

The derivative \(v'\) represents the link implicitly imposed, when normalising data, between the original variable x and its counterpart \(\nu (x)\). The partial derivatives in (11) and (12) illustrate this link for the data-driven min–max (MM) and the survey-driven min–max (sM), respectively:

$$\begin{aligned} \frac{{\partial \nu _{MM \pm }^i\left( {{x^i}} \right) }}{{\partial {x^i}}} = \pm \frac{{25}}{{\max \limits _{t \in T} \max \limits _i \left( {x_ + ^t} \right) - \min \limits _{t \in T} \min \limits _i \left( {x_ + ^t} \right) }} \end{aligned}$$
(11)
$$\begin{aligned} \frac{\partial \nu _{sM\pm }^{i} \left( {x^{i} } \right) }{\partial x^{i} }= & {} \pm \frac{25}{{[\mathrm{thresholds}{\hbox {'}}~\mathrm{range}]}}\quad \mathrm{if}\;x^{i}\in [{\mathrm{thresholds}{\hbox {'}}\mathrm{range}}] \nonumber \\= & {} 0\quad \mathrm{if}\;x^{i}\le \mathrm{lowest\;threshold}\vee x^{i}\ge \mathrm{highest\;threshold} \end{aligned}$$
(12)

We can immediately notice that, in all cases, the effect of a one-unit increment in x on the transformed variable \(\nu (x)\) is constant. This is due to the transformation functions being linear and the benchmarks bmax and bmin being fixed (they are either extracted from the data or elicited from experts).Footnote 14 In particular, the higher the range or the standard deviation of a raw variable, the lower its unitary marginal contribution to the normalised one. Hence, unless all the attributes have very similar distributions and comparable units of measurement (which would make the normalisation itself of secondary importance), all the normalisations perform a preliminary, and unequal, weighting of the original variables, regardless of the choice of the aggregation function.
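As a minimal sketch of this point, the constant slopes implied by Eqs. (8) and (9) can be computed directly from the benchmark ranges. The benchmark values are copied from those equations; the variable labels attached to \(x_1,\ldots,x_4\) are an assumption inferred from the examples discussed in the text, and the function name is purely illustrative.

```python
# Benchmarks as (worst, best) pairs, copied from Eqs. (8) and (9).
# Assumption: x1..x4 are life expectancy, early school-leaving,
# long-term unemployment and poverty rate, as the worked examples suggest.
DATA_DRIVEN = {
    "life_expectancy": (77.5, 84.2),
    "school_dropouts": (42.8, 5.4),
    "lt_unemployment": (15.3, 0.3),
    "poverty_rate": (44.3, 5.2),
}
SURVEY_DRIVEN = {
    "life_expectancy": (73.0, 83.0),
    "school_dropouts": (20.0, 10.0),
    "lt_unemployment": (9.0, 3.0),
    "poverty_rate": (20.0, 5.0),
}


def marginal_contribution(worst, best, weight=0.25):
    """Change in the 0-100 index per raw unit, valid inside the benchmark range."""
    return weight * 100.0 / (best - worst)


for model, benchmarks in (("data-driven", DATA_DRIVEN), ("survey-driven", SURVEY_DRIVEN)):
    for variable, (worst, best) in benchmarks.items():
        slope = marginal_contribution(worst, best)
        print(f"{model:14s} {variable:16s} {slope:+.3f}")
```

The sign is positive for the “good” (longevity) and negative for the three “bads”, and a wider benchmark range mechanically yields a smaller slope in absolute value.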

A partial exception must be highlighted for the survey-driven min–max (12), for which the usual non-satiation hypothesis (more of a “good” is always preferred to less) is maintained in a weaker form. Indeed, once a certain performance is reached, more of a “good” is merely not dispreferred to less of it (and, conversely, more of a “bad” is not preferred to less of it). As a rough realisation of the diminishing sensitivity hypothesis, the effect of a change in a variable’s score on social utility is zero once the thresholds are crossed. From a policy point of view, this suggests focusing on those dimensions whose performances lie farther away from the “desirability” level.

To further clarify the aforementioned observations, Fig. 1 reports a graphical visualisation of the two versions of the min–max transformation implemented on the selected data. The heterogeneity of the functions’ steepness, both within and between normalisation frameworks, reflects the differences in each variable’s threshold range. In particular, the slope of the survey-based functions for unemployment and school-dropouts is higher than in the data-driven version because of the shorter min–max range imposed by the experts. The opposite is true for life expectancy, which has a steeper normalisation under the data-driven strategy. For example, a long-term unemployment rate of 3 % is normalised to 100 under the expert thresholds, and to around 80 under the data-driven thresholds. A life expectancy of 80 years results in a transformed value of around 70 under the expert function, whereas it is around 40 under the data-driven normalisation.

Fig. 1
figure 1

Min–max normalisation: survey-driven vs data-driven benchmarks
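The two numerical examples quoted for Fig. 1 can be checked with a short sketch of the min–max transformation, using the benchmarks of Eqs. (8) and (9). The function name and the `clip` flag are illustrative choices, not part of the original specification; clipping is applied only to the survey-driven version, where scores are flat beyond the elicited thresholds.

```python
def min_max(x, worst, best, clip=False):
    """Rescale x to a 0-100 score; `worst` maps to 0 and `best` to 100."""
    score = 100.0 * (x - worst) / (best - worst)
    if clip:
        # Survey-driven version: scores are flat outside the elicited thresholds.
        score = max(0.0, min(100.0, score))
    return score


# Long-term unemployment rate of 3 %
unemp_survey = min_max(3.0, worst=9.0, best=3.0, clip=True)
unemp_data = min_max(3.0, worst=15.3, best=0.3)

# Life expectancy of 80 years
life_survey = min_max(80.0, worst=73.0, best=83.0, clip=True)
life_data = min_max(80.0, worst=77.5, best=84.2)

print(round(unemp_survey, 1), round(unemp_data, 1))
print(round(life_survey, 1), round(life_data, 1))
```

With these benchmarks, the unemployment example gives 100 against roughly 82, and the life-expectancy example gives 70 against roughly 37, in line with the approximate values read off the figure.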

We can now compute, by means of partial derivatives, the marginal contribution of each indicator to the two synthetic measures LD and LS, that is, the impact that a unitary change in the original attribute has on the overall measure of social inclusion. Table 5 illustrates the results.

Table 5 Dimensions’ relative weights in linear data-driven model under different normalisations

Since these marginal contribution coefficients cannot be easily compared across normalisation-methods, we normalised them row-wise, so that their sum is always 100. This allows us to interpret the results in terms of “relative weights”, i.e., how much weight (in percentage) is given to a specific raw variable. Such normalised weights are shown in Fig. 2.

Fig. 2
figure 2

Relative weights in different normalisation strategies
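The row-wise rescaling behind these relative weights can be sketched as follows. The ranges are taken from the rounded benchmarks printed in Eqs. (8) and (9), so the resulting shares may differ by a point or so from the percentages reported for Fig. 2; the ordering of the variables is an assumption inferred from the text.

```python
# Benchmark ranges |best - worst| taken from Eqs. (8) and (9).
# Assumption: the order is life expectancy, school dropouts,
# long-term unemployment, poverty rate.
VARIABLES = ["life_expectancy", "school_dropouts", "lt_unemployment", "poverty_rate"]
RANGES = {
    "data-driven": [84.2 - 77.5, 42.8 - 5.4, 15.3 - 0.3, 44.3 - 5.2],
    "survey-driven": [83 - 73, 20 - 10, 9 - 3, 20 - 5],
}


def relative_weights(ranges, weight=0.25):
    """Absolute marginal contributions rescaled so that they sum to 100."""
    slopes = [weight * 100.0 / r for r in ranges]
    total = sum(slopes)
    return [100.0 * s / total for s in slopes]


shares = {model: relative_weights(r) for model, r in RANGES.items()}
for model, values in shares.items():
    print(model, {v: round(s, 1) for v, s in zip(VARIABLES, values)})
```

Because the equal weight of 0.25 cancels out in the rescaling, the relative weights depend only on the inverse of the benchmark ranges.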

Although the weights were set as equal for each normalised attribute, those related to the actual indicators are highly unbalanced, regardless of the transformation adopted.

In particular, the longevity dimension is assigned a predominant role in the aggregation (a relative weight higher than 55 %) under the data-driven min–max. Indeed, this is the variable for which the data-driven min–max exhibits the highest slope (Fig. 1). Much lower effects derive from a decrease of one unit in long-term unemployment, and an even lower one from reductions in school-dropouts and poverty-rate.

Such trade-offs change significantly when the expert-based min–max is adopted. The dimensions’ weights appear slightly more homogeneous: longevity and school-dropouts have equal relative relevance (24 %), poverty-rate accounts for 15.4 % of the weight, and the unemployment indicator has the highest marginal effect on the aggregate measure (37.6 %).

Moreover, using (4), the marginal rates of substitution between any pair of indicators \(x_{j}, x_{k}\) can be computed for each of the three aggregation models. Results are reported in Tables 6 and 7.

Table 6 Marginal rates of substitution in the linear model with data-driven min–max
Table 7 Marginal rates of substitution in the linear model with the survey-driven min–max

As expected, the marginal rates of substitution mirror the heterogeneity in the relative weights, and yet convey more tangible evidence of the relevance of the hidden, and partially unintended, trade-offs lying behind the apparently simple and neutral aggregation framework adopted. For example, in the data-driven min–max model, one additional year of longevity increases the synthetic index of social inclusion by as much as a reduction of at least 5.3 percentage points in school dropouts, around 2.5 percentage points in long-term unemployment, or around 6.2 points in the poverty rate. In the survey-based min–max model, the marginal rates of substitution for a unitary increase in life expectancy are much lower: namely, 1 percentage point of early school-leavers, 0.62 points of long-term unemployment, or 1.5 points of the poverty rate.
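Under linear min–max normalisation with equal weights, the marginal rate of substitution between two indicators reduces to the ratio of their benchmark ranges. The sketch below illustrates this with the rounded benchmarks printed in Eqs. (8) and (9); the resulting ratios therefore need not coincide to the last digit with the entries of Tables 6 and 7, and the variable ordering is an assumption inferred from the text.

```python
# Benchmark ranges |best - worst| from Eqs. (8) and (9).
# Assumption: the order is life expectancy, school dropouts,
# long-term unemployment, poverty rate.
RANGES = {
    "data-driven": [84.2 - 77.5, 42.8 - 5.4, 15.3 - 0.3, 44.3 - 5.2],
    "survey-driven": [83 - 73, 20 - 10, 9 - 3, 20 - 5],
}
OTHERS = ["school_dropouts", "lt_unemployment", "poverty_rate"]


def mrs_vs_longevity(ranges, weight=0.25):
    """Percentage points of each other indicator worth one extra year of longevity."""
    longevity_slope = weight * 100.0 / ranges[0]
    return [longevity_slope / (weight * 100.0 / r) for r in ranges[1:]]


mrs = {model: mrs_vs_longevity(r) for model, r in RANGES.items()}
for model, values in mrs.items():
    print(model, {v: round(m, 2) for v, m in zip(OTHERS, values)})
```

The ratios are several times larger under the data-driven benchmarks than under the survey-driven ones, which is the pattern described above.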

5.1 Discussion: Positive vs Normative Analysis

The aforementioned heterogeneous relative weights and trade-offs, both within and between the aggregation models, arise because of the (differences in the) adopted normalisation strategies, and will strongly influence the resulting indices, as Sect. 6 will show. Moreover, this happens in the context of aggregation frameworks granting “equal weights” to their components. As already stated, such a label can be partially misleading, since the equal weighting pertains just to the normalised attributes. Indeed, to the extent that rescaling is a requirement for composite measures, the actual aggregation concerns the transformed variables, in place of the observed performances, and yet there is an unavoidable and intrinsic difference between the interpretation of original and normalised performances. The transformed unit of measurement (e.g., between zero and one, if the min–max rescaling is adopted) can be interpreted as a sort of degree of fulfilment of some criterion. Whether this criterion should be purely statistical (e.g., being far from or close to the observed minimum or maximum achievements), or whether it should encompass some informed value judgements related to the topic at hand (as in the expert elicitation or in the adoption of policy benchmarks), depends on the researcher’s choice. Both directions are, in principle, correct.

What we would like to stress at this point is not whether such trade-offs are more acceptable under a data-driven or a survey-driven strategy: no normalisation is “safe” from the emergence of such hidden weights. Rather, we highlight that (i) normalisation-generated trade-offs are inevitable, (ii) the ground on which they are justified can differ greatly, depending on the normalisation strategy adopted, and, therefore, (iii) the resulting aggregate indices should be interpreted accordingly.

When a data-driven approach is selected, debating the acceptability of the underlying marginal rates of substitution is a marginal issue: the justification, and therefore the interpretation, of such coefficients is inherently statistical. It follows that the interpretation of the resulting composite indices should be of the same nature, that is, statistical, which constitutes a strong and solid ground for a positive analysis of a composite phenomenon. That being said, the absence of value judgements in the construction of a data-driven index does not neutralise the hidden trade-offs shown in the previous tables. It is still true that, with a data-driven min–max and with a data selection as described in Sect. 2, life expectancy carries a weight which is more than twice what is assigned to the remaining three variables, and that one additional year of longevity is implicitly made equivalent to, e.g., 6.77 percentage points of school dropouts.

It follows that, by construction, such relative weights and marginal rates of substitution are sensitive to the choice of the data sample and to the distribution of the original variables. Indeed, and especially for the data-driven min–max, the presence of outliers in the data would stretch the range over which the normalisation is performed, therefore altering the original variable’s marginal contribution to the overall index (as noted, we prudently excluded the extremely high values for the Spanish autonomous cities of Ceuta and Melilla in computing the data-driven benchmarks). As an additional warning, such transformations (again, especially the min–max) are not stable when data for new years or new regions become available, which could appreciably affect the distribution of data (Lefebvre et al. 2010). Similarly, a shift in the territorial dimension of the analysis (e.g., from a national to a regional or provincial level) will cause similar changes, since the provincial data are likely to exhibit higher variability than the regional ones.
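A toy illustration of this sensitivity is sketched below. The rates are hypothetical and not taken from the sample; the only purpose is to show how a single extreme observation stretches the data-driven range and shrinks the marginal contribution of the variable.

```python
def data_driven_slope(values, weight=0.25):
    """Absolute marginal contribution under a min-max fitted on `values`."""
    return weight * 100.0 / (max(values) - min(values))


# Hypothetical regional unemployment rates (%), not taken from the paper.
rates = [2.0, 4.0, 6.0, 8.0, 10.0]
with_outlier = rates + [25.0]

slope_base = data_driven_slope(rates)
slope_outlier = data_driven_slope(with_outlier)

print(f"without outlier: {slope_base:.3f} index points per percentage point")
print(f"with outlier:    {slope_outlier:.3f} index points per percentage point")
```

In this made-up case the implicit weight of the variable falls to roughly a third of its initial value once the outlier enters the benchmark computation, which is why excluding extreme observations is a substantive modelling choice.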

Under the strategy of expert elicitation of the transformation parameters, standard properties such as strong non-satiation and continuity of the normalisation function are not guaranteed (indeed, in our example, the min–max becomes weakly monotonic when the elicited constraints are binding for some observed variable). Dimensions’ trade-offs reflect the preferences of an actual group of experts, and are therefore independent of the selection of data and of the territorial dimension of the analysis. It is not possible to claim that the opinions of experts would lead to more suitable benchmarks than the ones revealed by the data. Indeed, what differentiates such benchmarks and trade-offs from the data-driven ones is that the former are easier to interpret from an economic perspective, in terms of social desirability. Therefore, the resulting aggregate measure would constitute a tool for normative analysis.

However, such an elicitation method suffers from the arbitrariness embedded in any survey exercise (e.g., choice of the population, bias in the framing of questions) and is by definition sensitive to such choices. As an example, it is likely that a panel of experts from another Italian or European university would lead to different normalisation parameters. Nevertheless, as long as the choice of the panel is kept homogeneous, the economic justification for the expert-based strategy is always preserved (i.e., normalisation as a social desirability function), whereas the variability in the experts’ answers among different panels is analogous to the aforementioned sensitivity of the data-driven method when new territorial units or years are added to the data.

6 Results

Results for the linear model with data-driven normalisation (LD model) are summarized in Table 8 as population-weighted averages at country level, together with coefficients of variation within countries. Full results are available in Appendix C.Footnote 15 Data for Denmark are not available before 2007.

Table 8 Social inclusion measure and coefficients of variation, data-driven normalisation
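For readers who wish to reproduce this kind of country-level summary, a minimal sketch follows. The regional scores and populations are hypothetical, and computing the coefficient of variation from the unweighted population standard deviation is an assumption of this sketch, since the exact formula is not spelled out here.

```python
from statistics import mean, pstdev


def weighted_mean(scores, populations):
    """Population-weighted average of regional index values."""
    total = sum(populations)
    return sum(s * p for s, p in zip(scores, populations)) / total


def coefficient_of_variation(scores):
    """Population standard deviation divided by the unweighted mean (assumed)."""
    return pstdev(scores) / mean(scores)


# Hypothetical regional index values and populations (millions).
scores = [60.0, 70.0, 80.0]
populations = [1.0, 2.0, 1.0]

country_score = weighted_mean(scores, populations)
cv = coefficient_of_variation(scores)
print(round(country_score, 1), round(cv, 3))
```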

Results for the linear model of social inclusion with survey-driven normalisation (LS model) are summarized in Table 9 (aggregated at country level), and fully reported in Appendix D.

Table 9 Social inclusion measure and coefficients of variation, expert-based normalisation

The left graph in Fig. 3 provides an overall view of the social inclusion trends at country level, under the data-driven model. Differences between countries appear rather limited until 2007. Denmark and Italy report the highest levels of Inclusion, followed by Belgium, Germany and Spain. There is a general increase in the index for all countries until 2008. Since then, the Spanish performance declines and eventually reaches, in 2012, levels of Inclusion close to those of 2004. Italy’s index is apparently slightly affected by the economic crisis (roughly, from 2010 onward): its aggregate Inclusion ceases to improve and starts to decline, reaching in 2012 the same levels as in 2006. Belgium and Germany show a general continuous increase in their levels of Inclusion. In particular, Germany overtakes Italy from 2008, while Belgium does so in 2011. Overall, the situation in 2012 appears to be more heterogeneous than it was in the early years of our sample: Italy and Spain show a negative trend (increasing exclusion), while Belgium and Germany continue to improve their aggregate performance. Moreover, while many Italian regions score very well throughout the available time span, some others are consistently bad performers. Most of the top-10 regions between 2004 and 2012 are Italian, yet both Campania and Sicilia constantly rank at the very bottom of the tables. Thus, the social inclusion index emphasizes the well-known dichotomous socio-economic picture of Italy as well as the contradictions of Belgium, where important differences exist between the Flemish region, the Bruxelles region and Wallonia.

Fig. 3
figure 3

Time-trend of the social inclusion index under different normalisations

When adopting the survey-driven normalisation (right graph in Fig. 3), country trends are confirmed, yet between-country differences are more evident. Social inclusion increases in Belgium, Germany and Denmark, while Italy and Spain experience a continuous decline, which starts in the early years of the economic crisis. The overall picture is quite different from the one commented on before, for at least three reasons. First, social inclusion in levels is lower for Italy, Spain and Belgium, and Italy now lies below both Belgium and Germany over the whole time interval, while Denmark is by far the best performing country. In particular, Italy (blue circle markers) and Germany (green triangle markers) show similar levels of Inclusion in 2006. After that, the index continues to increase for Germany while it remains constant (and then declines) in Italy. Thus, there is a clear phenomenon of rank reversal between the two models: in terms of “desirability” (as defined by the expert-panel), the aggregated Italian picture is worse than the German one, and we will discuss this effect in Sect. 6.1. Moreover, the German regions, together with Belgium’s Flanders, achieve more top-10 rankings than they did under the previous specification, especially after 2007. Second, Spain exhibits a decline in social inclusion which is much more dramatic than it appeared before. The negative trend starts after 2006 but the drop in performance is substantial after 2008, leading to final levels of Inclusion much lower than they were in 2004. Again, we stress the fact that, although the negative trend for Spain was already visible in the baseline data-driven model, this picture conveys a much stronger need for intervention. Third, the heterogeneity within each country is much higher, as noticeable from the coefficients of variation in Table 9. Spain, Italy and Belgium still report the highest coefficients, but heterogeneities are rather constant in the latter country while they are increasing in Italy and especially in Spain. An opposite trend appears in Germany and Denmark, where convergence of social inclusion between regions seems to occur.

Changing the normalisation strategy has large implications for the distribution of the social inclusion index in our regional sample, as we show by plotting a kernel-density estimation of the distribution of the data-driven and the survey-driven indices, for 2004 and 2012 (Fig. 4). The left graph (LD model) highlights how the distribution became less dispersed and more uni-modal between the starting and the final period. Conversely, the one on the right reports a more heterogeneous starting distribution, which becomes clearly bi-modal in the final period of the analysis, confirming the previously commented trends. Indeed, such differences in trends and levels could lead to very different policy implications in terms of how to assuage and prevent social exclusion.

Fig. 4
figure 4

Distributions of inclusion indices

In order to test whether the two specifications convey similar rankings, we perform Kendall’s tauFootnote 16 tests between the ranking of the data-driven model and that of the survey-driven model, for each year. Results are reported in Table 10, excluding the Danish regions since they have no Inclusion index before 2007. The coefficients’ magnitudes indicate that the rankings’ correspondence is far from perfect, yet we can always reject (at 99 %) the null hypothesis of no correlation between the models’ rankings.

Table 10 Kendall’s \(\tau \) correlation coefficients (Danish regions are excluded)
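The rank-correlation coefficient can be illustrated with a small, self-contained sketch. It computes the tau-a variant, which assumes no ties (the variant actually used for Table 10 is not specified here), on two hypothetical rankings of five regions, and it does not reproduce the associated significance test.

```python
from itertools import combinations


def kendall_tau_a(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items, assuming no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of five regions under the two models.
data_driven_rank = [1, 2, 3, 4, 5]
survey_driven_rank = [2, 1, 3, 5, 4]

tau = kendall_tau_a(data_driven_rank, survey_driven_rank)
print(tau)
```

In this made-up example eight of the ten pairs of regions are ordered in the same way by both rankings and two are swapped, so the coefficient equals 0.6: positive, but far from the value of 1 that identical rankings would give.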

Finally, let us now focus on the results for the year 2012, the last available year in our sample. Some similarities emerge from the two specifications, i.e., Germany and Denmark stand out as the top-two countries for inclusion levels, regardless of the adopted normalisation strategy. Nevertheless, their ranking is reversed when switching from the data-driven approach, where Germany has a small advantage, to the survey-driven strategy, where the index for Denmark is considerably higher. Belgium is third-placed under every specification. Yet, its relative Inclusion-loss with respect to both Germany and Denmark is much lower under the two data-driven strategies. Similarly, average values for Italy and Spain are, respectively, fourth and fifth placed. Nevertheless, the spread between them and the remaining countries is far larger when the survey-based normalisation is implemented.

As the coefficients for standard deviation suggest, country averages conceal a substantial degree of heterogeneity between regions. Spain, Italy and Belgium, in particular, exhibit high levels of variability, with their standard deviation being around 40 % of national averages in the survey-driven model. Moreover, the choice of the normalisation strategy strongly affects the extent of such heterogeneity. This leads us to conclude that looking at national averages cannot provide a fully informative tool to evaluate the phenomenon under study. The graphical representations in Fig. 5 allow us to visualise the regional dimension of the indices and to draw further valuable insights on both levels and variability between and within countries and models. We report such figures for 2012 as well as for the starting year 2004.

Fig. 5
figure 5

Heterogeneity within countries due to normalisation, 2004 and 2012

A comparison of the graphs for 2012 (the two graphs at the bottom) in Fig. 5 illustrates how, switching from the survey-based to the data-based model: (i) Spanish and Italian regions experience a substantial increase in Inclusion, particularly with respect to Germany and Denmark; (ii) Danish territories are relatively worse off; (iii) all countries, except for Spain, exhibit a smaller degree of regional variation.

The case of Spain in 2012 is of particular interest for our methodological approach. Indeed, even though at country level the regional ranking is consistent, the Spanish picture conveyed by the survey-driven model is more troublesome: only three regions appear to have levels of Inclusion comparable with Germany and Italy (Navarra, País Vasco, Cantabria), with the others lying far below on the metric scale, placed (almost exclusively, together with Italian territories) in the bottom-20 of the ranking. This denotes a notable welfare loss with respect to the remaining regions, as well as a distinctive skewness of the distribution. Both of these features are completely absent from the data-driven results, where Spain exhibits a high degree of heterogeneity, with some territories performing relatively well, others relatively badly, and a group lying in between. In particular, under these frameworks, a substantial group of ten regions (therefore, a majority share) appears to be roughly in line with the German and Italian distribution of the index (Navarra, País Vasco, Cantabria, Madrid, Castilla y León, Aragón, La Rioja, Galicia, Asturias, Cataluña). Evidently, the survey-driven model conveys a much stronger early-warning message than the data-driven ones, which would potentially lead to very different policy implications.

We can exploit Fig. 6 to spot the normalisation-induced variation in European territorial rankings. The graph summarizes the rankings obtained for each region under each normalisation for 2012, and territories are sorted by their ranking in the linear data-driven model (“X” marker, while the survey-driven ranking has a “circle” mark). The labels on the X-axis report the NUTS code for the administrative regions, whose first two letters identify the country (e.g., “ES01” refers to a Spanish region). The correspondence between labels and region names is illustrated in Appendix A (Tables 11, 12).

By focusing on the X and the circle marks, it is relatively easy to identify the regions which are “penalized” by the data-driven min–max normalisation with respect to the survey-driven one (i.e., the “x” lies below the “circle”), as well as the opposite (when the “x” lies above the “circle”). There are eight Spanish regions among the last eleven under all of the three specifications, and their ranking is basically normalisation-invariant (Valencia, Castilla-la Mancha, Murcia, Canarias, Extremadura, Andalucía, Melilla, Ceuta).Footnote 17 Conversely, almost all of the remaining regions’ rankings are strongly affected by normalisation choices, and numerous substantial rank-reversals occur, as we illustrate through a few examples. The Comunidad Foral de Navarra (ES22) ranks 7th under the data-driven min–max, while dropping to 13th place under the survey-based model. Meanwhile, the German Land Hessen, ranked just below Navarra in the data-driven model (8th), reaches a much higher rank in the survey-driven model (4th). An even more dramatic set of rank-reversals occurs when comparing Navarra with all of the five Danish regions: although the latter group lies far behind in the data-driven model (the best ranked Danish region being Midtjylland at 16th), these territories largely outrank Navarra in the survey-based framework. Incidentally, the exact same reversal also affects the region of País Vasco (ES21). Spanish regions with lower positions in the table are similarly outranked by other European territories when the analysis is performed through survey-driven normalisation: e.g., Cantabria, Madrid and Castilla y León are overtaken by, among others, Nordrhein-Westfalen, Molise and Bremen.

Fig. 6
figure 6

Rankings’ sensitivity

As for Spain, the relative performance of Italian regions changes distinctly when comparing methods. Under the data-driven min–max, the Italian regional picture appears extremely heterogeneous, with some territories performing worse, and the remaining ones being equal to, or better off than, regions in Germany and Denmark. Moreover, the levels and distribution of Italian social inclusion appear very similar to those in Spain, when excluding the two “outlier” autonomous cities of Ceuta and Melilla. Switching to the survey-based normalisation makes the picture change radically, and in a different fashion with respect to what was discussed for Spain. Italy appears as a deeply divided country: the number of low-performing territories is appreciably higher, and a much larger distance separates these territories from the remaining relatively good performers, which are, in turn, worse placed with respect to both Germany and Denmark than they were under the data-driven strategy. Nevertheless, the overall Italian picture looks substantially better than the Spanish one, again contrary to what could be inferred from the data-driven model. Figure 6 helps in identifying the numerous rank-reversals affecting Italian, German and Danish regions, which happen almost exclusively to the benefit of the latter two countries. A notable example is the northern region of Trentino-Alto Adige (ITD1), ranked 2nd in the data-driven approach, which drops to 17th under the survey-based approach, being overtaken by German, Danish and Italian regions, as well as by Madrid. Similarly, the central regions Toscana (ITE1) and Marche (ITE3), as well as the industrial north-west region Piemonte (ITC1), which all rank mid-high in the data-driven models, lose numerous positions once the normalisation switches.

Belgium is affected by the aforementioned “Italian effect”, yet to a lesser extent. Although the ranking of its three regions appears extremely robust, both the within-country heterogeneity and the between-countries relative rankings are substantially modified by the methodological choices. The Inclusion level for Flanders, as well as its ranking, is constantly very high, and is increased when the survey-normalisation is adopted. The French-speaking region Wallonia, however, has a significantly lower performance under the survey-based model, which results in a wider gap with the Flemish region. Finally, the Bruxelles region is already among the worst-ranked in the data-driven model, and yet it drops to the bottom of the ranking under the survey-model. As a consequence, the emerging Belgian picture conveys much more heterogeneity in the survey-based framework.

A similar pattern can be found for Germany: although its overall levels of social inclusion are higher than those of Belgium and Italy, the survey-driven approach returns a degree of within-country variability that is absent in the data-driven models.

The distribution of the social inclusion index for Denmark is characterised by a low degree of dispersion, regardless of the adopted normalisation. As for levels and rankings in the data-driven models, Danish regions score relatively high values. Yet, their best performing territory is outranked by one Belgian, nine Italian, three German, and two Spanish regions. Conversely, in the survey-driven framework their ranking significantly improves: e.g., the worst performing Danish region is outranked by just three regions from Germany, three from Italy, and one from Belgium.

6.1 Weights and Rank-Reversals

It is useful to briefly recap how different normalisation strategies can lead to substantially different scenarios of social inclusion in Europe. The key factors to consider are: (i) the relative advantage that each country has in a specific dimension; and (ii) the within-country heterogeneity between the performances of the four selected raw variables. Indeed, Spain and Italy present, on average, a strongly unbalanced dashboard: the longevity dimension is particularly high, relative to the other countries, while education and socio-economic variables show much worse relative values. Since the two normalisation strategies (data-driven vs survey-driven) give opposite relative weights to these dimensions (a prevalent weight to longevity in the data-driven models; a more equal set of weights in the survey-driven one, with some prevalence of unemployment), this leads to the aforementioned difference in levels, with many Spanish and Italian regions falling to lower-ranked positions. Countries like Denmark and Germany are less affected by the change, given that their dashboard of indicators is uniform. Moreover, heterogeneity within countries is very different: Spain, Italy and Belgium show coefficients of standard deviations much higher than Denmark and Germany in each of the included variables. This means that the Inclusion-mix can differ greatly between regions within the same country. This kind of heterogeneity is somewhat softened in the data-driven models, given that a single dimension gets such a large relative weight. In the survey-driven model, conversely, because of the higher weight given to the dimensions other than longevity, such heterogeneity is enhanced.

This, in turn, explains the numerous rank reversals discussed in this section. Let us consider, for instance, the case of Italy and Germany, which exhibit, since 2004, similar trends in early school-leaving, life expectancy at birth and poverty rate. Nevertheless, the levels of these variables are quite different: school-dropout and poverty rates are much higher in Italy, which also presents substantially higher longevity. When it comes to long-term unemployment, the country trends cross: Italy experienced a consistent decline in its labour market performance, while Germany saw a constant improvement (according to many observers, a consequence of the Hartz Reforms). Such trends are summarised in Fig. 7.

Fig. 7
figure 7

Time trends of indicators in Germany and Italy

In the data-driven model for Italy, the increase in normalised life expectancy more than counterbalances, due to its substantially higher relative weight, the worsening conditions in the labour market, therefore allowing the overall measure to increase slightly. Almost no role is played by early school-leaving, which shows very little variation. In Germany, all the dimensions improve, thus leading to a regular increase in the composite measure, yet the Inclusion index is lower than for Italy, precisely because of the weight given to life expectancy.

When the elicited benchmarks are implemented, life expectancy becomes the best performing dimension for Italy, while early school-leaving is heavily penalized. In Germany, both these normalised attributes are much higher than under the data-driven model. Finally, although rather small in 2004, the spread between the countries’ normalised long-term unemployment rates becomes much more evident. Given the reduced relative weight given to longevity in the expert-based model, and the higher one given to long-term unemployment, the German index is (1) higher than the Italian one, while it was lower in the previous specification; and (2) increasing at a higher rate.

We can further illustrate similar rank-reversals by focusing on the results for the year 2012. The Italian region Trentino-Alto Adige, for instance, scores slightly better than Belgium’s Flanders under the data-driven model. Although both regions admittedly exhibit virtuous performance in each of the four raw variables, the Italian region is relatively better off in longevity at birth (83.6 vs 81.4 years) while Flanders has an edge in unemployment (1.5 vs 1.6), education (8.7 vs 15.8) and poverty (9.8 vs 12.5). Under the data-driven min–max, Trentino’s losses in three out of four dimensions are more than compensated by the gain in life expectancy, so that the two territories end up having very similar rankings and levels (83.8 for Trentino, 82.4 for Flanders). Conversely, the survey-based normalisation implies larger weights for the three dimensions where Trentino trails Flanders, thus enhancing the score and the ranking of the Flemish region, while depressing those of Trentino (88 for Flanders, 73 for Trentino).
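This reversal can be retraced numerically from Eqs. (8) and (9). The sketch below plugs the raw 2012 values quoted above into the two models; the assignment of the quoted figures to \(x_1,\ldots,x_4\) is an assumption inferred from the text, and the function name is illustrative. Since the inputs are rounded, the data-driven score for Trentino comes out at about 84 rather than the reported 83.8.

```python
# Benchmarks as (worst, best) pairs from Eqs. (8) and (9), in the assumed
# order: life expectancy, school dropouts, long-term unemployment, poverty.
DATA_DRIVEN = [(77.5, 84.2), (42.8, 5.4), (15.3, 0.3), (44.3, 5.2)]
SURVEY_DRIVEN = [(73.0, 83.0), (20.0, 10.0), (9.0, 3.0), (20.0, 5.0)]


def inclusion_index(raw, benchmarks, clip):
    """Equal-weight linear aggregation of min-max scores, on a 0-100 scale."""
    total = 0.0
    for x, (worst, best) in zip(raw, benchmarks):
        score = (x - worst) / (best - worst)
        if clip:
            # Survey-driven model: scores are capped at the elicited thresholds.
            score = max(0.0, min(1.0, score))
        total += 0.25 * score
    return 100.0 * total


# Raw 2012 values as quoted in the text (same variable order as above).
trentino = [83.6, 15.8, 1.6, 12.5]
flanders = [81.4, 8.7, 1.5, 9.8]

ld = {name: inclusion_index(raw, DATA_DRIVEN, clip=False)
      for name, raw in (("Trentino", trentino), ("Flanders", flanders))}
ls = {name: inclusion_index(raw, SURVEY_DRIVEN, clip=True)
      for name, raw in (("Trentino", trentino), ("Flanders", flanders))}

print({k: round(v, 1) for k, v in ld.items()})
print({k: round(v, 1) for k, v in ls.items()})
```

Under the survey-driven thresholds, both regions are already at the cap for longevity-adjacent unemployment performance, and Trentino's advantage in life expectancy is partly truncated at 83 years, so the gap in education and poverty drives the result.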

Another example comes from the comparison between Spanish País Vasco and Danish Syddanmark. In the data-driven min–max model they rank 15th (76.9) and 22nd (73.7), respectively. Under the survey-based framework the Danish region climbs to 10th place (77.6), while País Vasco drops to 23rd (69.4). Again, the reason for this shift lies in the heterogeneity of the two regions’ dashboards, amplified by the new expert-based weights. País Vasco performs very well in life expectancy (83.1) and well in school dropouts (11.5 %, close to the 9 % of Syddanmark), yet it loses ground in long-term unemployment (6.4 %, while Syddanmark’s rate is just 2.4 %).

7 Conclusion

The unavoidable subjectivity of composite measures of Well-being is a cause of controversy in this field of economic analysis. In this paper, we argued that the lack of transparency on methodological choices can turn out to be more troublesome than subjectivity per se: specifically, we focused on the choice of the normalisation function. In the context of building a synthetic Index of social inclusion for 63 European regions between 2004 and 2012, we showed the consequences of adopting different normalisation methods while keeping constant the (linear) aggregation model, with equal weights allocated to the normalised dimensions.

To the extent that rescaling is a requirement for composite measures, the actual aggregation involves the transformed variables rather than the observed performances. There is an unavoidable and intrinsic difference between the interpretation of original and normalised performances, and yet social researchers are ultimately interested in the contribution of the original variables to the aggregate index. The rescaled unit of measurement (e.g., between zero and one hundred) can be interpreted as a degree of fulfilment of some criterion. Whether this criterion should be purely statistical (e.g., being far from or close to the minimum or maximum observed achievements), or whether it should encompass some informed value judgements related to the topic at hand (as in the expert elicitation or in the adoption of policy benchmarks), is the researcher’s choice.

In the former case, the agnostic choice of “letting the data talk”, standard properties such as strong non-satiation and continuity of the normalisation function are guaranteed, yet the trade-offs between dimensions are hard to interpret in economic terms or from a social desirability perspective. Therefore, the resulting aggregate measure would be characterized as a tool mainly for “positive” analysis.

In the latter case, conversely, the normalisation function may become weakly monotonic when the elicited constraints are binding for some observed variable. Moreover, the elicitation method suffers from the arbitrariness embedded in any survey exercise (selection of the experts, biases in the framing of questions). Finally, the trade-offs between dimensions reflect the preferences of an actual group of experts; they are thus independent of the data selection and allow the final measure to be given a “normative” connotation.
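The contrast in monotonicity between the two cases can be made concrete with a minimal Python sketch. The observed values and the elicited benchmarks below are hypothetical, as are the function names; the sketch assumes that performances beyond an elicited benchmark are simply capped at the top (or bottom) of the scale.

```python
def normalise(x, lower, upper):
    """Min-max rescaling to [0, 100], capped at the two benchmarks."""
    share = (x - lower) / (upper - lower)
    return 100.0 * min(1.0, max(0.0, share))


def strictly_increasing(values):
    """True if every value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


# HYPOTHETICAL life expectancies (years), sorted in ascending order.
observed = [79.0, 81.4, 83.6, 84.5]

# Data-driven case: the benchmarks are the observed extremes, so no
# observation is ever capped and distinct values get distinct scores.
data_driven = [normalise(x, min(observed), max(observed)) for x in observed]

# Elicited case: a HYPOTHETICAL upper benchmark of 83 years is binding
# for the two best performers, which both receive the maximum score.
elicited = [normalise(x, 70.0, 83.0) for x in observed]

print(strictly_increasing(data_driven))  # strict monotonicity holds
print(strictly_increasing(elicited))     # only weak monotonicity holds
```

In the elicited case the two best performers become indistinguishable: any further gain above the benchmark no longer raises the normalised score, which is what a binding constraint means in practice.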

The main result of our analysis is that neither method is neutral. Indeed, in our case study, data-driven normalisation softens the aftermath of the recent economic crisis as well as the differences between territories, since it puts considerable weight on the longevity variable, which follows dynamics that are only partially related to socio-economic contingencies. Conversely, the survey-driven normalisation emphasizes the worse performance of Italy and Spain in long-term unemployment and early school-leaving, thus producing a bi-modal picture of Inclusion in Europe, with one cluster of regions scoring very high and another scoring very low. As a result, numerous rank-reversals occur between regions when switching between normalisation methods.

Conceptually, both normalisation strategies are acceptable, and it is hard to label one as “preferred” to the other. Although the picture obtained with the expert-based approach seems to better reflect the most recent economic trends, this might change if a different expert panel were selected. In other words, the opinions of experts are not guaranteed to lead to more suitable benchmarks than those revealed by the data. Indeed, what differentiates the two strategies is the nature of the economic justification lying behind the resulting benchmarks and trade-offs: social welfare preferences, in the case of the expert elicitation, as opposed to frequency-based parameters, in the case of the data-driven models.

As mentioned in the introduction, building a synthetic index of social inclusion requires that the concept’s indeterminate multiplicity be made determinate through a specification of its contents and of their relationship. In this paper we do not offer a real solution as far as the normalisation process is concerned. Rather, we raise awareness that, when a “real solution” is presented, it may not be the only one, and its premises may hide peculiar trade-offs which should be made transparent to the reader. For example, in our social inclusion study, a policy maker should be aware of the weights and trade-offs lying behind the data-driven results, and should be presented with the alternative picture coming from the expert-based strategy, in order to draw more informed and efficient conclusions on the topic at hand.