##### Design Process

“Composite [indices] involve a long sequence of steps that need to be followed meticulously” (Greco et al., 2018). The GGPM team applied a stepwise approach to enhance the transparency, replicability, and credibility of the Green Growth Index (Figure 11). This approach conforms to “good practices” in developing composite indices (Nardo, Saisana, Saltelli, & Tarantola, 2005; OECD & JRC, 2008). After concept building (chapter 4), the second step was an empirical application to systematically address methodological issues, such as data selection and statistical tests as well as normalization, weights, and aggregation of indicators. This chapter explains these methods in detail. The third step, which aimed to check the robustness of the Green Growth Index, measured the explanatory power of the indicators and dimension subindices as well as the sensitivity and uncertainty levels of the index. The fourth step, which focused on the presentation of the indicators, dimension subindices, and the Green Growth Index, sought to enhance the comprehensibility and policy relevance of the results. This step considered not only the illustration of the results in maps, diagrams, and tables but also their assessment using benchmarks and ranks. This report, however, presents only selected results because most of the analyses will be discussed in GGGI’s forthcoming Global Green Transformation Report (see chapter 9.1). That flagship report will serve as a core part of GGGI’s initiative to promote the model of green growth and showcase successful country experiences and approaches, supplemented by data, analysis, and stakeholder engagement.

**5.1 Indicator Selection**

The conceptual framework should provide guidance on the choice of indicators (chapter 4.2), but the selection of metrics or data to measure these indicators can be subjective, particularly when the “desired data” are not available (OECD & JRC, 2008). The selection criteria should thus be consistent with the objectives and purpose of the index. Because the Green Growth Index aims to measure green growth performance across countries and regions both now and in subsequent years, GGPM used the following criteria in selecting indicators:

- Relevance of the indicator to the green growth dimensions based on conceptual and empirical evidence;
- Coverage of more than 140 countries, including a large number of GGGI member and partner countries;
- Availability of time-series data to allow the index to be updated at regular intervals; and
- Accessibility of the data to allow replication of the methods and verification of the credibility of data sources, thereby enhancing data acceptability.

A literature review was conducted to provide evidence on the relevance of the indicators to the green growth dimensions and pillars (chapter 4.2; Acosta, 2019). Some of the indicators are, however, “proxy variables” because the desired indicators are either not available or lack relevant data (see the discussion of indicators and proxy variables in chapter 7.1). Although the GGPM team aimed for wide data coverage in terms of the number of countries and years, some of the more relevant indicators did not meet these criteria. For example, data were available for fewer than 100 countries for one indicator of green economic opportunities, namely the share of patent publications in environmental technology to total patents, and for two indicators of social inclusion, namely the share of youth (aged 15-24 years) not in education, employment, or training and the proportion of urban population living in slums (Figure 12). No alternative proxy variables are currently available for these indicators. Data for these social inclusion indicators, however, are expected to improve in the coming years because they are SDG indicators. In addition, data were available for only one year for two indicators of efficient and sustainable resource use, specifically water use efficiency and average soil organic carbon content; for two indicators of natural capital protection, specifically municipal solid waste (MSW) generation per capita and soil biodiversity (the potential level of diversity living in soils); and for one indicator of social inclusion, specifically the proportion of population above statutory pensionable age receiving a pension. Most of these indicators are proxy variables and are expected to be replaced by more suitable data in the next few years. For example, FAO is currently finalizing its database for soil nutrients, which would provide an alternative data source for soil organic carbon content and soil biodiversity. Further improvements are also expected in data for water use efficiency and statutory pensions because they are SDG indicators.

Data for all indicators included in the Green Growth Index are publicly available online. The data were mainly collected from international organizations, which offers important advantages for measuring performance across countries. Collecting data directly from national agencies for more than 100 countries would be cumbersome, whereas data from international organizations are compiled from national agencies and have undergone consistency checks. The United Nations coordinates statistical activities “to guarantee integrated systems of collection, processing and dissemination of data” (Eurostat, n.d.). Nonetheless, during the regional consultation workshops, some regional experts expressed concerns over using data from international organizations (Acosta et al., 2019). To address these concerns, GGGI will encourage regional experts to undertake additional consistency checks once the data used in developing the Green Growth Index become available online. Moreover, GGGI will help communicate any concerns about the correctness and validity of the data to the international organizations responsible for producing and publishing them.

**5.2 Data scaling**

“To have an objective comparison across small and large countries, scaling of variables by an appropriate size measure, e.g., population, income, trade volume, and populated land area, etc. is required” (OECD and JRC, 2008b: p.23). More than 70 percent of the 36 indicators are scaled. They mainly use denominators based on gross domestic product (GDP) or gross national income (GNI), such as for primary energy supply, domestic material consumption, and adjusted net savings; on area, such as for water use efficiency, PM2.5 air pollution, soil organic carbon content, organic agriculture, key biodiversity areas, terrestrial and marine biodiversity, and forest area; on available resources or sector size, such as for freshwater withdrawal, environmental exports, green employment, and environmental technology patents; and on population, such as for material footprint, DALY rate as affected by unsafe water, municipal solid waste, GHG emissions, access to safe water and sanitation, access to electricity and clean fuels, and mobile and fixed broadband.
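As a minimal illustration of why scaling matters, the Python sketch below divides raw totals by a size measure. The helper `scale_by`, the country labels, and all figures are hypothetical and are not taken from the Green Growth Index dataset.

```python
def scale_by(values, denominators, factor=1.0):
    """Divide each country's raw value by its size measure (e.g. population).

    `factor` rescales the ratio, e.g. factor=100_000 for "per 100,000 persons".
    Countries with a missing value or a missing/zero denominator are skipped.
    """
    scaled = {}
    for country, value in values.items():
        denom = denominators.get(country)
        if value is None or not denom:
            continue
        scaled[country] = value / denom * factor
    return scaled


# Hypothetical totals: CO2 emissions (tonnes) and population.
emissions = {"Large": 500_000_000.0, "Small": 20_000_000.0}
population = {"Large": 100_000_000.0, "Small": 2_000_000.0}

print(scale_by(emissions, population))  # tonnes per capita
```

In this invented example the larger country emits 25 times more in total, yet its per capita emissions are half those of the smaller country (5 versus 10 tonnes per person), which is the size distortion that scaling is meant to remove.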

Three composite indices, which are inherently scaled, were used as indicators: the Red List Index, income inequality based on the Atkinson Index, and the Healthcare Access and Quality Index. The Red List Index measures changes in aggregate extinction risk across groups of species. The income inequality measure developed by Atkinson is based on the proportion of total income that a given society would have to forgo to achieve equal income shares among the population (Afonso, LaFleur, & Alarcón, 2015). The Healthcare Access and Quality Index is based on the Global Burden of Disease (GBD) study, which used 32 causes from which death should not occur in the presence of effective care (Fullman et al., 2018). It is not uncommon to use indices in developing a composite index. Indices are particularly useful when one indicator is not sufficient to measure different issues that equally need attention, or when one indicator only partially captures the problem or its solutions. Acosta (2019) provides detailed descriptions of these indices to enhance their comprehensibility.
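The Atkinson index mentioned above can be sketched as follows. This is the standard textbook formula, not necessarily the exact computation behind the data source; the inequality-aversion parameter `epsilon` is an assumption, because the text does not state which value Afonso, LaFleur, & Alarcón (2015) used.

```python
import math


def atkinson(incomes, epsilon=1.0):
    """Atkinson inequality index for a list of positive incomes.

    Returns 1 - (equally distributed equivalent income / mean income), i.e. the
    share of mean income that could be given up, at the same welfare level,
    if income were equally distributed. `epsilon` is the inequality aversion.
    """
    if not incomes or any(y <= 0 for y in incomes):
        raise ValueError("incomes must be a non-empty list of positive values")
    n = len(incomes)
    mean = sum(incomes) / n
    if epsilon == 1.0:
        # Limit case: the equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(y) for y in incomes) / n)
    else:
        power_mean = sum(y ** (1.0 - epsilon) for y in incomes) / n
        ede = power_mean ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mean


print(atkinson([1, 9]))               # epsilon = 1
print(atkinson([1, 9], epsilon=2.0))  # stronger aversion to inequality
```

For example, with `epsilon=1` the incomes 1 and 9 have a geometric mean of 3 and an arithmetic mean of 5, so the index is about 0.4; a higher `epsilon` weights the poorer person more heavily and raises the index (about 0.64 for `epsilon=2`).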

**5.3 Data imputation**

The most direct and common approach to addressing missing data is simply to exclude or omit them (Gelman & Hill, 2007; He, 2010; Kang, 2013). The Green Growth Index partly adopts this approach for indicators with time-series data: indicators are excluded when their data are missing for two consecutive years prior to the baseline year, that is, the year used in computing the index. Examples of sustainability indices that do not apply data imputation include the Environmental Vulnerability Index of the South Pacific Applied Geoscience Commission, the UNEP Green Economy Progress Index, and ADB’s Inclusive Green Growth Index. Kang (2013) emphasized the problems with missing data, including reduced statistical power, biased parameter estimates, reduced representativeness of the samples, and increased complexity of analysis. While these problems are highly relevant to complex modelling, the simple and transparent aggregation methods used to generate the Green Growth Index can reduce them (Chapter 5.8). Moreover, He (2010) explained that when data are missing completely at random (MCAR), analysis that omits the missing data is unbiased. In most cases, however, there is no clear basis for judging whether data are missing at random, which is a prerequisite of most imputation methods (Nardo et al., 2005). Gelman & Hill (2007) also pointed out that excluding indicators with missing data reduces the number of samples in the analysis.

Imputation methods, such as mean imputation, linear interpolation, regression analysis, maximum likelihood, and multiple imputation, are widely used to fill in missing data (Horton & Kleinman, 2007; OECD & JRC, 2008; Kang, 2013; Wicklin, 2017). Examples of sustainability indices that apply data imputation include the Global Green Economy Index of DC, which uses the mean of the five closest countries; the African Green Growth Index of AfDB, which uses the mean of normalized indicators; the Ecological Footprint of the Global Footprint Network, which uses interpolation or extrapolation; the Environmental Performance Index of Yale University and Columbia University, which imputes the closest data points and uses extrapolation; the Sustainable Society Index of the Sustainable Society Foundation, which uses expert judgment; and the Happy Planet Index of the New Economics Foundation, which imputes data from the closest years. He (2010) classified mean imputation and the treatment of missing data as a separate category as ad hoc methods because they rest on implausible assumptions, noting that “these methods impute the missing data only once and then proceed to the completed data analysis” (He, 2010: p.3). Such single imputation methods are known to underestimate variances and standard errors because they treat each imputed value as if it were known with certainty (He, 2010; OECD & JRC, 2008). For composite indices, these imputation methods raise serious statistical problems that can affect the reliability of the analysis. For example, mean imputation not only reduces the variance but also changes the correlations between indicators (Wicklin, 2017). Both effects are problematic: sufficient variance is needed to capture differences in scores across countries, and, as discussed in Chapter 5.5, correlation is important for identifying redundant indicators.
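Wicklin's point about variance can be checked with a few hypothetical values. The sketch below fills two gaps with the observed mean and compares sample variances; the data are invented, and the effect on correlations is not shown.

```python
import statistics

# Hypothetical indicator values for eight countries; None marks missing data.
observed = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, None, None]

present = [v for v in observed if v is not None]
mean_value = statistics.mean(present)

# Mean imputation: every gap is filled with the same observed mean.
imputed = [mean_value if v is None else v for v in observed]

var_observed = statistics.variance(present)  # sample variance (n - 1)
var_imputed = statistics.variance(imputed)

print(f"variance observed: {var_observed:.3f}, after mean imputation: {var_imputed:.3f}")
```

The imputed points add no spread around the mean but increase the sample size, so the sample variance drops (here from about 6.17 to about 4.40), even though nothing new was learned about the missing countries.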
In short, there are trade-offs in using data imputation, and decisions often depend on subjective judgement. The reasons for using, or not using, imputation methods should thus be justified, because “[n]o imputation model is free of assumptions” (OECD & JRC, 2008: p.25). To minimize the statistical implications of imputation, the GGPM team adopted the simplest approach, that of the Happy Planet Index, which imputes data only from the closest years; for instance, missing data for 2017 were imputed with data from 2016. In a very few cases, where time-series data were too sparse to observe a trend and only two data points were available, the mean of the closest years was used.
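A closest-year rule of this kind can be sketched in Python. This is one reading of the text, not the GGPM code: the function name, the order of the rules, and the look-back window `max_gap` are hypothetical, and the text does not specify exactly how the two-consecutive-years exclusion rule maps onto such a window.

```python
def impute_closest_year(series, baseline, max_gap=1):
    """Return the value to use for `baseline` from a {year: value} series.

    Hypothetical reading of the closest-year rule:
    1. use the baseline-year value if it is present;
    2. if only two observations exist, use their mean;
    3. otherwise use the closest earlier year within `max_gap` years;
    4. return None (left missing) if no such year exists.
    """
    observed = {year: v for year, v in series.items() if v is not None}
    if baseline in observed:
        return observed[baseline]
    if len(observed) == 2:
        return sum(observed.values()) / 2
    for gap in range(1, max_gap + 1):
        year = baseline - gap
        if year in observed:
            return observed[year]
    return None


print(impute_closest_year({2014: 2.9, 2015: 3.1, 2016: 3.4, 2017: None}, 2017))
print(impute_closest_year({2012: 5.0, 2016: 7.0}, 2017))
```

With a baseline of 2017, the first series takes its 2016 value (3.4), while the second, with only two observations, takes their mean (6.0). A series whose latest observation falls outside the window returns `None` and stays missing.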

Table 1 provides information on data availability for the indicators and on which indicators were subjected to imputation. Of the 36 indicators, 12 required imputation. However, four of them needed imputation for only one country. The indicators with the largest number of countries subjected to imputation are the share of green employment in total manufacturing employment (GJ1) and the share of youth (aged 15-24 years) not in education, employment or training (SE3). Data for GJ1 were estimated by the United Nations Industrial Development Organization (UNIDO) based on the methods developed by Moll de Alba & Todorov (2018, 2019 in press). SE3 is an SDG indicator. Data for both indicators are expected to improve in the coming years.

**5.4 Distribution and outliers**

An outlier is an observed value that lies an “abnormal distance” from other values in a dataset, being either extremely large or extremely small (NIST-SEMATECH, 2013). Outliers can “distort mean, standard deviation and the covariance structure of the indicator” and alter the correlations between indicators (Mishra, 2008). They also affect the normalized values of the indicators and thus need to be identified and accounted for (Nardo et al., 2005; OECD & JRC, 2008). Boxplots were generated to show the distribution of each indicator and to identify extreme values or outliers. Figure 13 illustrates the boxplot for the ratio of total primary energy supply to GDP, showing the presence of extreme outliers, and explains how to read the boxplots of the indicators.

Table 2 summarizes the information from the boxplots, which were used to identify the outliers and the indicators that needed capping, where:

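$$
\begin{aligned}
\text{IQR} &= P_{75} - P_{25} \\
\text{Lower fence} &= P_{25} - \mu \times \text{IQR} \\
\text{Upper fence} &= P_{75} + \mu \times \text{IQR}
\end{aligned}
$$

with $P_{25}$ and $P_{75}$ denoting the 25th and 75th percentiles of the indicator and $\mu = 3.0$ the multiplier.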

Although 2.2 is the recommended multiplier (Hoaglin & Iglewicz, 1987; Iglewicz & Banerjee, 2001), the GGPM team used a higher multiplier to avoid flagging too many extreme outliers and capping the data of many countries. Moreover, 3.0 is the multiplier commonly applied by standard statistical software to identify extreme outliers. In some cases, the normalization approach used to compute the Green Growth Index allowed outliers to be capped through benchmarking. As explained in detail in Chapter 5.6.2, this depends on whether the indicator has a positive or negative relationship to green growth and whether its values lie above or below the sustainability target. When extreme outliers could not be capped through benchmarking, they were capped prior to normalization. This is the case for the following indicators. Table 2 presents the number of capped values.

- EE1: Ratio of total primary energy supply to GDP (MJ per $2011 PPP GDP)
- EW2: Share of freshwater withdrawal to available freshwater resources (percentage)
- ME2: Total material footprint (MF) per capita (MF tons per capita)
- EQ1: PM2.5 air pollution, mean annual population-weighted exposure (Micrograms per m³)
- EQ2: DALY rate as affected by unsafe water sources (DALY lost per 100,000 persons)
- EQ3: Municipal solid waste (MSW) generation per capita (Tons per year per capita)
- GE1: Ratio of CO₂ emissions to population, excluding AFOLU (Metric tons per capita)
- GE2: Ratio of non-CO₂ emissions to population, excluding AFOLU (Tons per capita)
- GE3: Ratio of non-CO₂ emissions in agriculture to population (Gigagrams per 1,000 persons)
- GV1: Adjusted net savings minus natural resources and pollution damages (Percent of GNI)
- SE2: Ratio of urban-rural access to basic services, such as water, sanitation, and electricity (Percent)

Capping outliers means replacing extreme values with values that better correspond to the structure of the rest of the dataset or to the normal distribution. For the Green Growth Index, the GGPM team used the value of the lower or upper fence, depending on which fence the extreme outliers lay beyond, as shown in Appendix 2. Except for adjusted net savings minus natural resources and pollution damages (GV1), all indicators with extreme outliers took the upper fence as their capped value.
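The fence computation and capping described in this section can be sketched as below. The percentile method is an assumption (Python's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly); statistical packages differ slightly in how they compute quartiles, and the function names are hypothetical.

```python
import statistics


def iqr_fences(values, multiplier=3.0):
    """Lower and upper fences: P25 - mu * IQR and P75 + mu * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def cap_outliers(values, multiplier=3.0):
    """Replace values beyond the fences with the fence values.

    Returns the capped list and the number of values that were changed.
    """
    lower, upper = iqr_fences(values, multiplier)
    capped = [min(max(v, lower), upper) for v in values]
    n_capped = sum(1 for v, c in zip(values, capped) if v != c)
    return capped, n_capped


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # hypothetical indicator values
print(iqr_fences(data))
print(cap_outliers(data))
```

For these hypothetical values the quartiles are 3 and 7, so with μ = 3.0 the fences are -9 and 19 and only the value 100 is capped (to 19). Lowering `multiplier` to 2.2 would narrow the fences and flag more values, which is the trade-off discussed above.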

**5.5 Correlation of indicators**

Bivariate correlation was used to analyze the strength of association between the indicators in each dimension. Pearson correlation was an appropriate technique for the Green Growth Index because its indicators are continuous and only a few of them have extreme outliers (chapter 5.4). Chok’s (2008) study suggests that the Pearson coefficient could improve statistical power even for moderately skewed distributions. The coefficient takes values from -1 to +1, where -1 indicates a perfect negative linear relationship, +1 a perfect positive linear relationship, and 0 no linear relationship between the indicators (Bolboaca & Jäntschi, 2006). For the Green Growth Index, the absolute values of the coefficients matter more than their signs. The correlation analysis has two aims: first, to identify redundant indicators whose very strong correlations would induce double counting in the weights or coefficient values; and second, to verify, using the p-value, whether indicators have acceptable levels of association within their respective dimensions.

There are no clear rules on how to rate the values of the coefficients. According to Schober, Boer, & Schwarte (2018), many studies agree that “a coefficient of less than 0.1 indicates a negligible and more than 0.9 a very strong relationship, values in between are disputable” (Schober, Boer, & Schwarte, 2018: p.1765). To validate our indicators, their correlations should therefore be statistically significant and lie between 0.1 and 0.9. However, some experts consider these values very low and very high, respectively. The GGPM team thus interpreted the absolute coefficient values using finer ranges: 0.9 to 1 as very high; 0.7 to 0.89 and 0.1 to 0.29 as acceptable; 0.3 to 0.69 as ideal; and less than 0.1 as very low. The significance of each correlation coefficient is given by its p-value. A p-value below 0.01 means the correlation is significant at the 1 percent level (99 percent confidence); a p-value between 0.01 and 0.05 means it is significant at the 5 percent level (95 percent confidence); and a p-value between 0.05 and 0.10 means it is significant at the 10 percent level (90 percent confidence). The GGPM team examined the absolute values of the correlation coefficients, considering only those significant at the 10 percent level or better (p < 0.10). Table 3 summarizes the results of the correlation analysis for each dimension, presenting only these significant coefficients. Appendix 3 presents detailed results of the correlation analysis.
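The coefficient and the GGPM rating bands can be sketched as follows. Boundary handling (for example, rating 0.895 as acceptable rather than very high) is an assumption, since the stated ranges leave small gaps; the p-value test is omitted because it needs the t-distribution, which the Python standard library does not provide.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rate_correlation(r):
    """Label |r| with the GGPM bands (lower bounds treated as inclusive)."""
    a = abs(r)
    if a >= 0.9:
        return "very high"
    if a >= 0.7:
        return "acceptable (high)"
    if a >= 0.3:
        return "ideal"
    if a >= 0.1:
        return "acceptable (low)"
    return "very low"


r = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # hypothetical, perfectly inverse
print(r, rate_correlation(r))
```

Because the rating uses the absolute value, a perfectly inverse pair such as the example above is rated "very high", just like a perfectly positive one, which matches the text's emphasis on magnitude over sign.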

The correlation coefficients that are significant at the 10 percent level or better fall between 0.1 and 0.9 for all dimensions (Table 3), which means that no indicator has a very high correlation with another indicator. Many coefficients fall at an ideal level, between 0.3 and 0.7. However, a larger number of coefficients are at an acceptable low level, between 0.1 and 0.3, particularly for the indicators of green economic opportunities and of efficient and sustainable resource use. About 10 percent of the correlation coefficients for social inclusion indicators are between 0.7 and 0.9, an acceptable high level. The results of the correlation analysis reveal that there are no redundant indicators in our dataset, although many indicators have low, yet acceptable, levels of correlation. The only indicator with no statistically significant correlation with any other indicator is the share of patent grants in environmental technology to total patent grants (GN1), one of the four indicators under the green economic opportunities dimension. This can be attributed to the small number of data points for this indicator, which has the fewest observations even after imputation (Table 1). Overall, the correlation analysis confirms the validity and soundness of the model.

It is worth mentioning, however, that the indicators in the final framework are the result of an iterative process of statistical validation. Other indicators were also considered for the framework but were excluded and replaced because of very high correlations. These indicators include lower secondary completion rate, total (percentage of relevant age group); mean years of schooling (number of years); student-teacher ratio, primary school; gender inequality index; poverty headcount ratio at $1.90 per day; universal health coverage (UHC) service coverage index; wage and salaried workers, total (percentage of total employment); and share of GHG emissions and removals to population for AFOLU (Gigagrams per 1,000 persons).

**5.6 Normalization of indicators**

The GGPM team normalized the indicators using the rescaling (min-max) method because of the following advantages:

- It is simple and among the most widely used methods, which will allow governments at the national and subnational levels to replicate the Green Growth Index.
- It can integrate upper and lower bounds, which reduces the problems of extreme values and partially corrects for outliers.
- It allows targets to be built into the method, which supports benchmarking against sustainability targets.

## 5.6.1 Rescaling (min-max)

Generally, the method rescales a given indicator $x_i$ into an identical range between 0 and 1 based on its minimum ($X_{min}$) and maximum ($X_{max}$) values (Equation 1):

$$
y_i = \frac{x_i - X_{min}}{X_{max} - X_{min}} \qquad (1)
$$

where $y_i$ is the normalized value. Many sustainability, environmental, and governance indices use the rescaling method to normalize indicators. They include the Human Development Index of the United Nations Development Programme (UNDP), the Inclusive Green Growth Index of ADB, the Sustainable Society Index of the Sustainable Society Foundation (SSF), the Worldwide Governance Index of the World Bank (WB), the E-Government Development Index of the UN Public Administration Network, and the Democracy Index of the Economist Intelligence Unit (EIU). The range of these indices, however, is often not [0,1] because the rescaling method offers the advantage of setting boundaries (Talukder et al., 2017).
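Equation 1 translates directly into code. The sketch below is illustrative (the function name `rescale` is hypothetical); the optional bounds `a` and `b` anticipate the general form with lower and upper bounds discussed in Chapter 5.6.2.

```python
def rescale(values, a=0.0, b=1.0):
    """Min-max rescaling of indicator values into the interval [a, b].

    With the defaults a=0 and b=1 this is Equation 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max rescaling is undefined for a constant indicator")
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


print(rescale([2, 4, 6]))               # bounds [0, 1]
print(rescale([2, 4, 6], a=1, b=100))   # bounds [1, 100]
```

For example, `rescale([2, 4, 6])` gives `[0.0, 0.5, 1.0]`, and `rescale([2, 4, 6], a=1, b=100)` gives `[1.0, 50.5, 100.0]`: the relative spacing of countries is preserved, and only the scale changes.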

## 5.6.2 Benchmarking (lower/upper bounds)

Equation 2 presents a more general form of the rescaling function in Equation 1 that includes a lower bound $a$ and an upper bound $b$. The values of these bounds are assigned arbitrarily and often depend on the objectives of the index. For example, ADB’s Inclusive Green Growth Index has a range of 1 to 6 so that its scores align with those of the World Bank’s Worldwide Governance Index (Jha et al., 2018). The Green Growth Index uses the range [1,100]. The lower bound of 1 is used instead of 0 because, during the regional workshops (Chapter 3), some experts suggested avoiding 0 in the index, as it conveys a negative notion and discourages performance improvement. Although the rescaling method generates unitless numbers to facilitate comparison across indicators, years, and countries, a score of zero could be misinterpreted as a lack of capacity to perform on a given green growth indicator. The upper bound of 100 signifies achievement of the sustainability target for a given indicator (Chapter 5.6.3).

$$
y_i = a + \frac{(x_i - X_{min})(b - a)}{X_{max} - X_{min}} \qquad (2)
$$

By integrating the targets into the rescaling method, the distance to the sustainability targets can be measured directly from the indicator scores, an approach known as benchmarking (chapter 5.8). This approach is also referred to as the benchmarking normalization function, which “depends on indicator values each being mapped to some value based on a qualitative valuation of their level of sustainability” (Pollesch & Dale, 2016: p.198). OECD’s Measuring Distance to the SDG Targets (OECD, 2019b, 2019a) and SDSN’s SDG Index (Lafortune, Fuller, Moreno, Schmidt-Traub, & Kroll, 2018; Sachs et al., 2019) applied this approach to measure country performance relative to the SDG targets. Pollesch & Dale (2016) compared how this approach was used in various studies to assess sustainability (e.g. Krajnc and Glavic, 2005; Castoldi and Bechini, 2010; Hayashi et al., 2014; Maxim, 2014; Pinar et al., 2014). In these studies, the bounds were referred to as sustainability “thresholds,” which were defined as either internal or external. Internal thresholds refer to values that are specific to the system being studied and to its environmental or socio‑economic sensitivities (Pollesch & Dale, 2016). Pinar et al. (2014) provided an example of external thresholds, which were derived from outside sources, such as literature and international legislation. The GGPM team used both internal and external thresholds, which, in the context of green growth, refer to the sustainability targets. In line with Pinar et al. (2014), the external thresholds in the Green Growth Index are targets derived from the literature. Specifically, these are targets explicitly agreed for the SDGs; implicit SDG targets based on the interpretations of OECD (2017b, 2019b, 2019a) and/or SDSN (Sachs, Schmidt-Traub, Kroll, Lafortune, & Fuller, 2018; Sachs et al., 2019); or targets identified by experts for other international agreements, such as the air quality guidelines (WHO, 2005), Aichi targets (Leadley et al., 2014), and material resources (Bringezu, 2015). Meanwhile, the internal thresholds are targets derived from the mean values of the top-performing countries for specific indicators (Chapter 5.6.3). The methods for integrating the bounds or thresholds into the normalization function varied among studies, mostly depending on the characteristics of the indicators used to measure them. For the Green Growth Index, five cases were identified for computing the upper bound $b$ and integrating it into the rescaling normalization method. Each case is elaborated below.

Case 1 was applied to indicators with a positive relationship to green growth and maximum values ($X_{max}$) that were less than the sustainability target ($X^{t}$). In this case, the upper bound $b$ was based on the ratio of $(X_{max} - X_{min})$ to $(X^{t} - X_{min})$ (Equation 3). Both the maximum value and the sustainability target are thus measured from the minimum value of the indicator, which in many cases was not zero. Case 1 assumed that none of the countries had reached the sustainability target (a score of 100).

*[Case 1 assumptions; Equation 3]*

Case 2 was applied to indicators with a negative relationship to green growth and minimum values ($X_{min}$) that were greater than the sustainability target ($X^{t}$). Because these indicators have a negative relationship to green growth, the normalization function in Equation 4 was inverted. In this case, the upper bound $b$ was based on the ratio of $(X_{max} - X_{min})$ to $(X_{max} - X^{t})$, so both the minimum value and the sustainability target are measured from the maximum value of the indicator. Similar to Case 1, Case 2 assumed that none of the countries had reached the sustainability target (a score of 100).

*[Case 2 assumptions; Equation 4]*

Case 3 was applied to indicators with a positive relationship to green growth and maximum values ($X_{max}$) that were greater than or equal to the sustainability target ($X^{t}$). The rescaling normalization function was modified to use the sustainability target as the reference instead of the maximum value. Countries with values ($x_i$) greater than the sustainability target were assigned the target value, on the assumption that they had already met the target. This rescaling method hence capped any extreme values or outliers at the target value. Since the upper bound $b$ was based on the ratio of the capped maximum minus the minimum value (that is, $X^{t} - X_{min}$) to $(X^{t} - X_{min})$ (Equation 5), $b = 100$. Case 3 assumed that some countries had reached the sustainability target (a score of 100).

*[Case 3 assumptions; Equation 5]*

Case 4 was applied to indicators with a negative relationship to green growth and minimum values ($X_{min}$) that were less than or equal to the sustainability target ($X^{t}$). Because these indicators have a negative relationship to green growth, the normalization function in Equation 6 was inverted. Moreover, the function was modified to use the sustainability target as the reference instead of the minimum value. Countries with values ($x_i$) less than the sustainability target were assigned the target value; as in Case 3, they were assumed to have already met the target, and any extreme values or outliers were capped at the target value. Since the upper bound $b$ was based on the ratio of the maximum value minus the capped minimum (that is, $X_{max} - X^{t}$) to $(X_{max} - X^{t})$ (Equation 6), $b = 100$. Case 4 assumed that some countries had reached the sustainability target (a score of 100).

*[Case 4 assumptions; Equation 6]*

Case 5 is a special case with both lower and upper bounds, which correspond to two sustainability targets: one at the minimum level and the other at the maximum level. This case was applied only to the share of freshwater withdrawal to total available freshwater, which has values below the minimum sustainability target and above the maximum sustainability target. Countries with such values were assigned the corresponding sustainability target values, so any extreme values or outliers were capped at these targets. Since the upper bound $b$ was based on the ratio of the same values, $b = 100$. This indicator has a negative relationship to green growth, so the normalization function in Equation 7 was inverted. Case 5 assumed that some countries had reached the sustainability target (a score of 100).

*[Case 5 assumptions; Equation 7]*
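Cases 1 to 4 can be summarized in one hypothetical function. The exact forms of Equations 3 to 7 are not reproduced in this text, so the sketch assumes that the sustainability target always maps to a score of 100 and the worst observed value to the lower bound a = 1; under that assumption, the score reached by the best country (the upper bound b) follows the ratios described above. Case 5, with its two targets, is not covered.

```python
def benchmark(values, target, positive=True, a=1.0):
    """Hypothetical target-based normalization onto [a, 100] (Cases 1 to 4).

    Assumption: the sustainability target maps to 100. Values beyond the
    target are capped at it (Cases 3 and 4); if no country reaches the target
    (Cases 1 and 2), the best observed score b stays below 100.
    """
    if positive:
        lo = min(values)
        if target <= lo:
            raise ValueError("target must exceed the minimum for positive indicators")
        return [a + (min(x, target) - lo) * (100.0 - a) / (target - lo) for x in values]
    hi = max(values)
    if target >= hi:
        raise ValueError("target must be below the maximum for negative indicators")
    # Inverted for negative indicators: lower values score higher.
    return [a + (hi - max(x, target)) * (100.0 - a) / (hi - target) for x in values]


print(benchmark([10, 20, 30], target=50))                  # Case 1
print(benchmark([10, 20, 60], target=50))                  # Case 3
print(benchmark([5, 15, 45], target=10, positive=False))   # Case 4
```

For a positive indicator with hypothetical values `[10, 20, 30]` and a target of 50 (Case 1), the scores are 1.0, 25.75, and 50.5, so b = 50.5, which equals 1 + 99 × (30 - 10)/(50 - 10). Raising the best value to 60 (Case 3) caps it at the target and yields a score of 100.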

## 5.6.3 Sustainability targets

Figure 14 and Table 4 present the characteristics of the sustainability targets that were used to compute the upper bound *b* in Chapter 5.6.2. Case 3 applied to more than half of the targets, and Case 4 to about a quarter of them. The former means that the indicators have a positive relationship to green growth and maximum values greater than the targets, while the latter means that the indicators have a negative relationship to green growth and minimum values less than the targets. The number of indicators with a positive relationship to green growth is slightly higher than the number with a negative relationship. The targets were grouped into three types: SDG targets; other targets, whose sources are not SDG indicators; and the mean of the top five performers. Where targets were not available from the SDG indicators or other reliable literature, they were computed as the average value of the top five performing countries (for indicators with a negative relationship to green growth, the five countries with the lowest values). This approach was adopted from SDSN’s Sustainable Development Report, which presents the SDG Index and Dashboards (Lafortune et al., 2018; Sachs et al., 2018, 2019). The targets in the Green Growth Index were aligned as much as possible with the SDG targets. Reference was thus made to studies that identified targets for the SDGs, mainly OECD (2019a, 2019b) and SDSN (Sachs et al., 2018, 2019). For the SDG targets, the reference year was 2030, except for the share of marine biodiversity, for which it was 2020. Many countries have already achieved the 2030 targets for the SDG indicators (Table 4).

To sum up, the sustainability targets were selected according to the following criteria:

- For SDG indicators, the explicit and implicit SDG targets suggested in the OECD and SDSN reports were used. Where the two reports interpret an implicit target differently, the SDSN values, which apply in a global context, were adopted.
- For non-SDG indicators, targets suggested in scientific literature and reports from international organizations were used.
- For SDG indicators not included in the OECD and SDSN reports, the mean of the top five performers was used.
- For non-SDG indicators with no available information from the literature and reports, the mean of the top five performers was used.
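The four criteria above form a simple decision cascade. The sketch below is one hypothetical way to encode it; the function `select_target` and its parameters are illustrative assumptions, and the top-five mean is assumed to be computed separately and passed in.

```python
from typing import Optional, Tuple


def select_target(is_sdg_indicator: bool,
                  sdsn_target: Optional[float] = None,
                  oecd_target: Optional[float] = None,
                  literature_target: Optional[float] = None,
                  top_five_mean: Optional[float] = None) -> Tuple[str, float]:
    """Return (source, value) for an indicator's sustainability target."""
    if is_sdg_indicator:
        # SDG indicators: targets from the SDSN and OECD reports. The SDSN value
        # is preferred whenever it exists; identical interpretations give the
        # same value, and differing ones should follow SDSN.
        if sdsn_target is not None:
            return "SDSN", sdsn_target
        if oecd_target is not None:
            return "OECD", oecd_target
    elif literature_target is not None:
        # Non-SDG indicators: targets from scientific literature or reports
        # of international organizations.
        return "literature", literature_target
    # Fallback for both groups: mean of the top five performers.
    if top_five_mean is None:
        raise ValueError("no target available and no top-five mean supplied")
    return "top five performers", top_five_mean


print(select_target(True, sdsn_target=100.0, oecd_target=90.0))
print(select_target(False, top_five_mean=58.02))
```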

**5.7 Weights of indicators and dimensions**

Weights determine the relative importance of the indicators. Weighting entails the use of expert or subjective judgement, which can become complicated for a multidimensional concept (OECD & JRC, 2008; Michaela Saisana & Saltelli, 2011). Gan et al. (2017) broadly classified methods for weighting indicators into three categories: statistic-based weighting, public or expert opinion-based weighting, and equal weighting.

Statistic-based weighting uses quantitative methods, such as principal component analysis, data envelopment analysis, and conjoint analysis, to derive explicit weights (Nardo et al., 2005; OECD & JRC, 2008; Greco et al., 2018). Principal component analysis (PCA) is widely used to reduce data to fewer dimensions and to summarize the characteristics of high-dimensional data (Lever, Krzywinski, & Altman, 2017), but it can also be used to generate weights for indicators based on the factor loadings (Chao & Wu, 2017; Hong-jun & Jin-feng, 2013). The GGPM team used PCA to compute weights for the indicators (Appendix 4). However, the PCA weights were not used in computing the Green Growth Index, for two reasons: first, the weights depend on properties of the data and are expected to change when new data with a different structure are added to the composite index (Chapter 7.1); second, according to OECD & JRC (2008), this method of constructing weights is not valid and can be misleading for policy-guiding indicators. Instead, the PCA weights were used for the robustness check (see Chapter 5.10).
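To illustrate how PCA can yield indicator weights, the sketch below uses one simple variant: weights proportional to the squared loadings on the first principal component of the correlation matrix, found by power iteration in pure Python. This is an assumption for illustration only; the OECD & JRC (2008) handbook describes a multi-factor procedure with rotation, and the computation in Appendix 4 may differ. All function names and data are hypothetical.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (rows = countries)."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    z = [[(x - m) / s for x in c] for c, m, s in zip(cols, means, sds)]
    k = len(cols)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_eigenpair(matrix, iterations=1000, tol=1e-12):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    k = len(matrix)
    # Start from the matrix column with the largest norm, which is unlikely
    # to be orthogonal to the leading eigenvector.
    start = max(range(k), key=lambda j: sum(matrix[i][j] ** 2 for i in range(k)))
    v = [matrix[i][start] for i in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v^T M v (v has unit length)
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k))
    return eigenvalue, v


def pca_weights(data):
    """Indicator weights from squared loadings on the first principal component."""
    eigenvalue, vector = leading_eigenpair(correlation_matrix(data))
    loadings = [x * math.sqrt(eigenvalue) for x in vector]  # component loadings
    squared = [x * x for x in loadings]
    total = sum(squared)
    return [s / total for s in squared]


# Hypothetical data: four countries, three perfectly correlated indicators.
data = [
    [1.0, 2.0, 2.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [5.0, 10.0, 6.0],
]
print(pca_weights(data))  # perfectly correlated indicators get equal weights
```

Because the weights come entirely from the correlation structure, adding countries or indicators changes them, which is the instability the paragraph above cites as a reason for not using PCA weights in the Index itself.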

The analytic hierarchy process (AHP) and the budget allocation process are examples of public or expert opinion-based weighting (Hudrliková, 2013). AHP is a participatory, multicriteria decision-making approach that derives the relative importance of indicators from pairwise comparisons (Dedeke, 2013; Pakkar, 2014). In AHP, the subjective judgment of the experts influences the weights. To involve experts in identifying weights for the indicators, an AHP survey questionnaire was developed for the Green Growth Index and distributed during the regional consultation workshops. The AHP results revealed a large divergence of opinions not only across regions but also across dimensions of green growth (Appendix 4), which makes it difficult to use them to assign weights to the indicators. A higher level of consensus would be needed to identify appropriate weights for the indicators.
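The core AHP computation can be sketched as follows: weights are the normalized principal eigenvector of a reciprocal pairwise comparison matrix, and Saaty's consistency ratio (CR) flags incoherent judgments (a CR below 0.1 is conventionally treated as acceptable). This is a generic textbook version, not the survey instrument used for the Green Growth Index; the function name and example matrices are hypothetical.

```python
# Saaty's random consistency index for matrices of size n = 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(pairwise, iterations=1000, tol=1e-12):
    """Priority vector (principal eigenvector, summing to 1) and consistency ratio.

    pairwise[i][j] states how much more important indicator i is than j;
    the matrix is assumed reciprocal (pairwise[j][i] == 1 / pairwise[i][j]).
    """
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, normalized to sum to 1
        new = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        new = [x / total for x in new]
        converged = max(abs(a - b) for a, b in zip(w, new)) < tol
        w = new
        if converged:
            break
    # lambda_max estimated as the mean of (A w)_i / w_i
    aw = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0  # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical judgments: indicator 1 is twice as important as 2 and 3.
consistent = [[1, 2, 2], [0.5, 1, 1], [0.5, 1, 1]]
weights, cr = ahp_weights(consistent)
print([round(x, 3) for x in weights], round(cr, 4))
```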

The GGPM team used equal weighting for the Green Growth Index. Equal weighting is the most commonly used method in composite indices (Gan et al., 2017; Greco et al., 2018). Equal weights, which are often based on normative assumptions or on an understanding of the underlying concepts, are applied in composite indices such as the Human Development Index, the Ecological Footprint, the Genuine Saving Index, the Environmental Vulnerability Index, the Sustainable Society Index, and the Corruption Perception Index. By not using weights from either AHP or PCA, the GGPM team implicitly assumed that the indicators have equal weights. In effect, however, the indicators do not have equal weights because the dimensions have different numbers of indicators. This is clearly revealed by the PCA results in Figure A4.1 (see Appendix 4), where higher weights are estimated for the dimensions with the fewest indicators.
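The gap between nominal and effective weights can be made concrete with a short calculation. Assuming equal weights at every level of a dimension, category, and indicator hierarchy, an indicator's share of the index is 1 / (number of dimensions × categories in its dimension × indicators in its category). The sketch below computes this share for a hypothetical structure; with geometric aggregation the weights act as exponents rather than additive shares, so the numbers express relative importance rather than linear contributions.

```python
from typing import Dict, List


def effective_weights(structure: Dict[str, Dict[str, List[str]]]) -> Dict[str, float]:
    """Effective weight of each indicator under equal weighting at every level.

    `structure` maps dimension -> indicator category -> list of indicator names.
    """
    n_dims = len(structure)
    weights = {}
    for categories in structure.values():
        for indicators in categories.values():
            for name in indicators:
                weights[name] = 1 / (n_dims * len(categories) * len(indicators))
    return weights


# Hypothetical structure: one large dimension and one small dimension.
example_structure = {
    "Dimension A": {"Category A1": ["a1", "a2"], "Category A2": ["a3", "a4", "a5"]},
    "Dimension B": {"Category B1": ["b1"]},
}
for name, w in effective_weights(example_structure).items():
    print(name, round(w, 4))
```

In this example the single indicator in the small dimension receives half of the total weight, four times the share of an indicator in a two-indicator category of the larger dimension.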

**5.8 Aggregation of indicators and dimensions**

Aggregation reduces dimensionality and provides a single holistic value (Pollesch & Dale, 2016) for measuring performance. The two simplest and most common methods are linear aggregation using the arithmetic mean and geometric aggregation using the geometric mean (Santeramo, 2016), with the former more widely applied than the latter (Greco et al., 2018). For example, the Environmental Vulnerability Index and the Corruption Perception Index use linear aggregation, while the Human Development Index and the Sustainable Society Index use geometric aggregation. The choice of aggregation method should consider the properties of the data, the level of compensability, and the policy implications (Table 5). Both methods were used at different levels of aggregation of the Green Growth Index (Figure 15).
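The difference in compensability between the two methods shows up in a toy example. The sketch below, with hypothetical scores, compares a balanced and an unbalanced profile that have the same arithmetic mean; the geometric mean penalizes the unbalanced one. It assumes strictly positive normalized scores, since the geometric mean is undefined otherwise.

```python
import math


def arithmetic_mean(scores):
    """Linear aggregation: a high score fully offsets a low one."""
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """Geometric aggregation: only partial compensation; needs scores > 0."""
    if any(s <= 0 for s in scores):
        raise ValueError("geometric mean requires strictly positive scores")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical normalized scores on three components.
balanced = [50.0, 50.0, 50.0]
unbalanced = [95.0, 50.0, 5.0]  # same total, one very weak component

for name, scores in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, round(arithmetic_mean(scores), 1), round(geometric_mean(scores), 1))
```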

At level 1, the indicators were linearly aggregated into indicator categories using the arithmetic mean. An important consideration here is the compensability of the individual indicators in each indicator category: it allows a country’s poor performance in one indicator, for instance, due to lack of resources, to be compensated by another indicator in the same category. In most cases, the correlation between indicators in the same category is not negligible (Chapter 5.5), which suggests that they are substitutable to some degree. Moreover, at level 1, a rule on missing values was applied to categories with more than four indicators: countries with more than 25 percent missing values were dropped. This method was adopted from Jha et al. (2018), who developed ADB’s Inclusive Green Growth Index, and it allows indicators with missing values to be “substituted” by other indicators. The rule was not applied to the indicators in resource efficiency and green economic opportunities, which have fewer than three indicators in each category.
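The level-1 step, including the 25 percent rule, might look like the sketch below. The function name is hypothetical, and one point is an explicit assumption: the text does not say how missing values are handled in categories where the rule does not apply, so the sketch simply averages the available indicators there.

```python
from typing import List, Optional


def aggregate_category(scores: List[Optional[float]]) -> Optional[float]:
    """Level 1: arithmetic mean of a country's indicator scores in one category.

    Returns None when the country is dropped for this category.
    """
    n = len(scores)
    available = [s for s in scores if s is not None]
    # 25 percent rule, applied only to categories with more than four indicators.
    if n > 4 and (n - len(available)) / n > 0.25:
        return None
    if not available:
        return None
    # Assumption: otherwise, missing indicators are skipped, so the remaining
    # indicators in the category "substitute" for them.
    return sum(available) / len(available)


print(aggregate_category([60.0, 70.0, None, 80.0, 90.0]))  # 20% missing: kept
print(aggregate_category([60.0, None, None, 80.0, 90.0]))  # 40% missing: dropped
```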

At level 2, geometric aggregation was applied to the indicator categories to allow only partial compensability between the indicator categories in each dimension. As at level 1, the 25 percent rule on missing values was applied to dimensions with more than four indicator categories, such as resource efficiency and green economic opportunities. The rule was not applied to the indicator categories under natural capital protection and social inclusion, which have only three categories each.

At level 3, geometric aggregation was applied to the dimensions, and the 25 percent rule on missing values was not applied. At this level of aggregation, no dimension was allowed to easily substitute for the other dimensions to improve the Green Growth Index. Thus, as the level of aggregation increases, the level of substitutability decreases.
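Levels 2 and 3 can be sketched in the same style, with geometric means replacing the arithmetic mean. The function names are hypothetical, and two behaviors are assumptions not stated in the text: missing categories in dimensions outside the 25 percent rule are skipped, and a country with any missing dimension receives no index score at level 3.

```python
import math
from typing import Dict, List, Optional


def geometric_mean(values: List[float]) -> float:
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_dimension(category_scores: List[Optional[float]]) -> Optional[float]:
    """Level 2: geometric mean of the indicator-category scores in a dimension."""
    n = len(category_scores)
    available = [s for s in category_scores if s is not None]
    # 25 percent rule, applied only to dimensions with more than four categories.
    if n > 4 and (n - len(available)) / n > 0.25:
        return None
    if not available:
        return None
    return geometric_mean(available)  # assumption: missing categories skipped


def aggregate_index(dimension_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Level 3: geometric mean of the dimension scores, no missing-value rule."""
    if any(v is None for v in dimension_scores.values()):
        return None  # assumption: every dimension must be present
    return geometric_mean(list(dimension_scores.values()))


dims = {
    "dimension 1": aggregate_dimension([70.0, 55.0, 62.0]),
    "dimension 2": aggregate_dimension([40.0, 35.0, None, 50.0, 45.0]),
}
print(dims, aggregate_index(dims))
```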

During the third phase of consultations, the expert reviewers were asked whether they agreed with the aggregation methods used at the different levels. This was important because, when measuring performance relative to the SDGs, the choice of both the indicators and the methods influences countries’ ranks (Miola & Schiltz, 2019). More than half of the reviewers agreed with the methods used to aggregate the Green Growth Index (Figure 16), although the level of agreement declined slightly for the third level of aggregation. More than a quarter of the expert reviewers could not answer the question. The number who disagreed was small compared with those who agreed and those who were not familiar with the methods.

**5.9 Ranks and benchmarks**

Ranks and benchmarks are useful methods for measuring green growth performance. During the regional consultation workshops, which constituted the second phase of consultations, the experts’ opinions on how to rank the countries and which targets to use to benchmark the indicators were collected (Chapter 3). These topics need careful attention because they can influence the acceptability of any composite index to policymakers, the public, and other stakeholders. Recognizing the continuing debates on the utility and credibility of composite indices, Saisana & Saltelli (2011: p.268) emphasize that indices “should never be seen as a goal, per se, regardless of their quality, [but] … as a starting point for initiating discussion and attracting public interest and concern”.

“Rankings can be powerful tools of both branding and influence” (The Economist, 2014), but they also create controversies (Michaela Saisana & Saltelli, 2011; Chowdhury & Sundaram, 2016; Seth & McGillivray, 2018). Many popular indices, such as the Human Development Index, Environmental Performance Index, Corruption Perception Index, and Doing Business, use ranks to compare performance across countries. While the experts agreed on the usefulness of ranks for measuring performance, they suggested avoiding global ranks. They preferred ranking countries only within groups, for instance, by region or level of development, within which performance is broadly comparable.
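Ranking within groups, as the experts suggested, is simple to implement. The sketch below ranks hypothetical countries separately within each group and gives tied scores the same rank; the group labels, the tie rule, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict


def rank_within_groups(scores: Dict[str, float], groups: Dict[str, str]) -> Dict[str, int]:
    """Rank countries by score (1 = best) separately within each group.

    Tied scores share the better rank (standard competition ranking).
    """
    members_by_group = defaultdict(list)
    for country, score in scores.items():
        members_by_group[groups[country]].append(score)
    ranks = {}
    for country, score in scores.items():
        peers = members_by_group[groups[country]]
        ranks[country] = 1 + sum(1 for other in peers if other > score)
    return ranks


# Hypothetical index scores and regional groups.
example_scores = {"C1": 62.0, "C2": 48.5, "C3": 71.3, "C4": 55.0, "C5": 55.0}
example_groups = {"C1": "X", "C2": "X", "C3": "Y", "C4": "Y", "C5": "X"}
print(rank_within_groups(example_scores, example_groups))
```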

“[A] set of indicators may have effect only when seen through a relevant benchmarking system that will give meaning to the produced measurements” (Benetatos, 2008: p.3). Decisions on benchmarking require careful consideration of both methods and parameters. The benchmarking method in the Green Growth Index was integrated into the normalization of indicators (Chapter 5.6). Benchmarking normalization is commonly applied in global sustainability indices, for instance, those developed by UNDP and OECD. The benchmarking parameters, specifically the sustainability targets (Chapter 5.6.3), were based on SDG targets as well as targets defined by other international organizations. Many experts suggested benchmarking the Index against the SDGs and other internationally agreed targets, which countries are to fulfil and achieve under their international commitments.

**5.10 Robustness check**

## 5.10.1 Sensitivity analysis

The sensitivity analysis evaluates how variations in the inputs affect the Index. There are two sources of input uncertainty: the indicators and the sustainability targets. The GGPM team modified the values of these inputs manually and individually within a specified range and evaluated the impact on the Index. The six models used for the sensitivity analysis are described in Table 6.

All in all, the sensitivity analysis based on the six models passed the robustness tests with good results (Flores et al., 2019). This report presents the results of the Monte Carlo analysis for Models A1.1 and A1.2. The input values were selected randomly within the specified range of change in indicator values (between −100 and +100, at 20 percent intervals). The values of all indicators were changed simultaneously at each iteration so that interaction effects between the indicators were taken into account. This analysis showed how the scores and ranks of countries changed within the specified range. The simulation was run over 1,000 iterations, generating a sequence of scores for the Green Growth Index.

Figure 17 summarizes the results of the sensitivity analysis after randomizing the input values within ±20 percent. The results show that input variations caused only minimal changes in the Green Growth Index scores, and many countries maintained their rankings. The index values changed by an average of 3.7 units across all 115 countries, and 90 percent of the countries changed rank by fewer than eight places. The average rank change was 3.5 places, while the top 30 countries shifted by only 2.4 places. In addition, the results show that changes in the input values within this range have smaller impacts on higher-ranked countries than on lower-ranked ones.
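A Monte Carlo sensitivity check of this kind can be sketched as below: every indicator value is perturbed simultaneously by a random factor within ±20 percent, the index is recomputed, and score and rank shifts are averaged over the iterations. The uniform perturbation, the stand-in index function (a plain mean), and all names and data are hypothetical; the sketch does not reproduce Models A1.1 and A1.2.

```python
import random
from statistics import mean


def ranks_from_scores(scores):
    """Rank 1 = highest score; ties are broken by list order (sketch only)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def monte_carlo_sensitivity(indicators, index_fn, spread=0.20, runs=1000, seed=42):
    """Perturb all indicator values at once and average score and rank shifts.

    indicators: one list of indicator values per country.
    index_fn: maps one country's indicator list to an index score.
    Returns (mean absolute score change, mean absolute rank change) per country.
    """
    rng = random.Random(seed)
    base_scores = [index_fn(row) for row in indicators]
    base_ranks = ranks_from_scores(base_scores)
    n = len(indicators)
    score_shift, rank_shift = [0.0] * n, [0.0] * n
    for _ in range(runs):
        # Each value is multiplied by a factor drawn uniformly from [1 - spread, 1 + spread].
        perturbed = [[v * (1 + rng.uniform(-spread, spread)) for v in row]
                     for row in indicators]
        scores = [index_fn(row) for row in perturbed]
        ranks = ranks_from_scores(scores)
        for i in range(n):
            score_shift[i] += abs(scores[i] - base_scores[i]) / runs
            rank_shift[i] += abs(ranks[i] - base_ranks[i]) / runs
    return list(zip(score_shift, rank_shift))


# Hypothetical indicator values for five countries.
example = [[10.0, 12.0, 8.0], [30.0, 28.0, 35.0], [50.0, 55.0, 45.0],
           [70.0, 66.0, 74.0], [90.0, 85.0, 95.0]]
for country, (d_score, d_rank) in zip(["C1", "C2", "C3", "C4", "C5"],
                                      monte_carlo_sensitivity(example, mean)):
    print(country, round(d_score, 2), round(d_rank, 2))
```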

## 5.10.2 Uncertainty analysis

The uncertainty analysis evaluates the impact of changing the assumptions and methods used to build the Green Growth Index model. Four assumptions were selected: aggregation, normalization, outliers, and weights. These were easy to measure and had a high potential impact on the results and rankings of the index. The eight models for the uncertainty analysis are described in Table 7.

Overall, the uncertainty analysis reveals that the impacts of changing the model assumptions are acceptable and that the model for the Green Growth Index is robust (Flores et al., 2019). To evaluate the overall impact of simultaneously applying the eight uncertainty models, the GGPM team also applied a Monte Carlo analysis. As in the sensitivity analysis, the aim was to analyze the changes in countries’ scores and ranks relative to the baseline model. The assumptions were randomized 1,000 times, producing new Green Growth Index scores and ranks for each country each time.

Figure 18 summarizes the results of the Monte Carlo analysis, which reveal that uncertainty is low overall and rankings are largely maintained. About 48 percent of the countries show confidence intervals of three places or fewer, while 87 percent have a change in ranking of fewer than 10 places. On average, countries changed by 4.7 ranks, which is acceptable when ranking 115 countries. The results also show that changes in the assumptions on aggregation, normalization, outliers, and weights have smaller impacts on higher-ranked countries than on lower-ranked ones (Figure 18). This can be attributed to the larger divergence in scores across indicators, indicator categories, and dimensions in low-ranking countries.
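The Monte Carlo step for model assumptions can be sketched in a similar way. Here only two of the four assumptions are randomized (aggregation method and weight vector), the baseline is taken to be geometric aggregation with the first weight vector, and each country's rank range and mean absolute rank shift are reported. These choices, the function names, and the data are hypothetical simplifications of the eight models in Table 7.

```python
import math
import random


def weighted_arithmetic(values, weights):
    """Linear aggregation with weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_geometric(values, weights):
    """Geometric aggregation with weights as exponents (values must be > 0)."""
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / sum(weights))


def ranks_from_scores(scores):
    """Rank 1 = highest score; ties are broken by list order (sketch only)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def monte_carlo_uncertainty(dimension_scores, weight_options, runs=1000, seed=7):
    """Randomize aggregation and weights; summarize rank changes per country.

    dimension_scores: one list of strictly positive dimension scores per country.
    weight_options: candidate weight vectors; the first one is the baseline.
    Baseline model (assumption): geometric aggregation with weight_options[0].
    Returns (min_rank, max_rank, mean_absolute_rank_shift) for each country.
    """
    rng = random.Random(seed)
    baseline = ranks_from_scores(
        [weighted_geometric(row, weight_options[0]) for row in dimension_scores])
    aggregators = [weighted_arithmetic, weighted_geometric]
    history = [[] for _ in dimension_scores]
    for _ in range(runs):
        aggregate = rng.choice(aggregators)   # randomized aggregation assumption
        weights = rng.choice(weight_options)  # randomized weighting assumption
        ranks = ranks_from_scores([aggregate(row, weights) for row in dimension_scores])
        for i, rank in enumerate(ranks):
            history[i].append(rank)
    return [(min(h), max(h), sum(abs(r - b) for r in h) / runs)
            for h, b in zip(history, baseline)]


# Hypothetical dimension scores for four countries.
example_dims = [[90, 85, 95, 88], [60, 70, 40, 75], [55, 65, 50, 45], [20, 30, 25, 35]]
print(monte_carlo_uncertainty(example_dims, [[1, 1, 1, 1], [1, 1, 3, 1]]))
```

In the example the first country scores highest and the last lowest on every dimension, so their ranks cannot change under any positive weights and either aggregation method; only the middle two countries can move.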