Policy evaluation and efficiency: a systematic literature review

This paper provides a systematic literature review of studies investigating the effect of an intervention on the efficiency of a decision-making unit, when efficiency is computed using nonparametric frontier approaches. This paper offers a guide for future research by identifying patterns in (1) the fields of application, (2) applied efficiency models, and (3) analysis of efficiency determinants. Our findings indicate that, despite the prominent role of frontier techniques in the analysis of public sector performance and the importance of effectiveness and the policy perspective, these two approaches have long been kept separate. Nevertheless, the combination of efficiency and effectiveness is fundamental for evaluating public interventions and detecting inefficiencies at the policy level, especially in key sectors such as education, health, and environment.


Introduction
In the evaluation of complex (public) activities (e.g., schools, hospitals, airports, water utilities), efficiency and effectiveness are two complementary aspects. On the one hand, efficiency analysis evaluates the performance of complex decision-making units (DMUs), accounting for their ability to transform inputs into outputs (Farrell, 1957). On the other hand, effectiveness studies measure the performance of DMUs with respect to given goals (Golany, 1988; Cherchye et al., 2019). When the objectives are specified by a policy, effectiveness studies assess the impact of the intervention on a number of variables of interest, revealing the ability of the policy to influence them (for a review of classical policy evaluation techniques, see Abadie and Cattaneo, 2018). In the words of Peter Drucker, effectiveness analysis indicates whether we are doing the right thing, while efficiency reveals whether we are doing it right (Drucker, 1977; Asmild et al., 2007; Førsund, 2017). Therefore, the combination of these two perspectives is fundamental for evaluating public interventions and detecting inefficiency at the policy level (Kornbluth, 1991).
Using a systematic literature review, this paper investigates the role of frontier evaluation techniques in policy evaluation studies. The objective is to collect information on how the literature has studied the impact of external interventions on the performance of DMUs, when performance is evaluated in terms of the distance to an ideal best-performance frontier. Historically, frontier evaluation approaches offered a mathematical formulation of the concept of technical efficiency (Førsund et al., 1980). As defined by Koopmans (1951) and Farrell (1957), a combination of inputs and outputs is efficient if it is not possible to increase the level of any output without also increasing the level of at least one input, or to decrease the level of any input without decreasing the level of at least one output. In operational terms, two main approaches to frontier estimation can be distinguished, parametric and nonparametric, according to whether the functional form of the frontier is specified a priori. Although both approaches comprise a number of models, the most representative are the "stochastic frontier model" (SFA; Aigner and Chu, 1968; Aigner et al., 1977) for the parametric framework and the "data envelopment analysis" (DEA; Charnes et al., 1978; Banker et al., 1984) model for the nonparametric one. However, the boundary between these strands is becoming blurred, as an increasing number of deterministic models include stochastic components and vice versa (Daraio and Simar, 2007).
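The Koopmans and Farrell definition above can be made concrete with a small dominance check. The sketch below assumes NumPy and uses a hypothetical helper, `is_dominated`. It compares a unit only against the other observed units (the free disposal logic later formalized by FDH), not against the unknown true technology, so it illustrates the definition rather than estimating efficiency.

```python
import numpy as np


def is_dominated(x0, y0, X, Y):
    """True if some observed unit uses no more of every input, produces no less
    of every output, and is strictly better in at least one dimension."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x0, y0 = np.asarray(x0, float), np.asarray(y0, float)
    weakly_better = np.all(X <= x0, axis=1) & np.all(Y >= y0, axis=1)
    strictly_better = np.any(X < x0, axis=1) | np.any(Y > y0, axis=1)
    return bool(np.any(weakly_better & strictly_better))


# Toy data (hypothetical): three units, two inputs, one identical output.
X = np.array([[2.0, 3.0], [4.0, 1.0], [4.0, 3.0]])
Y = np.array([[1.0], [1.0], [1.0]])
flags = [is_dominated(X[i], Y[i], X, Y) for i in range(len(X))]
print(flags)  # [False, False, True]
```

The third unit is dominated because the first unit produces the same output with less of the first input and no more of the second. The first two units trade off inputs against each other, so neither dominates the other.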
This paper focuses on the nonparametric efficiency literature, which includes DEA (Deprins et al., 1984), free disposal hull (FDH) (Tulkens, 2006), network (Färe, 1991; Färe et al., 2007), Malmquist (Färe et al., 1994), slack-based (Tone, 2001), bootstrap (Simar and Wilson, 1998), order-m (Cazals et al., 2002), order-alpha (Aragon et al., 2005), and conditional (Daraio and Simar, 2005) models. The continuous development of alternative techniques demonstrates the flexibility and success of these approaches (for a review, see Seiford, 1996; Gattoufi et al., 2004; Cook and Seiford, 2009; Emrouznejad and Yang, 2018). In particular, nonparametric frontier techniques are popular in public sector evaluations because they measure the performance of complex DMUs while accounting for the multiple dimensions involved, rely only on a small set of assumptions, and do not require price information (De Witte et al., 2020). In addition, the endogenous weighting of variables ensures that every unit is evaluated in the best possible light, which supports fairness and objectivity (Cherchye et al., 2007b).
Policy evaluation studies aim to determine the impact of an exogenous intervention on some outcome of interest (Heckman and Vytlacil, 2007). Traditionally, this has been done using classical econometric and statistical techniques, such as regression analysis (Draper and Smith, 1998). However, since the work of Rubin (1974), one of the main challenges in the econometric literature has been to move beyond correlational evidence and assess causal relationships. In the absence of an experimental setting, this often requires the construction of sophisticated identification strategies. For example, matching techniques compare treated and untreated units with similar covariates (Rosenbaum and Rubin, 1985; Rosenbaum, 1989; Abadie and Imbens, 2011); instrumental variable approaches address the problem of missing or unknown control variables (Wright, 1928; Sargan, 1958; Angrist et al., 1996); difference-in-differences estimations use data with a time dimension to control for unobserved variables with a fixed trend (Donald and Lang, 2007; Lechner, 2011); and regression discontinuity designs exploit settings where the treatment is determined by a forcing variable, relying on the fact that, locally, the assignment is as good as random (Lee and Lemieux, 2010). More recently, machine learning approaches have been developed to improve predictions and detect heterogeneous effects (Athey and Imbens, 2019). By identifying causal linkages, policy evaluation studies appraise the effectiveness of policy interventions; therefore, they play a fundamental role in the promotion of evidence-based policy making.

© 2021 The Authors. International Transactions in Operational Research published by John Wiley & Sons Ltd on behalf of International Federation of Operational Research Societies
A. Mergoni and K. De Witte / Intl. Trans. in Op. Res. 0 (2021) 1-23
This paper provides the first systematic literature review of studies investigating the effect of an intervention (or policy) on the efficiency (or performance) of a DMU, computed via a nonparametric frontier approach. Despite the prominent role of frontier techniques in the analysis of public sector performance and the importance of effectiveness and the policy perspective, these two approaches have long been kept separate (Førsund, 2017). With this review, we aim to promote the use of nonparametric frontier techniques for policy evaluation by cataloguing how this has been done in previous works.
The remainder of the paper is organized as follows. First, we specify the criteria for the systematic review of the literature. Second, we classify the papers according to their empirical application. Third, we discuss the methodologies implemented in previous works for the estimation of the efficiency scores. Next, we analyze the methodologies implemented to perform the policy evaluation and to investigate the determinants of the efficiency scores, giving specific attention to the issues of endogeneity and causality. Finally, we provide a discussion and indicate future lines of research.

Criteria of paper selection
This systematic review focuses on the role of nonparametric frontier analysis in policy evaluation. To implement the review, we rely on three main academic libraries: EBSCOhost, Scopus, and Web of Science (WOS). EBSCOhost is a multidisciplinary platform that includes a wide set of databases, among which is the Education Resources Information Center, sponsored by the Institute of Education Sciences of the U.S. Department of Education. Scopus is the largest abstract and citation database of peer-reviewed literature, covering the fields of science, mathematics, engineering, technology, health and medicine, social sciences, and arts and humanities. WOS is the world's leading academic citation indexing database and search service and covers the science, social science, arts, and humanities disciplines. The use of these three sources provides the widest coverage, delivering a comprehensive and interdisciplinary scientific output.
To include all relevant papers, we implemented the search using the search string "data envelopment analysis" AND "policy evaluation," with the "all fields" filter. We chose "data envelopment analysis" as our keyword for two main reasons: first, DEA is the first and most common nonparametric frontier model; second, it is a baseline for subsequent approaches. For these reasons, papers relying on FDH, Malmquist, network models, or other nonparametric frontier approaches always refer to DEA as well. To guarantee relevance, we restricted our search to papers published in English-language, scholarly, peer-reviewed journals. As the time span, we consider papers published between 1957 and January 2021. The former year provides a natural starting point for our research, as it is the year of publication of the seminal paper by Farrell (1957). This search returned 205 documents from Scopus, 78 from EBSCOhost, and 14 from WOS. Further, we performed a full-text analysis to exclude papers that did not use any DEA or DEA-related model or did not implement any policy evaluation. Finally, we added 17 papers retrieved via a snowball method, a technique that includes relevant documents by using key documents as a starting point (Wohlin, 2014). In the remainder of the systematic literature review, we rely on this set of 81 papers.

Application-wise focus of the literature
To set the scene, we first graphically represent when the 81 papers were published. Figure 1 shows that, after the early 2000s, there has been a strong increase in the number of published papers. The growth follows the seminal works by Simar and Wilson (1998) and Cazals et al. (2002), as these papers provide tools to obtain bias-corrected scores and to account for the influence of environmental variables on the efficiency scores.
The efficiency literature has a long tradition of analyzing policy changes in various subject fields. Categorizing the 81 papers detected in Section 2, we distinguish 10 homogeneous clusters of applications. We classified the papers into application fields based on the policy under evaluation and the DMUs considered. In Table 1, we relate the application fields to the Journal of Economic Literature classification system. Among the 10 application fields, three main sectors can be identified: public sector policies, environment-related policies, and private sector policies. From Table 1, we observe that public sector policies are the dominant application field, with 52 papers. This main field involves six categories. Education is by far the most common (22 papers), followed by local government (13 papers), health (10 papers), labor (4 papers), transportation (3 papers), and economic growth (1 paper). The second most represented field is composed of 22 papers investigating the role of environment-related policies: nine papers on environmental policy, seven papers on agriculture, and six papers on energy. Finally, 11 papers are clustered in the category of private sector interventions to foster competitiveness and innovation.
In what follows, we discuss, for each application category, the main research questions, the specific policies evaluated, and the level of intervention. We also highlight the patterns in the relevant characteristics and the time trends.

Local government
A second subfield of public sector applications focuses on local government policy changes. Although there is a wider literature on local government efficiency (Narbón-Perpiñá and De Witte, 2018a, 2018b; Daraio et al., 2020), the systematic literature review revealed 13 papers that combine nonparametric efficiency analysis and local government policy evaluation.
The policies investigated by these studies are wide-ranging and heterogeneous: from the role of direct democracy (Asatryan and De Witte, 2015) to the effect of public investments for innovation, competitiveness, and development (Zabala-Iturriagagoitia et al., 2007; Shi and Yang, 2008; Vieira et al., 2018; Wang et al., 2019), and from policies centered on sustainability (Natesan and Marathe, 2017; Ma et al., 2020) to welfare and social assistance policies (López et al., 2008; Halkos and Tzeremes, 2011b; Broersma et al., 2013). In terms of the level of analysis, a considerable portion of these studies evaluates regional policies (Shi and Yang, 2008; Halkos and Tzeremes, 2010b, 2011b; Vieira et al., 2018; Natesan and Marathe, 2017; Titl and De Witte, 2021), with special attention given to the case of Spain (Zabala-Iturriagagoitia et al., 2007; López et al., 2008), while most of the other studies focus on the municipal and provincial levels (Broersma et al., 2013; Asatryan and De Witte, 2015; Cordero et al., 2017; Wang et al., 2019; Ma et al., 2020).

Health
Health applications, with 10 papers, are the third most researched field within public sector policy evaluations. Despite the key position of the health sector and the importance of efficiency measurement in health care provision, applications of frontier techniques to hospitals, nursing homes, and, more generally, health management organizations are relatively recent (Worthington, 2004), and the use of efficiency techniques for the evaluation of public health policies is still an untapped potential (Lobo et al., 2014). Except for the pioneering study by McCallion et al. (2000), which investigates the role of size in the efficiency of Northern Irish hospitals, the literature on health policy and efficiency dates back to 2008, with the studies of Zhang et al. (2008) and Puenpatom and Rosenman (2008). These studies investigate, respectively, the effect of the Medicare prospective payment system on the efficiency of skilled nursing homes (Zhang et al., 2008) and the effect of extending health insurance programs on the efficiency of provincial public hospitals in Thailand (Puenpatom and Rosenman, 2008); Seth and Patel (2014) examine a similar question for India. In line with this research, Sahin et al. (2011) study a health transformation program based on the introduction of general health insurance in Turkish hospitals. Other studies compare different health systems (Halkos and Tzeremes, 2011a) or funding systems (Ferreira et al., 2020), evaluate the effect of introducing information technology on hospital performance (Hitt and Tambe, 2016), or compare different national policies (Maragos et al., 2020) or responses to HIV (Sevigny et al., 2020). Despite the heterogeneity in the health policies considered, most studies share the same unit of analysis, namely the hospital (McCallion et al., 2000; Puenpatom and Rosenman, 2008; Halkos and Tzeremes, 2011a; Sahin et al., 2011; Hitt and Tambe, 2016).

Other topics
Three additional topics can be classified: papers within the fields of labor, public transport, and growth.
The literature on labor involves four studies: two on the effect of policies on labor outcomes, such as employment (Feroz et al., 2001; Althin and Behrenz, 2004), and two that focus on the impact of labor policies on productivity (Sofi and Sharma, 2015; Cordero and Tzeremes, 2018).
Three additional studies investigate the relation between efficiency and (public) transport interventions. Specifically, López et al. (2009) and Mallikarjun et al. (2014) investigate the impact of public subsidies on transport infrastructure, while Park and Kim (2021) study airport efficiency.

Environment
The systematic literature review identified nine papers evaluating the impact of environmental policies in a nonparametric frontier context. The first article dates back to Halkos and Tzeremes (2010a), who investigate the efficiency of environmental policies on biodiversity, probably inspired by the work of Nielsen et al. (2007). The late development of this literature can be attributed to the difficulty of choosing the appropriate dimensions for the evaluation of environmental performance.
Because the concept of environmental performance involves different dimensions, a number of multidisciplinary studies relate environmental issues to agricultural efficiency (Whittaker et al., 2017), the marine economy (Ding et al., 2020), sustainable development (Czyźewski et al., 2020; Ma et al., 2020), or energy efficiency (Pan et al., 2019; Zhang et al., 2020). In particular, the link between the energy and environmental perspectives is confirmed by the fact that all the energy papers in this review deal with environmental themes (for more details, see Subsection 3.2.3).
Regarding the level of investigation, the first studies focused on the construction of indexes to evaluate environmental performance at the country level (Halkos and Tzeremes, 2010b; Rogge, 2012), but this approach was quickly abandoned in favor of studies at lower levels (such as counties and provinces). This change of direction has probably been driven by growing data availability, which is in turn driven by rising attention to environmental concerns. In this respect, it is interesting to note that most studies are geographically located in Europe (Pan et al., 2019; Czyźewski et al., 2020) or China (Ke et al., 2015; Ding et al., 2020; Ma et al., 2020; Zhang et al., 2020).

Agriculture
As the second main category in the context of environment-related policies, we distinguish policies for the primary production sectors. In the review, we observe seven papers that use nonparametric efficiency models to evaluate agricultural policies. This literature is characterized by analysis at the farm level and by attention to environmental issues. In particular, the use of production efficiency to evaluate (agricultural) policy goes back to Whittaker (1994), who suggested using DEA for (agricultural) policy analysis. This recommendation was reinforced by Färe and Whittaker (1995), who developed an intermediate input model for the evaluation of dairy farms and called for the use of production efficiency tools in the evaluation of agricultural policy. More recently, Whittaker et al. (2003, 2017) investigate the viability of a tax reduction policy to decrease the use of agricultural fertilizer on U.S. farms; Murillo et al. (2007) analyze the impact of the subsidy system of the 1992 reform of the Common Agricultural Policy on the efficiency and environmental adaptation of animal-oriented farms in Spain; Priscilla and Chauhan (2019) focus on the role of cooperation on dairy farms in Manipur; and Kaliba et al. (2021) assess the advantage of using climate-adaptive sorghum for sorghum producers in Tanzania.

Energy
The rise of environmental concerns is also reflected in the energy efficiency literature. In particular, the six papers we observe in the systematic literature review analyze the effect on energy efficiency of policies involving the use of renewable energy sources (Halkos et al., 2015), incentives for sustainability and emission reductions (Lin and Zhu, 2019; Baláź et al., 2020; Zhang et al., 2020), and energy conservation (Xin-gang et al., 2020). The geographical distribution of these studies suggests that Europe (Morfeldt and Silveira, 2014; Halkos et al., 2015; Baláź et al., 2020) and China (Lin and Zhu, 2019; Xin-gang et al., 2020; Zhang et al., 2020) in particular are the regions paying the most attention to environmental efficiency.

Competitiveness and innovation
With 11 relevant publications, competitiveness and innovation applications constitute the third main field of application in absolute terms. As a number of local government studies also focus on the evaluation of public investments for innovation and competitiveness, we need to clarify the difference between these two categories. Studies that analyze the impact of innovation policies at the municipal or regional level have been categorized as local government studies, while those considering the impact of similar policies at the firm level have been placed in the competitiveness and innovation category. The only exception is the paper by Halkos and Tzeremes (2013), which investigates the impact of different national cultures on innovation in Europe.
Among the policies to promote the competitiveness and innovation of firms, we observe a heterogeneous set of interventions, ranging from science and technology policies, as in Du et al. (2014), to manufacturing policies to innovate the logistics industry, as in He et al. (2020), while other studies focus on the effect of policies for competitiveness (Halkos and Tzeremes, 2007; Wang et al., 2019). In particular, R&D projects deserve to be distinguished, as they constitute the largest share (Shi and Yang, 2008; Hsu and Hsueh, 2009; Jiménez-Sáez et al., 2011).

Method-wise focus of the literature
The idea of the technical efficiency of a unit as the ratio of its actual production to its potential production, for a given set of inputs, dates back to Farrell (1957). Starting from this seminal paper, parametric and nonparametric frontier models developed the idea of measuring inefficiency in terms of the relative ability to transform inputs into valuable outputs with respect to the best-performing peers.
Regarding nonparametric frontier models, the reference point for the DEA literature is the Charnes, Cooper and Rhodes (CCR) model proposed by Charnes et al. (1978), which assumes an underlying convex technology characterized by constant returns to scale and is, in its original formulation, output-oriented. In the literature on efficiency for policy evaluation, the CCR model has encountered great success, in particular in the fields of education and firm productivity. However, only a few studies rely entirely on the CCR model (for education, see Soares de Mello et al., 2006; Coupet and Barnum, 2010; Montoneri et al., 2012; for firm productivity, see Kao and Hwang, 2010; Castillo and Salem, 2013). The majority of papers use the CCR model in combination with the Banker, Charnes and Cooper (BCC) model, proposed by Banker et al. (1984) to relax the constant returns to scale assumption, in order to investigate the possible presence of scale economies (Halkos and Tzeremes, 2013; Du et al., 2014; Schubert and Yang, 2016; Natesan and Marathe, 2017; Baláź et al., 2020).
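To make the CCR and BCC distinction concrete, a minimal sketch in Python follows, assuming SciPy's `scipy.optimize.linprog` with the HiGHS solver. The helper `dea_input` is hypothetical. It uses the input-oriented envelopment form purely for illustration, since the reviewed papers use a variety of orientations and software. In this form, the only difference between the two models is the convexity constraint on the intensity weights λ.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input(X, Y, k, vrs=False):
    """Input-oriented radial DEA score of unit k (envelopment form).

    vrs=False gives the CCR (constant returns) score, vrs=True the BCC score.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
    c = np.zeros(n + 1)
    c[0] = 1.0
    # Inputs:  sum_j lambda_j * x_ij - theta * x_ik <= 0
    A_in = np.hstack([-X[k].reshape(m, 1), X.T])
    # Outputs: -sum_j lambda_j * y_rj <= -y_rk
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[k]])
    # BCC adds the convexity constraint sum_j lambda_j = 1.
    A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1) if vrs else None
    b_eq = [1.0] if vrs else None
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return float(res.x[0])


# Toy data (hypothetical): one input, one output, three units.
X = np.array([[2.0], [4.0], [6.0]])
Y = np.array([[1.0], [3.0], [3.0]])
crs = [dea_input(X, Y, k) for k in range(3)]
vrs = [dea_input(X, Y, k, vrs=True) for k in range(3)]
scale = [c_k / v_k for c_k, v_k in zip(crs, vrs)]  # scale efficiency = CCR / BCC
```

In the toy data, the first unit is fully efficient under variable returns but scores 2/3 under constant returns. Its CCR inefficiency is therefore entirely attributed to scale, which is the comparison that motivates running both models.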
Even more flexible is the FDH model proposed by Deprins et al. (1984), which abandons the convexity assumption in favor of a step-like frontier. Compared with DEA, FDH models are particularly appealing under uncertainty, as they rely solely on the assumption that production possibilities satisfy free disposability (Cherchye et al., 2000; Tulkens, 2006). However, in practical implementation, nonconvex models are still not as popular as their convex counterparts, and they remain potentially underutilized in efficiency estimation (Cook and Seiford, 2009). This is also reflected in the works on efficiency for policy evaluation: among the papers included in this review, only five implemented an FDH approach (Cherchye et al., 2007a; Haelermans and De Witte, 2012; Asatryan and De Witte, 2015; Cordero et al., 2017; D'Inverno et al., 2021). A possible explanation is that nonconvex models are computationally more demanding, as they require solving binary mixed integer programming problems instead of the classical linear problems, and the solution to these problems is less likely to be covered by commonly available software packages (Kerstens and Van de Woestyne, 2018).
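The step-like FDH frontier does not always require an integer program. For the basic variable-returns FDH model, the radial input score can be obtained by enumerating the observed units that produce at least as much output, a well-known property of this estimator. More general nonconvex specifications are where the mixed integer formulations mentioned above come in. The sketch below assumes NumPy, and the helper `fdh_input` is hypothetical.

```python
import numpy as np


def fdh_input(X, Y, k):
    """Input-oriented FDH score of unit k by enumeration over observed units."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    # Candidate peers: observed units producing at least as much of every output.
    peers = np.all(Y >= Y[k], axis=1)
    # For each peer, the radial contraction of x_k needed to reach its inputs.
    ratios = np.max(X[peers] / X[k], axis=1)
    return float(ratios.min())


# Toy data (hypothetical): two inputs, one identical output.
X = np.array([[2.0, 4.0], [3.0, 2.0], [4.0, 4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
scores = [fdh_input(X, Y, k) for k in range(3)]  # [1.0, 1.0, 0.75]
```

For the third unit, FDH gives 0.75. A convex (BCC-type) technology built on the same three units would give 2/3, because a mix of one third of the first unit and two thirds of the second uses (8/3, 8/3) of the inputs. Dropping convexity can therefore only raise, never lower, the estimated input score.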
As for orientation, both input-oriented and output-oriented models have been implemented in the literature, depending on the characteristics of the empirical applications in question. Interestingly, output-oriented models are more common in educational applications, as they highlight the perspective of educational outcome production (in line with the suggestion of Worthington, 2001), while in health applications input-oriented models are implemented to place more emphasis on the ability to use fewer resources. In the other application fields, no similar patterns can be identified.
The literature is characterized by a continuous stream of innovations, and the classical DEA approaches have been used as a starting point for subsequent models. In the context of policy evaluation, particular emphasis is given to the Malmquist index, which makes it possible to measure efficiency changes over time (or across programs), and to the network DEA model, which allows the detection of inefficiencies in the internal processes of the units. The Malmquist index has been particularly successful in the context of educational policies, as in Sahin et al. (2011), Zhang et al. (2011), Sav (2012), Schubert and Yang (2016), and D'Inverno et al. (2021), while the network model has been exploited in the context of firm performance (Kao and Hwang, 2010; He et al., 2020).
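One common way to compute the Malmquist index is the geometric-mean form of Färe et al. (1994). It combines four DEA scores, each obtained from the same linear program with a different pairing of data and reference frontier. The sketch below assumes SciPy and constant returns to scale, and relies on hypothetical helper names (`crs_input_eff`, `malmquist`). It uses Farrell-type radial input scores, so values above 1 indicate productivity growth. Some papers use the reciprocal convention, which flips this reading.

```python
import numpy as np
from scipy.optimize import linprog


def crs_input_eff(x0, y0, Xref, Yref):
    """Radial CRS input efficiency of the point (x0, y0) against a reference set.

    Against another period's frontier the score may exceed 1.
    """
    Xref, Yref = np.asarray(Xref, float), np.asarray(Yref, float)
    x0, y0 = np.asarray(x0, float), np.asarray(y0, float)
    n, m = Xref.shape
    s = Yref.shape[1]
    c = np.r_[1.0, np.zeros(n)]  # variables: [theta, lambda_1..lambda_n]
    A_ub = np.vstack([np.hstack([-x0.reshape(m, 1), Xref.T]),
                      np.hstack([np.zeros((s, 1)), -Yref.T])])
    b_ub = np.r_[np.zeros(m), -y0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return float(res.x[0])


def malmquist(k, X0, Y0, X1, Y1):
    """Malmquist index of unit k between periods 0 and 1, with decomposition."""
    e00 = crs_input_eff(X0[k], Y0[k], X0, Y0)  # period-0 point, period-0 frontier
    e10 = crs_input_eff(X1[k], Y1[k], X0, Y0)  # period-1 point, period-0 frontier
    e11 = crs_input_eff(X1[k], Y1[k], X1, Y1)  # period-1 point, period-1 frontier
    e01 = crs_input_eff(X0[k], Y0[k], X1, Y1)  # period-0 point, period-1 frontier
    index = np.sqrt((e10 / e00) * (e11 / e01))
    efficiency_change = e11 / e00
    technical_change = index / efficiency_change
    return float(index), float(efficiency_change), float(technical_change)


# Toy panel (hypothetical): one input, one output, two units, two periods.
X0 = np.array([[2.0], [4.0]])
Y0 = np.array([[1.0], [2.0]])
X1 = np.array([[2.0], [4.0]])
Y1 = np.array([[2.0], [2.0]])
print(malmquist(0, X0, Y0, X1, Y1))  # roughly (2.0, 1.0, 2.0)
print(malmquist(1, X0, Y0, X1, Y1))  # roughly (1.0, 0.5, 2.0)
```

In the toy panel, the best-practice frontier doubles its productivity. The first unit keeps pace (efficiency change of 1). The second unit stands still, so it loses relative efficiency and its overall productivity index stays at 1.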
Since the seminal work of Simar and Wilson (1998), bootstrap and bias-correction techniques have become common across different sectors. These techniques have been used to analyze, and to mitigate, the sensitivity of efficiency scores to sampling variation in the estimated frontier. They are therefore particularly useful in the presence of noisy input or output measures, which are common in the fields of environment (Halkos and Tzeremes, 2010a; Czyźewski et al., 2020), innovation (Castillo and Salem, 2013; Halkos and Tzeremes, 2013), and education (Tochkov et al., 2012; Papadimitriou and Johnes, 2019). A further innovation was provided by Cazals et al. (2002) and Daraio and Simar (2007) with the introduction of robust and conditional models, which make it possible to investigate the effect of environmental variables on the efficiency scores without assuming the separability condition. Despite the attractive features of the conditional efficiency approach (see De Witte and Kortelainen, 2013), it has been used only in education (Cherchye et al., 2007a; Haelermans and De Witte, 2012; De Witte et al., 2013b; D'Inverno et al., 2021), local government (Asatryan and De Witte, 2015; Cordero et al., 2017; Titl and De Witte, 2021), and innovation (Halkos and Tzeremes, 2013) applications. The lack of wider diffusion can be attributed partly to the sophisticated theoretical apparatus on which conditional models rely and partly to the fact that, at the moment, no statistical software provides a built-in procedure to implement conditional analysis.
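Of the robust models mentioned here, the order-m estimator of Cazals et al. (2002) is the easiest to sketch. Instead of comparing a unit with all better-producing peers, it compares the unit with m peers drawn at random and averages over replications, which limits the influence of outliers and noisy observations. The sketch below assumes NumPy and a Monte Carlo approximation. The helper `order_m_input` and its defaults (m=25, B=200) are hypothetical choices. It does not implement the smoothed bootstrap of Simar and Wilson (1998), which requires a more involved resampling scheme.

```python
import numpy as np


def order_m_input(X, Y, k, m=25, B=200, seed=0):
    """Monte Carlo order-m input efficiency of unit k.

    Each replication draws m units, with replacement, among those producing at
    least y_k and computes the FDH score against them; the estimate is the mean.
    """
    rng = np.random.default_rng(seed)
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    peers = np.flatnonzero(np.all(Y >= Y[k], axis=1))
    # Radial input contraction needed to reach each candidate peer.
    ratios = np.max(X[peers] / X[k], axis=1)
    # Sampling ratios with replacement is equivalent to sampling the peers.
    draws = rng.choice(ratios, size=(B, m), replace=True)
    return float(draws.min(axis=1).mean())


# Toy data (hypothetical): two inputs, one identical output.
X = np.array([[2.0, 4.0], [3.0, 2.0], [4.0, 4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
partial = order_m_input(X, Y, 2, m=1, B=5000)  # compared with one random peer
robust = order_m_input(X, Y, 2, m=50)          # close to the FDH score
```

With m=1 the unit is compared with a single random peer and looks close to efficient (about 11/12 on average). As m grows, the score converges to the FDH value of 0.75 for this unit. For finite m, scores above 1 are possible and flag units that outperform the expected best of m peers.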
Finally, to validate their results and assess their robustness, seven studies compare different DEA models in the fields of education (Glass et al., 2006; Waldo, 2007), health (Halkos and Tzeremes, 2011a; Seth and Patel, 2014), environment (Halkos and Tzeremes, 2010a), local government (Halkos and Tzeremes, 2010b), and innovation (Halkos and Tzeremes, 2013), and three implement both DEA and non-DEA approaches in the fields of local government (Zabala-Iturriagagoitia et al., 2007; Wang et al., 2019) and education (Tochkov et al., 2012). Combining several techniques is especially common when the application fields are heterogeneous and interdisciplinary, as these characteristics hamper the establishment of widely accepted models.

Determinant-wise focus of the literature
To investigate the effect of the policy on efficiency, it is necessary to identify the determinants of the efficiency scores. Traditionally, this has been done using two-step procedures, starting from the pioneering paper by Byrnes et al. (1988). In the first step, the efficiency scores are computed, while in the second step, the scores are used to make inferences and investigate their determinants. Simar and Wilson (2007) pointed out that two-step procedures are biased when the separability condition is not fulfilled, that is, when the background variables influence the input or output mix. Conditional efficiency models (initially developed by Cazals et al., 2002) make it possible to overcome this criticism. In particular, Daraio and Simar (2007), Wilson (2011), and De Witte and Kortelainen (2013) suggest nonparametrically regressing the ratio of unconditional to conditional scores on the set of variables that affect the production process but do not directly enter it as inputs or outputs. Table 2 overviews the papers implementing this conditional approach.
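A minimal sketch of the conditional ratio follows. It assumes NumPy, a single environmental variable z, an FDH first stage, and a uniform kernel with a fixed, hypothetical bandwidth h. Applied work typically selects h in a data-driven way (for example, by cross-validation) and uses smoother kernels. The function name `fdh_input_conditional` is illustrative. In input orientation, the conditional score is never below the unconditional one, so the ratio lies in (0, 1]. The nonparametric second-step regression of the ratio on z is omitted here.

```python
import numpy as np


def fdh_input_conditional(X, Y, k, Z=None, h=None):
    """Input-oriented FDH score of unit k, optionally conditional on Z.

    With Z and h given, only units with |z_j - z_k| <= h (a uniform-kernel
    window around z_k) are eligible peers.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    eligible = np.all(Y >= Y[k], axis=1)
    if Z is not None:
        Z = np.asarray(Z, float)
        eligible &= np.abs(Z - Z[k]) <= h
    return float(np.max(X[eligible] / X[k], axis=1).min())


# Toy data (hypothetical): the third unit faces a different environment z.
X = np.array([[2.0, 4.0], [3.0, 2.0], [4.0, 4.0]])
Y = np.array([[1.0], [1.0], [1.0]])
Z = np.array([0.0, 0.0, 1.0])
ratios = [fdh_input_conditional(X, Y, k)
          / fdh_input_conditional(X, Y, k, Z=Z, h=0.5) for k in range(3)]
# ratios == [1.0, 1.0, 0.75]
```

Here the third unit is alone in its window, so its conditional score is trivially 1 and its ratio equals its unconditional score. This small-sample problem is what the bandwidth choice has to balance against the bias from comparing units that face very different environments.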
Next, we consider the literature from the perspective of the model implemented in the second stage (Table 3). Interestingly, it is not possible to observe patterns between the suggested models and the application fields. What emerges, instead, is that three macrocategories can be singled out: the first and largest category is constituted by regression-based analysis; the second involves studies that aim to overcome correlational evidence to assess causal relations; the final category encompasses studies that do not implement special inference techniques and rely on simple descriptive statistics.
Regression analyses constitute the main instrument of investigation (see Coupet and Barnum, 2010; Sofi and Sharma, 2015; Schubert and Yang, 2016; Vieira et al., 2018; Ding et al., 2020). However, standard regression analysis does not account for the fact that the efficiency score is bounded between 0 and 1, and therefore inference might be biased (for a discussion, see Hoff, 2007). To address this issue, semiparametric and nonparametric regression models have been implemented (see Murillo et al., 2007; Halkos and Tzeremes, 2011b). In addition, although heavily criticized by Simar and Wilson (2007), censored and truncated models have received particular attention (censored Tobit regression in Althin and Behrenz, 2004; Waldo, 2007; Hsu and Hsueh, 2009; Mallikarjun et al., 2014; and truncated regression in Puenpatom and Rosenman, 2008; Tochkov et al., 2012; Czyźewski et al., 2020). With panel data, panel models are implemented instead (see Zhang et al., 2011; Papadimitriou and Johnes, 2019). A main drawback of the observed regression studies is the lack of a proper identification strategy, which prevents the evidence they supply from going beyond correlation. Two main obstacles stand in the way. First, the interventions analyzed are not randomly assigned to the units; therefore, treated and nontreated units may differ in observed and unobserved characteristics. Second, the intrinsic multidimensionality involved in the computation of the efficiency scores makes it difficult to distinguish between the causal effect of the policy on the single inputs and outputs and the true causal effect of the policy on efficiency. To overcome these issues, a few attempts have been made in the literature to implement classical econometric techniques for causal inference.

Table 3 (excerpt). Censored models: Althin and Behrenz (2004), Waldo (2007), Zhang et al. (2008), Puenpatom and Rosenman (2008), Hsu and Hsueh (2009). Note: RDD, regression discontinuity design.
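Simar and Wilson (2007) pair a truncated regression with bootstrap procedures. The sketch below covers only the truncated maximum likelihood step, assuming SciPy, input-oriented scores truncated above at 1, and normally distributed errors. `truncated_regression` is a hypothetical helper, the data are simulated, and no bootstrap or standard errors are computed. Real DEA scores also pile up at exactly 1, which this simulation ignores.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def truncated_regression(scores, Z, upper=1.0):
    """ML fit of score = a + Z b + e, e ~ N(0, s^2), observed only if score <= upper."""
    y = np.asarray(scores, float)
    W = np.column_stack([np.ones(len(y)), np.asarray(Z, float)])

    def negloglik(params):
        beta, log_s = params[:-1], params[-1]
        s = np.exp(log_s)  # log-parametrisation keeps s positive
        mu = W @ beta
        # Normal density renormalised by the probability of lying below the cut-off.
        ll = norm.logpdf((y - mu) / s) - log_s - norm.logcdf((upper - mu) / s)
        return -ll.mean()  # average, not sum, keeps the optimiser well scaled

    ols = np.linalg.lstsq(W, y, rcond=None)[0]
    start = np.r_[ols, np.log(y.std())]
    res = minimize(negloglik, start, method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))


# Simulated scores (hypothetical): latent score 0.6 + 0.3 z + noise, cut at 1.
rng = np.random.default_rng(1)
z = rng.uniform(size=20_000)
latent = 0.6 + 0.3 * z + rng.normal(scale=0.1, size=z.size)
keep = latent <= 1.0  # scores above 1 are never observed
beta, sigma = truncated_regression(latent[keep], z[keep])
ols_slope = np.polyfit(z[keep], latent[keep], 1)[0]
```

On the truncated sample, ordinary least squares understates the slope, because truncation removes high scores precisely where z is large. The truncated likelihood recovers a value close to the simulated 0.3.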
Among these attempts, we observe that propensity score matching (Castillo and Salem, 2013; Papadimitriou and Johnes, 2019) and difference-in-differences (Hitt and Tambe, 2016; Pan et al., 2019; Baláź et al., 2020; He et al., 2020) are the most common techniques, and they are sometimes used as complementary tools (Lin and Zhu, 2019; Ma et al., 2020). Recent studies by D'Inverno et al. (2021) (see also Mergoni et al., 2020) translated the identification strategy of the regression discontinuity design into the realm of frontier estimation. The added value of these papers is that they do not assume the separability condition and account for endogeneity directly in the computation of the efficiency scores. Finally, descriptive studies involve correlation analysis (Murillo et al., 2007; Montoneri et al., 2012; Tochkov et al., 2012) and comparisons of the distribution of efficiency scores across particular subpopulations. In this context, the classical t-test and ANOVA are commonly used (Kantabutra and Tang, 2010; Zhang et al., 2011; Sav, 2012). In addition, the Mann-Whitney and Wilcoxon tests are used to assess returns-to-scale characterization (Puenpatom and Rosenman, 2008; Halkos and Tzeremes, 2009, 2010a, 2011a; Zhang et al., 2011), while the jackknife and the Kruskal-Wallis test are used to detect differences in the distribution of efficiency scores (Waldo, 2007; Puenpatom and Rosenman, 2008).
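As an illustration of the difference-in-differences logic applied to efficiency scores, the sketch below computes the classic 2x2 estimator: the change in the treated group's mean score minus the change in the control group's. It assumes, as a labeled simplification, that all scores were computed against one pooled frontier (all units, both periods) so that they are comparable over time; the function name and the scores are hypothetical.

```python
from statistics import mean


def did_estimate(pre_treated, post_treated, pre_control, post_control):
    """2x2 difference-in-differences on efficiency scores: the change in the
    treated group's mean score minus the change in the control group's.
    It has a causal reading only under the parallel-trends assumption."""
    treated_change = mean(post_treated) - mean(pre_treated)
    control_change = mean(post_control) - mean(pre_control)
    return treated_change - control_change


# Hypothetical scores, assumed to be computed against one pooled frontier
# (all units and both periods) so that they are comparable over time.
pre_treated = [0.60, 0.70, 0.80]
post_treated = [0.75, 0.80, 0.85]
pre_control = [0.65, 0.75]
post_control = [0.70, 0.78]

effect = did_estimate(pre_treated, post_treated, pre_control, post_control)
print(round(effect, 3))  # 0.06
```

Here the treated units improve by 0.10 and the control units by 0.04, so the estimated policy effect is 0.06. Unlike the naive cross-sectional comparison, this nets out time-invariant group differences, but it still treats the score as a single outcome and does not address the multidimensionality issue raised above.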

Endogeneity
This subsection focuses on endogeneity, as it is crucial both in policy evaluation and in the efficiency context for obtaining unbiased results. On the one hand, econometricians define endogeneity as the correlation between the error term and at least one of the regressors. Its three main sources are reverse causality, omitted variable bias (including unobserved heterogeneity and selection bias), and measurement error. On the other hand, in the context of frontier estimation, Orme and Smith (1996) defined endogeneity as the correlation between inefficiency and the input level and showed that it may bias the estimated efficiency scores. This kind of endogeneity is also known as input demand endogeneity or Olley and Pakes endogeneity, after their seminal paper (Olley and Pakes, 1992). Despite their importance, both types of endogeneity are still overlooked in the efficiency literature (Cordero et al., 2015).
To detect the effect of a policy on efficiency, endogeneity issues play a central role, as a nonrandom intervention is likely to affect both the input level and the efficiency of the units under analysis, thereby inducing a correlation between these variables. Despite this potential threat, endogeneity concerns are discussed only to a limited extent in the papers collected. As reported in Table 4, the systematic literature review identified 16 papers that mention endogeneity, and only nine that discuss it and indicate its causes. These numbers are surprisingly low considering that none of the interventions investigated in our review involves random assignment. The empirical fields where endogeneity is discussed most are education (Waldo, 2007; Schubert and Yang, 2016; Santín and Sicilia, 2018; Papadimitriou and Johnes, 2019; D'Inverno et al., 2021), local government and environment (Halkos and Tzeremes, 2011b; Broersma et al., 2013; Asatryan and De Witte, 2015; Cordero et al., 2017; Ma et al., 2020), and environment (Pan et al., 2019; Czyźewski et al., 2020; Ma et al., 2020; Zhang et al., 2020). Interestingly, in every field of application, endogeneity is considered only from a policy evaluation perspective, that is, in terms of correlation between a regressor and the error term. Santín and Sicilia (2018) and D'Inverno et al. (2021) are the only exceptions to this tendency, as they also account for the Olley and Pakes type of endogeneity, that is, for the endogenous variables that enter the construction of the efficiency score.
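The consequence of nonrandom assignment can be shown with a small deterministic example. It illustrates only the policy-evaluation (selection) type of endogeneity, not the Olley and Pakes type. We assume, hypothetically, that baseline efficiency reflects unobserved managerial quality, that only the better-managed units adopt the policy, and that the true policy effect is +0.05 for every adopter.

```python
from statistics import mean

TRUE_EFFECT = 0.05  # hypothetical causal effect of the policy on efficiency

# Hypothetical baseline efficiency, driven by unobserved managerial quality.
baseline = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]

# Nonrandom assignment: only the better-managed units adopt the policy.
treated = [b >= 0.75 for b in baseline]

# Observed post-policy efficiency = baseline + effect for adopters.
observed = [b + TRUE_EFFECT * d for b, d in zip(baseline, treated)]


def group_gap(values, flags):
    """Difference in means between flagged and unflagged units."""
    return (mean(v for v, f in zip(values, flags) if f)
            - mean(v for v, f in zip(values, flags) if not f))


naive_estimate = group_gap(observed, treated)
selection_bias = group_gap(baseline, treated)
print(round(naive_estimate, 3), round(selection_bias, 3))  # 0.25 0.2
```

The naive comparison returns 0.25, five times the true effect, because it adds the 0.20 baseline gap between adopters and non-adopters. Since managerial quality is unobserved, this gap cannot be removed by a regression on observables alone, which is why an explicit identification strategy is needed.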

Discussion and future research
This systematic literature review analyzed the papers investigating the effect of policy interventions on performance, when performance is evaluated using nonparametric frontier models. Policy evaluation and nonparametric frontier approaches are widely used in the literature, respectively, to evaluate the effectiveness of (public) interventions and to evaluate the performance of (public) activities. With this review, we shed light on the studies that join these two perspectives with the purpose of determining good practices for future research.
Using three main search engines, Scopus, EBSCOhost, and Web of Science (WoS), we selected 81 papers for our review. We categorized them by field of empirical application (Section 3). Moreover, we classified them according to the methodologies applied to obtain the efficiency scores (Section 4) and to investigate their determinants (Section 5).
From this review, three main conclusions can be drawn. First, the literature analyzed is relatively recent. Although nonparametric efficiency models date back to the seminal paper by Charnes et al. (1978), and the first attempt to compare the efficiency scores of units operating under different programs was made by Charnes et al. (1981), the oldest paper in this review is Whittaker (1994), which investigates the relation between farm size and government program benefits.
Second, the main areas of application are sectors of public interest such as education (22 papers), local government (13 papers), health (10 papers), and environment (9 papers). Together, these fields account for more than half of the papers included in the review. In turn, this suggests that the need to join the efficiency and effectiveness perspectives is particularly relevant in the context of public policy.
Third, policy evaluation in the efficiency framework appears to be a growing but still largely unexplored field. On the one hand, the analyzed policies are extremely heterogeneous, even within the same field of application; therefore, it is not yet possible to conduct meta-analyses to derive pooled conclusions. On the other hand, causal results are scarce and not enough attention has been paid to endogeneity. The correlation between the policy intervention and the input level has not been sufficiently accounted for, possibly yielding biased estimates. In addition, the lack of appropriate identification strategies prevents causal conclusions.
These conclusions probably explain why, despite countless research articles testifying to the great advantages of using DEA and similar nonparametric frontier methods to evaluate the private and public sectors, the acceptance of DEA is still limited both in general-interest economics journals and in the formal documents of government agencies and private companies. This pattern is especially marked in the United States, while in Europe, countries such as Belgium, Italy, Portugal, and Spain are starting to use DEA systematically, under the pressure of an active group of academics. The roots of this underuse can be traced to three interconnected issues. First, nonparametric frontier estimation techniques are not generally taught at the graduate level; therefore, their use and comprehension remain confined to a niche of researchers. Second, treating the production process as a black box is a double-edged sword. On the one hand, it makes DEA models flexible and reliable, as they depend on only a small set of assumptions. On the other hand, it prevents a deep understanding of the transformation mechanisms and an immediate interpretation of the results. Third, as this literature review also shows, endogeneity issues are typically neglected in efficiency studies. This prevents the use of DEA for detecting causal effects, hampering its widespread acceptance for policy evaluation.
As future lines of research, we indicate two main challenges. First, more methodological effort is needed to interpret findings causally. How to detect causal effects on efficiency is still a largely unexplored question. Despite the rapid evolution of policy evaluation (Abadie and Cattaneo, 2018) and machine learning (Athey and Imbens, 2019) techniques for causal inference, it is still unclear how to adapt them to frontier estimation, in particular in the context of quasi-experiments.
Second, researchers should adapt their fields of investigation to current global challenges. In particular, three main directions of expansion can be indicated: sustainability, health, and education. Despite the growing attention of the DEA literature to environment- and sustainability-related topics (see the special issue by Chen et al., 2017), and although environmental efficiency, eco-efficiency, emissions, pollution, sustainable development, and environmental protection are among the main fields of current DEA studies (see the review by Emrouznejad and Yang, 2018, Section 5), researchers have devoted only limited effort to investigating their determinants and the role that environmental policy can play. To contribute to the development of adequate policies, the literature should make an effort to fill this gap. Moreover, sustainability should be used as a compass whenever possible. In particular, priority should be given to investigating the efficiency of policies that promote renewable energy and reduce intensive farming, as these are two key points in the climate change challenge. Another current global challenge concerns the health sector. The COVID-19 pandemic has shed light, on the one hand, on the fact that current health systems may not be able to cope with the needs of an aging and growing population and, on the other hand, on the fact that an efficient health system is not only more economically sustainable but also more effective in guaranteeing a high-quality service. Finally, as the COVID-19 pandemic deeply affected the education sector, it provides a unique opportunity to investigate the effects of distance teaching on the efficiency of the education production function.