(C) PLOS One

This story was originally published by PLOS One and is unaltered.

Scope 3 emissions: Data quality and machine learning prediction accuracy [1]

Authors: Quyen Nguyen (Climate and Energy Finance Group [CEFGroup], School of Surveying, University of Otago, Dunedin, Otago, New Zealand), Ivan Diaz-Rainey

Date: 2023-11

Corporate carbon footprints, a popular proxy for firms’ climate transition risks, measure the level of greenhouse gas [GHG] emissions associated with a firm’s business activities or products. They indicate how much anthropogenic carbon a company contributes to atmospheric GHGs and to global warming [1, 2]. Academics and industry practitioners prefer carbon footprints over other climate transition risk rating metrics because they can be converted to dollar losses (using the effective carbon price) or hidden costs (using the future costs of carbon across different transition scenarios) [3]. Carbon footprints help facilitate the implementation of divestment strategies or low-carbon indices (e.g., S&P Carbon Efficient Indices, MSCI Low Carbon Index) by establishing a link between climate risk and financial risk. Although corporate carbon footprints are a popular metric for assessing climate transition risks, carbon emissions data have numerous problems, including limited, inconsistent and inaccurate reporting [4, 5].

The GHG Protocol (WRI and WBCSD, 2020) divides carbon emissions into three categories: Scope 1 (direct emissions from sources and assets controlled by the firm), Scope 2 (indirect emissions from purchased electricity), and Scope 3 (indirect emissions from a firm’s value chain). Traditionally, the protocol requires all firms to report Scope 1 and Scope 2 emissions, whereas firms have discretion over whether to report Scope 3 emissions and which categories to include. Recently, there have been signals that mandatory disclosure of Scope 3 emissions may be required in some jurisdictions. For instance, a draft rule issued by the U.S. Securities and Exchange Commission in March 2022 proposes that firms disclose emissions generated by their suppliers or partners if those emissions are material or are included in any of the firm’s emissions targets [6]. Accurately quantifying Scope 3 emissions is therefore critical. Even so, systematically accounting for all emissions along the entire value chain (sometimes up to tens of thousands of firms) to the same level of accuracy is broadly acknowledged to be extremely challenging [7].

Scope 3 has many merits—it covers all of the indirect emissions spanning a firm’s full value chain, from acquiring and pre-processing raw materials (upstream) to distributing, storing, using, and disposing of the end products sold to customers (downstream). It captures a significant proportion of many firms’ total carbon footprints, especially for many firms operating in the energy sector [8, 9]. Further, Scope 3 represents the most significant emissions reduction opportunities going forward, and a full assessment of Scope 3 emissions is critical for understanding the end-to-end impacts of carbon taxes and climate policies on individual firms [10]. However, the analyses of firm-level emissions by external stakeholders are usually limited to Scope 1 and Scope 2 emissions [11–13]. This is due to the following three issues associated with Scope 3:

Despite recent signals, there are no binding rules on Scope 3 emissions disclosure. Further, related sustainability reporting standards and frameworks, such as the Global Reporting Initiative, the Sustainability Accounting Standards Board, and International Integrated Reporting, either remain silent on Scope 3 emissions reporting or fail to provide detailed recommendations on how Scope 3 should be properly disclosed [14]. As measurement and disclosure of Scope 3 are inconsistent and unsystematic, the quality and accuracy of firms’ voluntary disclosures remain unclear. Further, given the complexity of calculating Scope 3 emissions and the extensive data collection effort needed (in particular, granular activity-level data from supply chains, which may be business-sensitive), it is not surprising that the reporting of Scope 3 emissions is generally sparse [15].

Firms are not required to disclose the full composition of Scope 3 emissions across the fifteen distinct Scope 3 categories (see Section 2 for more details), as reporting is on a ‘comply-or-explain’ basis [15, p. 10]. Thus, using aggregated Scope 3 emissions data built from an incomplete composition can be misleading, as firms may choose to report only areas in which they perform well or that are easier to measure, while intentionally ignoring other areas. For example, two firms that have similar value chain emissions and firm characteristics may choose to report different categories (e.g., one may report material Purchased Goods and Services emissions while another may choose to report immaterial Business Travel). It does not make sense to aggregate emissions data with many missing values when firms have the discretion to choose which categories they report and the boundaries they report within. Rather than comparing apples with oranges, one should either look at firms’ Scope 3 data at the category level, or replace missing values (i.e., unreported Scope 3 categories) with estimated values before performing any cross-sectional comparisons.

[3] Measurement divergence/reporting inconsistency

Firms may set different operational boundaries on the same Scope 3 emissions category, report different values across different communication channels (i.e., annual filings, sustainability reports, or third-party initiatives such as the Carbon Disclosure Project [CDP]), and/or occasionally update (restate) their reported emissions for past years in later years [14, 16]. These issues make it hard for third-party data providers, such as Bloomberg and Refinitiv, to build consensus and provide consistent Scope 3 measures. More specifically, third-party data providers may collect Scope 3 emissions data from different sources, update restated values on different time frames, and/or adjust the reported values using different proprietary models. Further, differences in scenarios (e.g., from methodological choices in allocation methods, product use assumptions, end-of-life assumptions) and estimation models make Scope 3 data unreliable and difficult to compare among different data providers [17, 18]. Researchers and industry practitioners (e.g., asset managers and institutional investors) should be aware of the measurement divergence among third-party data providers when performing analysis or forming investment portfolios using Scope 3 data.

Beyond the issues described above for disclosing firms, there is an additional need to develop estimation models that employ externally available predictors to cover non-disclosing firms in a broader investment universe. This need arises because traditional approaches to modelling Scope 3 emissions either require very granular activity-level data that are rarely accessible to third-party stakeholders (i.e., the bottom-up life-cycle assessments [LCA]) or employ industry-based metrics to allocate national emissions, which fail to account for heterogeneity among firms (i.e., the top-down environmental input-output models [EIO]) [17]. Although models using simple extrapolation techniques [19, 20], multi-variable regression models [21] or out-of-the-box machine learning techniques [5, 22, 23] are readily available for estimating Scope 1 and Scope 2 emissions, little has been done on Scope 3.

A few attempts are discernible from emissions data providers using a variety of modelling approaches. Some providers employ a bottom-up LCA model and use parameters such as firm activities and emissions factors (Carbon4Finance) [24]. Others employ EIO models using top-down metrics at the industry level (such as Trucost, which estimates most of its upstream emissions this way, except for “Transport and distribution”, where it collects self-reported data) [25], or combine both models in their workflow (such as ISS, with EIO models for upstream and LCA models for downstream emissions) (ISS Methodology, Factset). More recently, CDP employs regression models using metrics at the firm level [26], and Bloomberg employs machine-learning techniques on a subset of oil & gas firms (see https://www.bloomberg.com/professional/blog/bloombergs-greenhouse-gas-emissions-estimates-model-a-summary-of-challenges-and-modeling-solutions/; at the time of writing this paper, we have not had access to their modelled Scope 3 estimates). Unfortunately, many organisations provide limited information on their Scope 3 estimation methods. Furthermore, the prediction performance of these models is often reported only vaguely. At best, data providers including Bloomberg and ISS disclose a model confidence ranking associated with their estimates; however, the absolute magnitude of their prediction errors is rarely disclosed. The quality and integrity of the estimated Scope 3 datasets are unknown, as evidenced by Busch et al. [16], who discovered that the correlation between estimates of the aggregated Scope 3 values from ISS and Trucost is surprisingly low (16%). The estimation of Scope 3 emissions is important since it helps fill in the gaps (i.e., unreported Scope 3 categories), and the filled-in data are in turn used for a variety of financial functions, including portfolio construction [15, 24]. However, it is problematic that third-party providers do not disclose the limitations of their estimates, such as inherent prediction errors and data uncertainties.

From the preceding discussion, it is evident that investors’ sophistication on climate risk is increasing and, as part of this, they require high-quality and comprehensive Scope 3 data. Accordingly, we investigate Scope 3 emissions data divergence (across different providers), composition (which Scope 3 categories are reported) and whether machine-learning models can be used to predict Scope 3 emissions for non-reporting firms. These three issues are inherently interlinked if investors want to understand the quality of reported and predicted Scope 3 data. More specifically, using data retrieved from Bloomberg, Refinitiv Eikon, and ISS, we examined the following research questions: (i) What is the quality of Scope 3 emissions data in terms of measurement divergence between data vendors? (ii) What is the quality of Scope 3 emissions data in terms of the composition of emissions categories reported by firms? (iii) What is the prediction accuracy of Scope 3 emissions estimates using machine learning models for non-disclosing firms? To answer the first question, we looked at the Scope 3 emissions datasets from Bloomberg, Refinitiv Eikon, and ISS, and at the divergence among these data providers through a three-way reconciliation of aggregated Scope 3 emissions values. To answer the second question, we analysed the composition of Scope 3 emissions from Bloomberg, as this is the only dataset (out of the three in our study) that has a detailed breakdown by category. (Note that both ISS and Refinitiv Eikon report only the aggregated Scope 3 emissions data. CDP also has a breakdown of Scope 3 emissions by category, and it is the source data that is fed into ISS, Refinitiv Eikon and Bloomberg. It would be interesting to compare Scope 3 emissions from third-party data providers with source data such as CDP and company reports.) For each Scope 3 emissions category, we measured its relevance based on its intensities in relation to the aggregated Scope 3 values, then we explored its completeness based on the proportion of firms that choose to disclose this category. (By doing this, the relevance of a category is defined purely by its relative size. CDP has a similar approach to determining the ‘relevance’ of Scope 3 emissions; see https://cdn.cdp.net/cdp-production/cms/guidance_docs/pdfs/000/003/504/original/CDP-technical-note-scope-3-relevance-by-sector.pdf.) To address the third question, we evaluated whether the Scope 3 emissions values could be estimated using top-down business and financial data and whether prediction accuracy could be improved to an acceptable degree using out-of-the-box machine learning techniques. We continued to use the Bloomberg dataset for this part of the analysis, as we aim to predict both aggregated Scope 3 emissions and its component categories.
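The two category-level measures described above, relevance (relative size) and completeness (share of firms disclosing), can be illustrated with a minimal sketch. The firm names, the figures, and the use of pooled emissions shares in place of the paper's intensity-based measure are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical firm-level records: category -> reported emissions (tCO2e).
# A missing key means the firm did not report that category.
firms = {
    "FirmA": {"Business Travel": 10.0, "Use of Sold Products": 900.0},
    "FirmB": {"Business Travel": 5.0, "Purchased Goods and Services": 300.0},
    "FirmC": {"Business Travel": 8.0},
    "FirmD": {"Purchased Goods and Services": 200.0},
}


def category_profile(records):
    """Return {category: (relevance, completeness)}.

    relevance    = category emissions / total reported Scope 3 emissions
                   (assumption: a pooled share stands in for the paper's
                   intensity-based measure)
    completeness = share of firms that report the category
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for reported in records.values():
        for category, value in reported.items():
            totals[category] += value
            counts[category] += 1
    grand_total = sum(totals.values())
    n_firms = len(records)
    return {c: (totals[c] / grand_total, counts[c] / n_firms) for c in totals}


profile = category_profile(firms)
for category, (relevance, completeness) in sorted(profile.items()):
    print(f"{category}: relevance={relevance:.1%}, completeness={completeness:.0%}")
```

In this toy sample, Business Travel is reported by three of four firms yet makes up under 2% of reported emissions, which mirrors the kind of mismatch between completeness and relevance that the composition analysis is designed to expose.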

Our study makes several contributions to the existing literature, and collectively these contributions make this paper the most comprehensive analysis to date of interlinked Scope 3 data challenges. First, we extend the study by Busch et al. [16] that analyses the divergence in third-party carbon emissions datasets for Scope 1, Scope 2 and Scope 3 between 2005 and 2016. Focusing solely on Scope 3 emissions data from 2013 to 2019, we go beyond correlation analysis to quantify the degree of divergence among raw emissions data and to understand the implication of this divergence for emissions rankings (i.e., where firms stand in terms of carbon emissions compared to the universe of firms covered in the Scope 3 emissions dataset). Second, we analyse over time and across sectors the completeness of reporting of Scope 3 emissions categories using Bloomberg data between 2010 and 2019. This extends parts of the analysis by Klaaßen and Stoll [14], who examined the impact of incomplete composition/category exclusion for 56 technology firms in 2019. Finally, our paper applies machine-learning algorithms to predict Scope 3 emissions in a similar manner to Serafeim and Velez Caicedo [27]. Because our predictor set differs from that of Serafeim and Velez Caicedo [27] (it excludes Scope 1 and 2 emissions and market capitalization), our model is applicable to a wide universe of public or private firms regardless of their emissions disclosure status. (In contrast, by using market capitalization and Scope 1 and Scope 2 emissions as predictors, Serafeim and Velez Caicedo [27] limit the scope of their analysis to publicly listed firms that disclose Scope 1 and Scope 2 emissions. While including Scope 1 and Scope 2 emissions may provide a better representation of Scope 3 patterns, the use of market capitalization as an additional size proxy for Scope 3 emissions lacks a clear basis. Market capitalization fluctuates on a daily basis and is heavily dependent on investors’ supply and demand, making the direct link to a firm’s operational activities unclear. It could be argued that other size proxies, such as revenue and total assets, better reflect the impact of operational scale on Scope 3 emissions.) In addition, we include energy consumption in the predictor set, since it has been shown to improve Scope 1 and 2 prediction accuracy [5]. More critically, in contrast to the results of Serafeim and Velez Caicedo [27], we conclude that there are large absolute prediction errors even when machine learning techniques are used to produce Scope 3 emissions estimates (discussed further below). This suggests that similar absolute prediction errors may be inherent in third-party estimation models, especially when they use externally available data to model emissions for thousands of firms. This finding is relevant in the current emissions data landscape, where data providers’ methodologies and the errors associated with their proprietary modelling remain opaque.

Our main results are summarized as follows. First, we find that there is considerable divergence in the aggregated Scope 3 emissions values among third-party data providers. When the data provider adjusts reported emissions values with its proprietary models (in this case, ISS), none of its data points are identical to those of Bloomberg or Refinitiv Eikon (within 1% error), and the correlations of this dataset with the two other datasets are low (55% and 56%, respectively). However, when data providers use purely reported emissions values without any adjustments (in this case, Bloomberg and Refinitiv Eikon), they still have a surprisingly low proportion of identical data points (only 68%) despite a high correlation (95%). Divergence between reported datasets (Bloomberg and Refinitiv Eikon) is generally of smaller magnitude and has no systematic bias (the trimmed mean absolute percentage error is 4% and the trimmed mean percentage error is <0.01%). Divergence between ISS and Bloomberg (or Refinitiv Eikon) has substantial magnitude and exhibits a systematic upward bias (the trimmed mean absolute percentage error is 111%, and the trimmed mean percentage error is -20%, indicating that emissions values from ISS are systematically higher than those from Bloomberg). (The mean values have been trimmed to 5%-95% due to several outliers in percentage values. See Section 4.1.) This divergence will lead to substantially different low-carbon portfolio constituents if fund managers employ the ISS dataset to rank high/low emitters and adjust their weights accordingly, whereas portfolios constructed from Refinitiv Eikon and Bloomberg data should yield quite similar results. Overall, these divergences make it difficult for investors to understand their portfolios’ real exposure to climate risks.
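The divergence statistics quoted above (share of identical data points within 1%, trimmed mean percentage error, trimmed mean absolute percentage error) can be sketched as follows. The sign convention (first provider minus second, divided by the first), the index-based 5%-95% trim, and all data values are assumptions for illustration; the paper's exact definitions are given in its Section 4.1.

```python
def trimmed_mean(values, trim_percent=5):
    """Mean after dropping the lowest and highest trim_percent of observations.

    Assumption: a simple index-based trim; the paper may define the
    5%-95% trimming differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = n * trim_percent // 100  # observations dropped at each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


def divergence_summary(a, b, tolerance=0.01):
    """Compare two providers' values for the same firm-years.

    Assumed sign convention: error = (a - b) / a, so a negative mean
    percentage error means provider b reports higher values than a.
    """
    errors = [(x - y) / x for x, y in zip(a, b)]
    abs_errors = [abs(e) for e in errors]
    return {
        "identical_share": sum(e <= tolerance for e in abs_errors) / len(errors),
        "trimmed_mpe": trimmed_mean(errors),
        "trimmed_mape": trimmed_mean(abs_errors),
    }


# Hypothetical data: 20 firm-years, with one extreme outlier in provider b.
provider_a = [100.0] * 20
provider_b = [100.0] * 10 + [120.0] * 5 + [90.0] * 4 + [1000.0]

summary = divergence_summary(provider_a, provider_b)
for name, value in summary.items():
    print(f"{name}: {value:.2%}")
```

Trimming matters here because a single firm-year with a tenfold discrepancy would otherwise dominate both means; with the trim, the summary reflects the typical disagreement between the two providers.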

Second, we find that firms normally disclose an incomplete composition of Scope 3 emissions (on average, they disclosed only 3.75 out of 15 categories over 2010–2019), but they are reporting more categories over time (from 1.7 categories in 2010 to 4.7 categories in 2019). The most relevant Scope 3 emissions categories differ both between and within industries. Business Travel has been reported by most firms (up to 84% in our sample) despite accounting for less than 1% of total Scope 3 emissions. Other, more material Scope 3 categories, such as Use of Sold Products (making up to 66% of total Scope 3 emissions) and Processing of Sold Products (making up to 8% of total Scope 3 emissions), have been largely ignored (disclosed by up to 18% and 6% of firms, respectively). A simple fill-in-the-gap analysis inspired by Klaaßen and Stoll [14], which uses the median carbon intensities of the industry peer group to proxy for unreported categories at the firm level, suggests that if firms were to report the full composition (all 15 categories) of Scope 3, their total Scope 3 emissions figure could be 44% higher than currently reported.
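The fill-in-the-gap idea can be sketched in a few lines: for each unreported category, multiply the peer-group median intensity by the firm's own size. The choice of revenue as the intensity denominator, the two-category list, and every number below are assumptions for illustration, not values or definitions taken from the paper.

```python
from statistics import median

# Only two of the fifteen categories are shown, for brevity.
CATEGORIES = ["Business Travel", "Use of Sold Products"]

# Hypothetical industry peer group: revenue (USD million) and reported tCO2e.
peers = [
    {"name": "A", "revenue": 100.0,
     "reported": {"Business Travel": 2.0, "Use of Sold Products": 500.0}},
    {"name": "B", "revenue": 200.0,
     "reported": {"Business Travel": 6.0}},
    {"name": "C", "revenue": 50.0,
     "reported": {"Business Travel": 1.5, "Use of Sold Products": 150.0}},
]


def median_intensities(firms, categories):
    """Median emissions per unit of revenue among firms reporting each category."""
    result = {}
    for category in categories:
        values = [f["reported"][category] / f["revenue"]
                  for f in firms if category in f["reported"]]
        if values:
            result[category] = median(values)
    return result


def filled_total(firm, intensities, categories):
    """Reported emissions plus peer-median estimates for unreported categories."""
    total = 0.0
    for category in categories:
        if category in firm["reported"]:
            total += firm["reported"][category]
        elif category in intensities:
            total += intensities[category] * firm["revenue"]
    return total


intensities = median_intensities(peers, CATEGORIES)
firm_b = peers[1]
reported_b = sum(firm_b["reported"].values())
filled_b = filled_total(firm_b, intensities, CATEGORIES)
print(f"Firm B reported {reported_b} tCO2e, filled total {filled_b} tCO2e")
```

Firm B reports only Business Travel, so its filled total is dominated by the imputed Use of Sold Products figure, which shows how omitting a single material category can understate a firm's Scope 3 footprint.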

Third, Scope 3 prediction accuracy is low, even with a range of machine learning algorithms and an extensive set of business and financial predictors. In general, it is easier to predict upstream emissions than downstream emissions. Critically, estimating total Scope 3 emissions at the category level instead of the aggregated level (as in the work by Nguyen et al. [5]) improves prediction accuracy (specifically, the mean absolute error [MAE] of log-transformed emissions was reduced by 25% in Linear Forest). This is most probably because the aggregated Scope 3 emissions are distorted by non-reported categories, suggesting that the modelling of Scope 3 emissions should be conducted at the category level. Further, predictor importance varies materially by category.

However, there are limited improvements in prediction performance from ‘out-of-the-box’ machine learning models (i.e., Linear Forest) relative to baseline models (i.e., Industry Fill or Ordinary Least Squares). More precisely, Linear Forest is slightly better than baseline models at predicting total Scope 3 emissions at the category level and the aggregated level (MAE is reduced by 2% to 6%) and yields roughly equivalent prediction accuracy to a Stepwise regression model across most individual categories. In addition, the percentage errors between estimated values and actual values on the original scale (tonnes of CO2) are large, as indicated by the large median absolute percentage errors for the aggregated Scope 3 (~72%) and individual categories (59%-187%). Estimation errors of this size may lead to inefficiencies in constructing low-carbon portfolios, as documented by Kalesnik et al. [28] for Scope 1 and 2. This finding contrasts with Serafeim and Velez Caicedo [27], who report seemingly low percentage errors for several Scope 3 categories (as their percentage error metrics are based on log-transformed emissions; see Section 5.3 for a detailed discussion). Overall, our findings imply that researchers and investors should be wary of potential prediction errors when using Scope 3 emissions obtained from third parties. The findings also call for more transparent disclosure from third-party data providers in terms of estimation methodologies and prediction performance.
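The point about scale can be made concrete with a small sketch: the same predictions produce a large percentage error in tonnes but a seemingly small one when the percentage error is computed on log-transformed values. The emissions figures, the use of the natural logarithm, and the function names are assumptions for illustration and do not reproduce either paper's exact metrics.

```python
import math
from statistics import median


def mae_log(actual, predicted):
    """Mean absolute error of log-transformed emissions (natural log assumed)."""
    errors = [abs(math.log(a) - math.log(p)) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def median_ape(actual, predicted):
    """Median absolute percentage error on the original scale (tonnes)."""
    return median([abs(p - a) / a for a, p in zip(actual, predicted)])


def median_ape_of_logs(actual, predicted):
    """Median absolute percentage error computed on log-transformed values."""
    return median([abs(math.log(p) - math.log(a)) / math.log(a)
                   for a, p in zip(actual, predicted)])


# Hypothetical actual and predicted Scope 3 emissions, in tonnes of CO2.
actual = [100_000.0, 50_000.0, 2_000_000.0]
predicted = [40_000.0, 90_000.0, 1_500_000.0]

print(f"MAE (log scale):           {mae_log(actual, predicted):.3f}")
print(f"Median APE (tonnes):       {median_ape(actual, predicted):.1%}")
print(f"Median APE (log values):   {median_ape_of_logs(actual, predicted):.1%}")
```

Because the logarithm of a large emissions figure is itself a number of order ten, dividing a log-scale error by it shrinks the percentage considerably, so error metrics should be compared only when they are computed on the same scale.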

The rest of the paper proceeds as follows: Section 2 provides context on the Scope 3 emissions problem. Section 3 outlines the data used and Section 4 presents the methodology implemented for the analysis. Section 5 reports the results and Section 6 concludes.

---
[1] Url: https://journals.plos.org/climate/article?id=10.1371/journal.pclm.0000208

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.