(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Pangeo-Enabled ESM Pattern Scaling (PEEPS): A customizable dataset of emulated Earth System Model output [1]
Ben Kravitz (Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN, United States of America; Global Change Division, Pacific Northwest National Laboratory, Richland)
Date: 2024-02
Annual mean pattern scaling
As a validation metric, Figs 2–4 show how large the annual mean residual is compared to the underlying data. At each grid point, for each model and scenario, the average residual over the last 20 years of simulation (1995–2014 for historical; 2081–2100 for the others) is computed. This value is then compared to the standard deviation of the model output over that 20-year period for each model and scenario. We chose a 20-year period as broadly representative of a period that would average out most modes of internal variability; further work could test different averaging periods, particularly for monthly means, which would require shorter averaging periods. Values in Figs 2–4 indicate the percent of models for each scenario for which the residual is within one standard deviation of the model output. The standard deviation is computed assuming each year within that 20-year period is independent, which is an erroneous assumption that results in a more conservative estimate (smaller standard deviation).
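The validation metric described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the array layout (models, years, latitude, longitude), and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def percent_within_one_sd(emulated, actual):
    """Percent of models whose 20-year mean residual is within one SD.

    emulated, actual: arrays of shape (n_models, n_years, n_lat, n_lon)
    holding the final 20 years of one scenario (layout is an assumption).
    """
    residual = emulated - actual
    # Mean residual (bias) over the averaging window, per model and grid point.
    mean_residual = residual.mean(axis=1)
    # SD of the model output over the same window, treating years as independent.
    sd = actual.std(axis=1, ddof=1)
    within = np.abs(mean_residual) <= sd
    # Share of models passing at each grid point, in percent.
    return 100.0 * within.mean(axis=0)
```

The returned array has one value per grid point, which corresponds to the quantity the text says is mapped in Figs 2–4 for a single scenario.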
S1–S3 Figs show versions of Figs 2–4 where, instead of using the mean residual (i.e., bias), the comparison is to the root-mean-square error (RMSE) of the residual. We primarily focus on the mean residual in this study, as that is a commonly used metric and a good determinant of how well pattern scaling performs on climatological scales [18, 31]. A drawback of using the mean residual is that it can involve compensating errors, although large errors in bias also lead to large errors in RMSE [32]. Conversely, RMSE can describe how the errors in pattern scaling compare to natural variability, with the drawback that RMSE tends to be biased toward short-lived deviations (i.e., it penalizes outliers). This is indeed borne out in the results: Figs 2–4 and S1–S3 Figs differ mainly over areas where there is more variability (e.g., the midlatitudes for temperature and the tropics for precipitation). Nevertheless, outside of SSP4–3.4 (for which there are few models), at least half of the models do well in all scenarios, especially over land.
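The difference between the two metrics can be illustrated with a small sketch. The residual series below are made up for illustration and are not taken from the dataset; the function name is hypothetical.

```python
import numpy as np

def bias_and_rmse(residual):
    """Return (mean residual, root-mean-square error) of a residual series."""
    residual = np.asarray(residual, dtype=float)
    bias = residual.mean()
    rmse = np.sqrt(np.mean(residual ** 2))
    return bias, rmse

# Hypothetical 20-year residuals that alternate in sign: errors compensate
# in the mean, so the bias vanishes while the RMSE does not.
compensating = np.array([1.0, -1.0] * 10)

# Hypothetical residuals with a single short-lived deviation: the squared
# term makes RMSE weight that one year more heavily than the bias does.
single_outlier = np.zeros(20)
single_outlier[0] = 2.0

for name, series in [("compensating", compensating), ("outlier", single_outlier)]:
    bias, rmse = bias_and_rmse(series)
    print(f"{name}: bias={bias:.3f}, rmse={rmse:.3f}")
```

Because RMSE is never smaller than the absolute bias, a large bias implies a large RMSE, while the reverse does not hold.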
As has been found in other studies using the same methodology (e.g., [7]), pattern scaling appears for the most part to be an effective means of replicating the local trend of annual mean climate model output for a wide variety of models and scenarios, with many regions showing that for 90+% of the models the residuals are within one standard deviation of the baseline for each model. Fig 5 further illustrates this, showing the number of scenarios for which the values displayed in the panels of Figs 2–4 are at least 90%. For surface air temperature, 85.6% of all area (96.3% of land area) has all six scenarios that meet this criterion. For precipitation the analogous figures are 98.5% of all area (98.9% of land area), and for relative humidity 78.4% of all area (78.5% of land area). Relative humidity shows worse performance than the other two variables, as well as large inter-scenario differences. (S4 Fig shows a version of Fig 5 computed using the RMSE metric. As expected, performance is substantially worse, as RMSE on the residuals tends to amplify deviations on sub-decadal timescales. Nevertheless, based on the results in S1–S3 Figs, choosing a cutoff value lower than 90% would undoubtedly improve the appearance of the results, again particularly over land.) S5–S8 Figs are replicates of Figs 2–5, respectively, but calculated using a pooled variance across all models instead of the variances of the individual models; as might be expected, fewer values fall outside of the one standard deviation range in the pooled variance computations.
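Area percentages like those quoted above require weighting each grid cell by its area. The sketch below assumes a regular latitude-longitude grid, where cell area is proportional to the cosine of latitude; the function names, the array shapes, and the optional land mask are hypothetical and may differ from how the authors computed their figures.

```python
import numpy as np

def all_scenarios_pass(pct_within, threshold=90.0):
    """True where every scenario meets the threshold.

    pct_within: array of shape (n_scenarios, n_lat, n_lon), in percent.
    """
    return (pct_within >= threshold).all(axis=0)

def area_percent(mask, lat_deg, region=None):
    """Area-weighted percent of grid cells where mask is True.

    mask: boolean array (n_lat, n_lon); lat_deg: cell-center latitudes.
    region: optional (n_lat, n_lon) array, e.g. a land mask, that
    restricts the calculation to part of the globe.
    """
    # On a regular lat-lon grid, cell area scales with cos(latitude).
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones(mask.shape)
    if region is not None:
        weights = weights * region
    return 100.0 * (weights * mask).sum() / weights.sum()
```

Passing a land mask as `region` would give the land-only percentage; omitting it gives the percentage over all area.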
There are regions where, for some scenarios and some models, pattern scaling introduces error. This could be due to a few reasons:
Pattern scaling doesn’t work for that region. This could be due to some response in the climate system that results in a nonlinear relationship between global mean temperature and the local response (for example, feedbacks leading to Arctic Amplification); nonmonotonicity in global mean temperature (for example, ssp126 has an overshoot, which could affect regressions); or a low trend in global mean temperature (again, possible under ssp126) resulting in a poor linear fit. There could also be slowly responding elements of the climate system that result in different transient states or a different steady state (e.g., land-ocean contrast evolution).
The baseline may have low variability. This would result in a greater probability of exceeding one standard deviation.
There is a low number of models (as is the case for ssp434), so having a high residual for even one model can result in a large change in Figs 2–4.
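To make the linear-fit assumption behind these failure modes concrete, here is a minimal sketch of pattern scaling as an ordinary least-squares regression of each grid point on global mean temperature. The function names and array shapes are assumptions, and the dataset's actual fitting procedure may differ in detail.

```python
import numpy as np

def fit_pattern(global_tas, local_field):
    """Fit local = intercept + slope * global_tas at every grid point.

    global_tas: (n_years,) global mean temperature.
    local_field: (n_years, n_lat, n_lon) local variable (e.g. tas, pr, hurs).
    """
    x = global_tas - global_tas.mean()
    y_mean = local_field.mean(axis=0)
    # Closed-form least-squares slope, computed for all grid points at once.
    slope = np.tensordot(x, local_field - y_mean, axes=(0, 0)) / np.sum(x ** 2)
    intercept = y_mean - slope * global_tas.mean()
    return slope, intercept

def emulate(global_tas, slope, intercept):
    """Reconstruct the local field from global mean temperature."""
    return intercept[None] + global_tas[:, None, None] * slope[None]
```

The slope is divided by the variance of global mean temperature, so a scenario with little temperature change gives a poorly constrained fit. A local response that is nonlinear in global mean temperature leaves a systematic residual that a straight line cannot remove.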
Figs 6 and 7 provide further insight into potential sources of error in pattern scaling for the three variables considered here. For all variables in all scenarios, the inter-quartile range never exceeds 0.5 standard deviations, indicating that any errors tend to be due to a smaller percentage of models rather than general features of pattern scaling. Among those models that exceed the inter-quartile range, relative humidity tends to have greater error than the other two variables, and high latitudes tend to have more error than other regions. Historical and ssp245 tend to have the least error, ssp585 has the most error, and ssp434 has too few models to ascertain a robust comparison with other scenarios. The models with the greatest error tend to vary across variables and scenarios; that is, there is no group of models that performs poorly in all cases. If a model/scenario/variable combination has high error in one spatial region, it tends to have high error in the other regions; based on the results in Figs 2–5, this result is likely dominated by local features with high residuals. Nevertheless, there are few values in Fig 7 that exceed one standard deviation, and they are almost entirely found in ssp126, ssp370, and ssp585. (S9 and S10 Figs replicate Figs 6 and 7, respectively, but computed using the pooled variance instead of the individual model variances.) Further spatial analysis (not pictured) indicates that the mean residuals on a gridpoint basis are indeed quite small, further reinforcing that on average pattern scaling does well except for a few outliers. A notable exception is tropical precipitation, which is a known difficulty for pattern scaling due to nonlinear behavior of intense precipitation [4].
Fig 6. Box plot of root mean square error (RMSE) of pattern scaling, calculated as the number of standard deviations the generated output is from the actual model output (calculated over the last 20 years of simulation), for each scenario (panels) and for temperature (tas), precipitation (pr), and relative humidity (hurs) in a variety of regions. Red lines indicate the median model, blue boxes indicate the inter-quartile range, and whiskers indicate the full model range. Because so few models participated in ssp434, we show the RMSE values for each model.
https://doi.org/10.1371/journal.pclm.0000159.g006
Fig 7. Heatmap of the number of standard deviations (colors) the generated output is from the actual model output for each model in each scenario (panels; calculated from the last 20 years of each simulation) for temperature (tas), precipitation (pr), and relative humidity (hurs) in a variety of regions. White squares (marked by NaN) indicate that there is no model output available for that model/variable combination on Pangeo.
https://doi.org/10.1371/journal.pclm.0000159.g007
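The summary statistics drawn in a box plot like Fig 6 (median, inter-quartile range, and full range across models) can be computed as in the sketch below. The function name and input values are hypothetical; missing model/variable combinations, shown as NaN in Fig 7, are dropped before the statistics are taken.

```python
import numpy as np

def box_stats(errors_in_sd):
    """Median, inter-quartile range, and full range across models.

    errors_in_sd: per-model errors expressed in standard deviations;
    NaN marks a model with no output available and is ignored.
    """
    values = np.asarray(errors_in_sd, dtype=float)
    values = values[~np.isnan(values)]
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "median": median,
        "iqr": q3 - q1,
        "min": values.min(),
        "max": values.max(),
    }
```

An inter-quartile range that stays small while the maximum is large is the signature the text describes: most models emulate well and a few outliers carry the error.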
Figs 2–5 have some areas where pattern scaling performance is consistently worse than others. In addition to the tropics, these areas include the North Atlantic, the high latitudes (predominantly the Arctic), the Southern Ocean, and oceanic areas associated with eastern boundary currents. These are all areas associated with feedback-dominated behavior where pattern scaling might not be expected to perform well: the “warming hole” in the North Atlantic associated with the Atlantic Meridional Overturning Circulation and cloud feedbacks [33]; Arctic amplification associated with strong feedbacks like the ice albedo feedback, lapse rate feedback, and changes in atmospheric and oceanic heat transport [34, 35]; cloud feedbacks in the Southern Ocean [36]; and persistent marine stratocumulus decks off the western coasts of continents [37]. Regarding the Atlantic Meridional Overturning Circulation, the Southern Ocean, and marine stratocumulus decks, these areas are not over land so are not directly relevant for many impacts models, for example agriculture or hydrological models. While these regions are important in general, one would not presume that pattern scaling is an effective tool for studying these sorts of complex processes and feedbacks, so it could be argued that pattern scaling performance in these regions is less important. The high latitudes are important for many impacts studies, notably sea level rise; due to substantial uncertainties in feedback strength at the high latitudes resulting in large model spread [38], we urge caution in using this data set (or pattern scaling in general) to evaluate impacts of high latitude change. Figs 2–5 do indicate, however, that even at these latitudes, there are many ESMs amenable to pattern scaling in many scenarios for many variables.
The regression approach undertaken here is not well suited to capturing interannual variability (e.g., the El Niño Southern Oscillation or the North Atlantic Oscillation). Regions strongly affected by interannual variability are unlikely to show major sources of error if the oscillation period is substantially shorter than 20 years (the averaging period of our results). If the oscillation changes such that one mode becomes more dominant than the other, the regression should be able to capture those changes, similarly resulting in low error. A potential caveat is if the oscillation has a longer period than can be captured in the 20-year average (such as the Pacific Decadal Oscillation) [39].
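The role of the averaging window can be illustrated with an idealized sinusoidal oscillation. This is a toy sketch only: real modes of variability are not pure sinusoids, and the periods used in the example (4 and 60 years) are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def window_mean_of_oscillation(period_years, window_years=20, phase=0.0):
    """Mean of a unit-amplitude sinusoid sampled annually over a window."""
    t = np.arange(window_years)
    return np.mean(np.sin(2.0 * np.pi * t / period_years + phase))

# A short-period oscillation completes several full cycles in 20 years,
# so its contribution to the window mean cancels.
short = window_mean_of_oscillation(period_years=4)

# A long-period oscillation covers only part of a cycle in 20 years,
# so a residual offset survives the averaging.
long = window_mean_of_oscillation(period_years=60)

print(f"4-year period: {short:.3f}; 60-year period: {long:.3f}")
```

The size of the leftover offset for a long-period oscillation depends on where in its cycle the window falls, which the `phase` argument controls.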
For temperature, pattern scaling performs worse on ssp126 than on the other scenarios (Figs 6 and 7), and for precipitation and relative humidity, pattern scaling performs worse on the high forcing scenarios (ssp370 and ssp585) than on the others. ssp126 has little global mean temperature change, and what change it does have is nonmonotonic [6], so it is difficult to obtain a confident regression slope. Under ssp370 and ssp585, there is greater excitation of temperature-related feedbacks, which is more likely to result in behavior that cannot be captured by linear regression.
---
[1] URL: https://journals.plos.org/climate/article?id=10.1371/journal.pclm.0000159
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.