(C) PLOS
This article was originally published by PLOS Water (see the article URL below) and is unaltered.
Deep reinforcement learning for irrigation scheduling using high-dimensional sensor feedback [1]
Yuji Saikai (School of Mathematics and Statistics, The University of Melbourne, Melbourne, VIC, Australia) and Allan Peake (CSIRO Agriculture, Toowoomba, QLD, Australia)
Date: 2023-10
Deep reinforcement learning has considerable potential to improve irrigation scheduling in many cropping systems by applying adaptive amounts of water based on various measurements over time. The goal is to discover an intelligent decision rule that processes information available to growers and prescribes sensible irrigation amounts for the time steps considered. Because the technique is novel, however, research on it remains sparse and of limited practical value. To accelerate progress, this paper proposes a principled framework and an actionable procedure that allow researchers to formulate their own optimisation problems and implement solution algorithms based on deep reinforcement learning. The effectiveness of the framework was demonstrated in a case study of profit maximisation for irrigated wheat grown in a productive region of Australia. Specifically, the decision rule takes nine state variables as input: crop phenological stage, leaf area index, extractable soil water in each of the top five soil layers, cumulative rainfall, and cumulative irrigation. Every day, it returns a probabilistic prescription over five candidate irrigation amounts (0, 10, 20, 30 and 40 mm). The production system was simulated at Goondiwindi using the APSIM-Wheat crop model. After training in the learning environment using 1981–2010 weather data, the learned decision rule was tested individually on each year of 2011–2020. The results were compared against the benchmark profits obtained by a conventional rule common in the region. The discovered decision rule prescribed daily irrigation amounts that uniformly improved on the conventional rule across all testing years, with the largest improvement, 17%, in 2018. The framework is general and applicable to a wide range of cropping systems with realistic optimisation problems.
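The decision rule described in the abstract, nine state inputs mapped to a probability distribution over five discrete irrigation amounts, can be sketched as follows. This is a minimal pure-Python stand-in using a linear-softmax policy; the paper's actual rule is a deep neural network, and all names and weight initialisations here are illustrative.

```python
import math
import random

ACTIONS_MM = [0, 10, 20, 30, 40]  # candidate daily irrigation amounts (mm)
N_STATE = 9  # stage, LAI, 5 x extractable soil water, cum. rain, cum. irrigation

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class LinearPolicy:
    """Linear-softmax stand-in for the paper's deep network."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # one weight vector per action, plus a bias weight
        self.w = [[rng.gauss(0, 0.1) for _ in range(N_STATE + 1)]
                  for _ in ACTIONS_MM]

    def probs(self, state):
        x = list(state) + [1.0]  # append bias term
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return softmax(logits)

    def act(self, state, rng=random):
        # sample an irrigation amount from the probabilistic prescription
        return rng.choices(ACTIONS_MM, weights=self.probs(state), k=1)[0]
```

The probabilistic output matters for learning: sampling actions lets the agent explore alternatives to its current best guess during training.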
1 Introduction
Fresh water is becoming a scarce resource in many parts of the world, and its use in agriculture increasingly needs to be optimised. While there are a number of approaches to irrigation optimisation, irrigation scheduling using advanced sensor technologies has considerable potential to apply the right amount of water at the right time based on monitored plant, soil, and atmospheric conditions [1]. In operationalising precision irrigation, a significant challenge is to devise an intelligent decision rule that prescribes a sensible irrigation amount at each decision point based on inputs from a variety of crop and environmental measurements [2].
While precision irrigation, as a form of precision agriculture, holds the promise of increasing resource use efficiency by exploiting advanced technologies, it also faces a challenge: the technologies are too complicated to exploit fully in practice [3, 4]. For example, drip irrigation is a prototypical precision-irrigation practice, enabling precise control of irrigation rate and timing. For determining rates and timings, Abioye et al. [1] listed 18 basic parameters that can be readily monitored using available sensor devices: 10 crop parameters including leaf area index and sap flow, 4 soil parameters including soil moisture and salinity, and 4 weather parameters including temperature and rainfall. The question is, given the data stream generated by a variety of sensors, how to sensibly determine when, and by how much, to activate the drip irrigation system. In other words, the task is to devise an intelligent decision rule that sequentially prescribes irrigation amounts based on high-dimensional sensor feedback, so that the prescribed amounts collectively achieve overall production goals such as profit maximisation.
Reflecting the difficulty of the task, most studies address irrigation optimisation using only low-dimensional sensor feedback and often focus solely on irrigation timing without thorough consideration of irrigation amounts. For example, the vast majority of studies consider a single source of feedback at a time: soil moisture or soil water deficit [5]. In addition, irrigation rules often handle only timing and assume that each irrigation replenishes the soil to field capacity [5]. If soil moisture is the only feedback information and the production goal is yield maximisation, it is intuitive, and probably reasonable, to expect that replenishing soil water serves the aim. However, this assumption ignores the availability of the other sources of feedback listed above (e.g., LAI, salinity, and temperature). Moreover, real-world optimisation problems are rarely as simple as unconstrained yield maximisation. For instance, in regions with water restrictions, applying deficit irrigation to large cropping areas is often more profitable for the farm than fully irrigating only small areas [6, 7].
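The timing-only convention criticised here, irrigating back to field capacity whenever soil moisture drops below a threshold, takes only a few lines to express. The function and parameter names below are illustrative, with all quantities in millimetres of plant-available water:

```python
def refill_rule(soil_water_mm, field_capacity_mm, trigger_fraction=0.5):
    """Timing-only rule: whenever plant-available water falls below a
    trigger fraction of field capacity, irrigate the full deficit."""
    if soil_water_mm < trigger_fraction * field_capacity_mm:
        return field_capacity_mm - soil_water_mm  # replenish to capacity
    return 0.0  # otherwise apply no water
```

Note that the only decision the rule actually makes is *when* to irrigate; the amount is fixed by the refill assumption, which is precisely the limitation the paragraph above points out.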
A standard method for finding optimal decision rules in real-world systems is model predictive control (MPC), which has also been adopted for agricultural decision problems [8], including irrigation scheduling [9]. Crucially, MPC requires mathematical models of how systems evolve over time. While models of dynamics can be derived from Newtonian first principles in many physical systems, there are in general no such first principles for complex agricultural systems [10], due to the significant non-linearity in the responses and states that characterise these systems [11, 12]. Therefore, to apply MPC, the dynamics must be estimated from data (i.e., system identification), which is feasible only in low-dimensional cases. Indeed, “most of the existing works on system identification are based on the soil moisture equation without capturing the changing dynamics of soil, plant, and weather” [9, p.2]. This again ignores the availability of high-dimensional sensor feedback.
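To make concrete why MPC needs an explicit dynamics model, the sketch below runs a receding-horizon search over a toy one-dimensional soil-water balance: every candidate irrigation sequence is rolled forward through the model, and the first action of the cheapest sequence is applied. Everything here (the `soil_step` model, cost weights, candidate rates) is a hypothetical simplification, not an MPC formulation from the literature.

```python
import itertools

def soil_step(w, irrigation, et=5.0, capacity=100.0):
    """Toy soil-water balance: add irrigation, subtract daily crop water use."""
    return min(capacity, max(0.0, w + irrigation - et))

def mpc_action(w0, horizon=3, candidates=(0, 10, 20), target=80.0):
    """Receding-horizon control: evaluate every open-loop irrigation
    sequence over the horizon, return the first action of the best one."""
    best_action, best_cost = None, float("inf")
    for seq in itertools.product(candidates, repeat=horizon):
        w, cost = w0, 0.0
        for a in seq:
            w = soil_step(w, a)                 # requires the dynamics model
            cost += (w - target) ** 2 + 0.1 * a  # track target, penalise water
        if cost < best_cost:
            best_action, best_cost = seq[0], cost
    return best_action
```

The inner call to `soil_step` is the crux: without a trustworthy model of the dynamics, the whole lookahead is meaningless, which is exactly the limitation the paragraph above identifies for complex agricultural systems.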
Reinforcement learning (RL) is a subfield of machine learning that aims to discover intelligent decision rules without prior knowledge of system dynamics [13]. As a form of machine learning, RL relies on data that encode key information about decision-making experience in an environment of interest over time. Relevant data include both feedback from the decision environment (e.g., monitored crop, soil, and weather conditions) and actions taken by following decision rules (e.g., irrigation amounts). With high-dimensional feedback, intelligent decision rules are most likely complex functions that capture intricate relations between feedback and appropriate actions. A standard approach to learning such complex functions from data is to exploit the representational power of deep learning [14]. Hence, deep RL has emerged as a promising approach to discovering intelligent decision rules from high-dimensional feedback [15].
Despite this tremendous potential for irrigation scheduling, research on RL applications remains relatively sparse and mostly impractical. Until recently, only a handful of RL studies on irrigation scheduling existed in the literature [16, 17]. Although some researchers have begun, over the past few years, to apply deep RL to specific irrigation problems, their implementations involve several impractical components, making it difficult for other researchers to apply the methods to different problems. In RL, decision rules are discovered through trial and error in learning environments. Thus, when environments are created by simulation, discovered rules are useful for real-world deployment only to the extent that the simulated environments capture the key dynamics of the real-world systems.
For this reason, some studies [18, 19] have little practical relevance because they use learning environments whose dynamics are empirical models estimated directly from historical observations. Importantly, to “discover” good decision rules, RL agents need to stumble across unseen states, which requires the environment to extrapolate, a task at which empirical models are known to be poor. In contrast, Chen et al. [20] construct a problem-specific mechanistic model of system dynamics by combining several component processes. Although preferable for extrapolation and possibly adequate for simple environments, manually constructing high-fidelity models of complex agricultural systems, which may require thousands of component processes, is in general costly in time and other resources. Since distinct farms and crops necessitate distinct environments, using crop simulators is more scalable and practical, as it allows researchers to create pertinent environments, characterised by crop type, soil, weather, and management practices, for their unique problems.
While Yang et al. [21] use the AquaCrop simulator [22], their environment model has unrealistic features that undermine the practical relevance of the study. For example, they use a fixed weather pattern in every simulation, despite the fact that weather is the most important source of randomness in crop production. Moreover, they use a yield estimate supplied by the simulator at every time step throughout the season as a performance signal to facilitate learning. In reality, such estimates are unreliable, especially at early stages, and could severely mislead the learning. Practical RL methods avoid relying on unrealistic extra information and instead try to overcome the common challenge of sparse performance signals [23]. Kelly, Foster, and Schultz [24] also use AquaCrop and simulate a number of state variables that can be observed in practice. However, their learning procedure consists of ad hoc steps and configurations (e.g., restricting the decision interval to exactly one of 1, 3, 5 and 7 days, which may be too rigid to be practical). Consequently, the generality of their findings is questionable.
To accelerate research on deep RL for irrigation scheduling, this paper provides a principled framework and an actionable procedure that facilitate individual applications of deep RL using high-dimensional sensor feedback. The framework consists of a formal mathematical formulation of an optimisation problem (Section 2.1), a solution algorithm (Section 2.2), and a procedure for constructing learning environments and implementing the algorithm (Section 2.3). In describing the procedure, key aspects of specifying both learning environments and learning algorithms are emphasised. To demonstrate the effectiveness of the framework, a simulation study was conducted with the APSIM-Wheat crop model for irrigated wheat at Goondiwindi, Australia. A profit-maximising decision rule was learned in the simulated environment using 1981–2010 weather data and tested using 2011–2020 weather data. The resulting profit for each testing year was compared against the benchmark profit obtained using an irrigation schedule optimised specifically for that year (Sections 2.4 and 3). Finally, the discussion covers the case study, key assumptions, limitations, and future directions of the framework (Section 4).
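The protocol of training on 1981–2010 weather and testing year by year on 2011–2020 can be sketched as a seasonal simulation loop. The environment below is a toy stand-in (an illustrative daily water balance and profit signal, not APSIM), intended only to show the shape of the per-year evaluation; learning itself is elided.

```python
import random

def run_season(policy, weather_seed, days=120):
    """Roll one simulated growing season, returning a toy profit signal.
    The weather seed plays the role of a historical weather year."""
    rng = random.Random(weather_seed)
    soil, profit = 60.0, 0.0
    for _ in range(days):
        rain = rng.choice([0, 0, 0, 5, 15])       # toy daily rainfall (mm)
        irr = policy((soil,), rng)                 # the paper's state is 9-dim
        soil = max(0.0, min(100.0, soil + rain + irr - 5.0))  # toy balance
        profit += 0.02 * soil - 0.05 * irr         # toy revenue minus water cost
    return profit

def evaluate(policy, test_years):
    """Score a (trained) policy on each held-out year individually,
    mirroring the paper's year-by-year testing protocol."""
    return {year: run_season(policy, weather_seed=year) for year in test_years}
```

With a real crop simulator behind `run_season`, the same evaluation loop yields the per-year profits that the paper compares against its benchmark.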
---
[1] URL: https://journals.plos.org/water/article?id=10.1371/journal.pwat.0000169
Licence: Creative Commons Attribution 4.0 (CC BY 4.0).