
Document Type: Original Research Paper

Authors

Department of Geospatial Information Systems, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran

Abstract

Background and Objectives: Accurate crop yield prediction is among the most important tools for managing agricultural resources, improving food security, and enhancing the productivity of international trade in agricultural products. Satellite remote sensing imagery has become widely adopted because traditional methods cannot provide sufficiently accurate and timely predictions, whereas satellite imagery covers large areas and provides accurate data. Advances in machine learning and ensemble learning have made it possible to capture the complex interactions between environmental variables and crop yield. In recent years, ensemble learning models have achieved much higher prediction accuracy and provided useful insights to farmers and policymakers.
This study aims to develop an innovative model that combines the XGBoost algorithm with the Pelican Optimization Algorithm (POA) to predict corn yields more accurately in the U.S. Midwest. The approach enables pre-harvest yield prediction by taking into account plant phenological stages and an optimal prediction window from July to August. The model is intended to help decision-makers manage resources effectively in the face of climate fluctuations and develop better agricultural policies.
Methods and Materials: This research focuses on predicting corn yields in five key corn-producing states in the U.S. Midwest (Illinois, Iowa, Minnesota, North Dakota, and South Dakota). The study uses remote sensing data, including NDVI (Normalized Difference Vegetation Index), EVI (Enhanced Vegetation Index), LAI (Leaf Area Index), FPAR (Fraction of Photosynthetically Active Radiation), GPP (Gross Primary Production), and ET (Evapotranspiration); meteorological data, including temperature and precipitation; cropland data; and yield statistics for the growing season (May to September) over the period 2011–2020. An XGBoost ensemble learning model was used, with its hyperparameters optimized by the POA to improve accuracy. The data were filtered using the VFI index. Nine years were used as training data and one year as test data. Performance was evaluated using the mean absolute percentage error (MAPE), mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and the correlation coefficient.
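The evaluation setup described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `regression_metrics` and `leave_one_year_out` are hypothetical, the MBE sign convention (predicted minus observed) is an assumption, and the rotation of the held-out test year across 2011–2020 is inferred from the per-year results rather than stated explicitly. Model training and POA hyperparameter search are deliberately left out.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAPE (%), MBE, MAE, RMSE and Pearson r for paired yield values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")

    # Assumed sign convention: error = predicted - observed,
    # so a positive MBE means the model overestimates on average.
    errors = [p - o for o, p in zip(observed, predicted)]

    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation coefficient between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    return {"MAPE": mape, "MBE": mbe, "MAE": mae, "RMSE": rmse, "r": r}


def leave_one_year_out(years):
    """Yield (train_years, test_year) pairs, holding out one year at a time."""
    for test_year in years:
        train_years = [y for y in years if y != test_year]
        yield train_years, test_year


# Illustrative usage with made-up numbers (not data from the study).
splits = list(leave_one_year_out(list(range(2011, 2021))))
example = regression_metrics([100.0, 200.0], [110.0, 190.0])
```

Each split holds nine years for training and one for testing, matching the nine-to-one division described above. In a full pipeline, the metrics would be computed once per held-out year on the model's county- or state-level predictions.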
Findings: The POA-XGBoost model performed strongly in predicting corn yields. Over the 2011–2020 period, the validation results showed year-to-year variation in prediction accuracy and bias. In the first period (2011–2014), errors decreased and prediction accuracy improved: MAPE reached 6.26%, and in 2014 the correlation coefficient increased to 0.9372. During the middle period (2015–2018), errors and positive bias trended upward, especially in 2018, when MBE rose to 0.8039 and the correlation coefficient fell to 0.8083. The last two years (2019–2020), however, showed much improved results: MAPE was 6.57%, while the correlation coefficient was as high as 0.9237 in 2020.
Conclusion: The optimized POA-XGBoost model demonstrated high capability in predicting corn yields under diverse climatic conditions and can be extended to forecast yields of other crops in the future. Advanced ensemble learning techniques combined with diverse data sources, such as satellite imagery and meteorological data, provide effective solutions for improving crop yield predictions. The study calls for the development of new hybrid models that will enable farmers and managers to better manage resources, increase productivity, and minimize risks.
