Estimates of PM2.5 Concentration Based on Aerosol Optical Thickness Data Using Ensemble Learning with Support Vector Machine and Decision Tree


  • Satith Sangpradid, Department of Geoinformatics, Faculty of Informatics, Mahasarakham University, Thailand
  • Theeraya Uttha, Department of Geoinformatics, Faculty of Informatics, Mahasarakham University, Thailand
  • Ilada Aroonsri, Department of Business Digital, Faculty of Management Sciences, Valaya Alongkorn Rajabhat University Under the Royal Patronage, Thailand



Aerosol optical thickness, Decision tree, Machine learning, PM2.5, Support vector machine (SVM)


Air pollution, particularly fine particulate matter with a diameter of 2.5 micrometers or less (PM2.5), is a significant public health concern in many regions worldwide, including northeastern Thailand. This study investigates the correlation between PM2.5 concentrations in the region and spatial meteorological datasets, including surface relative humidity (SRH), surface wind speed (SPD), visibility (Vis), surface temperature (ST), and aerosol optical thickness (AOT). GIS techniques and inverse distance weighting were used to create spatial maps of the meteorological datasets and of the ground-station PM2.5 measurements. Pearson correlation analysis was performed to examine the relationship between PM2.5 and each dataset. Decision tree and support vector machine (SVM) algorithms were then employed to estimate PM2.5 concentrations from the spatial datasets. The results showed that Vis and ST have a moderate positive linear relationship with PM2.5, while AOT has a moderate negative linear relationship; SRH and SPD have weak relationships with PM2.5. For both the decision tree and the SVM, estimated PM2.5 concentrations showed a strong positive correlation with measured concentrations. The study shows that machine learning algorithms can be effective tools for estimating PM2.5 concentration from AOT data and that feature selection can improve model performance. Ensemble learning could further improve model performance, particularly in regions with high spatial variability. Overall, the study provides a promising approach for estimating PM2.5 concentration using machine learning algorithms and AOT data.
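
As an illustration of two steps named in the abstract, the following minimal Python sketch shows inverse distance weighting of station values and a Pearson correlation coefficient. It is not the authors' implementation: the function names `idw_interpolate` and `pearson_r`, the distance exponent of 2, the planar coordinates, and the station values are all assumptions made for this example.

```python
import math


def idw_interpolate(stations, target, power=2.0):
    """Estimate a value at `target` (x, y) from (x, y, value) stations
    using inverse distance weighting. `power` = 2 is an assumed default."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical station readings (x, y, PM2.5), not data from the study.
stations = [(0.0, 0.0, 30.0), (10.0, 0.0, 50.0), (0.0, 10.0, 40.0)]
estimate = idw_interpolate(stations, (5.0, 0.0))

# Hypothetical paired samples used only to exercise pearson_r.
r_positive = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_negative = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

In this sketch, nearer stations dominate the interpolated value because weights fall off with the square of distance; a different exponent or a search radius would change the smoothness of the resulting map.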