Sampling error profile analysis (SEPA) for model optimization and model evaluation in multivariate calibration

Authors

  • Wanchao Chen,

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
  • Yiping Du,

    Corresponding author
    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
    • Correspondence

      Yiping Du, Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai 200237, China.

      Email: yipingdu@ecust.edu.cn

  • Feiyu Zhang,

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
  • Ruoqiu Zhang,

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
  • Boyang Ding,

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
  • Zengkai Chen,

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China
  • Qin Xiong

    1. Shanghai Key Laboratory of Functional Materials Chemistry, School of Chemistry and Molecular Engineering, East China University of Science and Technology, Shanghai, China

Abstract

A novel method called sampling error profile analysis (SEPA), based on Monte Carlo sampling and error profile analysis, is proposed for outlier detection, cross validation, selection of pretreatment methods and wavelengths, and model evaluation in multivariate calibration. Monte Carlo sampling in SEPA produces a number of submodels, and the subsequent error profile analysis yields the median and the standard deviation of the root-mean-square error (RMSE) across those submodels. The median, coupled with the standard deviation, is an estimate of the RMSE that is more predictive and robust because it draws on representative submodels produced by Monte Carlo sampling, whereas the conventional approach uses only 1 model. The error profile analysis also calculates skewness and kurtosis as auxiliary criteria for judging the estimated RMSE, which is useful for model optimization and model evaluation. The proposed method is evaluated with 3 near-infrared datasets for wheat, corn, and tobacco. The results show that SEPA can diagnose outliers using more parameters, select more reasonable pretreatment methods and wavelength points, and evaluate the model more accurately and precisely. Compared with results reported in published papers, a better model could be obtained with SEPA in terms of RMSECV, RMSEC, and RMSEP estimated with an independent prediction set.
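The general idea of summarizing an RMSE distribution over Monte Carlo submodels can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sepa_error_profile`, the parameters `n_submodels` and `calib_fraction`, the 80/20 split, and the use of ordinary least squares in place of a multivariate calibration model such as PLS are all assumptions made for illustration.

```python
import numpy as np


def sepa_error_profile(X, y, n_submodels=200, calib_fraction=0.8, seed=0):
    """Summarize the RMSE distribution of Monte Carlo submodels (illustrative)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cal = int(round(calib_fraction * n))
    rmse = np.empty(n_submodels)

    for i in range(n_submodels):
        # Monte Carlo sampling: random split into calibration and validation parts
        perm = rng.permutation(n)
        cal, val = perm[:n_cal], perm[n_cal:]

        # Stand-in submodel (assumption): least squares with an intercept
        A_cal = np.column_stack([np.ones(cal.size), X[cal]])
        coef, *_ = np.linalg.lstsq(A_cal, y[cal], rcond=None)

        A_val = np.column_stack([np.ones(val.size), X[val]])
        residuals = y[val] - A_val @ coef
        rmse[i] = np.sqrt(np.mean(residuals ** 2))

    # Error profile: location, spread, and shape of the RMSE distribution
    centered = rmse - rmse.mean()
    m2 = np.mean(centered ** 2)
    return {
        "rmse": rmse,
        "median": float(np.median(rmse)),
        "std": float(rmse.std(ddof=1)),
        "skewness": float(np.mean(centered ** 3) / m2 ** 1.5),
        "kurtosis": float(np.mean(centered ** 4) / m2 ** 2 - 3.0),  # excess kurtosis
    }
```

Under these assumptions, calling `sepa_error_profile(X, y)` for each candidate pretreatment or wavelength subset would give a median RMSE and a spread to compare, with skewness and kurtosis available as auxiliary indicators of how well-behaved the error distribution is.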
