Introduction

Polymers are highly important organic materials in industry. Owing to their excellent properties, they are used in many products, e.g., construction and automotive materials [1,2]. It is reported in [3] that worldwide polymer consumption was as high as 20.78 million tons in 2005 and that this number increased to 26.73 million tons in 2015.

Polyisoprene (a polymer of isoprene, (C5H8)n) is one of the most important primary chemical constituents of natural rubber. Its excellent properties include resilience, elasticity, abrasion resistance, efficient heat dispersion, and impact resistance [4,5]. These properties cannot be easily obtained from synthetic polymers. Moreover, the properties of rubber can be further enhanced with fillers; carbon black is one of the most popular fillers because of its ability to enhance certain properties, especially mechanical properties such as elasticity and volume [6–8]. The reinforcement of rubber by carbon black has been studied extensively. Carbon black physically adsorbs rubber molecules onto its surface [9,10] or occludes them in internal voids [11], which results in partial immobilization of the rubber and an apparent increase in the filler volume. Carbon black also forms an agglomerated interparticle structure [9–12], which may be associated with specific elastic properties and with continuous breakup and rearrangement, finally leading to strongly nonlinear viscoelastic behavior [10–12].

One of the important issues in polymer production is quality measurement and monitoring. Such processes usually require substantial resources in terms of both time and cost, e.g., chemical agents, labor, and samples. Many attempts have been made to address these issues in past decades [13,14]. Near-infrared (NIR) spectroscopy is a non-destructive technique that can provide detailed analysis of the quantity and quality of agricultural products [15].
Typically, the NIR reflectance information in the spectra of an agricultural product sample is used to predict the chemical composition of the sample by extracting the relevant information from many overlapping peaks. The predicted chemical composition can then be interpreted as the quality. Before NIR quality measurement can be applied, the measurement system has to be calibrated to give accurate results. In general, calibration can be difficult because of the complex nature of NIR spectra, in which each band of interest is almost completely overlapped by others. The calibrated models require routine checking to improve the accuracy and reduce the estimation error [16].

In this paper, we present extensive experimental results to show the performance of prediction models built from NIR spectroscopy for the mechanical properties of vulcanized rubbers. Our main contribution is a guideline for creating prediction models for such materials. The guideline is tailored to the prediction technique to be applied and to the data pretreatment methods, which strongly affect the quality measurement, in a particular scenario.

NIR spectroscopy is a non-destructive technique that provides detailed analysis of the quantity and quality of agricultural products. Specifically, NIR light covers the region from 4,000 to 12,500 cm⁻¹ (800-2500 nm). The C-H, O-H, C-O, and N-H bands in the samples can be observed because these vibrations are excited in this spectral range [15]. The NIR spectroscopy technique has several attractive features, including a short analysis time, ease of operation, and the availability of a diffuse reflectance mode. Generally, a multivariate calibration model, such as a partial least squares (PLS) regression model, is built to extract information from NIR spectra [17].
Specifically, the models are developed from the relationship between the spectral data and the constituents of interest.

Kwolek et al. [18] were among the first groups to evaluate the properties of the resin and the rubber concentration in guayule by NIR spectroscopy. In addition, NIR spectroscopy has been used to study the composition of synthetic polymers and rubbers [19–21]. Takeno et al. [22] proposed a Fourier transform NIR (FT-NIR) spectroscopy technique coupled with a PLS regression model to quantify natural polyisoprene in Eucommia ulmoides leaves. It was reported that the optimal models were obtained with second-derivative NIR spectra in the region between 4000 and 6000 cm⁻¹ (R² = 0.95). Marinho et al. [21] studied the application of NIR spectroscopy to the analysis of natural trans- and cis-polyisoprenes from Ficus elastica (cis-1,4-polyisoprene), gutta-percha (trans-1,4-polyisoprene), and mixtures of these polymers. Sirisomboon et al. [23] used FT-NIR spectroscopy in the wavelength range of 1100-2500 nm to evaluate the dry rubber content of rubber latex. Sirisomboon et al. [24] also used short-wave NIR spectroscopy in the wavelength range of 700-950 nm to evaluate the dry rubber content and the total solids content. Their work can be applied in concentrated latex factory settings.

In NIR analytical processes, the observed spectra are usually pretreated first, and this pretreatment is one of the most important steps for successful analysis [25]. Data pretreatment usually refers to a transformation of the NIR spectra with the goal of reducing large baseline variations, dimensionality, collinearity, and/or the noise level of the observed spectra. To remove undesirable variations in the data, two types of pretreatment are commonly applied in the analytical chemistry literature, i.e.,
differentiation and signal correction. Generally, analysts combine more than one pretreatment technique in order to create precise spectroscopy models. The common approaches include Savitzky-Golay smoothing (SG) [26], multiplicative scatter correction (MSC) [27], orthogonal signal correction [28], and variable selection [29,30]. These approaches are elaborated below.

Among the most important basic data pretreatments are the data smoothing and differentiation filters proposed in [26,31]. SG, a well-known smoothing and differentiation pretreatment, fits a polynomial to successive sets of data points in the least-squares sense, which improves the signal-to-noise ratio [26]. For differentiation, the technique determines the rate of change of absorbance with respect to wavelength, i.e., the slope of the curve at each point. A basic method of differentiation is finite differences: the first-order derivative is estimated as the difference between two subsequent spectral measurement points, and the second-order derivative is then estimated as the difference between two successive points of the first-order derivative spectrum.

Multiplicative scatter correction (MSC) [27,32] is another widely applied pre-processing technique for NIR spectroscopy. Applying MSC can reduce optical interference from the equipment, i.e., spectral noise and background noise in NIR data. MSC consists of two main steps: estimating the correction coefficients and correcting the recorded spectrum. In the work by Martens et al. [32], MSC was applied to overcome optical interference. The observed reflectance was corrected by coefficients based on different linearizations before the prediction models were built.

Orthogonal signal correction (OSC) is another approach; it reduces the variation in the observed spectra that is not correlated with the reference. The approach identifies the spectral data that are strongly related to the reference and subsequently removes the unrelated data. The unrelated part can be distinguished as the component of the data that is orthogonal to the reference in the high-dimensional space.
Wold et al. [33] proposed applying OSC in the NIR spectroscopy pretreatment process. Sjöblom et al. further applied the technique to reduce variation when calibration models are transferred in practice [28]. Marklund et al. [34] applied OSC to improve the correlations and predictive quality of PLS models. They obtained high-quality correlations between the NIR spectral properties of pulp and the strength properties of paper derived from the pulps. By applying OSC, the correlations from the spectroscopy results can be traced back to the quality of the wood from which the pulps were produced.

Finally, the selection of wavelengths from the whole spectrum is an important issue, since uninformative wavelengths can complicate the NIR analysis and reduce the predictive capability of the model. Uninformative variable elimination (UVE) [35] is one of the most important selection methods. It is based on PLS regression coefficients and has been widely applied [36–38]. The method first determines the root mean square error of prediction; the PLS models for the individual spectral variables are then collected in a matrix. Subsequently, variables that do not improve the predictive ability are eliminated from the matrix. Finally, a new prediction model is built from the remaining variables. The process is repeated until no further improvement can be achieved. UVE therefore avoids configuration problems in variable selection.
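
Two of the pretreatments described earlier in this section, finite-difference derivatives and MSC, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the use of the mean spectrum as the MSC reference is an assumption made for illustration (a common choice, but not one stated in the works cited above). The sketch does not implement Savitzky-Golay filtering; it only shows the plain finite-difference scheme and the two MSC steps (estimating the correction coefficients, then correcting each recorded spectrum).

```python
from statistics import mean


def finite_difference(spectrum):
    """First-order derivative estimate: difference between subsequent points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_derivative(spectrum):
    """Second-order estimate: finite differences of the first-order spectrum."""
    return finite_difference(finite_difference(spectrum))


def msc(spectra):
    """Multiplicative scatter correction of a list of equal-length spectra.

    Assumption for this sketch: the reference is the mean spectrum.
    Each spectrum s is modelled as s = intercept + slope * reference,
    and the corrected spectrum is (s - intercept) / slope.
    """
    n_points = len(spectra[0])
    reference = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Step 1: estimate the correction coefficients by least squares.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Step 2: correct the recorded spectrum.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected, reference


if __name__ == "__main__":
    # Synthetic spectra (made up for illustration): one base shape with
    # different multiplicative and additive scatter effects.
    base = [1.0, 2.0, 4.0, 3.0, 1.5]
    spectra = [
        base,
        [2.0 * x + 0.5 for x in base],
        [0.5 * x - 0.2 for x in base],
    ]
    corrected, reference = msc(spectra)
    for row in corrected:
        print([round(v, 4) for v in row])
    print(finite_difference(base))
    print(second_derivative(base))
```

In this synthetic example, every spectrum is an exact scaled and shifted copy of the same shape, so MSC maps all of them onto the reference spectrum. Real NIR spectra also contain chemical variation, which MSC is intended to leave in place while removing scatter effects.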