The quantitative detection of botanical trashes contained in seed cotton with near infrared spectroscopy method

This study investigates the potential of near infrared (NIR) spectroscopy for detecting the botanical trash content of seed cotton harvested by cotton picker (SCHCP). A large quantity of trash becomes commingled with cotton fiber during harvesting, especially when the cotton is harvested with a cotton picker. In China, the trash content of seed cotton (SC) has to be determined when farmers sell SC to ginneries, because trash reduces the price of SC and its weight must be deducted from the total weight. The conventional instrumental method for determining the trash content of SC, ginning followed by trash analysis, is complex and time consuming. In this study, 353 SC samples were collected from three ginneries, and their NIR spectra from 12,000 to 4000 cm−1 were acquired with a Nexus FT-NIR spectrometer. Models relating the NIR spectra to the trash contents of these SC samples were developed by partial least squares regression (PLSR) over the band 12,000–4000 cm−1. Multiplicative signal correction (MSC) was used to eliminate the negative effects caused by sample shape, and second derivative spectra were used to eliminate the translation and rotation of the spectral baseline. For the optimized model, R2 reaches 0.985 (calibration set) and 0.973 (prediction set), RMSEC is as low as 0.072 g, and RMSEP is 0.158 g. The ANOVA results also confirm that the trash contents calculated with the models are consistent with the actual trash contents.


Introduction
Cotton is an essential natural fiber, accounting for approximately 27% of all fibers. China is one of the world's largest cotton-producing countries. [1] In recent years, machine-picked cotton has been rapidly promoted, and about 80% of cotton is now harvested mechanically. The harvested seed cotton (SC) is ginned to separate the lint from the cotton seeds and from various types of impurities, such as leaf, bark, stem, seed coat, paper, and plastic bags. The ginning procedure consists of a series of sequential steps, including several SC cleaning processes, ginning, several lint cotton (LC) cleaning processes, and packaging of the LC into bales. If the impurity cleaning efficiency is too low, the high impurity content of the LC will adversely affect spinning. According to a survey by the International Textile Manufacturers Federation (ITMF), 26% of the cotton processed by spinning mills was found to be moderately or severely contaminated with impurities. However, excessive cleaning significantly degrades fiber quality, [2,3] for example the length, the length uniformity, and the strength, which not only decreases the monetary value of the cotton fiber but also reduces the overall quality of yarns and cotton textiles. Therefore, detecting the impurity content is very important in the cotton industry.
The commonly used instrumental detection methods for cotton impurities include gravimetric methods, geometric methods, and spectroscopy methods. The gravimetric methods are mainly used to separate impurities from fiber. For example, the saw impurity analyzer, which is the standard method for measuring the impurity content of LC in China, separates botanical impurities from cotton fiber by exploiting their differences in density and volume.
The geometric methods are mainly used to detect the impurity content. For example, Lieberman et al. [4-7] used neural networks and learning vector quantization methods to detect impurities in LC, and impurities in large cotton lumps have also been detected. Li et al. [8] proposed a method that extracts feature vectors with the Gabor operator and separates white foreign fibers in the binary image composed from those feature vectors. Zhang et al. [9] proposed a machine vision method for identifying impurities in SCHCP using a support vector machine classifier optimized with a genetic algorithm.
The spectroscopy methods are also mainly used to detect the impurity content. For example, Fortier et al. [10,11] acquired the FT-NIR spectral characteristics of the hull, leaf, seed, and stem of cotton and identified the cotton impurity components with an NIR spectral database; the identification accuracy was as high as 98% when the spectrum of an impurity from a new sample was compared with the reference spectral library. Rodgers et al. [12] monitored the micronaire of cotton fiber with a portable NIR instrument. Liu et al. classified cotton samples into different levels using models between the cotton samples and their spectra (220-2200 nm); the results indicated that a model built on the 1105-1700 nm band could reach an acceptable separation. Liu et al. [13-15] found large differences between the spectra (1200-900 cm−1) of mature and immature cotton fiber. Allen et al. [16] used an FT-IR spectral database to classify cotton samples into different impurity levels. Gamble and Foulk [17] built partial least squares (PLS) models of six botanical trash types using fluorescence spectroscopy, and the models for leaves and hulls were capable of predicting the individual trash component with a high degree of confidence. Gaitán-Jurado et al. [18] determined the moisture content and impurity level of SC using NIR spectra. For moisture, the best model was obtained using the PLS regression method, the first derivative, the drying method, and standard normal variate (SNV) and detrending as the pretreatment methods.
The geometric methods and the spectroscopy methods are often combined to detect cotton impurities and other features of cotton fiber. For example, Zhang et al. [19] inspected foreign matter on the surface of LC using liquid crystal tunable filter hyperspectral imaging with a spectral range of 900 to 1700 nm. Mustafic et al. [20,21] found that a fluorescent imaging apparatus with blue and UV excitation light sources could be a promising method for cotton foreign matter detection. Jia and Ding [22] transformed the differences between the NIR absorption characteristics of cotton fibers and foreign fibers into image features, and then selected an image segmentation algorithm to extract foreign fiber objects from the cotton background. The high volume instrument (HVI), which incorporates both spectroscopy and imaging, can measure multiple indexes of LC, such as the color grade, length, strength, and fineness of cotton fiber. HVI can also measure the impurity content of LC by counting the number of impurities and the percentage of the total surface area that they cover. [23]
In summary, there is a large body of research on detecting the impurities of LC, while there is almost no research on detecting the impurities of SC. SC differs from LC in two respects. First, the impurity content of SC is much larger than that of LC; in particular, a large quantity of cotton plant stems, hulls, and leaves becomes commingled with cotton fiber in SC harvested with a cotton picker (SCHCP). Second, SC contains cotton seeds while LC does not. In this research, NIR spectroscopy is investigated for detecting the impurity content (mainly stems, hulls, and leaves of the cotton plant) of SCHCP. Partial least squares (PLS) models between the diffuse reflection NIR spectra and the impurity content of SCHCP are established.

Materials and methods
The whole experiment mainly comprises four steps: sampling and sample preparation, NIR spectra acquisition, separation of impurities from cotton, and mathematical modeling, as shown in Figure 1.

Sampling and preparation
It is well known that different geographical locations have different climatic conditions, so the cotton maturity period also differs, which in turn leads to different impurity rates of cotton. To ensure representativeness, three ginneries were selected from the three most representative producing areas in Xinjiang, China's main cotton-producing region. In each ginnery, two cotton modules with dimensions of 10 m × 2 m × 2.5 m and a weight of 10 tons were randomly selected. In each module, 60 samples were collected from the module's two longest sides. On each side, 30 samples were collected along three roughly equidistant lines. Along each line, 10 samples were collected at roughly equal intervals from not less than 10 cm beneath the surface of the module. The weight of each sample ranged from 100 to 150 g. In total, 360 samples were collected from the three ginneries, with 120 samples per ginnery. However, seven sample bags were broken during transport. Therefore, 353 samples were actually used for modeling and analysis.
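The sample counts above follow directly from the sampling design. The short sketch below simply reproduces that arithmetic; the constant names are illustrative and do not come from the original study.

```python
# Sampling design as described in the text; names are illustrative only.
GINNERIES = 3
MODULES_PER_GINNERY = 2
SIDES_PER_MODULE = 2
LINES_PER_SIDE = 3
SAMPLES_PER_LINE = 10
BROKEN_BAGS = 7  # bags lost during transport

samples_per_module = SIDES_PER_MODULE * LINES_PER_SIDE * SAMPLES_PER_LINE
samples_per_ginnery = MODULES_PER_GINNERY * samples_per_module
samples_collected = GINNERIES * samples_per_ginnery
samples_used = samples_collected - BROKEN_BAGS

print(samples_per_module, samples_per_ginnery, samples_collected, samples_used)
# prints: 60 120 360 353
```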
Before the NIR spectral data were acquired, 20 g of SC was separated from each sample and kept in the laboratory at a constant temperature of (20 ± 1)°C and a relative humidity of (65 ± 2)% for more than 24 h. Then, all of the samples were sealed in PE self-sealing bags, as shown in Figure 2.

NIR spectral acquisition
The NIR spectra were acquired with a Nexus FT-NIR spectrometer (Thermo Electron Corp., Madison, Wisc., USA) equipped with a smart diffuse reflectance accessory and an InGaAs detector covering the range 12,000-4000 cm−1; the light source was a built-in 50 W quartz halogen lamp. The background was the NIR spectrum of a Teflon plate. Before spectral acquisition, all samples were kept at 20 ± 1°C and 65 ± 2% RH for more than 24 h. The spectra were acquired at a resolution of 8 cm−1 with 32 scans over the range 12,000-4000 cm−1.
The sample cell is a cylinder (Figure 3(a)) made of a special metal material, with an internal diameter of 10.16 cm and a height of 6.35 cm. One end of the cylinder is sealed with low-OH quartz glass, which hardly absorbs near infrared light, and the other end is open for loading samples. In this research, the average of five spectra acquired at five different positions (marked 1-5 in Figure 3(a)) was used to represent the corresponding sample, because the spot of the spectrometer is too small relative to the large surface of the sample for a spectrum from a single position to characterize the sample. A gold-coated compression tool was used to press the samples to a fixed height and density (Figure 3(b)). This ensures that the density and height are consistent across samples, and it also prevents stray light.
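The five-position averaging described above can be sketched as a point-by-point mean. This is a minimal illustration in plain Python; the function name and the toy absorbance values are assumptions made for the example, not data from the study.

```python
from typing import List, Sequence


def average_spectrum(position_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Return the point-by-point mean of several spectra of one sample."""
    if not position_spectra:
        raise ValueError("at least one spectrum is required")
    n_points = len(position_spectra[0])
    if any(len(s) != n_points for s in position_spectra):
        raise ValueError("all spectra must share the same wavenumber grid")
    n_spectra = len(position_spectra)
    return [sum(s[i] for s in position_spectra) / n_spectra
            for i in range(n_points)]


# Made-up absorbance values: five positions, three wavenumber points each.
five_positions = [
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.28],
    [0.08, 0.18, 0.32],
    [0.11, 0.21, 0.29],
    [0.09, 0.19, 0.31],
]
sample_spectrum = average_spectrum(five_positions)
print(sample_spectrum)
```

Averaging over positions trades spatial detail for representativeness, which suits a bulk property such as total trash content.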

Trash detection
In this study, trash mainly refers to the stems, hulls, and leaves of cotton and grass (Figure 4), because they account for most of the botanical trash in SCHCP. As shown in Figure 1 (P3), the trash in SCHCP was separated in the following three steps: (1) In process P3.1, most of the large trash, mainly cotton stems and hulls, was picked out manually. (2) In process P3.2, the SCHCP was ginned with a small roller ginning machine SY20 (roller size 120 mm × 205 mm, rotational speed 88 rpm, produced by River machinery plant, Xinxiang, Henan, China), which is specially designed to gin small quantities of SC. In this process, the remaining cotton hulls and part of the cotton plant leaves were separated from the cotton fiber. (3) In process P3.3, the remaining leaves were separated from the cotton fiber with a cotton trash analyzer YG041 (roller size 57.15 mm × 490 mm, roller rotational speed 0.9 rpm, produced by Changzhou No.1 Textile Equipment Co., Ltd., Changzhou, Jiangsu, China), which is also specially designed for small quantities of LC, according to the test method for percentage of trash content in raw cotton (GB/T 6499).
In fact, although most of the trash can be separated from the cotton fiber in the above processes, the LC still contains a small amount of trash that is difficult to separate. In general, the trash content of cotton samples after trash analysis is deemed to be 0, so this residual trash is ignored in this research.

Tools and data analysis
Origin 2017 (OriginLab Corporation, Northampton, MA 01060, USA), Omnic 8.0, and TQ Analyst 8.0 (Thermo Electron Corp., Madison, Wisc., USA) were used for spectral pretreatment and modeling. Origin 2017 and Omnic 8.0 provide commonly used spectral preprocessing and spectral feature extraction methods, such as calculation of the average spectrum, first and second derivative spectra, spectral smoothing, and peak fitting. TQ Analyst 8.0 is specialized software for infrared spectral modeling, which provides commonly used spectral preprocessing methods (multiplicative signal correction, baseline correction, spectral smoothing, data format transformation) and modeling methods (e.g. distance match, discriminant analysis, simple Beer's law, principal component regression, and PLS).
Models were formulated to relate the FT-NIR spectra to the trash content of each sample. The prediction ability of the models is expressed as the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP). Analysis of variance (ANOVA) was used to determine whether there were significant differences between the predicted trash contents and the tested trash contents.

In the ANOVA used in this research, the number of data sets is k = 2, N is the number of samples in each group, x̄_i is the mean of group i, and x̄_t is the mean of all data (in this research, x̄_t is the mean of the predicted trash contents and the tested trash contents taken together).
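As an illustration of the quantities named above, the following sketch computes a root mean square error and a one-way ANOVA F statistic in plain Python. It uses the common textbook definitions, which is an assumption: the RMSE divides by the number of samples (some software divides by the degrees of freedom instead), and the formulas actually used by TQ Analyst may differ. RMSEC and RMSEP are the same calculation applied to the calibration set and the prediction set, respectively. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error; divides by n (an assumed convention)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA over k groups (k = 2 in the text)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy numbers only: tested versus predicted trash weights in grams.
tested = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(rmse(tested, predicted))               # 1.0
print(one_way_anova_f([tested, predicted]))  # 1.5
```

A small F value, compared with the critical value for (k − 1, N_total − k) degrees of freedom, indicates no significant difference between the two groups.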

NIR spectra
In section 2.2, multiple acquisition points were selected for each sample and multiple spectra (32 scans) were acquired at each acquisition point in order to improve the representativeness and the signal-to-noise ratio (SNR). In addition, several other pretreatment methods were used to improve the quality of the spectra before modeling. The baseline was corrected with the least squares baseline correction algorithm, the spectra were then smoothed with the Savitzky-Golay method (window size 5 points, polynomial order 2), and finally the spectra were normalized by dividing by the maximum ordinate value (Figure 5).
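The smoothing and normalization steps can be sketched as follows. The example uses the standard 5-point, second-order Savitzky-Golay convolution weights (−3, 12, 17, 12, −3)/35. Two simplifications are assumptions of this sketch rather than details from the study: the two points at each end are left unsmoothed, and the least squares baseline correction is omitted. The function names are illustrative.

```python
from typing import List, Sequence

# Standard Savitzky-Golay smoothing weights: window 5, polynomial order 2.
SG_WEIGHTS_5_2 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol_smooth_5pt(y: Sequence[float]) -> List[float]:
    """Smooth interior points; the two edge points on each side are kept as-is."""
    if len(y) < 5:
        raise ValueError("need at least 5 points")
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + j - 2] for j, w in enumerate(SG_WEIGHTS_5_2))
    return out


def normalize_by_max(y: Sequence[float]) -> List[float]:
    """Divide every ordinate by the maximum ordinate value."""
    peak = max(y)
    if peak == 0:
        raise ValueError("maximum ordinate is zero; cannot normalize")
    return [v / peak for v in y]


# Made-up noisy absorbance values.
raw = [0.20, 0.26, 0.24, 0.35, 0.41, 0.38, 0.50, 0.47]
processed = normalize_by_max(savgol_smooth_5pt(raw))
print(processed)
```

Because the filter fits a second-order polynomial in each window, it leaves any locally quadratic signal unchanged while attenuating point-to-point noise.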
The average spectra of the samples from the different ginneries over the entire spectral range (12,000-4000 cm−1) are compared in Figure 6. There are obvious differences in the raw spectra (Figure 6(a)), but the differences become smaller after baseline correction and smoothing (Figure 6(b) and (c)), and they almost disappear after normalization.
Two bands representing cotton botanical trash can also be observed: one at about 2050-2100 nm (4878-4762 cm−1), assigned to an OH bend and CO stretch combination, and one at about 2200-2270 nm (4545-4405 cm−1), assigned to an OH and CO stretch combination or a CH stretch and CH2 deformation. Despite this, there were considerable overlaps among the FT-NIR spectra of cotton fiber and the various kinds of botanical trash (stems, leaves, hulls, seed coat, seed meat, etc.). Thus the components of cotton fiber and trash could not be uniquely identified from the original FT-NIR spectra.
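The paired wavelength and wavenumber values quoted above are related by wavenumber (cm−1) = 10^7 / wavelength (nm). A small sketch, with an illustrative function name, reproduces the band limits given in the text:

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


for nm in (2050, 2100, 2200, 2270):
    print(nm, "nm ->", round(nm_to_wavenumber(nm)), "cm^-1")
# 2050 nm -> 4878 cm^-1
# 2100 nm -> 4762 cm^-1
# 2200 nm -> 4545 cm^-1
# 2270 nm -> 4405 cm^-1
```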

Trash content
The main statistical characteristics of the trash weight and the trash content are shown in Figure 7.
The weight of the trash separated in P3.3 (Figure 7(a)) is a little larger than the weight of the trash separated in P3.1 and P3.2 (Figure 7(c)). Moreover, the trash separated in P3.3 is distributed more uniformly than the trash separated in P3.1 and P3.2, because the trash separated in P3.3 is mainly cotton leaves, while the trash separated in P3.1 and P3.2 contains not only leaves but also cotton plant stems and hulls, which are larger than leaves and far fewer in number. This uneven distribution of trash might lead to poor models.
The detailed statistical parameters of Figure 7 are shown in Table 1. The contents of trashes (shown in Figure 7

Models between FT-NIR spectra and trashes content
The quantitative prediction models were established using the software TQ Analyst 8.0. Multiplicative signal correction (MSC) was used to eliminate the negative effects caused by differences in sample shape and in the granularity of fiber and trash. The PLS algorithm was used to build the relationship between the NIR spectra and the trash content, because PLS is known to usually perform better than other methods in cotton analysis: the errors caused by sample tightness and thickness can be greatly reduced in the transformation of the data matrix composed of both dependent and independent variables. To avoid overfitting, each group of SCHCP samples was divided into five subgroups for internal cross validation. One subgroup was used as the prediction set and the calibration model was built on the remaining four subgroups, and this procedure was repeated until each subgroup had been used as the prediction set. Models based on three spectral types (original spectrum, first derivative, and second derivative) over the range 12,000-4000 cm−1 were established and compared (Table 2).
It is evident that the second derivative is the most suitable for establishing models between the NIR spectra of SCHCP samples and their trash contents (before optimization). The most important reasons are the nonuniform translation of the baseline caused by differences in sample tightness and the significant nonuniform rotation caused by differences in sample height. Although all of the samples were pressed with the same compression tool, the variations in trash content, the inconsistent traits of the cotton seeds, and similar factors result in differences in sample tightness. These variations in tightness in turn result in different sample heights, which cause not only nonuniform translation of the baseline but also nonuniform rotation. It is known that the first derivative and the second derivative can reduce the effect of spectral translation and rotation to a certain extent. [24] Therefore, the models between the NIR spectra of SCHCP and their trash contents should be established using the second derivative. Figure 8 shows the relationship between the actual trash contents and the calculated trash contents of the SCHCP samples.
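The preprocessing ideas in this section can be illustrated with a small plain-Python sketch. It shows (a) MSC, which regresses each spectrum on the mean spectrum and removes the fitted offset and slope, (b) a second difference, used here as a simple stand-in for a second derivative, which cancels any baseline that is linear in the spectral index (translation plus rotation), and (c) a five-fold split. All function names are hypothetical, the interleaved fold assignment is an assumption (the text does not say how the subgroups were formed), and the algorithms in TQ Analyst may differ in detail.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def msc(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Multiplicative signal correction against the mean spectrum."""
    n = len(spectra)
    ref = [sum(s[i] for s in spectra) / n for i in range(len(spectra[0]))]
    corrected = []
    for s in spectra:
        a, b = fit_line(ref, s)  # s is modeled as a + b * ref
        corrected.append([(v - a) / b for v in s])
    return corrected


def second_difference(y: Sequence[float]) -> List[float]:
    """Discrete second derivative; zero for any straight-line baseline."""
    return [y[i - 1] - 2 * y[i] + y[i + 1] for i in range(1, len(y) - 1)]


def k_fold_splits(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (training, held_out) index lists; folds are interleaved."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    splits = []
    for fold in range(k):
        held_out = [i for i in range(n_samples) if i % k == fold]
        training = [i for i in range(n_samples) if i % k != fold]
        splits.append((training, held_out))
    return splits


# Toy spectrum with an added offset (translation) and tilt (rotation).
signal = [0.0, 1.0, 4.0, 2.0, 5.0, 3.0]
tilted = [v + 0.5 + 0.1 * i for i, v in enumerate(signal)]
print(second_difference(signal))
print(second_difference(tilted))  # same values up to rounding
```

In the toy example the two printed lists agree up to floating point rounding, which is the effect the text attributes to derivative spectra: baseline translation and rotation drop out while the spectral features remain.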
Although the results for the calibration set are quite good, the results for the prediction set are not as good. It is known that abnormal samples (with abnormal physical or chemical properties, or abnormal spectra) might cause a significant decrease in model quality. In section 3.2, the samples with abnormal trash content were identified. Here, the Mahalanobis distance, the Chauvenet test, and the leverage were used to identify the samples with abnormal FT-NIR spectra. In addition, samples in the prediction set were deemed abnormal if their trash contents fell outside the trash content range covered by the calibration set. In total, seven abnormal samples in the Aler group, seven abnormal samples in the Kuitun group, and four abnormal samples in the Shihezi group were excluded from the models. The parameters of the optimized models are shown in Table 3.
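One of the screening tools named above, Chauvenet's criterion, can be sketched as follows. In its usual form it flags a value when the expected number of equally extreme observations in a sample of size n, under a normal distribution, is below 0.5. The text does not state which per-sample quantity the test was applied to, so the example below uses made-up scalar scores (for instance, a spectral distance per sample) purely as an assumption; the function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def chauvenet_outliers(values: Sequence[float]) -> List[int]:
    """Return indices of values rejected by Chauvenet's criterion."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []
    flagged = []
    for i, v in enumerate(values):
        z = abs(v - mean) / sd
        # Two-sided tail probability of a standard normal beyond |z|.
        p_two_sided = math.erfc(z / math.sqrt(2))
        if n * p_two_sided < 0.5:
            flagged.append(i)
    return flagged


# Made-up per-sample scores; the last one is clearly atypical.
scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.01, 5.0]
print(chauvenet_outliers(scores))  # [9]
```

The criterion is usually applied once rather than iteratively, since repeated application can keep removing valid samples.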
The relationship between the calculated trash contents and the actual trash contents after the abnormal samples were excluded is shown in Figure 9. The models improved significantly: the average RMSEC decreased to 0.072 g and the average RMSEP decreased to 0.158 g.
ANOVA was used to test whether there are significant differences between the actual trash contents and the calculated trash contents. The results are shown in Table 4, from which it is evident that the trash contents calculated with the models are consistent with the actual trash contents.

Conclusions
This study investigated the potential of FT-NIR spectroscopy for detecting the botanical trash content of SCHCP samples. PLS models were built with the original FT-NIR spectra, the first derivative, and the second derivative over the band 12,000-4000 cm−1. The results indicate that the second derivative is the most suitable for establishing NIR models to predict the botanical trash content of SCHCP samples. For the optimized model, the correlation coefficient is as high as 0.985 (calibration set) and 0.973 (prediction set), the RMSEC is 0.072 g, and the RMSEP is 0.158 g. The ANOVA results also confirm that the trash contents calculated with the models are consistent with the actual trash contents. Future studies will include more calibration samples, and an NIR knowledge-based expert system (NIRKBES) for detecting the trash content of SC will be developed.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We are grateful for the financial support provided by the National Natural Science Foundation of China (No. 31601224), the Natural Science Foundation of Anhui Provincial Department of Education (Nos. KJ2019A0650 and KJ2020ZD004), and the key research and development plan of Anhui province (No. 202104a06020014).