Application of mid-infrared spectroscopy for the quantitative and qualitative analysis of organic matter in Holocene sediment records

The organic matter composition of lake sediments influences important in-lake biogeochemical processes and stores information on environmental changes. Extracting this information is notoriously difficult because of the complexity of the organic matter matrix, which routinely forces a trade-off between high temporal resolution and high analytical detail when selecting methods of analysis. Here, we demonstrate the potential of diffuse reflectance Fourier transform infrared spectroscopy (DRIFTS) for achieving both of these objectives using untreated bulk samples from two Holocene lake-sediment cores from central Sweden. We develop quantitative models for sediment total organic carbon (TOC) with the same predictive abilities as models based on samples diluted with KBr, and we qualitatively characterize the organic matter using a spectral processing pipeline combined with principal component analysis. In the qualitative analysis we identified four organic matter sub-fractions, and their interpretation is supported and further advanced by molecular data from pyrolysis-gas chromatography/mass spectrometry (Py-GC/MS). Within these organic fractions, compound groups such as aromatics, lignin, aliphatics, proteins and polysaccharides were identified by means of DRIFTS, and the analyses and processes outlined here enable rapid and detailed quantitative and qualitative analysis of sediment organic matter (OM). The DRIFTS approach can be used as a stand-alone method for OM characterization with high temporal resolution in Holocene sediment records. It may also function as a screening tool for more specific analyses of sample subsets, for example when coupled with Py-GC/MS to further tease apart the OM composition, identify sources and determine degradation status.


Introduction
In the analysis of lake sediments, as well as other environmental archives spanning hundreds to thousands of years or longer, we are often confronted with the fundamental challenge of balancing the practical costs of high sampling (i.e. temporal) resolution against the costs of high analytical detail. The first, high sampling resolution, is essential to determine the timing and rates of sedimentary changes; the second, high analytical detail, is essential for developing specific insights into compositional changes and biogeochemical processes. The balance between the two is dictated by the study questions.
Few analytical techniques are able to provide both high sampling resolution and high analytical detail. For the geochemical composition of the sediment, core scanning may offer such a solution by providing both very high resolution (0.2 mm) and semi-quantitative data (e.g. Kylander et al., 2011). For the organic fraction, however, we still largely rely on analyses of discrete samples to obtain information on organic matter (OM) composition. Common methods range from inexpensive but less-detailed analyses, such as % loss on ignition, total organic carbon (TOC) and nitrogen (TON) analysis (Hu et al., 2001), and Rock-Eval pyrolysis (Wohlfarth et al., 2004), to time-consuming and costly, but more detailed, lipid biomarker analysis (Holtvoeth et al., 2010), solid-state ¹³C nuclear magnetic resonance spectroscopy (Orem et al., 1991) and mass spectrometry-based methods (Ninnes et al., 2017).
One analytical method that holds potential for both high sampling resolution and high analytical detail of the OM fraction is mid-infrared spectroscopy (MIR). Sample preparation for MIR is rapid: for attenuated total reflectance (ATR) spectroscopy, preparation only involves drying and grinding samples before analysis, and for diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) it typically also involves dilution with a powdered alkali halide that is transparent in the infrared region, such as potassium bromide (KBr). The purpose of diluting samples with KBr is to reduce specular reflectance distortion, which otherwise may produce reststrahlen bands (regions of strong absorption and reflection), such as the band reversal common below 1200 cm⁻¹ in untreated, Si-O-rich samples (Nguyen et al., 1991). DRIFTS can, however, like ATR spectroscopy, also be performed on untreated samples (Alaoui et al., 2011), which further reduces the preparation time and does not consume sample material. With an analysis time of only a few minutes per sample, approximately 100-150 samples can be processed daily with ATR. With DRIFTS, approximately 100 samples can be processed daily when packing and emptying of sample cups is included, and up to 300 samples daily when accounting only for the analysis itself. With minimal maintenance and consumable costs (e.g. liquid N₂ coolant for DRIFTS), high sampling resolution is certainly achievable.
High analytical detail can be achieved because of the highly resolved character of the spectral information (Kubo and Kadla, 2005). MIR collects information on the frequencies of molecular vibrations in samples over the spectral range ~4000 to 400 cm⁻¹ (2.5-25 µm). The frequencies of these vibrations are linked to specific molecular structures and can therefore be assigned to absorption bands of molecules from different functional groups or compound classes. Published literature on such spectra-structure correlations for band interpretation is widely available, spanning individual compounds to various complex matrices, which allows comprehensive information extraction. The amount of extractable detail can, however, be affected by the sample preparation approach. For organic samples, where specular reflectance is expected to be low or absent, the use of untreated samples may be advantageous because the higher sample concentration enhances relatively weak absorption bands (Nguyen et al., 1991). Additionally, MIR spectra are highly reproducible over time on the same instrument (Aryee et al., 2009; Hofko et al., 2017), and between similar instruments given appropriate sample and data handling (Hofko et al., 2018; Mirwald et al., 2022).
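The wavelength range quoted above follows from the reciprocal relation between wavenumber and wavelength; as a worked check of the two endpoints:

```latex
\lambda\,[\mu\mathrm{m}] = \frac{10^{4}}{\tilde{\nu}\,[\mathrm{cm}^{-1}]},
\qquad
\tilde{\nu} = 4000\ \mathrm{cm}^{-1} \;\Rightarrow\; \lambda = \frac{10^{4}}{4000} = 2.5\ \mu\mathrm{m},
\qquad
\tilde{\nu} = 400\ \mathrm{cm}^{-1} \;\Rightarrow\; \lambda = \frac{10^{4}}{400} = 25\ \mu\mathrm{m}.
```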
The main challenge with the MIR technique is linked to data interpretation, more specifically to discriminating between molecules with functional groups that absorb IR radiation at similar frequencies, so-called band overlap (Tinti et al., 2015). Band overlap occurs in all complex matrices and may prevent identification of individual compounds or even groups of compounds. A considerable portion of the complexity of MIR spectra can be handled with multivariate chemometric analysis, such as partial least squares regression (PLSR) and principal component analysis (PCA) (e.g. Alaoui et al., 2011; Martínez Cortizas et al., 2021b; Pérez-Rodríguez et al., 2016; Rosén et al., 2010). These methods can resolve many issues linked to band overlap by identifying dependence, or patterns, among multiple variables. For example, Martínez Cortizas et al. (2021b) provide a detailed compositional analysis of a peat sequence using a semi-automated peak selection processing pipeline (Álvarez Fernández and Martínez Cortizas, 2020) coupled with PCA, and verify their analysis against independent molecular OM data obtained with pyrolysis-gas chromatography/mass spectrometry (Py-GC/MS). For lake sediments, the most common application so far is multivariate prediction models based on DRIFTS for quantification of major and minor sediment components, including total organic and inorganic C, biogenic silica (bSi), charcoal and common minerals (Brisset et al., 2013; Cadd et al., 2020; Hahn et al., 2013, 2018; Meyer-Jacob et al., 2014; Rosén et al., 2010, 2011; Vogel et al., 2008). Only more recently has multivariate chemometric analysis been used to extract qualitative information on OM composition (Maxson et al., 2021; Pérez-Rodríguez et al., 2016).
Our objective here is to evaluate DRIFTS on untreated samples as a time- and cost-efficient method to facilitate high-resolution analyses of changes in OM composition in Holocene lake-sediment cores. To do so, we address two aims. (1) Quantify the major organic sediment fraction, TOC. For this purpose, we use surface-sediment samples from a large number of lakes in northern Sweden to develop a general northern Swedish calibration model, which is subsequently used to predict TOC content in two Holocene sediment cores. One of these records is then used to develop a local calibration model, and the performance of the two models is compared. (2) Determine the qualitative OM composition using wavenumber assignment based on known information about MIR spectra. To support our evaluation of the compositional information indicated by DRIFTS, we analyze the same two Holocene lake-sediment cores that we have previously studied using Py-GC/MS (Ninnes et al., 2017).

Samples
The training set used to develop the general calibration model consists of 93 lake surface-sediment samples from a northern Swedish training set (Bigler and Hall, 2002; Rosén et al., 2011). These lakes are located at 169-1183 m a.s.l., with catchment vegetation ranging from boreal forest to alpine/tundra vegetation and even barren ground. Surface areas are all <20 ha, and maximum depths range from 1.5 to 16 m (Bigler and Hall, 2002).
The two Holocene sediment cores come from lakes Dragsjön and Lång-Älgsjön, both small headwater lakes located in the southern boreal zone of central Sweden (60°23′60″N, 15°7′23″E; Figure S1). These neighboring lakes are situated at ~430 m a.s.l. and both have surface areas of 5 ha. Dragsjön has a maximum depth of 6 m and a catchment-to-lake ratio of 22, while Lång-Älgsjön has a maximum depth of 12 m and a catchment-to-lake ratio of 13. The surrounding vegetation is dominated by Pinus sylvestris and Picea abies forest on shallow, nutrient-poor glacial till. Both lakes are fringed by floating Sphagnum mats and are in direct connection with extensive blanket peatlands. Dragsjön was sampled in 2014 and Lång-Älgsjön in 1998. Dating and initial age-depth modeling were done by Ninnes et al. (2017) for Dragsjön and by Meyer-Jacob et al. (2015) for Lång-Älgsjön, and both age-depth models were updated by Myrstener et al. (2021).
Additionally, five reference samples were analyzed for comparison with published spectra in the literature to verify that spectral features necessary for accurate compound assignment can be identified in the DRIFT spectra of untreated samples.These samples include three different types of organic materials (wood, peat, and green algae), bSi isolated from marine sediment and a Na-rich montmorillonite clay (Table S1).

DRIFT spectroscopy
Surface, core and reference samples were all scanned untreated (no KBr added) using a Bruker Vertex 70 spectrometer equipped with a liquid-N₂-cooled mercury-cadmium-telluride (MCT) detector, a KBr beam splitter and an HTS-XT accessory unit. Each sample was scanned 128 times at a resolution of 4 cm⁻¹ over wavenumbers 4000-400 cm⁻¹. The analysis was performed in a temperature-controlled laboratory (25.0 ± 0.5°C), in which the samples were allowed to equilibrate in a desiccator for at least 12 h prior to measurement. A gold plate was used as background, and a total of 97 and 91 samples were analyzed for the Dragsjön and Lång-Älgsjön sediment cores, respectively. Additionally, four of the reference samples were scanned both untreated and diluted with KBr (5.5 mg sample to 250 mg spectroscopic-grade KBr), and 61 samples from Dragsjön were scanned diluted for the purpose of quantifying sediment bSi content using the model by Meyer-Jacob et al. (2014). For all diluted samples, KBr was used as background.
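DRIFT intensities are usually reported as apparent absorbance, computed from the sample reflectance relative to the background (here the gold plate, or KBr for diluted samples). This section does not state which transform was used, so the sketch below assumes the common log10(1/R) convention and also shows the Kubelka-Munk function as an alternative. The function names are hypothetical, and `coadd` only illustrates how repeated scans (128 here) could be averaged before conversion.

```python
import numpy as np

def coadd(scans):
    """Average repeated single-beam scans (one scan per row) into one spectrum.

    Co-adding N scans reduces random noise roughly by a factor of sqrt(N).
    """
    return np.mean(np.asarray(scans, dtype=float), axis=0)

def apparent_absorbance(sample_single_beam, background_single_beam):
    """Apparent absorbance A = log10(1/R), with R = sample / background (assumed convention)."""
    sample = np.asarray(sample_single_beam, dtype=float)
    background = np.asarray(background_single_beam, dtype=float)
    reflectance = sample / background
    return np.log10(1.0 / reflectance)

def kubelka_munk(reflectance):
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R), an alternative to log10(1/R)."""
    r = np.asarray(reflectance, dtype=float)
    return (1.0 - r) ** 2 / (2.0 * r)

# Hypothetical example: two spectral channels, gold background normalized to 1.0.
sample = coadd([[0.49, 0.11], [0.51, 0.09]])    # approx. [0.5, 0.1]
print(apparent_absorbance(sample, [1.0, 1.0]))  # approx. [0.301, 1.0]
```

Because R is a ratio to the background, any intensity factor shared by sample and background (source output, detector response) cancels, which is why a new background is measured for each sample type.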

TOC content
For the surface samples from the northern Swedish training set, TOC content was analyzed by Rosén et al. (2011) using a Perkin Elmer 2400 series II CHNS/O elemental analyzer operated in CHN mode. TOC for these samples ranges from 0.7% to 36.0% (dry mass), with a mean of 19 ± 8% (Table S2). The TOC contents for Dragsjön (n = 58, TOC 2.7-33.6%) and Lång-Älgsjön (n = 29, TOC 0.9-43.1%) were analyzed by Ninnes et al. (2017) and Meyer-Jacob et al. (2015), respectively, on a Flash 2000 Organic Elemental Analyzer (Thermo Fisher Scientific) at the Swedish University of Agricultural Sciences (SLU) in Umeå. Based on inspection of the DRIFT spectra, carbonates were not present in either data set, and the samples were therefore not pre-treated with acid before elemental analysis.

Py-GC/MS
The Py-GC/MS analyses of the two lake-sediment cores were performed by Meyer-Jacob et al. (2015) and Ninnes et al. (2017). Briefly, 200 ± 20 μg of sediment was pyrolyzed in a Frontier-Labs PY-2020iD oven (450°C) connected to an Agilent 7890A-5975C GC/MS system. Peak integration and identification were performed using a data processing pipeline in the "R" computational environment (Tolu et al., 2015). Compounds were identified using the software "NIST MS Search version 2.0" with the library "NIST/EPA/NIH 2011" and additional spectra from published studies (Tolu et al., 2015). Because the records were analyzed during different sessions, data are reported as relative abundances; that is, for each sample, the peak area of each identified compound was normalized to the summed peak area of all identified compounds. Data were handled in this way because Py-GC/MS can be associated with strong drift in instrument sensitivity over time, meaning that peak areas of compounds detected in samples analyzed during different sessions cannot be directly compared. Further details about the method can be found in Tolu et al. (2015) and Ninnes et al. (2017). Data used in this study include 145 compounds, here grouped into 28 compound groups (Table 1), which have been identified in both lake-sediment cores by Ninnes et al. (2017), but here with the additional inclusion of the compound group fatty acids (Table S3).
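The normalization described above can be sketched as follows. Dividing each peak area by the sample's summed peak area cancels any sensitivity factor that scales all compounds in a run equally, which is why relative abundances can be compared across sessions. The grouping step assumes that a compound group's abundance is the sum of its member compounds; the function names, compound values and group labels are hypothetical.

```python
import numpy as np

def relative_abundance(peak_areas):
    """Normalize peak areas per sample to the summed area of all identified compounds.

    peak_areas: 2-D array, rows = samples, columns = identified compounds.
    Returns fractions that sum to 1 per sample (multiply by 100 for %).
    """
    areas = np.asarray(peak_areas, dtype=float)
    return areas / areas.sum(axis=1, keepdims=True)

def group_abundances(rel_abund, group_labels):
    """Sum relative abundances of compounds that belong to the same compound group."""
    labels = np.asarray(group_labels)
    groups = sorted(set(labels.tolist()))
    summed = np.column_stack([rel_abund[:, labels == g].sum(axis=1) for g in groups])
    return groups, summed

# Hypothetical data: the same sediment measured in two sessions, the second with
# a 2x higher instrument sensitivity; normalization removes the difference.
areas = np.array([[10.0, 30.0, 60.0],
                  [20.0, 60.0, 120.0]])
rel = relative_abundance(areas)
groups, summed = group_abundances(rel, ["phenols", "phenols", "polysaccharides"])
print(groups)  # ['phenols', 'polysaccharides']
print(summed)  # both rows approx. [0.4, 0.6]
```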

Statistical analyses
A first step in our analyses was to confirm that undiluted DRIFT spectroscopy can be used to estimate the TOC content of lake sediments. For this initial proof of concept, we developed a general calibration model using DRIFT spectra of untreated samples and the corresponding TOC concentrations in the 93-lake northern Swedish training set (Bigler and Hall, 2002; Rosén et al., 2010). Building a model on samples from a large number of lakes makes it possible to compare its performance with similar existing models that are based on spectra from samples diluted with KBr. For external validation, we applied the general northern Swedish model to the Holocene sediment cores from Dragsjön and Lång-Älgsjön. To demonstrate that our approach also performs well under site-specific conditions, the Dragsjön core was then used to develop a local calibration model that was externally validated using the Lång-Älgsjön core.
centered and standardized to unit variance (z-scored). The internal predictive performance of the models, that is, how well the DRIFT spectra predict TOC content for the training sets, was evaluated with the coefficient of determination from seven-fold cross-validation (R²CV) and with the root mean squared error of cross-validation (RMSECV, in % TOC). The external predictive ability of the models, that is, how well the models predict TOC content when applied to samples with known TOC content that were not used to build the model, was evaluated with the coefficient of determination (R²) and the root mean squared error of prediction (RMSEP, in % TOC).
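A minimal sketch of this validation workflow, assuming a PLS1 model fitted with the NIPALS algorithm on z-scored spectra and mean-centered TOC; this is not necessarily the software or algorithm used in the study. Folds are formed here by assigning every seventh sample to the same group, and R²CV is computed as 1 − PRESS/SS_tot. Both choices are assumptions, and some software instead reports the squared correlation between measured and cross-validated values. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; return a function that predicts y from new spectra."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    x_std[x_std == 0] = 1.0                    # avoid division by zero for flat variables
    y_mean = y.mean()
    Xr, yr = (X - x_mean) / x_std, y - y_mean  # z-scored X, mean-centered y
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector
        t = Xr @ w                             # latent scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        q = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)        # regression coefficients (z-scored space)

    def predict(X_new):
        Z = (np.asarray(X_new, dtype=float) - x_mean) / x_std
        return y_mean + Z @ B
    return predict

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cross_validate(X, y, n_components, n_folds=7):
    """Return (R2_CV, RMSECV) from k-fold cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    y_cv = np.empty(n)
    for k in range(n_folds):
        test = np.arange(k, n, n_folds)        # every n_folds-th sample
        train = np.setdiff1d(np.arange(n), test)
        y_cv[test] = pls1_fit(X[train], y[train], n_components)(X[test])
    press = np.sum((y - y_cv) ** 2)
    r2_cv = 1.0 - press / np.sum((y - y.mean()) ** 2)
    return float(r2_cv), rmse(y, y_cv)

# Hypothetical data: "spectra" driven by two latent factors that also set "TOC".
rng = np.random.default_rng(1)
T = rng.normal(size=(60, 2))
X = T @ rng.normal(size=(2, 30)) + 0.01 * rng.normal(size=(60, 30))
y = 10.0 + 3.0 * T[:, 0] - 2.0 * T[:, 1]
r2_cv, rmsecv = cross_validate(X, y, n_components=2)
model = pls1_fit(X[:40], y[:40], n_components=2)
rmsep = rmse(y[40:], model(X[40:]))            # external validation on held-out samples
```

Keeping the z-scoring inside `pls1_fit` means each cross-validation fold is standardized with its own training statistics only, so the held-out samples do not leak into the model.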
For analysis of the qualitative OM composition of the sediment cores from Dragsjön and Lång-Älgsjön, the main spectral peaks within the range 3750-400 cm⁻¹ were identified with the andurinha R application (Álvarez Fernández and Martínez Cortizas, 2020). This application provides the average, standard deviation and second-derivative spectra for each sediment record, with the latter used to extract the main spectral peaks from spectra baseline-corrected at 3800-3750 and 2210-2200 cm⁻¹. The main DRIFT spectral signals within the reduced spectra were extracted with PCA on standardized absorbance data in the R package "psych", with a varimax rotation applied to maximize the variable loadings on the components. PCA was also applied to the 28 pyrolytic compound groups to enable comparison between the DRIFTS and pyrolytic OM composition. The comparison was done using bivariate correlation coefficients between the respective component scores and by passively plotting the DRIFT spectral components in the pyrolytic PCA loading plots. All principal components retained in the spectral and pyrolytic PCAs have eigenvalues >1.
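The peak-extraction step can be illustrated with a simplified sketch. It assumes a linear baseline drawn through the mean absorbance of the two anchor windows and a Savitzky-Golay second derivative, in which absorption maxima appear as local minima. The actual andurinha implementation, window sizes and thresholds may differ, and all function names and parameter values here are hypothetical.

```python
import numpy as np

def linear_baseline_correct(wn, absorbance, windows=((3800, 3750), (2210, 2200))):
    """Subtract a straight line through the mean absorbance of two anchor windows."""
    wn = np.asarray(wn, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    xs, ys = [], []
    for high, low in windows:
        mask = (wn <= high) & (wn >= low)
        xs.append(wn[mask].mean())
        ys.append(absorbance[mask].mean())
    slope, intercept = np.polyfit(xs, ys, 1)
    return absorbance - (slope * wn + intercept)

def savgol_second_derivative(y, spacing, half_window=4, polyorder=3):
    """Second derivative from a local least-squares polynomial (Savitzky-Golay)."""
    offsets = np.arange(-half_window, half_window + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, x, x^2, ...
    coeffs = 2.0 * np.linalg.pinv(A)[2] / spacing ** 2       # p''(0) = 2 * c2
    padded = np.pad(np.asarray(y, dtype=float), half_window, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")

def second_derivative_peaks(wn, absorbance, rel_threshold=0.05):
    """Wavenumbers of second-derivative minima deeper than rel_threshold * deepest minimum."""
    wn = np.asarray(wn, dtype=float)
    d2 = savgol_second_derivative(absorbance, spacing=abs(wn[1] - wn[0]))
    cut = rel_threshold * d2.min()                           # d2.min() is negative
    mid = d2[1:-1]
    is_min = (mid < d2[:-2]) & (mid < d2[2:]) & (mid < cut)
    return wn[1:-1][is_min]

# Hypothetical spectrum: two Gaussian bands on a sloping baseline.
wn = np.arange(4000.0, 398.0, -2.0)
spectrum = (0.05 + 1e-5 * wn
            + np.exp(-0.5 * ((wn - 2920) / 20) ** 2)
            + 0.6 * np.exp(-0.5 * ((wn - 1650) / 20) ** 2))
corrected = linear_baseline_correct(wn, spectrum)
print(second_derivative_peaks(wn, corrected))  # [2920. 1650.]
```

The second derivative is used because it sharpens overlapping bands and removes constant and linear baseline contributions, so shoulders that are hidden in the absorbance spectrum can appear as separate minima.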

Results and discussion
Reference DRIFT spectra of untreated samples and samples diluted with KBr
DRIFT absorbance spectra for the untreated reference samples show good coherence with published spectra for similar types of samples that had been diluted with KBr. Untreated spectra for decomposed Sphagnum peat (Figures 1a and S2) show characteristic bands for O-H stretching of phenolic and aliphatic groups (3600-3200 cm⁻¹); C-H stretching of aliphatics (2925-2850 cm⁻¹; Niemeyer et al., 1992); C=O stretching of esters, aldehydes, ketones and carboxylic acids (1750-1670 cm⁻¹; Marchessault, 1962); and C-O and O-H stretching of aliphatic alcohols and C-H deformation of aromatics (1155-1000 cm⁻¹; Orem et al., 1996; Pandey, 1999; Zaccheo et al., 2002). The only difference between the DRIFT spectra of the untreated and the diluted samples is the slightly lower intensity of the broad band at 1120-1000 cm⁻¹; specular distortion effects are effectively absent. For bSi, the DRIFT spectra of untreated and diluted samples are near-identical above 1500 cm⁻¹, and below this region characteristic bands linked to Si-OH stretching (947 cm⁻¹) and Si-O-Si stretching (~800 cm⁻¹; Farmer, 1974) are easily identified (Figures 1b and S5). As expected, the DRIFT spectrum of the untreated sample deviates in the 1400-1000 and 700-400 cm⁻¹ regions. Here, specular distortion in the form of band reversal (1270-1000 and 500-400 cm⁻¹) and Fresnel peaks (1350 and 530 cm⁻¹) masks the broad bands linked to Si-O stretching (1100 cm⁻¹) and Si-O bending (470 cm⁻¹), which are present in the DRIFT spectrum of the sample diluted with KBr. The same distortion effects are present in the DRIFT spectra of untreated samples of the Na-rich montmorillonite clay, with band reversals at 1070, 530 and 470 cm⁻¹ and Fresnel peaks at 1270 and 650 cm⁻¹ (Figures 1c and S6). However, bands necessary for identification of silicate mineral phases remain intact, including bands linked to Si-O stretching (802/786 cm⁻¹ doublet) and bending (695 cm⁻¹) of quartz and feldspars (Farmer, 1974), and the strong band between 3700 and 3600 cm⁻¹ characteristic of structural O-H vibrations in clays (Madejova and Komadel, 2001). Absorbance and second-derivative spectra, peak positions, peak assignments and references for the peak assignments for all five reference samples are presented in Figures S2-S6.

Sediment TOC quantification with DRIFT spectra from untreated samples
Our models for TOC quantification in lake sediments use the common DRIFTS technique coupled with PLSR, but contrast with previous work on lake sediments by using untreated samples. The general calibration for sediment TOC based on DRIFT spectra and TOC content for the northern Swedish training set (n = 93) resulted in a 5-component model with an R²CV of 0.96 between the TOC measured by conventional elemental analysis and the PLSR-predicted TOC content, and a prediction error (RMSECV) of 1.7% (Table 2 and Figure 3). The first three latent components capture 89% of the variation in the spectral signal (DRIFT spectra, x-variables) and 94% of the variation in TOC (y-variable). Loadings on components 1-3, expressed as weight vectors (w*c), show that wavenumbers 3600-2250, 1780-930 and 600-400 cm⁻¹ are strongly positively correlated with TOC.
The local calibration model based on DRIFT spectra and TOC content for Dragsjön (n = 58) resulted in a 3-component model with an R²CV of 0.96 between the TOC measured by conventional elemental analysis and the PLSR-predicted TOC content, and a prediction error (RMSECV) of 2.6% (Table 2, Figure 3). The first two latent components capture 78% of the variation in the DRIFT spectra and 92% of the variation in TOC. Loadings on components 1 and 2 show that wavenumbers 3500-3250, 3000-2235, 2214-2200, 1800-1640 and 1150-1026 cm⁻¹ are strongly positively correlated with TOC. Thus, for both models, C-H stretching and overtones in aliphatics (3000-2650 cm⁻¹), C=O stretching in carbonyl groups (1800-1640 cm⁻¹), and C-O and O-H stretching and deformation in polysaccharides, cellulose and lignin (1180-995 cm⁻¹) are important regions. The same three regions underpin calibration models for TOC and OM in soils and lake sediments based on samples diluted with KBr (Janik and Skjemstad, 1995; Rosén et al., 2010, 2011; Viscarra Rossel et al., 2006; Vogel et al., 2008), with the 3000-2800 cm⁻¹ region being particularly important for the quantification of OM, TOC and humic substances in estuarine sediments (Alaoui et al., 2011; Tremblay and Gagné, 2002).
When the general northern Swedish model is applied to the sediment sequences from Dragsjön (R² = 0.96, RMSEP = 6.4% of gradient) and Lång-Älgsjön (R² = 0.97, RMSEP = 9.7% of gradient), the downcore trends are captured well, but concentrations are slightly underestimated for most samples (Figure 3). These underestimations likely occur because the samples in the northern Swedish training set are, to some extent, qualitatively different from the sediments in Dragsjön and Lång-Älgsjön, despite the comparable quantitative TOC ranges. These differences are captured by the contribution of each wavenumber to the PLSR models. While the same general regions are important for both models, there are notable differences, particularly within the 3500-3000, 2215-2070 and 1600-1200 cm⁻¹ regions (Figure 3). Accordingly, the local model based on samples from Dragsjön predicts sediment TOC concentrations in Lång-Älgsjön more accurately (R² = 0.98, RMSEP = 6.2), particularly for samples that fall within the range of concentrations that is well represented in the calibration model (~23-34% TOC).
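The external-validation statistics can be expressed as in the sketch below. It assumes that "% of gradient" means RMSEP divided by the range (maximum minus minimum) of measured TOC in the validation set, and it adds the mean bias (predicted minus measured) as one way to quantify the systematic underestimation described above; both function names are hypothetical.

```python
import numpy as np

def rmsep_percent_of_gradient(measured, predicted):
    """RMSEP as a percentage of the measured range (max - min) in the validation set."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmsep = np.sqrt(np.mean((predicted - measured) ** 2))
    return 100.0 * rmsep / (measured.max() - measured.min())

def mean_bias(measured, predicted):
    """Mean of (predicted - measured); negative values indicate underestimation."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.mean(diff))

# Hypothetical validation samples (% TOC), all underestimated by 1%.
measured = [5.0, 15.0, 25.0]
predicted = [4.0, 14.0, 24.0]
print(rmsep_percent_of_gradient(measured, predicted))  # 5.0
print(mean_bias(measured, predicted))                  # -1.0
```

Under this reading, and using the measured TOC ranges given in the methods (2.7-33.6% for Dragsjön and 0.9-43.1% for Lång-Älgsjön), RMSEP values of 6.4% and 9.7% of gradient would correspond to roughly 2.0% and 4.1% TOC, respectively.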
For both of our models, validation samples with measured TOC concentrations >34-36% are underestimated because of spectral saturation, which contributes to the higher RMSEP for Lång-Älgsjön than for Dragsjön with the general northern Swedish model. The same issue was reported by Rosén et al. (2011) for samples diluted with KBr when TOC exceeded ~35%. When saturation occurs, the relationship between concentration and absorbance deviates from the expected linear relationship at the high end of the concentration range, so that when the linear calibration model is applied to these samples, predicted TOC is underestimated. In such cases, dilution with alkali halides can be used to reduce band intensities and create models optimized for higher TOC ranges (Rosén et al., 2011). However, spectral saturation will only be a problem for TOC analysis on untreated samples from lakes with highly organic sediment, such as those in the upper part of the Lång-Älgsjön record, whereas most lake sediments have TOC concentrations below 35%. For example, in the 93-lake northern Swedish training set only one lake exceeds 35% TOC, and in the 3164-lake training set spanning polar desert to boreal forest environments used by Rosén et al. (2011), only ~1% of the lakes exceeded 35% TOC.
The performances of both models are nonetheless comparable to those of other calibration models that use DRIFT spectra, whether of untreated estuarine sediment samples (Alaoui et al., 2011) or of lake-sediment samples diluted with KBr (Vogel et al., 2008). The most comparable TOC ranges are those of Rosén et al. (2010, 2011), in which concentrations in lake sediment range from 0 to ~30-41% TOC, for a similar or higher number of samples (n = 94-3164, R²CV = 0.83-0.98, RMSECV = 6.1-7.7% of gradient). Our results verify that lake-sediment TOC concentrations can be estimated using DRIFT spectra of untreated samples, but that the accuracy of the estimated values may differ between general and more lake-specific models. We expect that a more robust calibration model would result from a calibration data set that covers the entire qualitative sample range of the predictive data set or, alternatively, from selecting a subset of samples for TOC analysis to develop a lake-specific calibration.
Dragsjön and Lång-Älgsjön. Preliminary PCAs on the absorbance spectra for these peaks enabled identification of wavenumbers associated with each of the three main lake sediment fractions (OM, mineral matter and bSi). Groups of wavenumbers identified as capturing mineral matter (i.e. all wavenumbers with negative or positive loadings on a particular component) were subsequently removed in a couple of successive PCAs, such that the final PCAs included only organic and bSi components. Minerogenic wavenumbers were removed to make the comparison with the organic pyrolytic data more intuitive, and because approaches for qualitative and quantitative analysis of mineral matter with MIR have been presented elsewhere (e.g. Hahn et al., 2018; Kanbar et al., 2021; Martínez Cortizas et al., 2021a). Their removal affects neither the order of the organic components nor which components the wavenumbers load on, with the exception of one wavenumber for Dragsjön and four for Lång-Älgsjön. Wavenumbers associated with the bSi fraction were kept in the final PCAs because these bSi components appeared to also carry partial polysaccharide signals. Removing them does not alter the PCA results for Dragsjön, suggesting only a small polysaccharide signal, but weakens the polysaccharide signal for Lång-Älgsjön, suggesting a larger partial signal. For reference, the PCAs performed on the initial 72 and 74 peaks are shown in Figure S8. The final PCAs on wavenumbers associated with OM and bSi include 43 spectral peaks for Dragsjön and 55 peaks for Lång-Älgsjön (Figure 4).
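The PCA step can be approximated in Python as below. It assumes a correlation-matrix PCA, retention of components with eigenvalues >1 (as stated in the methods), a varimax rotation with Kaiser normalization following the standard iterative SVD algorithm, and component scores scaled to unit variance. This is a sketch, not a reimplementation of the R package psych used in the study, whose scoring and convergence details may differ; the function names and synthetic data are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Varimax rotation with Kaiser normalization; returns the rotated loadings."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    if k < 2:
        return L.copy()
    h = np.sqrt((L ** 2).sum(axis=1))          # row norms (sqrt of communalities)
    h[h == 0] = 1.0
    A = L / h[:, None]                         # Kaiser-normalized loadings
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        B = A @ R
        G = A.T @ (B ** 3 - B * (B ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                             # orthogonal rotation matrix
        d_old, d = d, s.sum()
        if d < d_old * (1 + tol):              # criterion no longer improving
            break
    return (A @ R) * h[:, None]

def pca_varimax(X):
    """Correlation-matrix PCA, Kaiser criterion (eigenvalue > 1), varimax rotation."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = varimax(eigvecs[:, keep] * np.sqrt(eigvals[keep]))
    # Unit-variance scores: Z @ L (L'L)^-1, exact for rotated principal components.
    scores = Z @ loadings @ np.linalg.inv(loadings.T @ loadings)
    return eigvals, loadings, scores

# Hypothetical peaks: wavenumbers 0-3 follow one source, 4-7 another.
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
X = np.repeat(f, 4, axis=1) + 0.3 * rng.normal(size=(200, 8))
eigvals, loadings, scores = pca_varimax(X)
print(loadings.shape)  # (8, 2)
```

In this synthetic case the first four variables load mainly on one rotated component and the last four on the other, which mirrors how groups of wavenumbers were assigned to organic, mineral or bSi components before the mineral groups were removed.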

Comparison of spectral and pyrolytic OM composition
Dragsjön. Passive plotting of the four spectral components (Cp1-4 IR) in the space of pyrolytic components 1 and 2 (Cp1-2 PY) supports the interpretation of Cp3 IR (1126-1049 cm⁻¹) as primarily capturing polysaccharides, because Cp3 IR plots with the indicators of intact OM in the bottom left quadrant (Figure 5a). These include anhydrosugars, which are primarily indicators of intact polysaccharides from plants (Pouwels et al., 1987) but are also found in some algae (Piloni et al., 2021); 4-isopropenylphenol, a pyrolysis product of Sphagnum acid that is sensitive to aerobic degradation (Schellekens et al., 2015); and the other polysaccharides of mixed origin (i.e. microbes, plants, algae; Nguyen et al., 2003; Pouwels et al., 1987).
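Passive (supplementary) plotting can be sketched by correlating each spectral component's scores with each pyrolytic component's scores; the resulting correlations serve as coordinates in the pyrolytic loading plot, and the same matrix gives the bivariate correlation coefficients mentioned in the methods. The sketch assumes that both score sets refer to the same samples in the same order (e.g. the subset analyzed by both DRIFTS and Py-GC/MS); the function name and data are hypothetical.

```python
import numpy as np

def passive_coordinates(spectral_scores, pyrolytic_scores):
    """Pearson correlations (rows: spectral components, columns: pyrolytic components)."""
    S = np.asarray(spectral_scores, dtype=float)
    P = np.asarray(pyrolytic_scores, dtype=float)
    if S.shape[0] != P.shape[0]:
        raise ValueError("score matrices must cover the same samples")
    S = (S - S.mean(axis=0)) / S.std(axis=0, ddof=1)
    P = (P - P.mean(axis=0)) / P.std(axis=0, ddof=1)
    return S.T @ P / (S.shape[0] - 1)

# Hypothetical scores for 30 shared samples and two components of each kind.
rng = np.random.default_rng(2)
py_scores = rng.normal(size=(30, 2))
ir_scores = np.column_stack([2.0 * py_scores[:, 0] + 1.0, -py_scores[:, 1]])
coords = passive_coordinates(ir_scores, py_scores)
print(np.round(np.diag(coords), 3))  # [ 1. -1.]
```

Because the supplementary components do not enter the pyrolytic PCA, their position in the loading plot shows association with the pyrolytic structure without changing it.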
The second spectral component, Cp2 IR (2378, 1768-1452 cm⁻¹), plots in the upper left quadrant, halfway between the fatty acids on the negative side of Cp1 PY and a group of pyrolysis products of algal and bacterial origin in the top right quadrant on the positive side of Cp2 PY. The fatty acids are here dominated by the relatively labile, short-chain fatty acids (C14-16:0, Table 2) that are typical of aquatic OM (Cranwell et al., 1987; Han et al., 2022; Jaffé et al., 2001). The algal and bacterial group includes hopanoids, which derive from bacterial membranes (Ourisson et al., 1987); the chlorophyll-derived phytadienes; the protein- and amino acid-derived diketopiperazines (Fabbri et al., 2012); the steroids of mixed origin (Volkman, 1986); and the alkylamides, which are dehydration products of algal macromolecules (Derenne et al., 1993) and possibly also secondary pyrolysis products of amino acids and proteins (Nierop and van Bergen, 2002). This combination of compound groups has previously been interpreted as algal-derived OM in a boreal lake-sediment sequence from south-western Sweden (Tolu et al., 2017), and the location of Cp2 IR in the Cp1-2 PY space agrees with the partial protein and aliphatic spectral signal captured by Cp2 IR.
The fourth spectral component, Cp4 IR, interpreted as mainly a bSi signal with a minor polysaccharide signal (1205-1161 and 492 cm⁻¹), also plots in the upper left quadrant, partially capturing an algal and bacterial signal and partially a labile OM signal. This supports the bSi (i.e. algal) interpretation of Cp4 IR, but also the partial polysaccharide signal. This duality is also evident in the downcore trends of Cp4 IR component scores and DRIFTS-based, PLSR-modeled bSi content (Figure 6a). The trends are coherent until ~1000 CE, but after this point bSi content deviates from the increasing trend of Cp4 IR, probably because of the increase in polysaccharide content. Finally, Cp1 IR, capturing O-H vibrations of polysaccharides and phenols, C-H stretching of aliphatics and N-H stretching in proteins (3552-2850 cm⁻¹), plots in the upper center of the plot, close to the pyrolytic compound group phenols.
The spectral component Cp2 IR (1768-1452 cm⁻¹) plots halfway between the wetland indicators n-alkan-2-ones C21-25 and C27-33 and phenols on the positive side of Cp4 PY, and the compound groups other polysaccharides and acetamidosugars on the negative side of Cp3 PY. The aliphatic n-alkan-2-ones C21-25 and C27-33 are ubiquitous in lake and wetland environments as components of various aquatic and peat-forming plants (Nichols and Huang, 2007; Ortiz et al., 2011; Qu et al., 1999), and the phenols derive from lignin (Faix et al., 1991; Saiz-Jimenez and De Leeuw, 1986), cellulose (Pouwels et al., 1989), non-lignin plant phenols (Schellekens et al., 2009) and proteins (van Heemst et al., 1999). This location of Cp2 IR strongly supports the mixed aliphatic, protein, lignin and aromatic signal captured by Cp2 IR. Plotting Cp1-4 IR in the space of pyrolytic components 4 and 5 (Cp4-5 PY) shows that Cp1 IR plots in the upper center of the plot, on the positive side of Cp5 PY. This component is driven by the chitin-derived acetamidosugars (Stankiewicz et al., 1996) and the protein- and amino acid-derived diketopiperazines. In the plot space, however, Cp1 IR is located closer to the phenols and pristene, which further supports that Cp1 IR captures phenolic compounds, in addition to vascular plant-associated aliphatic and chlorophyll derivatives.
Lång-Älgsjön. Passive plotting of the four spectral components (Cp1-4 IR) in the space of pyrolytic components 1 and 2 (Cp1-2 PY) shows that Cp1 IR, capturing the relative proportions of phenols, polysaccharides and proteins (3606-2960 cm⁻¹) versus lignin and aliphatics (1780 and 1514-1381 cm⁻¹), plots on the positive side of Cp1 PY together with a range of polysaccharide-, phenol- and protein-derived compound groups (Figure 5b). Cp3 IR, capturing mainly a bSi signal (1234-1126 and 494-447 cm⁻¹), also plots with the pyrolytic compound groups on the right side of Cp1 PY. The opposite side of Cp1 PY captures the fatty acids, (poly)aromatics and many of the aliphatic groups, which share very high relative abundances in the two silt- and clay-rich basal samples. They likely represent refractory, or potentially shielded (Han et al., 2022), OM from the time of lake formation. The second spectral component, Cp2 IR, capturing aliphatics and proteins (2920-2852 and 1768-1533 cm⁻¹), plots with phytadiene, steroids, alkylamides and diketopiperazines, indicative of algal OM, on the positive side of Cp2 PY. Anhydrosugars, an indicator of intact OM, also plot near these compound groups, though a minor algal signal cannot be excluded. Cp2 IR plots closer to the algal OM compound groups for Lång-Älgsjön than for Dragsjön, probably because this spectral component predominantly includes vibrations of aliphatics and proteins, rather than also capturing vibrations of lignin and aromatics as it does for Dragsjön.
In the space of pyrolytic components 3 and 4 (Cp3-4 PY), only Cp1 IR (3606-2960 cm⁻¹ vs 1780, 1514-1381 cm⁻¹) plots on the positive side of Cp3 PY, while the remaining three components plot on the negative side. The pyrolytic compound groups located on the positive side of Cp3 PY likely represent degraded algal and bacterial OM. These groups include the C17-19 n-alkanes, which represent direct inputs from aquatic plants and bacteria (López-Días et al., 2013; Qu et al., 1999) but can also result from microbial oxidation of n-alkanes and decarboxylation of fatty acids (Ambles et al., 1993; Püttmann and Bracke, 1995); pristenes, which are pyrolysis or degradation products primarily of tocopherol; alkanenitriles, derived from pyrolysis of algal macromolecules; the small alkylfurans, which may be derived from plant polysaccharides (Marbot, 1997; Pouwels et al., 1987) and bacterial sugars (Buurman et al., 2005); and the protein-derived alkylpyrroles (Nguyen et al., 2003). Compounds on the negative side of Cp3 PY are instead less-degraded compounds such as fatty acids, anhydrosugars, steroids and 4-isopropenylphenol. Along this gradient, Cp1 IR is located near the hopanoids and phenols, suggesting that it carries a partly bacterial signal, more so than an algal one, while the polysaccharide component, Cp4 IR (1086-1038 cm⁻¹), is located furthest left, near a cluster of compounds encompassing both intact polysaccharides (anhydrosugars) and algal OM indicators (alkylamides, diketopiperazines and phytadiene). Unlike in Dragsjön, none of the spectral components plot near the plant OM indicators on Cp4 PY, which include the C21-25 and C27-33 n-alkan-2-ones, tocopherol, lignin and 4-isopropenylphenol. In the space of pyrolytic components 4 and 5 (Cp4-5 PY), the hopanoids (bacterial OM indicators) are located toward the top of the plot, on the positive side of Cp5 PY, and the pyrolysis products of intact polysaccharides toward the bottom, on the negative side. The spectral components are again found along a gradient, with Cp1 IR located on the positive side of Cp5 PY, halfway between the hopanoids and the plant OM on Cp4 PY, and Cp4 IR at the bottom, close to the other polysaccharides and the anhydrosugars. These patterns further support the interpretation that Cp1 IR (3606-2960 cm⁻¹) represents a degraded OM fraction with a strong phenolic and proteinaceous signal, and that the polysaccharides on Cp4 IR (1086-1038 cm⁻¹) represent the most intact OM fraction in Lång-Älgsjön.
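Placing the spectral components in the pyrolytic PCA space, as described above, can be illustrated with a minimal sketch. It assumes, for illustration only, that each IR component is projected as a supplementary variable by correlating its sample scores with the scores of the pyrolytic components measured on the same samples. The function names and toy numbers are hypothetical, and this is not necessarily how the plots in this study were produced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def project_supplementary(ir_scores, py_scores, axes):
    """Coordinates of each IR component in a plane of two pyrolytic components.

    ir_scores: dict of IR component name -> scores per sample
    py_scores: dict of pyrolytic component name -> scores for the same samples
    axes: the two pyrolytic components spanning the plane, e.g. ("Cp3_PY", "Cp4_PY")
    Each coordinate is the correlation with one axis, so it lies in [-1, 1].
    """
    return {
        name: tuple(pearson(scores, py_scores[axis]) for axis in axes)
        for name, scores in ir_scores.items()
    }


if __name__ == "__main__":
    # Hypothetical scores for four samples
    ir = {"Cp1_IR": [0.2, 0.5, 0.9, 1.4], "Cp4_IR": [1.1, 0.7, 0.6, 0.1]}
    py = {"Cp3_PY": [1.0, 1.1, 1.6, 2.0], "Cp4_PY": [0.3, -0.1, 0.2, -0.4]}
    print(project_supplementary(ir, py, ("Cp3_PY", "Cp4_PY")))
```

A component that plots on the positive side of an axis in this sketch is simply one whose scores increase with that pyrolytic component down-core.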
Temporal trends. The OM composition in these two lake-sediment cores was previously divided into three major phases based on the pyrolytic organic compounds (Ninnes et al., 2017). To further validate our DRIFTS approach as a stand-alone method for characterizing OM composition, a brief comparison of temporal changes is informative. The earliest of the three phases identified by Ninnes et al. (2017) is the post-glacial phase (>7900 BCE), which represents the time of lake formation. Mineral matter initially dominates the sediment, but as the catchments become vegetated and soils stabilize (Myrstener et al., 2021), the relative abundances of most pyrolytic organic compound groups increase rapidly (Figure 6). This phase is equally apparent in the spectral data, where initially low scores for all organic components corroborate the relatively low, but increasing, OM content. The second phase spans most of the Holocene, until ~300 CE, and is characterized by stable or slowly changing relative abundances of pyrolytic organic compounds. This phase represents a climatically warm, but linearly cooling, phase of the Holocene (Seppä et al., 2009), during which most pyrolytic organic compound groups and spectral components are either unchanging (Lång-Älgsjön Cp1 IR and Cp3 IR), slowly increasing (Cp2 IR in both lakes) or slowly decreasing (Lång-Älgsjön Cp4 IR). Slow, millennial-scale fluctuations are seen in Cp1 IR, Cp3 IR and Cp4 IR in Dragsjön. Minor fluctuations in the pyrolytic data during this phase are masked by the lower sampling resolution in these sections of the cores, but the higher-resolution spectral data corroborate that no rapid changes in OM composition occurred, with the exception of Cp3 IR (polysaccharides) in Dragsjön.
The third phase, spanning ~300 CE to the present, is the most dynamic and is therefore divided into three sub-phases. The first two sub-phases are characterized by large increases in labile pyrolytic compound groups such as proteins and amino acids, anhydrosugars and 4-isopropenylphenol, but also in the aliphatic n-alkan-2-ones. These changes are probably linked to altered catchment inputs due to the expansion of adjacent peatlands and/or the development of floating shoreline peat mats (Myrstener et al., 2021; Ninnes et al., 2017). An increased supply of OM from peat or Sphagnum, combined with more rapid burial, would increase the proportion of polysaccharides in the sediment and also promote the preservation of labile compounds (Ninnes et al., 2017). Spectral components that carry protein and polysaccharide signals similarly indicate large changes in OM composition during the first two sub-phases in both lakes.
During the third sub-phase, from 1970 CE to the present, diagenetic processes are an important feature of the pyrolytic OM composition; they can be identified because of the high compositional detail provided by this technique, which allows individual compounds to be tracked (Ninnes et al., 2017). Diagenetic signals cannot be identified in the spectral components, nor in the spectra of individual wavenumbers of labile compounds or in ratios between labile and resistant compounds. This is possibly because the main wavenumbers for labile compounds, that is, polysaccharides, are not "clean" but also carry other types of information, mainly on Si-O vibrations.

Summary - Application of undiluted MIR spectroscopy for analysis of lake-sediment OM composition
Our spectral analyses of two lake-sediment sequences highlight the advantages and potential of MIR spectroscopy, whether DRIFTS, as used here, or ATR, for extracting quantitative and qualitative data on lake-sediment OM composition from untreated samples. Here we outline a strategy for applying DRIFTS to lake-sediment cores using only dried and ground samples, which allows for high sampling resolution and much greater detail on OM composition than other rapid, low-cost methods (e.g. C and N elemental analyses along with their isotopes).
First, as expected, we show that sediment TOC concentrations can be modeled either with a general calibration model based on a wide variety of sediment samples or with a local model developed from a subset of samples from the sediment record(s) of interest. These TOC models based on DRIFT spectra of untreated samples have the same predictive abilities as models based on samples diluted with KBr, and they can be applied as a first step to provide estimates and to identify the subsets of samples to be analyzed for C content in order to develop a more precise local model. Furthermore, our DRIFT spectra and measured C concentrations could potentially serve as a foundation for calibration models developed by anyone using similar DRIFTS instrumentation. Previous work on DRIFT spectra from untreated soil, estuarine sediment and water samples shows good results not only for C but also for quantification of total N and other major components, including humic substances and specific compound groups such as amino acids and amino sugars (e.g. Alaoui et al., 2011; Matamala et al., 2019; Reeves et al., 2001; Tremblay et al., 2011). Hence, there is great potential for developing quantitative calibrations based on DRIFT spectra of untreated samples for broad-scale characterization of lake sediments.
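The calibration workflow summarized above, fitting a PLSR model on one sample set and validating it externally on another, can be sketched as follows. This is a minimal NIPALS PLS1 implementation in NumPy run on synthetic data; it is not the software used for the models reported in Table 2, and the function names, the synthetic "spectra" and the number of components are illustrative assumptions. In practice the number of components would be chosen by cross-validation.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (PLS1) with the NIPALS algorithm.

    X: samples x wavenumbers matrix; y: reference values (e.g. measured TOC).
    Returns (x_mean, y_mean, B) so that y_hat = (X - x_mean) @ B + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean      # residuals, deflated after each component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # unit-length weight vector
        t = Xr @ w                       # sample scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        c = yr @ t / tt                  # y loading
        Xr = Xr - np.outer(t, p)         # remove the part of X explained by t
        yr = yr - c * t                  # and the part of y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    # Regression coefficients expressed for the centred original variables
    B = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    """Predict reference values for new spectra with a fitted PLS1 model."""
    x_mean, y_mean, B = model
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean


def rmse(measured, predicted):
    """Root-mean-square error, e.g. RMSEP for an external validation set."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for spectra: 60 calibration and 30 validation samples
    X_cal, X_val = rng.normal(size=(60, 20)), rng.normal(size=(30, 20))
    coef = rng.normal(size=20)
    y_cal = X_cal @ coef + rng.normal(scale=0.1, size=60)
    y_val = X_val @ coef + rng.normal(scale=0.1, size=30)
    model = pls1_fit(X_cal, y_cal, n_components=10)
    print("external RMSEP:", rmse(y_val, pls1_predict(model, X_val)))
```

The same two functions cover both the general and the local model in this sketch; only the samples passed to `pls1_fit` change.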
Second, and more importantly for our aim, using the open-source processing pipeline with peak selection and reduction based on second-derivative spectra developed by Álvarez Fernández and Martínez Cortizas (2020) in combination with PCA, we show that the DRIFT spectra can be used to establish qualitative compositional changes in the OM, as well as in the minerogenic and, to a large extent, biogenic sediment fractions. Within the organic fraction, compound groups such as aromatics, lignin, aliphatics, proteins and polysaccharides could readily be identified in our two Holocene sediment records. As such, MIR spectroscopy can be used as a rapid stand-alone method for OM characterization, or as a screening step for more specific, time-intensive analyses, such as Py-GC/MS or GC/MS and LC/MS methods requiring sample extraction, which can further tease apart the OM composition and identify sources and degradation status. A next step could be to develop predictive models based on MIR and pyrolytic data to model a greater diversity of compounds and reveal more about temporal OM changes.
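The two steps named above, peak selection on second-derivative spectra followed by PCA on the selected peaks, can be sketched as below. This is a generic NumPy illustration under stated assumptions, not a re-implementation of the Álvarez Fernández and Martínez Cortizas (2020) pipeline: it uses a plain finite-difference second derivative without the smoothing a real pipeline would apply, treats local minima of the second derivative as band positions, and standardizes the peak absorbances before PCA. All names and the toy spectra are hypothetical.

```python
import numpy as np


def second_derivative_peaks(absorbance, threshold=0.0):
    """Indices of absorption bands, taken as local minima of the second derivative.

    Bands appear as negative minima in the second derivative, which also
    separates overlapping bands better than the raw spectrum does.
    A minimum is kept only if it lies below -threshold.
    """
    a = np.asarray(absorbance, dtype=float)
    d2 = np.diff(a, n=2)                  # d2[k] is centred on a[k + 1]
    inner = d2[1:-1]                      # inner[m] is centred on a[m + 2]
    is_min = (inner < d2[:-2]) & (inner <= d2[2:]) & (inner < -threshold)
    return np.flatnonzero(is_min) + 2     # indices into the absorbance array


def pca(matrix, n_components):
    """PCA of a samples x peaks matrix via SVD of the standardized columns."""
    X = np.asarray(matrix, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T        # peaks x components
    return scores, loadings, explained[:n_components]


def gaussian_band(wavenumbers, centre, width=10.0):
    """Toy absorption band used only for the demonstration below."""
    return np.exp(-((wavenumbers - centre) ** 2) / (2 * width**2))


if __name__ == "__main__":
    wn = np.arange(1000, 1201)                        # toy wavenumber axis
    rng = np.random.default_rng(0)
    amps = rng.uniform(0.5, 2.0, size=(12, 2))        # 12 toy samples
    spectra = amps[:, :1] * gaussian_band(wn, 1050) + amps[:, 1:] * gaussian_band(wn, 1150)
    peaks = second_derivative_peaks(spectra.mean(axis=0))
    print("band positions:", wn[peaks])
    scores, loadings, explained = pca(spectra[:, peaks], n_components=2)
    print("explained variance:", explained.round(3))
```

Selecting peaks on the mean spectrum and then running PCA on the absorbances at those positions is one possible design; reducing the full spectrum to a few dozen band positions (cf. the 43 and 55 peaks in Figure 4) keeps the PCA loadings directly interpretable in terms of wavenumbers.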

Figure 3. The general TOC calibration model (left), with measured sediment TOC concentrations versus DRIFTS-predicted TOC concentrations for (a) the PLSR calibration model based on samples from the northern Swedish calibration training set (n = 93); (b) the external model validation using samples from Dragsjön (n = 58) and Lång-Älgsjön (n = 29); (c) loadings on the first three components of the calibration model, expressed as weight vectors (w*c), where positive weight vectors indicate wavenumbers positively correlated with TOC concentration and negative weight vectors indicate wavenumbers negatively correlated with TOC concentration; and down-core trends in measured and predicted TOC concentrations for the external validation for (d) Dragsjön and (e) Lång-Älgsjön. The local TOC calibration model (right), with measured sediment TOC concentrations versus DRIFTS-predicted TOC concentrations for (a) the PLSR calibration model based on samples from Dragsjön (n = 58); (b) the external model validation using samples from Lång-Älgsjön (n = 29); (c) loadings on the first three components of the calibration model, expressed as weight vectors (w*c), where positive weight vectors indicate wavenumbers positively correlated with TOC concentration and negative weight vectors indicate wavenumbers negatively correlated with TOC concentration; and (d) down-core trends in measured and predicted TOC concentrations for Lång-Älgsjön, where orange and dark gray circles represent the validation samples (n = 29) and light gray circles represent samples with model-predicted TOC only (n = 62).

Figure 4. Principal component analyses of DRIFT spectra from untreated samples for (a) the 43 peaks capturing OM and bSi in Dragsjön and (b) the absorbance spectra for the 55 peaks capturing OM and bSi in Lång-Älgsjön. Wavenumbers are ordered by component based on their strongest positive or negative loading and colored by sediment fraction: organics (green), bSi (orange) and other variance allocation (light gray).

Table 1.
List of the 28 pyrolytic organic compound groups and the individual organic compounds within each group (n = 144).

Table 2.
Statistical performance of the general northern Swedish and the local PLSR calibration models for sediment total organic carbon (TOC) resulting from internal (calibration set samples) and external (validation set samples) validation.