Oil chemometrics and geochemical correlation in the Weixinan Sag, Beibuwan Basin, South China Sea

Oil–oil and oil–source rock correlations, also termed geochemical correlations, play an essential role in defining petroleum systems, guiding petroleum exploration, and delineating reservoir compartments. In this study, the problems arising from oil–oil and oil–source rock correlations were investigated by applying chemometric methods to oil and source rock samples from the WZ12 oil field in the Weixinan sag, Beibuwan Basin. Using multidimensional scaling and principal component analysis, crude oils from the WZ12 oil field can be classified into two genetic families, groups A and B. Similarly, source rocks of the Liushagang Formation, including its first, second, and third members, can be classified into groups I and II, corresponding to group B and group A crude oils, respectively. The principal geochemical parameters for characterising and classifying the crude oils and source rocks were 4MSI, C27Dia/C27S, and C24 Tet/C26 TT. This study provides insights into the selection of appropriate geochemical parameters for oil–oil and oil–source rock correlations, which can also be applied to other sedimentary basins.


Introduction
Statistical and mathematical methods have been essential tools in analytical chemistry for analysing data and recognising patterns (Ferreira et al., 2018; Frank et al., 1981; Kumar et al., 2014). With the development of microcomputers since the 1970s, several complex mathematical algorithms have been developed and widely used by analytical chemists. As a result, a new interdisciplinary field has emerged: chemometrics. Chemometric methods can process many measured parameters or variables simultaneously through multivariate statistical treatments. Consequently, within only a few decades of their development, these methods have been widely adopted in several disciplines (Bevilacqua et al., 2017; Chabukdhara and Nema, 2012; Madsen et al., 2010; Wang et al., 2020); their application in petroleum geochemistry began in the 1980s (Kvalheim et al., 1985; Øygard et al., 1984; Peters et al., 1986; Telnaes and Dahl, 1986; Zumberge, 1987).
The Weixinan sag, which is the main oil-rich sag in the Beibuwan Basin, is rich in oil and gas resources, with a volume of 11.63 × 10⁸ t (Xu et al., 2012; Yan et al., 2019). Numerous studies have addressed the accumulation mechanism of oil and gas in this area, including the evaluation of source and reservoir rocks, oil-source rock correlations, oil and gas migration, and the timing and modelling of hydrocarbon migration and accumulation (Hu, 2000; Huang et al., 2017; Liu et al., 2008; Xie et al., 2014). The discovered oils in the Weixinan sag are a result of multi-stage hydrocarbon generation from source rocks in the Liushagang Formation. Knowledge of the oil sources is fundamental to understanding the oil accumulation mechanisms. Traditionally, the Liushagang Formation in the Weixinan sag is divided into three members based on sequence stratigraphy profiles. The second member of the Liushagang Formation, or even the entire Liushagang Formation, is considered the main source rock (Fang et al., 2013; Liu et al., 2013; Xu et al., 2011; Yang et al., 2009). However, Fu (2018) and Fu and Liu (2018) suggest that the oil shale at the bottom of the second member of the Liushagang Formation is the dominant source rock in the Weixinan sag. In similar investigations of source rocks, a few other researchers reported that the lower part of the first member and the upper part of the third member of the Liushagang Formation are also good source rocks in the study area (Jin, 2020; Xu et al., 2012). Although the discovered crude oils are connected with the potential source rocks in each interval of the Liushagang Formation, the relationship between the oils and the source rocks in the various intervals remains unclear. Therefore, it is necessary to classify the types of both crude oils and source rocks in the Weixinan sag.
In addition, the selection of geochemical parameters for oil-oil and oil-source rock correlations in the Weixinan sag is confusing, and the classification of oils and source rocks based on different geochemical proxies in previous studies is also inconsistent (Fan et al., 2014; Huang et al., 2011; Xu et al., 2012; Yang et al., 2019; Zhou, 1993). Therefore, it is necessary to construct a series of characteristic geochemical parameters with broad significance as a reference to classify crude oils and source rocks in the study area.
To address the uncertain correlations between oils and source rocks in the Weixinan sag of the Beibuwan Basin in the South China Sea, chemometric methods were applied to establish oil-oil and oil-source rock correlations using published molecular biomarker data for source rocks and crude oils from the WZ12 oil field. Several characteristic geochemical parameters were selected using chemometric methods for the classification of the source rocks and crude oils to achieve more comprehensive and precise oil-source rock correlations. This study provides insights into the importance of potential source rocks in the Liushagang Formation and a better understanding of oil-source rock correlations in the Weixinan sag. The method used in this study can also be used in similar studies in other basins.

Geological background
The Beibuwan Basin is one of the four oil basins in the northern South China Sea continental shelf area (Figure 1(a)), with an area of ~40,000 km² (Huang et al., 2011). The major sags in the Beibuwan Basin include the Weixinan, Wushi, Maichen, Haizhong, Fushan, Leidong, Haitoubei, Changhua, and Lemin sags. The Wushi and Weixinan sags are hydrocarbon-rich, as confirmed by drilling. They are the main sags for oil and gas exploration in the Beibuwan Basin, with promising exploration potential. Over the past 30 years, more than ten industrial oilfields and a small number of oil-bearing structures have been discovered in the basin, with oil resources of more than 3.0 × 10⁸ tonnes (Huang et al., 2013).
The Weixinan sag is a secondary tectonic unit in the northern sag of the Beibuwan Basin, with the Wanshan uplift in the west and the Qixi uplift in the south, under the control of three large fault zones (Figure 1(b) and (c)). The Weixinan sag can be divided into three subsags, A, B, and C, based on the burial depth of the sag base and the thickness of the sedimentary cap rocks (Guo et al., 2009; Zhou et al., 2019). The WZ12 oil field is close to subsag B. Sedimentary strata in the Weixinan sag are dominated by Cenozoic deposits, comprising, in ascending order, the Paleogene Changliu, Liushagang, and Weizhou formations, the Neogene Xiayang, Jiaowei, Dengloujiao, and Wanglougang formations, and Quaternary strata, with a maximum depositional thickness of up to 7000 m (Jin, 2020). The Liushagang Formation is dominated by lacustrine and deltaic deposits and can be divided into three members (first, second, and third) from top to bottom (Fu et al., 2017). The main source rocks in the Liushagang Formation are semi-deep lacustrine mudstone, shale, and oil shale. For more details on the geological background and the petroleum system of the study area, see Huang et al. (2017) and Zhou et al. (2019).

Methodology
This study used published data from the WZ12 oil field in the Weixinan sag for 18 source rock samples from the first, second, and third members of the Liushagang Formation and 18 crude oil samples (Zhou et al., 2019). We selected two chemometric methods, the recently introduced multidimensional scaling (MDS; Wang et al., 2016, 2018a) and principal component analysis (PCA), to address the oil-oil and oil-source rock correlation problems in the study area. MDS was run with a user-written calculation program (for more details, see Wang et al. (2016)), and PCA was performed using Past 3X software. Fourteen biomarker ratios of the source rocks and crude oils (Pr/Ph, C27%, C28%, C29%, 4MSI, Ol/H, C23/H, Ga/H, C27Dia/C27S, ETR, C19/C23 TT, C24 Tet/C26 TT, C24/C23 TT, and C22/C21 TT) were used in the chemometric analysis. Appendix 1 presents the details of the proxies and the relevant data. As in previous studies (Peters et al., 2013; Wang et al., 2016), these parameters were selected because they are less affected by secondary factors such as biodegradation, thermal maturity, and migration (Wang et al., 2018b). Even so, highly mature or heavily biodegraded oil samples should be excluded from the dataset (Peters et al., 2016).
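The MDS step can be illustrated with a minimal sketch. The user-written program of Wang et al. (2016) is not described here, so the code below substitutes classical (Torgerson) metric MDS on z-scored ratios as a stand-in; the function names `zscore` and `classical_mds` and all numerical values are hypothetical, and the synthetic matrix is not the Appendix 1 data.

```python
import numpy as np

def zscore(X):
    """Standardise each column (biomarker ratio) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between the rows of X.

    Assumption: this is a generic stand-in, not the program used in the study.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of samples
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Double centring turns squared distances into an inner-product matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the eigenvectors with the largest eigenvalues
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negative round-off
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

# Synthetic example: two made-up sample groups described by three made-up ratios
rng = np.random.default_rng(0)
group_1 = rng.normal(loc=[2.0, 1.0, 0.6], scale=0.05, size=(5, 3))
group_2 = rng.normal(loc=[1.5, 2.0, 0.3], scale=0.05, size=(5, 3))
X = np.vstack([group_1, group_2])

coords = classical_mds(zscore(X), n_components=2)
print(coords.round(2))  # rows are samples, columns are the two MDS axes
```

Plotting the two columns of `coords` against each other gives a scores plot of the kind used to separate sample groups; standardising first keeps ratios with large numerical ranges from dominating the distances.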

Classification of source rocks using chemometrics
Source rocks in the Liushagang Formation can be divided into two groups: group I and II, based on the geochemical characteristics and the results of MDS. The results of the classification of the first, second, and third members of the Liushagang Formation in the Weixinan sag are displayed in Figure 2. In Figure 2(a) and (b), different variables were employed for MDS to evaluate the influence of the number of parameters; 9 or 14 parameters for MDS did not significantly affect the classification results.
Group I includes the source rocks in the first member and part of the third member, and group II contains the source rocks in the second member and part of the third member of the Liushagang Formation. The source rocks in the third member of the Liushagang Formation are derived from wells U, R, V, and B1. Two source rock samples in wells U and V are classified as group I, whereas three source rocks in wells R and B1 are classified as group II (Appendix 1). According to the single well profiles of B1, P, and R wells and the depth of the samples, three source rock samples in group II are predominantly located in the upper portion of the third member of the Liushagang Formation, whereas the two source rock samples in group I are mainly in the lower portion of the third member (Zhou et al., 2019). Therefore, source rocks in group I are the combination of the first member and the lower third member, whereas those in group II are the combination of the second member and the upper third member.
Group I shows a relatively higher average Pr/Ph ratio than group II (~1.98 and 1.50, respectively), probably indicating that the source rock was deposited under more oxic conditions.
Figure 2. Scores plot for source rocks in the WZ12 oil field of the Weixinan sag using the multidimensional scaling method to classify the groups and evaluate the influence of the number of parameters. The score for the sum of the first two principal components is 96.6%, representing most of the information in the original data set. Biomarkers used in (a) include Pr/Ph, 4MSI, Ol/H, C23/H, Ga/H, C27Dia/C27S, ETR, C19/C23 TT, and C24 Tet/C26 TT, similar to the oil-source rock correlation, whereas those in (b) include Pr/Ph, C27%, C28%, C29%, 4MSI, Ol/H, C23/H, Ga/H, C27Dia/C27S, ETR, C19/C23 TT, C24 Tet/C26 TT, C24/C23 TT, and C22/C21 TT.
In freshwater lake sediments, 4-methyl C30 sterane is mainly derived from dinoflagellates, and its abundance is positively correlated with the content of algal fossils in the sediments (Brassell et al., 1986; Goodwin et al., 1988; Ji et al., 2011; Peters et al., 2005). Group II has a relatively higher average 4-methyl sterane/C29 ααα sterane ratio (4MSI) than group I (~2.40 and 1.43, respectively; Appendix 1), indicating a bloom of lacustrine dinoflagellates during deposition. Meanwhile, source rock samples in group II have a relatively low concentration of oleanane, with Ol/H ratios ranging from 0.04 to 0.28 (Appendix 1). Because oleanane is derived from angiosperms (Rullkötter et al., 1994), this observation suggests that angiosperms contributed little to the organic matter of the group II source rocks. In addition, most source rock samples in group II are characterised by relatively lower C19/C23 TT and C24 Tet/C26 TT ratios than group I (0.12-0.70 and 2.13-3.90, respectively; Appendix 1), suggesting little terrestrial higher plant input to the organic matter because C19 tricyclic terpane and C24 tetracyclic terpane are related to terrestrial higher plants (Noble et al., 1985; Philp and Gilbert, 1986). Taken together, these biomarker proxies suggest that the organic matter of the group II source rocks is mainly derived from algae and, to a lesser degree, from terrigenous organic matter. In contrast, group I has a relatively low 4MSI but relatively high C19/C23 TT and C24 Tet/C26 TT ratios, and some samples have a high Ol/H ratio. Consequently, source rocks in group I have a higher input of terrestrial organic matter than source rocks in group II.
The results of the classification of source rocks differ from those of Zhou et al. (2019), who suggest that source rocks in the second and third members of the Liushagang Formation have similar organofacies and are different from those of the first member of the Liushagang Formation. However, the third member of the Liushagang Formation can be divided into two parts (upper and lower submembers) in accordance with the seismic information and sedimentary evolution. Thus, the upper and lower third members of the Liushagang Formation should be discussed separately. This is also supported by the hydrocarbon generation potential of the Liushagang Formation (Figure 3). Source rocks in the second and the upper third members of the Liushagang Formation have high hydrocarbon generation potential, whereas those in the first and lower third members have relatively low hydrocarbon generation potential (Huang et al., 2017; Xie et al., 2014). These variable source rock characteristics are possibly related to the lacustrine depositional environment because the Paleogene lacustrine mudstones are the dominant source rocks of the Beibuwan and Pearl River Mouth basins in the South China Sea (Huang et al., 2003; Robison et al., 1998). Moreover, the members with a strong influence of deep lakes or semi-deep lakes are distributed in the first, second, and upper third members of the Liushagang Formation. The organic matter input of the two groups of source rocks also verifies the MDS results: in terms of biogenic sources, group I source rocks show more characteristics of terrestrial organic input than group II source rocks.
Oil-oil and oil-source rock correlations based on chemometrics
Zhou et al. (2019) observed only one group of crude oil in the WZ12 oil field based on biomarker ratios such as ETR, C24 Tet/C26 TT, and C23/H. This oil group is mainly related to the source rocks of the same organofacies in the second and third members of the Liushagang Formation (Zhou et al., 2019). However, the oil-oil and oil-source correlations based on the chemometric methods in this study identify two groups of crude oils, group A and group B, in the WZ12 oil field. Group A crude oil is characterised by a higher 4MSI, ranging from 1.22 to 3.42, but lower C19/C23 TT and C24 Tet/C26 TT, ranging from 0.24 to 0.36 and from 0.91 to 2.36, respectively, whereas group B crude oil has higher C19/C23 TT and C24 Tet/C26 TT, ranging from 0.50 to 0.66 and from 3.05 to 3.48, respectively, but lower 4MSI, ranging from 0.80 to 1.37 (Appendix 1). These data indicate that group A has less terrigenous organic matter input than group B. Group A has a strong correlation with the source rocks in group II, whereas group B is mainly derived from the source rocks in group I (Figures 4 and 5). In contrast to the approach of Zhou et al. (2019), nine source-related biomarker proxies were comprehensively analysed using the chemometric method in this study: Pr/Ph, 4MSI, Ol/H, C23/H, Ga/H, C27Dia/C27S, ETR, C19/C23 TT, and C24 Tet/C26 TT. The chemometric method can synthesise more than three variables simultaneously through numerical matrix calculation, which greatly improves the analysis and interpretation of geochemical data (Peters et al., 2005; Wang et al., 2018b). As a result, the geochemical correlation of crude oils and source rocks in the WZ12 oil field based on nine biomarker parameters becomes more accurate.
Figure 3. Hydrocarbon generation potential of the Liushagang Formation (Fu and Liu, 2018; Xie et al., 2014). TOC represents total organic carbon; S1 and S2 represent Rock-Eval pyrolysis data.
In addition, the biomarker parameters of the crude oil samples indicate that group A and group B crude oils correspond to group II and group I source rocks, respectively, in terms of the biological sources of organic matter. Group A is derived mainly from algae, whereas group B is derived mainly from terrestrial higher plants. These characteristics are consistent with the features of the two types of source rocks, which substantiates the results of the geochemical correlation based on chemometrics.
Oil-oil and oil-source rock correlations based on characteristic geochemical parameters
Table 1 shows the relative contributions of the nine selected biomarker ratios to the first three principal components, which are also referred to as loadings in PCA. Generally, a parameter with a large absolute loading carries the key information on the corresponding principal component (PC) (Wang et al., 2018a, 2019). PC1 has a large positive loading for C24 Tet/C26 TT. PC2 has a negative loading for C27Dia/C27S and a positive loading for C24 Tet/C26 TT. PC2 is nevertheless mainly controlled by C27Dia/C27S because the loading of C24 Tet/C26 TT on PC1 is significantly higher than that on PC2. Similarly, PC3 is positively correlated with 4MSI. The C27Dia/C27S ratio is likely an indicator of carbonate-derived or argillaceous rock-derived oil, but it is also affected by maturity and biodegradation (Mello et al., 1988; Moldowan, 1978, 1979). As shown in Figure 6, the C27Dia/C27S ratio displays no obvious correlation with the C29 ββ/(αα + ββ) sterane ratio, indicating that this proxy is little influenced by thermal maturity. In addition, there is no evidence of significant biodegradation among the studied samples (Zhou et al., 2019). Thus, C27Dia/C27S is largely controlled by the source rock lithology in the study area. C24 Tet/C26 TT is generally used to reflect higher plant organic matter input (Philp and Gilbert, 1986), whereas 4MSI is commonly associated with dinoflagellate input (Peters et al., 2005). Therefore, PC1 represents higher plant input, PC2 represents carbonate-derived source rocks, and PC3 represents dinoflagellate input. Crude oils and source rocks from the WZ12 oil field are classified into two groups (Figure 7). Group A crude oils originate from group II source rocks, whereas group B crude oils are primarily derived from group I source rocks.
This classification result is consistent with the chemometric results. Thus, we propose that the 4MSI, C27Dia/C27S, and C24 Tet/C26 TT ratios are the characteristic geochemical parameters for the classification of crude oils and source rocks from the WZ12 oil field in the Weixinan sag.
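The way loadings are read to pick characteristic parameters can be sketched as follows. This is a minimal illustration that assumes PCA on standardised ratios computed by singular value decomposition; it is not the Past software routine used in the study, the helper name `pca_loadings` is hypothetical, and the input values are random numbers rather than the Appendix 1 data. Loadings here are unit-length eigenvector coefficients; some packages scale them differently.

```python
import numpy as np

def pca_loadings(X, names, n_components=3):
    """Return loadings, explained-variance fractions, and the variable with
    the largest absolute loading on each of the first n_components PCs."""
    X = np.asarray(X, dtype=float)
    # Standardise so every ratio contributes on the same scale
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = Vt[:n_components].T  # shape: (n_variables, n_components)
    dominant = [names[int(np.argmax(np.abs(loadings[:, k])))]
                for k in range(n_components)]
    return loadings, explained[:n_components], dominant

# Synthetic example: real parameter names, but purely random (made-up) values
names = ["4MSI", "C27Dia/C27S", "C24 Tet/C26 TT", "Pr/Ph"]
rng = np.random.default_rng(1)
X = rng.normal(size=(12, len(names)))

loadings, explained, dominant = pca_loadings(X, names)
for k, name in enumerate(dominant, start=1):
    print(f"PC{k}: largest |loading| on {name}")
```

The sign of a loading only indicates direction along the component, so the selection relies on the absolute value, as in the interpretation of Table 1.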

Conclusions
In this study, two types of source rocks (group I and group II) and crude oils (group A and group B) were classified in the WZ12 oil field using multidimensional scaling and principal component analysis. Group I is the combination of the first member and lower third member of the Liushagang Formation, whereas group II is the combination of the second member and upper third member of the Liushagang Formation. Group A crude oil is mainly derived from group II source rocks, whereas group B crude oil originates from group I source rocks. The 4MSI, C27Dia/C27S, and C24 Tet/C26 TT ratios are effective geochemical parameters for the classification of crude oils and source rocks in the WZ12 oil field of the Weixinan sag. This study provides a practical application of the chemometric method for the selection of parameters for geochemical correlations.
Appendix 1. Molecular parameters for source rock extracts and crude oils of the WZ12 oilfield in the Weixinan Sag.