Digitalization Improves Enterprise Performance: New Evidence by Text Analysis

The digitalization reform has profoundly influenced the development mode and production methods of traditional industries in China. However, the impact of digitalization on the performance of Chinese listed companies is not well investigated, and an effective variable for measuring digitalization has yet to be established. To fill this gap, this article constructs a variable that measures the degree of “digitalization” at the company level. By analyzing the text of the annual reports of Chinese listed companies from 2009 to 2020, the effectiveness of this variable is verified. On this basis, the development status and determinants of enterprise digital applications, and the impact of digitalization on enterprise performance, are discussed. Specifically, using panel data on these companies, we evaluate the relationships between the degree of enterprise digitalization and enterprise age, the education level of the board of directors, the proportion of fixed assets, and company assets. We then analyze the relationship between the degree of digitalization and enterprise performance. Our analysis helps to address the “IT productivity paradox” and, to a certain extent, provides data and theoretical support for the view that digitalization improves corporate performance. Furthermore, the digitalization measurement variable laid out in this article provides a foundation for future research on the relationship between digitalization and corporate performance.


Introduction
In the last few decades, the Chinese manufacturing industry has developed significantly, and the share of the service industry has also grown (Li, 2018). However, the upgrading of the industrial structure still lags behind, which restricts the transformation of the Chinese economy from quantitative growth to quality growth. The development of the digital economy coincides with a period in which China is undergoing significant changes in its development model, economic structure, and growth drivers, and it provides conditions for optimizing the industrial structure and for digital upgrading (Manyika & Chui, 2011). This has also encouraged Chinese enterprises to pay increasing attention to digitalization, as shown in Figure 1.
Therefore, to explore whether digitalization has a positive impact on performance, researchers have often associated digitalization with intangible asset expenditure. Among the various intangible assets, technology and digitalization play a crucial role (Bertani et al., 2021). Empirical research shows that R&D expenditure and ICT have a positive impact on corporate performance (Belvedere et al., 2013), and Martín-Peña et al. (2019) pointed out that digitalization also appears to have a positive impact on enterprise performance. Practitioners are also paying more attention to digitalization and its positive influence on firm performance. For example, PwC (2016) outlined that digitalization may be used as a managerial tool to facilitate the development of an organization by optimizing its business model and reducing risk levels; their research further measures the extent to which a lack of digitalization can lead to the loss of competitive advantage and market share. Some researchers have innovatively adopted sentiment analysis in their research (Saura et al., 2022). Along the same lines, Salvi et al. (2021) analyzed the impact on firm value of the information about digitalization that companies provide directly or indirectly through their websites. However, a gap remains in previous studies: the digitalization measurement variable is too subjective, and its effectiveness is not guaranteed. To address this problem, this article proposes a new approach to digitalization research.
On the other hand, implementing digitalization may require significant investment, which does not directly lead to improved enterprise performance (Yunis et al., 2018). For example, large incumbents, such as large banks with significant assets, may benefit from economies of scale but require more investment to implement digitalization than their smaller peers. They may also be more complex and challenging to transform, resulting in higher conversion costs (Gresov et al., 1993). Moreover, cognitive factors may prevent incumbents from fully participating in the digitalization reform: their existing organizational structures, routines, and procedures have been fine-tuned to previous generations of technology, resulting in a lack of understanding of emerging technologies (Hanelt et al., 2021). Regulatory attention has also increased; for example, the government promotes digital supervision of the financial services industry to ensure financial security. Firms may therefore hesitate to undertake digitalization reform because they are concerned about “legitimacy” (Zimmerman & Zeitz, 2002).
The novelty of the present study is as follows. Based on nearly two million items of financial expenditure details, digital hardware expenditures and the software, system platform, and other software expenditures recorded in intangible assets are extracted. By combining this information, an effective digitalization measurement variable is built, and its effectiveness is verified with text-analysis techniques. The research questions addressed in the present study are as follows: RQ1: How to build a more accurate digitalization measurement variable? RQ2: What is the impact of digitalization on overall industry performance?
To answer these two questions, the present study aims to: (1) create a more effective, data-supported variable for measuring digitalization; (2) verify the effectiveness of this variable using keyword statistics from annual reports; and (3) explore the relationship between the digitalization measurement variable and corporate performance.
Methodologically, based on the current state of research, this article defines “digitization” as the massive, high-speed, and diverse data assets that enterprises collect, process, and utilize (Bukht & Heeks, 2017). The research on the digital economy is advanced in the following respects. First, nearly 40,000 records of digital investment (software, computer hardware, platform systems, etc.) are extracted from the financial expenditure details of companies from 2009 to 2020, and these data are used to measure the degree of digitalization of companies. Keywords related to “digitalization” in the annual reports of Chinese listed companies are then captured by text analysis to verify the effectiveness of the digital asset variable. Compared with previous studies, the measurement of digital application in this article is more effective because it is supported by data. It can therefore describe the dynamic changes and influencing factors of the digital development level of Chinese listed companies and lay a solid data foundation for follow-up research. Second, the article empirically tests whether digitalization has a significant positive effect on the profitability of companies across Chinese industry as a whole.
Furthermore, the impact mechanism is explored, which provides both a theoretical and an empirical basis for improving corporate competitiveness. More importantly, the analysis in this article is based on data covering all Chinese listed enterprises, and macro factors such as China's digital infrastructure construction and talent supply conditions are also considered. Therefore, the results could provide practical suggestions that support China's high-quality economic development.

Digitalization Measurement
The effects of digitalization have been studied recently. In line with the objectives of this study, we focus on the measurement of digitalization. Traditionally, the level of digitization is measured with primary and secondary indicators, but this method is usually subjective. Martín-Peña et al. (2019) measured digitalization as a combination of digital technologies, covering the use of ICT (information and communications technology) and advanced manufacturing technology. However, the manufacturing industry involves many digital projects, so this method is not reliable as a final measure of digitalization. da Costa et al. (2022) used a questionnaire. Although questionnaire data are convenient to process and analyze, the survey results are broad but not deep; such measurement is too subjective, and the results deviate from reality. In Forcadell et al. (2020), digitalization is measured on the theoretical premise that innovation in the financial industry leads to productivity improvement, which introduces unpredictable one-sidedness. Therefore, a comprehensive and systematic digitalization measurement variable is necessary for our research.
Table 1 summarizes how previous studies have measured digitalization. The studies reviewed in Table 1 show that there is no unified way to measure digitalization.
To measure the level of digitization effectively, we need to follow the definition of digitalization closely. Digitalization must be closely combined with basic enterprise activities such as production, operation, management, and sales. Therefore, in addition to digitalization itself, enterprises also need to invest in other areas, including the reconstruction of production and operation activities, the establishment of new business models, the adjustment of organizational structure, the upgrading of management experience, staff training in relevant technologies, and the customized development of specific software (Kohtamäki et al., 2019). Accordingly, digital applications are closely related to enterprises' intangible assets and long-term value, and digitalization is realized largely through intangible assets such as “software” and “management systems.” Increasing digital investment increases the corresponding intangible assets, which makes it possible to measure digitalization through them.

Digitalization and Enterprise Performance
As information technology has deepened, the relationship between its application and enterprise performance has attracted growing attention in the academic community. The most representative issue is the “IT productivity paradox” (Carlaw & Oxley, 2008), on which academia has formed two opposing views.
One view holds that the “IT productivity paradox” objectively exists: the use of IT has a weak effect on enterprise performance, or even no positive effect at all, while increasing the burden on enterprises. Through a survey of 30 enterprises, Pan et al. (2020) found that the application of IT improves the performance of only a few enterprises. This improvement is more prominent in science and technology enterprises, while in other types of enterprises it is not significant; more than half of these enterprises even experienced negative profits due to their investment in information technology. Moreover, in research on the world's top 500 enterprises from 2010 to 2014, Bahar and Foda (2019) found that although these enterprises had always been in a leading position in IT, their total assets and sales profit margins did not increase significantly within 5 years, while their operating costs increased considerably. The other view holds that the “IT productivity paradox” does not exist. As information technology deepens, a big data industry is gradually forming, and the application of big data technology can improve enterprise performance. Li-ying (2015) showed that investment in big data applications can improve the economic growth rate, labor productivity, and enterprise profit margins at the national, industrial, and enterprise levels. Rehman et al. (2020) analyzed the correlation between enterprise human-resources big data applications and overall performance and pointed out that human resources departments can improve enterprise efficiency and performance by managing personnel through big data technology. Awan et al. (2021) pointed out that digital decision-making can bring significant changes in transforming and supporting a circular economy. Rehman et al. (2020) further showed, through the analysis of many enterprise samples, that digital-driven decision-making is indeed associated with higher productivity and market value, and there is some evidence that it is related to certain profitability indicators (ROE, asset utilization).

Table 1 entries:
This paper aimed to evaluate the digital maturity of MSEs, using the Brazilian case as a study model with a sample of more than 340 companies.
Forcadell et al. (2020) This study aimed to fill this gap by analyzing the impact of the information about digitalization provided directly or indirectly by companies through their website on firm value.
Ricci et al. (2020) It used the accounting data from the balance sheet and income statement.
Martín-Peña et al. (2019) Their contribution to firm performance was analyzed. Hypothesis testing is conducted using data on 828 Spanish industrial firms.
Foroudi et al. (2017) The data was gathered through 21 in-depth interviews with managers from different multinational organizations and 6 focus groups with employees.
Chen et al. (2016) In this study, data were collected using field interviews and survey from senior executives of small and medium-sized enterprises (SMEs) in the Taiwanese textile industry.
Table 2 summarizes previous studies on the relationship between digitalization and performance. The studies in Table 2 show that the evidence on whether digitalization improves enterprise performance is inconsistent.
In addition, the performance effects of digitalization differ between emerging markets and developed countries, and even across regions. Tambe and Hitt (2012) showed that IT returns are substantially higher in medium-sized enterprises than in Fortune 500 enterprises and are realized more slowly in large companies; in medium-sized companies, the short-term contribution of IT to output is similar to its long-term contribution. Moreover, from 2000 to 2006, the marginal output of IT expenditure was higher than in any earlier period.
The literature review shows considerable attention to the financial benefits associated with digitalization and the mechanisms underlying this relationship. In summary, digitalization may bring significant advantages, but it may also have negative effects. Therefore, broad, economy-wide research on the impact of digitalization on enterprises is very important.

Data Sources
This article takes all Chinese listed companies from 2009 to 2020 as the initial research sample and excludes the following: (1) ST and *ST companies; (2) companies with missing values for the main variables; (3) listed software service companies, because the software industry belongs to the information technology industry and is directly related to digitalization, so it may be affected differently from other industries; and (4) expenditure details with negative intangible asset investment, since software recorded in intangible assets is not subject to impairment.
Among the data sources, (1) the basic information and annual report documents of listed companies come from Tushare and www.cninfo.com; (2) the annual report text is crawled with Python, and the other data come from the CSMAR database; and (3) to avoid the interference of extreme values, all variables are winsorized at the 1st and 99th percentiles. The final dataset is provided in the Supporting material.

Table 2 entries:
The article provided solid evidence that over the last two decades an increase of ICT investment by 10% translated into higher output growth of 0.5% to 0.6%.
Positive Koc and Bozdag (2009) This article showed that local area network, computer-aided design, and computer-aided manufacturing technologies are the most commonly used and automated storage, robotics, and wide area network technologies are the least commonly used AMTs in SMEs.
Positive Boadi et al. (2022) The study revealed that there is a significant and negative relationship between IT investments and cost to income ratio used as a surrogate for banks' efficiency.
Negative Chen et al. (2016) This study examined the moderating effects of professional training on the relation between information technology (IT) investments and financial performance of audit firms in Taiwan. Empirical results indicate that professional training positively associates with productivity significantly but insignificantly with profitability.
Negative Stratopoulos and Dehning (2000) By comparing successful users of IT and less successful users of IT, it showed that successful users of IT have superior financial performance relative to less successful users of IT. However, any financial performance advantage is short-lived, possibly due to the ability of competitors to copy IT projects.
Neutral
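The 1%/99% winsorization applied to the sample above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `percentile` and `winsorize` are hypothetical, the percentile uses linear interpolation between order statistics (NumPy's default convention; Stata's percentile definitions can give slightly different cut-offs), and the function is meant to be applied to one variable at a time.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0 <= q <= 100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def winsorize(values: List[float], lower_q: float = 1.0, upper_q: float = 99.0) -> List[float]:
    """Clip each value into [P_lower, P_upper]; extreme observations are kept, not dropped."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    # Hypothetical ROA series with one extreme observation.
    roa = [0.02, 0.05, -0.01, 0.04, 0.03, 0.06, 0.01, 1.50]
    print(winsorize(roa))
```

Unlike trimming, winsorization keeps the number of firm-year observations unchanged, which matters for a panel whose sample has already been filtered.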

Measurement
Explanatory Variable. The core variable of this article is the degree of digital application in the company. Previous literature measuring this degree is often limited to a particular form of application or a particular industry. To describe the degree of application accurately while covering Chinese listed enterprises broadly, the logarithm of the sum of the company's digital-related intangible asset investment and certain fixed asset investment is taken as the index of digital investment (lnDigitalInvest). The digital investment details include “software,” “network,” “client,” “management system,” “intelligent platform,” etc., and the fixed asset investment includes “computer server.” To verify that lnDigitalInvest accurately measures a company's application of big data, this article proposes a verification method based on Andreou et al. (2020): a Python program batch-extracts keywords related to “data” applications from the text of the annual reports disclosed by listed companies; the variable lnBigdata is constructed from the total number of occurrences of all keywords in each annual report; and Stata is used to analyze the correlation between the two variables, to establish the effectiveness of lnDigitalInvest as a measure of digital investment. The specific steps are as follows. First, the digital-related keywords in the annual reports of all Chinese listed companies from 2009 to 2020 are extracted and analyzed, as shown in Table 3; one is added to the total number of occurrences of all keywords in an annual report, and the logarithm is taken to obtain lnBigdata. Second, Stata is used to analyze the correlation while controlling for a series of variables (the control variables used in the subsequent OLS model). The results show a strong positive correlation between lnDigitalInvest and lnBigdata (p < .01). Therefore, lnDigitalInvest reflects the company's actual digital investment well.
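The construction of lnDigitalInvest and lnBigdata described above can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: the keyword tuples are hypothetical English stand-ins for the (Chinese) terms in the expenditure details and annual reports, the function names are hypothetical, and `pearson` computes a plain correlation, whereas the article reports a correlation conditional on the control variables in Stata (obtainable, for example, by correlating the residuals of each variable after regressing it on the controls).

```python
import math
from typing import Iterable, List, Optional, Tuple

# Hypothetical English stand-ins for the expenditure terms named in the text
# (the real expenditure details and annual reports are in Chinese).
DIGITAL_ITEM_TERMS = ("software", "network", "client", "management system",
                      "intelligent platform", "computer server")
# Hypothetical subset of annual-report keywords; the full list is given in Table 3.
REPORT_KEYWORDS = ("big data", "digitalization", "information assets", "data warehouse")


def ln_digital_invest(details: Iterable[Tuple[str, float]]) -> Optional[float]:
    """ln of the summed amounts of expenditure items whose description mentions a digital term.

    Items with non-positive amounts are skipped, echoing the sample rule that drops
    negative intangible-asset investment. Returns None if no digital investment is found.
    """
    total = sum(
        amount
        for description, amount in details
        if amount > 0 and any(term in description.lower() for term in DIGITAL_ITEM_TERMS)
    )
    return math.log(total) if total > 0 else None


def ln_bigdata(report_text: str) -> float:
    """ln(1 + total keyword occurrences) in one annual report, using substring counts."""
    text = report_text.lower()
    return math.log(1 + sum(text.count(keyword) for keyword in REPORT_KEYWORDS))


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    details = [("ERP management system", 120.0), ("Office rent", 80.0), ("Software license", 30.0)]
    print(ln_digital_invest(details))  # ln(120 + 30); the rent item is not digital
    print(ln_bigdata("Big data drives our digitalization strategy."))
```

Substring counting works on Chinese text without word segmentation, but keywords that contain one another would be double counted, so a real keyword list needs to be checked for overlaps.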
Control Variables. The control variables selected in this article are: (1) company size (lnAssets), measured as the natural logarithm of total assets; (2) the fixed asset ratio (PPE_TA), measured as fixed assets divided by total assets; (3) company age (lnAge), the natural logarithm of the current year minus the listing year plus 1; and (4) the average education level of the board members in the current year (lnEdu), taken in logarithm, where education is coded 0 for junior college or below and increases by one for each higher level of education.
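The control-variable definitions above can be turned into a small Python sketch. It is illustrative only: the `FirmYear` record, the function name, and the degree labels in `EDU_LEVEL` are hypothetical, and because the text does not say how a board average of 0 is logged, the sketch assumes ln(1 + mean) for lnEdu.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Hypothetical coding ladder: 0 = junior college or below, +1 for each higher level.
EDU_LEVEL = {"junior college or below": 0, "bachelor": 1, "master": 2, "doctorate": 3}


@dataclass
class FirmYear:
    year: int                 # fiscal year of the observation
    listing_year: int         # year the company was listed
    total_assets: float
    fixed_assets: float
    board_degrees: List[str]  # highest degree of each board member


def control_variables(fy: FirmYear) -> Dict[str, float]:
    """Compute lnAssets, PPE_TA, lnAge and lnEdu for one firm-year."""
    return {
        "lnAssets": math.log(fy.total_assets),
        "PPE_TA": fy.fixed_assets / fy.total_assets,
        "lnAge": math.log(fy.year - fy.listing_year + 1),
        # Assumption: ln(1 + mean) so that an all-zero board does not give ln(0).
        "lnEdu": math.log1p(mean([EDU_LEVEL[d] for d in fy.board_degrees])),
    }


if __name__ == "__main__":
    fy = FirmYear(2020, 2011, 1_000.0, 250.0, ["bachelor", "master", "junior college or below"])
    print(control_variables(fy))
```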
Variable Descriptive Statistics. Table 4 reports the descriptive statistics of the main variables. After screening, the data still cover most of the listed companies.

Model Construction
The basic OLS model is constructed as follows:

$$\mathrm{roa}_{ijp} = \beta_0 + \beta_1\,\mathrm{lnDigitalInvest}_{ijp} + \beta_2\,\mathrm{Control}_{ijp} + \mu_i + \delta_p + \gamma_j + \eta_{ijp}$$

In the model, $\mathrm{Control}_{ijp}$ denotes the control variables, which allow the model to adapt to different enterprises, regions, and industries; $i$ indexes enterprises, $j$ industries, and $p$ provinces. $\mu_i$ controls for time-invariant firm characteristics; $\delta_p$ is the province fixed effect and $\gamma_j$ is the industry fixed effect, which control for factors such as the regional economic environment and industry development; $\eta_{ijp}$ is the random error term, and standard errors are clustered at the enterprise level.

Table 3 entries (digital-related keywords):
Through the network “cloud,” huge data calculation and processing programs are decomposed into countless small programs, which are then processed and analyzed by a system composed of multiple servers; the small programs obtain the results and return them to users.
digitalization: The uniform and continuous digital bits are structured and granulated to form standardized, open, nonlinear, and general data objects, and the application of big data is realized based on data objects of different forms and categories.
information assets: The enterprise's information resources that can bring future economic benefits to the enterprise. According to Gartner's report, big data is essentially an information asset.
data warehouse: Facilities for placing computer systems and related components for transmitting, accelerating, displaying, calculating, and storing data and information on the network infrastructure. In the information age, big data needs a safe, reliable, and efficient data center for storage, calculation, and exchange.
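Estimating the baseline model can be illustrated with a minimal NumPy sketch. It assumes that each firm stays in one province and one industry throughout the panel, in which case $\delta_p$ and $\gamma_j$ are constant within each firm and are absorbed by $\mu_i$, so a single within (firm-demeaning) transformation handles all three fixed effects. The function names and the finite-sample factor are illustrative choices rather than the authors' Stata setup, and the data in the usage example are simulated.

```python
import numpy as np


def within_demean(a: np.ndarray, firm: np.ndarray) -> np.ndarray:
    """Subtract each firm's mean (the within transformation that sweeps out mu_i)."""
    out = np.asarray(a, dtype=float).copy()
    for g in np.unique(firm):
        rows = firm == g
        out[rows] -= out[rows].mean(axis=0)
    return out


def fe_ols_clustered(y: np.ndarray, X: np.ndarray, firm: np.ndarray):
    """Firm fixed-effects OLS with standard errors clustered by firm.

    X holds lnDigitalInvest in column 0 followed by the controls; returns (beta, se).
    """
    yd, Xd = within_demean(y, firm), within_demean(X, firm)
    xtx_inv = np.linalg.inv(Xd.T @ Xd)
    beta = xtx_inv @ Xd.T @ yd
    resid = yd - Xd @ beta
    k = Xd.shape[1]
    meat = np.zeros((k, k))
    clusters = np.unique(firm)
    for g in clusters:
        rows = firm == g
        score = Xd[rows].T @ resid[rows]  # firm-level score vector X_g' u_g
        meat += np.outer(score, score)
    n, n_clusters = len(yd), len(clusters)
    # A common finite-sample factor; statistical packages differ in the exact choice.
    factor = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    vcov = factor * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    firm = np.repeat(np.arange(200), 12)          # 200 simulated firms x 12 years
    mu = rng.normal(size=200)[firm]               # firm effect, correlated with x_digital
    x_digital = rng.normal(size=firm.size) + mu
    x_control = rng.normal(size=firm.size)
    roa = 0.5 * x_digital - 0.2 * x_control + 2.0 * mu + rng.normal(size=firm.size)
    beta, se = fe_ols_clustered(roa, np.column_stack([x_digital, x_control]), firm)
    print(beta, se)
```

In the simulation, pooled OLS without the firm effect would be biased because $\mu_i$ is correlated with the digital variable; demeaning removes that bias, which is the role the fixed effects play in the specification.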

Baseline Regression Results
First, by analyzing the panel data, we find that company size (lnAssets), company age (lnAge), the average educational level of the board of directors (lnEdu), and the proportion of fixed assets (PPE_TA) are correlated with lnDigitalInvest, as shown in Table 5. The analysis process is provided in the Supporting material.
Based on the results in Table 5, at the 1% significance level, listed companies that are larger, have a lower proportion of tangible assets, have a more educated board of directors, and are older are more likely to increase digital investment.
Second, in the baseline model, lnAssets, lnAge, lnEdu, and PPE_TA are the control variables, lnDigitalInvest is the explanatory variable, and roa is the dependent variable. We focus on the coefficient $\beta_1$ and its significance; its economic meaning is the impact of digitalization on the company's profitability. The results are shown in Table 6.
Because the estimated regression coefficients and standard errors may be strongly affected by the choice of fixed effects, columns 1 to 4 examine how different fixed effects influence the conclusions, to ensure robustness. All of the coefficients are significant at the 1% level, and the conclusion that the degree of digitalization improves enterprise performance is very stable.

Robustness Test
This article mainly studies the impact of digitalization on corporate profitability, but identifying causality with the above OLS model may suffer from endogeneity. For the research problem of this article, endogeneity mainly comes from the following sources. The first is reverse causality: companies with good performance and high valuations have sufficient cash flow and low external financing costs, so they are better able to pay the cost of investing in digitalization and building data platforms, which leads to an overestimated regression coefficient. The second is omitted variables: there may be hard-to-observe factors related to both a company's digital application and its profitability. For example, the strength of national policies supporting the digital economy differs across periods, which biases the estimated coefficient. Similarly, if a company faces fewer growth opportunities or expects a negative impact on performance, it may still carry out digital transformation because of the lower opportunity cost, which also biases the estimated coefficient. In general, the direction of the bias in the OLS estimate is theoretically ambiguous. This article alleviates possible endogeneity by designing an instrumental variable. Science and engineering majors cultivate mathematical ability, programming ability, and engineering design thinking that are closely related to big data applications, so science and engineering professionals play a vital role in big data applications. Previous industry experience and academic research have confirmed that the lack of technical talent in science and engineering is the bottleneck for companies adopting big data or artificial intelligence technology (Pan et al., 2020), and the instrumental variable is designed based on this phenomenon. The structure of the instrumental variable (IV) is as follows. According to its design, the closer a listed company is to colleges (i.e., the smaller the distance), the more easily it can recruit high-ability science and engineering graduates. The more listed companies there are in the city where the company's office is located (i.e., the greater N), the more this advantage is likely to be weakened. Time is a time dummy variable, and Num is the number of new science and engineering graduates in each location. Therefore, the instrumental variable theoretically satisfies the relevance requirement. In addition, it is difficult for the instrumental variable to affect the company's profitability through other channels, so in theory it satisfies the exclusion restriction. The calculation formula of IV is: Using a Python program, the values of the IV are computed and plugged into the multi-dimensional fixed-effect model, and the results in Table 7 are obtained by two-stage least squares estimation.
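The two-stage least squares step can be illustrated with a minimal NumPy sketch. Because the IV formula is not reproduced here, the instrument enters as a precomputed column; the function name `two_sls` is hypothetical, fixed effects could be absorbed beforehand by demeaning all variables by firm, and the sketch returns point estimates only, since standard errors from a naive second-stage regression are incorrect and should come from a dedicated IV routine such as Stata's `ivregress`.

```python
import numpy as np


def two_sls(y: np.ndarray, d: np.ndarray, z: np.ndarray, W: np.ndarray):
    """Just-identified 2SLS: returns (first-stage coefficient on z, second-stage coefficient on d).

    y: outcome (e.g. roa); d: endogenous regressor (lnDigitalInvest);
    z: excluded instrument (IV); W: exogenous controls, one column per variable.
    """
    n = len(y)
    exog = np.column_stack([np.ones(n), W])  # constant + controls
    Z = np.column_stack([z, exog])           # full instrument set
    # First stage: regress d on the instrument and the controls.
    pi, *_ = np.linalg.lstsq(Z, d, rcond=None)
    d_hat = Z @ pi
    # Second stage: regress y on fitted d and the same controls.
    beta, *_ = np.linalg.lstsq(np.column_stack([d_hat, exog]), y, rcond=None)
    return pi[0], beta[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 5000
    z, w, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
    d = 0.8 * z + 0.3 * w + u + rng.normal(size=n)        # u: unobserved confounder
    y = 0.5 * d + 0.2 * w - 1.0 * u + rng.normal(size=n)
    print(two_sls(y, d, z, w))
```

In this simulation, plain OLS of y on d would be biased downward by the confounder u, while the 2SLS estimate recovers the true effect because z shifts d but is unrelated to u, mirroring the relevance and exclusion arguments made for the IV above.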
Table 7 reports the results of the instrumental variable regression and related tests. Because the variation in the instrumental variable mainly depends on the company's location, columns 1 and 2 do not control for province fixed effects, while columns 3 and 4 control for all fixed effects. The two sets of results are consistent. Taking columns 3 and 4 as an example, column 3 reports the first-stage regression, in which the core variable (lnDigitalInvest) is regressed on the instrumental variable; the coefficient of the instrumental variable is significantly positive. This shows that after the policy, listed companies closer to the pilot universities have a higher degree of digital application, which is consistent with expectations. In the first stage, the Cragg-

Discussion
The main purpose of this article is to propose a measurement index that is closer to the actual degree of digitalization and to study the impact of digitalization on enterprise performance. Most previous studies are limited to one industry or one region, and correspondingly, their digitalization variables are based on small samples and are somewhat subjective. Through large-scale data analysis, we find that across China's industries as a whole, digitalization investment can improve enterprise performance. Although the methods differ, our results are congruent with the argument of Song et al. (2022) that digitalization can improve enterprise performance, since digitalization improves productivity and, with it, enterprise performance.
Digitalization has a positive impact on enterprise performance not only in China but also in other countries. Truant et al. (2021) highlight the still-embryonic adoption of digital tools to support daily company operations, yet the impacts of digitalization on company performance are noticeable.
Furthermore, we find that the degree of enterprise digitalization is positively related to the digital awareness of managers. This result is consistent with Ribeiro-Navarrete et al. (2021), who indicated that updating social networks, using social networks for corporate purposes, having a high level of training in digital tools, and having older managers can enhance company performance. Notably, the older and larger a listed company is, the higher its degree of digitalization, which is inconsistent with the “IT productivity paradox” (Carlaw & Oxley, 2008). The reason may lie in the distinct characteristics of China's industrial environment, which is a key issue for future study.
Methodologically, recent studies usually use text analysis to verify opinions (Saura et al., 2022). However, as argued in this article, extracting keywords by text analysis and using them to measure digitalization is subjective, so text analysis can only be used to validate a variable. How to use text analysis effectively will be the focus of future research.

Conclusions
In this article, we place data analysis in the most important position. With regard to RQ1 (How to build a more accurate digitalization measurement variable?), we construct a digitalization measurement variable by analyzing relevant digitalization expenditures. We then apply textual analysis to the annual reports disclosed by China's listed companies, capturing keywords related to “digitalization,” and thereby verify the effectiveness of the company-level measure of digital application.
In relation to RQ2 (What is the impact of digitalization on overall industry performance?), we find that digital application improves company performance. We also find that companies differ in their probability of applying data in production and operation: companies with a large scale, a low proportion of tangible assets, a long history, and highly educated board members are more likely to increase digital investment, and this digitalization significantly improves company profitability. Instrumental variables are also constructed from the reach of the annual flow of highly educated science and engineering graduates from colleges and universities to listed companies. Using these variables, the endogeneity problems are alleviated, and a consistent conclusion is obtained.

Theoretical Implications
The results of the present study have two theoretical implications. First, in terms of methodology, given the incompleteness of digitalization measures in previous studies, this paper analyzes the details of financial revenue and expenditure over the sample period (2009 to 2020) and constructs a more reasonable and practical digitalization variable based on large-scale data analysis. To verify the validity of this variable, we use text analysis to extract digitalization-related keywords and use their frequency to assess the variable's effectiveness. This measurement method can be used in subsequent studies.
Second, in theory, empirical research can help to analyze the digitalization of emerging markets. So far, few empirical studies have investigated the impact of digitalization on the performance of Chinese enterprises as a whole. We use objective indicators to avoid bias. Our analysis helps to address the “IT productivity paradox” and, to a certain extent, provides data and theoretical support for the view that digitalization improves corporate performance. Furthermore, the digitalization variable laid out in our study provides a foundation for future research on the relationship between digitalization and corporate performance.

Practical Implications
This study has clear practical significance for managers of listed companies in emerging markets who are seeking to formulate effective management strategies. Given the rapid development of digital technology, managers should accept the challenge and digitalize their enterprises as soon as possible. First, since digitalization allows market trends to be analyzed quickly and helps managers make correct decisions, we strongly recommend that managers in traditional industries take advantage of its benefits. In addition, agile thinking plays a crucial role in digitalization, so it is necessary to recruit individuals with a “compatible” mentality on a large scale (Tronvoll et al., 2020). Furthermore, we recommend that managers equip their employees with digital skills by applying cutting-edge technologies and pushing incumbent employees beyond their comfort zones, which can generate a wealth of innovative ideas and stimulate creativity (Eller et al., 2020).

Limitations and Future Research Avenues
The data of this study are relatively complete, but data for some listed companies are seriously missing, which makes sub-industry analysis unreliable; this study can therefore only conduct an overall analysis. If a complete data source becomes available, a detailed sub-industry analysis should be considered.
This study inevitably has some limitations, which provide opportunities for future research. All of our hypotheses are tested in China, an emerging economy that differs from developed economies in many ways. Although our findings strongly support our hypotheses, future research could compare emerging and developed economies to improve generalizability. In addition, some data since 2009 were unavailable or missing, which may bias the results for some industries.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Digitalization trends in Chinese listed companies from 2009 to 2020.

Table 1. Relevant Previous Studies on Digitalization Measurement.

Table 2. Relevant Previous Studies on Digitalization.