Reconstructing a network of stress-response regulators via dynamic system modeling of gene regulation.

Unicellular organisms such as yeasts have evolved mechanisms to respond to environmental stresses by rapidly reorganizing the gene expression program. Although many stress-response genes in yeast have been discovered by DNA microarrays, the stress-response transcription factors (TFs) that regulate these stress-response genes remain to be investigated. In this study, we use a dynamic system model of gene regulation to describe the mechanism of how TFs may control a gene’s expression. Then, based on the dynamic system model, we develop the Stress Regulator Identification Algorithm (SRIA) to identify stress-response TFs for six kinds of stresses. We identified some general stress-response TFs that respond to various stresses and some specific stress-response TFs that respond to one specific stress. The biological significance of our findings is validated by the literature. We found that a small number of TFs is probably sufficient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the response mechanisms to different stresses may have a bow-tie structure. Second, there may be regulatory cross-talks among different stress responses. In conclusion, this study proposes a network of stress-response regulators and the details of their actions.


Introduction
Single-celled organisms such as yeasts constantly face variable or even harsh external environments that threaten their survival or, at least, prevent them from performing optimally. Environmental changes can be of a physical or chemical nature such as temperature, oxidation, osmolarity, acidity and nutrient availability (Hohmann and Mager, 2003). The response mechanisms to stress are highly complex. One aspect of this cellular adaptation is the reorganization of gene expression. The gene expression program required for maintenance of the optimal cell physiology in one environment may be far from optimal in another environment. Thus, when the environmental condition changes abruptly, the cell must rapidly adjust its gene expression program to adapt to the new conditions (Gasch et al. 2000).
Large-scale reprogramming of gene expression can be revealed from genome-wide DNA microarrays, which measure the relative transcription levels of every gene in the yeast genome at any given moment, providing a snapshot of the genomic expression program (Gasch and Werner-Washburne, 2002). Exploring the dynamic nature of the yeast genome through time-course experiments can illuminate yeast stress responses. For example, Gasch et al. (2000) and Causton et al. (2001) used genome-wide expression analysis to explore how gene expression in yeast is remodeled over time as cells respond to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, amino acid starvation as well as other environmental stresses. They discovered that more than half of the genome is involved in responding to at least one of the investigated environmental changes. A set of genes (~10% of yeast genes), termed as the environmental stress response (ESR) genes or common environmental response (CER) genes, showed a similar drastic response to almost all of these environmental changes. Other gene expression responses appeared to be specifi c to particular environmental conditions. However, the regulators of these stressresponse genes are not revealed from their studies. The complete network of stress-response regulators and the details of their actions remain to be investigated (Gasch et al. 2000).
Computational and statistical methods have been developed to identify plausible regulators of many cellular processes in yeast (Pilpel et al. 2001;Bar-Joseph et al. 2003;Segal et al. 2003;Middendorf et al. 2004;Yu and Li, 2005;Kim et al. 2006;Wu et al. 2006;Lin et al. 2007;Rokhlenko et al. 2007;Wu et al. 2007a;Wu et al. 2007b). When the timecourse data of a system are available, using dynamic system modeling is more appropriate than using statistical approach because it can model the dynamic behavior of the system (Tegner et al. 2003;Chen et al. 2004;Chen et al. 2005;Chang et al. 2005;Chang et al. 2006). Therefore, in this paper we use a dynamic system model of gene regulation to describe the mechanism of how TFs may control a gene's expression. Then, based on the dynamic system model, we develop the Stress Regulator Identifi cation Algorithm (SRIA) to identify stressresponse TFs for six different kinds of stresses. Our goal is to reconstruct a network of stress-response regulators and study the details of their actions.

Data sets and data preprocessing
Two kinds of data sets are used. First, for each gene in the yeast genome, the TFs that may regulate its expression are retrieved from the YEASTRACT database (Teixeira et al. 2006b). The regulatory associations between the target gene and the TFs are included if they are supported by published data showing at least one of the following experimental evidences: i) Change in the expression of the target gene due to a deletion (or mutation) in the gene encoding transcription factor; these evidences may come from detailed gene by gene analysis or genome-wide expression analysis; ii) Binding of the transcription factor to the promoter region of the target gene, as supported by band-shift, foot-printing or Chromatine ImmunoPrecipitation (ChIP) assays. However, the TF-gene regulatory association data are noisy. The genes whose expressions are affected by the mutation of a TF may not be the direct targets of that TF. Moreover, the genes that are bound by a TF identifi ed by ChIP-chip experiments may not be regulated by that TF since TF binding does not necessarily mean regulation. Therefore, other independent data source such as gene expression or proteomic data should be used to fi lter out the noise inherent in the TF-gene regulatory association data.
In this study, we incorporate the gene expression data into our analysis. The genome-wide gene expression time profiles under various stress conditions such as heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, and amino acid starvation are from Gasch et al. (2000) and Causton et al. (2001). Under each stress condition, samples for all genes in the yeast genome are collected at multiple time points. In order to reconstruct the missing values and avoid overfi tting, the cubic spline method (Faires and Burden, 1998) is applied to interpolate some extra data points. (The missing values of the microarray data are those data points whose values are questionable due to the experimental defects. Therefore, these data points are not reported in the microarray data and regarded as the missing values.) In order to fi t the dynamic system model in the linear scale, the microarray data are transformed from the log 2 scale to the linear scale.

Dynamic system model of gene regulation
We consider the transcriptional regulatory mechanism of a target gene as a system with the regulatory profi les of several TFs as the inputs and the gene expression profi le of the target gene as the output. Owing to random noise and fl uctuations at the molecular level, the transcriptional regulation of a target gene is described by the following stochastic dynamic equation It has been shown that TF binding usually affects gene expression in a nonlinear fashion: below some level it has no effect, while above a certain level the effect may become saturated. This type of binding behavior can be modeled using a sigmoid function (Chen et al. 2004;Chen et al. 2005;Chang et al. 2005;Chang et al. 2006;Wu et al. 2006;Wu et al. 2007a). Therefore, we defi ne x i [t] (the regulatory profi le of TFi at time point t) as a sigmoid function of z i [t] (the gene expression profi le of TFi at time point t): where r denotes the transition rate of the sigmoid function and A i denotes the mean of the gene expression profi le of TFi.
Estimating the parameters of the dynamic system model Using the TF-gene regulatory association data from the YEASTRACT database and gene expression data from Gasch et al. (2000) and Caustion et al. (2001), we can estimate the parameters of the dynamic system model in Equation (1). We rewrite Equation (1) to the following regression form: For simplicity, we can further define the notations Y, Φ and e to represent Equation (4) as follows The parameter vector θ can be estimated by the maximum likelihood (ML) method as follows (Johansson, 1993)

Stress Regulator Identifi cation Algorithm (SRIA)
After constructing the dynamic system model of gene regulation as in Equation (1) and estimate the parameter vector as in Equation (6), we are now ready to identify the stress-response TFs for six kinds of stresses. We develop Stress Regulator Identifi cation Algorithm (SRIA) to do this task. SRIA can be divided into the following three steps: Step 1: Identifi cation of stress-response genes The fi rst step is to fi nd out all the genes in the yeast genome that respond to a specifi c stress (e.g. heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion and amino acid starvation). A gene is said to respond to a specifi c stress if more than two time points of its gene expression profi le measured under that specifi c stress are induced or repressed by more than three folds compared to that of the unstressed condition (see Supplementary Table 1 for details).
Step 2: Identifi cation of stress-response TFs For each stress-response gene identifi ed in Step 1, we retrieve all TFs that may regulate its expression from the TF-gene regulatory association data. Knowing this information enables us to construct the dynamic system model of the transcriptional regulation of the stress-response gene as in Equation (1). Then we apply the ML method to estimate the parameter vector θ = [b 1 ... b N k λ] T of the model and get the ML estimate as in Equation (6). Since b i stands for the regulatory ability of TFi, a small absolute value of b i means that TFi only has a negligible effect on the stress-response gene's expression. Therefore, this TF-gene regulatory association may be a false positive and should be eliminated from our analysis. On the other hand, a large absolute value of b i means that TFi has a large regulatory effect on the stress-response gene's expression. We regard TFi to be a stressresponse TF if its regulatory ability b i is statistically signifi cantly different from zero (i.e. b i 0 ). The test statistic ˆ/ ( ) is an unbiased estimator of σ (the standard deviation of the stochastic noise ε[t]) (Mendenhall and Sincich, 1995). The p-value computed by the t-distribution is then adjusted by Bonferroni correction to represent the true α level in the multiple hypothesis testing (Mendenhall and Sincich, 1995). Then, TFi is said to be involved in response to a specifi c stress if the adjusted p-value p adjusted Յ 0.01 (see Supplementary  Table 1 for details).
Step 3: The stress-response TFs identifi ed in Step 2 are ranked according to the number of times that they are identifi ed under different stress-response genes. The fi rst TF in the list is the one that is identifi ed with the largest number of times. The TFs that are at the top 5% of the ranked list are classifi ed as the highconfi dence stress-response TFs. Finally, we output a ranked list of the high-confi dence stress-response TFs for each of the six different kinds of stresses. The fl owchart of SRIA could be seen in Figure 1.

Stress-response TFs
We identifi ed the TFs that respond to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion, and amino acid starvation, respectively. Table 1 shows the high-confi dence stress-response TFs for each of the above six stresses. The identifi ed stress-response TFs can be divided into two categories. The fi rst category is the well-known stress-response TFs with solid literature evidence that directly indicates involvement of these TFs in response to that specifi c stress. The second category is the novel stress-response TFs that have only partial or no literature support. (The partial literature support means that these TFs are predicted by pervious studies as plausible stress-response TFs but still need further verifi cation.) We found that 38% (13/34 counting multiplicity) of the identifi ed stress-response TFs belongs to the fi rst category, indicating the effectiveness of SRIA. In addition, 52% (11/21 counting multiplicity) of the second category has partial literature support, revealing the predictive power of SRIA. Therefore, the novel stress-response TFs (Arr1, Ifh1, Rpn4 and Sok2) that have no literature evidence yet are worthy of experimental verifi cation.

Biological validation of predictions
Now we discuss in detail our predictions that are supported by experimental evidence in the literature. The high-confi dence TFs in response to heat shock, oxidative shock, osmotic shock, acidic stress, nitrogen depletion and amino acid starvation are shown. The TFs are in red (blue) if there exist solid (partial) experimental evidence showing that they are involved in the same stress as we predicted.

Heat
Oxidative Osmotic  Acidic  Nitrogen  Amino acid  shock  shock  shock  stress  depletion  starvation  TFs  TFs  TFs  TFs  TFs  TFs  Arr1 Sfp1 Heat shock The predicted heat shock TFs Msn2, Msn4, Rpn4, Yap1 and Cad1 have solid or partial literature evidence. First, Msn2 and Msn4 bind DNA at stress response element (STRE) and activate many STRE-regulated genes in response to many stresses such as heat shock, oxidative shock and osmotic shock (Cherry et al. 1998). Second, it has been demonstrated that the heat shock TF Hsf1 co-ordinates a feed-forward gene regulatory circuit for RPN4 heat induction (Hahn et al. 2006). Third, Yap1 is known to induce the expression of GSH1 and GSH2 to synthesize glutathione in heat shock response (Suqiyama et al. 2000). Fourth, overexpression of Cad1 is known to increase thermo-tolerance of a cell under starvation conditions (Cherry et al. 1998).

Oxidative shock
The predicted oxidative shock TFs Sfp1, Yap1, Rpn4 and Rap1 have solid or partial literature evidence. First, Sfp1 is known to regulate ribosomal protein (RP) gene expression in response to various stresses such as oxidative shock and osmotic shock (Marion et al. 2004). Second, Yap1 is known to regulate genes that respond to oxidative shock. For example, Yap1 regulates TRX2, a cytoplasmic thioredoxin isoenzyme of the thioredoxin system which protects cells against oxidative stress (Güldener et al. 2005). Third, RPN4 promoter contains a Yap1 binding site (YRE) which is responsible for RPN4 induction in response to oxidative stress. (Hahn et al. 2006). Fourth, it is known that Rap1 signaling is required for suppression of Ras-generated reactive oxygen species and protection against oxidative stress (Remans et al. 2004).

Osmotic shock
The predicted osmotic shock TFs Msn2, Msn4, Sfp1 and Yap1 have solid or partial literature evidence. First, the msn2-msn4 double deletion mutants exhibit higher sensitivity to severe osmotic stress, indicating Msn2 and Msn4 are involved in response to osmotic stress (Cherry et al. 1998). Second, Sfp1 regulates RP gene expression in response to various stresses such as oxidative shock and osmotic shock (Marion et al. 2004). Third, the YAP4 gene, previously shown to play a role in response to hyperosmotic stress, is regulated by the transactivators Yap1 and Msn2 (Nevitt et al. 2004).

Acidic stress
The predicted acidic stress TFs Msn2, Msn4 and Rpn4 have solid or partial literature evidence. First, it is known that RGD1 is activated at low pH. The transcription level at low pH was demonstrated to depend on the STRE box located in the RGD1 promoter. The general stress-activated TFs Msn2 and Msn4 were shown to act mainly on the basal RGD1 transcriptional level in normal and stress conditions (Gatti et al. 2005). Second, under acute herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) stress, 14% of the yeast transcripts suffered a greater than twofold change. Most of the up-regulated genes in response to 2,4-D are known targets of Msn2, Msn4 and Rpn4 (Teixeira et al. 2006a).

Nitrogen depletion
The predicted nitrogen depletion TFs Sfp1, Ifh1, Fhl1, Rpn4 and Rap1 have partial literature evidence. First, ribosomal protein (RP) genes in eukaryotes are coordinately regulated in response to growth stimuli and environmental stress, thereby permitting cells to adjust ribosome number and overall protein synthetic capacity to physiological conditions. Sfp1, Fhl1 and Ifh1 are known to regulate RP gene expression in response to nutrient depletion (Cherry et al. 1998;Marion et al. 2004;Güldener et al. 2005). Second, on solid growth media with limiting nitrogen source, diploid budding-yeast cells differentiate from the yeast form to a fi lamentous, adhesive and invasive form. Both low availability of nitrogen and a solid growth substrate are required to induce diploid fi lamentous-form growth. It is known that Rpn4 regulates fi lamentous growth, indicating that Rpn4 is involved in response to nitrogen depletion (Prinz et al. 2004). Third, Rap1 is known to be involved in the regulation of nitrogen, sulfur and selenium metabolism (Güldener et al. 2005).

Amino acid starvation
The predicted amino acid starvation TFs Gcn4, Rap1 and Sfp1 have solid or partial literature evidence. First, Gcn4 is a well-known transcriptional activator of amino acid biosynthetic genes in response to amino acid starvation (Cherry et al. 1998). Second, HIS4 encodes an enzyme that catalyzes the histidine biosynthesis and Rap1 is required for the rapid increase in the HIS4 mRNA level following amino acid starvation (Devlin et al. 1991). Third, ribosomal protein (RP) genes in eukaryotes are coordinately regulated in response to growth stimuli and environmental stress, thereby permitting cells to adjust ribosome number and overall protein synthetic capacity to physiological conditions. Sfp1 is known to regulate RP gene expression in response to nutrient depletion (Cherry et al. 1998;Marion et al. 2004).

Discussions and Conclusions
In this study, we use a dynamic system model of gene regulation to describe the mechanism of how the stress-response TFs may control a stressresponse gene's expression. Based on the dynamic system model, we develop the Stress Regulator Identifi cation Algorithm (SRIA) to identify the stress-response TFs for each of six kinds of stresses. Some general stress-response TFs that respond to various stresses and some specific stress-response TFs that respond to a specifi c stress are identifi ed. For example, we found the wellknown general stress-response TFs Msn2, Msn4, Yap1, Rpn4 and Sfp1, consistent with the results of Saccharomyces Genome Database (SGD) (Cherry et al. 1998). Besides, the well-known heat The cellular responses to heat shock, osmotic shock and acidic stress may have regulatory cross-talks. They all trigger TFs Arr1, Msn2, Msn4 and Rpn4. Moreover, the cellular responses to oxidative shock, nitrogen depletion and amino acid starvation may have regulatory cross-talks. They all trigger TFs Arr1, Rap1, Rpn4 and Sfp1. shock TF Cad1, nitrogen depletion TF Fhl1 and amino acid starvation TF Gcn4 are identifi ed. The ability to identify these well-known stress-response TFs validates the effectiveness of SRIA. SRIA identifi ed 12 distinct TFs (Arr1, Cad1, Fhl1, Gcn4, Ifh1, Msn2, Msn4, Rap1, Rpn4, Sfp1, Sok2 and Yap1) to be in response to at least one of the six stresses under study (see Figure 2). This indicates that a small number of TFs may be suffi cient to control a wide variety of expression patterns in yeast under different stresses. Two implications can be inferred from this observation. First, the response mechanisms to different stresses may have a bow-tie structure (Csete and Doyle, 2004). As shown in Figure 2, the core stress-response TFs make up the 'knots' of a bow tie, facilitating the fan in of a large variety of environmental stresses through signal transduction pathways and fan out of an even larger variety of stress-response proteins through activating stressresponse target genes. Actually, approximately two- The true positive and false negative rates of SRIA using different cutoff thresholds are shown. When the cutoff threshold equals 5%, SRIA has the best performance in terms of the tradeoff between maximizing the true positive rate and minimizing the false negative rate to fi nd out the known amino acid starvation TFs. (The true positives are those known amino acid starvation TFs that are found by SRIA and the false negatives are those known amino acid starvation TFs that are not found by SRIA). thirds of the yeast genome (about 3600 genes) is involved in responding to the changes in environment (Causton et al. 2001). Second, there may exist regulatory cross-talks among different stress responses (see Figure 3). We found that heat shock, osmotic shock and acidic stress all can trigger TFs Arr1, Msn2, Msn4 and Rpn4, indicating that these three stresses may share a similar stress response mechanism. In addition, we found that oxidative shock, nitrogen depletion and amino acid starvation all can trigger TFs Arr1, Rap1, Rpn4 and Sfp1, indicating crosstalks among the cellular responses to these three stresses. The fact that different stress response mechanisms share some, but not all, of their regulators suggests a higher level of modularity of the yeast stress response network (Segal et al. 2003). In Step 3 of SRIA, only those stress-response TFs that are at the top 5% of the ranked list are classifi ed as the high-confi dence stress-response TFs and reported as the fi nal result. The reason for choosing only the top 5% of the ranked list is that when the cutoff threshold equals 5%, SRIA has the best performance in terms of the tradeoff between maximizing the true positive rate and minimizing the false negative rate to fi nd out the known amino acid starvation TFs (see Figure 4 for details).
The response mechanisms to stress are highly complex. They require a complex network of sensing and signal transduction leading to the adaptation of cell growth and proliferation along with the adjustments of the gene expression program, metabolic activities and other features of the cell (Hohmann and Mager, 2003). This study focused on the regulation of gene expression and proposed a network of stress-response regulators and their details of actions. Thus, it provides a starting point for understanding the adaptation mechanisms that yeast uses to survive some of the environmental stress conditions, experienced in the wild. We believe that as more gene expression data are accumulated, in combination with data from other whole-organism approaches, novel computational algorithms such as SRIA have the potential to construct a dynamic picture of the integrated cellular response of yeast cells to environmental changes.