Fuzzy Multiple Regression Model for Estimating Software Development Time

As software becomes more complex and its scope increases dramatically, research on methods for estimating software development time has grown steadily in importance; accurate estimation is a main goal of software managers seeking to reduce project risk. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach that is more accurate than the other estimation methods considered here. We also compare the Fuzzy Multiple Regression model with a Fuzzy Logic model and a Multiple Regression model on the basis of their accuracy.


Introduction
Many studies have already proposed models for size, effort, time and cost estimation; we consider only some of them here. Regression analysis is a classical statistical technique for building estimation models and is one of the most commonly used methods in econometric work. It is concerned with describing and evaluating the relationship between a dependent variable and one or more independent variables. The relationship is described as a model for estimating the dependent variable from the independent variables, and the model is built and evaluated by collecting sample data for these variables. This kind of model was first used for estimating the LOC of an information system (Kuan Tan et al. 2006). Boehm, the author of COCOMO, was the first researcher to look at software from an economic point of view. Putnam also developed a model known as SLIM, but both COCOMO and SLIM are based on linear regression techniques (Moataz et al. 2005).
Algorithmic models such as COCOMO have failed to present suitable solutions that take technological advancements into consideration. They are often unable to capture the complex set of relationships involved (e.g. the effect of each variable in a model on the overall prediction made using the model), they are not flexible enough to adapt to a new environment, and they cannot learn from previous knowledge. Parametric models also use a static predictive function for estimation; for example, COCOMO uses Effort = A · Size^B to estimate effort (Xia et al. 2005; Moataz et al. 2005; MacDonell 2003). Originally, estimation was performed using only human expertise, but more recently attention has turned to a variety of combined methods. Here we apply fuzzy concepts to the regression model and compare the results of the resulting models with each other.
The primary motivation of fuzzy set theory is the desire to build a formal quantitative structure capable of capturing the imprecision of human knowledge, that is, the manner in which knowledge is expressed in natural language. The theory seeks to bridge the gap between the traditional mathematical models needed for physical systems and the generally imprecise mental representation of such systems (Lima et al. 1999).
This paper is structured as follows. In section 2, the multiple regression equation is considered and its result on the data subset is shown; in section 3 we apply fuzzy logic to the same data subset. In section 4 we introduce a specific regression model with fuzzy concepts. In section 5, criteria for evaluating the models are introduced, and in section 6 we apply the models to the same data subset in order to compare them. Finally, conclusions are drawn in section 7.

Multiple Regression Equation
This model is the most common statistical technique for estimation. A linear equation with three independent variables, McCabe Complexity (MC), Dhama Coupling (DC) and physical Lines Of Code (LOC), and one dependent variable, Development Time (DT), may be expressed as (Cuauhtemoc et al. 2005): DT = b0 + b1·MC + b2·DC + b3·LOC, where b0, b1, b2 and b3 are obtained by solving a system of equations in which, for simplicity, x1 denotes MC, x2 denotes DC, x3 denotes LOC and y denotes DT. Using the sample data, the result for each module is organized in Table 4.
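The system of equations for the coefficients is not reproduced in this text. As a hedged reconstruction, assuming the coefficients are fitted by ordinary least squares over n modules (n and the summation over modules are notation introduced here), the model and its normal equations in the notation x1 = MC, x2 = DC, x3 = LOC, y = DT would read:

```latex
\begin{align}
y &= b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \\
\sum y &= n\,b_0 + b_1 \sum x_1 + b_2 \sum x_2 + b_3 \sum x_3 \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + b_3 \sum x_1 x_3 \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + b_3 \sum x_2 x_3 \\
\sum x_3 y &= b_0 \sum x_3 + b_1 \sum x_1 x_3 + b_2 \sum x_2 x_3 + b_3 \sum x_3^2
\end{align}
```

The last four equations form a linear system in four unknowns, so b0 to b3 follow from a single linear solve once the sums have been accumulated from the data.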

Estimating by Fuzzy Logic Rules
A fuzzy model, like any other model, provides a mapping from input to output. To obtain a fuzzy model, the verbal expert knowledge, based on the correlation (r) between pairs of variables, is first translated into if-then rules. Parameters of this structure, such as membership functions and rule weights, can then be tuned using input and output data. Correlation is the degree to which two sets of data are related to each other, and is defined as in Cuauhtemoc et al. (2005). The membership functions corresponding to Table 2 are shown in Fig. 1(a), 1(b), 1(c) and 1(d). Consequently, using the fuzzy rules and their membership functions, the estimated DT is given in Table 4.
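The defining equation for the correlation r is not reproduced in this text. Assuming it is the usual Pearson product-moment coefficient, a minimal sketch of the computation could look as follows; the function name `pearson_r` and the sample values are hypothetical and are not taken from the paper's tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not taken from Table 3).
loc = [10, 14, 21, 30, 42]
dt = [12, 15, 19, 25, 31]
print(round(pearson_r(loc, dt), 3))
```

A value of r close to 1 between an input metric and DT is the kind of evidence that would justify a rule such as "if Size is big then DT is high".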

Multiple Regression Model with Fuzzy Concepts
Fuzzy concepts help us to find the deviation of each data point from the fitted equation, so we define a normal-distribution (Gaussian) membership function in which μ is the average of the sample points and σ is the square root of their variance.
If we add a fuzzy domain to the regression method, the effect of scattered data points on the fitted result is reduced and the effect of concentrated data points on the fitted result is enhanced.
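To make this weighting effect concrete, assume the membership function takes the unnormalised Gaussian form below. This is an assumption, since the defining equation is not reproduced in this text, and m is a symbol introduced here for the membership value. Under that assumption a point at the sample mean receives full weight, while points one and two standard deviations away receive roughly 0.61 and 0.14:

```latex
m(y) = \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right), \qquad
m(\mu) = 1, \quad
m(\mu \pm \sigma) = e^{-1/2} \approx 0.607, \quad
m(\mu \pm 2\sigma) = e^{-2} \approx 0.135
```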
For each data point in Table 3, we obtain the membership value shown in column 7. A group of equations for obtaining the fuzzy parameters is given in Gu et al. (2006), where y is the Development Time (DT) of each project; here 41 projects are considered. The coefficient b0 is then calculated, and solving these equations gives the final estimation equation. The result of this method is presented in Table 4.
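The equations of Gu et al. (2006) are not reproduced in this text, so the following is only a sketch under stated assumptions: the membership of each project is taken to be an unnormalised Gaussian of its DT value, and the fuzzy parameters are taken to be the solution of membership-weighted least squares. All function names and the sample modules are hypothetical.

```python
import math


def gaussian_membership(values):
    """Membership exp(-(v - mean)^2 / (2 * variance)) for each value."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return [1.0] * n
    return [math.exp(-((v - mean) ** 2) / (2.0 * var)) for v in values]


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_regression(rows, ys, weights):
    """Return [b0, b1, ...] minimising sum(w * (y - b0 - sum(b_j * x_j))^2)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    # Weighted normal equations: (X^T W X) b = X^T W y.
    a = [[sum(w * d[i] * d[j] for d, w in zip(design, weights))
          for j in range(k)] for i in range(k)]
    b = [sum(w * d[i] * y for d, y, w in zip(design, ys, weights))
         for i in range(k)]
    return solve_linear_system(a, b)


# Hypothetical modules (MC, DC, LOC) and development times in minutes.
modules = [(1, 0.20, 10), (2, 0.50, 14), (3, 0.10, 9),
           (4, 0.33, 20), (5, 0.25, 31), (2, 1.00, 7)]
times = [13, 16, 15, 22, 30, 12]
weights = gaussian_membership(times)
print([round(b, 3) for b in weighted_regression(modules, times, weights)])
```

Setting every weight to 1 reduces this sketch to ordinary multiple regression, which makes the two models easy to compare on the same data.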

Evaluating Techniques
A common criterion, calculated for each observation i, is the Magnitude of Relative Error (MRE), defined as MRE_i = |Actual DT_i - Estimated DT_i| / Actual DT_i. Aggregating MRE over all N observations gives the Mean Magnitude of Relative Error (MMRE) (Burgess & Lefley, 2001): MMRE = (1/N) Σ MRE_i. A complementary criterion used here is Pred(20).
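These criteria can be sketched in a few lines, assuming the usual definitions: MRE as the absolute error divided by the actual value, MMRE as its mean, and Pred(l) as the fraction of observations whose MRE is at most l. The function names and the sample numbers are hypothetical and are not results from the paper.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one observation."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all observations."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def pred(actuals, estimates, level=0.20):
    """Fraction of observations whose MRE is at most `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)


# Hypothetical actual and estimated development times (minutes).
actual_dt = [10, 20, 40, 50]
estimated_dt = [11, 18, 30, 50]
print("MMRE:", mmre(actual_dt, estimated_dt))
print("Pred(20):", pred(actual_dt, estimated_dt, 0.20))
```

A lower MMRE and a higher Pred(20) both indicate a more accurate model, so the two numbers are best read together.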

Experimental Results
Multiple Regression, fuzzy rules system and fuzzy multiple regression are applied to the same data subset.

Conclusions and Future Research
The goal of this paper was to investigate models for estimating software projects. The techniques have been compared in terms of accuracy. The results on this data subset indicate that the fuzzy multiple regression model is more accurate than the linear regression equation and the fuzzy logic model. Ongoing research applies neural network models trained with the Bayesian Regularization algorithm to the same data subset, because this approach is more stable than fuzzy models whose membership functions have derivatives that are discontinuous at some points.
In general, Pred(l) = k/N, where k is the number of observations whose MRE is less than or equal to l (Cuauhtemoc et al. 2006), so Pred(20) gives the percentage of projects that were predicted with an MRE less than or equal to 0.20. In general, the accuracy of an estimation technique is proportional to Pred(20) and inversely proportional to MMRE (Xia et al. 2005).
Fig. 1. Membership functions for input and output
Fig. 2. Comparison between estimation models
Table 3. Modules description and metrics: MC (McCabe Complexity), DC (Dhama Coupling), LOC (Lines of Code), DT (Development Time, in minutes)

Table 1. Correlation between variables
The correlations between development time as the dependent variable and McCabe complexity, Dhama coupling and lines of code as the independent variables are organized in Table 1 (Cuauhtemoc et al. 2005).

Table 2. Membership Function Parameters
So the fuzzy rules were formulated as below:
1. If Complexity is low and Size (LOC) is small then DT is low
2. If Complexity is average and Size (LOC) is medium then DT is average
3. If Complexity is high and Size (LOC) is big then DT is high
4. If Coupling is low then DT is low
5. If Coupling is average then DT is average
6. If Coupling is high then DT is high
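The six rules above can be evaluated with a small Mamdani-style inference sketch. Everything numeric here is an assumption made for illustration: the triangular membership parameters in `TERMS` are hypothetical stand-ins for Table 2, the DT universe is taken to be 0 to 30 minutes, AND is modelled as min, rules with the same consequent are combined with max, and the output is defuzzified by centroid. The text does not state which inference method was actually used.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical membership parameters (NOT the values of Table 2).
TERMS = {
    "mc": {"low": (-5, 0, 5), "average": (0, 5, 10), "high": (5, 10, 15)},
    "loc": {"small": (-20, 0, 20), "medium": (0, 20, 40), "big": (20, 40, 60)},
    "dc": {"low": (-0.5, 0.0, 0.5), "average": (0.0, 0.5, 1.0),
           "high": (0.5, 1.0, 1.5)},
    "dt": {"low": (-15, 0, 15), "average": (0, 15, 30), "high": (15, 30, 45)},
}
DT_MAX = 30.0  # assumed upper bound of the DT universe, in minutes


def estimate_dt(mc, dc, loc, steps=300):
    """Estimate DT from MC, DC and LOC using the six rules."""
    def grade(var, term, x):
        return tri(x, *TERMS[var][term])

    # Rules 1-3 use AND (min); rules 4-6 share consequents with them (max).
    strength = {
        "low": max(min(grade("mc", "low", mc), grade("loc", "small", loc)),
                   grade("dc", "low", dc)),
        "average": max(min(grade("mc", "average", mc),
                           grade("loc", "medium", loc)),
                       grade("dc", "average", dc)),
        "high": max(min(grade("mc", "high", mc), grade("loc", "big", loc)),
                    grade("dc", "high", dc)),
    }
    # Centroid of the clipped, aggregated output set over [0, DT_MAX].
    num = den = 0.0
    for i in range(steps + 1):
        dt = DT_MAX * i / steps
        mu = max(min(strength[t], grade("dt", t, dt)) for t in strength)
        num += dt * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for this input")
    return num / den


print(round(estimate_dt(mc=3, dc=0.25, loc=12), 2))
```

With the real parameters of Table 2 substituted into `TERMS`, the same structure would produce one DT estimate per module.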