When to Use Data from Other Projects for Effort Estimation

When to Use Data from Other Projects for Effort Estimation      引言相关知识实验方法实验结果讨论 2 引言引言 应用情况：当一个项目的内部数据不足以用来做工作量估算时，使用其他项目的引进数据。 数据来源：PROMISE repository [软件工程中的预测模型] 类似做法：软件缺陷检测 结果：表明假定在估算之前将相关过滤器应用到数据里，使用跨项目数据和使用项目内部的数据来做估算的估算精度差不多相关知识相关知识 cross project/data within project/data 类比估算(analogy-based estimation methods,ABE0)：通过把新项目和过去类似项目做比较来对新项目进行准确的工作量估算。相似度计算： n distance  ( x  y ) 2 i i i 1 相关知识 ABE0: 1. 利用过去的项目建立训练集 2. 训练集中包含自变量和因变量，其中自变量是值定义项目的特征，因变量是指工作量 3. 决定使用多少个相似项目（analogies）来进行一个新实例的估算，即k-values 4. 对于每个新的测试实例，从训练集中找出k个相似项目 ① 在选择相似项目的过程中，使用一种相似性度量 ② 在计算相似性之前，给每个自变量设定一个权重 5. 使用最相近的k个相似项目来估算工作量相关知识 Feature1 Feature2 …… FeatureN Effort value Project1 Project2 …… ProjectK independent variables(自变量): the features that define projects dependent variables(因变量) : the recorded effort value 相关知识 ABE0+Relevancy Filtering:  Step1 removes the training instances implicated in poor decisions;  Step 2 selects those instances nearest the test instance. 4/13/2015 相关知识 ABE0+Relevancy filtering: ABDEFGH AB A DE B D FGH E FG F H G 相关知识 ABE0+Relevancy filtering: The variance of the effort values in each sub-tree (the performance variance) is then recorded and normalized to a 0-1 interval.（将差异归一化） Step one prunes all sub-trees with a variance greater than 10% of the maximum variance seen in any tree.（剪枝） ABDEFGH AB A DE B D FGH E FG F H G 用方差作为决策准则（decision criterion） IF 当前树的差异>其子树的差异继续向下移动 ELSE 用当前树的实例作为相关实例，并用此子树做估算 FGH FG F 4/13/2015 H G 实验方法实验方法数据源： Nasa93、 Cocomo81、 Desharnais dataset（X） Nasa93 subset(Xi) instance Nasa93c1 Instance 1，……， instance k Nasa93c2 Nasa93c5 Coc81o Cocomo81 Coc81e Coc81s DesL1 Desharnais DesL2 DesL3 4/13/2015 实验方法判断条件1-Mann-Whitney test 判断条件2 win Significant different Lower MRE loss Significant different Higher MRE tie Not Significant different 曼-惠特尼U检验（Mann-Whitney test）：它假设两个样本分别来自除了总体均值以外完全相同的两个总体，目的是检验这两个总体的均值是否有显著的差别。实验方法 For within experiments Leave-one-out method：在Xi的n个实例中选一个实例作为测试集，其他n-1个实例作为训练集。相关过滤分别应用在X1, 
X2和X3里，将训练集里面的中值作为测试实例的的工作量估算值。 For the cross experiments 选择X1, X2和X3中的其中之一作为测试集（test set），剩下的两个作为训练集（ cross dataset）。将相关过滤（relevancy filtering）应用在训练集中，将对测试集的估算记录下来。 4/13/2015 实验方法 For within experiments： Xi ：测试集 For the cross experiments：：训练集 X3 X1 X2 ：测试集：训练集实验方法 •Without Relevancy Filtering 线性回归(linear regression)模型 -Within experiment -Cross experiment With Relevancy Filtering 重复预测20次 -Within experiment -Cross experiment 4/13/2015 实验方法实验结果实验结果 Without Relevancy Filtering In the absence of relevancy filtering , the within datasets yield significantly lower MRE values in majority of cases 实验结果 With Relevancy Filtering 由图可以看出,至少75%的实验两种方法(within和cross)表现相当实验结果 With Relevancy Filtering 由图可以看出,2/3的实验中两种方法(within和cross)表现相当；但是，用Coc81o做测试集时，within方法13次优于cross方法(原因不明) 实验结果 With Relevancy Filtering 由图可以看出,2/3的实验中两种方法(within和cross)表现相当；但是，用DesL3做测试集时，within方法16次优于cross方法(原因不明) 讨论讨论图中表示每次选择的相似项目（analogies）的数目很小：均值为3 讨论 It would also lead to (a)more accurate filtering techniques; (b)a better understanding of the structure of software projects including where to find data most relevant to some current project. 4/13/2015 THE END Thank You~ 4/13/2015
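The ABE0 steps listed under Background can be sketched in Python. This is a minimal sketch, not the authors' implementation: the min-max normalization, the uniform default weights and the use of the median of the k analogies as the estimate are assumptions made for illustration, and the function names (`minmax_normalize`, `distance`, `abe0_estimate`) are hypothetical.

```python
import math
from statistics import median


def minmax_normalize(rows):
    """Assumed preprocessing: scale each feature column to [0, 1].

    Columns holding a single constant value are mapped to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def distance(x, y, weights=None):
    """Weighted Euclidean distance between two feature vectors.

    weights=None gives every feature weight 1, which reduces to the
    unweighted distance formula."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))


def abe0_estimate(train, test_features, k=3, weights=None):
    """Estimate effort for one test project (hypothetical helper).

    train: list of (features, effort) pairs built from past projects.
    The k closest training projects are the analogies; returning their
    median effort is an assumed aggregation choice."""
    ranked = sorted(train, key=lambda row: distance(row[0], test_features, weights))
    analogies = ranked[:k]
    return median(effort for _, effort in analogies)


train = [([0.1, 0.2], 10.0), ([0.9, 0.8], 50.0), ([0.2, 0.1], 12.0), ([0.15, 0.25], 11.0)]
print(abe0_estimate(train, [0.12, 0.2], k=3))  # prints 11.0
```

Here the three nearest projects have efforts 10, 11 and 12, so the estimate is their median; the distant project with effort 50 is ignored.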
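The two relevancy-filtering steps described under Background can be sketched as follows, under several loudly labeled assumptions. The slides do not say how the tree is built, so this sketch uses greedy agglomerative clustering on centroid distance. Step 1 is read as "keep an instance only if some internal sub-tree containing it has normalized variance of at most 10%", and Step 2 descends into the child whose centroid is nearest the test instance. This is one possible reading, not the paper's exact procedure, and every name here (`Node`, `build_tree`, `prune`, `descend`, `filtered_estimate`) is hypothetical.

```python
import math
from dataclasses import dataclass, field
from statistics import median, pvariance


@dataclass
class Node:
    rows: list                                    # (features, effort) pairs in this sub-tree
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def variance(self):
        # Performance variance: population variance of the sub-tree's efforts.
        return pvariance([effort for _, effort in self.rows])

    @property
    def centroid(self):
        return [sum(col) / len(col) for col in zip(*(f for f, _ in self.rows))]


def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_tree(rows):
    """Assumed tree builder: repeatedly merge the two closest clusters."""
    clusters = [Node([row]) for row in rows]
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: euclid(clusters[p[0]].centroid, clusters[p[1]].centroid))
        merged = Node(clusters[i].rows + clusters[j].rows, [clusters[i], clusters[j]])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0]


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def prune(root, alpha=0.10):
    """Step 1 (one reading): normalize each sub-tree's variance by the maximum
    and keep only instances that sit in some internal sub-tree whose
    normalized variance is <= alpha."""
    max_var = max(n.variance for n in walk(root)) or 1.0  # avoid dividing by zero
    kept, seen = [], set()
    for node in walk(root):
        if node.children and node.variance / max_var <= alpha:
            for row in node.rows:
                if id(row) not in seen:
                    seen.add(id(row))
                    kept.append(row)
    return kept


def descend(root, test_features):
    """Step 2: move to the child nearest the test instance while the current
    tree's variance exceeds that child's; otherwise return the current
    tree's instances as the relevant ones."""
    node = root
    while node.children:
        child = min(node.children, key=lambda c: euclid(c.centroid, test_features))
        if node.variance > child.variance:
            node = child
        else:
            break
    return node.rows


def filtered_estimate(train, test_features, alpha=0.10):
    survivors = prune(build_tree(train), alpha) or train  # fall back if nothing survives
    relevant = descend(build_tree(survivors), test_features)
    return median(effort for _, effort in relevant)
```

With training data `[([0.0], 10.0), ([0.1], 10.0), ([5.0], 10.0), ([5.1], 200.0)]`, the pair at 5.0/5.1 mixes very different efforts, so under this reading both of its instances are pruned in Step 1 and only the two low-variance instances remain.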
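The within and cross protocols described under Experimental Method can be sketched as two loops that return one MRE value per test instance. The estimator below, `median_estimator`, is a hypothetical placeholder that returns the median effort of the training rows; in the experiments a relevancy-filtered estimator would be passed in its place. Function names are illustrative, not taken from the slides.

```python
from statistics import median


def median_estimator(train, features):
    """Hypothetical placeholder: median effort of the training rows.
    A relevancy-filtered estimator would be passed in its place."""
    return median(effort for _, effort in train)


def mre(actual, predicted):
    # Magnitude of relative error; undefined when actual == 0.
    return abs(actual - predicted) / actual


def within_mres(subset, estimate=median_estimator):
    """Within experiment on one subset Xi (leave-one-out): each instance is
    the test set once and the other n-1 instances form the training set."""
    results = []
    for i, (features, actual) in enumerate(subset):
        train = subset[:i] + subset[i + 1:]
        results.append(mre(actual, estimate(train, features)))
    return results


def cross_mres(subsets, test_index, estimate=median_estimator):
    """Cross experiment: subsets[test_index] is the test set; the remaining
    subsets are pooled into the cross training set."""
    test = subsets[test_index]
    train = [row for j, s in enumerate(subsets) if j != test_index for row in s]
    return [mre(actual, estimate(train, features)) for features, actual in test]


X1 = [([0], 10.0), ([1], 20.0), ([2], 30.0)]
print(within_mres(X1))  # prints [1.5, 0.0, 0.5]
```

Passing the estimator as a parameter keeps the protocol separate from the estimation method, so the same loops can compare runs with and without relevancy filtering.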
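The win/tie/loss rule in the comparison-criteria table can be sketched with a hand-written Mann-Whitney U test (normal approximation, average ranks for ties, no tie or continuity correction), so only the standard library is needed. The 0.05 significance level and the use of the median MRE to decide "lower" versus "higher" are assumptions; the slides do not state them. The function names are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation
    (no tie or continuity correction; suited to moderate sample sizes)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def win_tie_loss(mres_a, mres_b, alpha=0.05):
    """Outcome for method A against method B, given their MRE samples."""
    if mann_whitney_p(mres_a, mres_b) >= alpha:
        return "tie"  # not significantly different
    # Significantly different: "lower MRE" judged by the median (assumption).
    return "win" if median(mres_a) < median(mres_b) else "loss"
```

For real analyses a library implementation with exact p-values for small samples would be preferable; the hand-written version only shows how the rule in the table turns two MRE samples into a single outcome.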
