Data Mining for Smarter Project Management: a SEKE 2003 tutorial

Tutorial I, Monday, 1:30pm to 5pm, June 30, 2003

Tim Menzies

Software Engineering Research Chair

NASA Independent Software Verification and Validation

Facility and Department of Computer Science & EE

West Virginia University


Gary D. Boetticher

Department of Software Engineering

University of Houston-Clear Lake

2700 Bay Area Boulevard

Houston, TX 77058

The bad news is that despite decades of research and practical experience:

         most software is not a success,

         most software takes too long to build,

         most software goes over budget, and

         many software projects never even complete.

This isn't good enough, and we must do better. This tutorial explains how to use data mining to help build systems that support classification, prediction, diagnosis, planning, monitoring, requirements engineering, validation, and maintenance. A range of case studies will cover software fault estimation, software effort estimation, software risk reduction, and other aspects of software engineering.

Two important aspects of this tutorial are live demos and the use of freely available data mining tools. Rather than just telling the audience how data mining applies to software engineering, we prefer to show several live data mining demonstrations that use industrial data to illustrate the construction of effort estimators, reuse assessors, and defect predictors. Furthermore, all demonstrations use simple, powerful data mining tools that are freely available on the Web.

Who should attend: This tutorial is oriented toward industrial practitioners. The material is suitable for the novice software engineer, the graduate student, or the technical manager of software engineering projects. It is also appropriate for software engineering theoreticians interested in assessing data mining techniques within software engineering.

Exit skills: We expect participants to leave with the following skills and knowledge:

         assess their organizations for opportunities to apply data mining tools.

         explain some of the algorithmic underpinnings of data mining tools, and know which tools suit which types of software engineering problems.

         secure freely available data mining tools from the Web.

         realize how quickly an automated learner may be constructed by applying data mining tools.

         use one or more case studies as mileposts when applying data mining tools.

Required Background: This tutorial assumes no background knowledge in data mining.

The presenters:

         Dr. Menzies has a long background in practical applications of artificial intelligence. He was the author of Australia's first-ever exported expert system (1987). Since 1998, Dr. Menzies has been consulting with NASA on applying machine learning techniques to software engineering problems. He has published 88 articles. In his current research, Dr. Menzies uses machine learning to find the average-case behavior of inferences over the space of uncertainties that represents the options within procedural and declarative software. Dr. Menzies holds a Ph.D. in artificial intelligence (1995), a master's degree in cognitive science (1988), and an undergraduate degree in computer science, all from the University of New South Wales, Sydney, Australia.


         Professionally, Dr. Boetticher has 19 years of consulting experience (The U.S. Olympic Committee, NASA, LDDS WorldCOM, Bailey Network Management, Mellon Mortgage) in building and applying machine learners to solve software engineering problems (reuse, effort estimation). In the late nineties, Dr. Boetticher served on the executive committee of an IEEE Software Engineering Standards Committee group (the Reuse Interoperability Group) to establish industry reuse standards. His company co-hosted the inaugural meeting of the IEEE committee to define reuse process standards and reuse domain analysis standards, and to support SPICE standards in Europe. Academically, Dr. Boetticher has 12 years of experience. He is currently a faculty member at the University of Houston-Clear Lake. Dr. Boetticher holds a Ph.D. in Computer Science (machine learning, software metrics, and software reuse) from WVU.

For more information on the authors see : and