<SDML>

Current Research Projects

Source Code Representations: srcML

XML is used to augment source code with syntactic information from the parse tree, adding explicit structure to program source code. This underlying representation, srcML (Source Code Markup Language), supports lightweight fact extraction, source code transformation, and source code difference analysis. The src2srcml translator works for C, C++, C#, and Java. It is robust and efficient, running faster than compilers on the same input. The translator is used by a large number of researchers, and a commercial version is in use to support software porting. srcML also forms the basis for an efficient syntactic differencing method, which is being used to analyze differences between software versions.
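To make the idea of lightweight fact extraction concrete, the following is a minimal sketch in Python. The markup is a hand-written approximation of srcML-style output for two small C++ functions, not actual src2srcml output, and the namespace URI and the helper names `function_names` and `source_text` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed srcML namespace URI; check it against real translator output.
SRC_URI = "http://www.srcML.org/srcML/src"
NS = {"src": SRC_URI}

# Hand-written approximation of srcML-style markup (not real tool output).
MARKUP = """<unit xmlns="http://www.srcML.org/srcML/src" language="C++">
<function><type><name>int</name></type> <name>add</name><parameter_list>(<parameter><decl><type><name>int</name></type> <name>a</name></decl></parameter>, <parameter><decl><type><name>int</name></type> <name>b</name></decl></parameter>)</parameter_list> <block>{ <return>return <expr><name>a</name> + <name>b</name></expr>;</return> }</block></function>
<function><type><name>void</name></type> <name>reset</name><parameter_list>()</parameter_list> <block>{ }</block></function>
</unit>"""


def function_names(markup):
    """Return the names of all functions found in the markup."""
    root = ET.fromstring(markup)
    names = []
    for fn in root.iter("{%s}function" % SRC_URI):
        # The direct <name> child is the function name; the return type's
        # name is nested inside <type> and is therefore skipped.
        name = fn.find("src:name", NS)
        if name is not None:
            names.append("".join(name.itertext()))
    return names


def source_text(markup):
    """Recover the original source text by dropping all tags."""
    return "".join(ET.fromstring(markup).itertext())


print(function_names(MARKUP))
print(source_text(MARKUP))
```

Because the tags only wrap the original text, stripping them recovers the source unchanged, which is what allows transformations to be applied to the XML and then written back out as code.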

srcML

Software Traceability

A framework, Traceability+, is being developed to provide a variety of services that meet a diverse set of stakeholder needs. A model of artifact-to-artifact traceability is used to support different types of links. The framework supports the maintenance of existing links as well as the recovery and automatic identification of links from artifacts. A traceability query language (TQL) that leverages XML technologies is also being developed. TQL includes a number of primitive functions from which higher-order queries can be built to address different stakeholder needs.

Automatic Identification of Stereotypes

Method stereotypes are being automatically identified in C++ source code. srcML is used to perform pattern matching and lightweight static analysis to classify each method with a particular stereotype. The stereotype classification was developed through empirical analysis of existing open source software.

Source Code Analyses for Evolving C++ Generic Libraries

C++0x concepts are a new language feature that extends C++ templates to better support the generic programming paradigm. The evolution of C++ generic libraries to C++0x requires the analysis of existing source code and the application of concept constraints to template parameters. Reverse engineering and source code analyses are applied to class and function templates to automatically identify constraints on template parameters.

Reverse Engineering UML Class Models

Design-level UML class diagrams are reverse engineered for C++ and include association, using, and generalization relationships. Source code analyses are applied to help infer information about class attributes from sets of accessor and mutator functions.

UML Class Diagram Layout Adjustment Techniques and their Empirical Assessment

This research investigates the effectiveness of UML class diagram layout techniques. Specifically, the architectural importance of classes is used to guide layout. Architectural importance in a UML class diagram is defined by control, boundary, and entity class stereotypes. Stereotypes are used in class diagram layouts to aid in understanding the roles and responsibilities of classes, helping users augment their mental models of software and thereby supporting software evolution. The first step is to assess different layout strategies through empirical studies that use conventional methods such as questionnaires as well as eye trackers. The empirically validated layout strategies will then be built into a tool that software maintainers can use to automatically convert a class diagram into the new layout. In addition, eye tracking data from the empirical studies will be used to provide objective metrics for measuring the effectiveness of UML class diagram layouts.

Information Retrieval for Software Engineering

This research investigates the combined use of semantic and structural information about programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain-specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval (IR) method, latent semantic indexing (LSI), is used to define a semantic similarity measure between software components.
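As an illustration of how LSI can yield a similarity measure between components, the sketch below builds a toy term-by-document matrix, reduces it with a truncated singular value decomposition, and compares components by cosine similarity in the reduced space. The terms, component names, counts, and the choice of k = 2 are all invented for the example, and NumPy is assumed to be available; this is not the project's actual pipeline.

```python
import numpy as np

# Hypothetical vocabulary and software components (columns of the matrix).
terms = ["open", "file", "read", "socket", "send"]
components = ["file_reader", "file_loader", "net_sender"]

# Toy term-by-document matrix: entry [i, j] counts term i in component j.
A = np.array([
    [1, 1, 0],  # open
    [1, 1, 0],  # file
    [1, 0, 0],  # read
    [0, 0, 1],  # socket
    [0, 0, 1],  # send
], dtype=float)


def lsi_document_vectors(matrix, k):
    """Project each document (column) into a k-dimensional LSI space."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; the result has one row per document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


vectors = lsi_document_vectors(A, k=2)
sim_file = cosine(vectors[0], vectors[1])  # file_reader vs. file_loader
sim_net = cosine(vectors[0], vectors[2])   # file_reader vs. net_sender
print(sim_file, sim_net)
```

In this toy data the two file-handling components share vocabulary and end up close together in the reduced space, while the networking component shares no terms with them and stays far away.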

semD Analysis

Mining Software Repositories

Data mining techniques are being used to analyze multiple versions of a software system's history to uncover latent trends and evolutionary couplings. The techniques have been applied to the problems of defect identification, software localization, and traceability link recovery.
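One simple way to picture an evolutionary coupling is as a pair of files that repeatedly change together. The sketch below counts such co-changes over a hypothetical list of commits and keeps the pairs that meet a minimum support threshold. The file names, the commit data, and the threshold are invented for illustration, and plain co-change counting stands in here for whatever mining technique the projects actually use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical change sets: each set lists the files touched by one commit.
COMMITS = [
    {"parser.cpp", "parser.hpp"},
    {"parser.cpp", "parser.hpp", "lexer.cpp"},
    {"lexer.cpp"},
    {"parser.cpp", "parser.hpp"},
]


def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    counts = Counter()
    for files in commits:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return counts


def coupled_pairs(commits, min_support=2):
    """Keep only the pairs that changed together at least min_support times."""
    counts = co_change_counts(commits)
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = coupled_pairs(COMMITS)
print(pairs)
```

Raising `min_support` filters out incidental co-changes, at the cost of missing couplings between files that change rarely.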

Software Visualization

A number of projects address the visualization of large-scale software systems to support their understanding and analysis. For example, Source Viewer 3D (sv3D) is a software visualization framework that builds on the SeeSoft metaphor and brings a number of enhancements and extensions over SeeSoft-type representations. In particular, it creates 3D renderings of the raw data, and various artifacts of the software system and their attributes can be mapped to the 3D metaphors at different abstraction levels. Other work uses focus+context metaphors (i.e., onion graphs) to view UML class diagrams. Another project is MosaiCode, which supports multiple coordinated views of large software systems. It leverages the SeeSoft software visualization metaphor to map directly from the visual metaphor back to the source code (or data), which leads to natural navigation of the representation. Color and pixel maps are used to represent physical software concepts such as lines of code, functions, files, and subsystems. The metaphor is 2D and is fairly scalable.

Previous Work

Topics investigated in the past but no longer of prime interest.

Data Cleansing