1 year fellowship for a doctoral researcher on program analysis

Wim Vanhoof wva at info.fundp.ac.be
Do Jul 31 19:39:50 CEST 2008


(apologies for multiple postings)

---------------------------------------------------------------------

The University of Namur, Belgium, is seeking a young researcher
(no PhD) for a 1 year research project entitled

    "Automatic program analysis for refactoring logic programs".

A short description follows. Interested candidates should send their CV 
before September 15, 2008 to Wim Vanhoof (wva at info.fundp.ac.be).


---------------------------------------------------------------------


Program refactoring [3] is the process of systematically changing the 
structure of a
program without changing its semantics. The goal of refactoring is to 
improve the design
of the code after it has been written, in order to facilitate 
maintenance (including further
development) of the software. Emerged from the OO and XP communities, 
program
refactoring has recently gained attention in the fields of functional 
[8] and logic
programming [11]. Within the software engineering community, the process of
refactoring is considered important and has been identified as central 
to software
development and maintenance [3].

At the basis of the refactoring process is a catalogue of available 
source-to-source
transformations - the so-called refactorings. For each refactoring, a 
set of conditions is
specified under which the transformation is correct in the sense that it 
preserves the
semantics of the program. The activity of refactoring consists then in 
repeatedly searching
through the source code, looking for a code fragment of which the design 
could be
improved by a particular refactoring from the catalogue. The refactoring 
is subsequently
applied, and the whole process is repeated. Although each transformation 
can have an
impact (positive or negative) on the performance of the program, the 
primary aim of each
transformation is to improve the readability and maintainability of the 
code.
Even if refactoring is basically a manual process that is performed by 
the programmer, the
need for automation is recognized due to the time-consuming and 
error-prone nature of
the refactoring activity [3, 8, 10]. Although tools do exist that aid 
the developer with
performing a particular refactoring on a selected fragment of source 
code - an example
being the Refactoring Browser that was developed for Smalltalk [10] - 
identifying where
to perform (a particular) refactoring in the code remains essentially a 
non-trivial, creative
and therefore manual process.

Although a substantial number of different refactorings has been 
identified in the
literature, the primary indication for where to perform refactoring 
seems to be the
presence of duplicated code [3]. The notion of duplicated code is not 
limited to literally
copied code, but refers to fragments of code that have the same 
input/output behaviour.
The goal of refactoring is then to transform the source code in such a 
way that the
duplication is removed. This generally requires to extract the 
duplicated input/output
behaviour into a new subroutine (be it a method, function or predicate), 
which may
require a generalisation of the concerned code fragments and a possible 
reorganisation of
the code as a whole [3, 11].

Code duplication can be caused by a number of reasons. First of all, 
unfamiliarity of the
developer with the existing code body may result in reimplementation of 
routines that
already exist in the application under development. Second, the “copy 
and paste”
technique is commonly used when the existing functionality has to be 
slightly adapted.
Even if in this case one usually does not end up literally duplicating 
input/output
behaviour, the changes introduced by adaptation are usually relatively 
minor and a
generalization of the original and the adapted fragments could be often 
proposed. Finally,
code duplication might result from a polyvariant program analysis.

The problem of deciding whether two code fragments implement the same 
functionality is
well-known to be undecidable. Nevertheless, automatic program analysis 
techniques have
been developed, notably in the context of imperative and object-oriented 
languages, that
are capable to detect such equivalence under particular circumstances 
and within a certain
error margin, e.g. [1, 5, 6, 7, 9]. Also related is the work on 
plagiarism detection for
programs written in such languages, e.g. [4, 14, 13].

In this project, we will investigate the automatic detection of 
duplicated code in
declarative programming languages, in particular logic programming 
languages such as
Prolog and Mercury. Declarative languages allow the developer to program 
at a much
higher level of abstraction than it is the case with most imperative 
languages. In a
declarative language, one describes properties of the desired solution 
rather than the
actual algorithm that should be used to find the solution.

We aim to develop an analysis that allows to find, without user 
intervention, duplicated
code fragments into a logic program and we will study if and how the 
results of this
analysis can be used to drive a number of refactorings to remove the 
unwanted
duplication from the program. These refactorings include predicate 
extraction (replacing
duplicated code fragments by a call to a newly defined predicate), the 
removal of
predicates implementing the same relation as another predicate and the 
generalization of
duplicated code fragments into calls to newly generated higher-order 
predicate [11, 12].
The development of such a duplicated code analysis for logic programs, 
and its
integration with refactoring, presents some interesting research 
opportunities. Firstly, the
declarative nature of logic programs makes it not straightforward to 
adapt the methods
developed in an imperative (or object-oriented) setting. Secondly, the 
fact that logic
programs have a small and formally well-defined semantics and use an 
explicit symbolic
data representation makes the use of advanced analyses possible. 
Therefore it might well
be possible to obtain more fine-grained results than is the case for 
imperative languages.
Finally, it might be worthwhile to investigate how the developed 
analysis could be used in
the context of plagiarism detection for logic programs.

References
----------
[1] B. S. Baker. On finding duplication and near-duplication in large 
software systems. In Proc. Second
IEEE Working Conference on Reverse Engineering, pages 86-95, July 1995.
[2] F. Degrave and W. Vanhoof. Towards a normal form for Mercury 
programs. In A. King, ed. LOPSTR
2007, volume 4915 of Lecture notes in computer science, pages 43-58. 
Springer-Verlag, 2007.
[3] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring 
: Improving the Design of
Existing Code. Object Technology Series. Addison-Wesley, 1999.
[4] S. Horwitz. Identifying the semantic and textual differences between 
two versions of a program. ACM
SIGPLAN Notices, 25(6) :234-245, 1990.
[5] T. Kamiya, S. Kusumoto, and K. Inoue. Ccfinder : A multilinguistic 
token-based code clone detection
system for large scale source code. IEEE Trans. Software Eng., 28(7): 
654-670, 2002.
[6] R. Komondoor and S. Horwitz. Using slicing to identify duplication 
in source code. In Static Analysis
Symposium, pages 40-56, 2001.
[7] K. Kontogiannis, R. de Mori, E. Merlo, M. Galler, and M. Bernstein. 
Pattern matching for clone and
concept detection. Autom. Softw. Eng., 3(1/2) :77-108, 1996.
[8] H. Li, C. Reinke, and S. Thompson. Tool support for refactoring 
functional programs. In J. Jeuring,
editor, ACM SIGPLAN 2003 Haskell Workshop. ACM 2003.
[9] J. Mayrand, C. Leblanc, and E. Merlo. Experiment on the automatic 
detection of function clones in a
software system using metrics. In Intl. Conf. on Software Maintenance, 
pages 244-253, 1996.
[10] D. Roberts, J. Brant, and R. E. Johnson. A refactoring tool for 
Smalltalk. Theory and Practice of Object
Systems (TAPOS), 3(4) :253-263, 1997.
[11] A. Serebrenik, T. Schrijvers and B. Demoen. Improving Prolog 
programs: Refactoring for Prolog.
Theory and practice of logic programming (Accepted) 2008.
[12] W. Vanhoof. Searching semantically equivalent code fragments in 
logic programs. In S. Etalle, editor,
LOPSTR 2004, volume 3573 of Lecture notes in computer science, pages 
1-18. Springer-Verlag, 2005.
[13] J. Winstead and D. Evans. Towards differential program analysis. In 
Proceedings of the 2003
Workshop on Dynamic Analysis, 2003.
[14] W. Yang. Identifying syntactic differences between two programs. 
Software Practice and Experience,
21(7): 739-755, 1991.





Mehr Informationen über die Mailingliste IFI-CI-Event