Chapter 1 - Introduction
1.5 Thesis Outline
Chapter 2 - Literature Survey on Traceability
2.1 Traceability Reference Models and Meta-Models
2.2 Traceability Approaches to Capture Trace Relations
2.2.1 Formal Approaches
2.2.2 Process Oriented Approaches
2.2.3 Information Retrieval Approaches
2.2.4 String Matching Approaches
2.2.5 Rule Based Approaches
2.2.6 Run-time Approaches
2.2.7 Hypermedia and Information Integration Approaches
2.3 Representation, Recording and Maintenance of Traceability Relations
2.4 Visualisation of Traceability Relations
2.5 Use of Traceability Relations
2.6 Traceability Approaches for Multi-Agent Systems
2.7 Performance Measures
2.8 Implication of tools that infer trace relations
Chapter 3 - Traceability Reference Model
3.1 Overview of the Reference Model
3.2 Multi-agent Oriented Artefacts
3.2.1 i* Framework
3.3 Traceability Relations
3.3.1 Traceability Relations between i* and Prometheus
3.3.2 Traceability Relations between Prometheus and JACK
Chapter 4 - Traceability Framework
4.1 Overview of the Framework
4.2 Traceability and Completeness Checking Rules
4.3 Extended Functions
4.3.1 Completeness checking functions
4.3.2 XQuery functions
4.4 Retratos Tool
Chapter 5 - Evaluation and Results
5.1 Criteria for Evaluation
5.2 Automatic Teller Machine
5.2.1 Overview of the Case Study
5.3 Air Traffic Control Environment
5.3.1 Overview of the Case Study
5.4 Electronic Bookstore
5.4.1 Overview of the Case Study
5.6 Threats to Validity
Chapter 6 - Conclusion and Future Works
6.1 Overall Conclusions
6.5 Future Work
6.5 Final Remarks
Appendix A - Extended Functions
A.1.1 Completeness checking functions
A.1.2 XQuery functions
A.1.4.1 ActorHasCapability function
A.1.4.2 FieldTokenizer function
A.1.4.3 GetAttributeValue function
A.1.4.4 GetIncludedFields function
A.1.4.5 GetInformationCarried function
Appendix B - Automated Teller Machine
B.2 Organizational Models
B.3 Prometheus Models
B.4 JACK Code
B.5 JACK Code in XML
Appendix C - Air Traffic Control Environment
C.2 Organizational Models
C.3 Prometheus Models
C.4 JACK Code
C.5 Code in XML
Appendix D - Electronic Bookstore Case Study
D.1 JACK Agent vs Prometheus Goal
D.2 JACK Agent vs Prometheus Role
D.3 JACK Agent vs Prometheus Agent
D.4 JACK Agent vs Prometheus Capability
D.5 JACK Agent vs Prometheus Plan
D.6 JACK Agent vs Prometheus Percept
D.7 JACK Agent vs Prometheus Action
D.8 JACK Agent vs Prometheus Message (sends)
D.9 JACK Agent vs Prometheus Message (receives)
D.10 JACK Agent vs Prometheus Data (uses)
D.11 JACK Agent vs Prometheus Data (creates)
D.12 JACK Plan vs Prometheus Goal
D.13 JACK Plan vs Prometheus Role
D.14 JACK Plan vs Prometheus Agent
D.15 JACK Plan vs Prometheus Capability
D.16 JACK Plan vs Prometheus Plan
D.17 JACK Plan vs Prometheus Percept
D.18 JACK Plan vs Prometheus Action (Sends)
D.19 JACK Plan vs Prometheus Message (Sends)
D.20 JACK Plan vs Prometheus Message (Receives)
D.21 JACK Plan vs Prometheus Data (Uses)
D.22 JACK Plan vs Prometheus Data (Creates)
D.23 JACK BeliefSet vs Prometheus Role (Creates)
D.24 JACK BeliefSet vs Prometheus Role (Uses)
D.25 JACK BeliefSet vs Prometheus Agent (Creates)
D.26 JACK BeliefSet vs Prometheus Agent (Uses)
D.27 JACK BeliefSet vs Prometheus Capability (Creates)
D.28 JACK BeliefSet vs Prometheus Capability (Uses)
D.29 JACK BeliefSet vs Prometheus Plan (Creates)
D.30 JACK BeliefSet vs Prometheus Plan (Uses)
D.31 JACK BeliefSet vs Prometheus Data
D.32 JACK Event vs Prometheus Agent (sends)
D.33 JACK Event vs Prometheus Agent (receives)
D.34 JACK Event vs Prometheus Capability (sends)
D.35 JACK Event vs Prometheus Capability (receives)
D.36 JACK Event vs Prometheus Plan (sends)
D.37 JACK Event vs Prometheus Plan (receives)
D.38 JACK Event vs Prometheus Message
Appendix E - Introduction to BDI architecture
E.1 Agent Architectures
E.2 BDI Architecture
Appendix F - Traceability Relations between i* and Prometheus
Appendix G - Traceability Relations between Prometheus and JACK
Figure 3.1 SD model
Figure 3.2 Strategic Dependency Diagram for the Electronic Bookstore
Figure 3.3 Strategic Rationale Diagram for the Electronic Bookstore actor
Figure 3.4 Prometheus methodology phases
Figure 3.5 Goal diagram for the Electronic Bookstore
Figure 3.6 Role Diagram for the Electronic Bookstore
Figure 3.7 Order Book Scenario
Figure 3.8 System Overview Diagram
Figure 3.9 Find BestSellers Capability
Figure 3.10 Security Manager Agent Overview Diagram
Figure 3.11 Airport Agent in JACK
Figure 3.12 BankAgent agent in JACK
Figure 3.13 ArrivalSequencing Capability
Figure 3.14 WithdrawRequest event
Figure 3.15 WithdrawCash plan
Figure 3.16 Accounts beliefSet
Figure 3.17 Prometheus Goal vs. SD Goal overlaps traceability relation
Figure 3.18 Prometheus Data vs. SR Resource overlaps traceability relation
Figure 3.19 Prometheus Data vs. SD Goal contributes traceability relation
Figure 3.20 Prometheus Data vs. SD Task contributes traceability relation
Figure 3.21 Prometheus Plan vs. SD Resource uses traceability relation
Figure 3.22 Prometheus Plan vs. SR Resource uses traceability relation
Figure 3.23 Prometheus Plan vs. SR Resource creates traceability relation
Figure 3.24 Prometheus Scenario vs. SR Resource creates traceability relation
Figure 3.25 Prometheus Agent vs. SD Goal achieves traceability relation
Figure 3.26 Prometheus Plan vs. SR Task achieves traceability relation
Figure 3.27 Prometheus Goal vs. Actor depends on traceability relation
Figure 3.28 Prometheus Scenario vs. SR Goal compose traceability relation
Figure 3.29 Prometheus Scenario vs. SR Task composed traceability relation
Figure 3.30 JACK BeliefSet vs. Prometheus Data overlaps traceability relation
Figure 3.31 JACK Agent vs. Prometheus Agent overlaps traceability relation
Figure 3.32 JACK Agent vs. Prometheus Plan uses traceability relation
Figure 3.33 JACK Plan vs. Prometheus Data uses traceability relation
Figure 3.34 JACK Plan vs. Prometheus Data creates traceability relation
Figure 3.35 JACK BeliefSet vs. Prometheus Plan creates traceability relation
Figure 3.36 JACK Agent vs. Prometheus Goal achieves traceability relation
Figure 3.37 JACK Plan vs. Prometheus Goal achieves traceability relation
Figure 3.38 JACK Agent vs. Prometheus Message sends traceability relation
Figure 3.39 JACK Plan vs. Prometheus Message sends traceability relation
Figure 3.40 JACK Plan vs. Prometheus Message receives traceability relation
Figure 3.41 JACK Plan vs. Prometheus Message sends traceability relation
Figure 4.1 Overview of traceability framework
Figure 4.2 Example of the use of rule in our approach
Figure 4.3 Rule Template
Figure 4.4 Rule4
Figure 4.5 Rule4 Header
Figure 4.6 Namespace declarations
Figure 4.7 Variable declarations
Figure 4.8 Condition part
Figure 4.9 Traceability Relation Creation
Figure 4.10 Traceability Relation between Arrange delivery and Organize delivery
Figure 4.11 Generation of Missing Element
Figure 4.12 Log Outgoing Delivery Missing Element
Figure 4.13 Rule49
Figure 4.14 Iteration part of the Rule15
Figure 4.15 Airport agent in Prometheus and Airport actor in i*
Figure 4.16 Traceability relation between Airport agent and Airport actor
Figure 4.17 Rule4cc
Figure 4.18 Iteration part of the Rule4cc
Figure 4.19 Airport SR model and ATCE Prometheus Goal
Figure 4.20 Request Runway goal missing in Prometheus
Figure 4.21 Calling getPDTFileName extended function in Java
Figure 4.22 List of strings
Figure 4.23 Arrival Sequencing Capability and ATL SD Resource
Figure 4.24 capabilityUsesSDResource function example
Figure 4.25 getBeliefSetFields function example
Figure 4.26 getIncludesFields function example
Figure 4.27 isSimilar function example
Figure 4.28 isSynonyms function example
Figure 4.29 contains function example
Figure 4.30 getSubGoalsAndTask function example
Figure 4.31 Retratos main menu
Figure 4.32 Creating a New Project
Figure 4.33 New Project window
Figure 4.34 Creating traceability relations and identifying missing elements
Figure 4.35 output.xml file
Figure 4.36 HTML Generator sub-menu item
Figure 4.37 Simple HTML Report
Figure 4.38 HTML Template
Figure 4.39 retratos.css file
Figure 4.40 HTML Report using HTML template and retratos.css file
Figure 4.41 HTMLGeneratorWith Types menu item
Figure 4.42 HTML Report with Types
Figure 4.43 HTML Report with types using HTML template and retratos.css file
Figure 4.44 IstarPrometheusRule menu item
Figure 4.45 IstarPrometheus rule editor
Figure 4.46 PrometheusJACKRule menu item
Figure 4.47 PrometheusJACK rule editor
Figure 4.48 Show Rules menu item
Figure 4.49 Rule Viewer
Figure 5.1 Fields of the Accounts beliefSet
Figure 5.2 Accounts Descriptor
Figure 5.3 Balances beliefSet
Figure 5.4 Balances descriptor
Figure 5.5 ProcessWithdraw plan
Figure 5.6 Process Withdraw descriptor
Figure 5.7 WithdrawApproved plan
Figure 5.8 Withdraw Approved descriptor
Figure 5.9 WithdrawCash plan
Figure 5.10 Withdraw Cash descriptor
Figure 5.11 WithdrawRejected plan
Figure 5.12 Withdraw Rejected descriptor
Table 3.1 Relations between Prometheus and i* SD
Table 3.2 Relations between Prometheus and i* SR elements
Table 3.3 Traceability Relations Types between Prometheus and JACK Artefacts
Table 3.4 Traceability Relations Types between Prometheus and JACK Artefacts
Table 5.1 ATM elements in Prometheus
Table 5.2 ATM elements in JACK
Table 5.3 Results of experiments for the ATM case study
Table 5.4 Missing Information
Table 5.5 Results of the experiments for the new models of the ATM case study
Table 5.6 ATCE elements in Prometheus
Table 5.7 ATCE elements in i*
Table 5.8 ATCE elements in JACK
Table 5.9 Results of the experiments between Prometheus model and JACK code
Table 5.10 Results of the experiments between i* model and Prometheus model
Table 5.11 Missing relations between JACK code and Prometheus model
Table 5.12 Results of the experiments for the new models of the ATCE case study
Table 5.13 Missing relations between i* and Prometheus model
Table 5.14 Results of the experiments for the new models of the ATCE case study
Table 5.15 EB elements in i*
Table 5.16 EB elements in Prometheus
Table 5.17 EB elements in Prometheus
Table 5.18 Evaluation Results
Table 5.19 Evaluation Results
Table 6.1 Results of the experiments
Table 6.2 Results of the experiments
Table 6.3 Results of LEDA case study using threshold
Table 6.4 Results of the experiments
Table 6.5 Number of traceability relations identified for the ATM case study
Table 6.6 Number of traceability relations identified for ATCE case study
Table 6.7 Number of traceability relations identified for the ATCE case study
I would like to thank the examiners, Peter Sawyer and Bill Karakostas, for so kindly agreeing to take part in my viva voce examination and for their comments and suggestions. The quality of this thesis would have suffered without their contribution.
Andrea Zisman has been the principal motivator of my work, giving helpful feedback throughout the period of her supervision. It was also of great value to have written papers with her and with my co-supervisor, George Spanoudakis.
Thank you to all the colleagues from the Department of Computing with whom I have had the good fortune to share a room or to work as a visiting tutor. In particular, I would like to thank Michael Iossif, Mark Firman, Shant Narcessian, Waraporn Jirapanthong, Khaled Mahub, Marcus Andrews, Olga Castilho, Thsiamo, Theoharris, Ricardo Contreras, and George Lekeas.
I would also like to thank the support and administrative team for all their help during this period.
Thank you especially to my friends from London for the support and attention that made life easier and happier.
My greatest gratitude goes to my family, who suffered from my absence, for the support they have always given me throughout my life.
Finally, I would like to thank the “Lord”, without whom nothing would be possible.
I grant powers of discretion to the University Librarian to allow the thesis to be copied in whole or in part without further reference to the author. This permission covers only single copies made for study purposes, subject to normal conditions of acknowledgment.
The development of multi-agent software systems is considered a complex task due to (a) the large number and heterogeneity of documents generated during the development of these systems, (b) the lack of support for the whole development life-cycle by existing agent-oriented methodologies, which requires the use of different methodologies, and (c) the possible incompleteness of the documents and models generated during the development of the systems.
To alleviate these problems, this thesis describes a traceability framework to support the development of multi-agent systems. The framework supports automatic generation of traceability relations and identification of missing elements (i.e., completeness checking) in the models created during the development life-cycle of multi-agent systems using the Belief-Desire-Intention (BDI) architecture.
Traceability has been recognized as an important activity in the software development process. Traceability relations can guarantee and improve software quality and can help with several tasks such as the evolution of software systems, reuse of parts of the system, validation that a system meets its requirements, understanding of the rationale for certain design decisions, identification of common aspects of the system, and analysis of implications of changes in the system.
The traceability framework presented in this thesis concentrates on multi-agent software systems developed using the i* framework, the Prometheus methodology, and the JACK language. A traceability reference model is presented for the software artefacts generated when using these three technologies, and different types of relations between the artefacts are identified. The framework uses a rule-based approach to support automatic identification of traceability relations and missing elements between the generated artefacts. Software models are represented in XML to cope with the heterogeneity of models and tools used during the software development life-cycle. In the framework, the rules are specified in an extension of XQuery that supports (i) representation of the consequence part of the rules, i.e. the actions to be taken when the conditions are satisfied, and (ii) extra functions to cover some of the traceability relations being proposed and completeness checking of the models.
A prototype tool has been developed to illustrate and evaluate the work. The work has been evaluated in terms of recall and precision measurements in three different case studies: one small case study of an Automatic Teller Machine application, one medium case study of an Air Traffic Control Environment application, and one large case study of an Electronic Bookstore application.
Chapter 1 - Introduction
A multi-agent system is composed of several agents that are situated in an environment and that interact with each other and with their environment. Multi-agent systems have been proposed as a solution for implementing complex systems that need to run in an environment that is open, distributed and highly interactive. An agent is defined by Wooldridge in (Wooldridge, et al., 1995), (Wooldridge, 2002) as a software component that is “situated in some environment and that is capable of autonomous action in this environment in order to meet its design objectives”. Several types of software components fulfil this definition, varying from daemon processes in UNIX (Frisch, 2002) to complex decision-making systems that control unmanned autonomous vehicles (Agent Oriented Software Limited, 2010).
An intelligent agent is an autonomous software component that is characterised as pro-active, reactive, and social (Wooldridge, 2002). Pro-activeness means that the agent takes the initiative in order to achieve its goals. Reactivity means that the agent perceives its environment and responds to its stimuli according to its goals. Social ability means that the agent is able to communicate with other agents and has abilities such as co-operation, co-ordination, and negotiation.
Several architectures have been proposed to build multi-agent systems, such as Jadex (Pokahr, et al., 2005), Jason (Bordini, et al., 2005), and JACK (Busetta, et al., 1999), (Howden, et al., 2001). Agent architectures can be classified into three categories: deliberative, reactive, and hybrid architectures.
Reactive architectures do not maintain a symbolic representation of the environment and actions are performed using rules. Agents are situated in the environment and perceive the environment. Depending on the event that occurs in the environment a rule is executed and actions are performed.
In the deliberative architecture, a symbolic representation of the environment is created and the agent performs actions to manipulate these symbols. The actions performed are based on logical reasoning using theorem provers (Genesereth, et al., 1987). The drawback of this architecture is that it is difficult to represent the real world using a symbolic representation. Moreover, the use of logical reasoning to determine what action to perform is a very resource- and time-consuming task. Several multi-agent systems use a deliberative architecture to support reasoning and some of them are based on the BDI (Belief Desire Intention) architecture (Bratman, et al., 1988). Hybrid architectures combine deliberative and reactive behaviour. Examples of hybrid architectures are TouringMachines and InteRRaP (Luck, et al., 2004).
BDI architectures have been proposed to address the problem of resource boundedness. The BDI architecture (Rao, et al., 1992) is one of the most successful architectures. It is founded on the philosophical theory of Bratman (Bratman, 1999), which explains human rational action, and it has been formalised by a logic called LORA (Wooldridge, 2000) and by BDI logic (Rao, et al., 1998). The BDI architecture has been implemented several times. Examples of implementations are IRMA (Bratman, et al., 1988), PRS (Ingrand, et al., 1992), Jadex (Pokahr, et al., 2005), Jason (Bordini, et al., 2005) and JACK (Howden, et al., 2001), (Agent Oriented Software Limited, 2010).
Bratman et al. describe the Intelligent Resource-Bounded Machine Architecture (IRMA), which is the first implementation of the BDI architecture (Luck, et al., 2004). The IRMA architecture addresses the problem of how an agent can select the best set of actions to carry out in order to achieve a goal when limited by resources such as the amount of time available to take the decision.
The Procedural Reasoning System (PRS) is one of the most successful implementations of the BDI architecture. The PRS architecture was used to build several applications, such as a prototype system to manage the air traffic control of Sydney airport (Ljungberg, et al., 1992), (Rao, et al., 1995). The PRS system has been re-implemented and extended several times. The best-known implementations are dMARS (d'Inverno, et al., 2004), JAM (Huber, 1999), JACK (Howden, et al., 2001), (Agent Oriented Software Limited, 2010), and Jadex (Pokahr, et al., 2005).
To support the development of multi-agent systems, various methodologies have been proposed, such as Prometheus (Padgham, et al., 2004), Tropos (Castro, et al., 2002), MaSE (DeLoach, 2001), and Gaia (Wooldridge, et al., 2000). These methodologies can be classified based on their origins. For instance, Tropos is based on requirements-oriented methodologies and has its origins in the i* framework. Prometheus is based on object-oriented methodologies and its design phase is influenced by JACK. Luck et al. (Luck, et al., 2004) and Sudeikat et al. (Sudeikat, et al., 2004) classify the origins of agent-oriented methodologies as object-oriented, knowledge engineering oriented, requirements engineering oriented, and general.
Despite advances in the area, the development of multi-agent systems is a complex task. As outlined in (Luck, et al., 2004), the difficulties in developing multi-agent systems are due to (a) the design of software systems that maintain a balance between the proactive and reactive behaviour present in agents, (b) the understanding of when agent approaches are appropriate, and (c) the use of informal development techniques. In addition, (i) the large number and heterogeneity of documents generated during the development of multi-agent systems, (ii) the lack of support for the whole development life-cycle by existing agent-oriented methodologies, which requires the use of different methodologies, and (iii) the possible incompleteness of the documents and models generated during the development of multi-agent systems contribute to the difficulties of developing such systems.
Moreover, the development of multi-agent systems produces a huge number of artefacts. Each artefact created can be related to several other artefacts. The relations between artefacts can be explicit or implicit. Explicit relations are concerned with the direct relation between two artefacts. For instance, artefact B depends on artefact A. Therefore, there is an explicit relation between the artefacts A and B. Implicit relations are concerned with indirect relations between two artefacts. For instance, artefact B depends on artefact A and artefact C depends on artefact B. Therefore there is an implicit relation between artefacts A and C.
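The distinction between explicit and implicit relations can be illustrated with a minimal Python sketch. It is not part of the framework described in this thesis; the function name implicit_relations and the representation of each explicit relation as a (dependent, dependee) pair are assumptions made purely for illustration. Implicit relations are taken to be the pairs that are reachable through a chain of explicit relations but are not directly related.

```python
from collections import defaultdict


def implicit_relations(explicit):
    """Return the (source, target) pairs linked only through a chain of explicit relations.

    `explicit` is a set of (dependent, dependee) pairs; this encoding is an
    illustrative assumption, not the representation used by the framework.
    """
    graph = defaultdict(set)
    for source, target in explicit:
        graph[source].add(target)

    implicit = set()
    for start in list(graph):
        # Depth-first walk over every artefact reachable from `start`.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        # Reachable, but not directly related: the relation is implicit.
        implicit |= {
            (start, node)
            for node in seen
            if node not in graph[start] and node != start
        }
    return implicit


# B depends on A, and C depends on B (the example from the text).
explicit = {("B", "A"), ("C", "B")}
print(implicit_relations(explicit))  # {('C', 'A')}
```

For the example above, the only implicit relation found is the one between artefacts C and A, which matches the indirect dependency described in the text.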
Explicit relations are easier to maintain, while implicit relations are difficult to find and to maintain. Furthermore, multi-agent systems are normally developed by teams of analysts, developers, and programmers that are often distributed in different locations and use different tools, notations, and methodologies. The heterogeneity of people, tools, notations, and methodologies makes it difficult to identify and understand the relations between the artefacts. In addition, it is not possible to guarantee the completeness of the generated artefacts.
Understanding the relations between the artefacts created during the development of software systems is essential to several software development activities such as impact analysis, software maintenance and evolution, component reuse, and verification and validation. It is difficult or even impossible to identify these relations manually in complex systems (e.g. multi-agent systems).
The difficulties in identifying traceability relations in multi-agent systems are due to (a) the large number and heterogeneity of documents generated during the development of these systems, (b) the lack of support for the whole development life-cycle by existing agent-oriented methodologies, which requires the use of different methodologies, and (c) the possible incompleteness of the documents and models generated during the development of the systems.
We recognize that the above problems can occur in other types of complex systems, but in this thesis we focus on multi-agent systems developed using the BDI architecture. The main difference lies in the types of elements and documents that are used when developing a multi-agent system. The development of multi-agent systems involves a new set of elements such as goals, percepts, beliefs, capabilities, agents, roles, actions, events, messages, and plans. To utilize and understand the traceability relations, it is necessary to define the semantics of the relations between these elements. To address this problem, we define a traceability reference model that represents the semantics of traceability relations. These semantics make it possible to carry out richer kinds of analysis (e.g. impact analysis).
Another difference is that in some methodologies, such as Tropos and Prometheus, the definition of requirements is based on goal-oriented techniques instead of textual descriptions, which allows multi-agent systems to be developed using a model-driven approach from the requirements definition phase onwards.
Multi-agent systems are distributed and concurrent, and the agents that make up a multi-agent system are able to exhibit complex, flexible behaviour in order to achieve their objectives in the face of a dynamic and uncertain environment. This flexible behaviour is key to making agent technology useful, but it makes agent systems difficult to trace. Tracing is an essential part of the process of developing software, and is important to support verification, validation and debugging.
In order to alleviate the above problems, in this thesis we propose the use of software traceability and identification of missing elements between artefacts produced during the whole life cycle of a multi-agent system.
Software traceability has been defined as “the ability to describe and follow the life of a requirement, in both a forward and backward direction (i.e. from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases)” (Gotel, et al., 1994). Traceability relations can assist with several activities during the life cycle of software development such as impact analysis, verification and validation, reuse, and maintenance.
The manual identification of traceability relations is a labour-intensive and error-prone task (Spanoudakis, et al., 2005). Several approaches have been proposed to recover traceability relations automatically. The approaches can be classified as (i) formal approaches (Pinheiro, et al., 1996), (ii) process oriented approaches (Castro-Herrera, et al., 2007), (Ravichandar, et al., 2007), (Pohl, 1996), (iii) information retrieval approaches (Zou, et al., 2007), (Poshyvanyk, et al., 2007), (Duan, et al., 2007), (Kritzinger, et al., 2008), (Antoniol, et al., 2002), (Marcus, et al., 2003), (Zou, et al., 2006), (De Lucia, et al., 2007), (De Lucia, et al., 2008), (Lormans, et al., 2006), (Hayes, et al., 2007), (iv) string matching approaches (Fiutem, et al., 1998), (Antoniol, et al., 2001), (v) rule-based approaches (Spanoudakis, et al., 2004), (Jirapanthong, et al., 2005), (Jirapanthong, et al., 2009), (Cysneiros, et al., 2003), (Cysneiros, et al., 2007a), (Cysneiros, et al., 2007b), (Cysneiros, et al., 2008), (Spanoudakis, et al., 2003), (Spanoudakis, et al., 2004), (Dagenais, et al., 2007), (Reiss, 2006), (Fletcher, et al., 2007), (Rilling, et al., 2007), (Kagdi, et al., 2007), (Alves-Foss, et al., 2002), (vi) run-time approaches (Liu, et al., 2007), (Egyed, 2003), (Egyed, et al., 2005), (Grechanik, et al., 2007), and (vii) hypermedia and information integration approaches (Sherba, et al., 2003), (Sherba, 2005).
The approaches above address different aspects of the traceability problem. For instance, (i) formal approaches can be used when it is possible to define the software project using a formal language, so that traceability relations can be derived automatically using axioms; (ii) process oriented approaches can be used when a unified software development process is followed; (iii) information retrieval techniques have been used successfully to identify traceability relations between textual documentation of software artefacts; (iv) string matching approaches can be used when the elements of a software project are named consistently; (v) rule-based approaches can be used when it is easy to identify and define rules for the relations between elements created during the development of a software system; and (vi) run-time approaches can be used when the code of the system is available.
In this thesis a rule-based framework is described to support automatic generation of traceability relations and identification of missing elements in artefacts created during the development of multi-agent systems. The identification of missing elements is called completeness checking in this thesis report. This work provides support for artefacts created during different phases of the software development life-cycle. More specifically, the approach supports artefacts created during early and late requirements elicitation, analysis and design, and implementation phases of the development of multi-agent systems.
The framework concentrates on early requirements represented using the i* framework (Yu, 1995), late requirements, analysis and design specifications created using the Prometheus methodology (Padgham, et al., 2004), and code implemented with JACK (Agent Oriented Software Limited, 2010).
The Prometheus methodology was chosen because it has been widely used in academic and industrial settings; it covers the whole development life cycle; and a large amount of documentation, examples, and tool support is available. Moreover, the detailed design phase of Prometheus covers the concepts necessary to model multi-agent systems implemented using the BDI architecture.
The i* framework is used to represent the early phase of the requirements specification because the Prometheus methodology only supports the specification of the goals of the system in terms of a hierarchical diagram. The i* framework provides a richer modelling technique to represent organizational processes. It represents relations between actors that depend on each other to accomplish their goals. The rationale behind the dependencies can also be represented in i*.
JACK was adopted because of its use in several academic and commercial applications and in diverse areas such as unmanned aerial vehicles, surveillance, air traffic management, real-time scheduling, and virtual actors (Agent Oriented Software Limited, 2010). Moreover, a large amount of documentation is available and the detailed design phase of Prometheus describes the elements of JACK.
In order to support the heterogeneity of models and tools covered in this work, the models are represented in XML (XML, 2010). XML was chosen as the basis of our approach for several reasons: (a) XML has become the de facto language to support data interchange among heterogeneous tools and applications, (b) a large number of applications use XML to represent information internally or as a standard export format, and (c) XML allows the use of XQuery (XQuery, 2010) as a standard way of expressing traceability rules.
We propose to use an extended version of XQuery to represent the rules in our framework. XQuery is a query language for XML that has been widely used for manipulating, retrieving, and interpreting information from XML documents. Apart from the built-in functions offered by XQuery, it is possible to add new functions. We have extended XQuery (a) to support representation of the consequence part of the rules, i.e. the actions to be taken when the conditions are satisfied, and (b) to support extra functions to cover (i) some of the traceability relations being proposed and (ii) completeness checking of the models.
A prototype tool has been implemented to demonstrate and evaluate the work. The evaluation of the work has been performed in three case studies, namely: (i) automatic teller machine, (ii) electronic bookstore, and (iii) air traffic control environment. The automatic teller machine is a small application that allows a customer to withdraw cash and print statements. The air traffic control environment is a medium-sized application that simulates the landing sequencing of an aircraft. The electronic bookstore is a large application that implements the main functionalities of an electronic bookstore, such as browsing the catalogue, searching for books by keyword, and buying a book.
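The general shape of such a rule, a condition part followed by a consequence part, can be sketched as follows. The sketch is written in Python rather than in the extended XQuery used by the framework, and it is only an illustration under stated assumptions: the XML element and attribute names (istar, prometheus, goal, name, relation, missing), the “Land Aircraft” goal, and the function apply_overlaps_rule are hypothetical, and the condition is reduced to a simple case-insensitive name comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML exports of the two models. The element and
# attribute names are assumptions; the real schemas are not shown here.
ISTAR_XML = """
<istar>
  <goal name="Request Runway"/>
  <goal name="Land Aircraft"/>
</istar>
"""
PROMETHEUS_XML = """
<prometheus>
  <goal name="land aircraft"/>
</prometheus>
"""


def normalise(name):
    """Lower-case a name and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def apply_overlaps_rule(istar_root, prometheus_root):
    """Relate goals with matching names and report i* goals with no counterpart."""
    results = ET.Element("results")
    prometheus_goals = {
        normalise(goal.get("name")): goal.get("name")
        for goal in prometheus_root.iter("goal")
    }
    for goal in istar_root.iter("goal"):
        name = goal.get("name")
        match = prometheus_goals.get(normalise(name))
        if match is not None:
            # Condition satisfied: the consequence creates a traceability relation.
            ET.SubElement(results, "relation", type="overlaps", source=match, target=name)
        else:
            # Condition not satisfied: record the element as missing (completeness checking).
            ET.SubElement(results, "missing", element=name, expectedIn="Prometheus")
    return results


results = apply_overlaps_rule(ET.fromstring(ISTAR_XML), ET.fromstring(PROMETHEUS_XML))
print(ET.tostring(results, encoding="unicode"))
```

In this sketch the same pass over the models yields both kinds of output that the framework aims to produce: a relation for each pair of elements that satisfies the condition and a missing-element record for each element without a counterpart.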
The remainder of this chapter describes the hypotheses, problem definition, objectives, contributions, and thesis outline.
The hypothesis underlying our framework is that traceability relations can be identified between software artefacts created during the development of multi-agent systems using a model driven approach. This hypothesis is broken down into the following:
- It is possible to use rules to identify traceability relations between software artefacts created during the development of multi-agent systems using a model driven approach;
- It is possible to use rules to identify missing elements between software artefacts created during the development of multi-agent systems using a model driven approach;
- It is possible to use the information about missing elements to fix discrepancies between the names given to elements in different documents and to improve completeness between software artefacts created during the development of multi-agent systems using a model driven approach;
- It is possible to use the information about missing elements to improve the number of traceability relations identified by our framework.
To evaluate this hypothesis, a prototype tool was built and assessed in three case studies. In chapter 5 these experiments are described and the results of the evaluation presented.
The overall aim of this research is to develop an approach to support traceability between artefacts created during the entire life cycle of the development of a multi-agent system. In particular, the main interest was in supporting the identification of missing elements and automatic generation of traceability relations between software elements in i* organisational models (Yu, 1995), Prometheus models (Padgham, et al., 2004), and JACK code (Agent Oriented Software Limited, 2010).
The main aim was broken down into the following objectives:
- To define different types of traceability relations;
- To create a reference model that defines traceability relations between artefacts in i* and Prometheus and between artefacts in Prometheus and JACK code;
- To create a set of rules to identify missing elements and traceability relations between i* and Prometheus artefacts;
- To create a set of rules to identify missing elements and traceability relations between Prometheus and JACK code elements;
- To develop a prototype tool to identify missing elements and to automatically generate traceability relations between i* and Prometheus models and between Prometheus models and JACK code;
- To evaluate the work in several case studies.
This research contributes to the Agent Oriented Software Engineering area and addresses the problems discussed in Section 1.1. The main contributions can be summarised as:
- Automatic recovery of traceability relations - A rule-based approach was proposed to support automatic generation of traceability relations between heterogeneous software models created during the development of multi-agent systems. This alleviates the problems of creating traceability relations manually;
- Support for completeness checking - A rule-based approach was proposed to support the identification of missing elements in various software models. This facilitates fixing inconsistencies among the models;
- Traceability Reference Model - Nine types of traceability relations with different semantics and a traceability reference model between i* and Prometheus elements and between Prometheus and JACK elements were proposed;
- Rules to recover traceability relations and to identify missing elements - Several rules were created to identify missing elements and to recover traceability relations between i* and Prometheus and between Prometheus and JACK elements;
- Traceability prototype tool - A prototype tool has been developed in order to execute the rules, create traceability relations, and report information about missing elements;
- Development of three case studies - The work was evaluated in three case studies. The first is a small application, an Automatic Teller Machine, in which JACK code provided by AOS (Agent Oriented Software Limited, 2010) was reverse engineered to create Prometheus models. This case study shows that the prototype tool can automatically identify most of the traceability relations correctly; we used the information about missing elements to fix the inconsistencies and to complete the models. The second is a medium size application, an Air Traffic Control Environment, in which the provided JACK code was used to create models in Prometheus and i*. The third is a large case study, an Electronic Bookstore. In this case, Prometheus models were created based on available documentation (Padgham, et al., 2004) and on real applications such as Amazon.com (Amazon.com, 2010), and JACK code was implemented based on the created Prometheus models.
1.5 Thesis Outline
The remainder of this thesis is structured as follows. In chapter 2 the literature about traceability is reviewed. We introduce what traceability is and the importance of traceability in the software development process. This chapter describes the main traceability reference models, approaches used to recover traceability relations, approaches to represent and maintain traceability relations, approaches to use and visualise traceability relations, and approaches that define traceability reference models. We also describe existing work on traceability and agent oriented systems.
A traceability reference model for software models created during the development of multi-agent systems using the i* framework, the Prometheus methodology and the JACK language is presented in chapter 3. We describe the elements of the i* framework, the Prometheus methodology, and the JACK language used in our framework, and nine types of traceability relations.
The framework is described in chapter 4. Initially, we give an overview of the approach and then we provide details of the architecture of the framework. We show how different types of rules can be created to be used by the framework and then give some examples of different types of rules. We describe different functions that we have developed to support the rules, to perform completeness checking, to verify if names of elements in the models are synonyms, to compare similarities between elements in the models, and to manipulate elements in PDT (PDT, 2010), TAOME (TAOME4E, 2008) and JACK models. Finally, we describe the prototype tool that we have developed to support our traceability framework.
The evaluation of the framework in three case studies and the results of this evaluation are presented in chapter 5.
In chapter 6 the conclusions and future work are presented. The main contributions of the research and how the hypotheses have been addressed are described.
The thesis also includes several appendices. Appendix A contains the list of extra functions implemented in Java to extend XQuery. Appendix B describes the Automatic Teller Machine case study. Appendix C describes the Air Traffic Control Environment case study. Appendix D describes the Electronic Bookstore case study. Appendix E gives an introduction to the BDI architecture. We present different types of agent architecture used to build multi-agent systems and then we describe in detail the BDI architecture that was used in our research. Appendix F describes traceability relations between i* and Prometheus elements. Appendix G describes traceability relations between Prometheus and JACK elements.
Chapter 2 - Literature Survey on Traceability
Software traceability is the ability to relate artefacts created during the development life-cycle of a software system (Spanoudakis, et al., 2005). More specifically, from the point of view of requirements, traceability has been defined as the “ability to describe and follow the life of a requirement, in both a forward and backward direction (i.e. from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases)” (Gotel, et al., 1994).
Software traceability is essential in the software development process and has been used to support several activities such as impact analysis, software maintenance and evolution, component reuse, verification, and validation. The importance of traceability in the software development process has been endorsed by several standards for quality management and process improvement such as ISO 9001:2000 (ISO, 2010) and CMMI (Carnegie Mellon, 2010).
Gotel discusses several challenges and problems (Gotel, 2008), (Gotel, 2009) that exist in supporting traceability practice. Examples of these problems and challenges are: i) traceability is seen as a repetitive and tedious task and the challenge is how to change this image (the yo-yo challenge: the boredom of a fixed routine); ii) responsibility for traceability is blurred and the challenge is to distribute the responsibility for traceability to all team members of a software project; iii) artefacts in a software project are of different types and are represented in a variety of media and at different levels of formality and granularity, and the challenge is to identify what should be traced and how trace relations are to be established; iv) the credibility of traceability can be debatable and the challenge is to determine ways to communicate the confidence level of trace relations; v) traceability relations tend to decay without dedicated ongoing maintenance and the challenge is to plan a traceability strategy well; vi) unrealistic expectations are placed on traceability automation, whereas techniques to recover traceability relations automatically still demand a high quality set of artefacts and manual filtering of results, and the challenge is to figure out how to combine heterogeneous automated and human approaches to support traceability; vii) traceability should be a by-product, but it has become an extra activity of software development, and the challenge is to make traceability a by-product of other engineering activities.
In addition, several researchers and practitioners have participated in two events, the First Workshop on Grand Challenges for Traceability (GCW'06) and the International Symposium on Grand Challenges in Traceability (GCT’07), with the goal of identifying challenges in the area of traceability that need to be addressed. The challenges identified in these events have led to the creation of the Grand Challenge document. Examples of these challenges are: Traceability Knowledge, Training and Certification, Supporting Evolution, Link Semantics, Scalability, Human Factors, Cost Benefit Analysis, Methods and Tools, Tracing across Organizational Boundaries, Process, Compliance, Measurements and Benchmarks, Technology Transfer (Cleland-Huang, et al., 2007).
Despite the importance of software traceability, current support for traceability is inadequate. Most commercial tools do not provide mechanisms to automatically generate and maintain traceability relations. Moreover, existing tools do not offer support for defining the various types of traceability relations (i.e., the semantics of the relations). The lack of automation becomes a serious problem in the development of complex software systems, where the number of artefacts is large and there is a need to establish traceability relations between artefacts that are usually created by non-interoperable tools and can evolve autonomously.
De Lucia et al. (De Lucia, et al., 2007) discuss the importance of showing the effectiveness of traceability recovery approaches. In particular, they present a study that compares the effort to identify traceability relations using a traceability tool (i.e. ADAMS) with the effort to identify traceability relations manually. As expected, the study shows that the use of a tool helps to improve precision and also reduces the time necessary to identify traceability relations. Similarly, Grechanik et al. (Grechanik, et al., 2007) show that software analysts spent less time on code evolution and code comprehension tasks when using their approach to automatically identify traceability relations.
In (Asuncion, 2007), the authors state that the effective practice of traceability aids in system comprehension, impact analysis, system debugging, and communication between the development team and stakeholders. Asuncion et al. present the economic, technical and social benefits obtained by using a traceability tool that supports the entire software development life cycle in an industrial case study. For instance, some benefits derived from the traceability practice are: i) the visibility of actual software processes is raised, enabling users to compare actual practices with stated company procedures; ii) automation replaces burdensome tasks associated with traceability, such as maintaining consistency between various artefact representations; iii) customers can be shown that a requirement has been tested; iv) project managers can easily obtain an accurate status report of the project.
Traceability has been studied for many years and several approaches have been proposed to tackle its different aspects and issues. Pohl (Pohl, 1996) states that a traceability approach should provide answers to the following questions:
- What traceability information should be captured?
- How should traceability information be captured?
- How should traceability information be stored?
Sherba adds in (Sherba, 2005) that a traceability approach should also answer the question below:
- How are traceability relations going to be viewed and queried?
Moreover, Gotel (Gotel, 2009) states that before trying to answer what, where, when, and how to trace, it is necessary to answer “why it is important to trace?”. Gotel highlights that it is important to know who the stakeholders are and what their needs are. Gotel says that traceability is a team effort and that stakeholders need to understand why they should spend time creating or maintaining traceability relations for other stakeholders. Gotel et al. also argue (Gotel, et al., 2007) that some lessons can be learned from other industries that use traceability. In particular, Gotel et al. compare traceability in the software industry with traceability in the food industry. For instance, the responsibility for traceability in the food industry is shared by all people involved in a certain process, while the responsibility for traceability in the software industry is in most cases assigned to a few people.
In the next sections, we discuss how different approaches address the questions above and provide a literature survey of these existing approaches.
2.1 Traceability Reference Models and Meta-Models
Traceability reference models are used to define the types of traceability information that should be captured. Several classifications for different types of traceability relations and several traceability reference models and meta-models have been proposed in the literature (Davis, 1990), (Gotel, et al., 1994), (Lindvall, et al., 1996), (Dick, 2002), (Ramesh, et al., 2001), (Spanoudakis, et al., 2005), (Berenbach, 2007), (Almeida, et al., 2007), (Goknil, et al., 2008), (Toranzo, et al., 1999), (Toranzo, et al., 2002), (Han, 2001), (Pinto, et al., 2005).
In (Davis, 1990), traceability has been classified from the perspective of direction as forward and backward. Forward traceability is the ability to trace an artefact to its implementation, while backward traceability is the ability to trace an artefact to its origin. More specifically, Davis has identified four types of traceability relations, namely (a) forward from requirements, (b) backward to requirements, (c) forward to requirements, and (d) backward from requirements. Types (a) and (b) are also known as post-traceability and types (c) and (d) are known as pre-traceability (Gotel, et al., 1994). Traceability relations can also be categorised as horizontal (or inter-traceability) and vertical (or extra-traceability) (Lindvall, et al., 1996). Horizontal traceability relations refer to those relations within the same model, while vertical traceability relations refer to those relations that involve different models.
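The distinction between forward and backward traceability can be made concrete with a small sketch. It assumes, purely for illustration, that traceability relations are stored as directed (source, target) pairs pointing from an artefact towards its implementation; the function names and the artefact identifiers are hypothetical.

```python
from collections import defaultdict


def build_index(relations):
    """Index directed (source, target) relations in both directions."""
    forward, backward = defaultdict(set), defaultdict(set)
    for source, target in relations:
        forward[source].add(target)   # towards the implementation
        backward[target].add(source)  # towards the origin
    return forward, backward


def trace(start, index):
    """Return every artefact reachable from `start` by following `index`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in index.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Hypothetical relations: an origin, a requirement, a design element, and code.
relations = [
    ("stakeholder need", "REQ-1"),
    ("REQ-1", "Design-A"),
    ("Design-A", "Code-X"),
]
forward, backward = build_index(relations)
implementation = trace("REQ-1", forward)  # forward traceability
origin = trace("REQ-1", backward)         # backward traceability
```

Under these assumptions, the forward trace from the requirement collects the artefacts that implement it, while the backward trace collects the artefacts it originates from.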
The need to capture the semantics of traceability relations has been pointed out as fundamental to making the use of traceability effective (Dick, 2002), (Ramesh, et al., 2001), (Spanoudakis, et al., 2005). Dick proposed in (Dick, 2002) an approach to represent “deeper kinds of traceability” relations in order to perform deeper types of analysis. He argues that the use of propositional logic to group relations together and textual information describing the rationale of the relations can be used to describe the traceability relations and to perform further analysis.
Ramesh et al. describe two traceability reference models in (Ramesh, et al., 2001). These traceability reference models have been derived from an empirical study of traceability practices in 26 major software development organizations. In this study, they identified two types of traceability users: “low-end” and “high-end” users. Low-end users have few years of experience with traceability and the use of traceability is compelled by project sponsors or for compliance with standards. Low-end users simply use traceability to relate various components of information without explicit identification of the semantics and rationale of such relations. The main application of traceability by low-end users is for requirements decomposition, requirements allocation, compliance verification, and change control. On the other hand, high-end users have several years of experience and use traceability to cover the full cycle of the software development, and to capture discussion issues, decisions, and rationale.
Low-end users use traceability to create relations that capture: i) dependencies between requirements (derive); ii) the allocation of requirements to system components (allocated_to); iii) the satisfaction of requirements by system components (satisfy); iv) compliance verification procedures developed for requirements (developed_for); v) dependencies between system components (depend_on); vi) compliance verification procedures performed on system components (performed_on); vii) interfaces that system components have with external systems (interface_with). High-end users use much richer traceability schemes than low-end users. Ramesh et al. divide the traceability relations into four parts: Requirements Management, Design Allocation, Compliance Verification, and Rationale Management.
In (Berenbach, 2007), Berenbach proposes a traceability meta-model and affirms that the implementation and tool support of traceability can help to enforce design and process rules. Berenbach says that a traceability meta-model can be used to create completeness verification checks, and that traceability information can be used to propagate name changes of related elements. The reference model proposed by Berenbach consists of traceability relations between elements of analysis and design models in UML. Examples of types of relations are: i) requirements derive from (Derive From) use cases; ii) use cases are associated with (associated with) use cases; iii) use cases are shown on (shown on) use case diagrams; iv) use case realizations implement (realize) use cases; v) use cases are explained by (explained with) sequence diagrams; vi) business object relationships are shown on (Relationships shown on) class diagrams; vii) boundary elements interact with business objects; viii) use case realizations are contained in (contained in) design packages; ix) components realize (realizes) business objects; x) components have (has) interfaces; xi) components are shown on (shown on) component diagrams; xii) components are composed of (composed of) classes; xiii) class relations are shown on (relationships shown on) class diagrams; xiv) test cases test (tests) components; xv) test cases verify (verifies implementation) requirements; xvi) subsystem details are shown in (details shown in) sequence diagrams; xvii) subsystem behaviour is explained by (behaviour explained by) activity diagrams; xviii) subsystems contain (contains) design packages.
Almeida et al. (Almeida, et al., 2007) propose a requirement traceability meta-model to support Model Driven Engineering development. The meta-model is implemented using the Ecore metamodel. The meta-model proposed by Almeida et al. consists of satisfaction traceability relations between requirements that are part of the requirement specification and artefacts of the software model.
Goknil et al. highlight the importance of defining the semantics of traceability relations in order to execute change and impact analysis activities (Goknil, et al., 2008). Goknil et al. present a traceability meta-model composed of four different types of traceability relations between requirements, namely: requires, refines, conflicts, and contains. A requirement R1 requires a requirement R2 if R1 is fulfilled only when R2 is fulfilled. A requirement R1 refines a requirement R2 if R1 is derived from R2 by adding more details to it. A requirement R1 contains a requirement R2 if R2 is part of the requirement R1. A requirement R1 conflicts with a requirement R2 if the fulfilment of R1 excludes the fulfilment of R2 and vice versa.
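The definitions of the requires and conflicts relations can be read as constraints over a set of fulfilled requirements, and the following minimal sketch illustrates that reading. It is an assumption-laden illustration rather than the meta-model of Goknil et al.: the `RelationType` enumeration, the `violations` function, and the requirement identifiers are hypothetical, and the refines and contains relations are only named, not checked.

```python
from enum import Enum


class RelationType(Enum):
    REQUIRES = "requires"
    REFINES = "refines"
    CONFLICTS = "conflicts"
    CONTAINS = "contains"


def violations(relations, fulfilled):
    """Check a set of fulfilled requirements against typed relations.

    relations: iterable of (RelationType, r1, r2) triples.
    fulfilled: set of requirement identifiers considered fulfilled.
    """
    problems = []
    for kind, r1, r2 in relations:
        # R1 requires R2: R1 may be fulfilled only when R2 is fulfilled.
        if kind is RelationType.REQUIRES and r1 in fulfilled and r2 not in fulfilled:
            problems.append(f"{r1} requires {r2}, which is not fulfilled")
        # R1 conflicts with R2: the two cannot both be fulfilled.
        elif kind is RelationType.CONFLICTS and r1 in fulfilled and r2 in fulfilled:
            problems.append(f"{r1} conflicts with {r2}")
    return problems


# Hypothetical requirements and relations.
relations = [
    (RelationType.REQUIRES, "R1", "R2"),
    (RelationType.CONFLICTS, "R3", "R4"),
    (RelationType.REFINES, "R5", "R1"),
]
report = violations(relations, fulfilled={"R1", "R3", "R4"})
```

A check of this kind is one way in which explicit relation semantics could feed into change and impact analysis: a change that makes a requirement unfulfilled can be tested against every relation in which it participates.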
Another traceability reference model for Model Driven Engineering development has been proposed in (Vanhooff, et al., 2005). Vanhooff et al. define the traceability information used by model transformations as transformation traceability. They present a traceability meta-model that consists of dependencies between source and target elements, dependencies between a mapping and the transformation unit that created it, and the marking of source elements as deleted.
Toranzo et al. (Toranzo, et al., 1999), (Toranzo, et al., 2002) present a general purpose reference model. Examples of relations are: stakeholders are responsible for (responsibility) requirements, a program represents (represents) requirements, requirements are allocated to (allocated_to) sub-systems, and tasks are satisfied by (satisfy) design elements. Pinto et al. (Pinto, et al., 2005) describe a process for guiding the use of a reference model in the development of multi-agent systems. In particular, Pinto et al. propose a series of guidelines to extend Tropos in order to support traceability.
Han (Han, 2001) describes a traceability model between requirements and architecture documents. Examples of types of relations are: i) components provide (provides) services; ii) components require (requires) services; iii) components are part of (part_of) components; iv) components conform to (conforms_to) interfaces; v) interfaces make visible (makes_visible) assumptions; vi) assumptions are subject to (subject_to) risks; vii) authorities assert (asserts) assumptions; viii) services respect (respect) assumptions; ix) stakeholders own (owns) goals; x) goals refine (refines) goals; xi) services are delivered with (derived_with) quality of services; xii) quality of services satisfy (satisfies) goals; xiii) use cases use (uses) services.
In (Pohl, 1996), Pohl identifies 18 types of traceability relations that were created based on a survey of the requirements engineering literature. The types of traceability relations were created to describe relations between a hypertext model that specifies the vision and requirements of the system, a structured analysis (SA) model that consists of a data flow model and a data dictionary, an extended entity-relationship (ER) model for modelling the data view of the system, and an OMT model for modelling the object-oriented view as well as the behaviour of the system.
The traceability relations are classified into five groups, namely: (a) condition links, (b) context links, (c) document links, (d) evolutionary links, and (e) abstraction links. Condition links are used to relate restrictions (preconditions or constraints) to a particular object; context links are used to express relations of similarity, comparisons, contradictions, and conflicts between objects; document links are used to relate different kinds of documentation to a requirement such as examples, test cases, description of the purpose, background information and comments; evolutionary links are used to express when a requirement has been replaced by, based on, formalized by, or elaborated by another requirement; abstraction links are used to represent abstractions between trace requirements such as generalizations or refinements.
Spanoudakis and Zisman (Spanoudakis, et al., 2005) propose a framework to organize the types of traceability relations identified in the literature. They group the traceability relations into eight main groups: i) dependency; ii) generalisation/refinement; iii) evolution; iv) satisfaction; v) overlap; vi) conflict; vii) rationalisation; viii) contribution. Dependency relations are used if an element relies on the existence of another element; generalisation/refinement relations are used to identify how complex elements of a system can be divided into components, how an element can be specialised by other elements, or how elements can be generalised by another element; evolution relations are used if an element is replaced by another element; satisfaction relations are used if an element meets the expectations, needs and desires of another element or if an element complies with a condition represented by another element; overlap relations are used if two elements refer to common features of a system or its domain; conflict relations are used to represent conflicts and issues between two elements; rationalisation relations are used to represent and maintain the rationale behind the creation and evolution of elements and behind decisions about the system at different levels of detail; contribution relations are used to represent associations between requirement artefacts and the stakeholders that have contributed to the generation of the requirements.
Spanoudakis et al. describe in (Spanoudakis, et al., 2004) a rule based approach to identify traceability relations between requirements specifications using structured text and use case specifications using the Cockburn template (Cockburn, 2000), and between requirements and class diagrams. The approach defines four different semantic types of traceability relations, namely: overlaps, requires_execution_of, requires_feature_in, and can_partially_realise. An overlaps relation is used if two elements refer to a common feature of the system or its domain. A requires execution of (requires_execution_of) relation is used when a sequence of terms that appears in a pre-condition of a use case, a post-condition of a use case, a requirement statement, or a use case event requires the execution of an operation. A requires feature in (requires_feature_in) relation is used between a part of a use case specification and a requirement statement r2, or between a requirement statement r1 and another requirement statement r2, when the use case or the requirement r1 cannot be realised without the existence of the structural or functional feature in the requirement r2. A can partially realise (can_partially_realise) relation is used between a description, an event or a postcondition of a use case and the description of a requirement statement if the use case can realise part of the requirement statement.
An extension of the above work has been proposed by Jirapanthong et al. in (Jirapanthong, et al., 2005), (Jirapanthong, et al., 2009). This work identifies traceability relations between documents created to develop software product families. The approach helps to identify common and variable aspects between different members of a product family. The approach identifies traceability relations between different types of documents generated when using an extension of the FORM methodology to develop product family systems. A traceability reference model has been created and nine types of traceability relations are proposed: satisfiability, dependency, overlaps, evolution, implements, refinement, containment, similar, and different. Satisfiability relations are used if an element meets the expectations and needs of another element. Dependency relations are used if the existence of an element relies on the existence of another element. Overlaps relations are used if two elements refer to common aspects of a system or its domain. Evolution relations are used if an element has been replaced by another element. Implements relations are used if an element executes or allows for the achievement of another element. Refinement relations are used to identify how complex elements can be decomposed into sub-elements. Containment relations are used when an element uses another element. Similar relations exist between elements that depend on the existence of a relation in common. Different relations are used to assist with the identification of variable aspects between various product members.
Although the reference models and types of relations presented in the literature provide a better understanding of the semantics of traceability relations, there is no consensus on the different types of traceability relations (Sherba, 2005). Moreover, the types of traceability relations are project specific (Pinheiro, et al., 1996), (Spanoudakis, et al., 2005) and can vary depending on the stakeholders, methodologies, domain, and tools involved in the system software development process. Therefore, it is important to create an approach that allows stakeholders to define the type of traceability relations that are important to them in a particular project.
Moreover, to the best of our knowledge, there are no traceability reference models for elements created during the development of multi-agent systems using the i* framework, the Prometheus methodology and the JACK language. The model proposed in (Pinto, et al., 2005) is a general traceability reference model for elements created during the development of multi-agent systems and it has been created to enhance the Tropos methodology to support traceability. The types of relations that it defines between code elements (i.e. “Program”) and design elements are coarse-grained and do not take into consideration the different types of code elements and design elements.
2.2 Traceability Approaches to Capture Trace Relations
There are several types of tools that provide support for capturing traceability in various activities of the software development life-cycle. Examples of these tools are requirement management tools, software change and configuration management tools, and project management tools. However, most of these tools require some intervention by the user in order to create traceability relations. Moreover, in these types of tools, the user has to select the source and target elements to be related. Some of these tools provide some mechanism to assist with the definition of traceability relations. For example, Rational DOORS (IBM Rational, 2010a) and CaliberRM (Borland, 2010) can import the requirements automatically from documents in Microsoft Word based on the heading styles of the text, while Rational RequisitePro (IBM Rational, 2010b) can import the requirements based on keywords. However, once the requirements have been imported, the relations have to be identified manually.
The importance of tools to support traceability is evidenced by the large number of commercial tools available in the market (Standish Group, 2003). Examples of requirement management tools are Rational DOORS, Rational RequisitePro, and CaliberRM. Most of the commercial tools available to support traceability require the user to define traceability relations manually or provide limited support for the automatic creation of traceability relations.
The task of creating traceability relations manually is costly, labour-intensive, and error-prone (Spanoudakis, et al., 2005), (De Lucia, et al., 2008), (Hayes, et al., 2004), (Lormans, et al., 2006). As a consequence, the cost of establishing traceability relations can outweigh its benefits. To address this problem several approaches have been proposed to support automatic creation of traceability relations. We classify these approaches into seven groups based on the techniques that they use to support the generation of traceability relations, namely: i) formal approaches (Pinheiro, et al., 1996); ii) process oriented approaches (Castro-Herrera, et al., 2007), (Ravichandar, et al., 2007), (Pohl, 1996); iii) information retrieval approaches (Zou, et al., 2007), (Poshyvanyk, et al., 2007), (Duan, et al., 2007), (Kritzinger, et al., 2008), (Antoniol, et al., 2002), (Marcus, et al., 2003), (Zou, et al., 2006), (De Lucia, et al., 2007), (De Lucia, et al., 2008), (Lormans, et al., 2006), (Hayes, et al., 2007); iv) string matching approaches (Fiutem, et al., 1998), (Antoniol, et al., 2001); v) rule based approaches (Spanoudakis, et al., 2004), (Jirapanthong, et al., 2005), (Jirapanthong, et al., 2009), (Cysneiros, et al., 2003), (Cysneiros, et al., 2007a), (Cysneiros, et al., 2007b), (Cysneiros, et al., 2008), (Spanoudakis, et al., 2003), (Dagenais, et al., 2007), (Reiss, 2006), (Fletcher, et al., 2007), (Rilling, et al., 2007), (Kagdi, et al., 2007), (Alves-Foss, et al., 2002); vi) run-time approaches (Liu, et al., 2007), (Egyed, 2003), (Egyed, et al., 2005), (Grechanik, et al., 2007); vii) hypermedia and information integration approaches (Sherba, et al., 2003), (Sherba, 2005).
2.2.1 Formal Approaches
Formal approaches define software artefacts and their relations using a formal language, and use axioms and regular expressions to identify traceability relations between the artefacts (Pinheiro, et al., 1996). The main problem with these approaches is that they require training in, and knowledge of, a specific formal language. To alleviate this problem, the TOOR approach (Pinheiro, et al., 1996) allows the specification of the project and the relations between the artefacts to be defined through a combination of a graphical interface and forms.
TOOR (Pinheiro, et al., 1996) provides a semi-automatic approach to identifying traceability relations. The process of capturing traceability relations is divided into three phases: definition, registration, and extraction. In the definition phase, the user defines the classes of objects and the types of relations for a specific project using the FOOPS (Functional and Object-Oriented Programming Systems) formal language. In the registration phase, objects are created by selecting the appropriate class of the object from a graphical user interface and then filling in a template form. To create a traceability relation, the user selects the type of traceability relation and fills in a template form with the source and target objects of the relation. Alternatively, a graphical user interface can be used to select the source and target objects. Relations can also be created based on axioms defined in the definition phase. Finally, in the extraction phase, the axioms are computed and the traceability relations are displayed.
2.2.2 Process Oriented Approaches
Traceability is required by several standards for quality management and process improvement, such as ISO 9001:2000 (ISO, 2010) and CMMI (Carnegie Mellon, 2010). Some approaches integrate traceability techniques with the software process (Castro-Herrera, et al., 2007), (Ravichandar, et al., 2007), (Pohl, 1996). The main advantages of these approaches are that traceability relations are created as a product of the software development process and that they enforce a practice within the software development process. The main disadvantages of these approaches are the difficulty of supporting tool integration and the lack of unified software development processes defined in practice.
Castro-Herrera et al. (Castro-Herrera, et al., 2007) propose extending the Basic Open Unified Process (OUP/Basic) to incorporate automated traceability by adding new documents (i.e. work products) and tasks to the process. The authors highlight the need to integrate automated traceability techniques into the software development process in order to maximize the potential benefit of automated traceability. They add three new work products (Requirements document, Trace Strategy and Granularity document, and Additional Traceable document) and five new tasks (Create Trace Strategy, Create Additional Traceable documents, Set Up in Place Traceability, Run Automated Traceability Analysis, and Test and Verify Automated Traceability Results). The Requirements document describes the functional requirements using a textual description. The Trace Strategy and Granularity document describes the different traces that the stakeholders wish to record. The Additional Traceable document is a general template that can be instantiated to include other artefacts that are not defined in the process. The Create Trace Strategy task defines the traceability strategy and the granularity of the traceability relations. The Create Additional Traceable documents task creates new artefacts that need to be traced, based on some guidelines. The Set Up in Place Traceability task sets up the infrastructure necessary for the use of traceability tools. The Run Automated Traceability Analysis task executes the automated traceability tool and provides feedback. The Test and Verify Automated Traceability Results task analyses the effectiveness of automated traceability.
Pohl (Pohl, 1996) presents a process centred approach that automatically creates traceability relations by recording the execution of actions during the development of the software system. The approach requires a method engineer to define the process and the tools, which are stored in a repository. Traceability relations are also identified automatically as part of the software process in (Ravichandar, et al., 2007). Ravichandar et al. use the Capability Engineering process to identify system requirements from user needs. A graph (Function Decomposition) is created to link the decomposition between different levels of abstraction of user needs. Traceability relations can be inferred from the transformations from the user needs to the requirements represented in the Function Decomposition graph.
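As an illustration of how trace relations could be read off a decomposition graph, the following minimal sketch assumes an acyclic graph stored as a dictionary from each node to its refinements. The function name, the node names and the data are hypothetical and are not taken from (Ravichandar, et al., 2007).

```python
def leaf_requirements(graph, node):
    """Return the leaf nodes reachable from `node` in an acyclic decomposition graph."""
    children = graph.get(node, [])
    if not children:
        # A node without refinements is treated as a requirement.
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaf_requirements(graph, child)
    return leaves


# Hypothetical decomposition of one user need into functions and requirements.
decomposition = {
    "Need: withdraw money": ["Authenticate customer", "Dispense cash"],
    "Authenticate customer": ["Req: read card", "Req: check PIN"],
    "Dispense cash": ["Req: debit account"],
}

# Each (need, requirement) pair is a candidate trace relation.
need = "Need: withdraw money"
traces = {(need, req) for req in leaf_requirements(decomposition, need)}
```

In this sketch a relation is inferred whenever a requirement is reachable from a user need through successive decomposition steps.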
2.2.3 Information Retrieval Approaches
Several approaches use information retrieval techniques to identify traceability relations between software artefacts (Zou, et al., 2007), (Poshyvanyk, et al., 2007), (Duan, et al., 2007), (Kritzinger, et al., 2008), (Antoniol, et al., 2002), (Marcus, et al., 2003), (Zou, et al., 2006), (De Lucia, et al., 2007), (De Lucia, et al., 2008), (Lormans, et al., 2006), (Hayes, et al., 2007). Information retrieval techniques identify traceability relations based on the assumption that artefacts with high textual similarity probably share concepts and are, therefore, likely candidates for traceability relations. The main drawback of using information retrieval techniques to identify traceability relations is that standard information retrieval techniques do not take the structure of the artefacts into consideration. Moreover, although these approaches identify a large percentage of the correct relations (high recall), the percentage of identified candidate relations that are correct is low (low precision) (Zou, et al., 2007). This increases the effort necessary to decide which of the candidate relations are correct and which are invalid. Some approaches address this problem mainly by adding coupling techniques (Poshyvanyk, et al., 2007), clustering methods (Duan, et al., 2007), phrasing (Zou, et al., 2007), query term coverage (Zou, et al., 2007), and relevance feedback and attribute weighting (Kritzinger, et al., 2008) to the information retrieval technique.
Antoniol et al. (Antoniol, et al., 2002) describe an approach to identify traceability relations between source code and natural language documentation based on information retrieval techniques, using both a probabilistic method and the vector space model. The approach uses comments and identifier names within the source code to find similarities with the documentation. The documents are ranked by relevance and, based on this ranking, the traceability relations are created.
Another approach, based on Latent Semantic Indexing (LSI), is described in (Marcus, et al., 2003). Marcus et al. argue that their approach achieves better results than Antoniol’s approach. The latter uses full parsing of the code and morphological analysis of the documentation. Marcus et al. (Marcus, et al., 2003) affirm that, in comparison with Antoniol’s approach, their approach requires less processing of the source code and documentation, and that it is independent of natural language, programming language, and paradigm.
Poshyvanyk et al. (Poshyvanyk, et al., 2007) present an approach that combines the LSI technique with coupling measures to recover traceability relations between software documentation (e.g. requirements and user manuals) and code. The main goal of the approach is to address the common problem that the structure of the documentation (e.g. files, sections of documents, directories, etc.) does not reflect the structure of the source code.
The use of clustering to reduce the effort required from the user to select candidate relations generated by information retrieval techniques is investigated by Duan et al. in (Duan, et al., 2007). Three algorithms have been implemented and tested using a web-based tool named Poirot. The tested algorithms are agglomerative hierarchical clustering, K-means, and bisecting divisive clustering. They were evaluated on capturing traceability relations between requirements and other types of artefacts, such as higher level business goals, design elements and code. Duan et al. affirm that the benefit of using the tool is that traceability relations are presented to software analysts as part of a meaningful group (cluster). This allows the analyst to decide whether to accept or reject candidate relations based on similar artefacts. The tool also provides functionality that allows the user to accept or reject all candidate relations associated with a specific cluster.
Zou et al. state in (Zou, et al., 2007) that most of the information retrieval techniques used to recover traceability relations are able to find a large percentage of the correct relations (high recall), but in general produce a low level of precision. To address this problem, Zou et al. propose the use of phrasing in (Zou, et al., 2006). Information retrieval techniques such as the vector space model, the probabilistic model, and latent semantic indexing build an index of the terms used in the documents. The approach of Zou et al. uses phrases instead of single terms. They assume that artefacts that share common phrases are more likely to be related than artefacts that only share common terms. The phrases are generated automatically by the tool, which makes use of a part-of-speech tagger and searches the entire document. The approach is extended in (Zou, et al., 2007) to use query term coverage. Query term coverage takes into consideration the number of unique shared terms, whereas in standard information retrieval techniques the similarity is based on a total weight calculated from the frequency with which the terms appear in a document.
Kritzinger et al. propose an approach (Kritzinger, et al., 2008) that uses latent semantic analysis to identify traceability relations between several software artefacts such as system requirements, use cases, collaboration and state diagrams, and source code in C#. Kritzinger’s approach differs from other approaches that use latent semantic analysis mainly by adding user relevance feedback and attribute weighting to the technique. The user feedback helps to create clusters of documents that are relevant to previous queries. Attribute weighting takes into consideration the structure of a document (e.g. methods, fields and package declarations of a class) during the term weighting phase to create a term-artefact matrix.
De Lucia et al. (De Lucia, et al., 2008) present the ADAMS Re-Trace tool, which also uses the latent semantic indexing technique to identify traceability relations between artefacts of different types. ADAMS Re-Trace is integrated into the Advanced Artefact Management System (ADAMS), a fine-grained artefact management system for Eclipse. Traceability relations between artefacts are created manually in ADAMS and used for impact analysis and change management tasks. ADAMS Re-Trace adds to ADAMS the functionality to identify traceability relations semi-automatically.
Lormans and Deursen also apply the Latent Semantic Indexing (LSI) technique in three different case studies to recover traceability relations between requirements and design documents and between requirements and test documents (Lormans, et al., 2006). The case studies differ in size and scope, varying from small to complex and from academic to industrial. The authors highlight the importance of having a traceability model as part of the traceability approach and classify such models as static or dynamic. In a dynamic model, the types of relations can change according to specific project needs. A static model is used to define the types of traceability relations between requirements and design documents and between requirements and test documents. The approach uses a Text to Matrix Generator tool to pre-process the documents (e.g. lexical analysis, stop word elimination, stemming, index-term selection and index construction) that are used as input by the Trace Reconstructor (TR) tool, which generates a term-by-document matrix. The TR tool selects candidate relations whose similarity degree is greater than a constant value and greater than a percentage value calculated from the total of the similarity measures.
Hayes et al. describe in (Hayes, et al., 2007) a front-end for the RETRO (Requirements Tracing On target) tool that can be used with different information retrieval techniques. The authors argue that user satisfaction with a traceability tool depends more on the functionality provided by the front-end than on the information retrieval method used. RETRO has been used in several projects by the NASA Independent Verification and Validation (IV & V) Program.
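The trade-off between recall and precision mentioned above can be made concrete with a small sketch. The function name and the example links are hypothetical; they are not results of any of the cited approaches.

```python
def precision_recall(candidates, correct):
    """Return (precision, recall) for a set of candidate trace relations."""
    candidates, correct = set(candidates), set(correct)
    true_positives = len(candidates & correct)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical relations, written as (requirement, class) pairs.
correct = {("R1", "Account"), ("R2", "Card")}
candidates = {
    ("R1", "Account"), ("R2", "Card"),                   # correct candidates
    ("R1", "Card"), ("R2", "Log"), ("R3", "Account"),    # false positives
}

precision, recall = precision_recall(candidates, correct)
```

In this example every correct relation is retrieved, so recall is maximal, but three of the five candidates have to be discarded by the analyst.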
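A minimal sketch of ranking documentation against a source code fragment with the vector space model is given below. It assumes a simple TF-IDF weighting and cosine similarity; the weighting scheme, the function names and the sample texts are assumptions made for illustration and are not the exact method of (Antoniol, et al., 2002).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_text, documents):
    """Rank documents by TF-IDF cosine similarity to the query text."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log(len(documents) / df) + 1.0 for t, df in doc_freq.items()}

    def weigh(term_counts):
        # Terms that never occur in the documentation are ignored.
        return {t: tf * idf[t] for t, tf in term_counts.items() if t in idf}

    query_vector = weigh(Counter(tokenize(query_text)))
    scores = {name: cosine(query_vector, weigh(c)) for name, c in counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical documentation sections and a code fragment used as the query.
documents = {
    "withdraw.txt": "The customer withdraws cash from the account and the balance is updated",
    "login.txt": "The customer inserts the card and enters the PIN",
}
code = "// withdraw cash from account\nvoid withdrawCash(Account account, double amount)"
ranking = rank_documents(code, documents)
```

The highest-ranked documents would then be proposed as candidate targets of traceability relations for the code fragment.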
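The difference between a frequency-driven score and query term coverage can be sketched as follows. Both functions are simplified stand-ins written for illustration, under the assumption that coverage is measured as the fraction of distinct query terms found in a document; they are not the formulas used in (Zou, et al., 2007).

```python
from collections import Counter


def frequency_score(query_terms, doc_terms):
    """Sum of document term frequencies over the distinct query terms."""
    counts = Counter(doc_terms)
    return sum(counts[term] for term in set(query_terms))


def term_coverage(query_terms, doc_terms):
    """Fraction of distinct query terms that also occur in the document."""
    query = set(query_terms)
    return len(query & set(doc_terms)) / len(query) if query else 0.0


# Hypothetical query and documents, already split into terms.
query = ["validate", "card", "pin"]
doc_a = ["card", "card", "card", "card", "reader"]   # repeats a single shared term
doc_b = ["validate", "pin", "card"]                  # shares three distinct terms
```

Here the frequency-driven score prefers doc_a (4 against 3), while coverage prefers doc_b, which shares every distinct query term.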
2.2.4 String Matching Approaches
String matching approaches identify traceability relations based on the names of the elements and their properties (Fiutem, et al., 1998), (Antoniol, et al., 2001). Regular expressions, edit distance, and maximum match algorithms (Fiutem, et al., 1998), (Antoniol, et al., 1999) are used in the process of name comparison. The main drawback of string matching approaches is that they rely on the assumption that artefacts are named consistently throughout the documentation of a system.
In (Antoniol, et al., 2001), a method to identify traceability relations between object-oriented design and code in C++ is proposed. The method first translates the C++ source code into an intermediate representation and then identifies similarities between pairs of elements from design and code. The similarity comparison is based on matching class names, attributes and their types, and method signatures.
Fiutem et al. (Fiutem, et al., 1998) present an approach that identifies traceability relations between design elements in OMT and C++ source code elements. The main goal of the approach is to ensure consistency among software artefacts. The approach uses the Abstract Object Language (AOL) to represent design elements in OMT and C++ code. It finds traceability relations between elements based on the names of classes and their properties, using regular expressions and an edit distance algorithm to match the names. The approach provides a visualisation mechanism that shows common parts and missing information. Antoniol et al. use a similar approach in (Antoniol, et al., 1999) to establish traceability relations between different versions of a system implemented in C++. The approach provides a visualisation mechanism that shows the differences between two versions. It contains a release view and a class level view. The release view represents graphically the degree of similarity between two versions. The class level view is more detailed and shows additions, deletions and modifications of attributes, together with a file summary.
Our work is similar to string matching approaches in that we also use the names of entities and their properties to identify traceability relations. However, our work differs from the string matching approaches in that it uses other information, such as existing traceability relations, to identify similarities between two elements. Moreover, our work uses synonyms when comparing names, and the completeness checking supported by our work helps to identify elements that have been named inconsistently (i.e. discrepancies of names between elements), as described in Chapter 4.
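Name comparison by edit distance can be sketched as follows. The normalisation of the distance, the threshold of 0.8, and the element names are assumptions made for illustration; they do not reproduce the algorithms of (Fiutem, et al., 1998) or (Antoniol, et al., 1999).

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1] derived from the edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def match_names(design_names, code_names, threshold=0.8):
    """Link each design name to its most similar code name, if similar enough."""
    links = {}
    for design in design_names:
        best = max(code_names, key=lambda code: name_similarity(design, code), default=None)
        if best is not None and name_similarity(design, best) >= threshold:
            links[design] = best
    return links


# Hypothetical design and code element names.
links = match_names(["BankAccount", "CardReader", "Receipt"],
                    ["bank_account", "CardReaders", "Logger"])
```

The design element Receipt is left without a link, which illustrates how inconsistent or missing names limit what string matching can recover.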
2.2.5 Rule Based Approaches
Rule based approaches create traceability relations between elements when a certain condition is satisfied. Examples of rule based approaches that support the generation of traceability relations are (Spanoudakis, et al., 2004), (Jirapanthong, et al., 2005), (Jirapanthong, et al., 2009), (Cysneiros, et al., 2003), (Cysneiros, et al., 2007a), (Cysneiros, et al., 2007b), (Cysneiros, et al., 2008), (Spanoudakis, et al., 2003), (Dagenais, et al., 2007), (Reiss, 2006), (Fletcher, et al., 2007), (Rilling, et al., 2007), (Kagdi, et al., 2007), (Alves-Foss, et al., 2002). The main challenge of rule based approaches is to make people understand that it is necessary to spend some time a priori deciding what needs to be traceable and in what ways. In many cases it is difficult to identify what needs to be traceable and in what ways, and then to create rules that identify these relations. To alleviate this problem, some traceability approaches provide traceability models and pre-established rules to identify traceability relations (Spanoudakis, et al., 2004), (Jirapanthong, et al., 2005), (Jirapanthong, et al., 2009), (Cysneiros, et al., 2007).
Spanoudakis et al. describe in (Spanoudakis, et al., 2004) a rule based approach to identify traceability relations between requirements specifications written in structured text and use case specifications written using the Cockburn template (Cockburn, 2000), and between requirements and class diagrams. A prototype tool was developed to generate traceability relations automatically. The tool receives as input requirement and use case specifications and a set of rules expressed in XML. The requirement and use case specifications are pre-processed by the CLAWS part-of-speech tagger in order to create a tagged representation of the documents that indicates the grammatical role of each word in the text. The documents used by the approach are represented in XML or translated into XML. The traceability rules are defined using XQuery. A machine learning algorithm has been presented in (Spanoudakis, et al., 2003) to improve the recall of the approach. The algorithm creates new rules that generalize the syntactic patterns of the original rules, based on examples of undetected traceability relations provided by the user.
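The general idea of a rule that fires when a condition holds between two XML elements can be sketched as follows. The sketch uses Python instead of XQuery, and the XML structure, the part-of-speech tags, the rule and the relation type are all hypothetical; they do not reproduce the rules or the document formats of (Spanoudakis, et al., 2004).

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged requirements: each word carries a part-of-speech tag.
REQUIREMENTS_XML = """
<requirements>
  <requirement id="R1">
    <w pos="NN">customer</w><w pos="VBZ">inserts</w><w pos="NN">card</w>
  </requirement>
  <requirement id="R2">
    <w pos="NN">system</w><w pos="VBZ">prints</w><w pos="NN">receipt</w>
  </requirement>
</requirements>
"""

# Hypothetical class model.
CLASSES_XML = """
<model>
  <class name="Card"/>
  <class name="Account"/>
</model>
"""


def noun_matches_class(requirement, cls):
    """Rule condition: some noun in the requirement equals the class name."""
    nouns = {w.text.lower() for w in requirement.findall("w")
             if w.get("pos", "").startswith("NN")}
    return cls.get("name").lower() in nouns


def apply_rule(requirements_xml, classes_xml, condition, relation_type):
    """Create a relation for every (requirement, class) pair satisfying the condition."""
    requirements = ET.fromstring(requirements_xml)
    classes = ET.fromstring(classes_xml)
    return [(relation_type, req.get("id"), cls.get("name"))
            for req in requirements.findall("requirement")
            for cls in classes.findall("class")
            if condition(req, cls)]


relations = apply_rule(REQUIREMENTS_XML, CLASSES_XML, noun_matches_class, "related_to")
```

Only requirement R1 mentions a noun that matches a class name, so a single relation between R1 and the class Card is created.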