Automatic Classification of Non-Functional Requirements From App Store Reviews. Reviewing and Applying Approaches From Current Research


Bachelor Thesis, 2021

33 Pages, Grade: 1,3


Excerpt


Table of Contents

List of Tables

List of Abbreviations

1 Introduction
1.1 Objectives of the Thesis
1.2 Structure of the Thesis

2 Theoretical Background
2.1 Introduction to Requirements Engineering
2.2 Distinction Between Functional and Non-Functional Requirements

3 Literature Review
3.1 Methodology
3.2 Different Taxonomies for Non-Functional Requirements
3.3 App Store Reviews Containing Non-Functional Requirements
3.4 Different Machine Learning Algorithms to Classify Non-Functional Requirements

4 Applying the Support Vector Machine Algorithm to Classify an Existing Dataset

5 Results

6 Discussion
6.1 Contributions to Theory
6.2 Contributions to Practice
6.3 Limitations and Future Work

7 Conclusion

Reference List

Appendix

List of Tables

Table 1. Confusion Matrix

Table 2. Performance Measure Results

List of Abbreviations

AUR-BoW Augmented User Reviews

BNB Binarized Naïve Bayes

BoW Bag-of-Words

BR Binary Relevance

CHI² Chi-Squared

FR Functional Requirement

k-NN k-Nearest-Neighbor

ML Machine Learning

NB Naïve Bayes

NFR Non-Functional Requirement

NLTK Natural Language Toolkit

RE Requirements Engineering

SMO Sequential Minimal Optimization

SVC Support Vector Classifier

SVM Support Vector Machine

TF-IDF Term Frequency - Inverse Document Frequency

1 Introduction

Requirements Engineering (RE) is a crucial element of the software development process (Baker, Deng, Chakraborty, & Dehlinger, 2019). It contributes directly to the success or failure of a software project (Baker et al., 2019). Requirements are commonly divided into functional requirements (FRs) and non-functional requirements (NFRs) (Glinz, 2007). The functional aspects are often easier to formulate; however, the non-functional elements contribute the most to user satisfaction (Khatter & Kalia, 2013). This is especially true for mobile applications (Maalej, Nayebi, Johann, & Ruhe, 2015). A simple way to identify NFRs for apps would therefore greatly benefit app developers in their work. Since app stores already collect user feedback, it is reasonable to use this data (Maalej et al., 2015). To date, a solution that automatically extracts NFRs from app store reviews and classifies them into different types is still missing in practice (Maalej et al., 2015).

1.1 Objectives of the Thesis

This thesis aims to explore the general nature of NFRs, their specific characteristics when appearing in app store reviews, and the current state of research in developing automated solutions to classify NFRs from app store reviews. In addition, this thesis introduces one possible machine learning (ML) algorithm that learns from a dataset of manually labeled reviews. It intends to assess its performance and, based on that, make recommendations for further research.

1.2 Structure of the Thesis

To reach these objectives, the thesis follows a specific structure. The first major part is a literature review. This review first explores and discusses different taxonomies of NFRs. Second, it associates these NFRs with app store reviews and examines their characteristic nature. Third, it analyzes several past approaches to automatically classify NFRs from app store reviews using ML. The second major part of the thesis is the development of a specific algorithm and its application to a given dataset. Its performance is investigated in the results section, and general limitations of the thesis and an outlook on future research are given at the end.

2 Theoretical Background

This chapter lays the theoretical background for requirements engineering and, specifically, for the distinction between functional and non-functional requirements, as this differentiation is an important underlying concept of the thesis. To look at different classification schemes and types of non-functional requirements, one must understand what they are, why they are different from functional requirements, and why they are so crucial in RE.

2.1 Introduction to Requirements Engineering

RE is a part of software engineering (Zave, 1997). As Zave (1997, p. 1) described it, RE is "concerned with the real-world goals for functions of and constraints on software systems". RE itself is the process of collecting, understanding, and specifying these requirements, which are commonly formulated using natural language (Sommerville, 2016; Baker et al., 2019). Put differently, RE "identifies, documents, negotiates, and manages the desired properties and constraints of software-intensive systems, the goals to accomplish in a software project, and the assumptions about the environment" (Davis, 2003, as cited in Maalej et al., 2015, p. 1). These definitions already show that there is no fixed definition of RE in the literature and that the scope of what is described as RE can vary greatly. In the development of a software system, RE is considered the first activity (Sommerville, 2016; Khatter & Kalia, 2013). However, as users' requirements change frequently, RE is a continuous process (Khatter & Kalia, 2013).

Maalej et al. (2015) projected several trends for the field of RE. According to them, the RE decision-making process is usually stakeholder-focused or based upon rationale schemes (Maalej et al., 2015). The authors predicted it to shift towards being more mass-driven and user-centered alongside the rise of collecting large amounts of user data, such as review and feedback data (Maalej et al., 2015). As this thesis deals with a dataset of app store reviews to classify requirements (i.e., non-functional ones), it is apparent that at least the latter is already taking place. Furthermore, Maalej et al. (2015) also stated that priorities, targets, and dependencies change rapidly, making real-time RE decision-making important. An important thing to note is that RE is such a crucial task because the success of software projects significantly depends on the precise formulation and implementation of requirements (Baker et al., 2019). Customer dissatisfaction can often be traced back to an issue from the RE process (Baker et al., 2019).

2.2 Distinction Between Functional and Non-Functional Requirements

Requirements are commonly categorized into functional requirements and non-functional requirements (Khatter & Kalia, 2013; Eckhardt, Vogelsang, & Fernández, 2016; Glinz, 2007). However, other distinctions also exist in the literature. For example, Groen, Kopczyńska, Hauer, Krafft, and Doerr (2017) divided requirements into three classes: functional requirements, constraints, and non-functional requirements. Although the distinction between FRs and NFRs is widely accepted, there is a discussion in research as to whether NFRs are really non-functional or rather functional (Eckhardt et al., 2016; Bajpai & Gorthi, 2012). Eckhardt et al. (2016) explored this question and concluded that many so-called NFRs are essentially not non-functional. In this thesis, however, the common distinction between FRs and NFRs is adopted.

FRs define the system's functionality (Kurtanović & Maalej, 2017; Glinz, 2007), often described as what the system does or should do (Bajpai & Gorthi, 2012; Eckhardt et al., 2016; Lu & Liang, 2017). According to Khatter and Kalia (2013, p. 1), "a functional requirement allows the user or customer to perform a function of software or its components". Glinz (2007) inferred that standard definitions of FRs often follow two threads: the emphasis on functions and the emphasis on the behavior of a system.

A common way of describing NFRs is to consider them as constraints or restrictions and properties or characteristics of a software system (Kurtanović & Maalej, 2017; Khatter & Kalia, 2013; Glinz, 2007). They are often associated with the terms “qualities” or “quality requirements” (Groen et al., 2017; Khatter & Kalia, 2013; Glinz, 2007). However, as Glinz (2007) stated, all requirements, including the functional ones, can be seen as qualities. Hence, every definition of NFRs, which includes the term “quality”, restricts its meaning to a set of specific qualities other than functionality (Glinz, 2007). Jha and Mahmoud (2019, p. 1) described NFRs as "a set of high-level quality constraints that a software system should exhibit". Khatter and Kalia (2013, p. 1) defined an NFR as "a constraint or a restriction on the product that must be considered during the design of the solution". Overall, NFRs answer how the system does something (Eckhardt et al., 2016; Bajpai & Gorthi, 2012).

One reason why NFRs have become very prevalent in research in recent years is that their satisfaction is essential for determining the success or failure of systems and achieving user satisfaction (Khatter & Kalia, 2013; Bajpai & Gorthi, 2012; Jha & Mahmoud, 2019; Glinz, 2007). Yet, they are mainly dealt with very late in the development process or neglected completely, which often leads to failure of the system or an increase in the costs of a system (Khatter & Kalia, 2013; Bajpai & Gorthi, 2012; Mairiza, Zowghi, & Nurmuliani, 2010; Younas, Jawawi, Ghani, & Shah, 2019). It is crucial to fully integrate NFRs into all phases of the software development process, including elicitation, representation, and integration (Khatter & Kalia, 2013). The described problem arises from several characteristics of NFRs. Their nature is complex, vague, subjective, and not uniform (Khatter & Kalia, 2013; Bajpai & Gorthi, 2012; Younas et al., 2019). Furthermore, there is a great diversity of NFRs and often no quantitative measures to examine or validate their satisfaction (Khatter & Kalia, 2013; Eckhardt et al., 2016; Bajpai & Gorthi, 2012). Additionally, they can only be satisfied partially by functional means, e.g., user-friendly GUI (Graphical User Interface) elements to satisfy usability (Jha & Mahmoud, 2019). Due to the lack of a formal definition or specification, developing classification schemes and modeling approaches has been the goal of many research projects (Khatter & Kalia, 2013). This thesis deals with a way to automatically classify a set of NFRs into a given scheme rather than developing one. Such efforts are relevant because of the vague nature of NFRs described above: manual classification is a very tedious and time-consuming process, and the lack of automation has often led to NFRs being neglected (Bajpai & Gorthi, 2012; Kobilica, Ayub, & Hassine, 2020).

3 Literature Review

This chapter reviews existing academic literature to answer the theoretical research questions described in the introduction. The first section explains the methodology of this review. The second section reviews several approaches to defining a taxonomy for NFRs. It aims at clarifying the vague term NFR and justifying the taxonomy applied in this thesis. The third section looks at app store reviews specifically to understand their characteristics and the importance of using them in the RE process. The fourth section then examines several machine learning approaches to classify NFRs from app store reviews and other textual documents to lay the basis for applying such an ML algorithm in the fourth chapter.

3.1 Methodology

Several databases, such as the ACM Digital Library and the IEEE Xplore Digital Library, were searched to conduct the literature review. The starting point was the paper by Jha and Mahmoud (2019), "Mining non-functional requirements from App store reviews". Further primary and secondary literature was obtained by searching these databases for keywords and by going through the reference lists of the papers found. To keep a certain level of consistency within this thesis, the literature review primarily covers topics either discussed in or necessary for the application part. Specifically, these are different taxonomies of NFRs, app store reviews containing NFRs, and different ML algorithms to classify NFRs.

3.2 Different Taxonomies for Non-Functional Requirements

This section reviews the different classes and subclasses used to categorize NFRs that are prevalent in academic literature. Several research papers are being assessed, concluding with defining the taxonomy that is adopted in this thesis.

In their research review, Bajpai and Gorthi (2012) defined some of the most common categories of NFRs, which are performance, reliability, availability, compatibility, usability, maintainability, interoperability, recovery, robustness, and resilience. Their definitions of the categories are summarized briefly here. According to Bajpai and Gorthi (2012), performance concerns the response time of the system as well as processing, query, and reporting time. Reliability deals with the average time between failures and the system's average time to recover (Bajpai & Gorthi, 2012). Availability refers to the operating time and when the system is available (Bajpai & Gorthi, 2012). Compatibility is the ease with which a system works with shared applications (Bajpai & Gorthi, 2012). Usability describes the user-friendliness of a system, commonly also known as “look and feel” (Bajpai & Gorthi, 2012). Maintainability refers to the ease of modifying or changing a system, e.g., to add new functionalities or to fix bugs (Bajpai & Gorthi, 2012). Interoperability describes the capability of diverse systems to work together without restrictions (Bajpai & Gorthi, 2012). Recovery is the system's ability to recover and resume after damage (Bajpai & Gorthi, 2012). Robustness concerns whether a system can deal with errors during runtime or whether an algorithm can keep operating despite irregularities (Bajpai & Gorthi, 2012). Lastly, resilience is the ability to provide and sustain an adequate level of service when shortcomings occur (Bajpai & Gorthi, 2012).

Mairiza et al. (2010) did an extensive analysis of existing literature concerning the terminology of NFRs, different types, and essential NFRs in various systems and application domains. Their investigation of 182 sources of information led to the identification of 252 types of NFRs, where 114 relate to qualities or quality attributes. The top five most frequent NFRs they found are performance, reliability, usability, security, and maintainability. A critical aspect, according to them, is that it is often unclear whether something is considered a separate type of NFR or just an attribute of another one and that these distinctions vary considerably across literature. This fact also shows the general vagueness and lack of a standard definition of NFRs stated in chapter two. Mairiza et al. (2010) defined performance as a software product's ability to perform adequately relative to the resources required to perform fully. According to them, reliability is a software product's ability to function without failure and sustain a certain performance level. Usability defines end-user interactions, security shall avoid unauthorized access, and maintainability describes whether a software product can be modified (Mairiza et al., 2010).

As already mentioned, Eckhardt et al. (2016) showed with their analysis that most NFRs are not non-functional but rather represent behavioral aspects of a system. Their research was based on NFRs from industrial projects. The five most frequently used classes of NFRs they found are security, reliability, usability, efficiency, and portability. The authors did, however, not define them in their study.

Kurtanović and Maalej (2017) developed a supervised ML approach to identify various types of NFRs. Their dataset consisted of FRs and eleven classes of NFRs: availability, maintainability, performance, security, portability, usability, fault tolerance, operational, scalability, legal, and look & feel. The authors did not explain from where the taxonomy used in their dataset originated.

Abad, Karras, Ghazi, Glinz, Ruhe, and Schneider (2017) evaluated several ML approaches and their performance to classify NFRs. They used a fixed taxonomy, explicitly consisting of eleven subcategories. They defined ten categories for quality requirements and one constraint category. The former are availability, maintainability, look & feel, performance, operability, scalability, portability, usability, fault tolerance, and security, and the latter is legal & licensing (Abad et al., 2017).

In their study of automatically classifying app store reviews into NFRs, Lu and Liang (2017) used the four classes usability, reliability, portability, and performance. They described usability as the feasible extent of using a system efficiently and effectively to reach a specific goal in a specified setting. Reliability, according to them, is the extent to which a system executes particular functions for a distinct period and under fixed conditions. Portability describes whether a system can be moved from one hardware, software, or different environment to another and, if so, effectively and efficiently (Lu & Liang, 2017). Lastly, performance is the degree to which a function must be carried out under some specified conditions (Lu & Liang, 2017).

To develop their ML classification approach, Jha and Mahmoud (2019) applied a taxonomy for NFRs introduced by Kurtanović and Maalej (2017), consisting of the following categories: dependability, usability, performance, and supportability. Kurtanović and Maalej (2017), however, used the term reliability instead of dependability. Jha and Mahmoud (2019) related their definitions to the research topic of mobile applications. According to them, dependability concerns trustworthiness issues like the security, availability, or reliability of an app. Usability deals with the user interface, such as the layout, attractiveness, and ease of use, e.g., the understandability or accessibility (Jha & Mahmoud, 2019). Performance relates to issues such as resource consumption, scalability, or response time (Jha & Mahmoud, 2019). Supportability concerns the question of whether an app can operate across several devices and platforms and deals with issues of maintenance or updates like modifiability or installability (Jha & Mahmoud, 2019). They presented a table with many more attributes of the four main classes, which, again, are often separate categories in other taxonomies. For example, Jha and Mahmoud (2019) considered efficiency an attribute of performance, whereas Eckhardt et al. (2016) recognized it as a separate class. Jha and Mahmoud (2019) inferred that usability is the NFR most frequently discussed in app reviews.
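The relationship between the four main classes and their attributes can be illustrated with a small lookup structure. The following is a minimal sketch, not the representation used by Jha and Mahmoud (2019): it contains only the attributes named in the paragraph above, and the dictionary layout as well as the helper name `classes_for` are assumptions made for illustration.

```python
# Attributes per class as named in the paragraph above (Jha & Mahmoud, 2019).
# The dictionary layout and the helper below are illustrative assumptions;
# the original table contains many more attributes than are listed here.
NFR_TAXONOMY = {
    "dependability": {"security", "availability", "reliability"},
    "usability": {
        "layout",
        "attractiveness",
        "ease of use",
        "understandability",
        "accessibility",
    },
    "performance": {
        "resource consumption",
        "scalability",
        "response time",
        "efficiency",
    },
    "supportability": {"modifiability", "installability"},
}


def classes_for(attribute):
    """Return the sorted list of main classes that list the given attribute."""
    key = attribute.strip().lower()
    return sorted(
        nfr_class
        for nfr_class, attributes in NFR_TAXONOMY.items()
        if key in attributes
    )


if __name__ == "__main__":
    # Efficiency is a separate class for Eckhardt et al. (2016),
    # but an attribute of performance in this taxonomy.
    print(classes_for("efficiency"))
    # Recovery (Bajpai & Gorthi, 2012) is not among the attributes listed here.
    print(classes_for("recovery"))
```

Such a lookup makes explicit that a term treated as a separate category in one taxonomy may appear only as an attribute of a broader class in another.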

To conclude this section, the six most frequent categories of NFRs in the papers reviewed here are as follows: (1) usability, (2) performance, (3) reliability, (4) maintainability, (5) security, and (6) portability. These terms are more general and thus more frequently used than other smaller categories such as recovery or robustness. Some terms also refer to the same concepts. For example, usability is often described with the term “look & feel”, where usability is the more general term of the two. Fault tolerance could also be associated with reliability. In addition, the more repeatedly used classes of NFRs could also be an indicator of what users deem as the more important ones. Concepts such as scalability or compatibility would thus be of minor importance.
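The ranking above can be reproduced with a simple tally. The following sketch counts how many of the reviewed papers list each category, using the category lists exactly as quoted in this section. It assumes exact label matching, so related terms such as “look & feel” and usability, or “operational” and “operability”, are counted separately; the variable names are chosen for illustration only.

```python
from collections import Counter

# Category lists as quoted in this section; labels are matched verbatim,
# so near-synonyms are deliberately not merged (an assumption of this sketch).
TAXONOMIES = {
    "Bajpai & Gorthi (2012)": [
        "performance", "reliability", "availability", "compatibility",
        "usability", "maintainability", "interoperability", "recovery",
        "robustness", "resilience",
    ],
    "Mairiza et al. (2010)": [
        "performance", "reliability", "usability", "security",
        "maintainability",
    ],
    "Eckhardt et al. (2016)": [
        "security", "reliability", "usability", "efficiency", "portability",
    ],
    "Kurtanović & Maalej (2017)": [
        "availability", "maintainability", "performance", "security",
        "portability", "usability", "fault tolerance", "operational",
        "scalability", "legal", "look & feel",
    ],
    "Abad et al. (2017)": [
        "availability", "maintainability", "look & feel", "performance",
        "operability", "scalability", "portability", "usability",
        "fault tolerance", "security", "legal & licensing",
    ],
    "Lu & Liang (2017)": [
        "usability", "reliability", "portability", "performance",
    ],
    "Jha & Mahmoud (2019)": [
        "dependability", "usability", "performance", "supportability",
    ],
}

# Count in how many papers each category label appears.
counts = Counter(
    category for categories in TAXONOMIES.values() for category in categories
)
top_six = counts.most_common(6)

if __name__ == "__main__":
    for category, n_papers in top_six:
        print(f"{category}: listed in {n_papers} of {len(TAXONOMIES)} papers")
```

Under this counting, usability appears in all seven lists and performance in six, while reliability, maintainability, security, and portability each appear in four; availability follows with three.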

The dataset used in this thesis adopts the taxonomy of Jha and Mahmoud (2019) with the categories dependability, usability, performance, and supportability. As their paper concerns an ML approach to categorize NFRs from app store reviews, those four classes of NFRs are best suited to such review texts. From the above-stated list of the six most frequent categories, maintainability, for example, is left out. According to Lu and Liang (2017), maintainability concerns the internal quality of a system, whereas user reviews deal with external quality.

3.3 App Store Reviews Containing Non-Functional Requirements

User feedback data has been getting more attention, especially with the rise of the availability of mobile devices like smartphones or tablets (Maalej et al., 2015). Alongside this rise, the need for software to support these systems has become apparent, i.e., mobile applications (Jha & Mahmoud, 2019; Kilani, Tailakh, & Hanani, 2019). Mobile apps, nowadays, play a massive role in people's lives as they provide services for various domains and social groups and aid them in plenty of their daily activities (Kilani et al., 2019; Jha & Mahmoud, 2019). App stores, such as the Apple App Store or Google Play, have emerged as marketplaces (Maalej et al., 2015). These stores provide the possibility to express one's opinion about an app after downloading and using it (Jha & Mahmoud, 2019). This can be in the form of text feedback or other meta-data such as star ratings (Jha & Mahmoud, 2019; Lu & Liang, 2017). This feedback data can then be used for marketing purposes like measuring customer satisfaction (Kilani et al., 2019). However, reviews contain even more valuable information when analyzed in detail (Lu & Liang, 2017). Software developers can collect requirements straight from the users (Kilani et al., 2019). This can help them improve their software to meet users' expectations and is also essential for retaining current users and attracting new ones (Lu & Liang, 2017). While conventional RE mainly involves users through workshops, interviews, or focus groups, using tremendous amounts of such feedback data is a shift towards a more mass-driven and especially user-centered RE (Maalej et al., 2015).

Content of app store reviews

According to Jha and Mahmoud (2019), app store reviews contain technical feedback such as bug reports, software maintenance requests, and functional feature requests, but also NFRs. Little attention has been paid to these NFRs so far (Jha & Mahmoud, 2019). Their results also show that different types of NFRs are raised in different app categories (Jha & Mahmoud, 2019). Kilani et al. (2019) stated that app store reviews contain the users' daily experience with the software expressed in complaints and suggestions. This might include the report of bugs or security issues, performance feedback, or the demand for new features such as user interface improvements (Kilani et al., 2019). Security issues or performance feedback are typical NFRs as described before. Maalej et al. (2015) described similar content of user reviews. First, they mention bug reports, which, according to them, are descriptions of issues of an app that need correction. These can be crashes, performance problems, or erroneous behavior (Maalej et al., 2015). Second, feature requests can be contained in user reviews (Maalej et al., 2015). Users ask for lacking functionality or content, which might be included in other apps (Maalej et al., 2015). Furthermore, users might also describe specific experiences, e.g., the app’s helpfulness in a given situation (Maalej et al., 2015). These experiences can also contain descriptions of unusual problems, which might inspire developers to design new features (Maalej et al., 2015). App store reviews might moreover contain users’ ideas on how to improve the app (Maalej et al., 2015). Finally, user reviews can involve numeric ratings, such as stars with a short comment (Maalej et al., 2015). However, these types of feedback are less informative as they often only contain approval or criticism (Maalej et al., 2015).

[...]

Excerpt out of 33 pages

Details

Title
Automatic Classification of Non-Functional Requirements From App Store Reviews. Reviewing and Applying Approaches From Current Research
College
University of Mannheim
Grade
1,3
Author
Year
2021
Pages
33
Catalog Number
V1130435
ISBN (eBook)
9783346489913
ISBN (Book)
9783346489920
Language
English
Keywords
Requirements Engineering, App Store Reviews, Non-Functional Requirements, Requirements, Software Development, Machine Learning, RE, NFRs, ML
Quote paper
Esther Krystek (Author), 2021, Automatic Classification of Non-Functional Requirements From App Store Reviews. Reviewing and Applying Approaches From Current Research, Munich, GRIN Verlag, https://www.grin.com/document/1130435
