Implementing a type system for secure information-flow. Potential security risks


Master's Thesis, 2019

90 Pages, Grade: 77


Excerpt

Table of Contents

Table of Tables

Table of Figures

Table of Listings

1 Introduction and Objectives
1.1 Introduction
1.2 Background
1.3 Objectives
1.4 Beneficiaries and personal objectives
1.5 Work plan
1.6 Report structure

2 Context
2.1 Method of literature review
2.2 Compiler functionality
2.2.1 Overview and difference to interpreters
2.2.2 Lexical and syntactic analysis
2.2.3 Semantic analysis
2.3 Terminology of Type
2.3.1 Variables and types
2.3.2 Principal type and principal typing
2.4 Type systems
2.4.1 Important program properties related to type systems
2.4.2 Type rules
2.4.3 Dynamic checking
2.4.4 Subtyping
2.4.5 Flow-sensitive typing
2.4.6 Gradual typing
2.5 Information flow security problem
2.5.1 Security goals
2.5.2 Issues of standard solutions
2.5.3 Security classes
2.5.4 Dynamic and static enforcement mechanisms
2.5.5 Non-interference of information flows
2.5.6 Security type systems and their features
2.5.7 Covert channels
2.5.8 Alternatives to non-interference

3 Methods
3.1 Development methodology
3.2 Development tools
3.3 Software objects that were provided
3.4 Informal description of the While language
3.5 Verification and validation

4 Results
4.1 Selection of type system
4.2 Evolution 1: AST printer and harnessing the visitor pattern
4.3 Evolution 2: AST printer for procedures
4.4 Evolution 3: Dependency map implementation and unit testing
4.5 Evolution 4: Security type system implementation
4.5.1 Flow Principal
4.5.2 Termination Principal
4.6 Evolution 5: Procedure typings
4.6.1 Input parameter dependencies
4.6.2 Output parameter dependencies
4.6.3 Procedure body dependencies
4.6.4 Final typing
4.6.5 Recursive procedure calls
4.7 Evolution 6: Fixed type-checking algorithm
4.8 Evolution 7: Test programs and performance

5 Discussion
5.1 Verification, validation and complexity
5.2 Architectural limitations and alternative designs
5.3 Discussion with attention to real-life scenarios
5.4 Feasibility or impracticability of a security type system for SAP

6 Evaluation, Reflections, and Conclusions
6.1 Project Evaluation
6.2 Further work

References

Appendix A: Project proposal

Appendix B: Mind map for the literature review

Appendix C: UML class diagram

Appendix D: Sources of test programs

Appendix E: Selection of Java classes

Abstract

Standard security practices, such as access controls, insufficiently assure secure end-to-end behaviour of an application. Any program flaw, no matter how small or big, poses a potential security risk. Static information flow analysis checks a program for confidential information leaks into public data containers at compile-time. This design-and-build project's aim is the prototypical implementation of a security type system for a simple demonstrative language to prevent programs leaking confidential information effectively. The project is based on existing research concerning security type systems as a means of enforcing information flow policies in a program. The results are discussed not only in terms of validity but also considering the feasibility of a similar security type system for SAP enterprise resource management systems.

Keywords: information flow analysis, security type system, end-to-end security, confidentiality, non-interference

Table of Tables

Table 1: Provided software objects

Table 2: Categorised security type systems from the literature

Table 3: Interface description of the dependency map

Table 4: Overview of test programs and their results

Table of Figures

Figure 1: Two parse trees for an ambiguous program

Figure 2: Join-semilattice (Denning, 1976, p. 240, Figure 1)

Table of Listings

Listing 1: Unpredictable property (pseudocode)

Listing 2: Safe but rejected Java excerpt

Listing 3: Implicit information flow (pseudocode)

Listing 4: Program only secure in a flow-sensitive analysis

Listing 5: Test program for the abstract syntax tree

Listing 6: AST printer output

Listing 7: Procedure with overlapping variable names

Listing 8: AST for a procedure call

Listing 9: Java implementation of Composition and Closure

Listing 10: Implementation of the Assign type rule

Listing 11: Implementation of the Seq rule

Listing 12: Caching of procedure declarations

Listing 13: Calculation of input parameter dependencies

Listing 14: Calculation of output parameter dependencies

Listing 15: Final type composition of procedures

Listing 16: Example test program

Listing 17: Runtimes of the type-checker for the test program

Listing 18: Console output when launching Retest

Listing 19: ABAP program containing an implicit information flow

Introduction and Objectives

1.1 Introduction

Society relies heavily on software-intensive systems in all facets of life. Information is automatically processed in automobiles, phones, fridges, and countless web servers. Much of that information is personal data and can distinguish an individual's identity, such as their name, biometric records, or email addresses. Intellectual property and confidential information are furthermore handled in mission-critical military, governmental, medical and business applications. Therefore, building trust in the handling of data by systems is a crucial aspect of software architecture design.

No software system is entirely secure and invulnerable. Unintentional defects and accidental misuse commonly lead to failures and even catastrophic events. Moreover, malicious actors may get access to insufficiently protected information or may deliberately damage data integrity. There is a ubiquitous trade-off between added benefits brought by increased connectivity and a higher risk of harm. Considering the amount of valuable personal data circulating on social media platforms and customer relationship management systems, information flows unknown to users often for the sake of targeted advertising are a relevant problem.

This design-and-build project's aim is the prototypical implementation of a security type system for a simple While language to prevent programs leaking confidential information effectively. The project is based on existing research concerning security type systems as a means of enforcing information flow policies in a program. Standard security practices, such as access controls, insufficiently assure secure end-to-end behaviour of an application. Information flow analysis checks a program for leaks of confidential information into public data containers dynamically or statically. The type-system-based approach in this project focusses on static compile-time enforcement.

1.2 Background

When software systems become more complex, data protection may become too difficult to handle with standard methods. Useful mechanisms include firewalls, encryption, and anti-malware software. More importantly, access controls regulate which applications can read specific data. However, none of these ways provides end-to-end protection of confidential information, since they do not check the internal information flow for improper information leakages. Therefore, to design a system with end-to-end security with respect to a given confidentiality policy, language­based techniques checking information flows should be used in addition to standard security mechanisms. These techniques formalise and enforce the non-interference of private data with public data.

Different forms of information flow analysis exist that address different security goals. Taint analysis identifies untrustworthy sources, such as user input streams, that possibly contain malware or confidential data. If a tainted object from this source flows into a critical sink according to the security policy, e.g., a database, an illegal flow will be reported. Taint analysis enforces integrity by avoiding harm to valuable assets.

Integrity is one of the fundamental security properties a system can hold, in addition to availability, authentication, accountability, auditability, and confidentiality. This project addresses the latter, which is “the ability of a system to ensure that an asset is viewed only by authorised parties” (Pfleeger, Pfleeger and Margulies, 2015, p. 35). Ensuring this in a complex system is limited by implicit violations stemming from branch decisions or other programming language constructs. Moreover, depending on the access mode, information may be declassified by attackers by analysing the processor's power consumption, the execution time, the exhaustion of available resources, and the probability distribution of observable data. Nevertheless, an overly pessimistic view of a system's security properties is impractical as well: Taking the example of a login process, the system necessarily declassifies the confidential piece of information “username and password do (not) match.”

Security type systems are useful for preventing improper information flows with regards to confidentiality. One of the main advantages of the type system approach is the possibility of rigorously proving its correctness through formalisation. It has been shown by Denning (1976) that information flow security can be achieved by attaching security labels to values and using a label lattice to verify the non-interference of high and low-security levels.

Dynamic information flow analysis has several drawbacks: It increases execution overhead and, more importantly, does only take into account a single execution path. In contrast, type-checking as a compile-time activity statically looks at all executable paths to find security policy violations. The downside of static checks is that the security policy and label lattice must be known when compiling the program and cannot change at runtime. Yet, performing the information flow security checks at the semantic analysis stage of compilation allows for sound tracking of implicit flows prior to execution.

1.3 Objectives

The project has the following objectives:

1. To design, implement and systematically demonstrate a chosen type system considering reliability, performance, and scalability.

- Deemed to have been met if the implemented prototype passes verification and validation.

2. To determine the limitations of the implementation and alternative architectural designs.

- Deemed to have been met if the limitations are determined as part of the discussion and alternative architectural designs are provided.

3. To evaluate the extent to which the developed prototype scales up to real-life scenarios.

- Deemed to have been met if the project results are discussed with attention to real- life scenarios.

4. To investigate the feasibility of a similar security type system for SAP systems.

- Deemed to have been met if the feasibility or impracticability is justified.

SAP is a leading global manufacturer of enterprise software for managing business operations and customer relations. ERP (enterprise resource management systems play a central role in most companies worldwide and must protect confidential customer data across different processes and departments. No product for static information flow analysis in SAP ERP systems seems to exist, even though there are use cases. Only Grabowski, Hofmann and Li (2012) use a type-system-based approach to implement programming guidelines at SAP to prevent code injection vulnerabilities. The fourth objective addresses this apparent gap and is also based on personal interest in the area.

1.4 Beneficiaries and personal objectives

The primary beneficiary is the author who gains insight into information flow analysis, security aspects such as covert channels, language grammars and compilers which are otherwise not covered in the programme. A big part of the undertaking is undoubtedly the design and creation process, which aims at maximising the learning experience by focussing on the creation of artefacts (“learning via making”, Oates, 2006, p. 111). Also, the project contributes a systematically implemented type system and the lessons learned may inform other developers considering implementing a similar system.

1.5 Work plan

After the current state of knowledge has been outlined in the literature review, a security type system is selected from existing research according to a set of criteria. The objectives are met using an evolutionary prototyping approach where design, build, and test activities are arranged in phases, each resulting in an executable version of the product. Test programs, as well as unit tests, are used to verify the correctness of the type-checking algorithm. The results of the project are discussed subsequently for the research objectives.

1.6 Report structure

Chapter 2: Context sets the project in context with existing research and clarifies the terminology used in the rest of the report.

Chapter 3: Methods describes the methods which were necessary to achieve the objectives in detail and justifies their choice. In addition, the development tools and the While programming language grammar for the demonstrative test programs are presented.

Chapter 4: Results begins with a discussion of the chosen type system. Following that, the prototyping evolutions are discussed one by one, each with their challenges and limitations.

Chapter 5: Discussion evaluates the project outcomes, highlights the most interesting details, and assesses if the objectives have been met.

Chapter 6: Evaluation, Reflections, and Conclusions includes reflections about the project as a whole and which parts I would execute differently if I were to start again. The conclusions gained from this are examined to propose ideas for future work.

2 Context

2.1 Method of literature review

The principal objective of this literature review is to explore important developments in the research field of security type systems. As this project is based on existing theoretical achievements, it is important to relate it to the work of others in a selective and ordered way. Using language­based mechanisms to enforce information flow policies has a long history-program certification by Denning (1976) is one of the most important early works. Therefore, alternative approaches were developed that need to be critically synthesised. Only relevant articles have been selected by means of searching academic databases using keywords and cross-references.

Before exploring type systems, the term “type” is defined, since it is similarly ambiguous as the term “object” in programming languages. Moreover, the literature uses terms that appear very similar, e.g. type, typing, and others that are poorly defined, such as strong and weak typing-these terms are avoided altogether. The terminology in this report is mostly based on Cardelli (1996).

Security type systems with varying levels of formality, precision, and security goals have been developed. To be able to choose one, some aspects of type systems need to be understood and put in context with language theory first. Different ways to categorise security type systems are illustrated in order to provide sensible selection criteria for the next chapter. A mind map was drafted with the free software XMind 8 1 to organise the papers around information flow research, see Appendix B.

2.2 Compiler functionality

In order to facilitate understanding, it is necessary to make the terminology clear for the rest of the report. Programming languages provide an interface for human users to write sophisticated programs that can be translated into machine code by a compiler and executed by the computer's processing unit. Imperative languages like the one used in this project define sequences of commands, while declarative languages such as SQL omit the control flow specification. The description of a programming language can be separated into the intertwined domains syntax and semantics, which both contain many non-trivial challenges. Hundreds of programming languages exist that aim to solve different problems and whose designers thus made different design decisions. For instance, the C language has many unsafe features in exchange for maximum performance.

2.2.1 Overview and difference to interpreters

A compilation is a transformation of a source language to a target language. In the case of software, the source text is a high-level language like Java and the target low-level machine language. A compilation comprises of an analysis and synthesis task, also known as front-end and back-end. The analysis is concerned with the syntax and semantics of the source text. It scans and checks the textual input before generating an abstract, intermediate representation that is processed to determine the program's meaning. If errors occur, the compilation task halts and informative messages are provided to the user. The intermediate representation is taken up by the synthesis task to generate the target source text by selecting target low-level instructions that are equivalent to the concepts in the source (Waite and Goos, 1984).

An interpreter compiles and executes single instructions simultaneously without previously translating the entire source text to the target machine language. Some activities, such as semantic analysis, may need to be done every time a statement is executed because the analysis and synthesis tasks are performed dynamically. Execution time may thus be slower, because program behaviour is dominantly informed by the high-level source language instructions, and not by optimised machine code.

This project addresses the front-end of the compilation, as type systems are part of the semantic domain. In the following sections, the program analysis task is explored in more detail before focusing on type systems.

2.2.2 Lexical and syntactic analysis

The syntax of a language specifies the logical structure of a program source text. It comprises of a set of rules that define what constructs the language offers and how they are decomposed into other constructs. Two phases can be identified in the process of checking if a source text complies with the rules. A lexical analyser or lexer converts the source character stream into a sequence of meaningful tokens, e.g. keywords, literals, and comments. Lexers may also be used for formatting, i.e. pretty-printing, the source text.

While the lexer scans characters and matches them to the lexical grammar, the parser matches the produced token sequence to the context-free grammar, which describes the structure of the program at a higher level in terms of terminals and non-terminals.2 The syntax analysis checks if a construct is structured correctly by looking at all the tokens in its decomposition. Some tokens, termed symbols, have special meaning and are stored in a symbol table by the parser. For example, variable names and function names can be looked up in this table (Slonneger and Kurtz, 1995).

If the source text complies with the context-free grammar, the parser generates a structured representation of the program, commonly a parse tree (Krishnamurthi, 2017). Tree generation is not trivial, since the structures may be ambiguous. To illustrate, the small nonsensical program, where “;” is the sequencing operator, can be represented with both trees:

Abbildung in dieser Leseprobe nicht enthalten

Figure 1: Two parse trees for an ambiguous program

Since this tree retains all the information of the source text, it is fine-grained and contains many details that can be glossed over in the subsequent semantic analysis. By abstracting the redundant concrete elements away, a higher-level AST (abstract syntax tree) can be formed. The AST is, therefore, leaving out language elements that only the parser needs and focuses on the crucial structural aspects of the program. There is no general rule for the construction of such AST because it depends on the choice of tree nodes that are deemed necessary or not for semantic analysis. As a result, more than one correct AST can exist for a single program (Slonneger and Kurtz, 1995).

2.2.3 Semantic analysis

Let a program be well-formed if it adheres to all rules of the programming language. Apart from the syntactic rules described by the context-free grammar, certain invariants need to be checked that go beyond their scope. Every program has semantic properties, which can be divided into static and dynamic properties (Seidl, Wilhelm and Hack, 2013). The dynamic semantics of a language describes the meaning of a computation, for example, by relating the input to the output. The static semantics constraints the formation of programs to prevent execution errors. The checks may, for instance, make sure that function calls supply the correct number of actual parameters, or that the scopes of variables are well-formed.

An important field of semantics is the type system, which constrains the range of values a variable can assume, or in the case of security type systems, restrict the information flows between variables. Type systems are discussed subsequently. In this project, static analysis determines the semantic properties at compile-time.

2.3 Terminology of Type

2.3.1 Variables and types

Before going into the details of type systems, it is essential to clarify the terminology. A program variable is an identifier in a context. In contrast to mathematical variables, it references a changeable value stored at a memory address. The separation of symbol and value provides a higher abstraction and makes way for language features such as multiple references to one value. The identifier is used to access, read and change the value. Hence, a variable can assume different values throughout program execution; otherwise, it would be termed a constant.

The type of a variable is the upper bound of all possible values. For instance, a variable has type “number”, if it can assume both decimals and integers but not strings as values. Programming languages are typed when these restrictions are in place, and untyped when there are no types or, equivalently, a universal type containing all values. An example of an untyped language is Javascript.

Untyped languages are not to be confused with dynamically typed languages, that assign types to variables at runtime and can dynamically change the associated type. These languages are more flexible and increase productivity, but statically typed languages lead to fewer errors and less code ambiguity (Pearce and Noble, 2011). If information about type assignments is contained in the syntax rules, the language is explicitly typed. In implicitly typed languages, type information is omitted in the source text, and types are inferred automatically (Cardelli, 1996).

Having this in mind, not only variables are assigned types. Expressions, functions, parameters or any other programming construct can have a type, depending on the design of the language.

2.3.2 Principal type and principal typing

Since the two terms are similar and easy to confuse, they are contrasted shortly. A typing in a type system can be found in the final judgment of a derivation. For example, if a derivation ends with the judgment Γ EE:t, the typing for E is the pair Γ, t. By contrast, the type of E is t in this type system and the environment Γ (Jim, 1996). This definition can be extended with the notion of principality. On the one hand, a principal type for an expression E in type environment Γ represents all possible types for E in Γ. This property is useful for type inference because instead of dealing with incompatible types in a judgment and having to choose one, the type system can infer the principal type that encompasses all possible types.

On the other hand, given a single expression E in a type system, all possible typings for E can be derived from the principal typing. The principal typing property is valuable, as it allows for compositional type inference: The type system does not need to analyse a program in sequence, instead, it can process subprograms separately in any order and then combine the intermediate results to a complete program typing (Wells, 2002).

2.4 Type systems

A type system is an inherent part of a typed programming language. Although it is a complex research discipline with many facets, the underlying motivation is simple: Preventing execution errors by attaching types to specific constructs in a program, tracking the types in an environment and checking them against a set of rules. There are other goals besides eliminating errors, for example, collecting more information about the program’s semantic context in order to optimise machine code generation during compilation. Moreover, type systems have the power to change the semantics of a programming language by establishing invariants (Wells, 2002). The formal definition of a type system should be detailed enough to specify its implementation unambiguously (Krishnamurthi, 2017).

2.4.1 Important program properties related to type systems

Two distinct types of execution errors exist. Trapped errors lead to a crash immediately at runtime, while untrapped errors may go unnoticed until they develop into bugs or cause corrupt data. Based on this separation, some properties can be defined. A program is type-safe if all untrapped errors are prevented. In other words, only meaningful behaviour is allowed, and commands that were certainly not intended by the programmer are made impossible, e.g. referencing a wrong memory address or assigning incompatible types. Ensuring the safety property is the primary goal of the type system. However, some programming languages, for instance, C and C++, are purposely unsafe in order to attain maximum performance (Cardelli, 1996).

Beyond the basic property of type safety, a program is well-behaved, if some sensible trapped errors are prevented by the type system as well. An example of a trapped error that can easily be detected in a static analysis is the assignment. It, therefore, follows that a program can be type­safe and ill-behaved at the same time if it is checked for untrapped errors but still contains trapped errors designated as illegal. Conversely, all well-behaved programs are type-safe.

A type-checker is the static implementation of a formal type system and part of the compiler. The rules specified by the type system are ensured by it. If a program passes the type-checker, it is well- typed. Ill-typed programs are not automatically ill-behaved and vice-versa, because the completeness of the type rules limits this property. If it is formally proved that well-typed programs are well-behaved, the type soundness theorem holds, stating that all typed programs that passed the type-checker are well-behaved (Cardelli, 1996).

2.4.2 Type rules

The type rules of the system are formalised and are used to carry out type derivations. It is more straightforward to prove type soundness working with this formal proof system than expressing the rules in the type-checking algorithm. Also, the formal rules are easier to read thanks to their modular structure. Generally, type rules describe a condition where no error occurs. If no rule applies to a program fragment during the static program analysis, the fragment is ill-typed, and a typing error will be produced. The derivation of a type consists of a tree of judgments, where the root is the final judgment at the bottom. If the root can be derived successfully under the premises, the judgment is valid. If the tree has no premises, the judgment is always true.

Abbildung in dieser Leseprobe nicht enthalten

2.4.3 Dynamic checking

So far, static type systems have been looked at, which assign types to program constructs independent of execution order or input data. There are several difficulties with static analysis since many programming languages offer mechanisms that enable programmers more freedom when annotating types. This freedom results in reduced specificity and places the burden of deciding concrete types on the type system. However, no non-trivial semantic property of a program is statically decidable, as stated by Rice's Theorem (Rice, 1953). Here, a non-trivial property is one that is unpredictable. For example, considering the following conditional, the type of x cannot be deducted statically, only at runtime. A solution to limitations of decidability like this would be to infer the principal type, which is imprecise as only the higher-level interface of the principal type would be available in subsequent operations. Alternatively, additional syntactic annotations to clarify the semantics would need to be provided. In languages without these alleviations, some programs might not pass the type-checker, even though they are type-safe. In conclusion, there is a trade-off between correctness and performance without sacrificing syntactic simplicity (Krishnamurthi, 2017).

Abbildung in dieser Leseprobe nicht enthalten

Listing 1: Unpredictable property (pseudocode)

Therefore, in addition to statically typing, dynamic checking that tracks values precisely is usually necessary to eliminate ill behaviour, e.g. detecting arrays going out of bounds. That said, static analysis naturally provides crucial execution time saving compared to untyped languages, because fewer decisions need to be made at runtime. In order to improve space-efficiency, a relation between types called subtyping can be used, which is illustrated in the following.

2.4.4 Subtyping

The notion of subtyping is, informally stated, a relation between types, where one type is termed subtype and the other one supertype. A variable (or any other programming construct for that matter) that is of the subtype may also be regarded as a variable of the supertype. A clearer specification of what constitutes a subtyping relationship is given by the subsumption principle: “[If] σ is a subtype of τ, then a value of type σ may be provided whenever a value of type τ is required” (Harper, 2012, p. 251).

Subtyping is advantageous in many cases, first, to give the programmer more flexibility in combining program fragments of slightly different types by casting them, and second, to save space when tracking types dynamically. The latter can be exemplified by a function that accepts a variable of supertype a, that has two subtypes. When calling this function, the checker only needs one bit to record the actual type instead of treating it as a distinct type (Krishnamurthi, 2017). Other areas of application for subsuming types are, among others, reference subtyping, object subtyping and function subtyping (Harper, 2012).

2.4.5 Flow-sensitive typing

A flow-sensitive type system provides more flexibility as it allows variables to have different types at different points in the program. Flow-sensitivity empowers the type inference mechanism by considering the control flow during static analysis (Pearce and Noble, 2011). Java, for example, lacks this property; The following program fragment would be judged ill-typed, although it is clearly type-safe.

Abbildung in dieser Leseprobe nicht enthalten

Listing 2: Safe but rejected Java excerpt

A flow-sensitive type system can solve this decidability problem by inferring the only possible type from the surrounding control flow.

2.4.6 Gradual typing

To overcome the limitations of having only static or dynamic checking, gradual typing allows the mixing of both paradigms to any degree. Type annotations are optional in a gradually typed language, and the programmer can control when to add or omit them. If a program fragment is annotated, it is typed statically. Otherwise, it is checked dynamically. This flexibilty provides greater freedom for the implementor (Siek and Taha, 2007).

2.5 Information flow security problem

2.5.1 Security goals

In the previous chapter, it is shown how type systems are useful for preventing execution errors, most importantly by establishing the type safety property. The primary goal is to prevent untrapped execution errors, since these may go unnoticed for long periods before causing corruption, like overwriting datasets at a wrong memory address. Nevertheless, even though type-safe programs may be safe from such errors-or even from all execution errors-, they may still contain virtually undetectable semantic flaws, for example, a wrong conditional or wrong order of commands. Be that as it may, because preventing human error is not covered in this project, only formal ways to improve correctness are looked at. It is important to realise that not only the trustworthiness of a given software design needs to be verified, but it must be assessed whether the implementation follows the design specification correctly (Goguen and Meseguer, 1982).

Considering data security, any program flaw, no matter how small or big, poses a security risk. A software system can have several security properties: Availability, integrity, confidentiality, authentication, accountability and auditability. Two properties will be looked at in more detail. Integrity is difficult to define and has different meanings depending on the context, but may be pinned down as being internally consistent, performing its intended function correctly and being unmodified.

Confidentiality ensures that data cannot leak to unauthorised sinks. It involves policies that unambiguously specify a subject, a mode of access, an object, and if the combination of these three entities is authorised or not (Pfleeger, Pfleeger and Margulies, 2015). For example, the policy might state: “An external application (subject) learning the existence (mode of access) of a user in a database (subject) violates the confidentiality.”

Moving on with these definitions in mind, it can be said that ill-behaved programs cause integrity failing since the correctness of their results cannot be guaranteed. This type of risk does not necessarily stem from malicious intent but may lead to data corruption and other serious harm. Perhaps more importantly-at least in this project-, illegal program behaviour provides an opportunity for malicious actors to exploit and manipulate the system’s resources. For instance, if an attacker can feed input to a program written in an unsafe language causing it to read a particular memory fragment, confidential information could be exposed or changed at will. Thus, the consequence of insufficiently checking software system security is confidentiality failure, integrity failure or both (Pfleeger, Pfleeger and Margulies, 2015). Substantial feasible assurance of these two properties is a desired feature, though the focus of this project lies on confidentiality.

2.5.2 Issues of standard solutions

The standard techniques to ensure data protection from intentional and unintentional harm include firewalls, encryption, and anti-malware software. More importantly, access controls consisting of privileges given to users and applications regulate who can manipulate certain data. However, information systems become increasingly interconnected, and the flow of information opaquer. Encryption restricts who can read the exchanged message but fails to enforce a confidentiality policy beyond the point of decryption. Besides, encrypted data is more difficult to process.

Similarly, access controls provide a “one-end” solution by restricting access at one point and cannot typically enforce an end-to-end confidentiality policy between two or more applications. More specifically, access controls do not restrict the propagation of information. One technique to determine a program's end-to-end security automatically is to analyse the information flow within it and checking it against an information flow policy (Sabelfeld and Myers, 2003).

2.5.3 Security classes

Information is disseminated among objects in an information system, e.g. when reading from and writing to files, when calling web services, or even when assigning program variables. Processes, which are instances of programs, are responsible for regulating these information flows. In order to identify legal and illegal flows, an information flow policy specifies a set of security classes and a flow relation defining among which pairs of classes information is permitted to flow: If the value of a data object x is in any way dependent on the value of another object y, their security classes must either be the same or the relation between them explicitly or transitively permitted by the flow policy (Denning, 1976).

The finite partially ordered set of security classes forms a join-semilattice, which facilitates the general reasoning about flow policies. To illustrate, the ordering on the powerset of {x, y, z} results in the lattice displayed in Figure 2. It is universally bounded by a unique top element T and bottom element 1. The meet of two security classes does not need to exist because an information flow between two classes would always result in their join, i.e. the least upper bound. Another simple example is the binary lattice low E high, where flows from high (confidentiality) to low are not permitted.

2.5.4 Dynamic and static enforcement mechanisms

A program is accepted by an information flow control only if all explicit and implicit flows are following the rules prescribed by the information flow policy. The control enforcing the flow policy can be implemented as a runtime or compile-time mechanism. The former is preferable if the security classes or the security policy may change over the course of program execution, e.g. when file permissions are modified in a file system (Buiras, Vytiniotis and Russo, 2015).

Compile-time mechanisms have two main advantages: They do not impair execution speed since program certification happens before execution; what is more, dynamic information flow control only considers one execution path, whereas static checks can assess the confidentiality of all paths. In addition, compile-time checks can be anchored in the programming language semantics for more natural proof of correctness (Sabelfeld and Myers, 2003).

Explicit information flows arise from the direct passing of values between objects, e.g. when assigning variables or returning values from procedures. Implicit flows are harder to detect and deserve more attention. They are rooted in the control structure of the program, rather than in the passing of values. Conditionals, for instance, always lead to an implicit flow,3 as can be seen in Listing 3. Here, y is dependent on x. Dynamic flow controls, however, are not feasibly able to evaluate all execution paths and consequently unable to spot all implicit flows (Denning and Denning, 1977).

Abbildung in dieser Leseprobe nicht enthalten

Listing 3: Implicit information flow (pseudocode)

Lastly, another issue of dynamically tracking flows in processes lies in unnecessarily raising the attached security classes of variables monotonically, i.e. without ever lowering them. This effect, called label creep, leads to inaccurate results, since intermediate values are not considered, and some values might, in fact, not depend on each other (Rajani and Rafnsson, 2017). Static information flow controls often introduce a program counter label to avert label creep, which corresponds to the context of a command. The program counter allows the current context to be raised temporarily rather than permanently (Stefan et al., 2017).

Taking the above into account, language-based techniques are an attractive choice for static information flow analysis. Different forms of information flow analysis exist that address different security goals, e.g. taint analysis, Java's bytecode verifier, program slicing, data flow analysis, semantics-based security model (Leino and Joshi, 1998), and program dependence graph analysis (Hammer and Snelting, 2009). Taint analysis, for instance, identifies untrustworthy sources, such as user input streams, that possibly contain malware or confidential data. If a tainted object from this source flows into a critical sink according to the security policy, e.g. a database, an illegal flow will be reported. Taint analysis enforces integrity by avoiding harm to valuable assets.

2.5.5 Non-interference of information flows

The aim of language-based information flow control used in this project is to enforce a type of security property called non-interference, which separates private (confidential) variables from public variables and states that public output must not be affected by variations of private input. To put it differently, variables assigned security classes of a higher level do not interfere with the variables of lower security levels (Sabelfeld and Myers, 2003). Therefore, a malicious user of a program cannot infer confidential data by observing its public outputs. When the static analysis ensures non-interference, the program obeys confidentiality-not considering covert channels (Hammer and Snelting, 2009). In this light, a security policy is just a set of non-interference assertions (Goguen and Meseguer, 1982).

The problem of enforcing non-interference of information flows entails a trade-off between conflicting requirements: correctness, precision, scalability and practicability. Correctness terms the ability to find all illegal flows, while precision denotes that solely truly insecure programs get rejected by the information flow control. These two are usually impossible to combine because of undecidable control flows. Even if an analysis is correct and precise, scalability or practicability suffers from a complex algorithm or a heavily annotated syntax (Hammer and Snelting, 2009).

2.5.6 Security type systems and their features

To harness better precision and better scalability by reducing runtime overhead, the information flow analysis can be conducted statically before runtime. Using security type systems as a formal specification of what constitutes secure information flow in a programming language has several advantages. Type systems are formalised and thus separate from its concrete implementation, which enables the designer to freely choose an algorithm after having proved that the non­interference theorem holds (Volpano, Irvine and Smith, 1996). Also, this approach builds on traditional type system research and can likewise be realised as part of the compiler. Owing to the ability to analyse all possible execution paths, a security type system is able to ensure that a program obeys confidentiality before its execution (Sabelfeld and Myers, 2003).

Non-interference, respectively the security type systems that enforce it can exhibit certain features, some of which will be looked at here. The type system can be sensitive to the order of execution, which means that the result of an analysis of n commands may be different depending on their order. Conversely, flow-insensitive type systems associate a fixed security class to every variable and do not take into account the execution order. For instance, the program in Listing 4 is accepted only by a flow-sensitive analysis which updates the security class of temp.

Abbildung in dieser Leseprobe nicht enthalten

Listing 4: Program only secure in a flow-sensitive analysis

Flow-insensitive type systems are less precise than flow-sensitive ones since they invalidate programs where the security classes of variables vary over time. Even if there is no dependency between the initial values and the final state of the program, such a system may consider it insecure due to one subprogram containing an insecure flow (Hunt and Sands, 2011). It should be noted, however, that a program that can be typed in a flow-sensitive type system can be transformed into an equivalent program typeable in a simple flow-insensitive fixed-type system by introducing additional variables for each element of the security label lattice (Hunt and Sands, 2006). Flow­sensitivity is a deciding argument for static security analysis. While a flow-insensitive analysis enforcing non-interference can be implemented either as a static analysis or as a dynamic monitor with identical results, it is impossible for a dynamic monitor to enforce a flow-sensitive policy (Russo and Sabelfeld, 2010).

2.5.7 Covert channels

Attackers may exploit programs without observing the program in- and outputs by taking advantage of covert channels. These are physical side channels not intended for information transfer at all and are not captured by an information flow policy (Lampson, 1973). Examples include the program's effect on processor load, memory consumption, and the file system. In particular, the timing channel of a program can expose confident information. For example, the runtime of the program if high then Ciong leaks the value of high.

A part of this problem can be alleviated by a termination-sensitive security type system, which does not ignore the termination behaviour of a program, i.e. non-termination and crashes (Kashyap, Wiedermann and Hardekopf, 2011). The termination channel is an important covert channel since smaller secrets with a few dozen bits can be leaked quickly, even though it remains infeasible to find out sufficiently large secrets such as encryption keys with this method (Askarov et al., 2008).

2.5.8 Alternatives to non-interference

Non-interference is often not a practical constraint for expressing security requirements of real systems, because it is either too restrictive or not restrictive enough. Some deviations from non­interference are overviewed briefly in the following. Security type systems can enforce declassification and erasure, just like non-interference.

Declassification

Programs are sometimes required to release some amount of sensitive information: Taking the example of a login process, the system necessarily declassifies the confidential piece of information “username and password do (not) match” (Clark, Hunt and Malacaria, 2007). The problem of specifying security policies which allow deliberate declassification can be divided into the dimensions:

- What information can be released?
- Where in the system can information be released?
- Who can release information?
- When can information be released?

The first offers control over what information is released by extending the security type system with “escape hatch” expressions that allow the programmer to downgrade information explicitly. It is guaranteed that no other information is declassified. However, the location of declassification plays a vital role as well. For instance, the program l := h; skip; l := declassify(h) exposes the value of h before its intended release; A security policy dealing only with the what dimension would accept this. Thus, a combination of what and where release policies is necessary for better accuracy, which Askarov and Sabelfeld (2007) termed localised delimited release. Closely related to this is the approach of determining the level of information interference by quantifying the released information using information-theoretic rationale (Clark, Hunt and Malacaria, 2007).

A flow control focussing on the who dimension enforces confidentiality by specifying ownership of data. For instance, the Java extension Jif augments the programming language with syntax allowing the programmer to declare who can read or modify certain variables, thus capturing the level of sensitivity directly in the source text (Pullicino, 2014). Policies constraining the when dimension typically focus on releasing information relative to events. For example, the framework by Chong and Myers (2004) uses temporal conditions that need to hold at the time a declassification happens. Later, Asian Askarov and Sabelfeld (2007) introduced gradual release, a policy where the observer gains knowledge about variables gradually. Yet in all models, the absence of declassification expressions in the source implicates non-interference.

Erasure

While non-interference guarantees sensitive information not to leak into public sinks, it is sometimes required to erase information. These problems need to be reasoned about in combination because the information flow in a system has a bearing on the erasure conditions. For instance, mobile devices need to erase secret keys after using them; If no application has the necessary security class to use them, they can be dispensed with immediately. In a generic approach, Hunt and Sands (2008) realised erasure of variables as a reclassification of security classes. In essence, a piece of information is assigned a lifetime terminated by a condition, after which it is “erased” to a higher security level and consequently unreachable for lower observers. The Jif programming language features an extension that uses a related framework to successfully enforce erasure and declassification (Chong and Myers, 2008).

3 Methods

This chapter discusses the approach taken to achieve the objectives outlined above. It begins with an overview of the project's initial ideas and how these developed over time. After illustrating the methodology, the employed tools are discussed, and the While language introduced. Before moving to the next chapter, it will be outlined how the implementation was verified and validated.

3.1 Development methodology

This project includes the implementation of a security type system and its evaluation using different test scenarios. The development process entailed a thorough investigation of the research subject and making several decisions which impacted the scope. The initial idea described in the proposal was to conduct the project in two phases: First, selecting a type system according to a set of criteria and determining the (non-)functional requirements. Second, carrying out development following the conventional classical waterfall model, i.e. laying out the development process unidirectionally (Dawson, 2009). The waterfall model was justified with the short time given; The problem was unlikely to evolve unexpectantly over three months. Another reason was the absence of a user group, who may be demanding a proof of concept or regular demonstrations to make sure the implementation matches their needs.

However, the sequential approach turned out to be inappropriate early on owing to several reasons. The work involved developing at the semantic analysis stage of a compiler, which was unknown terrain for me, having a background in high-level software development. Moreover, one of the objectives is to figure out scalable and performant ways to implement the typing rules, or more concretely, which data structures are most effective for storing necessary information about program variables.

Thus, the evolutionary prototyping approach was a more favourable choice of methodology. It can be illustrated as a series of phases, each having a distinct design, build and test activities and each resulting in an executable version of the product. The advantage of this approach is having a good idea of the final version. Source code can be expanded with every increment, rather than throwing it away and starting anew. This also allowed for routine reviews of the requirements and overall project progress, which was consistent with the coarse-grained phases of the proposed work plan but differed at the fine-grained level. That is, the design, build, test and review tasks were organised in iterations instead of being single chunks of work.

Selecting a suitable security type system from the literature according to a set of selection criteria was prerequisite for pursuing the objectives. The project scope is especially informed by the choice of type system since these can become very complex. A preliminary qualitative analysis compared nine publications of security type systems based on five criteria. The selected type system and its features are justified and investigated in the next chapter.

Since development followed the evolutionary prototyping approach, the provided While language grammar, interpreter and parser were extended in iterations, each resulting in an executable compiler front-end. The gained insights of every iteration, associated design decisions and significant trade-offs are discussed in detail in corresponding sections in the next chapter. The development took place bottom-up (Dawson, 2009), meaning that possible data structures were examined first, followed by the dependency map implementation, and only the last step was to develop the security type system functionality.

3.2 Development tools

The platform used to manage the different evolutions was GitHub.com, a popular provider of the open-source version-control system Git. Source code was kept synchronised with the cloud-based GitHub project,4 and every commit was documented. A branch in Git corresponds with one evolution, and different commits of a branch represent milestones, e.g. “Finished writing unit tests”, “Changed a well-formedness condition”. As a result, the development’s timeline can be precisely retraced. As this is a design and creation research project, the ability to backtrack is valuable for identifying the leaps made from existing knowledge to interesting insights.

The compiler front-end was developed in the Java™ programming language release 11 (JDK™ 11) in the open-source integrated development environment Eclipse. The decision to use this language was made because I am more familiar with it compared to C++ or C#, although these and others would have been an equally good fit. C++, in particular, might be an good choice for a future project, to see if there are considerable performance gains compared to Java. Noteworthy Eclipse plugins that proved to be useful were

- UMLet, to create UML diagrams from Java files,
- Eclipse EGit, to synchronise with the GitHub repository, and
- JavaCC for Eclipse, to enable formatting for JavaCC grammar files.

3.3 Software objects that were provided

There were software objects provided by the supervisor that allowed me to concentrate on the main problem of the project, the security type system. These are stated in Table 1 to distinguish project contributions from given objects accurately.

Table 1: Provided software objects

Abbildung in dieser Leseprobe nicht enthalten

3.4 Informal description of the While language

This section describes the very simple While language used in the project. It essentially only allows the programming constructs assignments, conditionals and while loops. In addition, procedures with zero to n input and output parameters can be defined and called. Parameters are always called and returned by value, which reduces the semantics of parameter passing to an assignment of actual to formal parameters and vice versa when entering and leaving a procedure. The While language has no variable declarations, no scoping and no types let alone a type system. Syntactically, variables can only be assigned integers and the keywords true and false. Code blocks can be achieved with curly braces, although there is no block structure per se, i.e. there is no semantic difference to having no blocks. The character “; ” is a delimiter, just like in Java, not the sequencing operator.

[...]


I https://www.xmind.net

II Waite1984 p13

III Not always, in the program if x = 1 then y := z else y := z, y is not influenced by x, but this exception can be glossed over.

IV Project link: htttps://github.com/ph342/InfoFlow/

Excerpt out of 90 pages

Details

Title
Implementing a type system for secure information-flow. Potential security risks
College
City University London
Course
Software Engineering
Grade
77
Author
Year
2019
Pages
90
Catalog Number
V962215
ISBN (eBook)
9783346313560
Language
English
Tags
information flow analysis, security type system, end-to-end security, confidentiality, non-interference
Quote paper
Dominik Kropp (Author), 2019, Implementing a type system for secure information-flow. Potential security risks, Munich, GRIN Verlag, https://www.grin.com/document/962215

Comments

  • No comments yet.
Read the ebook
Title: Implementing a type system for secure information-flow. Potential
security risks



Upload papers

Your term paper / thesis:

- Publication as eBook and book
- High royalties for the sales
- Completely free - with ISBN
- It only takes five minutes
- Every paper finds readers

Publish now - it's free