This thesis extends a modified CPG approach that operates on multiple programming languages (C/C++, Java, Python and Go) and is available on GitHub [Fra21a].
Graph-based code analysis systems are versatile tools for reasoning about the correctness of complex software projects. One area in which they are widely used is source code auditing: security vulnerabilities, such as the use of cryptographic functions with insecure algorithms, can be introduced by coding patterns that span several methods, classes or even files in the project.
This is where graph-based analysis makes finding these vulnerabilities easier, by creating a framework where the source code can be represented as a graph and vulnerable
Table of Contents
- Glossary
- 1 Introduction
- 1.1 Problem Statement
- 1.2 Thesis Structure
- 2 Background
- 2.1 Code Analysis
- 2.1.1 Dynamic Code Analysis
- 2.1.2 Static Code Analysis
- 2.2 Code Property Graphs
- 2.2.1 Components and Structure
- 2.2.2 Generation Process
- 2.3 Incremental Parsing
- 3 Related Work
- 3.1 Graph-Based Code Analysis
- 3.1.1 Joern
- 3.1.2 ProgQuery
- 3.1.3 Codyze: Code Property Graphs for Security Assessments
- 3.2 Incremental Parsing
- 4 Improving the Construction Performance for Incremental Changes
- 4.1 Approach Overview
- 4.2 Concurrent Source Code Parsing
- 4.3 Change Classification
- 4.4 Performing In-Place Updates
- 4.4.1 Field Declaration Changes
- 4.4.2 Function Declaration Changes
- 4.5 Graph Equality Checking Algorithm
Objectives and Key Themes
This master's thesis explores the challenges of constructing Code Property Graphs (CPGs) for large software projects and presents an improved CPG generation process that prioritizes efficiency. The goal is to optimize the CPG generation process for incremental code changes, thereby reducing the time required for security analysis and improving the feedback cycle for developers.
- Code Property Graphs as a representation of software structure
- Incremental parsing and updating of CPGs
- Optimization of CPG generation for large projects
- Performance evaluation of incremental CPG generation
- Application of CPGs in security analysis
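To make the security-analysis theme more concrete, the following minimal sketch shows how a pattern that crosses function boundaries can be found once code is represented as a graph. It is an illustration under stated assumptions, not the thesis's implementation: the graph holds only call edges (a real CPG also carries syntax, control-flow and data-flow information), and the function names, the `CALL_EDGES` table and the `INSECURE_CALLS` deny-list are all hypothetical.

```python
from collections import deque

# Hypothetical call graph: each key is a function, each value lists what it calls.
CALL_EDGES = {
    "Main.run": ["Config.load", "Crypto.encrypt"],
    "Config.load": [],
    "Crypto.encrypt": ["Cipher.getInstance(DES)"],
    "Cipher.getInstance(DES)": [],
}

# Assumed deny-list of calls treated as insecure for this illustration.
INSECURE_CALLS = {"Cipher.getInstance(DES)"}


def find_insecure_paths(edges, start, insecure):
    """Breadth-first search returning every call path from `start` to an insecure call."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in insecure:
            paths.append(path)
            continue
        for callee in edges.get(node, []):
            if callee not in path:  # do not follow recursive cycles
                queue.append(path + [callee])
    return paths


insecure_paths = find_insecure_paths(CALL_EDGES, "Main.run", INSECURE_CALLS)
for path in insecure_paths:
    print(" -> ".join(path))
```

The insecure call is not visible from `Main.run` alone; it only becomes reachable by following an edge into another function, which is the kind of cross-boundary pattern a graph query can express directly.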
Chapter Summaries
- Chapter 1: Introduction - This chapter presents the problem statement, outlining the challenges of generating CPGs for large software projects and the need for an efficient solution. It also outlines the structure of the thesis.
- Chapter 2: Background - This chapter provides a detailed explanation of code analysis techniques, focusing on static code analysis and the role of CPGs in this process. It describes the components and structure of CPGs as well as their generation process.
- Chapter 3: Related Work - This chapter reviews existing research on graph-based code analysis systems and incremental parsing techniques. It discusses relevant projects and frameworks like Joern, ProgQuery, and Codyze.
- Chapter 4: Improving the Construction Performance for Incremental Changes - This chapter presents the thesis's main contribution: an improved approach for constructing CPGs incrementally. It gives an overview of the approach, then covers concurrent source code parsing, change classification, and in-place updates for field and function declaration changes.
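The general idea behind the steps named for Chapter 4 (classify changes, parse concurrently, update in place) can be sketched as follows. This is a simplified illustration, not the algorithm from the thesis: it classifies changes per file by content hash, whereas the thesis works at the level of field and function declarations. The `parse` function is a stand-in for a real language frontend, and the "graph" is just a dictionary from file path to parsed nodes; all names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(source):
    """Content hash used to detect whether a file changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def classify_changes(old, new):
    """Compare two {path: source} snapshots and sort paths into change classes."""
    old_paths, new_paths = set(old), set(new)
    common = old_paths & new_paths
    modified = {p for p in common if fingerprint(old[p]) != fingerprint(new[p])}
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "modified": sorted(modified),
        "unchanged": sorted(common - modified),
    }


def parse(source):
    """Stand-in parser (assumption): returns the stripped, non-empty lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def update_graph(graph, old, new):
    """Update `graph` in place, re-parsing only added and modified files."""
    changes = classify_changes(old, new)
    for path in changes["removed"]:
        graph.pop(path, None)
    to_parse = changes["added"] + changes["modified"]
    with ThreadPoolExecutor() as pool:  # parse the affected files concurrently
        parsed = list(pool.map(lambda p: parse(new[p]), to_parse))
    for path, nodes in zip(to_parse, parsed):
        graph[path] = nodes
    return changes


old = {"A.java": "class A {}", "B.java": "class B {}"}
new = {"A.java": "class A { int x; }", "C.java": "class C {}"}
graph = {path: parse(source) for path, source in old.items()}
changes = update_graph(graph, old, new)
```

Unchanged files are never re-parsed, so the cost of an update in this sketch scales with the size of the change rather than with the size of the project.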
Keywords
Code Property Graph, CPG, code analysis, static analysis, incremental parsing, graph-based analysis, security analysis, performance optimization, source code auditing, vulnerability detection, Java, software engineering.
- Samuel Hopstock (Author), 2021, Incremental Construction of Code Property Graphs, Munich, GRIN Verlag, https://www.grin.com/document/1146231