Grin logo
de en es fr
Shop
GRIN Website
Publish your texts - enjoy our full service for authors
Go to shop › Computer Science - IT-Security

Incremental Construction of Code Property Graphs

Title: Incremental Construction of Code Property Graphs

Master's Thesis , 2021 , 54 Pages , Grade: 2,0

Autor:in: Samuel Hopstock (Author)

Computer Science - IT-Security
Excerpt & Details   Look inside the ebook
Summary Excerpt Details

This thesis extends a modified CPG approach that is able to operate on multiple programming languages, i.e. C/C++, Java, Python and Golang, available on GitHub3 [Fra21a].

Graph-based code analysis systems are versatile tools for reasoning about the correctness of complex software projects. One area in which they are widely used is in source code auditing: Security vulnerabilities, for example using cryptographic functions with insecure algorithms, can be introduced by coding patterns that spread over the boundaries of several methods, classes or even files in the project.

This is where graph-based analysis makes finding these vulnerabilities easier, by creating a framework where the source code can be represented as a graph and vulnerable

Excerpt


Contents

1 Introduction

1.1 Problem Statement

1.2 Thesis Structure

2 Background

2.1 Code Analysis

2.1.1 Dynamic Code Analysis

2.1.2 Static Code Analysis

2.2 Code Property Graphs

2.2.1 Components and Structure

2.2.2 Generation Process

2.3 Incremental Parsing

3 Related Work

3.1 Graph-Based Code Analysis

3.1.1 Joern

3.1.2 ProgQuery

3.1.3 Codyze: Code Property Graphs for Security Assessments

3.2 Incremental Parsing

4 Improving the Construction Performance for Incremental Changes

4.1 Approach Overview

4.2 Concurrent Source Code Parsing

4.3 Change Classification

4.4 Performing In-Place Updates

4.4.1 FieldDeclaration Changes

4.4.2 FunctionDeclaration Changes

4.5 Graph Equality Checking Algorithm

5 Evaluation

5.1 Concurrent Source Code Parsing

5.1.1 Experiment Setup

5.1.2 Results

5.2 Incremental Graph Updates

5.2.1 Experiment Setup

5.2.2 Results

6 Conclusion

7 Future Work

7.1 Using a Listener System for Reference Resolution

7.2 Incremental AST Parsing

7.3 Parallelizing the Whole CPG Generation Process

Objectives and Topics

This thesis addresses the performance limitations of generating Code Property Graphs (CPG) for large software projects. The primary research goal is to develop an incremental CPG generation process that reuses existing graph structures for unchanged code, thereby significantly reducing the computational overhead and feedback cycles required for security analysis in continuous integration pipelines.

  • Techniques for incremental graph construction and in-place updates.
  • Methods for classifying source code changes to optimize graph regeneration.
  • Parallelization strategies for the initial abstract syntax tree (AST) parsing phase.
  • Development of an algorithmic framework for validating graph isomorphism and equality.
  • Performance evaluation based on real-world software project commit histories.

Excerpt from the Book

4.4 Performing In-Place Updates

While Algorithm 4.1 takes care of identifying the relationship between nodes in the original and changed graph, the crucial point how this can be used to speed up CPG construction lies in the update handlers that have been mentioned several times now. Each of them is able to process a specific node type and determines which actions need to be taken in case a node has been added, deleted or replaced. If updates cannot be performed with sufficient confidence, a full pass execution is triggered.

How this change process looks in practice will be the demonstrated in this section, using FieldDeclaration and FunctionDeclaration handlers as examples. Of course we could implement additional handlers, but some of the node types require a disproportionate amount of computational overhead for in-place updates, limiting their usefulness: When a change affects RecordDeclarations, for example, this can have widespread consequences due to changes in the program’s type hierarchy. In such a case it makes much more sense to just execute all passes again rather than trying to selectively resolve imprecisions in the graph that resulted from the incremental change.

Summary of Chapters

1 Introduction: Introduces the necessity of automated code analysis and sets the objective to optimize CPG generation for incremental software changes.

2 Background: Provides foundational knowledge on code analysis, CPG structures, and the concepts of incremental parsing.

3 Related Work: Reviews existing graph-based analysis tools and current approaches to incremental parsing in the context of static analysis.

4 Improving the Construction Performance for Incremental Changes: Details the core implementation of incremental updates, change classification, and the graph equality checking algorithm.

5 Evaluation: Presents the empirical results of benchmarking the proposed concurrent parsing and incremental update strategies against existing baseline implementations.

6 Conclusion: Summarizes the thesis findings, confirming the feasibility and performance benefits of the incremental CPG construction approach.

7 Future Work: Suggests potential future directions, including the transition to listener-based systems and integrating incremental AST parsers like tree-sitter.

Keywords

Code Property Graph, CPG, Incremental Parsing, Static Code Analysis, Software Security, Abstract Syntax Tree, AST, Control Flow Graph, CFG, Data Flow Graph, DFG, Performance Optimization, Continuous Integration, Graph Isomorphism, In-place Updates

Frequently Asked Questions

What is the core focus of this thesis?

The thesis focuses on improving the efficiency of Code Property Graph (CPG) generation by enabling incremental updates rather than full re-generation for every code change.

Which analysis approach is utilized?

It utilizes graph-based static code analysis to model syntactic and semantic relationships within software projects, specifically targeting security vulnerability detection.

What is the primary goal of the proposed research?

The goal is to reduce the runtime of CPG generation from several minutes to seconds for incremental changes, facilitating faster feedback loops in development environments.

Which specific scientific methods are employed?

The research employs graph algorithm development, including a specialized node mapping and traversal algorithm (Algorithm 4.1), and conducts performance benchmarking on real-world Java source code repositories.

What topics are discussed in the main body?

The main body covers the theoretical approach for incremental updates, methods for concurrent AST parsing, change classification, and the design of an automated testing framework for graph equality.

Which keywords best describe this work?

Key terms include CPG, Incremental Parsing, Static Code Analysis, Performance Optimization, and Graph Isomorphism.

How are FieldDeclaration changes handled in this study?

FieldDeclaration changes are handled via update handlers that allow for re-routing incoming references to updated nodes without necessitating a full re-parse of the entire project.

What role does the graph equality checking algorithm play?

It serves as a rigorous testing framework to ensure that incrementally produced graphs are identical to those generated from scratch, which is crucial for maintaining analysis accuracy.

Excerpt out of 54 pages  - scroll top

Details

Title
Incremental Construction of Code Property Graphs
College
Technical University of Munich  (Department of Informatics)
Grade
2,0
Author
Samuel Hopstock (Author)
Publication Year
2021
Pages
54
Catalog Number
V1146231
ISBN (eBook)
9783346540706
ISBN (Book)
9783346540713
Language
English
Tags
code property graph static code analysis incremental java c c++ graph
Product Safety
GRIN Publishing GmbH
Quote paper
Samuel Hopstock (Author), 2021, Incremental Construction of Code Property Graphs, Munich, GRIN Verlag, https://www.grin.com/document/1146231
Look inside the ebook
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
  • Depending on your browser, you might see this message in place of the failed image.
Excerpt from  54  pages
Grin logo
  • Grin.com
  • Shipping
  • Contact
  • Privacy
  • Terms
  • Imprint