Network measurements play an essential role in operating and developing today's
Internet. A variety of measurement applications require multipoint
network measurements: service providers need to validate the delay guarantees
in their Service Level Agreements, and network engineers want to
track where packets are changed, reordered, lost or delayed. Multipoint measurements
create an immense amount of measurement data, which demands a
resource-intensive measurement infrastructure. Data selection techniques such as sampling
and filtering reduce resource consumption efficiently while
still retaining sufficient information about the metrics of interest. But not all
selection techniques are suitable for multipoint measurements; only deterministic filtering allows a synchronized selection of packets at multiple observation points.
Nevertheless, a filter bases its selection decision on the packet content and hence
is susceptible to bias, i.e. the selected subset is not representative of the whole population.
Hash-based selection is a filtering method that tries to emulate random
selection in order to obtain a representative sample for accurate estimations of
traffic characteristics.
The subject of this thesis is to assess which hash function and which packet content
should be used for hash-based selection to obtain a seemingly random and
unbiased selection of packets. The thesis empirically analyzes 25 hash functions
and different packet content combinations for their suitability for hash-based
selection. Experiments are based on a collection of 7 real traffic groups from
different networks.
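
The selection principle described above can be illustrated with a minimal sketch. It assumes that every observation point hashes the same invariant bytes of a packet and selects the packet when the hash value falls below a threshold. The function name `select_packet`, the example bytes and the use of CRC32 are assumptions made for illustration only; CRC32 is merely a hash that ships with Python and is not presented here as one of the functions the thesis recommends.

```python
import zlib

def select_packet(invariant_bytes: bytes, sampling_fraction: float) -> bool:
    """Return True if the packet's hash value falls into the selection range."""
    # CRC32 is a stand-in hash for this sketch, not a recommendation.
    hash_value = zlib.crc32(invariant_bytes)          # unsigned 32-bit value
    selection_range = int(sampling_fraction * 2**32)  # exclusive upper bound
    return hash_value < selection_range

# Hypothetical packet content that does not change between observation points.
packet = bytes.fromhex("c0a80001c0a800020050d431") + b"payload"

# Two observation points using the same function and range decide identically.
decision_a = select_packet(packet, 0.1)
decision_b = select_packet(packet, 0.1)
```

Because the decision depends only on the packet content, no coordination between observation points is needed. The same property is the source of the potential bias the thesis investigates.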

Excerpt

1 Introduction

1.1 Measurements in IP Networks

1.2 Active and Passive Measurements

1.3 Passive Multipoint Measurements

1.4 Measurement Architecture and Measurement Process

1.5 Incentives for Sampling

1.6 Selection Techniques

1.6.1 Sampling

1.6.2 Filtering

1.7 Comparison of Selection Techniques

1.8 Scenarios for Hash-Based Selection

1.9 Problem Statement and Target of this Work

1.9.1 Problem Statement

1.9.2 Targets of this Work

1.10 Document Structure

2 State of Art

2.1 Passive Multipoint Measurements

2.2 Packet Header Fields useful for Hash-Based Selection

2.3 Existing Evaluation Approaches for Hash Functions

3 Evaluation of Packet Content

3.1 Approach

3.1.1 Applicable Header Fields

3.1.2 Entropy per Byte

3.2 Traces Analyzed

3.3 Entropy Evaluation

3.3.1 IPv4

3.3.2 TCP

3.3.3 UDP

3.3.4 ICMP

3.3.5 IPv6

3.3.6 Conclusion Entropy Analysis

3.4 Hash Input Collisions Evaluation

3.4.1 Input Collisions

3.4.2 Experimental Setup

3.4.3 Results

4 Evaluation of Hash Functions

4.1 Desired Properties

4.2 Collection of Hash Functions

4.3 Security Issues for Hash-Based Selection

4.4 Performance Measurements

4.4.1 Experimental Setup

4.4.2 Results

4.5 Avalanche Criterion

4.5.1 Approach

4.5.2 Experimental Setup

4.5.3 Results

4.6 Independence of Sampling Decision and Representativeness

4.6.1 Chi-Square Independence Tests

4.6.2 Introduction to Pearson’s Goodness-of-Fit Test

4.6.3 Derivation of Goodness-of-Fit from Independence Test

4.6.4 Parameters for the Chi-Square Tests

4.6.5 Multiple Independence Tests

4.6.6 Independence Sampling Decision and Packet Length

4.6.7 Independence Sampling Decision and Protocol

4.6.8 Independence Sampling Decision and Byte Value

4.7 Conclusion Hash Function Evaluation

5 Applicability of Hash-Based Selection on other Traces

5.1 Chi-Square Independence Tests for other Traces

5.1.1 Traffic Traces

5.1.2 Measurement Setup

5.1.3 Measurement Results

5.2 Trace Analysis

5.2.1 Reasons for Identical Packets

5.2.2 Consequences for Hash-based Selection

6 Hash Functions for Packet ID Generation

6.1 Introduction

6.2 Approach

6.3 Uniformity of Hash Value Distribution

6.3.1 Experimental Setup

6.3.2 Results

6.4 Collision Probability

6.5 Theoretical Probability

6.5.1 Initial Assumptions

6.5.2 Mathematical Model for Collision Probability

6.6 Empirical Collision Probability

6.6.1 Measurement Setup

6.6.2 Measurement Results

6.7 Packet ID and Selection Hash Value

6.7.1 Selection Range Adjustment for One Hash Function

6.7.2 Selection Range Adjustment for Two Hash Functions

6.7.3 Difference Between One and Two Hash Function Approach

6.7.4 Conclusion

7 Conclusion

7.1 Packet Content Evaluation

7.2 Hash Functions for Hash-Based Selection

7.3 Emulation of Random Sampling

7.4 Influence of Hash Input Collisions

7.5 Hash Functions for Packet ID generation

7.6 Further Work

Objectives and Topics

The primary objective of this thesis is to assess which hash functions and packet content combinations are most suitable for hash-based selection to enable unbiased and representative packet selection in IP networks. The work aims to emulate random sampling techniques to provide accurate traffic characteristic estimations while minimizing resource consumption at network measurement points.

Empirical analysis of 25 different hash functions for selection and packet ID generation.
Evaluation of optimal packet header fields to serve as hash inputs for reducing input collisions.
Statistical validation of hash-based selection using Chi-Square independence tests.
Investigation of the feasibility of using a single hash function for both selection and packet ID generation.

Excerpt from the Book

1.3 Passive Multipoint Measurements

A broad spectrum of network measurement applications requires passive multipoint measurements. Providers of interactive services, such as audio and video conferencing, guarantee their customers certain delay limits, which they need to validate. Multipoint measurements can also serve as input for traffic engineering. Packets can be traced throughout the network, which gives a detailed picture of the measured domain. Routing loops can be detected, and network locations where packets are reordered or lost can be identified.

Fig. 1 shows the general concept of passive multipoint measurements. As a packet traverses the measured network, it passes observation points. These can be any device that can listen on the shared medium, such as a network card or a router. At the observation point, a copy of the packet is taken and a packet ID that identifies the packet throughout the network is generated. The ID can be either parts of the packet or a hash value over the packet content. The packet ID and a timestamp of the packet's arrival are transferred to a common multipoint collector. The collector can either be a dedicated device or be co-located with an observation point. The trace of each packet and the delay between the observation points can be calculated by correlating the packet IDs from the different observation points. The delay is obtained by subtracting the timestamps of two corresponding packet IDs from two measurement points. The sign of the difference shows the direction of the packet.
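
The correlation step at the collector can be sketched as follows. This is a simplified illustration that assumes synchronized clocks at both observation points and packet IDs that are unique within the measurement interval. The function name `one_way_delays`, the record layout and the sample values are hypothetical.

```python
def one_way_delays(records_a, records_b):
    """Match packet IDs from two observation points and compute delays.

    Each argument maps packet_id -> arrival timestamp in seconds.
    A positive delay means the packet passed point A before point B.
    """
    delays = {}
    for packet_id, t_a in records_a.items():
        t_b = records_b.get(packet_id)
        if t_b is not None:  # packet was observed at both points
            delays[packet_id] = t_b - t_a
    return delays

# Made-up records as a collector might receive them.
point_a = {0x1A2B: 10.000, 0x3C4D: 10.010, 0x5E6F: 10.020}
point_b = {0x1A2B: 10.004, 0x5E6F: 10.017}  # 0x3C4D was not seen at B

delays = one_way_delays(point_a, point_b)
```

In this example the packet `0x1A2B` travelled from A to B, `0x5E6F` travelled in the opposite direction (negative difference), and `0x3C4D` has no counterpart at B, which may indicate loss or a different path.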

Summary of Chapters

1 Introduction: Discusses the motivation for network measurements and the necessity of data selection techniques to handle the large volumes of data generated in multipoint measurement scenarios.

2 State of Art: Reviews existing literature on multipoint measurement techniques and common approaches for evaluating hash functions in the context of packet selection and identification.

3 Evaluation of Packet Content: Analyzes various packet header fields to determine their suitability as hash inputs based on entropy analysis to identify fields that are both static across nodes and variable across packets.
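
To make the idea of entropy per byte concrete, the following sketch computes the Shannon entropy of each byte position over a set of packets. It is an illustrative assumption of how such an analysis could look, not the thesis's implementation; the function name and the toy input are hypothetical.

```python
import math
from collections import Counter

def entropy_per_byte(packets, length):
    """Shannon entropy in bits for each of the first `length` byte positions."""
    entropies = []
    for pos in range(length):
        # Count how often each byte value occurs at this position.
        counts = Counter(p[pos] for p in packets if len(p) > pos)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies

# Toy input: the first byte is constant, the second takes four distinct values.
packets = [bytes([0x45, i]) for i in range(4)]
result = entropy_per_byte(packets, 2)
```

A position whose value never changes has zero entropy and contributes nothing to distinguishing packets, while a position with many equally likely values approaches the maximum of 8 bits.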

4 Evaluation of Hash Functions: Defines quality criteria for hash functions—such as calculation speed and non-linearity—and empirically evaluates 25 distinct hash functions to determine their effectiveness for unbiased packet selection.
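
One of the properties examined in Chapter 4 is the avalanche criterion: flipping a single input bit should change about half of the output bits. The sketch below shows one simple way to estimate this for a 32-bit hash. The function name `avalanche_fraction` and the test input are assumptions for illustration, and CRC32 is used only because it is available in the Python standard library.

```python
import zlib

def avalanche_fraction(hash_fn, data: bytes, out_bits: int = 32) -> float:
    """Average fraction of output bits that flip per single-bit input change."""
    base = hash_fn(data)
    flipped_total = 0
    n_input_bits = len(data) * 8
    for bit in range(n_input_bits):
        mutated = bytearray(data)
        mutated[bit // 8] ^= 1 << (bit % 8)    # flip exactly one input bit
        diff = base ^ hash_fn(bytes(mutated))  # output bits that changed
        flipped_total += bin(diff).count("1")
    return flipped_total / (n_input_bits * out_bits)

score = avalanche_fraction(zlib.crc32, b"example packet header bytes")
```

A value close to 0.5 indicates good bit diffusion for the tested input. A single input is of course not sufficient for a real evaluation, which would average over many inputs.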

5 Applicability of Hash-Based Selection on other Traces: Investigates the impact of different traffic classes and scenarios on the reliability of hash-based selection, specifically addressing the influence of identical packets found in real-world traffic traces.

6 Hash Functions for Packet ID Generation: Focuses on the specific requirements for generating packet identifiers, evaluating the collision probability of hash functions and exploring the trade-offs of using a single versus dual hash function approach.
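
As a point of reference for the collision probability discussed in Chapter 6, the standard birthday-problem estimate is given below. It rests on the idealized assumption that packet IDs are uniformly distributed over $N = 2^b$ possible values and that $n$ distinct packets are observed in one measurement interval. The symbols $n$, $N$ and $b$ are introduced here for illustration, and the formula is not necessarily the model derived in the thesis.

```latex
P_{\mathrm{coll}}(n, N)
  = 1 - \prod_{i=0}^{n-1} \left( 1 - \frac{i}{N} \right)
  \approx 1 - \exp\!\left( -\frac{n(n-1)}{2N} \right)
```

For example, with 32-bit IDs ($N = 2^{32}$) and $n = 10\,000$ packets, the approximation yields a probability of roughly 1.2 % that at least two packets share an ID.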

7 Conclusion: Summarizes the key findings regarding the recommended hash input configurations and hash functions, while highlighting areas for future research, such as IPv6-specific optimizations.

Keywords

Hash-based selection, Passive measurement, Multipoint measurements, IP networks, Packet sampling, Packet filtering, Hash functions, Network traffic, Entropy, Statistical analysis, Chi-Square test, Packet ID generation, Collision probability, Traffic engineering, QoS monitoring

Frequently Asked Questions

What is the core focus of this research?

The work focuses on the evaluation and implementation of hash-based packet selection as a method to reduce measurement data volumes while maintaining the representativeness of the collected samples in IP networks.

What are the primary fields studied in the thesis?

The central topics include passive multipoint measurements, the selection of optimal packet header fields for hashing, the rigorous performance and security evaluation of 25 specific hash functions, and the efficient generation of unique packet IDs.

What is the ultimate goal regarding hash-based selection?

The primary goal is to determine which hash function and hash input configuration can best emulate random sampling, thereby providing an unbiased and representative subset of network traffic for accurate performance analysis.

Which scientific methodology is employed for evaluation?

The author uses empirical analysis based on 7 distinct traffic trace groups, entropy measurement for feature selection, and Chi-Square independence tests to statistically validate the bias and representativeness of various selection methods.
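
A Chi-Square independence test of this kind compares observed counts with the counts expected if the selection decision were independent of a packet property. The following sketch computes Pearson's test statistic for a contingency table using only the standard library. The function name and all counts are made up for illustration and are not results from the thesis.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the independence hypothesis.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: selected / not selected. Columns: hypothetical packet-length classes.
table = [[10, 20, 30],
         [90, 180, 270]]
stat = chi_square_statistic(table)
dof = (len(table) - 1) * (len(table[0]) - 1)  # degrees of freedom
```

In this constructed example exactly 10 % of every length class is selected, so the statistic is zero. In practice the statistic is compared with the critical value of the Chi-Square distribution for the given degrees of freedom and significance level.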

What is covered in the main analysis section?

The main part covers the systematic evaluation of hash function properties like calculation speed, avalanche effect, and security, alongside the analysis of packet header entropy to minimize hash collisions.

Which keywords best characterize this work?

Key terms include hash-based selection, passive measurements, packet sampling, Chi-Square statistical validation, and packet ID generation for multipoint traceability.

Why are standard cryptographic hash functions often unsuitable here?

Cryptographic hash functions like SHA or MD5 are generally too computationally expensive for high-speed network measurement nodes, which often need to process packet selection at line speed.

How does this research address the security of the selection process?

The work discusses the vulnerabilities of linear hash functions against adversaries and emphasizes the necessity of using keyed hash functions to prevent intentional bias in packet selection.
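
The idea of a keyed hash function can be sketched as follows: a secret key is mixed into the hash so that an adversary who does not know the key cannot predict which packets will be selected. The example uses the keyed mode of BLAKE2b from Python's `hashlib` purely because it is readily available. As a cryptographic function it may well be too slow for line-speed selection, and it is not claimed to be among the functions evaluated in the thesis. The function name, key and packet bytes are hypothetical.

```python
import hashlib

def keyed_select(invariant_bytes: bytes, key: bytes, sampling_fraction: float) -> bool:
    """Selection decision based on a keyed 32-bit hash of the packet content."""
    digest = hashlib.blake2b(invariant_bytes, digest_size=4, key=key).digest()
    hash_value = int.from_bytes(digest, "big")  # unsigned 32-bit value
    return hash_value < int(sampling_fraction * 2**32)

# All observation points must share the same secret key (assumed here).
shared_key = b"example-secret-key"
packet = b"invariant packet bytes"

decision_a = keyed_select(packet, shared_key, 0.25)
decision_b = keyed_select(packet, shared_key, 0.25)
```

Observation points sharing the key still reach identical decisions, while changing the key changes which packets fall into the selection range.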

End of excerpt from 104 pages

Details

Title: Evaluation of Hash Functions for Multipoint Sampling in IP Networks
University: Technical University of Berlin
Grade: 1
Author: Christian Henke (Author)
Publication Year: 2008
Pages: 104
Catalog Number: V186592
ISBN (eBook): 9783869436067
ISBN (Book): 9783869432953
Language: English
Tags: evaluation hash functions multipoint sampling networks
Product Safety: GRIN Publishing GmbH

Cite this text: Christian Henke (Author), 2008, Evaluation of Hash Functions for Multipoint Sampling in IP Networks, Munich, GRIN Verlag, https://www.grin.com/document/186592
