Describes two hash functions. One was used to implement a randomized binary search tree and the other a hash function. Implementation details and empirical performance analysis are provided. A link to the full source code is also provided. The code is implemented in C++.
Table of Contents
1. Introduction
2. Hash Functions
3. Data Structures
3.1 Randomized Binary Search Tree
3.2 Hash Table
4. Performance
5. Conclusion
Objectives & Topics
The primary objective of this work is to present and evaluate specific hash function implementations designed for use in a randomized binary search tree and a fast hash table, focusing on collision resolution and performance efficiency in comparison to standard library containers.
- Design and application of custom hash functions for strings and integers.
- Implementation of a randomized binary search tree (rBST) with linked-list collision resolution.
- Construction and optimization of a high-performance hash table using multiple hashing functions.
- Comparative performance analysis against C++ Standard Template Library (STL) containers (std::map, std::unordered_map).
Excerpt from the Book
Randomized Binary Search Tree
The hash3 function was used to construct a randomized Binary Search Tree (rBST) and the get_numeric hash functions were used to construct a hash table. For rBST, the main data structure contains a class Value and pointers for left and right children as well as a pointer for next to construct a linked list. The linked list is needed for collision resolution. If there is a collision which occurs when two or more unique keys hash to the same value, the linked list is employed to store the data. Every node within the tree can have its own linked list as every node is possible to have the need for collision resolution. The left and right pointers allow implementation of the binary search tree in the usual way. The class Value contains to unsigned integer values, one for the result of the hash3 function applied to the key, and the other the original key value. For the sake of testing and without loss of generality, the Value class uses an original key which is also integer, but it can be of other types such as string or floating point.
The heart of the rBST structure is the hash of the original value prior to insertion into the tree. This hash is intended to randomize the values and prevent values from being inserted that are simply ordered sequences. The ordered sequences produce runs in the data structure equivalent in complexity to linked lists and are therefore O(N). However, since most binary search trees use recursive logic, if the run is too long, it will likely lead to even worse performance in practice and possibly stack overflow and crash the program. The idea behind using a hash function is to keep the tree balanced as possible and at the same time, not spend time re-balancing the tree as would be the case in Red-Black and AVL trees.
Summary of Chapters
1. Introduction: Briefly defines the role of hash functions in computer science and outlines the scope of the article, focusing on rBST and hash table implementations.
2. Hash Functions: Details the design and experimentation behind the "hash3" function and discusses how "inthash" leverages this for integer-based hashing.
3. Data Structures: Explains the architecture of the randomized Binary Search Tree and the custom Hash Table, specifically detailing how they utilize collision resolution through linked lists.
4. Performance: Provides a comparative analysis of the implemented structures against STL containers, demonstrating the impact of different optimization levels on execution time.
5. Conclusion: Summarizes findings, noting that while the rBST is competitive, the custom Hash Table outperformed standard library alternatives in the tested scenarios.
Keywords
Hash functions, Randomized Binary Search Tree, rBST, Hash table, Collision resolution, C++, Data structures, STL, Performance analysis, std::map, std::unordered_map, Algorithm optimization, Linked list, Integer hashing, String hashing
Frequently Asked Questions
What is the fundamental purpose of this work?
The paper explores how custom hash functions can be effectively utilized to build and optimize specific data structures like randomized binary search trees and fast hash tables to handle data collisions efficiently.
What are the central thematic areas?
The research focuses on hash function design, collision resolution strategies, the implementation of rBSTs, and comparative performance benchmarking against standard C++ containers.
What is the primary research goal?
The goal is to demonstrate that implementing custom data structures with specialized hashing logic can result in performance advantages or simplified implementation compared to standard library solutions.
Which scientific methods are employed?
The author uses an experimental approach involving C++ implementations, comparing the efficiency of operations (insertions, look-ups, deletions) in custom containers versus STL standard containers under various optimization settings.
What topics are covered in the main section?
The main section covers the design of "hash3", the structural implementation of rBST nodes with linked lists for collision management, and the construction of a templated hash table.
Which keywords characterize this work?
Key terms include hash functions, rBST, collision resolution, performance analysis, and C++ template data structures.
How does the rBST handle collisions?
The rBST employs a linked list at each node where a collision occurs, ensuring that unique keys hashing to the same value are correctly stored without losing data.
Why does the author prefer rBST over AVL or Red-Black trees?
The author suggests that using a hash function to keep the tree balanced via randomization is simpler to implement and avoids the overhead of explicit re-balancing logic required by Red-Black or AVL trees.
What did the performance tests reveal?
The tests revealed that while rBST performance is competitive with std::map, the custom hash table implemented by the author was the fastest among the tested containers, outperforming both std::map and std::unordered_map.
What does the author suggest for future work?
Future work includes exploring more sophisticated hash functions to further improve performance and investigating methods to make the hash table thread-safe (concurrent).
- Arbeit zitieren
- Dr. Roger Doss (Autor:in), 2013, On Some Hash Functions: Constructing a randomized binary search tree and a hash table, München, GRIN Verlag, https://www.grin.com/document/209288