# On Some Hash Functions: Constructing a randomized binary search tree and a hash table

## Technical Report, 2013

Excerpt

On Some Hash Functions

Dr. Roger G. Doss, PhD

## Introduction

Hash functions are very important in Computer Science. A hash function maps an input key with a much larger set of possible values to an output key with a restricted set of possible values. Traditional use of hash functions are in the implementation of hash tables. A hash table is an array where values are stored based on the outputted key value from a hash function. This article will present several hash functions as well as their intended use in implementing a randomized binary search tree (BST) and a fast hash table.

## Hash Functions

The first hash function to be discussed is called hash3 in the code that references it. It is the third version of a function originally conceived as an experiment to see how many unique values can be produced given unique string input. After experimenting with two other versions, this version was settled upon. The experiments utilized writing C++ code to extract unique words from sample text files and hashing them with this function. After the words were hashed, the integer output was compared for uniqueness. In the experiment, we observed that for the 10,000 unique words[1] that were hashed, the function produced 9962 unique integer values[2] which is an improvement over the previous version which had 8494 unique integer values. The function is:

illustration not visible in this excerpt

Notice that inthash is a function that uses hash3 to hash integers while hash3 is designed to hash strings. The design of hash3 focuses on a prime number set of random or pseudo-random integers which are used as exclusive or values while the string is iterated. The selection of which exclusive or value to use is based on the value of the character at that pass modulo the total number of possible values. There are prime many of these values such that the value is relatively prime to the character. The rest of the function accumulates the string value and shifts left or right one bit at each pass based on whether or not the value is odd at that pass.

The second hash function is a combination of hash functions based on existing hash functions. Instead of resolving collisions simply by using a collision list, the function uses four hash functions to look up a symbol. If all four functions fail to retrieve the data requests, the function then uses a collision list to find the data before ultimately giving up. The function is:

illustration not visible in this excerpt

Where t is an array of class node containing a pointer to the data and a pointer to next such that each entry in the table is capable of having its own linked list. The above code shows how four hash functions are tried in an attempt to find the data with a matching key, and that if none of the look ups succeed, then the last attempt's linked list is checked; otherwise the function fails to retrieve the item.

Insertion is similar, except the last step is not to search a linked list for an item; but to insert into a linked list:

illustration not visible in this excerpt

The four hash functions get_numeric* can be found here[3]. get_numeric is the sdbm hash function and get_numeric_1 is the djb2 function referenced in [5]. get_numeric_2 is the SAX hash function referenced in [6]. get_numeric_3 is the Boost hash function referenced in [4].

## Data Structures

### Randomized Binary Search Tree

The hash3 function was used to construct a randomized Binary Search Tree (rBST) and the get_numeric hash functions were used to construct a hash table. For rBST, the main data structure contains a class Value and pointers for left and right children as well as a pointer for next to construct a linked list. The linked list is needed for collision resolution. If there is a collision which occurs when two or more unique keys hash to the same value, the linked list is employed to store the data. Every node within the tree can have its own linked list as every node is possible to have the need for collision resolution. The left and right pointers allow implementation of the binary search tree in the usual way.

The class Value contains to unsigned integer values, one for the result of the hash3 function applied to the key, and the other the original key value. For the sake of testing and without loss of generality, the Value class uses an original key which is also integer, but it can be of other types such as string or floating point. The core binary search tree node class is given in part by:

illustration not visible in this excerpt

The heart of the rBST structure is the hash of the original value prior to insertion into the tree. This hash is intended to randomize the values and prevent values from being inserted that are simply ordered sequences. The ordered sequences produce runs in the data structure equivalent in complexity to linked lists and are therefore O(N). However, since most binary search trees use recursive logic, if the run is too long, it will likely lead to even worse performance in practice and possibly stack overflow and crash the program. The idea behind using a hash function is to keep the tree balanced as possible and at the same time, not spend time re-balancing the tree as would be the case in Red-Black and AVL trees. The key; however, is the hash function. The code for insertion requires that the class Value hash the original key which is done in its constructor and that this hash value is used inside the tree:

illustration not visible in this excerpt

[...]

Excerpt out of 13 pages

Details

Title
On Some Hash Functions: Constructing a randomized binary search tree and a hash table
College
Northcentral University
none
Author
Year
2013
Pages
13
Catalog Number
V209288
ISBN (eBook)
9783656374657
ISBN (Book)
9783656375050
File size
380 KB
Language
English
Notes
Article regarding two hash functions, one was used to construct a randomized binary search tree and the other a hash table. Implementation details and empirical performance measurements are included.
Tags
some, hash, functions, constructing
Quote paper
Dr. Roger Doss (Author), 2013, On Some Hash Functions: Constructing a randomized binary search tree and a hash table, Munich, GRIN Verlag, https://www.grin.com/document/209288