For new authors:
free, easy and fast
For registered authors
Technical Report, 2013
13 Pages, Grade: none
On Some Hash Functions
Dr. Roger G. Doss, PhD
Hash functions are very important in Computer Science. A hash function maps an input key with a much larger set of possible values to an output key with a restricted set of possible values. Traditional use of hash functions are in the implementation of hash tables. A hash table is an array where values are stored based on the outputted key value from a hash function. This article will present several hash functions as well as their intended use in implementing a randomized binary search tree (BST) and a fast hash table.
The first hash function to be discussed is called hash3 in the code that references it. It is the third version of a function originally conceived as an experiment to see how many unique values can be produced given unique string input. After experimenting with two other versions, this version was settled upon. The experiments utilized writing C++ code to extract unique words from sample text files and hashing them with this function. After the words were hashed, the integer output was compared for uniqueness. In the experiment, we observed that for the 10,000 unique words[1] that were hashed, the function produced 9962 unique integer values[2] which is an improvement over the previous version which had 8494 unique integer values. The function is:
illustration not visible in this excerpt
Notice that inthash is a function that uses hash3 to hash integers while hash3 is designed to hash strings. The design of hash3 focuses on a prime number set of random or pseudo-random integers which are used as exclusive or values while the string is iterated. The selection of which exclusive or value to use is based on the value of the character at that pass modulo the total number of possible values. There are prime many of these values such that the value is relatively prime to the character. The rest of the function accumulates the string value and shifts left or right one bit at each pass based on whether or not the value is odd at that pass.
The second hash function is a combination of hash functions based on existing hash functions. Instead of resolving collisions simply by using a collision list, the function uses four hash functions to look up a symbol. If all four functions fail to retrieve the data requests, the function then uses a collision list to find the data before ultimately giving up. The function is:
illustration not visible in this excerpt
Where t is an array of class node containing a pointer to the data and a pointer to next such that each entry in the table is capable of having its own linked list. The above code shows how four hash functions are tried in an attempt to find the data with a matching key, and that if none of the look ups succeed, then the last attempt's linked list is checked; otherwise the function fails to retrieve the item.
Insertion is similar, except the last step is not to search a linked list for an item; but to insert into a linked list:
illustration not visible in this excerpt
The four hash functions get_numeric* can be found here[3]. get_numeric is the sdbm hash function and get_numeric_1 is the djb2 function referenced in [5]. get_numeric_2 is the SAX hash function referenced in [6]. get_numeric_3 is the Boost hash function referenced in [4].
The hash3 function was used to construct a randomized Binary Search Tree (rBST) and the get_numeric hash functions were used to construct a hash table. For rBST, the main data structure contains a class Value and pointers for left and right children as well as a pointer for next to construct a linked list. The linked list is needed for collision resolution. If there is a collision which occurs when two or more unique keys hash to the same value, the linked list is employed to store the data. Every node within the tree can have its own linked list as every node is possible to have the need for collision resolution. The left and right pointers allow implementation of the binary search tree in the usual way.
The class Value contains to unsigned integer values, one for the result of the hash3 function applied to the key, and the other the original key value. For the sake of testing and without loss of generality, the Value class uses an original key which is also integer, but it can be of other types such as string or floating point. The core binary search tree node class is given in part by:
illustration not visible in this excerpt
The heart of the rBST structure is the hash of the original value prior to insertion into the tree. This hash is intended to randomize the values and prevent values from being inserted that are simply ordered sequences. The ordered sequences produce runs in the data structure equivalent in complexity to linked lists and are therefore O(N). However, since most binary search trees use recursive logic, if the run is too long, it will likely lead to even worse performance in practice and possibly stack overflow and crash the program. The idea behind using a hash function is to keep the tree balanced as possible and at the same time, not spend time re-balancing the tree as would be the case in Red-Black and AVL trees. The key; however, is the hash function. The code for insertion requires that the class Value hash the original key which is done in its constructor and that this hash value is used inside the tree:
illustration not visible in this excerpt
[...]
Scientific Essay, 7 Pages
Lesson Plan, 52 Pages
BWL - Unternehmensführung, Management, Organisation
Presentation (Elaboration), 18 Pages
Term Paper, 12 Pages
Seminar Paper, 29 Pages
Seminar Paper, 76 Pages
GRIN Publishing, located in Munich, Germany, has specialized since its foundation in 1998 in the publication of academic ebooks and books. The publishing website GRIN.com offer students, graduates and university professors the ideal platform for the presentation of scientific papers, such as research projects, theses, dissertations, and academic essays to a wide audience.
Free Publication of your term paper, essay, interpretation, bachelor's thesis, master's thesis, dissertation or textbook - upload now!