Computer Science

Hash Tables

Hash tables are data structures that store key-value pairs. They use a hash function to map keys to indices in an array, allowing average-case constant-time access to values. Hash tables are commonly used for fast lookups and insertions in computer programs.

Written by Perlego with AI-assistance

11 Key excerpts on "Hash Tables"

  • Hands-On Data Structures and Algorithms with Python
    8 Hash Tables

    A hash table is a data structure that implements an associative array in which the data is stored by mapping the keys to the values as key-value pairs. In many applications, we mostly require different operations such as insert, search, and delete in a dictionary data structure. For example, a symbol table is a data structure based on a hash table that is used by the compiler. A compiler that translates a programming language maintains a symbol table in which keys are character strings that are mapped to the identifiers. In such situations, a hash table is an effective data structure since we can directly compute the index of the required record by applying a hash function to the key. So, instead of using the key as an array index directly, the array index is computed by applying the hash function to the key. It makes it very fast to access an element from any index from the hash table.

    The hash table uses the hashing function to compute the index of where the data item should be stored in the hash table. While looking up an element in the hash table, hashing of the key gives the index of the corresponding record in the table. Ideally, the hash function assigns a unique value to each of the keys; however, in practice, we may get hash collisions where the hash function generates the same index for more than one key. In this chapter, we will be discussing different techniques that deal with such collisions.

    In this chapter, we will discuss all the concepts related to these, including:
    • Hashing methods and hash table techniques
    • Different collision resolution techniques in Hash Tables

    Introducing Hash Tables

    As we know, arrays and lists store the data elements in sequence. As in an array, the data items are accessed by an index number. Accessing array elements using index numbers is fast. However, they are very inconvenient to use when it is required to access any element when we can’t remember the index number
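The idea of computing an array index from a key, rather than using the key as the index, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simple_hash`, `slot_for`, and the table size of 16 are hypothetical names and values, and summing character codes is a deliberately weak hash chosen because it makes collisions easy to see.

```python
def simple_hash(key: str) -> int:
    """Toy hash (an assumption for illustration): sum of the character codes."""
    return sum(ord(ch) for ch in key)


def slot_for(key: str, table_size: int) -> int:
    """Map a key to an array index in the range [0, table_size)."""
    return simple_hash(key) % table_size


TABLE_SIZE = 16  # hypothetical table size
table = [None] * TABLE_SIZE

# Store a record at the index computed from its key.
table[slot_for("count", TABLE_SIZE)] = ("count", "int variable")

# A lookup recomputes the same index instead of scanning the array.
record = table[slot_for("count", TABLE_SIZE)]

# Anagrams have equal character sums, so this toy hash sends them
# to the same slot: a hash collision.
collides = slot_for("listen", TABLE_SIZE) == slot_for("silent", TABLE_SIZE)
```

Because the toy hash ignores character order, any two anagrams collide, which is one reason practical hash functions mix in the position of each character.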
  • Beginning Java Data Structures and Algorithms

    Sharpen your problem solving skills by learning core computer science concepts in a pain-free manner

    Hash Tables and Binary Search Trees

    In the preceding chapter, we introduced the concept of data structures by looking at arrays, linked lists, queues, and stacks. In this chapter, we will use some of these primitive structures to build more complex ones. We'll start the chapter by looking at Hash Tables, which are useful data structures for fast key-value lookup. In the second part of the chapter, we will learn about a more complex data structure that supports range queries, called binary trees.
    By the end of this chapter, you will be able to:
    • Describe how Hash Tables work
    • Implement two main techniques to deal with hash collisions
    • Characterize different hashing choices
    • Explain the terminology, structure, and operations of binary trees
    • Demonstrate various tree traversal techniques
    • Define balanced binary search trees

    Introducing Hash Tables

    A data structure that gives us the ability to insert, search, and optionally delete elements in a collection is called a data dictionary. Commonly, the type of data used is a key-value pair association, where we insert the key-value pair but search using a key to obtain the value.
    Hash Tables provide us with a fast data structure for organizing these key-value pairs and implementing our data dictionary. They are useful in a wide variety of applications due to the quick lookup and ease of use for in-memory data storage. Insertion and search operations have a typical average runtime complexity of O(1).

    Understanding Hash Tables

    Let's look at an example problem to help us understand the need for Hash Tables. Imagine you are a teacher, instructing a class of a maximum capacity of 30 students. The students sit at their assigned desks every day. To make your life easier, you decide to assign a sequential number from one to 30 to each desk. You then use this number to identify each student, and use your self-developed app to bring up the student's records after you enter the desk number (see Figure 3.1
  • Data Structures and Program Design Using C

    A Self-Teaching Introduction

    Answer. Hashing is the process of mapping keys to their appropriate locations in the hash table. It is the most effective technique of searching the values in an array or in a hash table.

    10.1.2 Hash Tables

    A hash table is a data structure which supports one of the efficient searching techniques, that is, hashing. A hash table is an array in which the data is accessed through a special index called a key. In a hash table, keys are mapped to the array positions by a hash function. A hash function is a function, or we can say that it is a mathematical formula, which when applied to a key, produces an integer which is used as an index to find a key in the hash table. Thus, a value stored in a hash table can be searched in O(1) time with the help of a hash function. The main idea behind a hash table is to establish a direct mapping between the keys and the indices of the array.

    FIGURE 10.3. Mapping of keys using a direct addressing method.

    10.1.3 Hash Functions

    A hash function is a mathematical formula which, when applied to a key, produces an integer which is used as an index to find a key in the hash table.

    Characteristics of the Hash Function

    There are four main characteristics of hash functions which are:

    1. The hash function uses all the input data.
    2. The hash function should generate very different hash values for similar keys.
    3. The hash value is fully determined by the data being hashed.
    4. The hash function must distribute the keys uniformly across the entire hash table.

    FIGURE 10.4. Mapping of keys to the hash table using hashing.

    Different Types of Hash Functions

    In this section, we will discuss some of the common hash functions:

    1. Division Method – In the division method, a key k is mapped into one of the m slots by taking the remainder of k divided by m.
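The division method described above amounts to a single remainder operation. The sketch below is in Python rather than the book's C, and the function name `division_hash`, the slot count of 11, and the sample keys are assumptions chosen for illustration.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: the slot is the remainder of key divided by m."""
    return key % m


M = 11  # hypothetical number of slots
keys = [15, 26, 37, 48, 100]

# Map every sample key to its slot.
slots = {k: division_hash(k, M) for k in keys}

# 15, 26, 37 and 48 differ by multiples of 11, so they share a slot.
shared = [k for k in keys if slots[k] == slots[15]]
```

The sample keys show the method's main weakness: keys that differ by a multiple of m always land in the same slot, so the choice of m matters when keys follow a regular pattern.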
  • Advanced Data Structures
    9 Hash Tables

    Hash Tables are a dictionary structure of great practical importance and can be very efficient. The underlying idea is quite simple: we have a universe U and want to store a set of objects with keys from U. We also have s buckets and a function h from U to S = {0, ..., s − 1}. Then we store the object with key u in the h(u)th bucket. If several objects that we want to store are mapped to the same bucket, we have a collision between these objects. If there are no collisions, then we can realize the buckets just as an array, each array entry having space for one object. The theory of Hash Tables mainly deals with the questions of what to do about the collisions and how to choose the function h in such a way that the number of collisions is small.

    The idea of Hash Tables is quite old, apparently starting in several groups at IBM in 1953 (Knott 1972). For a long time the main reason for the popularity of Hash Tables was the simple implementation; the hash functions h were chosen ad hoc as some unintelligible way to map the large universe to the small array allocated for the table. It was the practical programmer’s dictionary structure of choice, easily written and conceptually understood, with no performance guarantees, and it still exists in this style in many texts aimed at that group. The development and analysis of hash table methods that are provably good in some sense started only in the 1980s, and now a well-designed hash table can indeed be a very efficient structure.

    9.1 Basic Hash Tables and Collision Resolution

    If we map the keys of a big universe U to a small set S = {0, ..., s − 1}, then it is unavoidable that many universe elements are mapped to the same element of S. In a dictionary structure, we do not have to store the entire universe, but only some set X ⊂ U of n keys for the objects currently in the dictionary. But if we
  • Data Structure Using C

    Theory and Program

    • Ahmad Talha Siddiqui, Shoeb Ahad Siddiqui(Authors)
    • 2023(Publication Date)
    • CRC Press
      (Publisher)
    Hash Tables support one of the most efficient types of searching: hashing. Fundamentally, a hash table consists of an array in which data is accessed via a special index called a key. The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function. A hash function accepts a key and returns its hash value. Keys vary in type, but hash values are always integers. Since both computing a hash value and indexing into an array can be performed in constant time, the beauty of hashing is that we can use it to perform constant time searches. When a hash function can guarantee that no two keys will generate the same hash value, the resulting hash table is said to be directly addressed. This is ideal, but direct addressing is rarely possible in practice. Typically, the number of entries in a hash table is small relative to the universe of possible keys. Consequently, most hash functions map some keys to the same position in the table. When two keys map to the same position, they collide. A good hash function minimises collisions, but we must still be prepared to deal with them.

    9.3 Applications of Hash Tables

    Some applications of Hash Tables are:

    1. Database Systems

    Generally, database systems try to optimise between two types of access method: sequential and random. Hash Tables are an important part of efficient random access because they provide a way to locate data in a constant amount of time.

    2. Symbol Tables

    The tables used by compilers to maintain information about symbols from a program. Compilers access information about symbols frequently. Therefore, it is important that symbol tables be implemented very efficiently.

    3. Tagged Buffers

    A mechanism for storing and retrieving data in a machine-independent manner. Each data member resides at a fixed offset in the buffer. A hash table is stored in the buffer so that the location of each tagged member can be ascertained quickly. One use of a tagged buffer is sending structured data across a network to a machine whose byte ordering and structure alignment may not be the same as the original host’s. The buffer handles these concerns as the data is stored and extracted member by member.

    4. Data Dictionaries

    Data Structures that support adding, deleting, and searching for data. Although the operations of a hash table and a data dictionary are similar, other data structures may be used to implement data dictionaries. Using a hash table is particularly efficient.

    9.4 Hashing

    Hashing is a technique to convert a range of key values into a range of indexes of an array. We’re going to use the modulo operator to map each key value into that range of indexes. Consider an example of a hash table of size 20 with the following items stored in it. Items are in (key, value) format.
  • C++ Data Structures and Algorithms

    Associating a Value to a Key in a Hash Table

    In the previous chapter, we discussed the hierarchical tree data type, which is a non-linear data type, that stores data in a tree-like structure. In this chapter, we are going to discuss another non-linear data type, the hash table, which stores data based on a key. The following topics are discussed in this chapter:
    • Understanding Hash Tables
    • Preventing a collision in a hash table
    • Using a separate chaining technique to handle a collision
    • Using an open addressing technique to handle a collision

    Technical requirement

    To follow along with this chapter, including the source code, we require the following:
    • Desktop PC or Notebook with Windows, Linux, or macOS
    • GNU GCC v5.4.0 or above
    • Code::Blocks IDE v17.12 (for Windows and Linux OS), or Code::Blocks IDE v13.12 (for macOS)
    • You will find the code files on GitHub at https://github.com/PacktPublishing/CPP-Data-Structures-and-Algorithms

    Getting acquainted with Hash Tables

    Suppose we want to store a collection of numbers, for instance, phone numbers, and let's say we have approximately 1,000,000 of them. In previous chapters, we also discussed several data structures, and here we can consider using one of them. We can use an array or a list, but we have to provide a million slots of data in the array. If we need to add some phone numbers again, we have to resize the array. Also, the operation of searching will be costly, since we have to use a linear search algorithm with time complexity O(N), where the time consumption will increase if we add data to the list. Indeed, we can use a binary search algorithm with O(log N) time complexity if we manage to sort the elements of the list containing the bunch of phone numbers; however, the insert operation will be costly, since we have to maintain the sorted list.
    Another data structure we can choose is the balanced binary search tree. It can give us a moderate time complexity, since it will be O(log N)
  • Essential Algorithms

    A Practical Approach to Computer Algorithms Using Python and C#

    • Rod Stephens(Author)
    • 2019(Publication Date)
    • Wiley
      (Publisher)
    dictionaries.
    The process of mapping a key value for use by the hash table is called hashing. Good hashing functions spread out key values so that they don't all go to the same position in the table. In particular, key values are often similar, so a good hashing function maps similar key values to dissimilar locations in the table.
    For example, suppose that you want to store customer records in a hash table and look them up by name. If two customers have the last names Richards and Richardson, ideally the hashing function should map them to two different locations.
    To achieve this, hashing functions often generate a value that looks something like gibberish, as if the key value had been chopped into hash.
    If you put enough values in a hash table, eventually you'll find two keys that hash to the same value. That's called a collision. When that occurs, you need a collision-resolution policy that determines what to do. Often the collision resolution policy maps the key to a series of new positions in the table until it finds an empty position.
    A hash table's fill percentage, the percentage of the table that contains entries, influences the chance of collisions occurring. Adding a new key to a hash table is more likely to cause a collision if the table's data structure is 95 percent full than if it's only 10 percent full.
    To summarize, a hash table needs the following:
    • A data structure to hold the data
    • A hashing function to map keys to locations in the data structure
    • A collision-resolution policy that specifies what to do when keys collide
    To be useful, a hash table must be able to at least add new items and locate items that were previously stored. Another feature that is useful but not provided by some Hash Tables is the ability to remove a hashed key.
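The three ingredients listed above (a data structure, a hashing function, and a collision-resolution policy) can be put together in a short Python sketch. It assumes linear probing as the policy, which is one way to map a key "to a series of new positions", and it leaves out removal and resizing. The class name `ProbingTable`, its methods, and the customer values are hypothetical; Python's built-in `hash` stands in for the hashing function.

```python
class ProbingTable:
    """Open addressing with linear probing (no deletion, no resizing)."""

    def __init__(self, capacity: int = 8):
        self.slots = [None] * capacity  # each slot holds (key, value) or None
        self.count = 0

    def _probe(self, key):
        """Yield the series of positions to try for this key."""
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def add(self, key, value) -> None:
        for i in self._probe(key):
            if self.slots[i] is None:
                self.slots[i] = (key, value)
                self.count += 1
                return
            if self.slots[i][0] == key:
                self.slots[i] = (key, value)  # overwrite an existing key
                return
        raise RuntimeError("table is full")

    def find(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None  # reached an empty slot: the key is absent
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

    def fill_percentage(self) -> float:
        return 100.0 * self.count / len(self.slots)


customers = ProbingTable(capacity=8)
customers.add("Richards", "account 1")
customers.add("Richardson", "account 2")
```

Even if the two similar names happen to hash to the same starting position, the probe sequence places the second one in the next free slot, so both remain retrievable. As the fill percentage rises, probe sequences get longer, which is why fuller tables perform worse.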
  • Data Structures And Algorithms
    However, sets and functions have no order and they can be accessed with any element of a table by index. Thus, information retrieval from a list naturally invokes a search. The time required to search a list generally depends on how many elements the list has, but the time for accessing a table does not depend on the number of elements in the table. For this reason, table access is significantly faster than list searching in many applications. Finally, we should clarify the difference between table and array. In general, we shall use table as we have defined it previously and restrict the term array to mean the programming feature available in most high-level languages and used for implementing both tables and contiguous lists.

    10.5. Static Hashing

    10.5.1. Hashing Function

    When we use a hash function for inserting the data (a key, value pair) into the hash table, there are three rules that must be considered. First of all, the hash function should be easy and quick to compute, otherwise too much time is wasted to put data into the table. Second, the hash function should achieve an even distribution of all data (keys), i.e. all data should be evenly distributed across the table. Third, if the hash function evaluates the same data many times, the same result should be returned. Thus, the data (value) can be retrieved (by key) without failure. Therefore, the selection of a hash function is very important. If we know the types of the data, we can select a hash function that will be more efficient. Unfortunately, we usually do not know the types of the data, so we can use three normal methods to set up a hash function.

    Truncation

    It means to take only part of the key as the index. For example, if the keys are integers of more than two digits and the hash table has 100 entries (the index is 0~99), the hash function can use the last two digits of the key as the index. For instance, the data with the key 12345 will be put into entry 45 of the table.
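The truncation example above can be written as a one-line function. This is a minimal sketch assuming non-negative integer keys and the 100-entry table from the text; the name `truncation_hash` is hypothetical.

```python
def truncation_hash(key: int) -> int:
    """Keep only the last two decimal digits of the key as the index (0 to 99)."""
    return key % 100


index = truncation_hash(12345)  # the last two digits of 12345

# Keys that share their last two digits land in the same entry.
same_entry = truncation_hash(12345) == truncation_hash(99945)
```

Truncation is quick to compute, which satisfies the first rule, but it ignores most of the key, so it only achieves an even distribution when the retained digits are themselves evenly spread.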
  • Big C++

    Late Objects

    • Cay S. Horstmann(Author)
    • 2017(Publication Date)
    • Wiley
      (Publisher)
    In the simplest implementation of a hash table, you could make a very long array and insert each element at the location of its hash code (see Figure 2).

    A hash function computes an integer value from an object. A good hash function minimizes collisions, that is, identical hash codes for different values. A hash table uses the hash code to determine where to store each element.

    Figure 2: A Simplistic Implementation of a Hash Table (Eve at [70068], Jim at [74478], Joe at [74656])

    A good hash function produces different hash codes for each value so that they are scattered about in a hash table.

    If there are no collisions, it is a very simple matter to find out whether a value is already present in the set or not. Compute its hash code and check whether the array position with that hash code is already occupied. This doesn’t require a search through the entire array!

    Of course, it is not feasible to allocate an array that is large enough to hold all possible integer index positions. Therefore, we must pick an array of some reasonable size and then “compress” the hash code to become a valid array index. Compression can be easily achieved by using the remainder operation:

        int h = hash_code(x);
        h = h % len;
        if (h < 0) { h = -h; }

    See Exercise E15.17 for an alternative compression technique.

    After compressing the hash code, it becomes more likely that several elements will collide. There are several techniques for handling collisions. The most common one is called separate chaining. All colliding elements are collected in a linked list of elements with the same position value (see Figure 3). Such a list is called a “bucket”. Special Topic 15.3 discusses open addressing, in which colliding elements are placed in empty locations of the hash table.

    A hash table can be implemented as an array of buckets, sequences of nodes that hold elements with the same hash code.
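Separate chaining as described above can be sketched as an array of buckets. The sketch is in Python rather than the book's C++, and it makes two substitutions that should be read as assumptions: Python lists stand in for the linked lists, and the built-in `hash` stands in for `hash_code`. The class name `ChainedSet` and its methods are hypothetical.

```python
class ChainedSet:
    """A set stored as an array of buckets (separate chaining)."""

    def __init__(self, length: int = 8):
        self.buckets = [[] for _ in range(length)]  # one bucket per position

    def _position(self, x) -> int:
        # Compress the hash code into a valid index. Python's % already
        # returns a non-negative result for a positive modulus, so the
        # sign correction needed in C++ is unnecessary here.
        return hash(x) % len(self.buckets)

    def add(self, x) -> bool:
        """Insert x; return False if it was already present."""
        bucket = self.buckets[self._position(x)]
        if x in bucket:
            return False
        bucket.append(x)
        return True

    def contains(self, x) -> bool:
        # Only the one bucket at x's position is searched.
        return x in self.buckets[self._position(x)]


numbers = ChainedSet(8)
numbers.add(3)
numbers.add(11)   # 11 % 8 == 3, so it collides with 3
numbers.add(-5)   # -5 % 8 == 3 in Python, so it joins the same bucket
```

All three values end up in the bucket at position 3, and a membership test only scans that bucket rather than the whole array.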
  • C++ Data Structures and Algorithm Design Principles

    Leverage the power of modern C++ to build robust and scalable applications

    • John Carey, Shreyans Doshi, Payas Rajan(Authors)
    • 2019(Publication Date)
    • Packt Publishing
      (Publisher)
    O(n), where n is the number of words in the dictionary, which is not only huge but is also increasing day by day.
    Hence, we need more efficient algorithms to allow for lookup that works much faster. We'll look at a couple of efficient structures in this chapter, that is, Hash Tables and bloom filters. We'll implement both of them and compare their pros and cons.

    Hash Tables

    Let's look at the very basic problem of searching in a dictionary. There are about 170,000 words in the Oxford English Dictionary. As we mentioned in the Introduction, a linear search will take O(n) time, where n is the number of words. A better way to store the data is to store it in a height-balanced tree that has similar properties to a BST. This makes it much faster than linear search as it has a time complexity of only O(log n). But for applications that require tons of such queries, this is still not a good enough improvement. Think about the time it will take for data containing millions or billions of records, such as neuroscientific data or genetic data. It would take days to find something in the data. For these situations, we need something much faster, such as a hash table.
    One of the integral parts of Hash Tables is hashing. The idea behind this is to represent each value with a possibly unique key and, later on, use the same key to check for the presence of the key or to retrieve a corresponding value, depending on the use case. The function that derives a unique key from the given data is called a hash function. Let's look at how we can store and retrieve data by looking at some examples, and let's learn why we need such a function.

    Hashing

    Let's take one simple example before jumping into hashing. Let's say we have a container storing integers, and we want to know if a particular integer is part of the container or not as quickly as possible. The simplest way is to have a Boolean array with each element representing a value that's the same as its index. When we want to insert an element, we'll set the Boolean value corresponding to that element to true. To insert x, we simply set data[x] = true. Checking whether a particular integer, x, is inside the container is just as simple: we simply check whether data[x] is true. Thus, our insertion, deletion, and search functions become O(1).

    Figure: A simple hash table for storing integers numbered from 0 to 9
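The Boolean-array container described above might look like the following. The sketch is in Python rather than the book's C++; the class name `DirectAddressSet` and its method names are hypothetical, and it assumes every stored integer lies in the range 0 to size − 1.

```python
class DirectAddressSet:
    """Membership set for integers in the range [0, size)."""

    def __init__(self, size: int):
        self.data = [False] * size  # data[x] is True when x is present

    def insert(self, x: int) -> None:
        self.data[x] = True

    def erase(self, x: int) -> None:
        self.data[x] = False

    def contains(self, x: int) -> bool:
        return self.data[x]


digits = DirectAddressSet(10)  # integers 0 to 9, as in the figure
digits.insert(3)
digits.insert(7)
```

Each operation touches exactly one array element, which is where the O(1) cost comes from. The price is an array as large as the range of possible values, which is the limitation a hash function is meant to remove.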
  • Object-Orientation, Abstraction, and Data Structures Using Scala
    • Mark C. Lewis, Lisa Lacher, Lisa L. Lacher(Authors)
    • 2017(Publication Date)
    iterator is simpler with open addressing than with chaining because we do not have a 2D structure. We can make a single counter, we just have to have a bit of extra code that advances it to elements that store contents.
    On the other hand, growTable is a bit more complex than before because we cannot simply insert into a List. Even when we grow the table, we still have to worry about probing as the new table is likely to still have collisions, even if there are no duplicate keys.

    22.4    End of Chapter Material

    22.4.1    Summary of Concepts
    • Hash Tables are an alternate implementation choice for the Map interface that can provide O(1) performance for searching, adding, and removing.
    • A hash table is an array of entries paired with a function, called the hash function, that can convert keys to integers in the range of valid indices for the array.
    • The Any type includes a method called hashCode that will convert an instance of whatever type to an Int.
    • The two main methods of going from an Int to one that is in the range of valid indices for the table are the division (which actually uses modulo) and multiplication methods.
    • When two different keys map to the same index, it is called a collision. We discussed two different ways of dealing with collisions.
      – Chaining has each entry in the table store a list of the values that were mapped to that index.
      – Open addressing only stores one value per entry and uses probing to test other locations when entries are filled.
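The division and multiplication methods named in the summary can be compared side by side. The sketch below is in Python rather than Scala and rests on assumptions: it takes a non-negative integer hash code, and the constant A = (√5 − 1)/2 is a commonly suggested choice for the multiplication method, not something stated in this excerpt. The function names are hypothetical.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618; a commonly suggested constant


def division_index(hash_code: int, m: int) -> int:
    """Division method: the remainder of the hash code modulo the table size."""
    return hash_code % m


def multiplication_index(hash_code: int, m: int) -> int:
    """Multiplication method: scale the fractional part of hash_code * A by m."""
    fraction = (hash_code * A) % 1.0  # fractional part, in [0, 1)
    return int(m * fraction)


TABLE_SIZE = 16384  # hypothetical table size
by_division = division_index(123456, TABLE_SIZE)
by_multiplication = multiplication_index(123456, TABLE_SIZE)
```

Both functions return a value in the range of valid indices. The division method is sensitive to the choice of table size, while the multiplication method works with any size at the cost of floating-point arithmetic in this simple form.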
    22.4.2    Exercises
    1. You now potentially have many implementations of the Map ADT as we have discussed how it can be done with a sequence of tuples, a basic BST, an AVL-balanced BST, and now two different hash table implementations. Do performance testing for all of the implementations that you have to see which ones are fastest for different combinations of operations.
    2. For the chaining implementation, there are valid arguments to use a mutable.Buffer[(K, V)] instead of a List[(K, V)]
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.