Computer Science

B Tree

A B-tree is a self-balancing tree data structure used to organize and store sorted data, most notably as an index in databases and file systems. Each node can hold multiple keys and have multiple children. This keeps the tree shallow and minimizes the number of disk reads and writes needed to find, insert, or delete an entry.

Written by Perlego with AI-assistance

11 Key excerpts on "B Tree"

  • Book cover image for: Disk-Based Algorithms for Big Data
    • Christopher G. Healey(Author)
    • 2016(Publication Date)
    • CRC Press
      (Publisher)
    Significant research followed in the 1970s to improve upon the initial B-tree algorithms. In the 1980s, B-trees were applied to database management systems, with an emphasis on ensuring consistent B-tree access patterns to support concurrency control and data recovery. More recently, B-trees have been applied to disk management to support efficient I/O and file versioning. For example, ZFS is built on top of an I/O-efficient B-tree implementation.
    The original B-tree algorithm was designed to support efficient access and maintenance of an index that is too large to hold in main memory. This led to three basic goals for a B-tree.
    1. Increase the tree’s node size to minimize seeks, and maximize the amount of data transferred on each read.
    2. Reduce search times to a very few seeks, even for large collections.
    3. Support efficient local insertion, search, and deletion.
    FIGURE 6.4 An order-4 B-tree with 3 keys per node, constructed from letters of the alphabet in a random order
    FIGURE 6.5 An order-1001 B-tree with 1000 keys per node; three levels yield enough space for just over one billion keys
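The capacity quoted for Figure 6.5 can be checked with a few lines of Python. This sketch assumes every node is completely full (1000 keys and 1001 children per node for order 1001), so it gives an upper bound. `level_capacity` is a hypothetical helper name.

```python
def level_capacity(order: int, levels: int) -> list[tuple[int, int, int]]:
    """Return (level, nodes, keys) for levels 0..levels-1 of a completely full B-tree.

    A full order-k node holds k - 1 keys and k children, so level i has k**i nodes.
    """
    rows = []
    for level in range(levels):
        nodes = order ** level
        rows.append((level, nodes, nodes * (order - 1)))
    return rows


rows = level_capacity(1001, 3)
for level, nodes, keys in rows:
    print(f"level {level}: {nodes:,} nodes, {keys:,} keys")

total = sum(keys for _, _, keys in rows)
print(f"total: {total:,} keys")  # total: 1,003,003,000 keys
```

The total equals 1001³ − 1. Three full levels therefore hold just over one billion keys, and a search reads at most three nodes.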
    The key insight of B-trees is that the tree should be built bottom-up, not top-down. We begin by inserting keys into a single leaf node. When this leaf node overflows, we split it into two half-full leaves and promote a single key upwards to form a new root node. Critically, since we defer the promotion until the leaf overflows, we can pick the key that does the best job of partitioning the leaf. This split–promote operation continues throughout the life of the B-tree.
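The following is a minimal Python sketch of the split–promote step for one leaf. It assumes integer keys held in a sorted list, and it assumes the median is the promoted key; the excerpt says only that the key that best partitions the leaf is chosen. `split_leaf` and `MAX_KEYS` are hypothetical names.

```python
from bisect import insort


def split_leaf(keys: list[int]) -> tuple[list[int], int, list[int]]:
    """Split an overflowing, sorted leaf into (left, promoted, right).

    The median is promoted because it partitions the leaf most evenly.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


MAX_KEYS = 3            # order-4 tree: at most 3 keys per node
leaf: list[int] = []
root = None             # the tree starts as a single leaf with no parent

for key in [20, 5, 12, 8]:
    insort(leaf, key)   # keys stay sorted inside the leaf
    if len(leaf) > MAX_KEYS:
        left, promoted, right = split_leaf(leaf)
        root = [promoted]  # the promoted key forms the new root above both halves
        break

print(left, root, right)  # [5, 8] [12] [20]
```

Because the split happens only after the fourth key arrives, all four keys are available when the partitioning key is chosen.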
    A B-tree is a generalization of a BST. Rather than holding 1 key and pointers to two subtrees at each node, we hold up to k − 1 keys and k subtree references. This is called an order-k B-tree. Using this terminology, a BST is an order-2 B-tree. Figure 6.4 shows an order-4 B-tree used to store the same collection of keys we inserted into the paged BST in Figure 6.3.
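An order-k node can be sketched as a sorted key list plus a child list that is one longer. The `BTreeNode` class and `search` function below are hypothetical. Search uses binary search (`bisect_left`) within each node and then descends into the single subtree whose key range can contain the target. With one key and two children per node, this reduces to ordinary BST search, which matches the order-2 remark.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BTreeNode:
    # Up to k - 1 sorted keys; an internal node has len(keys) + 1 children.
    keys: list[int] = field(default_factory=list)
    children: list["BTreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(node: Optional[BTreeNode], key: int) -> bool:
    """Return True if key is stored in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.is_leaf:
            return False
        node = node.children[i]  # subtree between keys[i - 1] and keys[i]
    return False


# A small order-4 tree (at most 3 keys per node).
root = BTreeNode([10, 20], [
    BTreeNode([2, 5, 7]),
    BTreeNode([12, 15]),
    BTreeNode([25, 30]),
])
print(search(root, 15), search(root, 16))  # True False
```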
    Although our examples have low order, a B-tree node will normally hold hundreds or even thousands of keys, with each node sized to fill one or more disk pages.
  • Book cover image for: Data Structures
    eBook - PDF

    Data Structures

    Abstraction and Design Using Java

    • Elliot B. Koffman, Paul A. T. Wolfgang(Authors)
    • 2021(Publication Date)
    • Wiley
      (Publisher)
    B-trees were developed to store indexes to databases on disk storage. Disk storage is broken into blocks, and the nodes of a B-tree are sized to fit in a block, so each disk access to the index retrieves exactly one B-tree node. The time to retrieve a block is large compared to the time required to process it in memory, so by making the tree nodes as large as possible, we reduce the number of disk accesses required to find an item in the index. Assuming a block can store a node for a B-tree of order 200, each node would store at least 100 items. This would enable 100^4, or 100 million, items to be accessed in a B-tree of height 4. The insertion process for a B-tree is similar to that of a 2–3 tree, and each insertion is into a leaf. For Figure 9.48, a number less than 10 would be inserted into the leftmost leaf; a number greater than 40 would be inserted into the rightmost leaf; and numbers between 11 and 39 would be inserted into one of the interior leaves. A simple case is insertion of the number 39 into the fourth child of the root node (shown in color in Figure 9.49). However, if the leaf being inserted into is full, it is split into two nodes, each containing approximately half the items, and the middle item is passed up to the split node’s parent. If the parent is full, it is split and its middle item is passed up to its parent, and so on. If a node being split is the root of the B-tree, a new root node is created, thereby increasing the height of the B-tree. The children of the new root will be the two nodes that resulted from splitting the old root.
    FIGURE 9.48 Example of a B-Tree
    FIGURE 9.49 B-Tree after Inserting 39
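The insertion procedure above can be sketched in Python. This is a minimal sketch and not Koffman and Wolfgang's implementation. It assumes order 5 (at most four keys per node, consistent with the four-key leaf among the example keys), distinct integer keys, and promotion of the middle key on every split. The starting tree is a hypothetical reconstruction from the keys listed for Figure 9.48, with 10, 22, 30, and 40 in the root.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from typing import Optional

ORDER = 5                  # assumed order: at most ORDER - 1 = 4 keys per node
MAX_KEYS = ORDER - 1


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def _insert(node: Node, key: int) -> Optional[tuple[int, Node]]:
    """Insert key below node; if node overflows, split it and return (middle, right)."""
    if not node.children:                  # every insertion lands in a leaf
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)    # child whose range contains key
        split = _insert(node.children[i], key)
        if split is not None:              # child split: take its middle item
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2              # two halves of about equal size
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    middle = node.keys[mid]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return middle, right


def insert(root: Node, key: int) -> Node:
    """Insert key and return the root, which is new if the old root split."""
    split = _insert(root, key)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])   # the tree grows one level taller


# Hypothetical tree using the keys listed for Figure 9.48.
root = Node([10, 22, 30, 40], [
    Node([5, 7, 8]), Node([13, 15, 18, 20]), Node([26, 27]),
    Node([32, 35, 38]), Node([42, 46]),
])
root = insert(root, 39)                    # simple case: the fourth leaf has room
after_39 = list(root.children[3].keys)
root = insert(root, 36)                    # leaf splits, then the full root splits
print(after_39, root.keys)                 # [32, 35, 38, 39] [30]
```

Inserting 36 afterwards exercises the cascade. The fourth leaf overflows and passes 36 up, the root then holds five keys and splits, and its middle key 30 becomes a new root.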
  • Book cover image for: Disk-Based Algorithms for Big Data
    • Christopher G. Healey(Author)
    • 2016(Publication Date)
    • CRC Press
      (Publisher)
    Figure 6.5 shows the number of nodes and keys in the first three levels of an order-1001 B-tree. Even with a very small height of 3, the tree holds more than a billion keys. We need at most 3 seeks to find any key in the tree, which produces exactly the size-to-seek ratio we are looking for. The B-tree algorithm guarantees a number of important properties on order-k B-trees to ensure efficient search, management, and space utilization:
    1. Every tree node can hold up to k − 1 keys and k subtree references.
    2. Every tree node except the root holds at least ⌈k/2⌉ − 1 keys.
    3. All leaf nodes occur at the same depth in the tree.
    4. The keys in a node are stored in ascending order.
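The four properties translate directly into a checker. The hypothetical `check_btree` function below is a sketch that tests only the listed properties. It does not verify that the keys in each subtree fall between the parent's separating keys, which a full validator would also need to check.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class BTreeNode:
    keys: list[int] = field(default_factory=list)
    children: list["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def check_btree(root: BTreeNode, k: int) -> bool:
    """Return True if the tree satisfies the four order-k properties listed above."""
    min_keys = ceil(k / 2) - 1
    leaf_depths: set[int] = set()

    def visit(node: BTreeNode, depth: int, is_root: bool) -> bool:
        n = len(node.keys)
        # Property 1: at most k - 1 keys; an internal node has n + 1 <= k subtrees.
        if n > k - 1 or (node.children and len(node.children) != n + 1):
            return False
        # Property 2: every node except the root holds at least ceil(k/2) - 1 keys.
        if not is_root and n < min_keys:
            return False
        # Property 4: keys strictly ascending (distinct keys assumed).
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False
        if not node.children:
            leaf_depths.add(depth)   # property 3 is checked once all leaves are seen
            return True
        return all(visit(child, depth + 1, False) for child in node.children)

    return visit(root, 0, True) and len(leaf_depths) == 1


good = BTreeNode([10, 20], [BTreeNode([2, 5, 7]), BTreeNode([12, 15]), BTreeNode([25, 30])])
uneven = BTreeNode([10], [BTreeNode([5]),
                          BTreeNode([20], [BTreeNode([15]), BTreeNode([25])])])
print(check_btree(good, 4), check_btree(uneven, 4))  # True False
```

The `uneven` tree fails property 3 because its leaves sit at depths 1 and 2.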
  • Book cover image for: Data Science with Semantic Technologies
    eBook - PDF

    Data Science with Semantic Technologies

    Theory, Practice and Application

    • Archana Patel, Narayan C. Debnath, Bharat Bhusan(Authors)
    • 2022(Publication Date)
    • Wiley-Scrivener
      (Publisher)
    Keywords: Data structures, algorithms, balanced binary search trees, AVL-trees, Red-Black trees, partitioning
    6.1 Introduction
    Binary trees are hierarchical data structures that represent collections of items. They are now employed in a variety of computer science fields: memory management, compilers, mathematics, etc. Among the advanced data structures built on binary trees, we can cite binary search trees, tries, and balanced trees. These structures are more complex and make it possible to represent large sets. Binary search trees are hierarchical and dynamic data structures. They represent sets whose members are arranged in a linear order. All values stored in the left subtree of any node x are less than the one stored at x, and all values stored in the right subtree of x are greater than the one stored at x. This is an important property of a binary search tree, and it holds for every node in a binary search tree (the binary search tree property). Most books about data structures and algorithms contain an important part on binary search trees. The most referenced books are certainly those of Knuth and of Aho et al. [1, 2]. Many other books take a more practical approach. Examples include [3], which offers various examples of binary tree operations in Pascal; [4], which explores binary search tree implementations in ML and Prolog, among other languages; [5], which outlines the basic structure and operations of binary search trees; and [6–8], which contain more programming of binary search trees in C and C++. A delicate part of algorithms on binary search trees consists in keeping them balanced, or rebalancing them from time to time, to maintain good performance of operations. Recall that balancing the tree makes the cost of finding any key O(lg n). There are many variations of balanced binary search trees.
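The following Python sketch shows why balancing matters. It inserts the same 1023 keys into a plain, unbalanced BST in two different orders. Sorted input produces a chain whose height equals n, while a middle-first order produces a tree of height lg(n + 1) = 10. The helper names are hypothetical. A self-balancing tree achieves the short height without depending on the input order.

```python
from typing import Iterator, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insertion with no rebalancing (iterative, so a long chain is safe)."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def height(root: Optional[Node]) -> int:
    """Count levels with a level-by-level sweep (no recursion)."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels


def median_first(lo: int, hi: int) -> Iterator[int]:
    """Yield lo..hi-1 middle-first; gives a perfectly balanced BST when hi - lo = 2**k - 1."""
    if lo < hi:
        mid = (lo + hi) // 2
        yield mid
        yield from median_first(lo, mid)
        yield from median_first(mid + 1, hi)


n = 1023
skewed: Optional[Node] = None
for key in range(n):                      # sorted input: every key goes right
    skewed = insert(skewed, key)

balanced: Optional[Node] = None
for key in median_first(0, n):
    balanced = insert(balanced, key)

print(height(skewed), height(balanced))  # 1023 10
```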
  • Book cover image for: Hands-On Data Structures and Algorithms with Python
    A binary search tree (BST) is a special kind of binary tree. It is one of the most important and commonly used data structures in computer science applications. A binary search tree is structurally a binary tree that keeps the data in its nodes ordered by key, so search, insertion, and deletion take time proportional to the height of the tree.
    A binary tree is called a binary search tree if the value at any node in the tree is greater than or equal to the values in all the nodes of its left subtree, and less than the values of all the nodes of its right subtree. For example, if K1, K2, and K3 are key values in a tree of three nodes (as shown in Figure 6.22), then it should satisfy the following conditions:
    • The key values K2<=K1
    • The key values K3>K1
    The following figure depicts the above condition of the binary search tree: Figure 6.22: An example of a binary search tree
    Let’s consider another example to get a better understanding of binary search trees. Consider the binary search tree shown in Figure 6.23:
    Figure 6.23: Binary search tree of six nodes
    In this tree, all the nodes in the left subtree are less than (or equal to) the value of the parent node, and all the nodes in the right subtree are greater than the value of the parent node.
    To see whether the above example tree fulfills the properties of a binary search tree, we check that all the nodes in the left subtree of the root node have a value less than 5. Likewise, all the nodes in the right subtree have a value that is greater than 5. This property applies to all the nodes in the tree with no exceptions. For example, if we take another node with the value 3, we can see that the values of all the left subtree nodes are less than 3 and the values of all the right subtree nodes are greater than 3.
    Consider another example of a binary tree, and check whether it is a binary search tree. Although the following diagram, Figure 6.24, looks similar to the previous one, it does not qualify as a binary search tree, because node 7 is greater than the root node 5 even though it is located in the left subtree of the root node. Node 4 is in the right subtree of its parent node 7, which also violates a rule of binary search trees.
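Comparing each node only with its parent is not enough to detect the Figure 6.24 kind of violation. The Python sketch below passes down the range that every key in a subtree must respect, using the convention above (left subtree keys ≤ node key < right subtree keys). The names are hypothetical. The invalid tree is a made-up example, not Figure 6.24 itself, but like that figure it places a 7 in the left subtree of the root 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node: Optional[Node], low: float = float("-inf"),
           high: float = float("inf")) -> bool:
    """Check that every key lies in (low, high] and that both subtrees obey narrower bounds."""
    if node is None:
        return True
    if not (low < node.key <= high):
        return False
    # Left keys may equal node.key (upper bound inclusive); right keys must exceed it.
    return (is_bst(node.left, low, node.key) and
            is_bst(node.right, node.key, high))


valid = Node(5, Node(3, Node(1), Node(4)), Node(7, Node(6), Node(9)))
invalid = Node(5, Node(3, Node(1), Node(7)), Node(8))  # 7 sits left of the root 5
print(is_bst(valid), is_bst(invalid))  # True False
```

In `invalid`, the 7 is a correct right child of 3, so a parent-only comparison would accept it. The inherited upper bound of 5 is what rejects it.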
  • Book cover image for: PHP 7 Data Structures and Algorithms
    O(n) time, and we will explore it in the next chapter. Here is an example of a binary search tree:

    Self-balanced binary search tree

    A self-balanced binary search tree, or height-balanced binary search tree, is a special type of binary search tree that attempts to keep the height, or number of levels, of the tree as small as possible at all times by adjusting automatically. For example, the following diagram shows a binary search tree on the left and a self-balanced binary search tree on the right:
    A height-balanced binary search tree is generally preferable because it keeps search operations faster than in a regular BST, whose height can grow to n in the worst case. There are different implementations of self-balanced or height-balanced binary search trees. Some of the popular ones are as follows:
    • AA tree
    • AVL tree
    • Red-black tree
    • Scapegoat tree
    • Splay tree
    • 2-3 tree
    • Treap
    We will discuss a few of the height-balanced trees in the following sections.

    AVL tree

    An AVL tree is a self-balancing binary search tree in which the heights of the two child subtrees of any node differ by at most 1. Whenever an insertion or deletion makes the difference exceed 1, the tree is rebalanced with rotations to restore a difference of at most 1. This guarantees logarithmic time complexity for search, insertion, and deletion. Here is an example of an AVL tree:
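The rebalancing rule can be sketched in Python. This is a minimal, insertion-only sketch (deletion is omitted) that uses the standard single and double rotations. The class and function names are hypothetical. Inserting the keys 1 to 7 in sorted order would give a plain BST a height of 7. Here the same input yields a perfectly balanced tree of height 3 rooted at 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AVLNode:
    key: int
    left: Optional["AVLNode"] = None
    right: Optional["AVLNode"] = None
    height: int = 1                       # a single node has height 1


def h(node: Optional[AVLNode]) -> int:
    return node.height if node is not None else 0


def balance(node: AVLNode) -> int:
    return h(node.left) - h(node.right)


def update(node: AVLNode) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y: AVLNode) -> AVLNode:
    x = y.left                            # x moves up, y becomes its right child
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x: AVLNode) -> AVLNode:
    y = x.right                           # y moves up, x becomes its left child
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node: Optional[AVLNode], key: int) -> AVLNode:
    """BST insertion, then rotations on the way back up wherever |balance| > 1."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    if balance(node) > 1:                 # left subtree too tall
        if balance(node.left) < 0:        # left-right case: straighten first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance(node) < -1:                # right subtree too tall
        if balance(node.right) > 0:       # right-left case: straighten first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def check(node: Optional[AVLNode]) -> int:
    """Return the subtree height, asserting the AVL condition at every node."""
    if node is None:
        return 0
    lh, rh = check(node.left), check(node.right)
    assert abs(lh - rh) <= 1, f"unbalanced at {node.key}"
    return 1 + max(lh, rh)


root = None
for key in range(1, 8):                   # sorted input, the worst case for a plain BST
    root = insert(root, key)
print(root.key, check(root))              # 4 3
```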

    Red-black tree

    A red-black tree is a self-balanced binary search tree with an extra property: color. Each node in the tree stores one extra bit of information, known as its color, which can be either red or black. Like an AVL tree, a red-black tree is also used in real-time applications, because its average-case and worst-case complexity is logarithmic. A sample red-black tree looks like this:
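The color bit keeps the tree balanced only if it obeys certain rules. The Python sketch below checks the standard red-black rules, which this excerpt does not list:

- The root is black.
- No red node has a red child.
- Every path from a node down to an empty subtree passes through the same number of black nodes (empty subtrees count as black).

The names are hypothetical, and BST ordering is not checked.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str                            # RED or BLACK: the extra bit per node
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def black_height(node: Optional[RBNode]) -> int:
    """Return the subtree's black-height, or -1 if a red-black rule is broken."""
    if node is None:
        return 1                          # empty subtrees count as black leaves
    if node.color == RED and any(
            c is not None and c.color == RED for c in (node.left, node.right)):
        return -1                         # a red node may not have a red child
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1                         # unequal black counts on some paths
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[RBNode]) -> bool:
    """The root must be black, and the rules above must hold everywhere."""
    return (root is None or root.color == BLACK) and black_height(root) != -1


valid = RBNode(10, BLACK,
               RBNode(5, RED, RBNode(2, BLACK), RBNode(7, BLACK)),
               RBNode(15, BLACK))
red_red = RBNode(10, BLACK, RBNode(5, RED, RBNode(2, RED)), RBNode(15, RED))
uneven = RBNode(10, BLACK, RBNode(5, BLACK))
print(is_red_black(valid), is_red_black(red_red), is_red_black(uneven))  # True False False
```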
  • Book cover image for: Advanced Data Structures
    In the 1970s, computers were still very memory-limited but usually already had a large external memory, so it was necessary to consider how a structure operates when a large part of it is not in main memory but in external memory. This situation is now less important, but it is still relevant for database applications, where B-tree variants are still much used as index structures. The problem with normal binary search trees as an external-memory structure is that each tree node could be in a different external memory block, which becomes known only when the previous block has been retrieved from external memory. So we might need as many external memory block accesses as the height of the tree, which is more than log₂(n), and from each of these blocks, which are large enough to hold many nodes, we would use just a single node. The idea of B-trees is to take each block as a single node of high degree. In the original version, each node has degree between a and 2a − 1, where a is chosen as large as possible under the condition that a block must have room for 2a − 1 pointers and keys. Balance was then maintained by the criterion that all leaves should be at the same depth. The degree interval a to 2a − 1 is the smallest interval for which the rebalancing algorithm from Bayer and McCreight (1972) works. Because each block has room for at most 2a − 1 elements and is at least half full this way, it seemed a good choice for optimizing space utilization. But then it was discovered by Huddleston and Mehlhorn (1982), and independently by Maier and Salveter (1981), that choosing the interval a bit larger makes an important difference for the rebalancing algorithm: if one allows node degrees from a to b for b ≥ 2a, then rebalancing changes only amortized O(1) blocks, whereas for b = 2a − 1, the original choice, on the order of log n block changes can be necessary.
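The block-access argument can be made concrete. Assume, as in the excerpt, that every node except the root has at least a children (a − 1 keys). Assume also the usual rule that a non-leaf root has at least two children. Under these assumptions, a B-tree with h + 1 levels holds at least 2·a^h − 1 keys. The hypothetical helpers below compare that bound with the ⌈log₂(n + 1)⌉ levels that even a perfectly balanced binary search tree needs.

```python
def max_btree_levels(n: int, a: int) -> int:
    """Most levels (block accesses per search) a B-tree with n >= 1 keys can have.

    Assumes every non-root node has at least a children and a non-leaf root has
    at least 2, so a tree with h + 1 levels holds at least 2 * a**h - 1 keys.
    """
    h = 0
    while 2 * a ** (h + 1) - 1 <= n:
        h += 1
    return h + 1


def min_binary_levels(n: int) -> int:
    """Fewest levels any binary search tree with n keys can have: ceil(log2(n + 1))."""
    return n.bit_length()


n = 10 ** 9
print(max_btree_levels(n, 100), min_binary_levels(n))  # 5 30
```

With a = 100, a billion keys need at most 5 block accesses, compared with at least 30 node visits for a binary search tree.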
  • Book cover image for: Data Structures And Algorithms
    We may use the nodes of a tree to look up, add, delete, or change information, so we want to organize a list or file of nodes as a binary search tree. Sequentially ordered nodes are difficult to manipulate efficiently; for example, sequential search is slow compared with binary search. Therefore, we try to organize them into a tree. In this section, we study the binary search tree to address two issues: an implementation of ordered lists in which we can search quickly, and one in which we can also insert and delete quickly.
    11.2.1. Definition
    We can use a binary search tree to solve these problems efficiently. We can draw comparison trees that show the progress of binary search by moving from a node either left or right. If the target key is smaller than the key in the current node, we move left; if it is larger, we move right. Thus the target key drives every step of binary search. Definition. A binary search tree is a binary tree that is either empty or in which every node contains a key and satisfies the following properties: (1) The keys in the left subtree of a node (if any) are less than the key in the node. (2) The keys in the right subtree of a node (if any) are greater than the key in the node. (3) The left and right subtrees of the root are again binary search trees. This definition assumes that no two entries in a binary search tree have equal keys; if entries with equal keys are allowed, the definition must be changed. From this definition, we can see that a binary search tree is ordered relative to the key in its root node, and each subtree is recursively another binary search tree.
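The left-or-right rule in the definition gives a short iterative search. This Python sketch assumes distinct integer keys, as the definition requires. `Node` and `tree_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_search(node: Optional[Node], target: int) -> Optional[Node]:
    """Return the node holding target, or None if the target key is absent."""
    while node is not None and node.key != target:
        # Smaller targets can only be in the left subtree, larger ones in the right.
        node = node.left if target < node.key else node.right
    return node


root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
print(tree_search(root, 6).key, tree_search(root, 7))  # 6 None
```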
  • Book cover image for: Data Structures and Algorithms Implementation Through C
    • Brijesh Bakariya(Author)
    • 2018(Publication Date)
    • BPB Publications
      (Publisher)
    A B-tree is a self-balanced search tree in which every node can hold multiple keys and every internal node can have more than two children. In a B-tree, all leaf nodes must be at the same level; this is its balance property.
    If the B-tree has order M, then:
    1. Every internal node other than the root has at least ⌈M/2⌉ children, and every node other than the root has at least ⌈M/2⌉ − 1 keys.
    2. Every node has at most M children and at most M − 1 keys.
    The maximum number of children is called the degree or order. Let's take an example to understand B-trees.
    Example: Construct a B-tree having degree 5 and following keys:
    5, 10, 12, 13, 14, 1, 2, 3, 4.
    Solution: By the properties above, the maximum number of children is 5 and the maximum number of keys is 4. The minimum number of children is ⌈5/2⌉ and the minimum number of keys is ⌈5/2⌉ − 1.
    Since 5/2 = 2.5 rounds up to 3, the minimum number of children is 3 and the minimum number of keys is 2. The following steps construct the B-tree.
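The arithmetic in the solution can be captured in a small hypothetical helper. It rounds M/2 up, and it applies the children limits only to internal nodes, since leaves have no children.

```python
from math import ceil


def btree_limits(m: int) -> dict[str, int]:
    """Limits for a non-root node of an M-order B-tree (children limits apply to internal nodes)."""
    min_children = ceil(m / 2)            # 5 / 2 = 2.5 rounds up to 3
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children": min_children,
        "min_keys": min_children - 1,
    }


print(btree_limits(5))
# {'max_children': 5, 'max_keys': 4, 'min_children': 3, 'min_keys': 2}
```

Assume each overflowing node promotes its middle key. Then inserting 5, 10, 12, 13, 14, 1, 2, 3, 4 in that order ends with a root holding 3 and 12 above the leaves [1, 2], [4, 5, 10], and [13, 14]. Every non-root node keeps at least the two keys computed above.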

    10.18 B+ Tree

    A B+ tree is an extension of the B-tree. It is similar to a B-tree, with a few differences that give it the following advantages:
    1. A B+ tree consists of two types of nodes: (1) internal nodes and (2) leaf nodes.
    2. Internal nodes point to other nodes in the tree.
    3. Leaf nodes point to data in the database by using data pointers. Data is stored only in the leaf nodes; all other nodes store index entries.
    4. Leaf nodes are linked to each other by sibling pointers in sequential order to form a linked list.
    5. Only the leaf nodes need to be traversed to scan the entire tree, because data is present only in the leaf nodes. The higher nodes need not be visited at all, which greatly reduces block accesses.
    6. Traversal is faster than in B-trees, where data is present in all the nodes and a scan would therefore require more block accesses.
    7. Just like B-trees, B+ trees are balanced (every path from the root node to a leaf node has the same length), and every node except the root must be at least half full. The root may contain as few as two entries.
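Points 4 to 6 can be illustrated with a small hand-built B+ tree. In this Python sketch, the classes are hypothetical and strings stand in for data pointers. Internal nodes hold only separator keys, and each separator is the smallest key of the leaf to its right. A range query descends once to the first relevant leaf and then follows sibling pointers without revisiting internal nodes.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterator, Optional, Union


@dataclass
class Leaf:
    keys: list[int]
    values: list[str]                     # stand-ins for data pointers
    next: Optional["Leaf"] = None         # sibling pointer to the next leaf


@dataclass
class Internal:
    keys: list[int]                       # separator keys only, no data
    children: list[Union["Internal", Leaf]]


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    """Descend to the leaf whose key range contains key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]  # equal keys go right
    return node


def range_scan(root: Union[Internal, Leaf], low: int, high: int) -> Iterator[tuple[int, str]]:
    """Yield (key, value) pairs with low <= key <= high in key order."""
    leaf: Optional[Leaf] = find_leaf(root, low)
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return
            if key >= low:
                yield key, value
        leaf = leaf.next                  # walk the leaf-level linked list


l1, l2, l3 = Leaf([1, 3], ["a", "b"]), Leaf([5, 7], ["c", "d"]), Leaf([9, 11], ["e", "f"])
l1.next, l2.next = l2, l3
root = Internal([5, 9], [l1, l2, l3])

print(list(range_scan(root, 3, 9)))  # [(3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
print([k for k, _ in range_scan(root, 0, 100)])  # [1, 3, 5, 7, 9, 11]
```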
  • Book cover image for: Data Structures Through C++
    eBook - ePub

    Data Structures Through C++

    Experience Data Structures C++ through animations

    Chapter 07

    Trees

    Of Herbs, Shrubs and Bushes

    Why This Chapter Matters?

    Nature is man's best teacher. In every walk of life man has explored nature, learnt his lessons and then applied the knowledge that nature offered him to solve everyday problems that he faced at the workplace. It isn't without reason that there are data structures like Trees, Binary Trees, Search Trees, AVL Trees, Forests, etc. Trees are non-linear data structures. They have many applications in Computer Science, hence you must understand them comprehensively.

    If large input data is stored in a linked list, then the time required to access the data is prohibitive. In such cases a data structure called a tree is used. This data structure is often used in building file systems and in evaluating arithmetic expressions. A balanced tree gives a running time of O(log n) for most operations.
    Like linked lists, a tree also consists of several nodes. Each node may contain links that point to other nodes in the tree. So a tree can be used to represent a person and all of his descendants, as shown in Figure 7-1.
    Figure 7-1. A tree structure.
    Note that each node in this tree contains a name for data and one or more pointers to the other tree nodes. Although a tree may contain any number of pointers to the other tree nodes, a large number of trees have at most two pointers to the other tree nodes. Such trees are called binary trees.

    Binary Trees

    Let us begin our study of binary trees by discussing some basic concepts and terminology.
    A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint sub-sets. The first sub-set contains a single element called the root of the tree. The other two sub-sets are themselves binary trees, called the left and right sub-trees of the original tree. A left or right sub-tree can be empty.
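The recursive definition maps directly onto code: a binary tree is either empty (`None` here) or a root element with a left and a right sub-tree. This is a minimal Python sketch with hypothetical names. The sample tree is illustrative and is not the tree in Figure 7-2(a).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryTree:
    root: str                                  # the single root element
    left: Optional["BinaryTree"] = None        # left sub-tree (None = empty)
    right: Optional["BinaryTree"] = None       # right sub-tree (None = empty)


def size(tree: Optional[BinaryTree]) -> int:
    """Count nodes: 0 for an empty tree, else the root plus both sub-trees."""
    if tree is None:
        return 0
    return 1 + size(tree.left) + size(tree.right)


tree = BinaryTree("A",
                  BinaryTree("B", BinaryTree("D")),
                  BinaryTree("C", None, BinaryTree("E")))
print(size(tree))  # 5
```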
    Each element of a binary tree is called a node of the tree. The tree shown in Figure 7-2(a) consists of nine nodes with A as its root. Its left sub-tree is rooted at B and its right sub-tree is rooted at C. This is indicated by the two branches emanating from A to B on the left and to C on the right. The absence of a branch indicates an empty sub-tree. For example, the left sub-tree of the binary tree rooted at C and the right sub-tree of the binary tree rooted at E are both empty. The binary trees rooted at D, G, H and I
  • Book cover image for: Data Structures Through C
    eBook - ePub

    Data Structures Through C

    Learn the fundamentals of Data Structures through C

    Chapter 07

    Trees

    Of Herbs, Shrubs and Bushes

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.