Ceph: Designing and Implementing Scalable Storage Systems
Design, implement, and manage software-defined storage solutions that provide excellent performance

About this book

Get to grips with the unified, highly scalable distributed storage system and learn how to design and implement it.

Key Features

  • Explore Ceph's architecture in detail
  • Implement a Ceph cluster successfully and gain deep insights into its best practices
  • Leverage the advanced features of Ceph, including erasure coding, tiering, and BlueStore

Book Description

This Learning Path takes you through the basics of Ceph all the way to gaining in-depth understanding of its advanced features. You'll gather skills to plan, deploy, and manage your Ceph cluster. After an introduction to the Ceph architecture and its core projects, you'll be able to set up a Ceph cluster and learn how to monitor its health, improve its performance, and troubleshoot any issues. By following the step-by-step approach of this Learning Path, you'll learn how Ceph integrates with OpenStack, Glance, Manila, Swift, and Cinder. With knowledge of federated architecture and CephFS, you'll use Calamari and VSM to monitor the Ceph environment. In the upcoming chapters, you'll study the key areas of Ceph, including BlueStore, erasure coding, and cache tiering. More specifically, you'll discover what they can do for your storage system. In the concluding chapters, you will develop applications that use Librados and distributed computations with shared object classes, and see how Ceph and its supporting infrastructure can be optimized.

By the end of this Learning Path, you'll have the practical knowledge of operating Ceph in a production environment.

This Learning Path includes content from the following Packt products:

  • Ceph Cookbook by Michael Hackett, Vikhyat Umrao and Karan Singh
  • Mastering Ceph by Nick Fisk
  • Learning Ceph, Second Edition by Anthony D'Atri, Vaibhav Bhembre and Karan Singh

What you will learn

  • Understand the benefits of using Ceph as a storage solution
  • Combine Ceph with OpenStack, Cinder, Glance, and Nova components
  • Set up a test cluster with Ansible and virtual machines with VirtualBox
  • Develop solutions with Librados and shared object classes
  • Configure BlueStore and see its interaction with other configurations
  • Tune, monitor, and recover storage systems effectively
  • Build an erasure-coded pool by selecting intelligent parameters

Who this book is for

If you are a developer, system administrator, storage professional, or cloud engineer who wants to understand how to deploy a Ceph cluster, this Learning Path is ideal for you. It will help you discover ways in which Ceph features can solve your data storage problems. Basic knowledge of storage systems and GNU/Linux will be beneficial.

Operations and Maintenance

In this chapter, we explore the panoply of day-to-day tasks for maintaining your Ceph clusters. The topics covered include:
  • Topology
  • Configuration
  • Common tasks
  • Scrubs
  • Logs
  • Working with remote hands
We'll cover a lot of ground in this chapter. Be sure to take periodic breaks to refuel with garlic fries.

Topology

In this section, we'll describe commands to explore the logical layout of an example Ceph cluster. Before we change anything, we need to know exactly what we have.

The 40,000 foot view

To see the overall topology of a Ceph cluster at a glance, run ceph osd tree. This shows us at once the hierarchy of CRUSH buckets, including the name of each bucket, its weight, whether it is marked up or down, a weight adjustment, and an advanced attribute called primary affinity. This cluster was initially provisioned with 3 racks, each housing 4 hosts, for a total of 12 OSD nodes. Each OSD node (also known as host or server) in turn houses 24 OSD drives.
 # ceph osd tree
 ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 974.89661 root default
 -14 330.76886     rack r1
  -2  83.56099         host data001
   0   3.48199             osd.0         up  1.00000          1.00000
 ...
  23   3.48199             osd.23        up  1.00000          1.00000
  -3  80.08588         host data002
  24   3.48199             osd.24        up  1.00000          1.00000
  25   3.48199             osd.25        up  1.00000          1.00000
  26   3.48199             osd.26        up  1.00000          1.00000
  27   3.48199             osd.27        up  1.00000          1.00000
  28   3.48199             osd.28        up  1.00000          1.00000
  29   3.48199             osd.29        up  1.00000          1.00000
  30   3.48199             osd.30        up  1.00000          1.00000
  31   3.48199             osd.31        up  1.00000          1.00000
  32   3.48199             osd.32        up  1.00000          1.00000
  34   3.48199             osd.34        up  1.00000          1.00000
  35   3.48199             osd.35        up  1.00000          1.00000
  36   3.48199             osd.36        up  1.00000          1.00000
  37   3.48199             osd.37        up  1.00000          1.00000
  38   3.48199             osd.38        up  1.00000          1.00000
  39   3.48199             osd.39      down        0          1.00000
  40   3.48199             osd.40        up  1.00000          1.00000
  41   3.48199             osd.41        up  1.00000          1.00000
  42   3.48199             osd.42        up  1.00000          1.00000
  43   3.48199             osd.43        up  1.00000          1.00000
  44   3.48199             osd.44        up  1.00000          1.00000
  45   3.48199             osd.45        up  1.00000          1.00000
  46   3.48199             osd.46        up  1.00000          1.00000
  47   3.48199             osd.47        up  1.00000          1.00000
  -4  83.56099         host data003
  48   3.48199             osd.48        up  1.00000          1.00000
 ...
  -5  83.56099         host data004
  72   3.48199             osd.72        up  1.00000          1.00000
 ...
  95   3.48199             osd.95        up  1.00000          1.00000
 -15 330.76810     rack r2
  -6  83.56099         host data005
  96   3.48199             osd.96        up  1.00000          1.00000
 ...
  -7  80.08557         host data006
 120   3.48199             osd.120       up  1.00000          1.00000
 ...
  -8  83.56055         host data007
  33   3.48169             osd.33        up  1.00000          1.00000
 144   3.48169             osd.144       up  1.00000          1.00000
 ...
 232   3.48169             osd.232       up  1.00000          1.00000
  -9  83.56099         host data008
 168   3.48199             osd.168       up  1.00000          1.00000
 -16 313.35965     rack r3
 -10  83.56099         host data009
 192   3.48199             osd.192       up  1.00000          1.00000
 ...
 -11  69.63379         host data010
 133   3.48169             osd.133       up  1.00000          1.00000
 ...
 -12  83.56099         host data011
 239   3.48199             osd.239       up  1.00000          1.00000
 ...
 -13  76.60388         host data012
 ...
 286   3.48199             osd.286       up  1.00000          1.00000
Let's go over what this tree is telling us. Note that a number of similar lines have been replaced with ellipses for brevity, a practice we will continue throughout this and following chapters.
After the column headers, the first data line is:
-1 974.89661 root default 
The first column is an ID number that Ceph uses internally, and with which we rarely need to concern ourselves. The second column under the WEIGHT heading is the CRUSH weight. By default, the CRUSH weight of any bucket corresponds to its raw capacity in TB; in this case we have a bit shy of a petabyte (PB) of raw space. We'll see that this weight is the sum of the weights of the buckets under the root in the tree.
Since this cluster utilizes the conventional replication factor of 3, roughly 324 TB of usable space is currently available. The balance of the line is root default, which tells us that this CRUSH bucket is of the root type, and that its name is default. Complex Ceph clusters can contain multiple roots, but most need only one.
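As a quick sanity check of that figure, divide the root weight by the replication factor. This is only a rough estimate, since full ratios and uneven utilization across OSDs reduce what is usable in practice; a minimal sketch of the arithmetic:
 # awk 'BEGIN {printf "%.2f\n", 974.89661 / 3}'
 324.97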
The next line is as follows:
-14 330.76886 rack r1 
It shows a bucket of type rack, with a weight of roughly 330 TB. Skipping ahead a bit, we see two more rack buckets with weights of roughly 330 TB and 313 TB. Their sum gets us to the roughly 974 TB capacity (weight) of the root bucket. When the rack weights are not equal, as in our example, it is usually because they contain different numbers of host buckets (or simply hosts) or, more often, because their underlying hosts have unequal weights.
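One can verify that the rack weights do sum to the root weight straight from the plain-text output. This is a minimal sketch that assumes the column layout shown above (newer releases insert additional columns, such as CLASS, which shift the field positions); a more robust approach using -f json and jq appears later in this section:
 # ceph osd tree | awk '$3 == "rack" {sum += $2} END {printf "%.5f\n", sum}'
 974.89661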
Next we see the following:
-2 83.56099 host data001 
This indicates a bucket of type host, with the name data001. As with root and rack buckets, the weight reflects the raw capacity (before replication) of the underlying buckets in the hierarchy. Below rack r1 in our hierarchy we see hosts named data001, data002, data003, and data004. In our example, we see that host data002 presents a somewhat lower weight than the other three hosts. This may mean that a mixture of drive sizes has been deployed or that some drives were missed during initial deployment. In our example, though, the host only contains 23 OSD buckets (or simply OSDs) instead of the expected 24. This reflects a drive that has failed and been removed entirely, or one that was not deployed in the first place.
Under each host bucket we see a number of OSD entries.
 24 3.48199 osd.24 up 1.00000 1.00000
In our example, these drives are SAS SSDs, each nominally 3840 GB in size, which we describe as the marketing capacity. The discrepancy between that figure and the 3.48199 TB weight presented here is due to multiple factors (a rough reconstruction follows the list):
  • The marketing capacity is expressed in base 10 units; everything else uses base 2 units
  • Each drive carves out 10 GB for journal use
  • XFS filesystem overhead
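Converting the marketing capacity to TiB and subtracting the journal gets us most of the way to the observed weight; the remaining sliver down to 3.48199 is filesystem and partitioning overhead. A minimal sketch of the arithmetic (figures approximate; the journal is taken as 10 GiB here):
 # awk 'BEGIN {printf "%.5f\n", 3840e9 / 2^40}'                 # 3840 GB expressed in TiB
 3.49246
 # awk 'BEGIN {printf "%.5f\n", (3840e9 - 10 * 2^30) / 2^40}'   # minus the journal
 3.48269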
Note also that one OSD under data002 is marked down. This could be because the process was killed, or because of a hardware failure. The CRUSH weight is unchanged, but the weight adjustment (REWEIGHT) is set to 0, which means that data previously allocated to this drive has been directed elsewhere. When we restart the OSD process successfully, the weight adjustment returns to 1 and data backfills to the drive.
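If the daemon simply died, restarting it is usually enough to get backfill going; a minimal sketch, assuming a systemd-managed deployment and that osd.39 lives on data002 (the commands are illustrative, not a prescription, and an OSD that was marked out automatically is typically marked back in on its own once it rejoins):
 # ssh data002 systemctl restart ceph-osd@39
 # ceph osd in 39                      # only needed if the OSD remains marked out
 # ceph osd tree | grep -w osd.39      # confirm it now shows up with REWEIGHT 1.00000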
Note also that while many Ceph commands will present OSDs and other items sorted, the IDs (or names) of OSDs on a given host or rack are a function of the cluster's history. When deployed sequentially, the numbers increment neatly, but over time, as OSDs and hosts are added and removed, discontinuities accrue. In the above example, note that OSD 33 (also known as osd.33) currently lives on host data007 instead of data002 as one might expect from the present pattern. This reflects the following sequence of events:
  • Drive failed on data002 and was removed
  • Drive failed on data007 and was removed
  • The replacement drive on data007 was deployed as a new OSD
When deploying OSDs, Ceph generally picks the lowest unused number; in our case that was 33. It is futile to try to maintain any given OSD number arrangement; it will change over time as drives and hosts come and go, and as the cluster is expanded.
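To see which OSD numbers are currently in use, and thus where the gaps left by removed OSDs are, ceph osd ls prints the bare list of IDs, one per line; for the example cluster, the first few entries would look like this:
 # ceph osd ls | head -4
 0
 1
 2
 3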
A number of Ceph status commands accept an optional -f json or -f json-pretty switch, which results in output in a form less readable by humans but more readily parsed by code. The default output format may change between releases, but the JSON output formats are mostly constant. For this reason, management and monitoring scripts are encouraged to use the -f json output format to ensure continued proper operation when Ceph itself is upgraded.
 # ceph osd tree -f json
{"nodes":[{"id":-1,"name":"default","type":"root","type_id":10,"children":[-16,-15,-14]},{"id":-14,"name":"r1","type":"rack","type_id":3,"children":[-5,-4,-3,-2]},{"id":-2,"name":"data001","type":"host","type_id":1,"children":[23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0]},{"id":0,"name":"osd.0","type":"osd","type_id":0,"crush_weight":3.481995,"depth":3,"exists":1,"status":"up","reweight":1.000000,"primary_affinity":1.000000},{"id":1,"name":"osd.1","type":"osd","type_id":0,"crush_weight":3.481995,"depth":3,"exists":1,"status":"up","reweight":1.000000,"primary_affinity":1.000000},{"id":2,"name":"osd.2","type":"osd","type_id":0,"crush_weight":3.481995,"depth":3,"exists":1,"status":"up","reweight":1.000000,"primary_affinity":1.000000},{"id":3,"name":"osd.3","type":"osd","type_id":0,"crush_weight":3.481995,"depth":3,"exists":1,"status":"up","reweight":1.000000,"primary_affinity":1.000000},{"id":4,"name":"osd.4", ...
The -f json-pretty output format is something of a compromise: it includes structure to aid programmatic parsing, but also uses whitespace so that humans can readily inspect it.
 # ceph osd tree -f json-pretty
 {
     "nodes": [
         {
             "id": -1,
             "name": "default",
             "type": "root",
             "type_id": 10,
             "children": [
                 -16,
                 -15,
                 -14
             ]
         },
         {
             "id": -14,
             "name": "r1",
             "type": "rack",
             "type_id": 3,
             "children": [
                 -5,
                 -4,
                 -3,
                 -2
             ]
         },
         {
             "id": -2,
             "name": "data001",
             "type": "host",
             "type_id": 1,
             "children": [
                 23,
                 22,
                 21,
 ...
One may, for example, extract a list of OSDs that have a non-default reweight value using the jq utility, as sketched below. This approach saves a lot of tedious and error-prone coding with awk or perl.
 # ceph osd tree ...
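The exact pipeline is elided above; one possible form, assuming the JSON structure shown earlier (nodes carrying type, name, and reweight fields), would be:
 # ceph osd tree -f json | jq -r '.nodes[] | select(.type == "osd" and .reweight < 1) | "\(.name) \(.reweight)"'
 osd.39 0
This lists every OSD whose reweight has been lowered from the default of 1; in the example cluster that is osd.39, which is currently down and out.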

Table of contents

  1. Title Page
  2. Copyright
  3. About Packt
  4. Contributors
  5. Preface
  6. Ceph - Introduction and Beyond
  7. Working with Ceph Block Device
  8. Working with Ceph and OpenStack
  9. Working with Ceph Object Storage
  10. Working with Ceph Object Storage Multi-Site v2
  11. Working with the Ceph Filesystem
  12. Operating and Managing a Ceph Cluster
  13. Ceph under the Hood
  14. The Virtual Storage Manager for Ceph
  15. More on Ceph
  16. Deploying Ceph
  17. BlueStore
  18. Erasure Coding for Better Storage Efficiency
  19. Developing with Librados
  20. Distributed Computation with Ceph RADOS Classes
  21. Tiering with Ceph
  22. Troubleshooting
  23. Disaster Recovery
  24. Operations and Maintenance
  25. Monitoring Ceph
  26. Performance and Stability Tuning
  27. Other Books You May Enjoy
