Mastering Ceph
eBook - ePub

Infrastructure storage solutions with the latest Ceph release, 2nd Edition

  1. 356 pages
  2. English
  3. ePUB (mobile friendly)

About this book

Discover the unified, distributed storage system and improve the performance of applications

Key Features

  • Explore the latest features of Ceph's Mimic release
  • Get to grips with advanced disaster recovery practices for your storage
  • Harness the power of Reliable Autonomic Distributed Object Store (RADOS) to help you optimize storage systems

Book Description

Ceph is an open source distributed storage system that scales to exabyte deployments. This second edition of Mastering Ceph takes you a step closer to becoming an expert on Ceph.

You'll get started by understanding the design goals and planning steps that should be undertaken to ensure successful deployments. In the next sections, you'll be guided through setting up and deploying a Ceph cluster with the help of orchestration tools. This will allow you to witness Ceph's scalability, its erasure coding (data protection) mechanism, and its automated data backup features across multiple servers. You'll then discover more about the key areas of Ceph, including BlueStore, erasure coding, and cache tiering, with the help of examples. Next, you'll learn some of the ways to export Ceph into non-native environments and understand some of the pitfalls that you may encounter. The book features a section on tuning that will take you through the process of optimizing both Ceph and its supporting infrastructure. You'll also learn to develop applications that use Librados and to run distributed computations with shared object classes. Toward the concluding chapters, you'll learn to troubleshoot issues and handle various scenarios where Ceph is not likely to recover on its own.

By the end of this book, you'll be able to master storage management with Ceph and generate solutions for managing your infrastructure.

What you will learn

  • Plan, design and deploy a Ceph cluster
  • Get well-versed with different features and storage methods
  • Carry out regular maintenance and daily operations with ease
  • Tune Ceph for improved ROI and performance
  • Recover Ceph from a range of issues
  • Upgrade clusters to BlueStore

Who this book is for

If you are a storage professional, system administrator, or cloud engineer looking for guidance on building powerful storage solutions for your cloud and on-premises infrastructure, this book is for you.

Information

Year: 2019
Edition: 2
eBook ISBN: 9781789615104

Section 1: Planning And Deployment

In this section, the reader will be taken through the best practices involved when deploying Ceph in a production setting.
The following chapters are in this section:
  • Chapter 1, Planning for Ceph
  • Chapter 2, Deploying Ceph with Containers
  • Chapter 3, BlueStore
  • Chapter 4, Ceph and Non-Native Protocols

Planning for Ceph

The first chapter of this book covers all the areas you need to consider when deploying a Ceph cluster, from the initial planning stages through to hardware choices. The topics we will cover include the following:
  • What Ceph is and how it works
  • Good use cases for Ceph and important considerations
  • Advice and best practices on infrastructure design
  • Ideas about planning a Ceph project

What is Ceph?

Ceph is an open source, distributed, scale-out, software-defined storage (SDS) system that can provide block, object, and file storage. Through the use of the Controlled Replication Under Scalable Hashing (CRUSH) algorithm, Ceph eliminates the need for centralized metadata and can distribute the load across all the nodes in the cluster. Because CRUSH is an algorithm, data placement is calculated rather than looked up in a table, so Ceph can scale to hundreds of petabytes without the bottlenecks and single points of failure that a central lookup would introduce. Clients also connect directly to the object storage daemons (OSDs) that hold their data, so no single node sits in the data path and becomes a bottleneck.
Ceph provides three main types of storage: block storage via the RADOS Block Device (RBD), file storage via CephFS, and object storage via RADOS Gateway, which provides S3 and Swift-compatible storage.
Ceph is a pure SDS solution, which means that you are free to run it on any hardware that matches Ceph's requirements. This is a major development in the storage industry, which has typically suffered from strict vendor lock-in.

It should be noted that, in terms of the CAP theorem, Ceph prefers consistency: in the event of a network partition, it prioritizes protecting your data over keeping it available.
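To make the consistency-over-availability trade-off concrete, the following is a minimal sketch of the strict-majority rule that quorum-based systems rely on. It illustrates the idea only and is not Ceph's implementation; the function name `has_quorum` and the five-member example are assumptions made for this sketch.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Return True when `reachable` members form a strict majority of `total`."""
    if total <= 0 or not 0 <= reachable <= total:
        raise ValueError("need total > 0 and 0 <= reachable <= total")
    return reachable > total // 2


# Hypothetical five-member group split 3/2 by a network partition.
majority_side = has_quorum(3, 5)
minority_side = has_quorum(2, 5)
print(f"majority side may proceed: {majority_side}")
print(f"minority side may proceed: {minority_side}")
```

Only the side that can still see a strict majority keeps making decisions. The minority side stops serving rather than risk accepting writes that would later conflict, which is the sense in which consistency is chosen over availability.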

How Ceph works

The core storage layer in Ceph is the Reliable Autonomic Distributed Object Store (RADOS), which, as the name suggests, provides an object store on which the higher-level storage protocols are built. The RADOS layer in Ceph consists of a number of object storage daemons (OSDs). Each OSD is completely independent and forms peer-to-peer relationships with the other OSDs to form a cluster. Each OSD is typically mapped to a single disk, in contrast to the traditional approach of presenting a number of disks combined into a single device via a RAID controller to the OS.
The other key component in a Ceph cluster is the monitors. These are responsible for forming a cluster quorum via the use of Paxos. The monitors are not directly involved in the data path and do not have the same performance requirements as OSDs. They are mainly used to provide a known cluster state, including membership, via the use of various cluster maps. These cluster maps are used by both Ceph cluster components and clients to describe the cluster topology and enable data to be safely stored in the right location.
There is one final core component, the manager, which is responsible for configuration and statistics.
Because of the scale that Ceph is intended to operate at, tracking the state of every single object in the cluster would become very computationally expensive. Ceph solves this problem by hashing the underlying object names to place objects into a number of placement groups. An algorithm called CRUSH is then used to place the placement groups onto the OSDs. This reduces the task of tracking millions of objects to a matter of tracking a much more manageable number of placement groups, normally measured in thousands.
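The two-step placement described above (object name to placement group, then placement group to OSDs) can be sketched in a few lines of Python. This is a simplified illustration, not Ceph's code: real Ceph uses its own hash function and the CRUSH map with its rules, whereas this sketch substitutes SHA-256 and rendezvous hashing as a stand-in. The names `stable_hash`, `object_to_pg`, and `pg_to_osds`, the pool size of 128, and the 12-OSD cluster are all assumptions for the example.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(object_name: str, pg_count: int) -> int:
    """Step 1: hash the object name into one of `pg_count` placement groups."""
    return stable_hash(object_name) % pg_count


def pg_to_osds(pg_id: int, osd_ids: list[int], replicas: int) -> list[int]:
    """Step 2 (stand-in for CRUSH): rank OSDs by a per-(PG, OSD) hash and
    keep the top `replicas`. This is rendezvous hashing, not real CRUSH."""
    ranked = sorted(osd_ids, key=lambda osd: stable_hash(f"{pg_id}:{osd}"), reverse=True)
    return ranked[:replicas]


PG_COUNT = 128          # hypothetical number of placement groups in a pool
OSDS = list(range(12))  # hypothetical cluster with OSD IDs 0..11

pg = object_to_pg("vm-disk-0001", PG_COUNT)
placement = pg_to_osds(pg, OSDS, replicas=3)
print(f"object -> PG {pg} -> OSDs {placement}")
```

Because both steps are pure calculations, any client holding the same inputs arrives at the same answer without consulting a central table, and only the placement groups, not the individual objects, need to be tracked.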
Librados is a Ceph library that can be used to build applications that interact directly with the RADOS cluster to store and retrieve objects.
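As a rough illustration of what working with Librados looks like, the following sketch uses the Python `rados` bindings that ship with Ceph to write an object and read it back. It assumes a reachable cluster, a valid configuration file and keyring, and an existing pool; the function name `store_and_fetch` and the default configuration path are choices made for this example.

```python
try:
    import rados  # Python bindings for librados, shipped with Ceph
except ImportError:  # lets the file load on machines without Ceph installed
    rados = None


def store_and_fetch(pool: str, name: str, data: bytes,
                    conffile: str = "/etc/ceph/ceph.conf") -> bytes:
    """Write `data` as object `name` in `pool`, then read it back."""
    if rados is None:
        raise RuntimeError("the Ceph 'rados' Python bindings are not installed")
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)  # I/O context bound to one pool
        try:
            ioctx.write_full(name, data)  # replace the whole object
            return ioctx.read(name, length=len(data))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

On a host with access to a cluster, a call such as `store_and_fetch("mypool", "greeting", b"hello")` would be expected to return the bytes that were written, where `mypool` is a placeholder for a pool that already exists.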
For more information on how the internals of Ceph work, it is strongly recommended that you read the official Ceph documentation, as well as the thesis written by Sage Weil, the creator and primary architect of Ceph.

Ceph use cases

Before jumping into specific use cases, let's look at the following key points that should be understood and considered before thinking about deploying a Ceph cluster:
  • Ceph is not a storage array: Ceph should not be compared to a traditional scale-up storage array; it is fundamentally different, and trying to shoehorn Ceph into that role using existing knowledge, infrastructure, and expectations will lead to disappointment. Ceph is software-defined storage with internal data movements that operate over TCP/IP networking, introducing several extra layers of technology and complexity compared to a simple SAS cable at the rear of a traditional storage array. Work is continuing within the Ceph project to expand its reach into areas currently dominated by legacy storage arrays with support for iSCSI and NFS, and with each release, Ceph gets nearer to achieving better interoperability.
  • Performance: Because of Ceph's non-centralized approach, its performance is not capped by a fixed set of controllers, unlike scale-up storage arrays, which typically have to funnel all I/O through a pair of controller heads. Although CPUs and networks keep getting faster, there is still a limit to the performance that you can expect to achieve with just a pair of storage controllers. Recent advances in flash technology, combined with new interfaces such as NVMe, promise a level of performance not seen before, and the scale-out nature of Ceph provides a linear increase in CPU and network resources with every added OSD node. However, we should also consider where Ceph is not a good fit for performance, which is mainly in use cases where extremely low latency is desired. The very design that makes Ceph a scale-out solution also means that low-latency performance will suffer. The overhead of performing a large proportion of the processing in software, plus the additional network hops, means that latency will tend to be about double that of a traditional storage array and at least ten times that of local storage. Thought should be given to selecting the best technology for given performance requirements. That said, a well-designed and tuned Ceph cluster should be able to meet performance requirements in all but the most extreme cases. It is important to remember that with any storage system that employs wide striping, where data is spread across all disks in the system, speed will often be limited by the slowest component in the cluster. It's therefore important that every node in the cluster should be of similar performance. With new developments in NVMe and NVDIMMs, the latency of storage access continues to be forced lower. Work is being done in Ceph to remove bottlenecks and take advantage of these new technologies, but thought should be given to how to balance latency requirements against the benefits of a distributed storage system.
  • Reliability: Ceph is designed to provide a highly fault-tolerant storage system through the scale-out nature of its components. While no individual component is highly available, when clustered together, any component should be able to fail without causing an inability to service client requests. In fact, as your Ceph cluster grows, failure of individual components should be expected and will become part of normal operating conditions. However, Ceph's ability to provide a resilient cluster should not be an invitation to compromise on hardware or design choice, and doing so will likely lead to failure. There are several factors that Ceph assumes your hardware will meet, which are covered later in this chapter. Unlike RAID, where disk rebuilds with larger disks can now stretch into time periods measured in weeks, Ceph will often recover from single disk failures in a matter of hours. With the increasing trend of larger capacity disks, Ceph offers numerous advantages over a traditional storage array in both reliability and performance while degraded.
  • Use of commodity hardware: Ceph is designed to be run on commodity hardware, which gives us the ability to design and build a cluster without the premium cost demanded by traditional tier 1 storage and server vendors. This can be both a blessing and a curse. Being able to choose your own hardware allows you to build your Ceph components to exactly match your requirements. However, one thing that branded hardware does offer is compatibility testing. It's not unknown for strange exotic firmware bugs to be discovered that can cause very confusing symptoms. Thought should be applied to whether your IT teams have the time and skills to cope with any obscure issues that may crop up with untested hardware solutions. The use of commodity hardware also protects against the traditional fork-lift upgrade model, where the upgrade of a single component often requires the complete replacement of the whole storage array. With Ceph, you can replace individual components in a very granular way, and with automatic data balancing, lengthy data migration periods are avoided.
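The recovery-time contrast in the reliability point above can be approximated with back-of-the-envelope arithmetic. The sketch below assumes a deliberately simple model: a RAID rebuild writes to a single replacement disk, while Ceph re-creates the lost copies across many surviving OSDs in parallel. The function name `rebuild_hours` and every figure used (12 TB of data, 50 MB/s per stream, 20 parallel OSDs) are hypothetical values, not measurements.

```python
def rebuild_hours(data_tb: float, rate_mb_s: float, parallel_streams: int = 1) -> float:
    """Hours needed to re-create `data_tb` terabytes at `rate_mb_s` MB/s per stream."""
    if data_tb < 0 or rate_mb_s <= 0 or parallel_streams < 1:
        raise ValueError("data_tb must be >= 0, rate_mb_s > 0, parallel_streams >= 1")
    seconds = (data_tb * 1_000_000) / (rate_mb_s * parallel_streams)
    return seconds / 3600


# Hypothetical figures: 12 TB to restore at a throttled 50 MB/s per stream.
single_target = rebuild_hours(data_tb=12, rate_mb_s=50)                   # one replacement disk
parallel = rebuild_hours(data_tb=12, rate_mb_s=50, parallel_streams=20)  # work shared by 20 OSDs
print(f"single-target rebuild: {single_target:.1f} h")
print(f"parallel recovery:     {parallel:.1f} h")
```

In this model the recovery time falls in proportion to the number of devices sharing the work. Real clusters will deviate from it, since recovery competes with client I/O and is usually throttled.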

Specific use cases

We will now cover some of the more common use cases for Ceph and discuss some of the concepts behind them.

OpenStack or KVM-based virtualization

Ceph is the perfect match for providing storage to an OpenStack environment; in fact, Ceph is currently the most popular choice. The OpenStack survey in 2018 revealed that 61% of surveyed OpenStack users are utilizing Ceph to provide storage in OpenStack. The OpenStack Cinder block driver uses Ceph RBDs to provision block volumes for VMs, and OpenStack Manila, the File as a Service (FaaS) software, integrates well with CephFS. There are a number of reasons why Ceph is such a good solution for OpenStack, as shown in the following list:
  • Both are open source projects with commercial offerings
  • Both have a proven track record in large-scale deployments
  • Ceph can provide block, file (CephFS), and object storage, all of which OpenStack can use
  • With careful planning, it is possible to deploy a hyper-converged cluster
If you are not using OpenStack, or have no plans to, Ceph also integrates very well with KVM virtualization.
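To give a feel for what provisioning a block volume on Ceph involves at the API level, here is a minimal sketch using the Python `rados` and `rbd` bindings that ship with Ceph. It is not how Cinder itself is implemented; it only shows the kind of RBD call that sits underneath such a driver. The function name `create_volume`, the default configuration path, and the pool and image names a caller would pass in are assumptions for this example, which also needs a reachable cluster and an existing pool.

```python
try:
    import rados  # Python bindings for librados
    import rbd    # Python bindings for librbd
except ImportError:  # lets the file load on machines without Ceph installed
    rados = rbd = None


def create_volume(pool: str, image_name: str, size_gib: int,
                  conffile: str = "/etc/ceph/ceph.conf") -> int:
    """Create an RBD image and return its size in bytes as reported by Ceph."""
    if rados is None or rbd is None:
        raise RuntimeError("the Ceph 'rados' and 'rbd' Python bindings are not installed")
    size_bytes = size_gib * 1024 ** 3
    # Rados and Ioctx are context managers: connect/shutdown and close are automatic.
    with rados.Rados(conffile=conffile) as cluster:
        with cluster.open_ioctx(pool) as ioctx:
            rbd.RBD().create(ioctx, image_name, size_bytes)
            with rbd.Image(ioctx, image_name) as image:
                return image.size()
```

The resulting image can then be attached to a KVM guest or consumed by other RBD clients in the usual way.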

Large bulk block storage

Because OSD nodes can be designed and built cost-effectively, Ceph enables you to build large, high-performance storage clusters at a much lower cost than alternative options. The Luminous release brought support for erasure coding for block and file workloads, which has made Ceph even more attractive for this task.
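The appeal of erasure coding for bulk storage comes down to simple arithmetic: an erasure-coded pool with k data chunks and m coding chunks stores k units of user data in k + m units of raw space, while a replicated pool stores one unit in as many units as there are copies. The sketch below works through that calculation; the function names and the 4+2 and 3-copy profiles are example choices, not recommendations.

```python
def usable_fraction_replicated(copies: int) -> float:
    """Fraction of raw capacity available for data with `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return 1 / copies


def usable_fraction_erasure(k: int, m: int) -> float:
    """Fraction of raw capacity available with k data chunks and m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return k / (k + m)


replicated = usable_fraction_replicated(3)  # three full copies
erasure = usable_fraction_erasure(4, 2)     # 4 data + 2 coding chunks
print(f"3x replication: {replicated:.1%} of raw capacity is usable")
print(f"EC 4+2:         {erasure:.1%} of raw capacity is usable")
```

Both example layouts can lose two devices holding parts of the same data without losing the data itself, yet the 4+2 profile leaves twice as much of the raw capacity usable. The trade-off is extra CPU work and latency, which is why the fit is best for large, capacity-oriented workloads.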

Object storage

The very fac...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Section 1: Planning And Deployment
  7. Planning for Ceph
  8. Deploying Ceph with Containers
  9. BlueStore
  10. Ceph and Non-Native Protocols
  11. Section 2: Operating and Tuning
  12. RADOS Pools and Client Access
  13. Developing with Librados
  14. Distributed Computation with Ceph RADOS Classes
  15. Monitoring Ceph
  16. Tuning Ceph
  17. Tiering with Ceph
  18. Section 3: Troubleshooting and Recovery
  19. Troubleshooting
  20. Disaster Recovery
  21. Assessments
  22. Other Books You May Enjoy

Mastering Ceph by Nick Fisk is available in PDF and/or ePUB format and is listed under Computer Science & System Administration.