Mastering Ceph
eBook - ePub

Nick Fisk

  1. 240 pages
  2. English
  3. ePUB

About the book

Deep dive into the unified, distributed storage system in order to provide excellent performance

About This Book

  • Leverage Ceph's advanced features such as erasure coding, tiering, and Bluestore
  • Solve large-scale problems with Ceph as a tool by understanding its strengths and weaknesses to develop the best solutions
  • A practical guide that covers engaging use cases to help you use advanced features of Ceph effectively

Who This Book Is For

If you are a developer or an administrator who has deployed a Ceph cluster before and are curious about some of the most advanced features in order to improve performance, then this book is for you.

What You Will Learn

  • Know when and how to use some of Ceph's advanced new features
  • Set up a test cluster with Ansible and some virtual machines using VirtualBox and Vagrant
  • Develop novel solutions to massive problems with librados and shared object classes
  • Choose intelligent parameters for an erasure coded pool and set it up
  • Configure the Bluestore settings and see how they interact with different hardware configurations
  • Keep Ceph running through thick and thin with tuning, monitoring and disaster recovery advice

In Detail

Mastering Ceph covers all that you need to know to use Ceph effectively. Starting with design goals and planning steps that should be undertaken to ensure successful deployments, you will be guided through setting up and deploying the Ceph cluster with the help of orchestration tools. Key areas of Ceph, including Bluestore, erasure coding and cache tiering, will be covered with the help of examples. Development of applications that use librados, and distributed computations with shared object classes, are also covered. A section on tuning will take you through the process of optimizing both Ceph and its supporting infrastructure. Finally, you will learn to troubleshoot issues and handle various scenarios where Ceph is likely not to recover on its own. By the end of the book, you will be able to successfully deploy and operate a resilient, high-performance Ceph cluster.

Style and approach

A practical guide in which each chapter explains the concept, shares tips and tricks, and presents a use case to implement the most powerful features of Ceph.

Information

Year: 2017
ISBN: 9781785881282
Edition: 1
Topic: Computer Science

Disaster Recovery

In the previous chapter, you learned how to troubleshoot common Ceph problems which, although they may affect the operation of the cluster, are unlikely to cause a total outage or data loss. This chapter covers more serious scenarios where the Ceph cluster is down or unresponsive. It also covers various techniques to recover from data loss. Be aware that these techniques are more than capable of causing severe data loss themselves and should only be attempted as a last resort. If you have a support contract with your Ceph vendor or have a relationship with Red Hat, it is highly advisable to consult them before carrying out any of the recovery techniques listed in this chapter.
In this chapter, you will learn the following:
  • How to avoid data loss
  • How to use RBD mirroring to provide highly available block storage
  • How to investigate asserts
  • How to rebuild monitor databases from OSDs
  • How to extract PGs from a dead OSD
  • How to recover from lost objects or inactive PGs
  • How to rebuild an RBD from dead OSDs

What is a disaster?

To be able to recover from a disaster, you first have to understand and be able to recognize one. For the purpose of this chapter, we will work with the assumption that anything that leads to a sustained period of downtime is classed as a disaster. This does not cover scenarios where a failure happens that Ceph is actively working to recover from, or where the cause is believed likely to be short-lived. The other type of disaster is one that leads to a permanent loss of data unless recovery of the Ceph cluster is possible. Data loss is probably the most serious issue, as the data may be irreplaceable or its loss may cause serious harm to the future of the business.

Avoiding data loss

Before starting to cover some recovery techniques, it is important to cover some points discussed in Chapter 1, Planning for Ceph. Disaster recovery should be seen as a last resort; the recovery guides in this chapter should not be relied upon as a replacement for following best practices.
Firstly, make sure you have working and tested backups of your data; in the event of an outage, you will feel a million times more relaxed if you know that, in the worst case, you can fall back to backups. While an outage may cause discomfort for your users or customers, informing them that the data they had entrusted to you is now gone is far worse. Also, just because you have a backup system in place, do not blindly put your trust in it. Regular test restores will mean that you can rely on your backups when needed.
Make sure you also follow the design principles mentioned in Chapter 1, Planning for Ceph. Don't use configuration options such as nobarrier, and carefully consider the replication level you use within Ceph to protect your data. The chances of data loss are strongly linked to the redundancy level configured in Ceph, so careful planning is advised here.

What can cause an outage or data loss?

The majority of outages and cases of data loss are directly caused by the loss, in a short period of time, of a number of OSDs that exceeds the replication level. If these OSDs do not come back online, be it due to a software or hardware failure, and Ceph was not able to recover objects in between the OSD failures, then these objects are lost.
If an OSD has failed due to a failed disk, then it is unlikely that recovery will be possible unless costly disk recovery services are utilized, and there is no guarantee that any recovered data will be in a consistent state. This chapter will not cover recovering from physical disk failures and will simply suggest that the default replication level of 3 should be used to protect you against multiple disk failures.
If an OSD has failed due to a software bug, the outcome is possibly a lot more positive, but the process is complex and time-consuming. An OSD that is unable to start even though its physical disk is in good condition is normally the result of either a software bug or some form of corruption. A software bug may be triggered by an uncaught exception that leaves the OSD in a state it cannot recover from. Corruption may occur after an unexpected loss of power where the hardware or software was not correctly configured to maintain data consistency. In both cases, the outlook for the OSD itself is probably terminal, and if the cluster has managed to recover from the lost OSDs, it's best just to erase the OSD and reintroduce it as an empty disk.
If the number of offline OSDs has meant that all copies of an object are offline, then recovery procedures should be attempted to try and extract the objects from the failed OSDs, and insert them back into the cluster.
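As a rough illustration of the reasoning above, the following Python sketch sorts placement groups by how many of their replicas sit on failed OSDs. The PG IDs, OSD numbers, and the `classify_pgs` helper are all hypothetical and only model the idea; this is not a Ceph API, and a real cluster would report this state through its own tooling.

```python
def classify_pgs(pg_to_osds, failed_osds):
    """Group PGs by whether all, some, or none of their replicas are offline."""
    failed = set(failed_osds)
    result = {"healthy": [], "degraded": [], "all_copies_offline": []}
    for pg, osds in sorted(pg_to_osds.items()):
        surviving = [osd for osd in osds if osd not in failed]
        if len(surviving) == len(osds):
            result["healthy"].append(pg)
        elif surviving:
            # At least one replica is left, so Ceph can recover on its own.
            result["degraded"].append(pg)
        else:
            # Every replica is offline: manual recovery would be needed.
            result["all_copies_offline"].append(pg)
    return result


# Hypothetical mapping of PG ID to the OSDs holding its three replicas.
pg_map = {"1.0": [0, 1, 2], "1.1": [1, 2, 3], "1.2": [3, 4, 5]}

# Three OSDs fail before Ceph can re-replicate anything.
status = classify_pgs(pg_map, failed_osds=[1, 2, 3])
print(status)
```

In this made-up example, only the PG whose three replicas all lived on the failed OSDs ends up in `all_copies_offline`; the others still have a surviving copy, which is why losing more OSDs than the replication level is the critical threshold.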

RBD mirroring

As mentioned previously, working backups are a key strategy in ensuring that a failure does not result in the loss of data. Starting with the Jewel release, Ceph introduced RBD mirroring, which allows you to asynchronously mirror an RBD from one cluster to another. Note the difference between Ceph's native replication, which is synchronous, and RBD mirroring. With synchronous replication, low latency between peers is essential, whereas asynchronous replication allows the two Ceph clusters to be geographically remote, as latency is no longer a factor.
By having a replicated copy of your RBD images on a separate cluster, you can dramatically reduce both your Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The RTO is a measure of how long it takes from initiating recovery to when the data is usable. The RPO is the worst-case measurement of time between each data point and describes the expected data loss. For example, a daily backup would have an RPO of 24 hours: any data written in the up to 24 hours since the last backup would potentially be lost if you had to restore from that backup.
With RBD mirroring, data is asynchronously replicated to the target RBD, and so, in most cases, the RPO should be under a minute. As the target RBD is a replica and not a backup that would first need to be restored, the RTO is also likely to be extremely low. Additionally, as the target RBD is stored on a separate Ceph cluster, it offers additional protection over snapshots, which could also be impacted if the Ceph cluster itself experiences issues. At first glance, this makes RBD mirroring seem like the perfect tool t...
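To make the RPO concrete, here is a small Python sketch that converts an RPO into a worst-case amount of unrecoverable data. The write rate of 5 MB/s and the 60-second replication lag are assumed figures chosen purely for illustration, and `data_at_risk_mb` is a hypothetical helper, not part of any Ceph tooling.

```python
from datetime import timedelta


def data_at_risk_mb(rpo: timedelta, write_rate_mb_per_s: float) -> float:
    """Worst-case data loss: everything written during one full RPO window."""
    return rpo.total_seconds() * write_rate_mb_per_s


# Assumed sustained write rate for the workload (hypothetical figure).
WRITE_RATE_MB_PER_S = 5.0

# A daily backup has an RPO of 24 hours.
daily_backup_loss = data_at_risk_mb(timedelta(hours=24), WRITE_RATE_MB_PER_S)

# An asynchronous replica with an assumed lag of 60 seconds.
async_replica_loss = data_at_risk_mb(timedelta(seconds=60), WRITE_RATE_MB_PER_S)

print(f"Daily backup:  up to {daily_backup_loss:,.0f} MB at risk")
print(f"Async replica: up to {async_replica_loss:,.0f} MB at risk")
```

The comparison is deliberately simplistic, since real write rates vary over the day, but it shows why shrinking the interval between data points has such a large effect on potential data loss.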