Pulpos - A Ceph Storage for SciDMZ

February 9, 2017 | Ceph Storage

My first ITS project was to deploy a pilot Ceph Storage for Data Transfer and Data Warehousing in SciDMZ.

Etymology

The name Ceph is a common nickname given to pet octopuses and derives from cephalopods. It is only befitting that we christen our Ceph storage Pulpos, which is the Spanish word for octopuses, as UCSC is a Hispanic Serving Institution member of the Hispanic Association of Colleges and Universities. In recent years, Hispanic/Latino enrollment at UCSC has been more than 30 percent.

Objectives

The objectives of this projects were:

  • To deploy a Ceph storage with an initial raw capacity of at least 200TB (resulting in a usable capacity of about 100TB if we use a replication size of 2), including the following components:
    • One admin node
    • One monitor node
    • One metadata server
    • One data-transfer node
    • Three OSD (Object Storage Daemon) servers
    • One GE switch for internal management traffic
    • One 10GE switch for OSD replication and heartbeat traffic
  • To create a replicated pool with size 2 (that is, 2 copies of each object will be stored), in order to ensure resilience and high performance of the system
  • To design a scalable storage platform, amenable to future expansion
  • To rapidly transfer large amount of data between UCSC and other research institutions
  • To set up a distributed Ceph testbed among UCSC, UCSD, Caltech and potentially other PRP members
  • To tune up the storage and network stacks in order to utilize the system at maximum efficiency
  • To measure and benchmark the performance of the system
  • To explore and evaluate the usage of Ceph as a scratch file system for HPC and a possible replacement for parallel file systems like Lustre and GPFS
  • To explore and evaluate the usage of Ceph as a storage platform for SCIPP ATLAS cluster
  • To explore and evaluate the usage of Ceph as a data warehousing platform for Data Science / Machine Learning applications
  • To compile detailed documentations, and to share our experiences and lessons with the community

Hardware Specifications

A Supermicro 6028TP-HTR 2U-Quad-Nodes chassis contains 4 nodes: pulpo-admin, pulpo-dtn, pulpo-mon01 & pulpo-mds01.

pulpo-admin

pulpo-dtn

pulpo-mon01

pulpo-mds01

And there are 3 OSD servers: pulpo-osd01, pulpo-osd02 & pulpo-osd03, each with: