Pulpos - A Ceph Storage Cluster for SciDMZ
February 9, 2017 | Ceph Storage
My first ITS project was to deploy a pilot Ceph storage cluster for data transfer and data warehousing in SciDMZ.
Etymology
The name Ceph is a common nickname for pet octopuses, derived from cephalopod. It is only fitting, then, that we christen our Ceph storage Pulpos, the Spanish word for octopuses: UCSC is a Hispanic-Serving Institution and a member of the Hispanic Association of Colleges and Universities, and in recent years Hispanic/Latino enrollment at UCSC has exceeded 30 percent.
Objectives
The objectives of this project were:
- To deploy a Ceph storage cluster with an initial raw capacity of at least 200TB (yielding a usable capacity of about 100TB with a replication size of 2), comprising the following components:
- One admin node
- One monitor node
- One metadata server
- One data-transfer node
- Three OSD (Object Storage Daemon) servers
- One GE switch for internal management traffic
- One 10GE switch for OSD replication and heartbeat traffic
- To create a replicated pool with size 2 (that is, 2 copies of each object are stored), in order to ensure resilience and high performance (see the sketch after this list)
- To design a scalable storage platform, amenable to future expansion
- To rapidly transfer large amounts of data between UCSC and other research institutions
- To set up a distributed Ceph testbed among UCSC, UCSD, Caltech and potentially other PRP members
- To tune the storage and network stacks so that the system runs at maximum efficiency
- To measure and benchmark the performance of the system
- To explore and evaluate the usage of Ceph as a scratch file system for HPC and a possible replacement for parallel file systems like Lustre and GPFS
- To explore and evaluate the usage of Ceph as a storage platform for SCIPP ATLAS cluster
- To explore and evaluate the usage of Ceph as a data warehousing platform for Data Science / Machine Learning applications
- To compile detailed documentation, and to share our experience and lessons learned with the community
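To make the first two objectives concrete, here is a minimal sketch of the kind of smoke test one can run once the cluster is up, using the python-rados bindings. Everything specific in it is an assumption for illustration: the pool name pulpos-test, the 64 objects of 4 MiB each, and the standard config path are hypothetical choices, not our production tooling. The replication size itself is set with the standard CLI command `ceph osd pool set pulpos-test size 2`, and serious measurements would use `rados bench` rather than this loop.

```python
# Minimal smoke-test sketch using python-rados (illustrative, not our
# production tooling). Assumes a deployed cluster with a readable
# /etc/ceph/ceph.conf and an admin keyring on this host.
import time
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Hypothetical pool name; replication size is set afterwards from the CLI:
#   ceph osd pool set pulpos-test size 2
if not cluster.pool_exists('pulpos-test'):
    cluster.create_pool('pulpos-test')

ioctx = cluster.open_ioctx('pulpos-test')

# Crude write-throughput check: write 64 objects of 4 MiB each.
payload = b'\0' * (4 * 1024 * 1024)
start = time.time()
for i in range(64):
    ioctx.write_full('bench-%03d' % i, payload)
elapsed = time.time() - start
print('wrote 256 MiB in %.1fs (%.1f MiB/s)' % (elapsed, 256 / elapsed))

ioctx.close()
cluster.shutdown()
```

Note that `write_full` replaces the whole object, so repeated runs are idempotent; for sustained throughput numbers we would rely on `rados bench` and proper stack tuning rather than a loop like this.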
Hardware Specifications
A Supermicro 6028TP-HTR 2U quad-node chassis houses 4 nodes: pulpo-admin, pulpo-dtn, pulpo-mon01 & pulpo-mds01.
pulpo-admin
- Two 8-core Intel Xeon E5-2620 v4 processors @ 2.1 GHz
- 32GB DDR4 memory @ 2400 MHz
- 480GB Intel DC S3500 Series SSD
- An Intel X520-DA2 10GbE adapter, with 2 SFP+ ports
pulpo-dtn
- Two 8-core Intel Xeon E5-2620 v4 processors @ 2.1 GHz
- 32GB DDR4 memory @ 2400 MHz
- 480GB Intel DC S3500 Series SSD
- An Intel X520-DA2 10GbE adapter, with 2 SFP+ ports
pulpo-mon01
- Two 10-core Intel Xeon E5-2630 v4 processors @ 2.2 GHz
- 64GB DDR4 memory @ 2400 MHz
- Two 480GB Intel DC S3500 Series SSDs
- An Intel X520-DA2 10GbE adapter, with 2 SFP+ ports
pulpo-mds01
- Two 10-core Intel Xeon E5-2630 v4 processors @ 2.2 GHz
- 64GB DDR4 memory @ 2400 MHz
- Two 480GB Intel DC S3500 Series SSDs
- An Intel X520-DA2 10GbE adapter, with 2 SFP+ ports
There are also 3 OSD servers: pulpo-osd01, pulpo-osd02 & pulpo-osd03, each with:
- Two 10-core Intel Xeon E5-2630 v4 processors @ 2.2 GHz
- 64GB DDR4 memory @ 2400 MHz
- Two 1TB Micron 1100 3D NAND Client SATA SSDs
- Two 1.2TB Intel DC P3520 Series PCIe SSDs
- Twelve 8TB Seagate Enterprise SATA Hard Drives (model #: ST8000NM0055)
- An Intel X520-DA2 10GbE adapter, with 2 SFP+ ports
- A single-port Mellanox ConnectX-3 Pro 10/40/56GbE Adapter
- An LSI SAS 9300-8i 12Gb/s Host Bus Adapter
- A Supermicro 2U chassis, with 12x 3.5” drive bays
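Across the three OSD servers, the 36 data drives (3 × 12 × 8TB) provide 288TB of raw capacity, comfortably above the 200TB target; with a replication size of 2, that translates to roughly 144TB usable.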