MySQL Cluster Deployment Best Practices

This document provides best practices for deploying MySQL Cluster. It discusses suitable applications, hardware and network requirements, configuration options, and administration practices. Key points include selecting dedicated hardware, using multiple network interfaces and switches for redundancy, separating real-time and reporting workloads, and choosing disk subsystems and storage layouts suited to the workload's read/write patterns and performance needs.

Uploaded by sushil pun


MySQL Cluster Deployment Best Practices


Agenda
•  Suitable Applications
•  MySQL Cluster compared to InnoDB – main differences
•  Network & Hardware Selection
•  Disk Data Deployment
•  Configuration
•  Administration & Implementation Best Practices
•  Online/Offline Operations
•  Backup and restore
•  Monitoring
•  Services available to get started
MySQL Cluster – Users & Applications
HA, Transactional Services: Web & Telecoms

•  User & Subscriber Databases


•  Service Delivery Platforms
•  Application Servers
•  Web Session Stores
•  eCommerce
•  VoIP, IPTV & VoD
•  Mobile Content Delivery
•  On-Line app stores and portals
•  On-Line Gaming
•  DNS/DHCP for Broadband
•  Payment Gateways
•  Data Store for LDAP Directories

http://www.mysql.com/customers/cluster/
Suitable Applications
•  Good fit
•  OLTP apps with short-running queries
•  Applications with realtime characteristics and requirements
•  Many concurrent requests
•  Write-intensive applications
•  Typically the following are a poor fit:
•  Heavy reporting (OLAP)
•  Data Warehouse

•  However, you can replicate from MySQL Cluster to a
regular MySQL server (InnoDB) which runs the
reporting.
Realtime and Reporting Architecture
Don't mix real-time operations with Reporting - separate!

•  Realtime Apps talk to the App Servers, which use the SQL Layer and Storage Layer
•  The Reporting System runs the complex reporting queries on a separate server
•  Feed the Reporting System via:
•  Replication
•  mysqldump
•  ndb_restore → csv → LOAD DATA INFILE
Data Collection/Aggregation Architecture
Aggregate data from peripheral systems (sources)
HA Shard Catalog
•  Shard Catalog stores user_id → shard_id and other indexes/mappings (user_id → friend_id:shard_id).
•  Shard Catalog can grow online

•  The Shard Catalog and each shard (Shard_0 … Shard_n) run on MySQL Cluster
•  App Servers go through a Memcached / caching layer in front of the
SQL Layer and Storage Layer of each shard
MySQL Cluster compared to InnoDB /
Other databases
•  Every database has its characteristics
•  MySQL Cluster is designed for
•  Short, but many, parallel transactions
•  High volume
•  High degree of concurrency
•  High availability (99.999%)
•  Let's look at how MySQL Cluster compares to InnoDB (and most other
traditional databases)

Refer to Docs comparisons:


http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-compared.html
MySQL Cluster compared to InnoDB -
Table Locks
•  Table locks are usually taken before an offline operation (e.g.
ALTER to change a data type). During normal traffic a smaller
granularity, such as ROW-level locking, is preferred.
•  InnoDB
•  LOCK TABLES tablename READ will lock the table for
writes on the mysql server.
•  MySQL Cluster
•  LOCK TABLES tablename READ will lock the table for
writes only on the MySQL server where the command is
issued!
•  To lock 'tablename' on the entire cluster you must do LOCK
TABLES tablename READ on every mysql server.
•  Or if you use the Configurator scripts:
•  cd tools
•  ./execute-all-mysql.sh -e “LOCK TABLES
tablename READ”
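If you are not using the Configurator scripts, a plain shell loop over the SQL nodes can issue the same statement. A sketch (the host names are hypothetical, and this dry-run version only prints the commands); note that LOCK TABLES is session-scoped, so when you run this for real each session must stay open for the duration of the maintenance work:

```shell
#!/bin/sh
# Hypothetical list of SQL nodes in the cluster.
SQL_NODES="mysql-a mysql-b mysql-c"

for host in $SQL_NODES; do
    # Print the command that would take the read lock on each server.
    # Remove the dry-run printf to actually run it (and keep the
    # session open, e.g. via SELECT SLEEP(...), while you work).
    printf 'mysql -h %s -e "LOCK TABLES tablename READ"\n' "$host"
done
```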
Cluster compared to InnoDB - ALTER
•  InnoDB
•  Blocking alter tables. Altered table is locked.
•  Cluster
•  Online (non-blocking) – add column online (ALTER ONLINE
TABLE … ADD COLUMN x BIGINT), add index online, drop
index online.
•  Other ALTERs (changing column size, data type, column name etc)
are not online
•  Non-online ALTER TABLE is not blocking!
•  You can do the ALTER TABLE on one MySQL server and still write to
the table on another MySQL server → inconsistent data.
•  Non-blocking – there is no table lock distributed across all
MySQL servers.
•  Use LOCK TABLES manually on all MySQL servers first, then
ALTER, then UNLOCK TABLES on all MySQL servers
MySQL Cluster compared to InnoDB -
FOREIGN KEYS
•  Considerations for Foreign Keys
•  FKs simplify business logic, but FKs incur a performance overhead
•  What is the role of your data? What is the role of the application?
•  InnoDB
•  Is the only storage engine currently supporting Foreign Keys
•  MySQL Cluster
•  Workaround is to use TRIGGERs to emulate Foreign Keys

For more info


http://forge.mysql.com/wiki/
ForeignKeySupport#Appendix_A:_Triggers_implementing_foreign_key_constraints
MySQL Cluster compared to InnoDB -
Transactions
•  Failed transactions must be retried by the application
•  Also true for InnoDB (and most other databases on the market)
•  If the REDOLOG or REDOBUFFER become full, the transaction
will be aborted
•  This differs from InnoDB behaviour: InnoDB will run slower (and
potentially grind to a virtual halt)
•  There are also other resources / timeouts
•  "Lock wait timeout" – transaction will abort after
TransactionDeadlockDetectionTimeout
•  MaxNoOfConcurrent[Operations/Transactions]
•  Nodefail / noderestart will cause transaction to abort
Example Setup
•  Clients → Load Balancer(s) → Redundant switches
•  Two hosts each run SQL + Mgm + AppServer + WebServer...
•  NICs are bonded towards the data node network
•  Two hosts run the Data nodes

Recommendation
•  Start with four computers ..
•  2 x Data Nodes (NDBMTD)
•  2 x MySQL servers (MYSQLD)
•  2 x Management servers (NDB_MGMD)
•  … and scale it from there.
Hardware Selection : Network I
•  Dedicated >= 1Gb/s networking
•  On Oracle Sun CMT it may be necessary to bond 4 or more NICs
together because typically many data nodes are on the same
physical host.
•  Prevent network failures (NIC x 2, Bonding, dual switches)
•  Use a dedicated network for cluster communication
•  Put Data nodes and MySQL Servers on e.g. a 10.0.1.0 network and
let MySQL listen on a “public” interface.
•  There is no security layer to the management node
•  Enable port 1186 access only from cluster nodes and
administrators
Hardware Selection : Network II
•  The speed of the network greatly affects the performance
•  ping <hostname>
•  If the ping time is > 0.200ms (on 1Gig-E), check:
•  routes – do you have >1 switch hop from one data
node to another?
•  Do you have full duplex?
•  NAPI enabled (should be)?
•  On my machines I have 0.150ms (on 1Gig-E), but with good
switches 0.080-0.100ms is also possible
•  JUMBO frames – you can try enabling these, but I have not
seen any noticeable improvement from them.
Hardware Selection - RAM & CPU
•  Storage Layer (Data nodes)
•  One data node can (7.0+) use 8 cores
•  CPU: 2 x 4 core (Nehalem works really well). Faster CPU → faster
processing of messages.
•  RAM: As much as you need
•  a 10GB data set will require 20GB of RAM (because of
redundancy)
•  Each node will then need 2 x 10GB / # of data nodes (2 data nodes
→ 10GB of RAM each → a 16GB RAM machine is good)
•  SQL Layer (MySQL Servers)
•  CPU: 2 – 16 cores
•  RAM: Not as important – 4GB enough (depends on connections and
buffers)
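The slide's rule of thumb can be written down as a tiny helper. This is only a sketch of the arithmetic above (data is stored twice for redundancy, then spread across the data nodes), not an official sizing tool:

```python
def data_node_ram_gb(dataset_gb, num_data_nodes, replicas=2):
    """Rule-of-thumb DataMemory per data node: the data set is stored
    'replicas' times (redundancy) and split across the data nodes."""
    return dataset_gb * replicas / num_data_nodes

# A 10GB data set on 2 data nodes needs ~10GB of DataMemory per node,
# so a 16GB machine leaves headroom for buffers and the OS.
print(data_node_ram_gb(10, 2))  # → 10.0
```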
Hardware Selection - Disk Subsystem

•  low-end: 1 x SATA 7200RPM – LCP and REDOLOG share the disk
•  For a read-mostly workload that does not write so much
•  No redundancy (but the other data node is the mirror)
•  mid-end: 1 x SAS 10KRPM – LCP and REDOLOG share the disk
•  Heavy duty (many MB/s)
•  No redundancy (but the other data node is the mirror)
•  high-end: 4 x SAS 10KRPM – LCP and REDOLOG on separate spindles
•  Heavy duty (many MB/s)
•  Disk redundancy (RAID1+0), hot swap

•  REDO, LCP, BACKUP – written sequentially in small chunks (256KB)
•  If possible, use ODirect=1
Hardware Selection - Disk Data Storage

•  Minimal recommended: 2 x SAS 10KRPM (preferably)
•  Disk 1: LCP, REDOLOG, UNDOLOG
•  Disk 2: TABLESPACE
•  high-end: 4 x SAS 10-15KRPM (preferably)
•  Disk 1: REDO LOG / UNDO LOG
•  Disk 2: LCP
•  Disks 3 and 4: TABLESPACE 1 and TABLESPACE 2
•  Use high-end for heavy read / write workloads (1000's of 10KB records per sec)
(e.g. Content Delivery platforms)
•  SSD for TABLESPACE is also interesting – not much experience of this yet
•  Having TABLESPACE on a separate disk is good for read performance
•  Enable WRITE_CACHE on devices
Disk Space Usage

•  The data nodes use the disk for:


•  LCP: 3 x sizeof(used DataMemory)
•  REDO: [4-6] x DataMemory
•  More (6x) REDO log for write-intensive workloads
•  Don't make the REDO log too small (e.g. 2x or 3x)
•  Backups: sizeof(used DataMemory)
•  TableSpace (if disk data tables): Must fit the dataset.
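The multipliers above combine into a quick per-node disk estimate. A sketch restating those rules of thumb (treat the result as a floor, and add TableSpace separately for disk data tables):

```python
def data_node_disk_gb(used_datamemory_gb, write_intensive=False,
                      with_backup=True):
    """Disk space per data node from used DataMemory:
    LCP 3x, REDO 4x (6x if write-intensive), backup 1x."""
    lcp = 3 * used_datamemory_gb
    redo = (6 if write_intensive else 4) * used_datamemory_gb
    backup = used_datamemory_gb if with_backup else 0
    return lcp + redo + backup

print(data_node_disk_gb(10))                        # → 80
print(data_node_disk_gb(10, write_intensive=True))  # → 100
```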
Choosing the Filesystem

•  Most customers use EXT3 (Linux) and UFS (Solaris)
•  EXT2 is an option (but recovery takes longer)
•  Mount with noatime
•  ZFS
•  You must separate the journal (ZIL) and the filesystem
•  Raw devices are not supported
•  EXT4, XFS – we don't have much experience with these yet…
Configuration : Disk Data Storage

•  Use Disk Data tables for


•  Simple accesses (read/write on PK)
•  As with InnoDB – you can easily get IO-bound (check iostat)
•  Set
•  DiskPageBufferMemory=3072M
•  A good start if you rely a lot on disk data – it works like the
InnoDB buffer pool, so set it as high as you can!
•  Increases the chance that a page will be cached
•  SharedGlobalMemory=384M-1024M
•  UNDO_BUFFER=64M to 128M (if you write a lot)
•  You cannot change this buffer later!
•  It is specified at LOGFILE GROUP creation time
•  DiskIOThreadPool=[8 .. 16] (introduced in 7.0)
Configuration : General
•  Set
•  MaxNoOfExecutionThreads<=#cores
•  Otherwise contention will occur → unexpected behaviour.
•  RedoBuffer=32-64M
•  If you need to set it higher → your disks are probably too
slow
•  FragmentLogFileSize=256M
•  NoOfFragmentLogFiles = 6 x DataMemory (in MB) /
(4 x 256MB)
•  Most common issue – customers never configure a large
enough redo log
•  The above parameters (and others, also for MySQL)
are set for production usage at:
•  www.severalnines.com/config
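The redo-log formula above is easy to get wrong by hand, so here it is as code. A sketch (the factor 4 is there because the redo log consists of 4 file sets of NoOfFragmentLogFiles files each):

```python
def no_of_fragment_log_files(datamemory_mb, fragment_log_file_size_mb=256,
                             redo_factor=6):
    """NoOfFragmentLogFiles = redo_factor x DataMemory /
    (4 x FragmentLogFileSize); the redo log is made of 4 file sets,
    hence the 4 in the denominator. redo_factor=6 suits write-heavy loads."""
    return redo_factor * datamemory_mb // (4 * fragment_log_file_size_mb)

# 10GB (10240MB) of DataMemory with the default 256MB files:
print(no_of_fragment_log_files(10240))  # → 60
```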
Administration
•  Data nodes – designed for zero maintenance.
•  Logs
•  Writes error logs and trace files in its data directory.
•  Configurable how many error messages/trace files that should be saved
•  Memory Fragmentation
•  Free pages are reclaimed and can be reused
•  If you do a lot of insert/delete on VAR* attributes (of different sizes) you
can get fragmentation
•  OPTIMIZE TABLE / Rolling restart of data nodes can help reduce
fragmentation
•  See http://johanandersson.blogspot.com/2009/03/memory-
deallocationdefragmentation-and.html
•  Management servers
•  Writes cluster log (rotating, size configurable) in its data directory
•  Cluster logs can be sent to Syslog if desired
Administration
•  MySQL Servers
•  Binary logs - (if enabled) must be removed manually (can be
done with --expire_logs_days, but are you sure all have
been applied on the slave?)
•  General log / error log / slow log - does not rotate
automatically. A script called mysql_log_rotate can help.
•  Or move/cp log manually (or scripted) and do FLUSH LOGS
•  For MySQL Cluster it is also good to have a dedicated
MySQL Server for administration purposes.
•  Perform offline ALTER TABLE (like change data type etc)
Administration Layer
•  Introduce a MySQL Server for administration purposes!
•  Should never get application requests
•  Simplifies heavy (non online) schema changes

Application layer

SQL layer

Storage layer
Admin layer – connected via synchronous replication (NDB) to the
Storage layer, like any other SQL node

# give the admin mysqld an explicit nodeid in config.ini:
[mysqld]
id=8
hostname=X

# in the admin mysqld's my.cnf:
ndb_connectstring="nodeid=8;x,y"
ndb_cluster_connection_pool=1
Administration Layer
•  Modifying Schema is NOT online when you perform
the following:
•  Rename a table
•  Change data type
•  Change storage size
•  Drop column
•  Rename column
•  Add/Drop a PRIMARY KEY
•  Altering a 1GB table requires 1GB of free
DataMemory (copying)
•  Online (and ok to do with transactions ongoing):
•  Add column (ALTER ONLINE …)
•  CREATE INDEX
•  Online add node
Administration Layer
•  ALTER TABLE etc (non-online DDL) is performed on the Admin
Layer!
•  1. Block traffic from the SQL layer to the data nodes
•  ndb_mgm> ENTER SINGLE USER MODE 8
•  Only the Admin mysqld (nodeid 8, as set in config.ini) is now
connected to the data nodes
•  Or do LOCK TABLES on the whole SQL Layer!
•  2. Perform the heavy ALTER on the admin layer
•  3. Allow traffic from the SQL layer to the data nodes again
•  ndb_mgm> EXIT SINGLE USER MODE
•  Or do UNLOCK TABLES on the whole SQL Layer!
Administration Layer
•  You can also set up MySQL Replication from the Admin layer to the
SQL layer
•  Replicate the mysql database
•  GRANTs, SPROCs etc will be replicated.
•  Keeps the SQL Layer aligned
•  On the Admin layer mysqld: binlog_do_db=mysql
Online Upgrades
•  Change Online
•  OS, SW version (7.0.x → 7.2.x)
•  Configuration (e.g. increase DM, IM, Buffers, redo log, [mysqld]
slots etc)
•  Hardware (e.g. add more RAM)
•  These procedures require a Rolling Restart
•  Change config.ini, copy it over to all ndb_mgmd
•  Stop ndb_mgmd, start ndb_mgmd with --reload
•  Restart one data node at a time
•  Restart one mysqld at a time
•  Adding data nodes (7.0 and above)
•  Adding MySQL Servers
•  Make sure you have free [mysqld] slots
•  Start the new mysqld
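The rolling-restart steps can be scripted. The node IDs and hostnames below are hypothetical, and this version only prints the commands so the ordering is easy to review before running anything for real:

```shell
#!/bin/sh
# Hypothetical IDs: data nodes 3 and 4, management server on host mgm-a.
DATA_NODES="3 4"

# 1. config.ini has already been copied to all ndb_mgmd hosts;
#    restart the management server so it picks up the new config.
echo "ndb_mgmd --reload -f /etc/mysql-cluster/config.ini"

# 2. Restart one data node at a time, waiting for 'Started' in between.
for id in $DATA_NODES; do
    echo "ndb_mgm -c mgm-a -e '$id RESTART'"
    echo "# wait until node $id reports Started before continuing"
done

# 3. Finally restart each mysqld, one at a time.
echo "# restart mysqld on each SQL node in turn"
```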
Scaling
•  One data node can (7.0+) use up to 8 cores
•  CPU: Reaches bottleneck at about 370% CPU
•  add another node group (to spread load)
•  DISK: iostat -kx 1 : check %util, await, svctm etc.
•  Add disks
•  NETWORK: iftop (linux)
•  add another node group (to spread load)
•  MySQL Server
•  CPU: About the same – 300-500%
•  Add another MySQL Server to offload query processing
•  DISK: Should not be a factor if you are using only NDB tables
•  NETWORK:
•  Add another MySQL Server to offload query processing
Monitoring
•  Mandatory to monitor
•  CPU/Network/Memory usage
•  Disk capacity (I/O) usage
•  Network latency between nodes
•  Node status ...
•  Used Index/Data Memory
•  www.severalnines.com/cmon - monitors data nodes and mysql
servers
•  New in MySQL Cluster 7.1:
•  The ndbinfo database (NDB$INFO)
•  Check node status
•  Check buffer status etc
•  Statistics
Best Practice : Primary Keys
•  To avoid problems with
•  Cluster-to-Cluster replication
•  Recovery
•  Application behavior (KEY NOT FOUND.. etc)
•  ALWAYS DEFINE A PRIMARY KEY ON THE TABLE!
•  A hidden PRIMARY KEY is added if no PK is specified. BUT..
•  .. this is NOT recommended
•  The hidden primary key is, for example, not replicated (between
Clusters)!!
•  There are problems in this area, so avoid them!
•  So always have, at least,
id BIGINT AUTO_INCREMENT PRIMARY KEY
•  Even if you don't “need” it for your application
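A minimal table definition following this practice might look like the sketch below; the table and column names are just illustrative:

```sql
-- Explicit PK even though the application looks rows up by session_key.
CREATE TABLE user_session (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    session_key VARCHAR(64) NOT NULL,
    payload VARBINARY(2048),
    UNIQUE KEY (session_key)
) ENGINE=ndbcluster;
```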
Best Practice : Query Cache
•  Don't enable the Query Cache!
•  It is very expensive to invalidate over X mysql servers
•  A write on one server will force the others to purge their
cache.
•  If you have tables that are read only (or change very
seldom):
•  my.cnf:
•  query_cache_type=2 (ON DEMAND)
•  SELECT SQL_CACHE <cols> .. FROM table;
•  Cache only queries with SQL_CACHE
•  This can be good for STATIC data
Best Practice : Large Transactions
•  Remember MySQL Cluster is designed for many and
short transactions
•  You are recommended to UPDATE / DELETE in small chunks
•  Use LIMIT 10000 until all records are UPDATED/DELETED
•  MaxNoOfConcurrentOperations sets the upper
limit on how many records can be modified
simultaneously on one data node.
•  MaxNoOfConcurrentOperations=1000000 will use 1GB
of RAM
•  Despite being possible, we recommend DELETE/UPDATE in
smaller chunks.
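The chunked-delete pattern can be sketched as a driver loop. This assumes a DB-API style cursor (the connection handling is hypothetical); each iteration is its own small transaction, so no single transaction approaches MaxNoOfConcurrentOperations:

```python
def delete_in_chunks(cursor, table, where_clause, chunk_size=10000):
    """Delete matching rows in LIMIT-sized transactions until none remain.
    Returns the total number of rows deleted. 'table' and 'where_clause'
    must come from trusted code, not user input."""
    total = 0
    while True:
        cursor.execute(
            f"DELETE FROM {table} WHERE {where_clause} LIMIT {chunk_size}"
        )
        total += cursor.rowcount
        cursor.connection.commit()          # commit each small chunk
        if cursor.rowcount < chunk_size:    # last (partial or empty) chunk
            return total
```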
Best Practice : Table logging

•  Some types of tables account for a lot of WRITEs, but do not
need to be recovered (e.g. session tables)
•  A session table is often unnecessary to REDO-log and to
checkpoint
•  Create these tables as 'NO LOGGING' tables:
mysql> set @ndb_curr_val=@@ndb_table_no_logging;

mysql> set ndb_table_no_logging=1;

mysql> create table session_table(..) engine=ndb;

mysql> set ndb_table_no_logging=@ndb_curr_val;


•  'session_table' will not be
•  REDO logged or Checkpointed → No disk activity for this table!
•  After System Restart it will be there, but empty!
Best Practice : Backup
•  Backup of NDB tables
•  Online – can have ongoing transactions
•  Consistent – only committed data and changes are backed up
•  ndb_mgm -e “START BACKUP”
•  Copy backup files from data nodes to safe location
•  Non-NDB tables must be backed up separately
•  MySQL system tables are stored only in MyISAM.
•  You want to backup (for each mysql server)
•  mysql database
•  Triggers, routines, events ...
•  Use 'mysqldump'
•  mysqldump mysql > mysql.sql
•  mysqldump --no-data --no-create-info -R > routines.sql
•  Copy my.cnf & config.ini files
Best Practice: Restore
•  ndb_restore is in many cases the MOST write intensive operation on
Cluster
•  The problem is that ndb_restore produces REDO LOG
•  This is unnecessary but a fact for now
•  Restores many records in parallel, no throttling..
•  So 128 or more small records may be fine, but 128 BLOBs….
Temporary error: 410: REDO log buffers overloaded, consult online manual
(increase RedoBuffer, and|or
decrease TimeBetweenLocalCheckpoints, and|or increase NoOfFragmentLogFiles)

•  If you run into this during restore


•  Try increasing RedoBuffer (a value higher than 64MB is seldom
practical or needed)
•  Run only one instance of ndb_restore
•  Reduce parallelism: ndb_restore -p10 ....
•  Or even a lower value, e.g. -p1
•  If this does not help → faster disk(s) are needed
(the RedoBuffer is synced every TimeBetweenGlobalCheckpoints)
Resources

•  Getting Started with MySQL Cluster – 5 Steps, <15 minutes


•  http://www.mysql.com/products/database/cluster/get-started.html#quickstart

•  MySQL Cluster Evaluation Guide


•  http://www.mysql.com/why-mysql/white-papers/mysql_cluster_eval_guide.php

•  MySQL Cluster Performance Tuning Best Practices


http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_perfomance.php
