MySQL - Architecture

Many companies use MySQL mainly because it's free, reliable, easy to setup and maintain. I have be using MySQL for many years but have never really gone into depth, mainly because once it has been installed it happily sits working for years with only slight modifications, the reason is that it is a simple database and not to bloated with many features that you don't need, the other feature is that MySQL can have a number of different engine types to suit the application.

MySQL is a open source database which means it's freely available with free redistribution, this means you have full access to the source code. MySQL began as Unireg that was developed by Michael "Monty" Widenius for a swedish company called TcX during the 1980's, the My part is Monty's daughters name. The initial release in 1995 had a SQL interface and a dual license model, a free and an embedded version. David Axmark, Monty and Allen Larrson founded MySQL AB in 1995, it was taken over by Sun Microsystems in 2008 and Sun itself was taken by Oracle in 2010.

MySQL is written in C and C++ and in 2001 MySQL began supporting transactions with the integration of the BDB and InnoDB engines (the default engine), this allowed for safer handling of concurrent write operations, which began the trend of adding features needed by the Enterprise environments.

MySQL supports the following platforms and has both 32-bit and 64-bit versions available

Linux
Solaris
Windows
AIX
HPUX

Although MySQL comes with no tools, there are a number of graphical tools available the main one being the MySQL workbench.

MySQL comes in a number of flavors, commercial customers have a number of different choices depending on your needs.

Community Server (free edition)
Standard Edition
Enterprise Edition
Cluster Carrier Grade edition

All the versions have the following features

Pluggable Storage Engine Architecture
Multiple Storage Engines InnoDB, MyISAM, NDB (MySQL Cluster), Memory, Merge, Archive, CSV, etc
Replication
Partitioning
Stored Procedures, Triggers, Views
Information Schema
MySQL connectors (ODBC, JDBC, .NET, etc)
MySQL Workbench for visual modeling, SQL development and administration.

Lastly before I start on the architecture there are a number of user community projects or resources that you may be interested in

MySQL Blog - good place to start to see other users experience with MySQL
MySQL Podcasts - good place to get some learning tips and presentations can even download to your phone for viewing on the train, etc
MySQL Newsletter - get the latest information on MySQL what's coming and any security notices, etc

MySQL Storage Engines

MySQL is very different from other databases, in that it's storage engine is pluggable, what I mean by this is that the MySQL server core code is separate from the storage engine, which means that you can have a pluggable storage engine that fits your application. MySQL has over 20 storage engines, here is a list of the common ones

Storage Engine	Transactional Support	Locking Level	Online Non-blocking Backup	Server Version(s) available
InnoDB (default)	Yes	Row	Yes	5.1, 5.5, 5.6
MyISAM/Merge	No	Table	No	5.1, 5.5, 5.6
Memory	No	Table	No	5.1, 5.5, 5.6
Marta	Yes	Row	No	5.1, 5.5, 5.6
Falcon	Yes	Row	Yes	5.6
PBXT	Yes	Row	Yes	5.1, 5.5, 5.6
FEDERATED	No	n/a	n/a	5.1, 5.5, 5.6
NDB	Yes	Row	Yes	5.1 and available in MySQL Cluster
Archive	No	Row	No	5.1, 5.5, 5.6
CSV	No	Table	No	5.1, 5.5, 5.6

The different storage engines can vastly improve a database performance when using the correct one, below is a list of features that may make you decide to change from the default InnoDB storage engine

Transactional Support - if you application does not require transactional support then you can reduce the resources by using a non-transaction engine
Table-Level features - if you need specific features for an example the checksum feature then you would have to use the MyISAM engine
Locking - Depending on your application you may want row or table locking
Indexing - different storage engines use different indexing strategies that meet different application types, OLAP, Data Warehouse, etc
Foreign Keys - a number of storage engines do not support foreign keys, so double check when you select a particular engine
Buffering - Some engines use buffering others don't, for example the MyISAM engine does not buffer data but supports multiple buffers for indexes
File Storage - some engines store their data/indexes in self-contained files, others use centralized metadata this has a impact if you want to move the data files to another server
Backup - some engines allow you to perform hot backups, others you need to take the database down before performing a backup

Although the idea is great sometimes you can get too bogged down on what engine to choose, hence why most mysql storage engines will be either be MyISAM or Innodb. I am only going to cover MyISAM, Innodb, MEMORY and CSV in depth, and leave you to the MySQL documentation regarding the others.

MyISAM was the default engine until recently, this engine has been used since MySQL version 3.2, it is a non-transactional engine and does not implement any additional locking mechanisms, this may cause problems when you have a large number of concurrent writes to a table. If your application has a lot of write activity the writes will end up blocking the reads which will cause performance problems, thus you should use either a InnoDB or Falcon engine.

MyISAM has three files associated with it, because the three files represent a table they can be simply copied to another server and used, however to avoid corruption you should take down mysql server before copying. The other server must have the same endian format as the source server, so you cannot copy from linux X86 server to a Sparc server for instance.

.frm	the table format file
.MYD	the data file
.MYI	the index file

MyISAM has the following features

Non-transactional
No foreign key support
FULLTEXT indexes for text matching
No data cache
Index caches can be specified by name
Implements both HASH and BTREE indexes
Table level locking
Very fast read activity, suitable for data warehouses
Compressed data (with myisampack)
Online backup with mysqlhotcopy
Maximum of 64 indexes per table

You can set a number of parameters within the my.cnf configuration file that relate to the MyISAM engine

key_buffer_size	Determines the size of the memory cache used for storing MyISAM indexes, the default is 8MB and the maximum is 4GB
concurrent_insert	Determines the behavior of concurrent inserts 0 - disables concurrent inserts 1 (default) - concurrent inserts with no data gaps are enabled 2 - concurrent inserts with data gaps are enabled
delay_key_write	Delays updating indexes for MyISAM tables until table are closed. ON (default) - MyISAM tables that have delay_key_write option defined will delay index updates OFF - disable delayed index writes entirely ALL - enable delayed index writes entirely
max_write_lock_count	Determines how many writes to a table take precedence over reads, this helps with read starvation due to constant writes to a table, the default is 4294967295 I will cover this more in SQL Extensions section.
preload_buffer_size	Determines the size of the buffer used for index preloading of the key cache, the default is 32KB.

There are three utility programs that can be used with MyISAM tables

myisamchk

used to analyze, optimize and repair tables, to avoid data corruption you should shutdown mysql when performing any of the below actions.

# check a table
myisamchk /var/lib/mysql/<table_name>.MYI

# Repair a table
myisamchk -r /var/lib/mysql/<table_name>.MYI

myisampack

used to create compressed, read-only tables, to avoid data corruption you should shutdown mysql when performing any of the below actions.

# compress a table
myisampack <table_name>.MYI

# Rebuild the indexes for optimal performance
myisampack --rq --sort-index -analyze <table_name>.MYI

myisam_ftdump

used to display information about fulltext fields in tables

# First you need the program to analyze, use show create table command
show create table employees\G

# Now you can dump the table, notice the index number at the end,
myisam_ftdump employees 1

I am just going to talk about the Merge Storage engine as it is related to the MyISAM one, it is actually a sort of wrapper around MyISAM tables with the same schemas. All underlying tables can be queried at once by querying the Merged table. This is one way on how to implement partitioning in MySQL, the merged tables use the same buffers and configurations as the underlying MyISAM tables. Two files are created .frm file contains the table format and the second file .MRG contains the names of the underlying MyISAM tables, as a side note you cannot use the replace statement with merged tables. The benefits of merged tables are better management of tables and better performance, using merged tables with smaller underlying tables not only speeds up these operations because of the smaller table size but it will allow you to rotate out the table from use by modifying the merge table definition to exclude it while maintenance is occurring.

InnoDB is now the default storage engine in MySQL, it has the following key features

Transactional support provided by MVCC (Multi Version Concurrency Control)
Row-level locking
Foreign key support
Indexing using clustered B-tree indexes
Configurable buffer caching of both data and indexes
Online non-blocking backup through separate commercial backup program

With the Innodb storage engine you have control of the format and the location of the tablespace, a tablespace is a logical group of one or more data files in a database (yes the same as in Oracle), using parameters you can control the path the home directory and if you want to use separate files or a shared tablespace.

innodb_data_file_path	Determine both the path to individual centralized data files (shared tablespace) and the size of the files
innodb_data_home_dir	The common part of the directory path for all InnoDB data files
innodb_file_per_table	If enable then any InnoDB tables will be using their own .idb file for both data and indexes rather than in the shared tablespace

The full path to each shared tablespace is formed by adding innoDB_data_home_dir to each path specified in the innoDB_data_file_path, the file sizes can be specified in K, M or G. By default if innoDB_data_file_path is not specified a 10MB ibdata1 file is created in the data directory (datadir). You cannot just move the data files from one system to another like you can in the MyISAM engine.

There are a number of configuration parameters that you can use with the InnoDB storage engine

innodb_buffer_pool_size

Determines the size of the buffer that the Innodb storage engine uses to cache both data and indexes

innodb_flush_log_at_trx_commit

Configures how often the log buffer is flushed to disk

0 - log buffer is written every second
1 - every commit will make the log buffer flush
2 - basically is a combination of 0 and 1

innodb_log_file_size

Determines the size (in bytes) of each of the Innodb log files, by default they are 5MB. Remember the bigger the file the slower recovery after a server crash.

The larger the buffer the more data you can hold in memory which in turn increases performance, thus try to keep your most frequently used data in memory. Don't go too mad as to much memory could cause swapping by the operating system which in turn will degrade performance.

You can get detailed information regarding your InnoDB storage engine using the show command, there are a number of status sections which I will highlight now

show command	show engine innodb status
Innodb Status sections
semaphores	Reports threads waiting for semaphore and statistics on how many times threads have been forced to wait for an OS call, waiting on a spin wait, or a mutex or rw-lock semaphore. A large number of threads waiting indicates that there is a disk I/O problem or contention issues within the InnoDB.
foreign key errors	Displays information about foreign key problems
deadlocks	Displays information about the last deadlock that occurred
transactions	Reports lock waits, which may highlight lock contention within your application, it can also help detecting transaction deadlocks.
file I/O	Show activity about the threads used by the I/O
insert buffer and adaptive hash index	Displays information about insert buffer including size and amount of free space.
log	show information about the log files
buffer pool and memory	show buffer pool activity, including hit rates, etc
row operations	show activity of the main thread

InnoDB uses shared tablespace for all tables and indexes that can consist of one or more files, hey are generally located in the datadir directory, the files contain metadata and are referred to as ibdata files. When the tablespace becomes fragmented the files are not shrunk, but the free space is still available to the database, you can view this free space by using the data_free field of the information_schema.tables system view and the data_free field of show table status.

You can have separate files instead by using the parameter innodb_file_per_table as mentioned above, however a small amount of metadata will be in the shared tablespace, to defrag the tablespace use the optimize table command which will be discussed in another section entitle SQL Extensions.

When working with ibdata files you can add additional files, these can be split over different disks, but with SAN replacing the old disk technology this seems a lesser requirement today, here are some examples on creating the ibdata files

ibdata files

## Create a 1GB ibdata file that will autoextend if needed
innodb_data_file=ibdata1:1024M:autoextend
innodb_data_file=ibdata1:1G:autoextend

## create 1GB ibdata file and a second 1GB ibdata file that will autoextend
innodb_data_file=ibdata1:1024M;ibdata2:1024M:autoextend

MySQL will not stripe across disks, thus the first data file must be full before using the second data file, also some tables may be split across the two files when you start to use the second file, for example say that you have 1MB free in the first file and you create a 2MB table, 1MB will be in the first file and the second 1MB will be in the second file.

If you wish to change the size of the data files there is no easy way, the following steps must be taken

perform a export of all the InnoDB tables
shutdown MySQL
change the configuration options as desired to remove or reorganize a tablespace
move the existing ibdata file and log files to a backup location
restart MySQL
import the InnoDB tables
once happy you can delete the old database files to recoup some space back

Due to the log files, InnoDB automatic crash recovery is possible, just like in other databases all changes are recorded in the log files and replayed back if the server were to crash. There are a number of parameters that can affect the log files

innodb_log_files_in_group	the log files are written in a circular manner, they are initialized to there full size when MySQL starts
innodb_log_file_size	sets the logfile size in each group, the total size of all the logfiles must be less than 4GB, but this may be due to the O/S limitations so check with the latest MySQL release
innodb_fast_shutdown	this determines if logs files are purged during a shutdown which means it may take longer to completely shutdown MySQL 0 - log files are purged after shutdown 1 - log files are not purged after shutdown (default) 2 - simulates a crash, which causes the log buffer to be written after every commit and flushes the log files to disk once per second
innodb_flush_log_at_trx_commit	controls how often the log files are written to 0 - causes the log files to be written to disk once per second 1 - causes the log files to be written after every transaction commit (default) 2 - a combination of 0 and 1
innodb_flush_method	changes how Innodb opens and flushes data log files, this is a trade off between performance and inconsistent data during a crash, I will discuss this further in tuning.
innodb_log_buffer_size	this is a write buffer for the log files, the larger the buffer the less often the log files are written to thus saving I/O.

MEMORY storage engine creates a table in memory only, this allows for extremely fast data access, however the drawback is that data is not persisted across reboots, also if you replicate this table to a slave server, it will also truncate the slave table as well even though it did not restart. If rows are deleted from a memory table you must use either alter table or optimize table to defrag the table and reduce the amount of memory used.

The MEMORY storage has the following features

All data is stored in memory
Non-transactional
No foreign key support
very fast read and write activity due to being entirely in memory
table-level locking
A MEMORY table can include yo to 32 indexes per table
implements both hash and b-tree indexes
tables use a fixed length row storage format and thus cannot be defined with data types of blob or text

The MEMORY storage engine has a few parameters that you can tweak

max_heap_table_size	the maximum size of MEMORY tables is limited by the system variables, default is 16MB
init_file	this can be used to specify a file to use to populate a MEMORY table when mysqld starts

The last storage engine I am going to cover is the CSV storage engine (Comma Separated Value), it is simply a text file that can be manipulated with a simple text editor, this is ideal for data exchange or data importing.

CSV storage has the following features

Plain text data in CSV format
Easily imported into programs such as Microsoft Excel
Table-level locking
No foreign key support
Non-transactional
Trivial backups and restores (just copy the files)
Does not support indexes or partitioning

Three files are created .frm which contains the table format, .CSM which contains metadata and lastly .CSV file which contains the data.

There are many other storage engines that may meet a particular need so I will leave you to the MySQL documentation on these.

The overall picture of MySQL is below

I want to explain how all this comes together inside MySQL, firstly you would create a database (or schema as it means the same thing), then inside each database you create tables that can be associated with a different type of storage engine, for example I could create four tables all using different storage engines all within the same database (schema), the picture below describes a single mysqld daemon running two databases/schemas each with fours tables, the tables can be associated with different storage engines.

This is a totally different way of setting up a database to other databases like Oracle where there is only one engine.

Lastly the MySQL data dictionary is stored in the mysql database, a view only copy is stored in the information_schema database, it is a simple set of tables.

Accessing MySQL

This is a short section on how to access MySQL, you have both commandline and GUI tools that allow you to access the mysqld daemon, you should really learn both. I prefer the commandline tool mysql it has a number of options

mysql Connect String
mysql	mysql -u pvalle -h mysqldb -ppassword <database> Note: when entering the password make sure there are no space between the password and the -p, if you don't specify a password then you will be prompted for one.
mysql command options
-u or --user	You can specify a specific user
-p or --password	You can enter the users password
-h or --host	You can select a particular host which should have a running MySQL daemon
-S or --socket	You can select a particular socket, if on the same server as MySQL, then by default you would use the /tmp/mysqld.sock
-D or --database	You can select a particular database you want to connect too, you can specify the database without using these options as in the example above
-V or --version	obtain the version number
-v or --verbose	display more verbose output, good for tracking down problems when the connect string is not working
--delimiter	change the default delimiter which is a semi-colon (;)
--protocol	You can select a different protocol to use Unix - Socket or TCP Windows - TCP, PIPE or MEMORY

You also have the option to feed in either a sql file or sql commands from the commandline

mysql and sql commands

mysql -u pvalle -h mysqldb -ppassword <database> -e "select * from mysql"

mysql -u pvalle -h mysqldb -ppassword <database> < sql_commands.sql

Once you have connected to MySQL, there are a number of useful commands

List all commands	help \h ?
List databases	show databases;
Select a particular database	\u mysql use mydatabase
Change the delimiter	\d @ delimiter @
Change the output to vertical	# Just put a \G at the end of the select statement instead of the normal delimiter select * from tables\G
Run a shell command	\! uptime
list connected database and user	select user(), database();

Another useful command is mysqladmin, which allows you to perform adminisitive tasks

Command	Usage	Description
create	mysqladmin create test2	create a database called test2
debug	mysqladmin debug	send additional detailed logging to the error log
drop	mysqladmin drop test2	drop the database called test2
extended-status	mysqladmin extended-status	displays mysqld system variables and their current values
flush-hosts	mysqladmin flush-hosts	clears internal information about hosts including the DNS cache and hostnames blocked due to too many connection errors
flush-logs	mysqladmin flush-logs	flushes server logs by closing current logs and reopening new log files.
flush-privileges	mysqladmin flush-privileges	reloads the grant tables, thus refreshing users privileges
flush-status	mysqladmin flush-status	resets most server status variables
flush-tables	mysqladmin flush-tables	closes currently open table file handles, it waits for the current thread connections to finish before releasing file handles used by those connections
flush-threads	mysqladmin flush-threads	resets the thread cache
kill	mysqladmin kill 50	kill specified client threads
password	mysqladmin password <password>	change the connection password for the user account specified
old-password	mysqladmin old-password <password>	change the connection password for the user account specified, however stores the password in the old less secure way using 16 characters instead of 41 characters
ping	mysqladmin ping	determines if the server is online and available
processlist	mysqladmin processlist	displays all active server threads
reload	mysqladmin reload	see flush-privileges
refresh	mysqladmin refresh	similar to flush-hosts and flush-logs
shutdown	mysqladmin shutdown	stop the mysqld daemon cleanly
start-slave	mysqladmin start-slave	starts replication
status	mysqladmin status	Displays a number of global status variables
stop-slave	mysqladmin stop-slave	stop replication
variables	mysqladmin variables	displays the global status variables and values
version	mysqladmin version	displays the version of MySQL

With some of the mysqladmin options you can use two additional options --sleep and --count, if you have ever used vmstat in the unix world then they act in the same way, for example you could use the following command to take readings every 5 seconds for 5 times, you can also use --debug-info to get additional information on the status for example

sleep and count	mysqladmin status --sleep 5 --count 5
--debug-info	mysqladmin status --debug-info

Lastly there are a number of GUI tools that do the same thing as above, I prefer the commandline but some people prefer a GUI, here are some that you can download and try

SQLyog
phpMyAdmin
MySQL Query Browser
MySQL Administrator
MySQL Workbench