Planet MySQL - http://www.planetmysql.org/
MMUG11: Talk about binlog servers at Madrid MySQL Users Group meeting on 29th January 2015
Madrid MySQL Users Group will have its next meeting on Thursday, the 29th of January.
I will be giving a presentation on the MySQL binlog server and how it can be used to help scale reads and be used for other purposes. If you have (more than) a few slaves this talk might be interesting for you. The meeting will be in Spanish. I hope to see you there.
Details can be found on the group’s Meetup page here: http://www.meetup.com/Madrid-MySQL-users-group/events/219810484/
Performance issues and fixes -- MySQL 5.6 Semi-Synchronous Replication
A long time ago I wrote a blog post saying that we had started using Semi-Synchronous replication in production at Facebook. We run our own MySQL facebook-5.6 branch, which added many replication (including Semisync) patches; some of them were also merged into WebScaleSQL. Recently I heard from people in the community that they were suffering from Semisync performance issues in official MySQL 5.6. I took some time to review our previous MySQL bug reports, and realized that some important bugs were either still "verified" or inefficiently fixed. The two most impactful bug reports were https://bugs.mysql.com/bug.php?id=70342 and http://bugs.mysql.com/bug.php?id=70669. We fixed both in our branch, so I hadn't paid much attention after that, but people outside Facebook are certainly affected. In this post, I'm going to describe some effective configurations for getting better Semisync throughput on masters and slaves, backed by simple benchmark numbers.
I used three machines -- a client, a master, and a semisync slave -- all running on pure flash and located very close to each other. I created 100 databases, enabled Multi-Threaded Slave, and ran 100 mysqlslap processes (one per database) with 30 concurrent connections each (3,000 concurrent connections in total). All queries were auto-committed inserts, and I used the InnoDB storage engine on both master and slaves.
1. Set master_info_repository=TABLE
MySQL 5.6 and 5.7 have a performance bug that makes writing FILE-based master info (and relay log info) files very expensive. This is especially serious for Semisync replication, since the bug slows down the IO thread. With Semisync, a slow IO thread takes longer to send the ACK back to the master, so it slows down master throughput as well. The default master_info_repository is FILE, so without changing this parameter to TABLE, you are affected by this bug.
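As a minimal sketch, the relevant my.cnf fragment on the slave would be (both option names are standard MySQL 5.6 variables):

```ini
[mysqld]
# Avoid the expensive FILE-based master.info writes
master_info_repository    = TABLE
# Must be TABLE anyway for crash-safe slaves to work
relay_log_info_repository = TABLE
```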
Here are benchmark results comparing FILE and TABLE:
5.6.22 master_info_repository=FILE : 5870/s
5.6.22 master_info_repository=TABLE: 7365/s
These numbers are the number of commits per second on both the master and the slave instance. The slave didn't lag in most experiments, thanks to the multi-threaded slave feature. Please don't confuse the master_info_repository and relay_log_info_repository parameters. relay_log_info_repository has to be TABLE, otherwise crash-safe slave doesn't work. master_info_repository works with both FILE and TABLE, but I suggest using TABLE for performance reasons.
2. Reduce durability on master
Older 5.6 releases had a bug where slaves couldn't continue replication after a crashed master's recovery, even with fully durable configurations. Here is a closed bug report. The bug was fixed, but the fix caused a performance regression: the master extended the duration for which it held the LOCK_log mutex, releasing it only after calling fsync(). This certainly fixed the bug, but it hurt performance because LOCK_log is a very hot mutex in 5.6 -- both the Binlog Dump thread and application threads need to hold it. Fortunately, reducing durability (meaning sync_binlog=0 and innodb_flush_log_at_trx_commit=0|2) mitigates the regression a lot. When using Semisync replication, you will almost certainly promote a slave on master failure, so durability on the master does not matter much -- in other words, you can reduce durability on the master.
5.6.22 master_info_repository=TABLE, full durability: 7365/s
5.6.22 master_info_repository=TABLE, less durability: 9800/s
3. Loss-Less Semisync
MySQL 5.7 improved replication performance a lot. Probably the most effective improvement was that the Binlog Dump thread no longer holds the LOCK_log mutex. In addition, 5.7 introduced the "Loss-Less Semisync" feature. If you read my previous Semisync blog post carefully, you may have noticed that (1) we backported Loss-Less Semisync from 5.7, and (2) we got better throughput with Loss-Less Semisync than with Normal Semisync. This was because Loss-Less Semisync actually reduced mutex contention -- on LOCK_commit and LOCK_binlog_. My 5.7 benchmark results were as follows:
5.7 Normal Semisync: 12302/s
5.7 Loss-Less Semisync: 14500/s
(master_info_repository=TABLE, less durable)
Reducing LOCK_log contention in the Binlog Dump thread and introducing Loss-Less Semisync were the major contributors to MySQL 5.7's replication performance improvements. At Facebook, we ported both into our 5.6 branch. It would be interesting for the community if these became available in other distributions, since using 5.7 in production will not happen anytime soon.
In addition to the above three configurations, there are still some considerations for making Semisync throughput better.
4. Semisync mysqlbinlog
At Facebook, we implemented mysqlbinlog support for the Semisync protocol, and used it as a Semisync replication reader. On replication slaves, the IO thread and SQL threads contend on internal mutexes. As mentioned above, a slow IO thread slows down Semisync master throughput. Semisync mysqlbinlog doesn't suffer this slowdown, because it has no SQL thread. So using Semisync mysqlbinlog instead of a Semisync slave can improve master throughput.
5.7 Loss-Less Semisync + Semisync slave: 14500/s (on both master and slave)
5.7 Loss-Less Semisync + Semisync mysqlbinlog + async slave: 48814/s on master, 15363/s on slave
This shows that Semisync mysqlbinlog improved master throughput. But it is actually not such good news -- the slave lags a lot. Fundamentally, we need to fix the mutex contention between the IO thread and the SQL threads.
5. GTID
There are some open performance bugs around GTID. The 5.7 one is especially serious.
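Putting recommendations 2 and 3 together, a master's my.cnf could be sketched as follows (the semisync plugin must already be installed; rpl_semi_sync_master_wait_point exists in 5.7 only, where AFTER_SYNC selects loss-less semisync):

```ini
[mysqld]
# Reduced durability -- acceptable when you always promote a slave on master failure
sync_binlog = 0
innodb_flush_log_at_trx_commit = 2

# Semisync master settings
rpl_semi_sync_master_enabled = ON
# 5.7 only: AFTER_SYNC = loss-less semisync, AFTER_COMMIT = normal semisync
rpl_semi_sync_master_wait_point = AFTER_SYNC
```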
If you really need high Semisync throughput, you need to benchmark carefully with GTID (and ask Oracle to fix the bugs!).
Here are the table definitions and mysqlslap commands I used for the benchmark.
for i in `seq 1 100`
do
  mysql -e "drop database test$i"
  mysql -e "create database test$i"
  mysql test$i -e "create table t1 (id int auto_increment primary key, \
value int, value2 varchar(100)) engine=innodb"
done
for i in `seq 1 100`
do
  mysqlslap --host=master --query="insert into test$i.t1 \
values (null, 1, 'abcdefghijklmn')" --number-of-queries=100000 --concurrency=30 &
done
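For item 5, benchmarking with GTID on 5.6 means enabling at least the following on both master and slave (all standard 5.6 options):

```ini
[mysqld]
log_bin
log_slave_updates = ON
gtid_mode = ON
enforce_gtid_consistency = ON
```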
Presenting Undrop for InnoDB Toolkit on Percona Live 2015
I’ll be presenting the Undrop for InnoDB data recovery toolkit at Percona Live 2015. The conference takes place in Santa Clara on 13-16 April.
You may wonder why you should care if you don’t plan to drop any of your production databases. To name a few reasons:
Taxes, death and data loss are inevitable
A good knowledge of how InnoDB stores data on disk helps you design fast and scalable databases
The toolkit can be used to troubleshoot bugs in InnoDB.
So bookmark the session, it’s going to be a great discussion.
Undrop for InnoDB | Percona Live MySQL Conference 2015
The post Presenting Undrop for InnoDB Toolkit on Percona Live 2015 appeared first on Backup and Data Recovery for MySQL.
How to Bootstrap MySQL/MariaDB Galera Cluster
January 27, 2015
Unlike the standard MySQL server and MySQL Cluster, the way to start a MySQL/MariaDB Galera Cluster is a bit different. Galera requires you to start one node in the cluster as a reference point before the remaining nodes can join and form the cluster. This process is known as cluster bootstrap. Bootstrapping is the initial step that introduces one database node as the Primary Component; the others then treat it as a reference point when syncing up data.
How does it work?
When Galera starts with the bootstrap command on a node, that particular node reaches Primary state (check the value of wsrep_cluster_status). The remaining nodes just require a normal start command; they will automatically look for the existing Primary Component (PC) in the cluster and join it to form the cluster. Data synchronization then happens through either incremental state transfer (IST) or state snapshot transfer (SST) between the joiner and the donor.
So basically, you should only bootstrap the cluster if you want to start a new cluster or when no other node in the cluster is in PRIMARY state. Care should be taken when choosing which action to take, or else you might end up with split clusters or loss of data.
The following example scenarios illustrate when to bootstrap the cluster:
How to start a Galera cluster?
The three Galera vendors use different bootstrap commands (based on each software’s latest version). On the first node, run:
MySQL Galera Cluster (Codership):
$ service mysql bootstrap
Percona XtraDB Cluster:
$ service mysql bootstrap-pxc
MariaDB Galera Cluster:
$ service mysql bootstrap
The above command is just a wrapper: what it actually does is start the MySQL instance on that node with an empty gcomm:// as the wsrep_cluster_address value. You can also manually define the variable inside my.cnf and run the standard start/restart command. However, do not forget to change wsrep_cluster_address back so that it again contains the addresses of all nodes after the start.
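As a sketch, the manual route looks like this (the provider path and node addresses below are made up for illustration; adjust them to your environment):

```ini
# my.cnf fragment on the node being bootstrapped
[mysqld]
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
# An empty gcomm:// bootstraps a brand-new cluster on this node only.
# Once the cluster is up, set this back to the full member list, e.g.:
# wsrep_cluster_address = gcomm://192.168.0.101,192.168.0.102,192.168.0.103
wsrep_cluster_address = gcomm://
```

With this in place, a plain `service mysql start` on the first node performs the bootstrap.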
Reading RBR binary logs with pt-query-digest
For purposes of auditing everything that goes on on our servers, we're looking to parse the binary logs of all servers (masters), as with "Anemomaster". With Row Based Replication this is problematic, since pt-query-digest does not support parsing RBR binary logs (true for 2.2.12, the latest at this time).
I've written a simple script that translates RBR logs to SBR-like logs, with a little bit of cheating. My only interest is that pt-query-digest be able to capture and count the queries, nothing else. By doing some minimal text manipulation on the binary log, I'm now able to feed it to pt-query-digest, which seems to be happy.
The script of course does not parse the binary log directly; rather, it requires the binary log to first be extracted via:
mysqlbinlog --verbose --base64-output=DECODE-ROWS your-mysql-binlog-filename.000001
The above adds the interpretation of the RBR entries in the form of (unconventional) statements, commented out, and strips away the cryptic RBR text. All that is left is to do a little manipulation on entry headers and uncomment the interpreted queries.
The script can be found in my gist repositories. Current version is as follows:
# Convert a Row-Based-Replication binary log to Statement-Based-Replication format, cheating a little.
# This script exists since Percona Toolkit's pt-query-digest cannot digest RBR format. The script
# generates enough for it to work with.
# Expecting standard input
# Expected input is the output of "mysqlbinlog --verbose --base64-output=DECODE-ROWS <binlog_file_name>"
# For example:
# $ mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.000006 | python binlog-rbr-to-sbr.py | pt-query-digest --type=binlog --order-by Query_time:cnt --group-by fingerprint
import fileinput

inside_rbr_statement = False

for line in fileinput.input():
    line = line.strip()
    if line.startswith("#") and "end_log_pos" in line:
        # Rewrite RBR event headers as "Query" event headers that pt-query-digest accepts
        for rbr_token in ["Update_rows:", "Write_rows:", "Delete_rows:", "Rows_query:", "Table_map:",]:
            if rbr_token in line:
                line = "%s%s" % (line.split(rbr_token)[0], "Query\tthread_id=1\texec_time=0\terror_code=0")
    if line.startswith("### "):
        inside_rbr_statement = True
        # The "### " commented rows are the pseudo-statements interpreted by mysqlbinlog's "--verbose",
        # and which we will feed into pt-query-digest: strip the prefix to uncomment them
        line = line.split(" ", 1)[1].strip()
    else:
        inside_rbr_statement = False
    print(line)