
Logging with MySQL

    # Be aware that this log type is a performance killer.

I came across this line in the MySQL Server configuration file on Linux, while trying to enable the general query log, which is disabled by default. I decided to dig deeper and find out why, and what the alternatives are.
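For reference, enabling the general log on a Debian-style setup takes a couple of lines in the [mysqld] section of my.cnf; the log file path below is an assumption and varies by distribution:

    [mysqld]
    general_log      = 1
    general_log_file = /var/log/mysql/mysql.log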

Performance loss:

If you enable MySQL logging, you'll notice an instant drop in performance when dealing with large volumes of data manipulated by more than a dozen connections. The primary reason for this is the I/O operation after every database event; if the database files and log files are on the same storage device, performance degrades even further due to the simultaneous write operations (not to forget that a write is more expensive than a read). Secondly, the log file grows to somewhere between 4 and 20 times the size of the actual data, depending on the architecture of the data-feeding application (try this yourself). This is because the file contains not only the values present in the data, but also the queries that produced them. So a table with columns, say, ID, Name and Gender may contain:

    ID   Name     Gender
    1    ALI      M
    2    OWAIS    M
    3    FATIMA   F
But the general log file is going to look like:
                   95 Connect   root@localhost on
                   95 Query     set autocommit=1
                   95 Query     SHOW SESSION VARIABLES LIKE 'lower_case_table_names'
                   95 Query     SET CHARACTER SET utf8
                   95 Query     SET NAMES utf8
                   95 Query     SET SQL_SAFE_UPDATES=1
                   95 Query     SELECT CONNECTION_ID()
                   95 Query     SELECT current_user()
                   95 Query     USE `test`
                   95 Query     set autocommit=1
                   95 Query     SHOW SESSION VARIABLES LIKE 'sql_mode'
                   95 Query     SELECT current_user()
121218 18:25:49    94 Query     SHOW TRIGGERS FROM `test` LIKE 'test'
                   94 Query     SHOW INDEXES FROM `test`.`test`
                   94 Query     SHOW CREATE TABLE `test`.`test`
                   94 Query     SHOW FULL COLUMNS FROM `test`.`test`
121218 18:26:03    95 Query     insert into test.test (id, name, gender) values (1, 'ALI', 'M')
                   95 Query     SHOW WARNINGS
121218 18:26:04    95 Query     insert into test.test (id, name, gender) values (2, 'OWAIS', 'M')
                   95 Query     SHOW WARNINGS
121218 18:26:05    95 Query     insert into test.test (id, name, gender) values (3, 'FATIMA', 'F')
                   95 Query     SHOW WARNINGS
Now think of reproducing this example for a web application, where a single HTTP call might trigger 5 queries.
On top of that, SELECT queries, which have nothing to do with data modification, are still logged; logging them is of little use compared to logging the DML queries. Not only this, some applications use standards and frameworks which generate fully-qualified queries; the "Hibernate" framework for Object-Relational Mapping is an example. Logging fully-qualified queries costs even more. Compare:

Query with aliases:
select,, t.gender from test as t

Fully-qualified query:
select `test`.`id`, `test`.`name`, `test`.`gender` from `mydb`.`test`

Logging Types:

MySQL has multiple types of logs, depending on the query types and responses:
  1. Error Log: contains critical errors that occur while starting and stopping mysqld, the MySQL Server executable.
  2. General Query Log: records activities performed during the execution of mysqld; in other words, it is used to keep track of MySQL client activities. These activities are logged as they are received.
  3. Binary Log: a binary file - unlike the General Query Log, which is text - containing activities that change the database and its schema. These activities are logged after their execution.
  4. Slow-query Log: consists of queries that take longer than a specified threshold (set via the variable long_query_time in the configuration file).
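As an illustration, the slow-query log from item 4 can be switched on with two configuration lines; the two-second threshold here is only an example:

    slow_query_log  = 1
    long_query_time = 2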


Using Existing Application/Web Server Logs

Every appropriately designed application logs its activities to some extent; also, if your application is hosted on a web server, like Tomcat or IIS, it too will have its own logging mechanism. You can use these logs to track database activities and keep the database logs disabled.

Note that this works only when the logs are not read on a regular basis, because otherwise you'll have to write a parser to extract the useful information from the log for quick readability; the cost of this extraction process is yet another addition, so choose this option carefully. Also, if more than one project is hosted on the same web server, this solution may not suit you, because a single log file will contain a mix of activities from all the applications, unless you configure the web server to log each application to a separate file.

Logging into MySQL Tables

MySQL Server can log activities into database tables instead of files. You can configure this in the configuration file by setting the variable log-output = TABLE (or adding this line if it doesn't exist; it defaults to FILE).
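In my.cnf, that combination looks like this (assuming you also want the general log enabled):

    [mysqld]
    general_log = 1
    log-output  = TABLE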

Doing so will record the logs in a separate table for each log type in the mysql schema.
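The table-based logs can then be queried like any other table; for example, to list the most recent statements from the general log (column names as in MySQL 5.x):

    SELECT event_time, user_host, argument
    FROM mysql.general_log
    ORDER BY event_time DESC
    LIMIT 10;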

Using stored procedures

MySQL Server's general log records only the statements it receives directly from clients. If you have queries that execute in batches, it is better for logging efficiency to put them in stored procedures.
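For instance, wrapping the three INSERTs from the earlier example in a procedure means the general log records a single CALL line per invocation instead of every individual statement (the procedure name is hypothetical; table and values follow the earlier example):

    DELIMITER $$
    CREATE PROCEDURE test.load_people()
    BEGIN
       INSERT INTO test.test (id, name, gender) VALUES (1, 'ALI', 'M');
       INSERT INTO test.test (id, name, gender) VALUES (2, 'OWAIS', 'M');
       INSERT INTO test.test (id, name, gender) VALUES (3, 'FATIMA', 'F');
    END$$
    DELIMITER ;

    CALL test.load_people();  -- the general log records only this line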

Log maintenance using scheduled jobs

By using MySQL Events or OS tasks like cronjob, you can archive old logs as well as remove unwanted queries. Here is an example using MySQL events:
First, make sure the Event Scheduler is enabled by adding the event-scheduler variable to the configuration file (if it isn't there already). This requires restarting MySQL Server:
event-scheduler = ON
Now create the event (the daily schedule below is illustrative; adjust it to your retention needs):

   DELIMITER $$
   CREATE EVENT archive_logs
   ON SCHEDULE EVERY 1 DAY
   DO
   BEGIN
      CREATE TABLE IF NOT EXISTS mysql.general_log_archive LIKE mysql.general_log;
      INSERT INTO mysql.general_log_archive
      SELECT * FROM mysql.general_log;
      TRUNCATE mysql.general_log;
      CREATE TABLE IF NOT EXISTS mysql.slow_log_archive LIKE mysql.slow_log;
      INSERT INTO mysql.slow_log_archive
      SELECT * FROM mysql.slow_log;
      TRUNCATE mysql.slow_log;
   END$$
   DELIMITER ;


The event will fire like a trigger (in fact, events are temporal triggers) at its scheduled time.

Tuning up more

- Limiting Databases: setting the binlog-do-db variable in the config file will limit binary logging to only the databases listed. You can define it like:
binlog-do-db = test
binlog-do-db = mysql
binlog-do-db = my_db

- Indexing: certain columns can be indexed to improve querying. Note that the log tables use the CSV storage engine by default, which does not support indexes, so convert them to MyISAM first (see the next tip) and disable logging to the table while altering it. For example:
ALTER TABLE mysql.general_log ADD INDEX (event_time);

ALTER TABLE mysql.slow_log ADD INDEX (db);

- Convert log table(s) to MyISAM: the log tables use the CSV storage engine by default, and MyISAM handles large volumes of data better (and supports indexes). You may convert the engine of the log tables to MyISAM while logging to the table is disabled. Example:
ALTER TABLE mysql.slow_log ENGINE = MyISAM;

Query Results:

I performed some experiments using 30,000 INSERT, UPDATE and DELETE queries on random data to illustrate the difference in performance with and without logging:

- Logging disabled: 138 sec
- General Log enabled on File: 139 sec
- General Log enabled on Table: 141 sec
- Binary Logging enabled on File: 165 sec
- Binary Logging enabled on Table: 143 sec
- All Logs (General, Binary, Error, Slow-query) enabled on File: 174 sec
- All Logs (General, Binary, Error, Slow-query) enabled on Table: 152 sec

Below are the results of a larger number of queries on real data:

- Logging disabled: 335 sec
- All Logs (General, Binary, Error, Slow-query) enabled on File: 432 sec
- All Logs (General, Binary, Error, Slow-query) enabled on Table: 422 sec

Was the article helpful for you? Please provide feedback. Corrections are more than welcome.

Owais Ahmed

