
Logging with MySQL

    # Be aware that this log type is a performance killer.

I came across this line in the MySQL Server configuration file on Linux, while trying to enable the general query log, which is disabled by default. I decided to dig deeper and find out why, and what the alternatives are.
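For reference, enabling the general log on a Debian-style setup takes a couple of lines in the [mysqld] section of my.cnf; the log file path below is an assumption and varies by distribution:

    [mysqld]
    general_log      = 1
    general_log_file = /var/log/mysql/mysql.log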

Performance loss:

If you enable MySQL logging, you'll notice an instant drop in performance when dealing with large volumes of data manipulated by more than a dozen connections. The primary reason for this is the I/O operation after every database event; if the database files and log files are on the same storage device, performance degrades even further due to the simultaneous write operations (not to forget that a write is more expensive than a read). Secondly, the log file grows to somewhere between 4 and 20 times the size of the actual data, depending on the architecture of the data-feeding application (try this yourself). This is because the file contains not only the values present in the data, but also the queries that produced them. So a table with columns, say, ID, Name and Gender may contain:

    ID   Name     Gender
    1    ALI      M
    2    OWAIS    M
    3    FATIMA   F
But the general log file is going to look like:
                   95 Connect   root@localhost on
                   95 Query     set autocommit=1
                   95 Query     SHOW SESSION VARIABLES LIKE 'lower_case_table_names'
                   95 Query     SET CHARACTER SET utf8
                   95 Query     SET NAMES utf8
                   95 Query     SET SQL_SAFE_UPDATES=1
                   95 Query     SELECT CONNECTION_ID()
                   95 Query     SELECT current_user()
                   95 Query     USE `test`
                   95 Query     set autocommit=1
                   95 Query     SHOW SESSION VARIABLES LIKE 'sql_mode'
                   95 Query     SELECT current_user()
121218 18:25:49    94 Query     SHOW TRIGGERS FROM `test` LIKE 'test'
                   94 Query     SHOW INDEXES FROM `test`.`test`
                   94 Query     SHOW CREATE TABLE `test`.`test`
                   94 Query     SHOW FULL COLUMNS FROM `test`.`test`
121218 18:26:03    95 Query     insert into test.test (id, name, gender) values (1, 'ALI', 'M')
                   95 Query     SHOW WARNINGS
121218 18:26:04    95 Query     insert into test.test (id, name, gender) values (2, 'OWAIS', 'M')
                   95 Query     SHOW WARNINGS
121218 18:26:05    95 Query     insert into test.test (id, name, gender) values (3, 'FATIMA', 'F')
                   95 Query     SHOW WARNINGS
Now think of reproducing this example for a web application, where a single HTTP call might trigger 5 queries.
On top of that, SELECT queries, which have nothing to do with data modification, are still logged; logging them is of little use compared to logging the DML queries. Not only this, some applications use standards and frameworks which generate fully-qualified queries; the "Hibernate" framework for Object-Relational Mapping is an example. Logging fully-qualified queries costs even more. Compare:

Query with aliases:
select,, t.gender from test as t

Fully-qualified query:
select `test`.`id`, `test`.`name`, `test`.`gender` from `mydb`.`test`

Logging Types:

MySQL has multiple types of logs, depending on the query types and responses:
  1. Error Log: contains critical errors that occur while starting and stopping mysqld, the MySQL Server executable.
  2. General Query Log: records activities performed during the execution of mysqld; in other words, it is used to keep track of MySQL client activities. These activities are logged as they are received.
  3. Binary Log: a binary file - unlike the General Query Log, which is text - containing activities that change the database and its schema. These activities are logged after their execution.
  4. Slow-query Log: consists of queries that take longer than a specified threshold (set via the variable long_query_time in the configuration file).
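As an illustration, the slow-query log from item 4 can be switched on with two configuration lines; the two-second threshold here is only an example:

    slow_query_log  = 1
    long_query_time = 2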


Using Existing Application/Web Server Logs

Every appropriately designed application logs its activities to some extent; also, if your application is hosted on a web server, like Tomcat or IIS, it too will have its own logging mechanism. You can use these logs to track database activities and keep the database logs disabled.

Note that this works only when the logs are not read on a regular basis, because otherwise you'll have to write a parser to extract the useful information from the log for quick readability; the cost of this extraction process is yet another addition, so choose this option carefully. Also, if more than one project is hosted on the same web server, this solution may not suit you, because a single log file will contain a mix of activities from all the applications, unless you configure the web server to log each application to a separate file.

Logging into MySQL Tables

MySQL Server can log activities into database tables instead of files. You can configure this in the configuration file by setting the variable log-output = TABLE (or adding this line if it doesn't exist; it defaults to FILE).
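In my.cnf, that combination looks like this (assuming you also want the general log enabled):

    [mysqld]
    general_log = 1
    log-output  = TABLE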

Doing so will record the logs in a separate table for each log type in the mysql schema.
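The table-based logs can then be queried like any other table; for example, to list the most recent statements from the general log (column names as in MySQL 5.x):

    SELECT event_time, user_host, argument
    FROM mysql.general_log
    ORDER BY event_time DESC
    LIMIT 10;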

Using stored procedures

MySQL Server's general log records only the statements it receives directly from clients. If you have queries that execute in batches, it is better for logging efficiency to put them in stored procedures.
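For instance, wrapping the three INSERTs from the earlier example in a procedure means the general log records a single CALL line per invocation instead of every individual statement (the procedure name is hypothetical; table and values follow the earlier example):

    DELIMITER $$
    CREATE PROCEDURE test.load_people()
    BEGIN
       INSERT INTO test.test (id, name, gender) VALUES (1, 'ALI', 'M');
       INSERT INTO test.test (id, name, gender) VALUES (2, 'OWAIS', 'M');
       INSERT INTO test.test (id, name, gender) VALUES (3, 'FATIMA', 'F');
    END$$
    DELIMITER ;

    CALL test.load_people();  -- the general log records only this line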

Log maintenance using scheduled jobs

By using MySQL Events or OS tasks like cronjob, you can archive old logs as well as remove unwanted queries. Here is an example using MySQL events:
First, make sure the Event Scheduler is enabled by adding the event-scheduler variable to the configuration file (if it isn't there already). This requires restarting MySQL Server:
event-scheduler = ON
Now create the event (the daily schedule below is illustrative; adjust it to your retention needs):

   DELIMITER $$
   CREATE EVENT archive_logs
   ON SCHEDULE EVERY 1 DAY
   DO
   BEGIN
      CREATE TABLE IF NOT EXISTS mysql.general_log_archive LIKE mysql.general_log;
      INSERT INTO mysql.general_log_archive
      SELECT * FROM mysql.general_log;
      TRUNCATE mysql.general_log;
      CREATE TABLE IF NOT EXISTS mysql.slow_log_archive LIKE mysql.slow_log;
      INSERT INTO mysql.slow_log_archive
      SELECT * FROM mysql.slow_log;
      TRUNCATE mysql.slow_log;
   END$$
   DELIMITER ;


The event will fire like a trigger (in fact, events are temporal triggers) at its scheduled time.

Tuning up more

- Limiting Databases: setting the binlog-do-db variable in the config file will limit binary logging to only the databases listed. You can define it like:
binlog-do-db = test
binlog-do-db = mysql
binlog-do-db = my_db

- Indexing: certain columns can be indexed to improve querying. Note that the log tables use the CSV storage engine by default, which does not support indexes, so convert them to MyISAM first (see the next tip) and disable logging to the table while altering it. For example:
ALTER TABLE mysql.general_log ADD INDEX (event_time);

ALTER TABLE mysql.slow_log ADD INDEX (db);

- Convert log table(s) to MyISAM: the log tables use the CSV storage engine by default, and MyISAM handles large volumes of data better (and supports indexes). You may convert the engine of the log tables to MyISAM while logging to the table is disabled. Example:
ALTER TABLE mysql.slow_log ENGINE = MyISAM;

Query Results:

I performed some experiments using 30,000 INSERT, UPDATE and DELETE queries on random data to illustrate the difference in performance with and without logging:

- Logging disabled: 138 sec
- General Log enabled on File: 139 sec
- General Log enabled on Table: 141 sec
- Binary Logging enabled on File: 165 sec
- Binary Logging enabled on Table: 143 sec
- All Logs (General, Binary, Error, Slow-query) enabled on File: 174 sec
- All Logs (General, Binary, Error, Slow-query) enabled on Table: 152 sec

Below are the results of a larger number of queries on real data:

- Logging disabled: 335 sec
- All Logs (General, Binary, Error, Slow-query) enabled on File: 432 sec
- All Logs (General, Binary, Error, Slow-query) enabled on Table: 422 sec

Was the article helpful for you? Please provide feedback. Corrections are more than welcome.

Owais Ahmed

