mardi 19 février 2013

Fixing awkward TIMESTAMP behaviors...

There are great features in MySQL 5.6. But not only that. We also tried to correct some old behaviors and limitations which, over the years, have shown to irritate our Community. The behavior of TIMESTAMP columns is one of them.

My colleague Martin Hansson did most of the work and summarized it well in his blog. Thanks to him, since MySQL 5.6.5, it's possible to declare more than one TIMESTAMP column with the DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP attributes. And it's possible to have DATETIME columns with such attributes. Two limitations lifted!

But that is not the end of the story. TIMESTAMP was still special. Unlike other datatypes, if not declared with the NULL or NOT NULL attributes, it would automatically get NOT NULL. And the first TIMESTAMP column of the table would automatically get DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. Many people found these behaviors odd. For them, my colleague Gopal Shankar added a new option to the server, documented here. The old behavior, now depreciated, is still available if the option is not used.

But that is still not the end of the story. "CREATE TABLE ... SELECT" dealt strangely with columns having DEFAULT CURRENT_TIMESTAMP. Just look at this simple example, which shows results with version 5.6.9:

CREATE TABLE t1 (
t1a TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
t1b TIMESTAMP DEFAULT '2000-01-01 01:00:00'
);
INSERT INTO t1 VALUES ();
SELECT * FROM t1;
t1a t1b
2013-02-19 18:12:41 2000-01-01 01:00:00
The content of t1 is as expected (remember that "()" in INSERT means "insert columns' defaults").
Now let's create a second table, which should have four columns: first, two extra columns t2a and t2b, then two columns filled with values selected from t1:

CREATE TABLE t2 (
t2a TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
t2b TIMESTAMP DEFAULT '2000-01-01 02:00:00'
) SELECT * FROM t1;
SHOW CREATE TABLE t2;
Table Create Table
t2 CREATE TABLE `t2` (
  `t2a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `t2b` timestamp NOT NULL DEFAULT '2000-01-01 02:00:00',
  `t1a` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `t1b` timestamp NOT NULL DEFAULT '2000-01-01 01:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
See how t2.t1b inherited the DEFAULT attribute of its source column t1.t1b: DEFAULT '2000-01-01 01:00:00', as expected, and as the documentation says. But! t2.t1a did NOT inherit the DEFAULT attribute of its source column t1.t1a: it rather got the strange DEFAULT '0000-00-00 00:00:00' (year zero...). That's one first problem: constant defaults are properly inherited, but function defaults are not.
Now let's look at the content of t2:

SELECT * FROM t2;
t2a t2b t1a t1b
0000-00-00 00:00:00 2000-01-01 02:00:00 2013-02-19 18:12:41 2000-01-01 01:00:00
The last two columns, which have their source in t1, have the same value as their source column, which is correct.
The two first (extra) columns did not have their values specified, so their default should have been inserted. That's what nicely happened for t2b. But not for t2a: year zero again! That's one second problem: an extra column is not filled with its default if it is a function default.

I grouped those two problems under the name of Bug#16163936, and fixed them. Here are the results in 5.6.10:

SHOW CREATE TABLE t2;
Table Create Table
t2 CREATE TABLE `t2` (
  `t2a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `t2b` timestamp NOT NULL DEFAULT '2000-01-01 02:00:00',
  `t1a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `t1b` timestamp NOT NULL DEFAULT '2000-01-01 01:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT * FROM t2;
t2a t2b t1a t1b
2013-02-19 18:27:39 2000-01-01 02:00:00 2013-02-19 18:27:39 2000-01-01 01:00:00
All correct!

vendredi 12 octobre 2012

Cost-based choice between subquery materialization and EXISTS

In a previous post, I had demonstrated how subquery materialization, introduced in MySQL 5.6.5, improves the performance of certain queries, like query Q16 of DBT3. Such improvement was easily explained:
  • Subquery materialization has a high start up cost (it needs to create and fill the temporary table).
  • But afterwards it has fast lookups (temporary table has a hash index, no duplicates, and is in memory).
  • In other words, compared to EXISTS, the first evaluation of the IN predicate is slow (high start up cost) and all following evaluations are fast (just a hash lookup).
  • In the DBT 3 setup, one outer table (named "part") has 200,000 rows, so there are 200,000 evaluations of IN, thus subquery materialization wins over EXISTS because the time it loses in the first evaluation is more than compensated by the many faster following evaluations.
However, if there were only few outer rows, then subquery materialization should logically be slower than EXISTS (the compensation would not happen anymore)... MySQL 5.6.5, by blindly always choosing subquery materialization, takes the risk of making certain queries slower. There needs to be a cost-based choice between the two strategies, to pick the best, depending on the situation! That is what I have implemented in MySQL 5.6.7.

To show it in action, I will use query Q16 again. First I will run it with the normal "part" table which has 200,000 rows. Then I will reduce this table to only 200 rows, and run the query again. Each time, I will run EXPLAIN to see what subquery strategy is chosen by the optimizer. I will also, by tweaking the optimizer_switch variable, force the optimizer to use the other strategy which it didn't like, in order to verify that it is indeed worse.

For brevity, let me jump directly to the results, obtained with a release build of MySQL 5.6.7 on my machine:

Rows in part Optimizer chooses Execution time If I force alternative
200,000 Materialization 550 ms 830 ms
200 EXISTS 1 ms 10 ms

We can see that in both cases the optimizer has made the right choice!

mardi 10 avril 2012

Faster subqueries with materialization

In a previous post, I analyzed how a query of the famous DBT3 benchmark was
optimized by MySQL. It was this query, named "Q16" in the DBT3 jargon:

select
 p_brand,
 p_type,
 p_size,
 count(distinct ps_suppkey) as supplier_cnt
from
 partsupp,
 part
where
 p_partkey = ps_partkey
 and p_brand <> 'Brand#23'
 and p_type not like 'LARGE PLATED%'
 and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
 and ps_suppkey not in (
  select
   s_suppkey
  from
   supplier
  where
   s_comment like '%Customer%Complaints%'
 )
group by
 p_brand,
 p_type,
 p_size
order by
 supplier_cnt desc,
 p_brand,
 p_type,
 p_size;

Here is a brief recap of conclusions I had drawn:
  • for this query, MySQL tranforms the IN condition to EXISTS and then
    evaluates it with the "unique_subquery" technique, which does an index
    lookup into the subquery's table.
  • IN is evaluated 120,000 times (once per combined row of the outer tables).
  • The total execution time of query Q16 is 0.65 seconds.
If you look at the original subquery, before IN becomes EXISTS, you will see that it's not correlated, which means that it does not mention columns of tables of the top query's FROM clause ('partsupp' and 'part'). Thus, its resultset is constant throughout execution of the entire top query; here it is:


mysql> select
    -> s_suppkey
    -> from
    -> supplier
    -> where
    -> s_comment like '%Customer%Complaints%';
+-----------+
| s_suppkey |
+-----------+
|       358 |
|      2820 |
|      3804 |
|      9504 |
+-----------+
4 rows in set (0.00 sec)

The transformation to EXISTS, because it injects equalities like
  `partsupp`.`ps_suppkey` = supplier`.`s_suppkey`
into the subquery's WHERE clause, makes the subquery correlated: it thus has to be executed 120,000 times, so we do 120,000 times this:
  • an index lookup in 'supplier' (which has 10,000 rows)
  • a test of the found row(s) against the LIKE condition.

Intuitively, determining the 4-row resultset once for all, and injecting it into the top query should yield better performance - it is fast to evaluate
  ps_suppkey not in (358, 2820, 3804, 9504) .
Starting from the just released MySQL 5.6.5, this transformation is automatically done by the Optimizer, and is called subquery materialization. The subquery's resultset is determined once for all and stored into an in-memory temporary table. If the temporary table has only 4 rows as in our example, searching for a match in it can be done with a scan; but if it had more rows, a hash index would help greatly. So a hash index is always created on the temporary table's columns. Last, this index is unique: there is no point in storing duplicates, and they would make the table bigger. After this one-time setup has been completed, each evaluation of IN simply does a hash index lookup into the 4-row temporary table.

My former colleague Timour Katchaounov started developing this feature years ago, when working for MySQL/Sun. In the last months, after a round of intensive QA, we have fixed some last bugs in it, in preparation for releasing in 5.6.5. But the feature still had one limitation: it was applicable only if IN was placed at certain positions in the query. For example, it couldn't be used with NOT IN. And query Q16 has a NOT IN! so the Optimizer could not apply subquery materialization to it, and was thus stuck with using EXISTS. Sad!

Why it could not work with NOT IN, is not very easy to explain. It has to do with NULL values, because they sometimes prevent using the hash index. To give an idea, look at this:
  (NULL, 1) NOT IN (SELECT ...)
Per the SQL standard, if the subquery's resultset contains at least one row of the form (x,1) where x is any number (or NULL), then the IN condition is neither TRUE, nor FALSE, it is UNKNOWN. So is the NOT IN condition, because it is the negation of IN, and NOT(UNKNOWN) is UNKNOWN.
Here are example of such rows: (NULL,1), (421,1), (236,1), (5329,1), ad infinitam.
We can see that those rows will not be found by a lookup in the hash index: this index is defined on the two columns, it has a usual "prefix-only" behaviour, which means that it cannot be used to search for "any value in first column, then 1 in second column". As long as the sentence starts with "any value in first column" a table scan is necessary; we should read each row of the temporary table and compare its second column with 1 until we find a matching row. And that:
  • will drag subquery materialization's performance down
  • will drag subquery materialization's code complexity up.
And I have even not covered all problems here: there can be more than two columns, there can be more than one NULL in the left argument of IN, there can also be NULLs inside the subquery's resultset.

In some lucky cases, the scan can be avoided, for example:
  SELECT * FROM table1 WHERE (a,b) IN (SELECT ...)
If (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE. It will be UNKNOWN if the subquery's resultset contains one (x,1) as seen above; otherwise it will be FALSE. No matter what, it will not be TRUE, and this is all that WHERE wants to know - (a,b) can thus be rejected without doing a scan.
Now, for
  SELECT * FROM table1 WHERE (a,b) NOT IN (SELECT ...)
i.e.
  SELECT * FROM table1 WHERE NOT ((a,b) IN (SELECT ...))
things are different: if (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE, as we said. So NOT IN will be UNKNOWN or TRUE. "Hum, can you be more specific?? I need to know if it's TRUE", will the WHERE evaluation code ask. Then we have to do the scan...

So now you understand why subquery materialization was restricted to certain placement of IN.

What I have done recently is to lift this restriction in two simple, however common, cases:
  1. If all outer and inner expressions are not nullable, then no NULL can get in the way, so there is no problem.
  2. If there is only one outer expression (and thus there is only one inner expression), figuring out the correct TRUE/FALSE/UNKNOWN answer is immediate. Understanding why... is left as an exercise to the reader :-)
Those two cases are independent: as long as one is satisfied, subquery materialization can apply to IN, no matter where it is placed (NOT IN, etc).

It turned out to be very easy to code this: I had a working prototype in an afternoon.

Q16 happens to meet the criteria of both cases: columns 'ps_suppkey' and 's_suppkey' are declared NOT NULL (first case), and the subquery has only one outer and one inner expression (second case).

So nowadays MySQL can, and does, use subquery materialization for query Q16; thanks to it, the execution time is down from 0.65 seconds to 0.47 seconds, which is a 25% improvement!

The new technique is visible in EXPLAIN. I want to first show how EXPLAIN was with the EXISTS transformation, so I temporarily disable subquery materialization, then run EXPLAIN, enable  subquery materialization, run EXPLAIN:


mysql> set optimizer_switch='materialization=off';
mysql> explain ...

+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type        | table    | type            | possible_keys        | key          | key_len | ref                 | rows   | Extra                                        |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
|  1 | PRIMARY            | part     | ALL             | PRIMARY              | NULL         | NULL    | NULL                | 199742 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | partsupp | ref             | PRIMARY,i_ps_partkey | i_ps_partkey | 4       | dbt3.part.p_partkey |      2 | Using where; Using index                     |
|  2 | DEPENDENT SUBQUERY | supplier | unique_subquery | PRIMARY              | PRIMARY      | 4       | func                |      1 | Using where                                  |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+

mysql> set optimizer_switch='materialization=default'; # 'on' would work too
mysql> explain ...

+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table    | type | possible_keys        | key          | key_len | ref                 | rows   | Extra                                        |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
|  1 | PRIMARY     | part     | ALL  | PRIMARY              | NULL         | NULL    | NULL                | 199742 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | partsupp | ref  | PRIMARY,i_ps_partkey | i_ps_partkey | 4       | dbt3.part.p_partkey |      2 | Using where; Using index                     |
|  2 | SUBQUERY    | supplier | ALL  | NULL                 | NULL         | NULL    | NULL                |  10113 | Using where                                  |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
If you compare, the big difference is that the third line says SUBQUERY and not DEPENDENT SUBQUERY anymore. DEPENDENT SUBQUERY means that it has be executed once per row of the top query. SUBQUERY means that it is executed only once.
EXPLAIN FORMAT=JSON, another new feature in MySQL 5.6.5, shows more details of materialization:

{
  "query_block": {
    "select_id": 1,
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": true,
        "nested_loop": [
          {
            "table": {
              "table_name": "part",
              "access_type": "ALL",
              "possible_keys": [
                "PRIMARY"
              ],
              "rows": 199742,
              "filtered": 100,
              "attached_condition": "((`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)))"
            }
          },
          {
            "table": {
              "table_name": "partsupp",
              "access_type": "ref",
              "possible_keys": [
                "PRIMARY",
                "i_ps_partkey"
              ],
              "key": "i_ps_partkey",
              "key_length": "4",
              "ref": [
                "dbt3.part.p_partkey"
              ],
              "rows": 2,
              "filtered": 100,
              "using_index": true,
              "attached_condition": "(not(< in_optimizer >(`dbt3`.`partsupp`.`ps_suppkey`,`dbt3`.`partsupp`.`ps_suppkey` in ( < materialize > (select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') ), < primary_index_lookup >(`dbt3`.`partsupp`.`ps_suppkey` in < temporary table > on distinct_key where ((`dbt3`.`partsupp`.`ps_suppkey` = `materialized subselect`.`s_suppkey`)))))))",
              "attached_subqueries": [
                {
                  "using_temporary_table": true,
                  "dependent": false,
                  "cacheable": true,
                  "table": {
                    "access_type": "eq_ref",
                    "key": "< auto_key >",
                    "rows": 1
                  },
                  "query_block": {
                    "select_id": 2,
                    "table": {
                      "table_name": "supplier",
                      "access_type": "ALL",
                      "rows": 10113,
                      "filtered": 100,
                      "attached_condition": "(`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')"
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
Don't forget to scroll the box above to the right, because lines are long. This shows that:
  1. For the top query, we read a row from 'part', then one row from 'partsupp', then execute the subquery.
  2. the very first execution of the subquery materializes (<materialize>) select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') into a temporary table
  3. Each subquery execution does a lookup on the primary key of this temporary table (<primary_index_lookup> ... in <temporary table>)
  4. Going further down, we see how the temporary table will be filled: it will be the resultset of a table scan ("access_type": "ALL") of 'supplier' with a filtering LIKE condition.
More details on the feature's usage can be found in the manual.

This is the end of this post. I hope that it puts in good light the work we have put into 5.6. There are many other Optimizer features in this version, like EXPLAIN FORMAT=JSON and others; they are described in my colleagues' blog posts.

jeudi 9 février 2012

Optimizer tracing used by others!

In a previous post, I had explained how to use MySQL's optimizer tracing, a new feature which appeared in MySQL 5.6.3.

As a developer, it feels really good to see others adopt my work and make something useful out of it! My colleague Dimitri Kravtchuk, who is one of our top Benchmarking experts, has written a blog post where he shows how the optimizer tracing has helped him to figure out why, under load, once in a while and randomly, a query performed badly. His investigation technique may be reusable by other people, so I encourage you to read more about it, here.

mardi 29 novembre 2011

Understanding the unique_subquery optimization

If you use the EXPLAIN SELECT statement to see how your subqueries are treated by MySQL, you may sometimes meet the "unique_subquery" optimization. Here is how the manual describes it:
"unique_subquery: this type replaces ref for some IN subqueries of the following form: value IN (SELECT primary_key FROM single_table WHERE some_expr); unique_subquery is just an index lookup function that replaces the subquery completely for better efficiency".
Few weeks ago, while I was reviewing a patch fixing a bug in unique_subquery, I got a "simplification" pulsion. I told myself that:
  •  unique_subquery is an optimization for a special case of simple subqueries (single inner table, using index, no aggregates);
  • we have a more general system around, used for more complex subqueries, naturally capable of handling simple ones too if we wanted;
  • this general system does not have the bug in question...
Then I wondered: what if we removed the unique_subquery optimization, and let the general system handle this simple subquery? This would certainly simplify code, and thus maintainance...But before removing it, of course, we should check whether unique_subquery brings a significant performance benefit.

So today I'm testing unique_subquery against the DBT3 benchmark. I grab a copy of MySQL 5.6.3, and focus on the sixteenth query of DBT3, which contains a subquery (in red) suitable for handling by unique_subquery:


select
 p_brand,
 p_type,
 p_size,
 count(distinct ps_suppkey) as supplier_cnt
from
 partsupp,
 part
where
 p_partkey = ps_partkey
 and p_brand <> 'Brand#23'
 and p_type not like 'LARGE PLATED%'
 and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
 and ps_suppkey not in (
  select
   s_suppkey
  from
   supplier
  where
   s_comment like '%Customer%Complaints%'
 )
group by
 p_brand,
 p_type,
 p_size
order by
 supplier_cnt desc,
 p_brand,
 p_type,
 p_size;

This query executes in 0.65 seconds on my Linux box, and EXPLAIN is:

+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type        | table    | type            | possible_keys        | key          | key_len | ref                 | rows   | Extra                                        |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
|  1 | PRIMARY            | part     | ALL             | PRIMARY              | NULL         | NULL    | NULL                | 199498 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | partsupp | ref             | PRIMARY,i_ps_partkey | i_ps_partkey | 4       | dbt3.part.p_partkey |      2 | Using where; Using index                     |
|  2 | DEPENDENT SUBQUERY | supplier | unique_subquery | PRIMARY              | PRIMARY      | 4       | func                |      1 | Using where                                  |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+

When I disable unique_subquery (by modifying MySQL's C++ code), EXPLAIN becomes:

+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type        | table    | type   | possible_keys        | key          | key_len | ref                 | rows   | Extra                                        |
+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
|  1 | PRIMARY            | part     | ALL    | PRIMARY              | NULL         | NULL    | NULL                | 199498 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | partsupp | ref    | PRIMARY,i_ps_partkey | i_ps_partkey | 4       | dbt3.part.p_partkey |      2 | Using where; Using index                     |
|  2 | DEPENDENT SUBQUERY | supplier | eq_ref | PRIMARY              | PRIMARY      | 4       | func                |      1 | Using where                                  |
+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+

The only change, as expected, is that "unique_subquery" becomes "eq_ref". The used index is the same (the primary key of the "supplier" table). The optimizer has the same notion of unicity: "unique_subquery" and "eq_ref" both denote that a single lookup is needed, as the index is UNIQUE. Same index, same number of lookups: execution could well be as fast with "eq_ref" as it was with "unique_subquery".
But... no. Query now executes in 0.80 seconds. 23% slower than with unique_subquery!

Finer-grained timing shows that the extra 0.15 seconds are indeed lost in the subquery evaluation code.

To understand this, let's follow the execution in detail, based on EXPLAIN output above.
  • First line of EXPLAIN output: we do a table scan on the "part" table ("type=ALL" means "table scan") . The "rows" column of EXPLAIN suggests that we are going to have 199,498 rows of "part".
  • Second line of EXPLAIN output: for each row from the "part" table, we do an index lookup ("ref") into the "i_ps_partkey" index of the "partsupp" table; apparently such lookup will find two rows ("rows=2").
  • At this point, we have a row made of needed columns of "part" and of "partsupp". An upper estimate of the number of those rows is 199,498 multiplied by 2: 400,000. Actually, the real number is around 120,000 (there has been filtering going on, as the "Using where" indicates).
  • Then we evaluate the WHERE clause and thus the "NOT IN (subquery)" predicate (the "DEPENDENT SUBQUERY"). 120,000 evaluations of such predicate. And that's where the difference is.
EXPLAIN EXTENDED and then SHOW WARNINGS show how the predicate
looks like. Let's start with the case where unique_subquery is disabled:


/* select#1 */ select `dbt3`.`part`.`p_brand` AS `p_brand`,`dbt3`.`part`.`p_type` AS `p_type`,`dbt3`.`part`.`p_size` AS `p_size`,count(distinct `dbt3`.`partsupp`.`ps_suppkey`) AS `supplier_cnt` from `dbt3`.`partsupp` join `dbt3`.`part` where ((`dbt3`.`partsupp`.`ps_partkey` = `dbt3`.`part`.`p_partkey`) and (`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)) and (not(<in_optimizer>(`dbt3`.`partsupp`.`ps_suppkey`,<exists>(/* select#2 */ select 1 from `dbt3`.`supplier` where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`))))))) group by `dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size` order by count(distinct `dbt3`.`partsupp`.`ps_suppkey`) desc,`dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size`

Above, the part in red says that

ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
has been transformed from "IN(non correlated subquery)" to "EXISTS(correlated subquery)", yielding this:


not exists (
        select
            1
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
                        AND s_suppkey = ps_suppkey
        )

or, more exactly (leaving out the NOT operator, for brevity):


<exists>(/* select#2 */ select 1 from `dbt3`.`supplier`
           where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')
           and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`)))

Evaluating this EXISTS() evaluates the new subquery. This means all the subquery evaluation machinery: calls to JOIN::exec(), sub_select(), evaluate_join_record()... Sure, deep down it does an index lookup like unique_subquery does, but all those function calls have a cost, and so has all the logic which is lying around ready to handle any complexity in the subquery, as this is generic subquery evaluation code ("if group_by do this", "if order_by do this", "if left_join do this": none of those if()s are entered, but deciding to enter them or not has a cost). Plus some initialization code. Plus some de-initialization code. This overhead, repeated 120,000 times, amounts to 0.15 seconds...

Now, EXPLAIN EXTENDED when unique_subquery is enabled:


/* select#1 */ select `dbt3`.`part`.`p_brand` AS `p_brand`,`dbt3`.`part`.`p_type` AS `p_type`,`dbt3`.`part`.`p_size` AS `p_size`,count(distinct `dbt3`.`partsupp`.`ps_suppkey`) AS `supplier_cnt` from `dbt3`.`partsupp` join `dbt3`.`part` where ((`dbt3`.`partsupp`.`ps_partkey` = `dbt3`.`part`.`p_partkey`) and (`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)) and (not(<in_optimizer>(`dbt3`.`partsupp`.`ps_suppkey`,<exists>(<primary_index_lookup>(<cache>(`dbt3`.`partsupp`.`ps_suppkey`) in supplier on PRIMARY where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`)))))))) group by `dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size` order by count(distinct `dbt3`.`partsupp`.`ps_suppkey`) desc,`dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size`

The optimizer has first done the same transformation (IN to EXISTS) as we saw before, then has done one more transformation, and EXISTS has become, as written in red above:


<exists>(<primary_index_lookup>(<cache>(`dbt3`.`partsupp`.`ps_suppkey`)
          in supplier on PRIMARY
          where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')
          and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`))))

which is directly an index lookup ("<primary_index_lookup>"), followed by an additional WHERE clause. So the overhead of full-blown subquery evaluation is
avoided. And this overhead is not neglectable, compared to the index lookup (assuming the relevant index pages are already in memory).

So the conclusion of my experiment is that unique_subquery is worth having. I'll have to direct simplification pulsions to some other code!

Note that there also exists a similar "index_subquery" optimization applying to non-unique indices. And it's worth having, for the same reasons.

mardi 4 octobre 2011

Optimizer tracing: how to configure it

In this blog post, my colleague Jørgen Løland described a new feature of MySQL 5.6: Optimizer Tracing. I recommend reading his article, as it presents this new feature in a simple, easy-to-read manner.

The Optimizer Tracing feature can help understanding what the Optimizer is doing; it is available since milestone 5.6.3, announced October 3rd at Oracle Open World (here is the changelog). It's good to see it mature now; I remember that Sergey Petrunia did the first prototype back in March 2009!

Today  I will be giving some must-have tips related to handling big traces.

First thing to know, a trace lives in main memory (internally it is allocated on the heap or free store of the MySQL Server). An SQL statement which gives the optimizer a lot of work (for example, by joining many tables) will generate a large trace. Up to gigabytes in some pathological cases! To avoid hogging memory then, a trace will never grow beyond the value of the @@optimizer_trace_max_mem_size session variable. Which has a default of 16 kilobytes; yes, it's a low value, to protect the innocent. I often raise it in my session, to 1 megabyte, which is enough for most queries in my daily work:

mysql> SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000;

If a trace was truncated because exceeding this limit, the column INFORMATION_SCHEMA.OPTIMIZER_TRACE.MISSING_BYTES_BEYOND_MAX_MEM_SIZE shows a non-zero value: the number of bytes which could not be added to the trace.

When I read a trace I like to scroll up and down in it, search for some optimization phase (that means searching for some keyword in the trace); to do this, I set, in the "mysql" command-line client, the pager to "less" (I'm using Unix):

mysql> pager less;

This is very useful. I have all "less" commands under hand: can save to a file, can search for a regular expression match, search forward or backward... One last benefit is that when I have finished reading, quitting "less" makes the trace go away from the terminal, it does not linger on my screen forever, does not occupy the terminal's display history...

When there are many tables to join, as I said above the trace can grow a lot, because the Optimizer evaluates many possible join orders (using an algorithm known as "greedy search"): each evaluated join order is mentioned in the trace. Greedy search ends up being the greatest part of the trace. What if I want to see the trace of what happens in the Optimizer after greedy search is complete? If I set @@optimizer_trace_max_mem_size to a low value, it will trim greedy search and what follows. If I set @@optimizer_trace_max_mem_size to a high value, to see what I want, greedy search will be traced which will possibly exceed the amount of memory I can afford on this machine... It would be nice if I could tell the system: "please do not trace this and that". In my case, it would be "please do not trace greedy search".

As an example, let's consider twenty tables, t1...t20, all similar to this one:

mysql> CREATE TABLE t1 (a INT, INDEX(a)) ENGINE=MYISAM;
mysql> INSERT INTO t1 VALUES(x),(y); # x and y being some integers

and let's run this query, after turning tracing on:
mysql> SET OPTIMIZER_TRACE="ENABLED=ON,END_MARKER=ON";
mysql> EXPLAIN SELECT 1 FROM t1  JOIN t2 ON t2.a=t1.a JOIN t3 ON t3.a=t2.a JOIN t4 ON t4.a=t3.a JOIN t5 ON t5.a=t4.a JOIN t6 ON t6.a=t5.a JOIN t7 ON t7.a=t6.a JOIN t8 ON t8.a=t7.a JOIN t9 ON t9.a=t8.a JOIN t10 ON t10.a=t9.a JOIN t11 ON t11.a=t10.a JOIN t12 ON t12.a=t11.a JOIN t13 ON t13.a=t12.a JOIN t14 ON t14.a=t13.a JOIN t15 ON t15.a=t14.a JOIN t16 ON t16.a=t15.a JOIN t17 ON t17.a=t16.a JOIN t18 ON t18.a=t17.a JOIN t19 ON t19.a=t18.a JOIN t20 ON t20.a=t19.a;
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref        | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | a             | a    | 5       | NULL       |    2 | Using where; Using index |
|  1 | SIMPLE      | t2    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using where; Using index |
|  1 | SIMPLE      | t3    | ref   | a             | a    | 5       | test.t2.a  |    1 | Using where; Using index |
|  1 | SIMPLE      | t4    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using index              |
|  1 | SIMPLE      | t5    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using index              |
|  1 | SIMPLE      | t6    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using index              |
|  1 | SIMPLE      | t7    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using index              |
|  1 | SIMPLE      | t8    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using index              |
|  1 | SIMPLE      | t9    | ref   | a             | a    | 5       | test.t1.a  |    1 | Using where; Using index |
|  1 | SIMPLE      | t10   | ref   | a             | a    | 5       | test.t9.a  |    1 | Using where; Using index |
|  1 | SIMPLE      | t11   | ref   | a             | a    | 5       | test.t1.a  |    1 | Using where; Using index |
|  1 | SIMPLE      | t12   | ref   | a             | a    | 5       | test.t11.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t13   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t14   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t15   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t16   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t17   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t18   | ref   | a             | a    | 5       | test.t11.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t19   | ref   | a             | a    | 5       | test.t12.a |    1 | Using where; Using index |
|  1 | SIMPLE      | t20   | ref   | a             | a    | 5       | test.t11.a |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
As optimizer developer, it catches my eye that some tables have "Using where" and others don't. "Using where" tells me that a condition is evaluated after fetching a row from the table, but does not tell me what condition. To know more, I will look at the trace. But I'm not interested in the trace of greedy search, which likely accounts, in this 20-table example, for most of the total trace, and would just make my reading less comfortable, or even hog memory, if I had more than those 20 tables or more indices on them, or more complex conditions. So I do:

mysql> SET OPTIMIZER_TRACE_FEATURES="GREEDY_SEARCH=OFF";
and run the EXPLAIN again. To show the resulting trace, I'm posting a screenshot of how it is displaid by the JsonView Firefox add-on. Using this add-on requires taking two precautions:
  • turning off the end_marker flag of @@optimizer_trace (this flag is good for human-readability but is not JSON-compliant)
  • sending the trace to a file without escaping of newlines (which is why I use INTO DUMPFILE instead of INTO OUTFILE):

mysql> SET OPTIMIZER_TRACE="END_MARKER=OFF";
mysql> SELECT TRACE INTO DUMPFILE "/tmp/trace.json" FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

Here's the trace now in Firefox:

We see how greedy_search=off has eliminated the trace of greedy search ("considered_execution_plans"), replacing it with just an ellipsis ("...")! Then there is what I wanted to see: the part after greedy search, named "attached_conditions_summary", which describes what condition is behind each "Using where" in EXPLAIN. There are equality conditions of course. Some coming directly from the "JOIN ON" conditions. Some deduced by equality propagation (if t1.a=t2.a and t2.a=t3.a then t1.a=t3.a, a new condition which we see is attached to t3). There are also "IS NOT NULL" conditions; indeed, an equality condition of the form t1.a=t2.a allows us to deduce that both t1.a and t2.a are not NULL, so if rows of t1 are retrieved first they can be filtered with this deduced condition, known as early NULL filtering.

There are other flags in @@optimizer_trace_features; we added a flag for each feature which we think can make the trace grow unreasonably large:
  • greedy search
  • repeated execution of subqueries (one execution per row)
  • repeated range optimization (known as "range checked for each record" in EXPLAIN)
  • range optimization in general.
I think I have shown enough for today. A positive note to finish: in the Optimizer team, this tracing has already helped us to debug some problems, so has started to fulfill its goal!

For the curious: the Optimizer Trace is covered in full detail, including the above, and more, in a chapter of the "MySQL Internals" manual.

lundi 3 octobre 2011

Kick-off

Hi all!
Here is a new blog; I'll post here some thoughts, tutorials... all related to my job - software developer at Oracle Corporation, working on MySQL. It will soon be nine years since I joined MySQL AB. After working on interesting topics: Replication, Online Backup, the Maria storage engine, each of them having taught me something, I'm now full-time on the Optimizer, and guess what - it is an interesting topic and it teaches me something!