tag:blogger.com,1999:blog-91912312749991210632024-02-19T17:32:28.924-08:00About MySQL development (Optimizer mainly)Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.comBlogger23125tag:blogger.com,1999:blog-9191231274999121063.post-44522033017648702202020-01-15T06:04:00.001-08:002020-01-15T06:04:21.247-08:00A new, simple way to figure out why your recursive CTE is running away<div dir="ltr" style="text-align: left;" trbidi="on">
In MySQL 8.0.1, I introduced support for recursive common table expressions (CTE). In the just-released version 8.0.19, I added an easy solution to a problem which nearly everybody meets when writing queries with
recursive CTE’s: when infinite recursion happens, how to debug? You can read about it in my blog post on <a href="https://mysqlserverteam.com/a-new-simple-way-to-figure-out-why-your-recursive-cte-is-running-away/" target="_blank">mysqlserverteam</a>.</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-71057035292057457062019-12-18T05:02:00.001-08:002019-12-18T05:02:22.727-08:00Antijoin in MySQL 8<div dir="ltr" style="text-align: left;" trbidi="on">
<h1 class="entry-title">
</h1>
<div class="entry-meta">
<span class="date"><a href="https://mysqlserverteam.com/antijoin-in-mysql-8/" rel="bookmark" title="Permalink to Antijoin in MySQL 8"><time class="entry-date" datetime="2019-12-18T12:10:56+00:00"></time></a></span><span class="edit-link">Hello!</span> In MySQL 8.0.17, we made an observation in the well-known <a href="http://www.tpc.org/tpch/default5.asp">TPC-H</a>
benchmark for one particular query. The query was executing 20% faster
than in MySQL 8.0.16. This improvement is because of the “antijoin”
optimization which I implemented. You can find out more in <a href="https://mysqlserverteam.com/antijoin-in-mysql-8/" target="_blank">my blog post</a> on mysqlserverteam.</div>
</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-54170949480177091242019-12-17T06:56:00.001-08:002019-12-17T06:56:37.116-08:00Supporting all kinds of outer references in derived tables (lateral, or not)<div dir="ltr" style="text-align: left;" trbidi="on">
Hi. In <a href="https://mysqlserverteam.com/support-for-lateral-derived-tables-added-to-mysql-8-0-14/" rel="noopener noreferrer" target="_blank">my earlier post</a>, I showed how MySQL, since version 8.0.14, has support for <a href="https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html" rel="noopener noreferrer" target="_blank">LATERAL</a>
derived tables. With LATERAL, a JOIN can have a second table – a
subquery-based derived table – be defined based on values from columns
of the first table, and thus be re-calculated for each row of the first
table. Typically:<br />
<br />
<pre>SELECT ... FROM t1, LATERAL (SELECT ... FROM t2
^ WHERE t2.col=t1.col ... ) AS derived;
| |
| |
+---------------------------+</pre>
<br />
We say that in the second table (<i>derived</i>), <i>t1.col</i> is a “lateral outer reference” to the first table <i>t1</i>.
The word “lateral” hints at the fact that the referenced table is
placed “next to”, “on the side of” the derived table (i.e. both are part
of the same FROM clause). The arrow goes “laterally”.<br />
While implementing this LATERAL feature, I added another related one at the same time: <i>support for <b>NON</b>-lateral outer references in derived tables</i>. Please, find all information in <a href="https://mysqlserverteam.com/supporting-all-kinds-of-outer-references-in-derived-tables-lateral-or-not/" target="_blank">my post on mysqlserverteam</a>.<br />
<br />
<br /></div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-48260757510089159192019-01-31T05:03:00.003-08:002019-01-31T05:03:35.457-08:00Support for LATERAL derived tables added to MySQL 8.0.14<div dir="ltr" style="text-align: left;" trbidi="on">
Hello all. Last months have been busy for me with various tasks, and one of them is finally done and released in MySQL 8.0.14 : LATERAL derived tables. All details are in my post on <a href="https://mysqlserverteam.com/support-for-lateral-derived-tables-added-to-mysql-8-0-14/" target="_blank">mysqlserverteam</a>. I think it is a useful addition to our SQL capabilities. Enjoy!</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com6tag:blogger.com,1999:blog-9191231274999121063.post-30854102294088037752018-01-08T00:07:00.001-08:002018-01-08T00:07:07.693-08:00Row numbering, ranking: how to use LESS user variables in MySQL queries<div dir="ltr" style="text-align: left;" trbidi="on">
Happy New Year to everyone!<br />
I published an article about row numbering and ranking with MySQL, <a href="https://mysqlserverteam.com/row-numbering-ranking-how-to-use-less-user-variables-in-mysql-queries/" target="_blank">here.</a></div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-73721252145711306062017-11-27T01:45:00.000-08:002017-11-27T01:45:01.822-08:00Sudoku Recursive Common Table Expression Solver<div dir="ltr" style="text-align: left;" trbidi="on">
Colin, from Percona, wrote a nice post demonstrating a funny use of common table expressions: <a href="https://www.percona.com/blog/2017/11/22/sudoku-recursive-common-table-expression-solver/" target="_blank">solving a Sudoku Puzzle</a>. Ok, that's not the type of problem which matters a lot to database users... but it still feels good when he mentions that MySQL's implementation (which I wrote), is the fastest of all three tested DBMSs 😊.</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com1tag:blogger.com,1999:blog-9191231274999121063.post-67728912752382105492017-10-02T04:19:00.000-07:002017-10-02T04:19:25.798-07:00Presentation at PerconaLive in Dublin, 25-27 Sep 2017<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
Back from the <a href="https://www.percona.com/live/e17/" target="_blank">PerconaLive</a> conference! It was nice meeting some old faces 😃 There I presented MySQL's Common Table Expressions; check out my <a href="https://www.slideshare.net/GuilhemBichot/ctes-percona-live2017" target="_blank">slides</a>, with example queries and some details about the implementation. I got two suggestions from the audience:<br />
<ol style="text-align: left;">
<li>add to views the same single-materialization which we have in CTEs now (slides 19-27)</li>
<li>support outer references in derived tables and CTEs.</li>
</ol>
</div>
The first point would speed up queries which reference a view more than once. The second point would make it easier to write certain queries, where a subquery depends on the outer query's current row and this subquery contains a CTE. Will I get to work on any of these? Time will tell...</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com4tag:blogger.com,1999:blog-9191231274999121063.post-88100807580429724062017-09-14T08:18:00.001-07:002017-09-14T08:18:33.315-07:00MySQL 8.0.1: [Recursive] Common Table Expressions in MySQL (CTEs), Part Four – depth-first or breadth-first traversal, transitive closure, cycle avoidance<div dir="ltr" style="text-align: left;" trbidi="on">
I published a new CTE blog <a href="http://mysqlserverteam.com/mysql-8-0-1-recursive-common-table-expressions-in-mysql-ctes-part-four-depth-first-or-breadth-first-traversal-transitive-closure-cycle-avoidance/" target="_blank">here</a>.</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-17968744316231607952016-10-25T12:07:00.001-07:002016-10-25T12:07:04.425-07:00MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs), Part Two and Three<div dir="ltr" style="text-align: left;" trbidi="on">
I did more CTE blogging: check it out <a href="http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes-part-two-how-to-generate-series/" target="_blank">here</a> and <a href="http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes-part-three-hierarchies/" target="_blank">here</a> !</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-60730565842420945372016-09-22T01:39:00.002-07:002016-09-22T01:43:53.184-07:00MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs)<div dir="ltr" style="text-align: left;" trbidi="on">
I realize that these last months I have not published anything... it's because I was quite busy developing a prominent feature: <i>[Recursive] Common Table Expressions</i>, also known as<br />
<ul>
<li><i>[recursive] CTE</i>,</li>
<li><i>[recursive] subquery factoring</i>,</li>
<li><i>WITH [RECURSIVE] clause.</i></li>
</ul>
This feature is available today, in a <a href="https://labs.mysql.com/">Labs release</a> of the MySQL Server. <br />
In <a href="http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/" target="_blank">my post here</a>, you will find more information: syntax, capacities, examples...<br />
Feels good to have this big piece of work finally out! </div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com1tag:blogger.com,1999:blog-9191231274999121063.post-36100413354564985902015-12-16T01:32:00.002-08:002015-12-16T01:33:49.191-08:00Using the aggregate functions ANY, SOME, EVERY with MySQL<div dir="ltr" style="text-align: left;" trbidi="on">
Hello! I have posted this entry on the MySQL Server team's blog:<br />
<br />
<span style="font-size: large;"><a href="http://mysqlserverteam.com/using-the-aggregate-functions-any-some-every-with-mysql/" target="_blank">Using the aggregate functions ANY, SOME, EVERY with MySQL</a> .</span></div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-70039548206354158802015-10-14T11:58:00.006-07:002015-10-14T12:00:05.733-07:00Storing UUID Values in MySQL Tables<div dir="ltr" style="text-align: left;" trbidi="on">
Hello! I have posted this entry on the MySQL Server team's blog:<br />
<h1 class="entry-title">
<span style="font-size: large;"><span style="font-family: inherit;"><a href="http://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/" target="_blank">Storing UUID Values in MySQL Tables</a></span></span></h1>
</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-83060942228311119002015-01-19T02:27:00.002-08:002015-01-19T02:27:58.557-08:00When ONLY_FULL_GROUP_BY Won’t See the Query Is Deterministic…Hi! Just to say I wrote this new <a href="http://mysqlserverteam.com/when-only_full_group_by-wont-see-the-query-is-deterministic/" target="_blank">post</a> about<i> only_full_group_by</i> tricks, on the MySQL Server team's blog.Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0tag:blogger.com,1999:blog-9191231274999121063.post-43473464836319761512014-12-10T05:36:00.004-08:002014-12-10T05:36:57.914-08:00MySQL 5.7: only_full_group_by Improved, Recognizing Functional Dependencies, Enabled by Default!I just posted on <a href="http://mysqlserverteam.com/mysql-5-7-only_full_group_by-improved-recognizing-functional-dependencies-enabled-by-default/" target="_blank">the Server team's blog</a>, an account on my recent <i>only_full_group_by</i> work. We have made significant improvements in 5.7.5, worth a look!Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com1tag:blogger.com,1999:blog-9191231274999121063.post-45444875523077109062014-05-21T08:07:00.000-07:002014-05-21T08:10:42.812-07:00Re-factoring some internals of prepared statements in 5.7[ <i>this is a re-posting of what I published on the <a href="http://mysqlserverteam.com/re-factoring-some-internals-of-prepared-statements-in-5-7/" target="_blank">MySQL server team blog</a> a few days ago</i> ]<br />
<i> </i> <br />
When the MySQL server receives a SELECT query, the query goes through several consecutive phases:<br />
<ul>
<li><b>parsing</b>: SQL words are recognized, the query is
split into different parts following the SQL grammar rules: a list of
selected expressions, a list of tables to read, a WHERE condition, …</li>
<li><b>resolution</b>: the output of the parsing stage
contains names of columns and names of tables. Resolution is about
making sense out of this. For example, in “<code>WHERE foo=3</code>“,
“foo” is a column name without a table name; by applying SQL name
resolution rules, we discover the table who contains “foo” (it can be
complicated if subqueries or outer joins are involved).</li>
<li><b>optimization</b>: finding the best way to read tables:
the best order of tables, and for each table, the best way to access it
(index lookup, index scan, …). The output is the so-called “plan”.</li>
<li><b>execution</b>: we read tables as dictated by the plan above, and send output rows to the client.</li>
</ul>
This design hasn’t changed since many many years. Originally, MySQL
didn’t have prepared statements. So it took a query, passed it through
the phases above, and then threw it away. This still happens nowadays
when using non-prepared statements.<br />
But, with a prepared statement, come more requirements. MySQL has to be
able to execute the query a second time (and a third time, and so on, at
the user’s will). No matter the used API (the <a href="http://dev.mysql.com/doc/refman/5.7/en/c-api-prepared-statement-function-overview.html" target="_blank">C API</a> or the <a href="http://dev.mysql.com/doc/refman/5.7/en/sql-syntax-prepared-statements.html" target="_blank">SQL API</a>), this usually looks like the following dialogue:<br />
<br />
<pre> # Assume that "t" contains 3 rows with values 1, 2, 98 in a column
user: PREPARE stmt FROM 'SELECT * FROM t';
MySQL: okay
user: EXECUTE stmt;
MYSQL: 1,2,98
user: INSERT INTO t VALUES (0);
user: EXECUTE stmt;
MYSQL: 1,2,98,0
</pre>
<br />
SQL mandates that syntax and semantic errors be reported by PREPARE,
without postponing them to EXECUTE. So, PREPARE needs to do at least
parsing and resolution, to spot any such error.<br />
Then there is an implementation question: when it receives <code>"EXECUTE stmt"</code>, how can MySQL remember what “stmt” is?<br />
The simplest, most obvious solution would be, during PREPARE, to simply store <code>"stmt" => "SELECT * FROM t"</code> in some session-local key/value dictionary. When <code>"EXECUTE stmt"</code>
comes, MySQL would find the query string in the dictionary, and go
through all stages (parsing, resolution, optimization, execution). And
when the next <code>"EXECUTE stmt"</code> comes, it would do the same again.<br />
Of course, that would not be efficient. At least parsing and resolution
theoretically don’t need to be re-done every time: query’s text doesn’t
change, so only one parsing is necessary; the structure of tables (names
and type of columns) doesn’t change, so only one resolution is
necessary (but see footnote 1). So, what PREPARE actually stores in the key/value dictionary, is <code>"stmt" => tree</code>,
where “tree” is the output of the resolution phase; it’s an in-memory
tree of C++ objects, each object representing an expression, a column of
a table, a table, a left join, a boolean condition, …<br />
With this design, only optimization and execution are repeated by each <code>"EXECUTE stmt"</code> (but see footnote 2).
Repeating optimization is sometimes a smart thing to do; imagine that
between the first and second EXECUTE, the table’s data distribution has
significantly changed (perhaps due to other clients), then the plan of
the first EXECUTE is maybe not the best one anymore, and searching for a
new plan makes sense. In other cases, repeating optimization is a
waste, because the old plan would be good enough. As of today, MySQL
always repeats optimization; making this more flexible would be an
interesting enhancement for the future, but today’s situation is: we
repeat optimization.<br />
<br />
Optimization does everything it can to speed up the query. Imagine
the query references 10 tables in its FROM clause. To find the best
order to access those 10 tables, we need to consider 10! possible
orders, roughly 3,000,000. It takes time to consider that many plans.
The optimizer has techniques to cut this down, let me show one. If one
table is referenced like this in the WHERE clause:<br />
<code>WHERE t.pk=39 AND ...</code><br />
then, because “t.pk” is a primary key, I know that at most one row of
this table will participate in the result, and it makes sense to read it
with a primary key lookup. And this read has to be done only once in
this execution. So let’s put “t” first in the plan, unconditionally.
This technique, known as “constant tables”, divides the number of
possible orders by a factor of 10 in our example (only 9 tables left to
order). Now, imagine that the WHERE clause looks like this:<br />
<code>WHERE t.pk=39 AND t.c1>t2.c2</code><br />
As soon as I read the row of “t” with a primary key lookup (looking up
39), I get access to the value of t.c1; say it is 100. My condition can
thus be simplified to:<br />
<code>WHERE 100>t2.c2</code><br />
Notice how the AND has disappeared.<br />
The inequality predicate “100>t2.c2″ is interesting: assuming that
t2.c2 is indexed, it means that a range scan is a possibility for
reading t2.<br />
This little example was meant to demonstrate that MySQL, in the
optimization phase, does data-based transformations to the query:
transformations which depend on the content of tables, and which apply
for one single execution; indeed, when the next execution comes, it may
be that the row of “t” with t.pk=39 now has t.c1=1200, so our condition
simplifications don’t hold anymore. Another way to say this, is:
data-based transformations must be non-permanent.<br />
So the AND operator which was killed above during the optimization of
the first EXECUTE, must be resurrected for the optimization of the
second EXECUTE.<br />
To achieve this, at the end of resolution we rename <i>tree</i> to <i>permanent_tree</i>, then we make a copy of it, which we name <i>tree</i>. The optimization phase has access to <i>tree</i>, and no access to <i>permanent_tree</i>. The optimization operates on <i>tree</i>, does data-based transformations on it. When the second EXECUTE starts, <i>permanent_tree</i> is fetched, copied into a new <i>tree</i>, on which the optimization operates. For example, the AND operator always exists in <i>permanent_tree</i>.<br />
<br />
What about permanent transformations? For example, consider this query:<br />
<code>SELECT * FROM t1 JOIN (t2 JOIN t3 ON t2.a=t3.a) ON t2.b+t3.b>t1.b<br />
WHERE t1.b*2>15;</code><br />
The result of this query is defined by the SQL standard as: get the
result of the most inner join (t2,t3), filter with t2.a=t3.a, then join
the result with t1, filter with t2.b+t3.b>t1.b, then filter with
t1.b*2>15. With this definition, the order of tables’ reading and the
order of application of conditions are constrained. For example,
reading t2 then t1 then t3 is not possible. But if we notice that the
query is equivalent to:<br />
<code>SELECT * FROM t1, t2, t3 WHERE t2.a=t3.a AND t2.b+t3.b>t1.b<br />
AND t1.b*2>15;</code><br />
then we have 3! = 6 possible orders for tables. More plans to examine,
but more potential for finding a good plan – it is very possible that
the best plan, the one yielding the fastest execution, is not among the
ones suggested by the SQL standard’s definition.<br />
The equivalence between both queries is semantics-based, not data-based.
Thus, the transformation from the first to the second query can be
permanent: it can be done once for all, not at every EXECUTE.<br />
Permanent transformations include:<br />
<ul>
<li>JOIN-to-WHERE, like in the example above</li>
<li>outer-join-to-inner-join, when there is something in the WHERE
clause which allows to deduce that NULL-complemented rows will actually
not participate in the result</li>
<li>semi-join, more or less merging an “IN (subquery)” predicate into the parent query</li>
<li>IN-to-EXISTS, rewriting “x IN (subquery)” to “EXISTS (modified subquery)”.</li>
</ul>
After giving all this background (phew…), I’m now ready to explain
one re-factoring which I did in MySQL 5.7.4. The situation of permanent
transformations in MySQL 5.6 is the following:<br />
<ol>
<li>IN-to-EXISTS : done in the resolution phase.</li>
<li>semi-join, JOIN-to-WHERE, outer-join-to-inner-join: done in the optimization phase.</li>
</ol>
Doing permanent transformations during optimization (step (2) above)
is surprising, as optimization is re-done at every EXECUTE. Fortunately,
we can internally know if an optimization is the first or not; if it’s
not, we skip step (2).<br />
So in the end, efficiency is guaranteed – permanent transformations are not re-done. Though this design agreeably looks strange.<br />
<br />
Now, putting pieces together, on the way to more strangeness: you remember that after the resolution phase, we produce <i>permanent_tree</i>
and save it for all future EXECUTEs. It thus contains permanent
transformations done in resolution, good. But, it does not contain those
done in optimization (semijoin …), as optimization runs after <i>permanent_tree</i> has been produced. Still we do want semi-join and friends to be reused by all future EXECUTEs, so they must be put in <i>permanent_tree</i>! So we get an even more strange design:<br />
<ol>
<li>resolution
<ol>
<li>do some permanent transformations (IN-to-EXISTS) on <i>tree</i></li>
<li>rename <i>tree</i> to <i>permanent_tree</i></li>
<li>copy <i>permanent_tree</i> to a new <i>tree</i> (for optimization to do nonpermanent transformations on it)</li>
</ol>
</li>
<li>optimization
<ol>
<li>if first optimization:
<ol>
<li>do some more permanent transformations (semijoin, etc) on <i>tree</i></li>
<li>throw <i>permanent_tree</i> away</li>
<li>copy <i>tree</i> to a new <i>permanent_tree</i></li>
</ol>
</li>
<li>do nonpermanent transformations on <i>tree</i></li>
</ol>
</li>
<li>execution
<ol>
<li>read tables and send rows</li>
<li>throw <i>tree</i> away</li>
</ol>
</li>
</ol>
This has a few nasty effects:<br />
<ul>
<li>permanent transformations are scattered: the boundary between phases
is blurred, which is a code readability problem, and an impediment for
future improvements in the Optimizer in general</li>
<li>efficiency loss: copying a tree in resolution phase takes some time and memory; in the first optimization we throw <i>permanent_tree</i> away and do the copying again… Bad.</li>
<li>real bugs. Yes, because what the resolution phase thinks of <i>permanent_tree</i>
is not true anymore: this object has been deleted and replaced by
another one, in the first optimization… so the next EXECUTE gets
confused…</li>
</ul>
In MySQL 5.7.4, I have moved all permanent transformations to where they belong, so now we have the more straightforward design:<br />
<ol>
<li>resolution
<ol>
<li>do <b>all</b> permanent transformations on <i>tree</i></li>
<li>rename <i>tree</i> to <i>permanent_tree</i></li>
</ol>
</li>
<li>optimization
<ol>
<li>copy <i>permanent_tree</i> to a new <i>tree</i> (for optimization to do nonpermanent transformations on it)</li>
<li>do nonpermanent transformations on <i>tree</i></li>
</ol>
</li>
<li>execution
<ol>
<li>read tables and send rows</li>
<li>throw <i>tree</i> away</li>
</ol>
</li>
</ol>
If we had done this re-factoring earlier, we would have saved us some
headaches. But, better late than never. It’s at least comforting that
nowadays we have time to do re-factoring, even if it means spending
several man-months on a task like that. It really took that much time:
the idea may look simple, but the devil was in the details as usual, and
if you add, on top of coding time, review time by two reviewers, and QA
time to verify that I didn’t break anything… But in my team we are
convinced that this is a long-term investment which will pay. Moreover,
when we do such re-factoring work, it gives us the occasion to remove
the little hacks which accumulated over time to work around the root
problem (which the re-factoring finally addresses). And that leads to
even more code simplification.<br />
<br />
There are a few more details in <a href="http://dev.mysql.com/worklog/task/?id=7082" target="_blank">the Worklog page</a> if you are interested (note the multiple tabs there).<br />
<br />
That’s all for today. If you reached this line, congratulations :-)<br />
<br />
<br />
<b id="footnote">Footnotes:</b><br />
<b>1</b>. I hear you say: “what if an ALTER TABLE is done
between PREPARE and EXECUTE?! This may change names and types of
columns!”. Yes, you are right, but there exists, somewhere in the
prepared statement subsystem, a detector for this; it runs when EXECUTE
starts, and if it spots a table structure change since the statement was
prepared, it throws away the prepared statement (as “out-of-date”),
silently re-prepares it and then docks into EXECUTE; the user does not
notice, except if she looks at com_stmt_reprepare in <code>SHOW STATUS</code>.
This detector is external to the Optimizer and runs before it; so if
the Optimizer is involved in EXECUTE, it can safely assume that nothing
has changed since PREPARE.<br />
<b>2</b>. Ok, a part of resolution is repeated too. It would be good to avoid that, in the future.Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com6tag:blogger.com,1999:blog-9191231274999121063.post-53348566500645248452013-11-18T06:17:00.000-08:002019-07-08T03:05:44.014-07:00WITH RECURSIVE and MySQL<div dir="ltr" style="text-align: left;" trbidi="on">
<i>[2017 update: MySQL 8.0.1 now features SQL-standard CTE syntax; more information is <a href="http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/" target="_blank">here</a> ; the entry below, from 2013, shows how to work around the absence of CTEs in older MySQL versions.]</i><br />
<br />
If you have been using certain DBMSs, or reading recent versions of the SQL standard, you are probably aware of the so-called <i>"WITH clause"</i> of SQL.<br />
Some call it <a href="http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm#i2161315" target="_blank">Subquery Factoring</a>. Others call it <a href="http://msdn.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx" target="_blank">Common Table Expression</a> (CTE). In its simplest form, this feature is a kind of "boosted derived table".<br />
<br />
Assume that a table T1 has three columns:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CREATE TABLE T1(
YEAR INT, # 2000, 2001, 2002 ...
MONTH INT, # January, February, ...
SALES INT # how much we sold on that month of that year
);
</span></pre>
</div>
Now I want to know the sales trend (increase/decrease), year after year:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
SELECT D1.YEAR, (CASE WHEN D1.S>D2.S THEN 'INCREASE' ELSE 'DECREASE' END) AS TREND
FROM
(SELECT YEAR, SUM(SALES) AS S FROM T1 GROUP BY YEAR) AS D1,
(SELECT YEAR, SUM(SALES) AS S FROM T1 GROUP BY YEAR) AS D2
WHERE D1.YEAR = D2.YEAR-1;
</span></pre>
</div>
Both derived tables are based on the same subquery text, but usually a DBMS is not smart enough to recognize it. Thus, it will evaluate "SELECT YEAR, SUM(SALES)... GROUP BY YEAR"
twice! A first time to fill D1, a second time to fill D2. This limitation is sometimes stated as <i>"it's not possible to refer to a derived table twice in the same query"</i>.<br />
Such double evaluation can lead to a serious performance problem. Using WITH, this limitation does not exist, and the following statement evaluates the subquery only once:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
WITH D AS (SELECT YEAR, SUM(SALES) AS S FROM T1 GROUP BY YEAR)
SELECT D1.YEAR, (CASE WHEN D1.S>D2.S THEN 'INCREASE' ELSE 'DECREASE' END) AS TREND
FROM
D AS D1,
D AS D2
WHERE D1.YEAR = D2.YEAR-1;
</span></pre>
</div>
This already demonstrates one benefit of WITH.<br />
In MySQL, WITH is not yet supported. But it can be emulated with a view:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CREATE VIEW D AS (SELECT YEAR, SUM(SALES) AS S FROM T1 GROUP BY YEAR);
SELECT D1.YEAR, (CASE WHEN D1.S>D2.S THEN 'INCREASE' ELSE 'DECREASE' END) AS TREND
FROM
D AS D1,
D AS D2
WHERE D1.YEAR = D2.YEAR-1;
DROP VIEW D;
</span></pre>
</div>
Instead of a view, I could as well create D as a normal table. But not as a temporary table, because in MySQL a temporary table cannot be referred twice in the same query, as mentioned in <a href="http://dev.mysql.com/doc/refman/5.7/en/temporary-table-problems.html" target="_blank">the manual</a>.<br />
<br />
After this short introduction, showing the simplest form of WITH, I would like to turn to the <i>more complex form of WITH: the RECURSIVE form</i>.<br />
According to the SQL standard, to use the recursive form, you should write WITH RECURSIVE. However, looking at some other DBMSs, they seem to not require the RECURSIVE word.<br />
WITH RECURSIVE is a powerful construct. For example, it can do the same job as Oracle's CONNECT BY clause (you can check out <a href="https://web.archive.org/web/20151116062928/http://rajeshwaranbtech.blogspot.co.uk/2012/12/recursive-with-clause-to-implement.html" target="_blank">some example conversions</a> between both constructs).<br />
Let's walk through an example, to understand what WITH RECURSIVE does. <br />
<br />
Assume you have a table of employees (this is a <i>very</i> classical example of WITH RECURSIVE):<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID)
);
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL),
(198, "John", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(692, "Tarek", 333);
</span></pre>
</div>
In other words, Yasmina is CEO, John and Tarek report to her. Pedro reports to John, Sarah and Pierre report to Pedro.<br />
In a big company, they would be thousands of rows in this table.<br />
<br />
Now, let's say that you would like to know, for each employee: "how many people are, directly and indirectly, reporting to him/her"? Here is how I would do it. First, I would make a list of people who are not managers: with a subquery I get the list of all managers, and using <i>NOT IN (subquery)</i> I get the list of all non-managers:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
SELECT ID, NAME, MANAGER_ID, 0 AS REPORTS
FROM EMPLOYEES
WHERE ID NOT IN (SELECT MANAGER_ID FROM EMPLOYEES WHERE MANAGER_ID IS NOT NULL);
</span></pre>
</div>
Then I would insert the results into a new table named EMPLOYEES_EXTENDED; EXTENDED stands for "extended with more information", the new information being the fourth column named REPORTS: it is a count of people who are reporting directly or indirectly to the employee. Because we have listed people who are not managers, they have a value of 0 in the REPORTS column.<br />
Then, we can produce the rows for "first level" managers (the direct managers of non-managers):<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
SELECT M.ID, M.NAME, M.MANAGER_ID, SUM(1+E.REPORTS) AS REPORTS
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID
GROUP BY M.ID, M.NAME, M.MANAGER_ID;
</span></pre>
</div>
Explanation: for a row of M (that is, for an employee), the JOIN will produce zero or more rows, one per non-manager directly reporting to the employee.<br />
Each such non-manager contributes to the value of REPORTS for his manager, through two numbers: 1 (the non-manager himself), and the number of direct/indirect reports of the non-manager (i.e. the value of REPORTS for the non-manager).<br />
Then I would empty EMPLOYEES_EXTENDED, and fill it with the rows produced just above, which describe the first level managers.<br />
Then the same query should be run again, and it would produce information about the "second level" managers. And so on.<br />
Finally, at one point Yasmina will be the only row of EMPLOYEES_EXTENDED, and when we run the above SELECT again, the JOIN will produce no rows, because E.MANAGER_ID will be NULL (she's the CEO). We are done.<br />
<br />
It's time for a recap: EMPLOYEES_EXTENDED has been a kind of "temporary buffer", which has <i>successively </i>held non-managers, first level managers, second level managers, etc. We have used <i>recursion</i>. The answer to the original problem is: the <i>union</i> of all the successive content of EMPLOYEES_EXTENDED.<br />
Non-managers have been the start of the recursion, which is usually called "the anchor member" or "the seed". The SELECT query which moves from one step of recursion to the next one, is the "recursive member". The complete statement looks like this:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
WITH RECURSIVE
# The temporary buffer, also used as UNION result:
EMPLOYEES_EXTENDED
AS
(
# The seed:
SELECT ID, NAME, MANAGER_ID, 0 AS REPORTS
FROM EMPLOYEES
WHERE ID NOT IN (SELECT MANAGER_ID FROM EMPLOYEES WHERE MANAGER_ID IS NOT NULL)
UNION ALL
# The recursive member:
SELECT M.ID, M.NAME, M.MANAGER_ID, SUM(1+E.REPORTS) AS REPORTS
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID
GROUP BY M.ID, M.NAME, M.MANAGER_ID
)
# what we want to do with the complete result (the UNION):
SELECT * FROM EMPLOYEES_EXTENDED;
</span></pre>
</div>
MySQL does not yet support WITH RECURSIVE, but it is possible to code a generic stored procedure which can easily emulate it. Here is how you would call it:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CALL WITH_EMULATOR(
"EMPLOYEES_EXTENDED",
"
SELECT ID, NAME, MANAGER_ID, 0 AS REPORTS
FROM EMPLOYEES
WHERE ID NOT IN (SELECT MANAGER_ID FROM EMPLOYEES WHERE MANAGER_ID IS NOT NULL)
",
"
SELECT M.ID, M.NAME, M.MANAGER_ID, SUM(1+E.REPORTS) AS REPORTS
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID
GROUP BY M.ID, M.NAME, M.MANAGER_ID
",
"SELECT * FROM EMPLOYEES_EXTENDED",
0,
""
);
</span></pre>
</div>
You can recognize, as arguments of the stored procedure, every member of the WITH standard syntax: name of the temporary buffer, query for the seed, query for the recursive member, and what to do with the complete result. The last two arguments - 0 and the empty string - are details which you can ignore for now.<br />
<br />
Here is the result returned by this stored procedure:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
+------+---------+------------+---------+
| ID | NAME | MANAGER_ID | REPORTS |
+------+---------+------------+---------+
| 72 | Pierre | 29 | 0 |
| 692 | Tarek | 333 | 0 |
| 4610 | Sarah | 29 | 0 |
| 29 | Pedro | 198 | 2 |
| 333 | Yasmina | NULL | 1 |
| 198 | John | 333 | 3 |
| 333 | Yasmina | NULL | 4 |
+------+---------+------------+---------+
7 rows in set
</span></pre>
</div>
Notice how Pierre, Tarek and Sarah have zero reports, Pedro has two, which looks correct... However, Yasmina appears in two rows! Odd? Yes and no. Our algorithm starts from non-managers, the "leaves" of the tree (Yasmina being the root of the tree). Then our algorithm looks at first level managers, the direct parents of leaves. Then at second level managers. But Yasmina is both a first level manager (of the nonmanager Tarek) and a third level manager (of the nonmanagers Pierre and Sarah). That's why she appears twice in the final result: once for the "tree branch" which ends at leaf Tarek, once for the tree branch which ends at leaves Pierre and Sarah. The first tree branch contributes 1 direct/indirect report. The second tree branch contributes 4. The right number, which we want, is the sum of the two: 5. Thus we just need to change the final query, in the CALL:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CALL WITH_EMULATOR(
"EMPLOYEES_EXTENDED",
"
SELECT ID, NAME, MANAGER_ID, 0 AS REPORTS
FROM EMPLOYEES
WHERE ID NOT IN (SELECT MANAGER_ID FROM EMPLOYEES WHERE MANAGER_ID IS NOT NULL)
",
"
SELECT M.ID, M.NAME, M.MANAGER_ID, SUM(1+E.REPORTS) AS REPORTS
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID
GROUP BY M.ID, M.NAME, M.MANAGER_ID
",
"
SELECT ID, NAME, MANAGER_ID, SUM(REPORTS)
FROM EMPLOYEES_EXTENDED
GROUP BY ID</span><span style="font-size: small;"><span style="font-size: small;">, NAME, MANAGER_ID</span>
",
0,
""
);
</span></pre>
</div>
And here is finally the proper result:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
+------+---------+------------+--------------+
| ID | NAME | MANAGER_ID | SUM(REPORTS) |
+------+---------+------------+--------------+
| 29 | Pedro | 198 | 2 |
| 72 | Pierre | 29 | 0 |
| 198 | John | 333 | 3 |
| 333 | Yasmina | NULL | 5 |
| 692 | Tarek | 333 | 0 |
| 4610 | Sarah | 29 | 0 |
+------+---------+------------+--------------+
6 rows in set
</span></pre>
</div>
Let's finish by showing the body of the stored procedure. You will notice that it does heavy use of dynamic SQL, thanks to prepared statements. Its body does not depend on the particular problem to solve, it's reusable as-is for other WITH RECURSIVE use cases. I have added comments inside the body, so it should be self-explanatory. If it's not, feel free to drop a comment on this post, and I will explain further. Note that it uses temporary tables internally, and the first thing it does is dropping any temporary tables with the same names.<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
# Usage: the standard syntax:
# WITH RECURSIVE recursive_table AS
# (initial_SELECT
# UNION ALL
# recursive_SELECT)
# final_SELECT;
# should be translated by you to
# CALL WITH_EMULATOR(recursive_table, initial_SELECT, recursive_SELECT,
# final_SELECT, 0, "").
# ALGORITHM:
# 1) we have an initial table T0 (actual name is an argument
# "recursive_table"), we fill it with result of initial_SELECT.
# 2) We have a union table U, initially empty.
# 3) Loop:
# add rows of T0 to U,
# run recursive_SELECT based on T0 and put result into table T1,
# if T1 is empty
# then leave loop,
# else swap T0 and T1 (renaming) and empty T1
# 4) Drop T0, T1
# 5) Rename U to T0
# 6) run final select, send relult to client
# This is for *one* recursive table.
# It would be possible to write a SP creating multiple recursive tables.
delimiter |
CREATE PROCEDURE WITH_EMULATOR(
recursive_table varchar(100), # name of recursive table
initial_SELECT varchar(65530), # seed a.k.a. anchor
recursive_SELECT varchar(65530), # recursive member
final_SELECT varchar(65530), # final SELECT on UNION result
max_recursion int unsigned, # safety against infinite loop, use 0 for default
create_table_options varchar(65530) # you can add CREATE-TABLE-time options
# to your recursive_table, to speed up initial/recursive/final SELECTs; example:
# "(KEY(some_column)) ENGINE=MEMORY"
)
BEGIN
declare new_rows int unsigned;
declare show_progress int default 0; # set to 1 to trace/debug execution
declare recursive_table_next varchar(120);
declare recursive_table_union varchar(120);
declare recursive_table_tmp varchar(120);
set recursive_table_next = concat(recursive_table, "_next");
set recursive_table_union = concat(recursive_table, "_union");
set recursive_table_tmp = concat(recursive_table, "_tmp"); </span></pre>
<pre><span style="font-size: small;"> # Cleanup any previous failed runs
SET @str =
CONCAT("DROP TEMPORARY TABLE IF EXISTS ", recursive_table, ",",
recursive_table_next, ",", recursive_table_union,
",", recursive_table_tmp);
PREPARE stmt FROM @str;
EXECUTE stmt; </span></pre>
<pre><span style="font-size: small;"> # If you need to reference recursive_table more than
# once in recursive_SELECT, remove the TEMPORARY word.
SET @str = # create and fill T0
CONCAT("CREATE TEMPORARY TABLE ", recursive_table, " ",
create_table_options, " AS ", initial_SELECT);
PREPARE stmt FROM @str;
EXECUTE stmt;
SET @str = # create U
CONCAT("CREATE TEMPORARY TABLE ", recursive_table_union, " LIKE ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
SET @str = # create T1
CONCAT("CREATE TEMPORARY TABLE ", recursive_table_next, " LIKE ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
if max_recursion = 0 then
set max_recursion = 100; # a default to protect the innocent
end if;
recursion: repeat
# add T0 to U (this is always UNION ALL)
SET @str =
CONCAT("INSERT INTO ", recursive_table_union, " SELECT * FROM ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
# we are done if max depth reached
set max_recursion = max_recursion - 1;
if not max_recursion then
if show_progress then
select concat("max recursion exceeded");
end if;
leave recursion;
end if;
# fill T1 by applying the recursive SELECT on T0
SET @str =
CONCAT("INSERT INTO ", recursive_table_next, " ", recursive_SELECT);
PREPARE stmt FROM @str;
EXECUTE stmt;
# we are done if no rows in T1
select row_count() into new_rows;
if show_progress then
select concat(new_rows, " new rows found");
end if;
if not new_rows then
leave recursion;
end if;
# Prepare next iteration:
# T1 becomes T0, to be the source of next run of recursive_SELECT,
# T0 is recycled to be T1.
SET @str =
CONCAT("ALTER TABLE ", recursive_table, " RENAME ", recursive_table_tmp);
PREPARE stmt FROM @str;
EXECUTE stmt;
# we use ALTER TABLE RENAME because RENAME TABLE does not support temp tables
SET @str =
CONCAT("ALTER TABLE ", recursive_table_next, " RENAME ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
SET @str =
CONCAT("ALTER TABLE ", recursive_table_tmp, " RENAME ", recursive_table_next);
PREPARE stmt FROM @str;
EXECUTE stmt;
# empty T1
SET @str =
CONCAT("TRUNCATE TABLE ", recursive_table_next);
PREPARE stmt FROM @str;
EXECUTE stmt;
until 0 end repeat;
# eliminate T0 and T1
SET @str =
CONCAT("DROP TEMPORARY TABLE ", recursive_table_next, ", ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
# Final (output) SELECT uses recursive_table name
SET @str =
CONCAT("ALTER TABLE ", recursive_table_union, " RENAME ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
# Run final SELECT on UNION
SET @str = final_SELECT;
PREPARE stmt FROM @str;
EXECUTE stmt;
# No temporary tables may survive:
SET @str =
CONCAT("DROP TEMPORARY TABLE ", recursive_table);
PREPARE stmt FROM @str;
EXECUTE stmt;
# We are done :-)
END|
delimiter ;
</span></pre>
</div>
<br />
<b>Update from 2017:</b><br />
<br />
MySQL now supports recursive CTEs natively, since version 8.0.1 (see <a href="http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/" target="_blank">here</a>).<br />
<br />
However, while working on the development of this feature, I discovered something which I didn't know back when I wrote this blog in 2013: that GROUP BY is not allowed in the recursive member, neither in the SQL Standard nor in other main DBMSs. So I have not allowed it in MySQL either. That means that my proposed query using WITH RECURSIVE, above, has to be modified: aggregation has to be done outside of the CTE's generation. So here is a modified query: starting with all employees of the company, the first iteration connects them to their boss, producing rows of the form "employee X is managed by Y" (X and Y are IDs); the second iteration connects that row to the boss of Y, producing rows of the form "employee X is indirectly managed by Z"; the third iteration connects it to the boss of Z: "employee X is indirectly managed by T". When the CTE is done, by reading it we can find the list of all people who are indirectly managed by Z (for example), and, with grouping, produce a group of Z and its count of reports; that's the job of the second CTE, which is non-recursive (and can thus use aggregation). Finally, we join that with the table of employees to get names instead of IDs, for clarity. We use a left join so we get rows for non-managers too, and COALESCE for those get a count of reports of 0 instead of NULL (easier to read). All in all:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
WITH RECURSIVE
EMPLOYEES_EXTENDED
AS
(
SELECT ID, NAME, MANAGER_ID
FROM EMPLOYEES
UNION ALL
SELECT E.ID, E.NAME, M.MANAGER_ID
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID
),
REPORTS_COUNT (ID, REPORTS)
AS
( SELECT MANAGER_ID, COUNT(*) FROM EMPLOYEES_EXTENDED GROUP BY MANAGER_ID )
SELECT EMPLOYEES.*, COALESCE(REPORTS,0) FROM EMPLOYEES LEFT JOIN REPORTS_COUNT
ON EMPLOYEES.ID = REPORTS_COUNT.ID;
</span></pre>
</div>
and here is the result:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
+------+---------+------------+---------------------+
| ID | NAME | MANAGER_ID | COALESCE(REPORTS,0) |
+------+---------+------------+---------------------+
| 29 | Pedro | 198 | 2 |
| 72 | Pierre | 29 | 0 |
| 198 | John | 333 | 3 |
| 333 | Yasmina | NULL | 5 |
| 692 | Tarek | 333 | 0 |
| 4610 | Sarah | 29 | 0 |
+------+---------+------------+---------------------+
6 rows in set (0.04 sec)</span></pre>
</div>
And here is the equivalent using WITH_EMULATOR if your MySQL is older than 8.0:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CALL WITH_EMULATOR(
"EMPLOYEES_EXTENDED",
"SELECT ID, NAME, MANAGER_ID FROM EMPLOYEES",
"SELECT E.ID, E.NAME, M.MANAGER_ID
FROM EMPLOYEES M JOIN EMPLOYEES_EXTENDED E ON M.ID=E.MANAGER_ID",
"SELECT EMPLOYEES.*, COALESCE(REPORTS,0) FROM
EMPLOYEES LEFT JOIN
( SELECT MANAGER_ID AS ID, COUNT(*) AS REPORTS FROM
EMPLOYEES_EXTENDED GROUP BY MANAGER_ID ) AS REPORTS_COUNT
ON EMPLOYEES.ID = REPORTS_COUNT.ID",
0,
""
);
</span></pre>
</div>
</div>
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com35tag:blogger.com,1999:blog-9191231274999121063.post-46924718879678772712013-02-19T08:12:00.002-08:002013-02-19T08:13:41.450-08:00Fixing awkward TIMESTAMP behaviors...There are great features in MySQL 5.6. But not only that. We also tried to correct some old behaviors and limitations which, over the years, have shown to irritate our Community. The behavior of TIMESTAMP columns is one of them.<br />
<br />
My colleague Martin Hansson did most of the work and summarized it well in his <a href="http://optimize-this.blogspot.co.uk/2012/04/datetime-default-now-finally-available.html" target="_blank">blog</a>. Thanks to him, since MySQL 5.6.5, it's possible to declare more than one TIMESTAMP column with the DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP attributes. And it's possible to have DATETIME columns with such attributes. Two limitations lifted!<br />
<br />
But that is not the end of the story. TIMESTAMP was still special. Unlike other datatypes, if not declared with the NULL or NOT NULL attributes, it would automatically get NOT NULL. And the first TIMESTAMP column of the table would automatically get DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. Many people found these behaviors odd. For them, my colleague Gopal Shankar added a new option to the server, documented <a href="http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp" target="_blank">here</a>. The old behavior, now depreciated, is still available if the option is not used.<br />
<br />
But that is still not the end of the story. "CREATE TABLE ... SELECT" dealt strangely with columns having DEFAULT CURRENT_TIMESTAMP. Just look at this simple example, which shows results with version 5.6.9:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CREATE TABLE t1 (
t1a TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
t1b TIMESTAMP DEFAULT '2000-01-01 01:00:00'
);
INSERT INTO t1 VALUES ();
SELECT * FROM t1;
t1a t1b
2013-02-19 18:12:41 2000-01-01 01:00:00
</span></pre>
</div>
The content of t1 is as expected (remember that "()" in INSERT means "insert columns' defaults").<br />
Now let's create a second table, which should have four columns: first, two extra columns t2a and t2b, then two columns filled with values selected from t1:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
CREATE TABLE t2 (
t2a TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
t2b TIMESTAMP DEFAULT '2000-01-01 02:00:00'
) SELECT * FROM t1;
SHOW CREATE TABLE t2;
Table Create Table
t2 CREATE TABLE `t2` (
`t2a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`t2b` timestamp NOT NULL DEFAULT '2000-01-01 02:00:00',
`t1a` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`t1b` timestamp NOT NULL DEFAULT '2000-01-01 01:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
</span></pre>
</div>
See how t2.t1b inherited the DEFAULT attribute of its source column t1.t1b: <i>DEFAULT '2000-01-01 01:00:00'</i>, as expected, and as the documentation says. But! t2.t1a did NOT inherit the DEFAULT attribute of its source column t1.t1a: it rather got the strange <i>DEFAULT '0000-00-00 00:00:00'</i> (year zero...). That's one first problem: constant defaults are properly inherited, but function defaults are not.<br />
Now let's look at the content of t2:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
SELECT * FROM t2;
t2a t2b t1a t1b
0000-00-00 00:00:00 2000-01-01 02:00:00 2013-02-19 18:12:41 2000-01-01 01:00:00
</span></pre>
</div>
The last two columns, which have their source in t1, have the same value as their source column, which is correct.<br />
The two first (extra) columns did not have their values specified, so their default should have been inserted. That's what nicely happened for t2b. But not for t2a: year zero again! That's one second problem: an extra column is not filled with its default if it is a function default.<br />
<br />
I grouped those two problems under the name of Bug#16163936, and fixed them. Here are the results in 5.6.10:
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;">
<pre><span style="font-size: small;">
SHOW CREATE TABLE t2;
Table Create Table
t2 CREATE TABLE `t2` (
`t2a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`t2b` timestamp NOT NULL DEFAULT '2000-01-01 02:00:00',
`t1a` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`t1b` timestamp NOT NULL DEFAULT '2000-01-01 01:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT * FROM t2;
t2a t2b t1a t1b
2013-02-19 18:27:39 2000-01-01 02:00:00 2013-02-19 18:27:39 2000-01-01 01:00:00
</span></pre>
</div>
All correct!Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com3tag:blogger.com,1999:blog-9191231274999121063.post-62193886290661209862012-10-12T03:02:00.000-07:002012-10-12T03:04:13.177-07:00Cost-based choice between subquery materialization and EXISTSIn <a href="http://guilhembichot.blogspot.co.uk/2012/04/faster-subqueries-with-materialization.html">a previous post</a>, I had demonstrated how <a href="http://dev.mysql.com/doc/refman/5.6/en/subquery-materialization.html"><i>subquery materialization</i></a>, introduced in MySQL 5.6.5, improves the performance of certain queries, like query Q16 of DBT3. Such improvement was easily explained:<br />
<ul>
<li>Subquery materialization has a high start up cost (it needs to create and fill the temporary table).</li>
<li>But afterwards it has fast lookups (temporary table has a hash index, no duplicates, and is in memory).</li>
<li>In other words, compared to EXISTS, the first evaluation of the IN predicate is slow (high start up cost) and all following evaluations are fast (just a hash lookup).</li>
<li>In the DBT 3 setup, one outer table (named "part") has 200,000 rows, so there are 200,000 evaluations of IN, thus subquery materialization wins over EXISTS because the time it loses in the first evaluation is more than compensated by the many faster following evaluations.</li>
</ul>
However, if there were only few outer rows, then subquery materialization should logically be <i>slower</i> than EXISTS (the compensation would not happen anymore)... MySQL 5.6.5, by blindly always choosing subquery materialization, takes the risk of making certain queries slower. There needs to be a cost-based choice between the two strategies, to pick the best, depending on the situation! That is what I have implemented in MySQL 5.6.7.<br />
<br />
To show it in action, I will use query Q16 again. First I will run it with the normal "part" table which has 200,000 rows. Then I will reduce this table to only 200 rows, and run the query again. Each time, I will run EXPLAIN to see what subquery strategy is chosen by the optimizer. I will also, by tweaking the optimizer_switch variable, force the optimizer to use the other strategy which it didn't like, in order to verify that it is indeed worse.<br />
<br />
For brevity, let me jump directly to the results, obtained with a release build of MySQL 5.6.7 on my machine:<br />
<br />
<table border="1">
<tbody>
<tr>
<td><b>Rows</b> <b>in</b> <b><i>part</i></b></td>
<td><b>Optimizer</b> <b>chooses</b></td>
<td><b>Execution</b> <b>time</b></td>
<td><b>If</b> <b>I</b> <b>force</b> <b>alternative</b></td>
</tr>
<tr>
<td>200,000</td>
<td>Materialization</td>
<td>550 ms</td>
<td>830 ms</td>
</tr>
<tr>
<td>200</td>
<td>EXISTS</td>
<td>1 ms</td>
<td>10 ms </td>
</tr>
</tbody></table>
<br />
We can see that in both cases the optimizer has made the right choice!
Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com4tag:blogger.com,1999:blog-9191231274999121063.post-83421659283582031002012-04-10T08:55:00.010-07:002012-04-12T01:06:28.631-07:00Faster subqueries with materializationIn <a href="http://guilhembichot.blogspot.com/2011/11/understanding-uniquesubquery.html">a previous post</a>, I analyzed how a query of the famous DBT3 benchmark was<br />
optimized by MySQL. It was this query, named "Q16" in the DBT3 jargon:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#23'
and p_type not like 'LARGE PLATED%'
and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
and <span style="color: red;">ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)</span>
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
</span></pre></div><br />
Here is a brief recap of conclusions I had drawn:<br />
<ul><li style="text-align: justify;">for this query, MySQL tranforms the IN condition to EXISTS and then<br />
evaluates it with the "unique_subquery" technique, which does an index<br />
lookup into the subquery's table.</li>
<li style="text-align: justify;">IN is evaluated 120,000 times (once per combined row of the outer tables).</li>
<li style="text-align: justify;">The total execution time of query Q16 is 0.65 seconds.</li>
</ul>If you look at the original subquery, before IN becomes EXISTS, you will see that it's <i>not correlated</i>, which means that it does not mention columns of tables of the top query's FROM clause (<i>'partsupp'</i> and '<i>part'</i>). Thus, its resultset is constant throughout execution of the entire top query; here it is:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
mysql> select
-> s_suppkey
-> from
-> supplier
-> where
-> s_comment like '%Customer%Complaints%';
+-----------+
| s_suppkey |
+-----------+
| 358 |
| 2820 |
| 3804 |
| 9504 |
+-----------+
4 rows in set (0.00 sec)
</span></pre></div><br />
The transformation to EXISTS, because it injects equalities like<br />
<i> `partsupp`.`ps_suppkey` = supplier`.`s_suppkey`</i><br />
into the subquery's WHERE clause, makes the subquery correlated: it thus has to be executed 120,000 times, so we do 120,000 times this:<br />
<ul><li>an index lookup in <i>'supplier'</i> (which has 10,000 rows)</li>
<li>a test of the found row(s) against the LIKE condition.</li>
</ul><br />
Intuitively, determining the 4-row resultset once for all, and injecting it into the top query should yield better performance - it is fast to evaluate<br />
<i> ps_suppkey not in (358, 2820, 3804, 9504) .</i><br />
Starting from the just released MySQL 5.6.5, this transformation is automatically done by the Optimizer, and is called <b>subquery materialization</b>. The subquery's resultset is determined once for all and stored into an in-memory temporary table. If the temporary table has only 4 rows as in our example, searching for a match in it can be done with a scan; but if it had more rows, a hash index would help greatly. So a hash index is always created on the temporary table's columns. Last, this index is unique: there is no point in storing duplicates, and they would make the table bigger. After this one-time setup has been completed, each evaluation of IN simply does a hash index lookup into the 4-row temporary table.<br />
<br />
My former colleague Timour Katchaounov started developing this feature years ago, when working for MySQL/Sun. In the last months, after a round of intensive QA, we have fixed some last bugs in it, in preparation for releasing in 5.6.5. But the feature still had one limitation: it was applicable only if IN was placed at certain positions in the query. For example, it couldn't be used with NOT IN. And query Q16 has a NOT IN! so the Optimizer could not apply subquery materialization to it, and was thus stuck with using EXISTS. Sad!<br />
<br />
Why it could not work with NOT IN, is not very easy to explain. It has to do with NULL values, because they sometimes prevent using the hash index. To give an idea, look at this:<br />
<i>(NULL, 1) NOT IN (SELECT ...)</i><br />
Per the SQL standard, if the subquery's resultset contains at least one row of the form (x,1) where x is any number (or NULL), then the IN condition is neither TRUE, nor FALSE, it is <b>UNKNOWN</b>. So is the NOT IN condition, because it is the negation of IN, and NOT(UNKNOWN) is UNKNOWN.<br />
Here are example of such rows: (NULL,1), (421,1), (236,1), (5329,1), ad infinitam.<br />
We can see that those rows will not be found by a lookup in the hash index: this index is defined on the two columns, it has a usual "prefix-only" behaviour, which means that it cannot be used to search for "any value in first column, then 1 in second column". As long as the sentence starts with "any value in first column" a table scan is necessary; we should read each row of the temporary table and compare its second column with 1 until we find a matching row. And that:<br />
<ul><li>will drag subquery materialization's performance down</li>
<li>will drag subquery materialization's code complexity up.</li>
</ul>And I have even not covered all problems here: there can be more than two columns, there can be more than one NULL in the left argument of IN, there can also be NULLs inside the subquery's resultset.<br />
<br />
In some lucky cases, the scan can be avoided, for example:<br />
<i> SELECT * FROM table1 WHERE (a,b) IN (SELECT ...)</i><br />
If (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE. It will be UNKNOWN if the subquery's resultset contains one (x,1) as seen above; otherwise it will be FALSE. No matter what, it will not be TRUE, and this is all that WHERE wants to know - (a,b) can thus be rejected without doing a scan.<br />
Now, for<br />
<i> SELECT * FROM table1 WHERE (a,b) NOT IN (SELECT ...)</i><br />
i.e.<br />
<i> SELECT * FROM table1 WHERE NOT ((a,b) IN (SELECT ...))</i><br />
things are different: if (a,b) is (NULL,1), the IN will be UNKNOWN or FALSE, as we said. So NOT IN will be UNKNOWN or TRUE. "Hum, can you be more specific?? I need to know if it's TRUE", will the WHERE evaluation code ask. Then we have to do the scan...<br />
<br />
So now you understand why subquery materialization was restricted to certain placement of IN.<br />
<br />
What I have done recently is to lift this restriction in two simple, however common, cases:<br />
<ol><li>If all outer and inner expressions are not nullable, then no NULL can get in the way, so there is no problem.</li>
<li>If there is only one outer expression (and thus there is only one inner expression), figuring out the correct TRUE/FALSE/UNKNOWN answer is immediate. Understanding why... is left as an exercise to the reader :-)</li>
</ol>Those two cases are independent: as long as one is satisfied, subquery materialization can apply to IN, no matter where it is placed (NOT IN, etc).<br />
<br />
It turned out to be very easy to code this: I had a working prototype in an afternoon.<br />
<br />
Q16 happens to meet the criteria of both cases: columns <i>'ps_suppkey</i>' and <i>'s_suppkey'</i> are declared NOT NULL (first case), and the subquery has only one outer and one inner expression (second case).<br />
<br />
So nowadays MySQL can, and does, use subquery materialization for query Q16; thanks to it, the execution time is down from 0.65 seconds to 0.47 seconds, which is a 25% improvement!<br />
<br />
The new technique is visible in EXPLAIN. I want to first show how EXPLAIN was with the EXISTS transformation, so I temporarily disable subquery materialization, then run EXPLAIN, enable subquery materialization, run EXPLAIN:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
mysql> set optimizer_switch='materialization=off';
mysql> explain ...
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | <span style="color: red;">DEPENDENT SUBQUERY</span> | supplier | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
mysql> set optimizer_switch='materialization=default'; # 'on' would work too
mysql> explain ...
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199742 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | <span style="color: red;">SUBQUERY</span> | supplier | ALL | NULL | NULL | NULL | NULL | 10113 | Using where |
+----+-------------+----------+------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
</span></pre></div>If you compare, the big difference is that the third line says SUBQUERY and not DEPENDENT SUBQUERY anymore. DEPENDENT SUBQUERY means that it has be executed once per row of the top query. SUBQUERY means that it is executed only once.<br />
<a href="http://glebshchepa.blogspot.com/2012/04/optimizer-new-explain-formatjson.html">EXPLAIN FORMAT=JSON</a>, another new feature in MySQL 5.6.5, shows more details of materialization:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
{
"query_block": {
"select_id": 1,
"ordering_operation": {
"using_temporary_table": true,
"using_filesort": true,
"grouping_operation": {
"using_filesort": true,
"nested_loop": [
{
"table": {
"table_name": "part",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows": 199742,
"filtered": 100,
"attached_condition": "((`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)))"
}
},
{
"table": {
"table_name": "partsupp",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"i_ps_partkey"
],
"key": "i_ps_partkey",
"key_length": "4",
"ref": [
"dbt3.part.p_partkey"
],
"rows": 2,
"filtered": 100,
"using_index": true,
"attached_condition": "(not(< in_optimizer >(`dbt3`.`partsupp`.`ps_suppkey`,`dbt3`.`partsupp`.`ps_suppkey` in ( <span style="color: red;">< materialize ></span> (select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') ), <span style="color: red;">< primary_index_lookup ></span>(`dbt3`.`partsupp`.`ps_suppkey` in <span style="color: red;">< temporary table ></span> on distinct_key where ((`dbt3`.`partsupp`.`ps_suppkey` = `materialized subselect`.`s_suppkey`)))))))",
"attached_subqueries": [
{
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"table": {
"access_type": "eq_ref",
"key": "< auto_key >",
"rows": 1
},
"query_block": {
"select_id": 2,
"table": {
"table_name": "supplier",
"access_type": "ALL",
"rows": 10113,
"filtered": 100,
"attached_condition": "(`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')"
}
}
}
]
}
}
]
}
}
}
}
</span></pre></div>Don't forget to scroll the box above to the right, because lines are long. This shows that:<br />
<ol><li>For the top query, we read a row from '<i>part</i>', then one row from '<i>partsupp</i>', then execute the subquery.</li>
<li>the very first execution of the subquery materializes (<i><materialize></i>) <i>select `dbt3`.`supplier`.`s_suppkey` from `dbt3`.`supplier` where (`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')</i> into a temporary table</li>
<li>Each subquery execution does a lookup on the primary key of this temporary table (<i><primary_index_lookup> ... in <temporary table></i>)</li>
<li>Going further down, we see how the temporary table will be filled: it will be the resultset of a table scan (<i>"access_type": "ALL"</i>) of '<i>supplier</i>' with a filtering LIKE condition.</li>
</ol>More details on the feature's usage can be found in the <a href="http://dev.mysql.com/doc/refman/5.6/en/subquery-materialization.html">manual</a>.<br />
<br />
This is the end of this post. I hope that it puts in good light the work we have put into 5.6. There are many other Optimizer features in this version, like EXPLAIN FORMAT=JSON and others; they are described in <a href="http://mysqloptimizerteam.blogspot.co.uk/">my colleagues' blog posts.</a>Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com5tag:blogger.com,1999:blog-9191231274999121063.post-64427201263031342982012-02-09T07:51:00.000-08:002012-02-09T08:37:22.284-08:00Optimizer tracing used by others!In a previous post, I had explained <a href="http://guilhembichot.blogspot.com/2011/09/optimizer-tracing-how-to-configure-it.html">how to use MySQL's optimizer tracing</a>, a new feature which appeared in MySQL 5.6.3.<br />
<br />
As a developer, it feels really good to see others adopt my work and make something useful out of it! My colleague Dimitri Kravtchuk, who is one of our top Benchmarking experts, has written a blog post where he shows how the optimizer tracing has helped him to figure out why, under load, once in a while and randomly, a query performed badly. His investigation technique may be reusable by other people, so I encourage you to read more about it, <a href="http://dimitrik.free.fr/blog/archives/2012/01/mysql-performance-overhead-of-optimizer-tracing-in-mysql-56.html">here</a>.Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com4tag:blogger.com,1999:blog-9191231274999121063.post-43772938173663854112011-11-29T06:37:00.000-08:002011-11-30T03:45:40.616-08:00Understanding the unique_subquery optimizationIf you use the EXPLAIN SELECT statement to see how your subqueries are treated by MySQL, you may sometimes meet the "unique_subquery" optimization. Here is how the <a href="http://dev.mysql.com/doc/refman/5.5/en/explain-output.html">manual</a> describes it:<br />
<blockquote class="tr_bq">"<i>unique_subquery</i>: this type replaces <i>ref</i> for some IN subqueries of the following form: <i>value IN (SELECT primary_key FROM single_table WHERE some_expr); </i>unique_subquery is just an index lookup function that replaces the subquery completely for better efficiency".</blockquote>Few weeks ago, while I was reviewing a patch fixing a <a href="http://bazaar.launchpad.net/%7Emysql/mysql-server/trunk/revision/jorgen.loland@oracle.com-20110929124732-t0s1x214cu3nkngu">bug</a> in unique_subquery, I got a "simplification" pulsion. I told myself that:<br />
<ul><li> unique_subquery is an optimization for a special case of simple subqueries (single inner table, using index, no aggregates);</li>
<li>we have a more general system around, used for more complex subqueries, naturally capable of handling simple ones too if we wanted;</li>
<li>this general system does not have the bug in question...</li>
</ul>Then I wondered: what if we removed the unique_subquery optimization, and let the general system handle this simple subquery? This would certainly simplify code, and thus maintainance...But before removing it, of course, we should check whether unique_subquery brings a significant performance benefit. <br />
<br />
So today I'm testing unique_subquery against the DBT3 benchmark. I grab a copy of MySQL 5.6.3, and focus on the sixteenth query of DBT3, which contains a subquery (in red) suitable for handling by unique_subquery:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#23'
and p_type not like 'LARGE PLATED%'
and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
and <span style="color: red;">ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)</span>
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
</span></pre></div><br />
This query executes in 0.65 seconds on my Linux box, and EXPLAIN is:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199498 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | supplier | <span style="color: red;">unique_subquery</span> | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+-----------------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
</span></pre></div><br />
When I disable unique_subquery (by modifying MySQL's C++ code), EXPLAIN becomes:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
| 1 | PRIMARY | part | ALL | PRIMARY | NULL | NULL | NULL | 199498 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey | i_ps_partkey | 4 | dbt3.part.p_partkey | 2 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | supplier | <span style="color: red;">eq_ref</span> | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+----------+--------+----------------------+--------------+---------+---------------------+--------+----------------------------------------------+
</span></pre></div><br />
The only change, as expected, is that "unique_subquery" becomes "eq_ref". The used index is the same (the primary key of the "supplier" table). The optimizer has the same notion of unicity: "unique_subquery" and "eq_ref" both denote that a single lookup is needed, as the index is UNIQUE. Same index, same number of lookups: execution could well be as fast with "eq_ref" as it was with "unique_subquery".<br />
But... no. Query now executes in 0.80 seconds. <i>23% slower</i> than with unique_subquery!<br />
<br />
Finer-grained timing shows that the extra 0.15 seconds are indeed lost in the subquery evaluation code.<br />
<br />
To understand this, let's follow the execution in detail, based on EXPLAIN output above.<br />
<ul><li>First line of EXPLAIN output: we do a table scan on the "part" table ("type=ALL" means "table scan") . The "rows" column of EXPLAIN suggests that we are going to have 199,498 rows of "part".</li>
<li>Second line of EXPLAIN output: for each row from the "part" table, we do an index lookup ("ref") into the "i_ps_partkey" index of the "partsupp" table; apparently such lookup will find two rows ("rows=2").</li>
<li>At this point, we have a row made of needed columns of "part" and of "partsupp". An upper estimate of the number of those rows is 199,498 multiplied by 2: 400,000. Actually, the real number is around 120,000 (there has been filtering going on, as the "Using where" indicates).</li>
<li>Then we evaluate the <i>WHERE</i> clause and thus the "<i>NOT IN (subquery)</i>" predicate (the "DEPENDENT SUBQUERY"). 120,000 evaluations of such predicate. And that's where the difference is.</li>
</ul>EXPLAIN EXTENDED and then SHOW WARNINGS show how the predicate<br />
looks like. Let's start with the case where unique_subquery is disabled:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
/* select#1 */ select `dbt3`.`part`.`p_brand` AS `p_brand`,`dbt3`.`part`.`p_type` AS `p_type`,`dbt3`.`part`.`p_size` AS `p_size`,count(distinct `dbt3`.`partsupp`.`ps_suppkey`) AS `supplier_cnt` from `dbt3`.`partsupp` join `dbt3`.`part` where ((`dbt3`.`partsupp`.`ps_partkey` = `dbt3`.`part`.`p_partkey`) and (`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)) and (not<span style="color: red;">(<in_optimizer>(`dbt3`.`partsupp`.`ps_suppkey`,<exists>(/* select#2 */ select 1 from `dbt3`.`supplier` where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`)))))</span>)) group by `dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size` order by count(distinct `dbt3`.`partsupp`.`ps_suppkey`) desc,`dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size`
</span></pre></div><br />
Above, the part in red says that<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
</span></pre></div>has been transformed from "IN(non correlated subquery)" to "EXISTS(correlated subquery)", yielding this:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
not exists (
select
1
from
supplier
where
s_comment like '%Customer%Complaints%'
AND s_suppkey = ps_suppkey
)
</span></pre></div><br />
or, more exactly (leaving out the NOT operator, for brevity):<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
<exists>(/* select#2 */ select 1 from `dbt3`.`supplier`
where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')
and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`)))
</span></pre></div><br />
Evaluating this EXISTS() evaluates the new subquery. This means all the subquery evaluation machinery: calls to JOIN::exec(), sub_select(), evaluate_join_record()... Sure, deep down it does an index lookup like unique_subquery does, but all those function calls have a cost, and so has all the logic which is lying around ready to handle any complexity in the subquery, as this is generic subquery evaluation code ("if group_by do this", "if order_by do this", "if left_join do this": none of those if()s are entered, but deciding to enter them or not has a cost). Plus some initialization code. Plus some de-initialization code. This overhead, repeated 120,000 times, amounts to 0.15 seconds...<br />
<br />
Now, EXPLAIN EXTENDED when unique_subquery is enabled:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
/* select#1 */ select `dbt3`.`part`.`p_brand` AS `p_brand`,`dbt3`.`part`.`p_type` AS `p_type`,`dbt3`.`part`.`p_size` AS `p_size`,count(distinct `dbt3`.`partsupp`.`ps_suppkey`) AS `supplier_cnt` from `dbt3`.`partsupp` join `dbt3`.`part` where ((`dbt3`.`partsupp`.`ps_partkey` = `dbt3`.`part`.`p_partkey`) and (`dbt3`.`part`.`p_brand` <> 'Brand#23') and (not((`dbt3`.`part`.`p_type` like 'LARGE PLATED%'))) and (`dbt3`.`part`.`p_size` in (43,1,25,5,35,12,42,40)) and (not<span style="color: red;">(<in_optimizer>(`dbt3`.`partsupp`.`ps_suppkey`,<exists>(<primary_index_lookup>(<cache>(`dbt3`.`partsupp`.`ps_suppkey`) in supplier on PRIMARY where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%') and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`))))))</span>)) group by `dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size` order by count(distinct `dbt3`.`partsupp`.`ps_suppkey`) desc,`dbt3`.`part`.`p_brand`,`dbt3`.`part`.`p_type`,`dbt3`.`part`.`p_size`
</span></pre></div><br />
The optimizer has first done the same transformation (IN to EXISTS) as we saw before, then has done one more transformation, and EXISTS has become, as written in red above:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">
<exists>(<primary_index_lookup>(<cache>(`dbt3`.`partsupp`.`ps_suppkey`)
in supplier on PRIMARY
where ((`dbt3`.`supplier`.`s_comment` like '%Customer%Complaints%')
and (<cache>(`dbt3`.`partsupp`.`ps_suppkey`) = `dbt3`.`supplier`.`s_suppkey`))))
</span></pre></div><br />
which is <i>directly</i> an index lookup ("<primary_index_lookup>"), followed by an additional WHERE clause. So the overhead of full-blown subquery evaluation is<br />
avoided. And this overhead is not neglectable, compared to the index lookup (assuming the relevant index pages are already in memory).<br />
<br />
So the conclusion of my experiment is that unique_subquery is worth having. I'll have to direct simplification pulsions to some other code!<br />
<br />
Note that there also exists a similar "index_subquery" optimization applying to non-unique indices. And it's worth having, for the same reasons.Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com1tag:blogger.com,1999:blog-9191231274999121063.post-11392268073389798712011-10-04T15:40:00.000-07:002011-10-04T06:40:14.407-07:00Optimizer tracing: how to configure itIn this <a href="http://jorgenloland.blogspot.com/2011/10/optimizer-tracing-query-execution-plan.html">blog post</a>, my colleague Jørgen Løland described a new feature of MySQL 5.6: <i>Optimizer Tracing</i>. I recommend reading his article, as it presents this new feature in a simple, easy-to-read manner.<br />
<br />
The Optimizer Tracing feature can help understanding what the Optimizer is doing; it is available since milestone 5.6.3, announced October 3rd at Oracle Open World (here is the <a href="http://dev.mysql.com/doc/refman/5.6/en/news-5-6-3.html">changelog</a>). It's good to see it mature now; I remember that Sergey Petrunia did the first prototype back in March 2009!<br />
<br />
Today I will be giving some must-have tips related to handling <b>big</b> traces.<br />
<br />
First thing to know, a trace lives in main memory (internally it is allocated on the <i>heap</i> or <i>free store</i> of the MySQL Server). An SQL statement which gives the optimizer a lot of work (for example, by joining many tables) will generate a large trace. Up to gigabytes in some pathological cases! To avoid hogging memory then, a trace will never grow beyond the value of the <i>@@optimizer_trace_max_mem_size</i> session variable. Which has a default of 16 kilobytes; yes, it's a low value, to protect the innocent. I often raise it in my session, to 1 megabyte, which is enough for most queries in my daily work:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000;</span></pre></div><br />
If a trace was truncated because exceeding this limit, the column <i>INFORMATION_SCHEMA.OPTIMIZER_TRACE.MISSING_BYTES_BEYOND_MAX_MEM_SIZE </i>shows a non-zero value: the number of bytes which could not be added to the trace.<br />
<br />
When I read a trace I like to scroll up and down in it, search for some optimization phase (that means searching for some keyword in the trace); to do this, I set, in the "<i>mysql</i>" command-line client, the pager to "<i>less</i>" (I'm using Unix):<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> pager less;</span></pre></div><br />
This is very useful. I have all "<i>less</i>" commands under hand: can save to a file, can search for a regular expression match, search forward or backward... One last benefit is that when I have finished reading, quitting "<i>less</i>" makes the trace go away from the terminal, it does not linger on my screen forever, does not occupy the terminal's display history...<br />
<br />
When there are many tables to join, as I said above the trace can grow a lot, because the Optimizer evaluates many possible join orders (using an algorithm known as "<i>greedy search"</i>): each evaluated join order is mentioned in the trace. Greedy search ends up being the greatest part of the trace. What if I want to see the trace of what happens in the Optimizer <i>after</i> greedy search is complete? If I set <i>@@optimizer_trace_max_mem_size</i> to a low value, it will trim greedy search and what follows. If I set <i>@@optimizer_trace_max_mem_size</i> to a high value, to see what I want, greedy search will be traced which will possibly exceed the amount of memory I can afford on this machine... It would be nice if I could tell the system: "please do not trace this and that". In my case, it would be "please do not trace greedy search".<br />
<br />
As an example, let's consider twenty tables, t1...t20, all similar to this one:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> CREATE TABLE t1 (a INT, INDEX(a)) ENGINE=MYISAM;
mysql> INSERT INTO t1 VALUES(x),(y); # x and y being some integers
</span></pre></div><br />
and let's run this query, after turning tracing on:<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> SET OPTIMIZER_TRACE="ENABLED=ON,END_MARKER=ON";</span></pre><pre><span style="font-size: small;">mysql> EXPLAIN SELECT 1 FROM t1 JOIN t2 ON t2.a=t1.a JOIN t3 ON t3.a=t2.a JOIN t4 ON t4.a=t3.a JOIN t5 ON t5.a=t4.a JOIN t6 ON t6.a=t5.a JOIN t7 ON t7.a=t6.a JOIN t8 ON t8.a=t7.a JOIN t9 ON t9.a=t8.a JOIN t10 ON t10.a=t9.a JOIN t11 ON t11.a=t10.a JOIN t12 ON t12.a=t11.a JOIN t13 ON t13.a=t12.a JOIN t14 ON t14.a=t13.a JOIN t15 ON t15.a=t14.a JOIN t16 ON t16.a=t15.a JOIN t17 ON t17.a=t16.a JOIN t18 ON t18.a=t17.a JOIN t19 ON t19.a=t18.a JOIN t20 ON t20.a=t19.a;
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
| 1 | SIMPLE | t1 | index | a | a | 5 | NULL | 2 | Using where; Using index |
| 1 | SIMPLE | t2 | ref | a | a | 5 | test.t1.a | 1 | Using where; Using index |
| 1 | SIMPLE | t3 | ref | a | a | 5 | test.t2.a | 1 | Using where; Using index |
| 1 | SIMPLE | t4 | ref | a | a | 5 | test.t1.a | 1 | Using index |
| 1 | SIMPLE | t5 | ref | a | a | 5 | test.t1.a | 1 | Using index |
| 1 | SIMPLE | t6 | ref | a | a | 5 | test.t1.a | 1 | Using index |
| 1 | SIMPLE | t7 | ref | a | a | 5 | test.t1.a | 1 | Using index |
| 1 | SIMPLE | t8 | ref | a | a | 5 | test.t1.a | 1 | Using index |
| 1 | SIMPLE | t9 | ref | a | a | 5 | test.t1.a | 1 | Using where; Using index |
| 1 | SIMPLE | t10 | ref | a | a | 5 | test.t9.a | 1 | Using where; Using index |
| 1 | SIMPLE | t11 | ref | a | a | 5 | test.t1.a | 1 | Using where; Using index |
| 1 | SIMPLE | t12 | ref | a | a | 5 | test.t11.a | 1 | Using where; Using index |
| 1 | SIMPLE | t13 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t14 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t15 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t16 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t17 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t18 | ref | a | a | 5 | test.t11.a | 1 | Using where; Using index |
| 1 | SIMPLE | t19 | ref | a | a | 5 | test.t12.a | 1 | Using where; Using index |
| 1 | SIMPLE | t20 | ref | a | a | 5 | test.t11.a | 1 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------------+------+--------------------------+
</span></pre></div>As optimizer developer, it catches my eye that some tables have "<i>Using where</i>" and others don't. "<i>Using where</i>" tells me that a condition is evaluated after fetching a row from the table, but does not tell me <b>what</b> condition. To know more, I will look at the trace. But I'm not interested in the trace of greedy search, which likely accounts, in this 20-table example, for most of the total trace, and would just make my reading less comfortable, or even hog memory, if I had more than those 20 tables or more indices on them, or more complex conditions. So I do:<br />
<br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> SET OPTIMIZER_TRACE_FEATURES="GREEDY_SEARCH=OFF";</span></pre></div>and run the <i>EXPLAIN</i> again. To show the resulting trace, I'm posting a screenshot of how it is displaid by the <i>JsonView</i> Firefox add-on. Using this add-on requires taking two precautions:<br />
<ul><li>turning off the <i>end_marker</i> flag <i>of @@optimizer_trace </i>(this flag is good for human-readability but is not JSON-compliant)</li>
<li> sending the trace to a file without escaping of newlines (which is why I use <i>INTO DUMPFILE</i> instead of <i>INTO OUTFILE</i>):</li>
</ul><br />
<div style="background: none repeat scroll 0% 0% black; color: white; height: 100%; overflow: auto; width: 100%;"><pre><span style="font-size: small;">mysql> SET OPTIMIZER_TRACE="END_MARKER=OFF";
mysql> SELECT TRACE INTO DUMPFILE "/tmp/trace.json" FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;</span></pre></div><br />
Here's the trace now in Firefox:<br />
<div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgaF5eq7kd1IxaL8sLArt7iIEcJ-zIYhQcsy_1up2uWyAyoc5giNnypvCDTFR68O9kmg2YlfDyL7E0blw_aATHFCKpikpAM8D9xSSYGkIM6kAYO0Cpnz6ydDtRjlxm3emyV9fLKKxLTQbS/s1600/capt2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgaF5eq7kd1IxaL8sLArt7iIEcJ-zIYhQcsy_1up2uWyAyoc5giNnypvCDTFR68O9kmg2YlfDyL7E0blw_aATHFCKpikpAM8D9xSSYGkIM6kAYO0Cpnz6ydDtRjlxm3emyV9fLKKxLTQbS/s640/capt2.png" width="491" /></a></div><br />
We see how <i>greedy_search=off</i> has eliminated the trace of greedy search (<i>"considered_execution_plans"</i>), replacing it with just an ellipsis ("<i>...</i>")! Then there is what I wanted to see: the part after greedy search, named "<i>attached_conditions_summary"</i>, which describes what condition is behind each "<i>Using where"</i> in <i>EXPLAIN.</i> There are equality conditions of course. Some coming directly from the <i>"JOIN ON</i>" conditions. Some deduced by <i>equality propagation</i> (if <i>t1.a=t2.a</i> and <i>t2.a=t3.a</i> then <i>t1.a=t3.a</i>, a new condition which we see is attached to t3). There are also <i>"IS NOT NULL</i>" conditions; indeed, an equality condition of the form <i>t1.a=t2.a</i> allows us to deduce that both <i>t1.a </i>and<i> t2.a </i>are not NULL, so if rows of t1 are retrieved first they can be filtered with this deduced condition, known as <a href="http://forge.mysql.com/wiki/MySQL_Internals_Optimizer">early NULL filtering</a>.<br />
<br />
There are other flags in <i>@@optimizer_trace_features</i>; we added a flag for each feature which we think can make the trace grow unreasonably large:<br />
<ul><li>greedy search</li>
<li>repeated execution of subqueries (one execution per row) </li>
<li>repeated range optimization (known as <i>"range checked for each record"</i> in EXPLAIN)</li>
<li>range optimization in general.</li>
</ul>I think I have shown enough for today. A positive note to finish: in the Optimizer team, this tracing has already helped us to debug some problems, so has started to fulfill its goal!<br />
<br />
For the curious: the Optimizer Trace is covered in full detail, including the above, and more, in a <a href="http://forge.mysql.com/wiki/MySQL_Internals_Optimizer_tracing">chapter of the "MySQL Internals" manual</a>.Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com3tag:blogger.com,1999:blog-9191231274999121063.post-91924086778619807842011-10-03T06:52:00.000-07:002011-10-03T06:52:37.283-07:00Kick-offHi all!<br />
Here is a new blog; I'll post here some thoughts, tutorials... all related to my job - software developer at Oracle Corporation, working on <a href="http://www.mysql.com/">MySQL</a>. It will soon be nine years since I joined MySQL AB. After working on interesting topics: Replication, Online Backup, the Maria storage engine, each of them having taught me something, I'm now full-time on the Optimizer, and guess what - it is an interesting topic and it teaches me something!Guilhem Bichothttp://www.blogger.com/profile/11253623513851616256noreply@blogger.com0