In a previous post, I analyzed how a query of the famous DBT3 benchmark was optimized by MySQL. It was this query, named "Q16" in the DBT3 jargon:

select p_brand, p_type, p_size,
       count(distinct ps_suppkey) as supplier_cnt
from partsupp, part
where p_partkey = ps_partkey
  and p_brand <> 'Brand#23'
  and p_type not like 'LARGE PLATED%'
  and p_size in (43, 1, 25, 5, 35, 12, 42, 40)
  and ps_suppkey not in (
    select s_suppkey
    from supplier
    where s_comment like '%Customer%Complaints%')
group by p_brand, p_type, p_size
order by supplier_cnt desc, p_brand, p_type, p_size;

Here is a brief recap of the conclusions I had drawn: for this query, MySQL transforms the IN condition to EXISTS and then evaluates it with the "unique_subquery" technique, which does an index lookup into the subquery's table. IN is evaluated 120,000 times (once per combined row of the outer tables). The total execution time of query Q16
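To make the transformation concrete, here is a sketch of what the predicate conceptually becomes; this shows the spirit of the IN-to-EXISTS rewrite, not MySQL's exact internal form:

-- evaluated once per combined outer row (120,000 times in total);
-- "unique_subquery" means each evaluation is a single lookup into
-- supplier's unique index on s_suppkey
... and not exists
    (select 1 from supplier
     where s_suppkey = ps_suppkey
       and s_comment like '%Customer%Complaints%')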
Hello, Guilhem.
I wrote an article, How to select N rows for each group. One of the solutions uses LATERAL.
However, the performance was lower than expected: MySQL does not use the unique index; instead, it reads all rows of the group and does a filesort. Example:
CREATE TABLE `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`post_text` text NOT NULL,
PRIMARY KEY (`post_id`),
UNIQUE KEY `user_id` (`user_id`,`date_added`)
) ENGINE=InnoDB AUTO_INCREMENT=32754 DEFAULT CHARSET=utf8;
/*
15962 rows in table
20 distinct user_id
~800 rows per group
*/
mysql> explain select t2.* from (select user_id from posts group by user_id) as t1,
-> lateral (select * from posts where t1.user_id=posts.user_id order by date_added desc limit 3) as t2;
+----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+
| id | select_type       | table      | partitions | type  | possible_keys | key     | key_len | ref        | rows | filtered | Extra                      |
+----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+
|  1 | PRIMARY           | <derived2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL       |   21 |   100.00 | Rematerialize (<derived3>) |
|  1 | PRIMARY           | <derived3> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL       |    3 |   100.00 | NULL                       |
|  3 | DEPENDENT DERIVED | posts      | NULL       | ref   | user_id       | user_id | 4       | t1.user_id |  801 |   100.00 | Using filesort             |
|  2 | DERIVED           | posts      | NULL       | range | user_id       | user_id | 4       | NULL       |   21 |   100.00 | Using index for group-by   |
+----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+
mysql> show status like 'handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 4     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 102   |
| Handler_read_last          | 1     |
| Handler_read_next          | 15962 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 60    |
| Handler_read_rnd_next      | 101   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 80    |
+----------------------------+-------+
mysql> show status like 'sort%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 0     |
| Sort_range        | 0     |
| Sort_rows         | 60    |
| Sort_scan         | 20    |
+-------------------+-------+
In pseudo-code, this does:
select 20 distinct user_id, store into t1
for each row in t1:
    select all rows from posts where user_id = t1.user_id
    do filesort
    take the last 3 rows
We have a unique index on (user_id, date_added). The optimal way would be as in this pseudo-code:
select 20 distinct user_id, store into t1
for each row in t1:
    take 3 rows using the unique key
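For one group, that corresponds to a standalone query like this (a sketch; the constant user_id is hypothetical, standing in for one of the 20 values, and with a constant MySQL can read just 3 entries backward from the (user_id, date_added) index):

select * from posts
where user_id = 42     -- hypothetical constant user_id
order by date_added desc
limit 3;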
Is this a bug?
P.S. I also translated some of your articles (LATERAL and emulating CHECK constraints).
Hello Vasiliy. First, thanks a lot for the translations!!
About your problem: that is indeed surprising; to decide whether it's correct, we should look at the cost-based decision. For that, I suggest you look at the optimizer trace https://dev.mysql.com/doc/internals/en/optimizer-tracing.html ; it will tell which strategy was considered and why it was chosen or rejected.
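For reference, a minimal way to capture the trace, using the standard statements from that manual page:

SET optimizer_trace="enabled=on";
-- an EXPLAIN of the problematic statement is enough to produce a trace
EXPLAIN select t2.* from (select user_id from posts group by user_id) as t1,
    lateral (select * from posts where t1.user_id = posts.user_id
             order by date_added desc limit 3) as t2;
SELECT trace FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace="enabled=off";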
Thanks for the reply.
You can download the optimizer trace here. The important part:
{
  "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "steps": [
    ],
    "index_order_summary": {
      "table": "`posts`",
      "index_provides_order": false,
      "order_direction": "undefined",
      "index": "user_id",
      "plan_changed": false
    }
  }
}
I don't understand why this is so.
Hi. Thanks. This is not specific to LATERAL; it's also visible with:
explain select (select post_text from posts where t1.user_id=posts.user_id order by date_added limit 3) as s from posts t1;
I'll contact a colleague who knows more, and will let you know.
Vasiliy: my colleague has looked a bit and it could theoretically be improved (no idea when, though). Could you please file a bug report? Please note there that you're doing it at my suggestion. A workaround is to extend the ORDER BY, like this:
select (select post_text from posts where t1.user_id = posts.user_id
        order by **posts.user_id**, date_added limit 3) as s from posts t1;
then MySQL thinks it can use the index.
For your original query, which orders DESC, it would be:
order by posts.user_id desc, date_added desc
and then I get:
| 3 | DEPENDENT DERIVED | posts | NULL | ref | user_id | user_id | 4 | t1.user_id | 1 | 100.00 | Backward index scan |
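Spelled out against your original LATERAL query, the workaround would read like this (a sketch assembled from the pieces above):

select t2.*
from (select user_id from posts group by user_id) as t1,
     lateral (select * from posts
              where t1.user_id = posts.user_id
              -- user_id is constant inside the lateral subquery, so the
              -- extended ORDER BY does not change the result; it only
              -- lets the optimizer match the (user_id, date_added) index
              order by posts.user_id desc, date_added desc
              limit 3) as t2;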
Thank you so much. It's black magic :) but it really works.
I have already submitted a bug report: http://bugs.mysql.com/94903