jeudi 31 janvier 2019

Support for LATERAL derived tables added to MySQL 8.0.14

Hello all. Last months have been busy for me with various tasks, and one of them is finally done and released in MySQL 8.0.14 : LATERAL derived tables. All details are in my post on mysqlserverteam. I think it is a useful addition to our SQL capabilities. Enjoy!

6 commentaires:

  1. Hello, Guilhem.
    I wrote article How to select N rows for each group. One of solutions uses LATERAL.
    However, the performance was lower than expected - mysql do not uses unique index, instead of that it reads all rows from group and does filesort. Example

    CREATE TABLE `posts` (
    `post_id` int(11) NOT NULL AUTO_INCREMENT,
    `user_id` int(11) NOT NULL,
    `date_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `post_text` text NOT NULL,
    PRIMARY KEY (`post_id`),
    UNIQUE KEY `user_id` (`user_id`,`date_added`)
    ) ENGINE=InnoDB AUTO_INCREMENT=32754 DEFAULT CHARSET=utf8;

    /*
    15962 rows in table
    20 distinct user_id
    ~800 rows per group
    */

    mysql> explain select t2.* from (select user_id from posts group by user_id) as t1,
    -> lateral (select * from posts where t1.user_id=posts.user_id order by date_added desc limit 3) as t2;
    +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+
    | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+
    | 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 21 | 100.00 | Rematerialize () |
    | 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
    | 3 | DEPENDENT DERIVED | posts | NULL | ref | user_id | user_id | 4 | t1.user_id | 801 | 100.00 | Using filesort |
    | 2 | DERIVED | posts | NULL | range | user_id | user_id | 4 | NULL | 21 | 100.00 | Using index for group-by |
    +----+-------------------+------------+------------+-------+---------------+---------+---------+------------+------+----------+----------------------------+

    mysql> show status like 'handler_%';
    +----------------------------+-------+
    | Variable_name | Value |
    +----------------------------+-------+
    | Handler_commit | 1 |
    | Handler_delete | 0 |
    | Handler_discover | 0 |
    | Handler_external_lock | 4 |
    | Handler_mrr_init | 0 |
    | Handler_prepare | 0 |
    | Handler_read_first | 1 |
    | Handler_read_key | 102 |
    | Handler_read_last | 1 |
    | Handler_read_next | 15962 |
    | Handler_read_prev | 0 |
    | Handler_read_rnd | 60 |
    | Handler_read_rnd_next | 101 |
    | Handler_rollback | 0 |
    | Handler_savepoint | 0 |
    | Handler_savepoint_rollback | 0 |
    | Handler_update | 0 |
    | Handler_write | 80 |
    +----------------------------+-------+

    mysql> show status like 'sort%';
    +-------------------+-------+
    | Variable_name | Value |
    +-------------------+-------+
    | Sort_merge_passes | 0 |
    | Sort_range | 0 |
    | Sort_rows | 60 |
    | Sort_scan | 20 |
    +-------------------+-------+

    In pseudo-code, this does:
    select 20 distinct user_id, store into t1
    for each row in t1:
    select all rows from posts where user_id=t1.user_id
    do filesort
    take 3 last rows.

    We have unique index (user_id, date_added). Optimal way like in this pseudo-code:
    select 20 distinct user_id, store into t1
    for each row in t1:
    take 3 rows using unique key

    Is it bug?

    P.S. Also I translated some of your articles (LATERAL and emulate CHECK constraints).

    RépondreSupprimer
  2. Hello Vasiliy. First, thanks a lot fot the translations!!
    About your problem: that is indeed surprising; to decide if it's correct, we should look at the cost-based decision. For that, I suggest you look at the optimizer trace https://dev.mysql.com/doc/internals/en/optimizer-tracing.html ; it will tell which strategy was considered and why it was chosen or rejected.

    RépondreSupprimer
  3. Thanks for the reply.

    You can download optimizer trace here

    important part:
    {
    "reconsidering_access_paths_for_index_ordering": {
    "clause": "ORDER BY",
    "steps": [
    ],
    "index_order_summary": {
    "table": "`posts`",
    "index_provides_order": false,
    "order_direction": "undefined",
    "index": "user_id",
    "plan_changed": false
    }
    }
    }

    I don't understand why it's so.

    RépondreSupprimer
  4. Hi. Thanks. Not specific of LATERAL, also visible with
    explain select (select post_text from posts where t1.user_id=posts.user_id order by date_added limit 3) as s from posts t1;
    I'll contact a colleague who knows more, and will let you know.

    RépondreSupprimer
  5. Vasiliy: my colleague has looked a bit and it could theoretically be improved (no idea when, though). Could you please file a bug report? Please note there that you're doing it on my suggestion. A workaround is: extending ORDER BY, like this:
    select (select post_text from posts where t1.user_id =
    posts.user_id order by **posts.user_id**, date_added limit 3) as s from posts t1;
    then MySQL thinks it can use the index.
    For your original query which orders DESC, it would be:
    order by posts.user_id desc, date_added desc
    and then I get:
    | 3 | DEPENDENT DERIVED | posts | NULL | ref | user_id | user_id | 4 | t1.user_id | 1 | 100.00 | Backward index scan |

    RépondreSupprimer
  6. Thank you so much. It's black magic :) but it really works.

    I have already submitted bug report - http://bugs.mysql.com/94903

    RépondreSupprimer