Please help me optimize this LEFT JOIN query.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Please help me optimize this LEFT JOIN query.

Yuzem
This post was updated on .
I have 2 tables:
CREATE TABLE movies (
  movies UNIQUE,
  name,
  icon_modified
);

CREATE TABLE genres (
  genres,
  movies,
);

I want to construct genres icons and each icon must display 4 movies.
I need to get: genres from genres and movies,icon_modified from movies.
This is the query I have:
SELECT genres name,
            count(genres.movies) count,
            substr(group_concat(grouped.movies,' '),1,40) src,
            substr(group_concat(grouped.icon_modified,' '),1,80) icon_modified
    FROM genres
        LEFT JOIN (SELECT movies,icon_modified FROM movies WHERE icon_modified != '') grouped
            ON grouped.movies = genres.movies
    GROUP BY genres
    ORDER BY name;

The query works correctly but it is very very slow, it takes 2.474s on my machine, if I run the same query but using JOIN instead of LEFT JOIN it only takes 0.039s

Is there any way of optimizing these query?
Thanks in advance.
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Igor Tandetnik-2
On 9/4/2013 10:21 AM, Yuzem wrote:
> SELECT genres name,count(genres.movies)
> count,substr(group_concat(grouped.movies,' '),1,40)
> src,substr(group_concat(grouped.icon_modified,' '),1,80) icon_modified FROM
> genres LEFT JOIN (SELECT movies,icon_modified FROM movies WHERE
> icon_modified != '') grouped on grouped.movies = genres.movies  GROUP BY
> genres ORDER BY name"

Try

FROM genres LEFT JOIN movies grouped ON (genres.movies = movies.movies
AND icon_modified != '')

The use of nested SELECT likely prevents SQLite from using the index on
movies(movies).
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
Thanks Igor but now it is taking even more time: 3.139s
Any other idea?
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
I did some testing and found some strange results.

SELECT genres FROM genres LEFT JOIN movies ON genres.movies = movies.movies GROUP BY genres ORDER BY genres;
time: 2.475s

SELECT genres FROM genres JOIN movies ON genres.movies = movies.movies GROUP BY genres ORDER BY genres;
time: 0.035s

SELECT tasks FROM tasks LEFT JOIN people ON tasks.people = people.people GROUP BY tasks ORDER BY tasks"
time: 0.164s

SELECT tasks FROM tasks JOIN people ON tasks.people = people.people GROUP BY tasks ORDER BY tasks"
time: 0.163s

The strange thing is that tasks and people are much larger tables than genres and movies:
SELECT count() from genres;
3998
SELECT count() from movies;
1529
SELECT count() from tasks;
24964
SELECT count() from people;
19626
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Simon Slavin-3

On 5 Sep 2013, at 7:56pm, Yuzem <[hidden email]> wrote:

> I did some testing and found some strange results.

Please do an ANALYZE and try the same things again.

Also, do you have any indexes on those tables (apart from the primary keys, of course) ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
This post was updated on .
I did an ANALYZE but I'm getting the same results.
I do have indexes:
CREATE TABLE movies (
  movies UNIQUE,
  name,
  icon_modified
);

CREATE TABLE genres (
  genres,
  movies,
  UNIQUE(genres,movies)
);

people has an index on people (people UNIQUE) and tasks is a view:
CREATE VIEW tasks AS
  SELECT 'creators'  tasks,movies,people,'' details FROM creators
    UNION ALL
  SELECT 'directors' tasks,movies,people,'' details FROM directors
    UNION ALL
  SELECT 'writers'   tasks,movies,people,'' details FROM writers
    UNION ALL
  SELECT 'actors'    tasks,movies,people,characters details FROM actors;

The tables creators, directors, etc, have similar indexes as genres:
CREATE TABLE directors (
  directors,
  people,
  UNIQUE(directors,people)
);
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Petite Abeille-2
In reply to this post by Yuzem

On Sep 4, 2013, at 4:21 PM, Yuzem <[hidden email]> wrote:

> I want to construct genres icons and each icon must display 4 movies.

Assuming this is IMDB… what about a scalar subquery?

For example, assuming a slightly different schema from yours:

select    genre.code as genre,
          (
            select  group_concat( title )
            from
            (
              select    movie.title as title
              from      movie

              join      movie_genre
              on        movie_genre.movie_id = movie.id

              where     movie_genre.genre_id = genre.id

              order by  movie.title

              limit     4
            )
          ) as sample
from      genre

where     genre.code = 'Western'

order by  genre.code

> Western|"26 Men" (1957),"A Man Called Shenandoah" (1965),"ABC Weekend Specials" (1977) {The Winged Colt (#1.5)},"Action in the Afternoon" (1953)

CPU Time: user 0.083246 sys 0.000443

This is for:

select count( * ) from genre;
30

select count( * ) from movie;
2545331

select count( * ) from movie_genre;
1545196


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
Thanks Petite Abeille, I translated your code to this:
SELECT genres.genres,
       (
            SELECT  substr(group_concat(name,' '),1,60)
            FROM
            (
              SELECT    name
              FROM      movies

              JOIN      genres
              ON        movies.movies = genres.movies

              ORDER BY  movies.name

              LIMIT     6
            )
        ) AS sample

FROM      genres
GROUP BY  genres
ORDER BY  genres.genres;
time: 0.028s

It is incredibly fast but it gives me the wrong result:
Action|American Graffiti American Graffiti American Graffiti Mulhol
Adventure|American Graffiti American Graffiti American Graffiti Mulhol
Animation|American Graffiti American Graffiti American Graffiti Mulhol
Biography|American Graffiti American Graffiti American Graffiti Mulhol
Etc...

Am I doing something wrong?
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Petite Abeille-2

On Sep 5, 2013, at 9:45 PM, Yuzem <[hidden email]> wrote:

> It is incredibly fast but it gives me the wrong result:

Yeah… you forgot the 'where' clause in the subquery… you need to restrict it on the genre


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Petite Abeille-2
In reply to this post by Yuzem

On Sep 5, 2013, at 8:56 PM, Yuzem <[hidden email]> wrote:

> SELECT genres FROM genres LEFT JOIN movies ON genres.movies = movies.movies
> GROUP BY genres ORDER BY genres;
> time: 2.475s
>
> SELECT genres FROM genres JOIN movies ON genres.movies = movies.movies GROUP
> BY genres ORDER BY genres;
> time: 0.035s

Yeah… the inner join vs out join will produce a significantly different access plan.

For example, using a different schema:

(1) inner join
0|0|1|SCAN TABLE movie_genre USING COVERING INDEX sqlite_autoindex_movie_genre_1
0|1|0|SEARCH TABLE genre USING INTEGER PRIMARY KEY (rowid=?)

(2) outer join
0|0|0|SCAN TABLE genre USING COVERING INDEX sqlite_autoindex_genre_1
0|1|1|SCAN TABLE movie_genre USING COVERING INDEX sqlite_autoindex_movie_genre_1

In this case, the first plan is much better in terms of throughput as movie_genre is much larger table.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
Ok, wonderful, now it is working correctly but how do I select multiple columns from table movies?
Should I add another sub query?
Example:

SELECT genres.genres,
       (
            SELECT  substr(group_concat(name,' '),1,60)
            FROM
            (
              SELECT    name
              FROM      movies

              JOIN      genres AS movie_genres
              ON        movies.movies = movie_genres.movies

              WHERE     movie_genres.genres = genres.genres

              ORDER BY  movies.name

              LIMIT     6
            )
        ),
        (
            SELECT  substr(group_concat(movies,' '),1,60)
            FROM
            (
              SELECT    name
              FROM      movies

              JOIN      genres AS movie_genres
              ON        movies.movies = movie_genres.movies

              WHERE     movie_genres.genres = genres.genres

              ORDER BY  movies.name

              LIMIT     6
            )
        )

FROM      genres
GROUP BY  genres
ORDER BY  genres.genres;

If I add another sub query it takes almost twice the time than using only one sub query. It is still faster than before.

 
Petite Abeille-2 wrote
Yeah… the inner join vs out join will produce a significantly different access plan.
Yes but if that's the case don't you think that the difference in time is a bit much?
And why there is almost no difference between JOIN and LEFT JOIN when using tasks and people which are larger tables?
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Petite Abeille-2

On Sep 5, 2013, at 10:28 PM, Yuzem <[hidden email]> wrote:

> Ok, wonderful, now it is working correctly but how do I select multiple
> columns from table movies?
> Should I add another sub query?

Nope. You have now changed the problem definition, so scalars will not be a good fit. Blindly copy & paste them will not get you anywhere.

The key to success in your case is to access the movie_genre table only once, as selectively as possible.

So, if, for a given genre you want 4 movies, you could try something along these lines:

select    genre.code,
          movie.title
from      (
            select    movie_genre.movie_id,
                      movie_genre.genre_id
            from      movie_genre

            where     movie_genre.genre_id = 30

            order by  movie_genre.id
   
            limit     4
          )
as        movie_genre

join      movie
on        movie.id = movie_genre.movie_id

join      genre
on        genre.id = movie_genre.genre_id

Which gives you 4 rows:

Western|"26 Men" (1957)
Western|"A Man Called Shenandoah" (1965)
Western|"ABC Weekend Specials" (1977) {The Winged Colt (#1.5)}
Western|"Action in the Afternoon" (1953)





_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
Thanks but 4 rows is not what I am looking for.
I found a solution, concatenation:
SELECT genres.genres,
       (
            SELECT  group_concat(movie,' ')
            FROM
            (
              SELECT    movies.movies||','||name movie
              FROM      movies

              JOIN      genres AS movie_genres
              ON        movies.movies = movie_genres.movies

              WHERE     movie_genres.genres = genres.genres

              ORDER BY  movies.name

              LIMIT     3
            )
        )

FROM      genres
GROUP BY  genres
ORDER BY  genres.genres;

Any clue on why LEFT JOIN is so slow when used with genres but not with larger tables?
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Petite Abeille-2

On Sep 5, 2013, at 11:27 PM, Yuzem <[hidden email]> wrote:

> Any clue on why LEFT JOIN is so slow when used with genres but not with
> larger tables?

Sure. But your conclusion is a most likely a red herring. The crux of the matter is that inner and outer join have a wildly different semantic. And therefore execution plan. That's all. Seems all very reasonable from an optimizer point of view.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Please help me optimize this LEFT JOIN query.

Yuzem
Petite Abeille-2 wrote
Sure. But your conclusion is a most likely a red herring. The crux of the matter is that inner and outer join have a wildly different semantic. And therefore execution plan. That's all. Seems all very reasonable from an optimizer point of view.
But I have no conclusion, I acknowledge that inner and outer join are different but I don't know why LEFT JOIN works very fast on larger tables and very slow on smaller tables at least in those cases. I would like to have a conclusion on that matter.