Hi All,
This is more of an academic question as I've come up with a better query, but I was wondering why SQLite doesn't optimise this query. Lets say I have two tables, simplified here. One contains webpage contents and a unique hash of those contents (the primary key), the other contains a history of lookups. The hash is used as a foreign key between the two tables. Table: webpage_contents content_hash -- Primary key post_processing_info page_content This table has about 10,000 rows crawling_lookups content_hash -- foreign key, has an index is_json -- a flag indicating if it's JSON is_html -- a flag indicating if it's HTML This table has 500,000 rows (Note: I appreciate this is bad design now and the flags should be in webpage content; it evolved organically). Now, I want to query the content table post_processing_info, and join in the flags from the lookups at the same time. After a bit of fiddling around to stop duplication happening, I got this: SELECT w.post_processing_info, l.is_json FROM webpage_contents w JOIN (SELECT * from lookup_tables.lookups GROUP BY content_xxhash) l USING (content_hash) WHERE content_hash = 'abc' This takes about 2 seconds to run, which is quite slow given content_hash is indexed in both tables and it's running from an SSD. I appreciate there are better ways to write this (and I'm using one now, I was a wee bit meandering getting there though), but it I was left wondering - why did SQLite not optimise the subquery? The Explain Query Plan for that is: 1 0 0 SCAN TABLE crawling_lookups USING INDEX content_hash_idx 0 0 0 SEARCH TABLE webpage_contents AS c USING INDEX sqlite_autoindex_page_contents_1 (content_hash=?) 0 1 1 SCAN SUBQUERY 1 AS l It seems to be doing the group on the subselect using the index (fast - about 0.1 seconds), then at step 3 doing a scan of the subquery results for the content_hash. This would be the slow part as that doesn't have an index. I figured SQL would spot that the where clause in the outermost scope would also apply to the subselect and put it in there automatically so the group-by would only be for the things the WHERE clause affected. Is this an optimisation opportunity? Or is my SQL so bad it was inevitable (more likely)? Cheers, Jonathan _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Could you describe your intent in more detail?
SELECT * FROM ... GROUP BY is unstable at best, so I cannot really grasp your intention. Which lookup record's If I guess right, you might want something in the lines of: SELECT w.post_processing_info, l.is_json FROM webpage_contents w JOIN (SELECT DISTINCT is_json from lookup_tables.lookups) l USING (content_hash) WHERE content_hash = 'abc' (this might return 0-3 rows if lookup_tables contains 3 values (0,1,and NULL) for is_json) which in turn I think would optimize best if written as: SELECT DISTINCT w.content_hash, w.post_processing_info, l.is_json FROM webpage_contents w JOIN lookup_tables.lookups l USING (content_hash) WHERE content_hash = 'abc' -- Sent from: http://sqlite.1065341.n5.nabble.com/ _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
If you want just 1 value (any) from lookup, you can use:
SELECT post_processing_info, ( SELECT is_json FROM tables_lookup WHERE tables_lookup.content_hash=webpage_contents.content_hash LIMIT 1 ) AS is_json FROM webpage_contents WHERE content_hash = 'abc' If you want only one value, or SELECT post_processing_info, x.is_json FROM webpage_contents JOIN ( SELECT content_hash, MIN(is_json) FROM tables_lookup GROUP BY content_hash ) AS x ON x.content_hash=webpage_contents.content_hash FROM webpage_contents if you want the full join... I'm using MIN() here as an arbitrary function to get just one value from the group. -- Sent from: http://sqlite.1065341.n5.nabble.com/ _______________________________________________ sqlite-users mailing list [hidden email] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users |
Free forum by Nabble | Edit this page |