speeding up FTS4

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

speeding up FTS4

Puneet Kishor-2
I have

    CREATE TABLE uris (
        uri_id INTEGER PRIMARY KEY,
        uri TEXT,
        uri_content TEXT,
        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
        feed_history_id INTEGER
    );

with 46608 rows

    CREATE TABLE feed_history (
        feed_history_id INTEGER PRIMARY KEY,
        feed_id INTEGER,
        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
    );

with 3276 rows

    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);

with 79 rows

    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows

    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
    FROM fts_uri f
        JOIN uris u ON f.uri_id = u.uri_id
        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
        JOIN feeds f ON fh.feed_id = f.feed_id
        JOIN projects p ON f.project_id = p.project_id
    WHERE p.project_id = 3
        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
        AND fts_uri MATCH 'education,school'
    ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Black, Michael (IS)
Have you done "ANALYZE"?  That might help.

Also...try to arrange your joins based on record count (both high-to-low and low-to-high) and see what difference it makes.

Since you have only one WHERE clause I'm guessing having project_ids as the first join makes sense.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
Sent: Tuesday, September 27, 2011 5:46 PM
To: General Discussion of SQLite Database
Subject: EXT :[sqlite] speeding up FTS4


I have

    CREATE TABLE uris (
        uri_id INTEGER PRIMARY KEY,
        uri TEXT,
        uri_content TEXT,
        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
        feed_history_id INTEGER
    );

with 46608 rows

    CREATE TABLE feed_history (
        feed_history_id INTEGER PRIMARY KEY,
        feed_id INTEGER,
        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
    );

with 3276 rows

    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);

with 79 rows

    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows

    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
    FROM fts_uri f
        JOIN uris u ON f.uri_id = u.uri_id
        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
        JOIN feeds f ON fh.feed_id = f.feed_id
        JOIN projects p ON f.project_id = p.project_id
    WHERE p.project_id = 3
        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
        AND fts_uri MATCH 'education,school'
    ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Black, Michael (IS)
In reply to this post by Puneet Kishor-2
P.S.  Your projects table is missing project_start.  So apparently these aren't the real create statements you are using.





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
Sent: Tuesday, September 27, 2011 5:46 PM
To: General Discussion of SQLite Database
Subject: EXT :[sqlite] speeding up FTS4

I have

    CREATE TABLE uris (
        uri_id INTEGER PRIMARY KEY,
        uri TEXT,
        uri_content TEXT,
        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
        feed_history_id INTEGER
    );

with 46608 rows

    CREATE TABLE feed_history (
        feed_history_id INTEGER PRIMARY KEY,
        feed_id INTEGER,
        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
    );

with 3276 rows

    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);

with 79 rows

    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows

    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
    FROM fts_uri f
        JOIN uris u ON f.uri_id = u.uri_id
        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
        JOIN feeds f ON fh.feed_id = f.feed_id
        JOIN projects p ON f.project_id = p.project_id
    WHERE p.project_id = 3
        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
        AND fts_uri MATCH 'education,school'
    ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
    0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Puneet Kishor-2

On Sep 28, 2011, at 8:19 AM, Black, Michael (IS) wrote:

> P.S.  Your projects table is missing project_start.  So apparently these aren't the real create statements you are using.
>
>
>
>


Sorry, I think that is the only table from which I snipped off information to make the post brief(er). These are the actual statements. The projects table is

CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    project_name TEXT,
    website_tags TEXT,
    twitter_tags TEXT,
    flickr_tags TEXT,
    project_start DATETIME,
    project_en DATETIME,
    project_end DATETIME
);

>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Tuesday, September 27, 2011 5:46 PM
> To: General Discussion of SQLite Database
> Subject: EXT :[sqlite] speeding up FTS4
>
> I have
>
>    CREATE TABLE uris (
>        uri_id INTEGER PRIMARY KEY,
>        uri TEXT,
>        uri_content TEXT,
>        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
>        feed_history_id INTEGER
>    );
>
> with 46608 rows
>
>    CREATE TABLE feed_history (
>        feed_history_id INTEGER PRIMARY KEY,
>        feed_id INTEGER,
>        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
>    );
>
> with 3276 rows
>
>    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);
>
> with 79 rows
>
>    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
>
> with 3 rows
>
>    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
>    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);
>
> The database file is about 27 GB.
>
>
> The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows
>
>    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
>        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
>    FROM fts_uri f
>        JOIN uris u ON f.uri_id = u.uri_id
>        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>        JOIN feeds f ON fh.feed_id = f.feed_id
>        JOIN projects p ON f.project_id = p.project_id
>    WHERE p.project_id = 3
>        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
>        AND fts_uri MATCH 'education,school'
>    ORDER BY u.uri_id, downloaded_on DESC;
>
>
> The EXPLAIN QUERY PLAN for the above query tells me
>
>    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
> Is there anything I can do to speed this up?
>
> --
> Puneet Kishor
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Puneet Kishor-2
In reply to this post by Black, Michael (IS)

On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote:

> Have you done "ANALYZE"?  That might help.
>
> Also...try to arrange your joins based on record count (both high-to-low and low-to-high) and see what difference it makes.
>
> Since you have only one WHERE clause I'm guessing having project_ids as the first join makes sense.
>
>


So, I ran ANALYZE. Then, with the query as is, I got

CPU Time: user 24.742481 sys 79.120486

If I flip the WHERE clause to

    WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
        AND p.project_id = 3
        AND fts_uri MATCH 'education school'

I still get

CPU Time: user 24.726792 sys 79.240780



Yesterday, in desperation, I tried to pare down my fts_uri table which brought the file size to 17 GB, but ended up creating a "malformed disk image" whatever the heck that means. Thankfully I had a backup (clever boy).

So, I am now back with a 27 GB file, and a query that takes forever thereby locking up my web app.


>
>
>
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Tuesday, September 27, 2011 5:46 PM
> To: General Discussion of SQLite Database
> Subject: EXT :[sqlite] speeding up FTS4
>
>
> I have
>
>    CREATE TABLE uris (
>        uri_id INTEGER PRIMARY KEY,
>        uri TEXT,
>        uri_content TEXT,
>        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
>        feed_history_id INTEGER
>    );
>
> with 46608 rows
>
>    CREATE TABLE feed_history (
>        feed_history_id INTEGER PRIMARY KEY,
>        feed_id INTEGER,
>        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
>    );
>
> with 3276 rows
>
>    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);
>
> with 79 rows
>
>    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
>
> with 3 rows
>
>    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
>    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);
>
> The database file is about 27 GB.
>
>
> The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows
>
>    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
>        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
>    FROM fts_uri f
>        JOIN uris u ON f.uri_id = u.uri_id
>        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>        JOIN feeds f ON fh.feed_id = f.feed_id
>        JOIN projects p ON f.project_id = p.project_id
>    WHERE p.project_id = 3
>        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
>        AND fts_uri MATCH 'education,school'
>    ORDER BY u.uri_id, downloaded_on DESC;
>
>
> The EXPLAIN QUERY PLAN for the above query tells me
>
>    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
> Is there anything I can do to speed this up?
>
> --
> Puneet Kishor

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Simon Slavin-3

On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:

>    WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)

Why are you doing 'Datetime' here ?  Not only does the conversion take time, but it means you can't usefully index either of those two columns.

Can you instead store your stamps in a format which is readily sortable ?  Either in text form or as julian days.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Puneet Kishor-2

On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:

>
> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
>
>>   WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
>
> Why are you doing 'Datetime' here ?  Not only does the conversion take time, but it means you can't usefully index either of those two columns.
>
> Can you instead store your stamps in a format which is readily sortable ?  Either in text form or as julian days.


Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while.

But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.


>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Petite Abeille-2

On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote:

> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.

Or as a number. Your choice:

        • TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
        • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
        • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

Considering the amount of data you have, perhaps something like a unix time would be more, hmmm, more frugal.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Simon Slavin-3
In reply to this post by Puneet Kishor-2

On 28 Sep 2011, at 3:48pm, Puneet Kishor wrote:

> Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while.

You only have to do it once, you can do it overnight, and you can pick which night you do it.  You can store them as INTEGER or REAL, which is far faster to search than TEXT.  And the alternative is to do some extra work every time someone uses that SELECT or anything like it.  It also means you can usefully put those columns in an INDEX.  Speeding up SELECTs is what INDEXes are all about.

What you have done is the equivalent of collecting all the knowledge of the world and putting it unsorted in a huge warehouse.  Every time anyone wants something they have to wade through, on average, half the warehouse before they find it.

> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.

If you want to see what value is actually being stored just SELECT it without converting to Datatype and see what you get.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Black, Michael (IS)
In reply to this post by Puneet Kishor-2
Sounds like you may just be hitting disk i/o.  Your "sys" numbers seem to indicate that.



How much memory does your machine have?



How much time does each WHERE clause take?

select count(*) from project where project_id = 3;

select count(*) from fts_uri MATCH 'education,school';

select count(*) from project as p,fts_uri as u where DateTime(u.downloaded_on) >= DateTime(p.project_start);

(I don't think you need Datetime at all...but I doubt it makes a lot of difference....test it)...should get the same count.

select count(*) from project as p,uris as u where u.downloaded_on >= p.project_start;



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
Sent: Wednesday, September 28, 2011 9:41 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT : speeding up FTS4


On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote:

> Have you done "ANALYZE"?  That might help.
>
> Also...try to arrange your joins based on record count (both high-to-low and low-to-high) and see what difference it makes.
>
> Since you have only one WHERE clause I'm guessing having project_ids as the first join makes sense.
>
>


So, I ran ANALYZE. Then, with the query as is, I got

CPU Time: user 24.742481 sys 79.120486

If I flip the WHERE clause to

    WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
        AND p.project_id = 3
        AND fts_uri MATCH 'education school'

I still get

CPU Time: user 24.726792 sys 79.240780



Yesterday, in desperation, I tried to pare down my fts_uri table which brought the file size to 17 GB, but ended up creating a "malformed disk image" whatever the heck that means. Thankfully I had a backup (clever boy).

So, I am now back with a 27 GB file, and a query that takes forever thereby locking up my web app.


>
>
>
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Tuesday, September 27, 2011 5:46 PM
> To: General Discussion of SQLite Database
> Subject: EXT :[sqlite] speeding up FTS4
>
>
> I have
>
>    CREATE TABLE uris (
>        uri_id INTEGER PRIMARY KEY,
>        uri TEXT,
>        uri_content TEXT,
>        downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
>        feed_history_id INTEGER
>    );
>
> with 46608 rows
>
>    CREATE TABLE feed_history (
>        feed_history_id INTEGER PRIMARY KEY,
>        feed_id INTEGER,
>        scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
>    );
>
> with 3276 rows
>
>    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER);
>
> with 79 rows
>
>    CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
>
> with 3 rows
>
>    CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
>    CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);
>
> The database file is about 27 GB.
>
>
> The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows
>
>    SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
>        Snippet(fts_uri, '<span class="hilite">', '</span>', '&hellip;', -1, 64) snippet
>    FROM fts_uri f
>        JOIN uris u ON f.uri_id = u.uri_id
>        JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>        JOIN feeds f ON fh.feed_id = f.feed_id
>        JOIN projects p ON f.project_id = p.project_id
>    WHERE p.project_id = 3
>        AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
>        AND fts_uri MATCH 'education,school'
>    ORDER BY u.uri_id, downloaded_on DESC;
>
>
> The EXPLAIN QUERY PLAN for the above query tells me
>
>    0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>    0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>    0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
> Is there anything I can do to speed this up?
>
> --
> Puneet Kishor

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: EXT : speeding up FTS4

Simon Slavin-3
In reply to this post by Petite Abeille-2

On 28 Sep 2011, at 3:52pm, Petite Abeille wrote:

> On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote:
>
>> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.
>
> Or as a number. Your choice:
>
> • TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
> • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
> • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
>
> Considering the amount of data you have, perhaps something like a unix time would be more, hmmm, more frugal.

Of course, if he is consistent for any of these he can just replace

>   WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)

with

  WHERE u.downloaded_on >= p.project_start

and make indexes which include the columns.  No conversion needed.  The only reason to need conversion is if the source data is in one format in some rows and another format in other rows.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Black, Michael (IS)
In reply to this post by Puneet Kishor-2
Your change to numeric date/time may not take a long as you think.



drop any indexes on project_start and downloaded_on;



update projects set project_start=julianday(project_start);

update uris set downloaded_on=julianday(downloaded_on);



Recreate indexes.



Modify your code to insert julianday('now','localtime') instead of taking the default current_timestamp.



I wasn't really aware before the the datetime functions stored as text all the time...that's bad for searches...





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
Sent: Wednesday, September 28, 2011 9:48 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT : speeding up FTS4


On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:

>
> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
>
>>   WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
>
> Why are you doing 'Datetime' here ?  Not only does the conversion take time, but it means you can't usefully index either of those two columns.
>
> Can you instead store your stamps in a format which is readily sortable ?  Either in text form or as julian days.


Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while.

But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.


>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Puneet Kishor-2

On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:

> Your change to numeric date/time may not take a long as you think.
>
>
>


Took an hour and a half.

Step 1: Alter all tables with datetime columns, converting those columns to integer;

Step 2: Update all tables setting new datetime columns to unixtime

UPDATE table SET new_column = strftime('%s', old_column);

Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);

Step 4: Run the following query

        SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
        FROM fts_uri f
                JOIN uris u ON f.uri_id = u.uri_id
                JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
                JOIN feeds f ON fh.feed_id = f.feed_id
                JOIN projects p ON f.project_id = p.project_id
        WHERE p.project_id = 3
                AND u.u_downloaded_on >= p.u_project_start
                AND fts_uri MATCH 'education school'
        ORDER BY u.uri_id, u_downloaded_on DESC;

Terrible time.
>> CPU Time: user 27.584849 sys 115.219293

Step 5: EXPLAIN QUERY PLAN (above SELECT query)

        0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
        0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|0|0|USE TEMP B-TREE FOR ORDER BY
        CPU Time: user 0.000099 sys 0.000008

Step 6: ANALYZE;

Step 7: Run the above SELECT query again

Terrible time.
>> CPU Time: user 27.703538 sys 116.684390



This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time.

By the way, have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.


> drop any indexes on project_start and downloaded_on;
>
>
>
> update projects set project_start=julianday(project_start);
>
> update uris set downloaded_on=julianday(downloaded_on);
>
>
>
> Recreate indexes.
>
>
>
> Modify your code to insert julianday('now','localtime') instead of taking the default current_timestamp.
>
>
>
> I wasn't really aware before the the datetime functions stored as text all the time...that's bad for searches...
>
>
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Wednesday, September 28, 2011 9:48 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] EXT : speeding up FTS4
>
>
> On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:
>
>>
>> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
>>
>>>  WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
>>
>> Why are you doing 'Datetime' here ?  Not only does the conversion take time, but it means you can't usefully index either of those two columns.
>>
>> Can you instead store your stamps in a format which is readily sortable ?  Either in text form or as julian days.
>
>
> Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while.
>
> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.
>
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Simon Slavin-3

On 28 Sep 2011, at 6:44pm, Puneet Kishor wrote:

> Step 4: Run the following query
>
> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
> FROM fts_uri f
> JOIN uris u ON f.uri_id = u.uri_id
> JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
> JOIN feeds f ON fh.feed_id = f.feed_id
> JOIN projects p ON f.project_id = p.project_id
> WHERE p.project_id = 3
> AND u.u_downloaded_on >= p.u_project_start
> AND fts_uri MATCH 'education school'
> ORDER BY u.uri_id, u_downloaded_on DESC;
>
> Terrible time.
>>> CPU Time: user 27.584849 sys 115.219293
>
> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
>
> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
> 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> CPU Time: user 0.000099 sys 0.000008

I'm not familiar with the output of EXPLAIN QUERY PLAN.  'SEARCH' means it's using an index ?  And SCAN means it isn't ?  Can anyone suggest a good INDEX or two ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Black, Michael (IS)
In reply to this post by Puneet Kishor-2
strftime returns a text representation.  So you didn't really change anything.

You need to use juliandays() as I said.



And you want a REAL number...not integer...though SQLite doesn't really care what you call it.  It's more for your own reference.



You just added a bunch more strings increasing the size of your database...ergo it ran slower.



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
Sent: Wednesday, September 28, 2011 12:44 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4


On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:

> Your change to numeric date/time may not take a long as you think.
>
>
>


Took an hour and a half.

Step 1: Alter all tables with datetime columns, converting those columns to integer;

Step 2: Update all tables setting new datetime columns to unixtime

UPDATE table SET new_column = strftime('%s', old_column);

Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);

Step 4: Run the following query

        SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
        FROM fts_uri f
                JOIN uris u ON f.uri_id = u.uri_id
                JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
                JOIN feeds f ON fh.feed_id = f.feed_id
                JOIN projects p ON f.project_id = p.project_id
        WHERE p.project_id = 3
                AND u.u_downloaded_on >= p.u_project_start
                AND fts_uri MATCH 'education school'
        ORDER BY u.uri_id, u_downloaded_on DESC;

Terrible time.
>> CPU Time: user 27.584849 sys 115.219293

Step 5: EXPLAIN QUERY PLAN (above SELECT query)

        0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
        0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        0|0|0|USE TEMP B-TREE FOR ORDER BY
        CPU Time: user 0.000099 sys 0.000008

Step 6: ANALYZE;

Step 7: Run the above SELECT query again

Terrible time.
>> CPU Time: user 27.703538 sys 116.684390



This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time.

By the way, have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.


> drop any indexes on project_start and downloaded_on;
>
>
>
> update projects set project_start=julianday(project_start);
>
> update uris set downloaded_on=julianday(downloaded_on);
>
>
>
> Recreate indexes.
>
>
>
> Modify your code to insert julianday('now','localtime') instead of taking the default current_timestamp.
>
>
>
> I wasn't really aware before the the datetime functions stored as text all the time...that's bad for searches...
>
>
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Wednesday, September 28, 2011 9:48 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] EXT : speeding up FTS4
>
>
> On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:
>
>>
>> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
>>
>>>  WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start)
>>
>> Why are you doing 'Datetime' here ?  Not only does the conversion take time, but it means you can't usefully index either of those two columns.
>>
>> Can you instead store your stamps in a format which is readily sortable ?  Either in text form or as julian days.
>
>
> Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while.
>
> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway.
>
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Roger Andersson-2
  On 09/28/11 20:14, Black, Michael (IS) wrote:
> strftime returns a text representation.  So you didn't really change anything.
>
> You need to use juliandays() as I said.
>
>
> And you want a REAL number...not integer...though SQLite doesn't really care what you call it.  It's more for your own reference.
>
Assuming that second resolution is sufficient.

Would
  UPDATE table SET new_column = cast(strftime('%s', old_column) as integer);
make any difference?

/Roger


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Puneet Kishor-2
In reply to this post by Black, Michael (IS)

On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:

> strftime returns a text representation.  So you didn't really change anything.
>


That's not true at all. I added u_downloaded_on (u_ for unixtime)

CREATE TABLE uris (
    uri_id INTEGER PRIMARY KEY,
    uri TEXT,
    uri_html TEXT,
    uri_content TEXT,
    downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
    feed_history_id INTEGER,
    u_downloaded_on INTEGER
);

sqlite> SELECT u_downloaded_on FROM uris LIMIT 5;
1306450769
1306450769
1306450770
1306450774
1306450776


> You need to use juliandays() as I said.
>
>
>
> And you want a REAL number...not integer...though SQLite doesn't really care what you call it.  It's more for your own reference.
>
>
>
> You just added a bunch more strings increasing the size of your database...ergo it ran slower.
>
>


If I understand correctly, the *size* of the database should not matter. Or, at least not matter as much. Imagine a database with one table with only one row but with so much content in that row that it is 27 GB vs. a database with a few million rows, each with small amount of content but together totaling 27 GB.

What should matter is using the indexes correctly. In this case, my query plan shows that I am hitting all the indexes. But, I think I am getting closer to understanding this.

I started backward with

sqlite> SELECT p.project_id
   ...> FROM projects p
   ...> WHERE p.project_id = 3;
3
CPU Time: user 0.000080 sys 0.000089

sqlite> SELECT f.feed_id, p.project_id
   ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id
   ...> WHERE p.project_id = 3;
..
CPU Time: user 0.000239 sys 0.000170

SELECT fh.feed_history_id, f.feed_id, p.project_id
   ...> FROM projects p
   ...>   JOIN feeds f ON f.project_id = p.project_id
   ...>   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...> WHERE p.project_id = 3;
..
CPU Time: user 0.008491 sys 0.008054

SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
   ...> FROM projects p
   ...>   JOIN feeds f ON f.project_id = p.project_id
   ...>   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...>   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ...> WHERE p.project_id = 3;


BOOM! Adding that last table makes my query way too slow.

Taking out the uris table and querying only the fts table

sqlite> SELECT uri_id
   ...> FROM fts_uri
   ...> WHERE fts_uri MATCH 'education school';


starts producing the results immediately, but gives me back way too many matches. I am now getting all the matches, but I want only the matches for project id = 3 or matches throttled by u_downloaded_on (both of which are achievable only via a JOIN with uris table).


>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> ________________________________
> From: [hidden email] [[hidden email]] on behalf of Puneet Kishor [[hidden email]]
> Sent: Wednesday, September 28, 2011 12:44 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
>
>
> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
>
>> Your change to numeric date/time may not take a long as you think.
>>
>>
>>
>
>
> Took an hour and a half.
>
> Step 1: Alter all tables with datetime columns, converting those columns to integer;
>
> Step 2: Update all tables setting new datetime columns to unixtime
>
> UPDATE table SET new_column = strftime('%s', old_column);
>
> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
>
> Step 4: Run the following query
>
>        SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
>        FROM fts_uri f
>                JOIN uris u ON f.uri_id = u.uri_id
>                JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>                JOIN feeds f ON fh.feed_id = f.feed_id
>                JOIN projects p ON f.project_id = p.project_id
>        WHERE p.project_id = 3
>                AND u.u_downloaded_on >= p.u_project_start
>                AND fts_uri MATCH 'education school'
>        ORDER BY u.uri_id, u_downloaded_on DESC;
>
> Terrible time.
>>> CPU Time: user 27.584849 sys 115.219293
>
> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
>
>        0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>        0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>        0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>        0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>        0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>        0|0|0|USE TEMP B-TREE FOR ORDER BY
>        CPU Time: user 0.000099 sys 0.000008
>
> Step 6: ANALYZE;
>
> Step 7: Run the above SELECT query again
>
> Terrible time.
>>> CPU Time: user 27.703538 sys 116.684390
>
>
>
> This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time.
>
> By the way, have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.
>
> ..

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Black, Michael (IS)
In reply to this post by Roger Andersson-2
'scuse me....I was wrong (again)...I guess strftime does return an integer....seems to me that belies the name as it's a mismatch to the unix function.



select strftime('%s','now');

1317236583



But I think you may want:

strftime('%s','now','unixepoch','localtime');



That works with the default datetime() then when you select it.

sqlite> select datetime(strftime('%s','now'),'unixepoch','localtime');
2011-09-28 14:05:09



So...given that you actually stuck integers into your table I'm a bit surprised that it took longer.





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate



________________________________
From: [hidden email] [[hidden email]] on behalf of Roger Andersson [[hidden email]]
Sent: Wednesday, September 28, 2011 1:52 PM
To: [hidden email]
Subject: EXT :Re: [sqlite] speeding up FTS4

  On 09/28/11 20:14, Black, Michael (IS) wrote:
> strftime returns a text representation.  So you didn't really change anything.
>
> You need to use juliandays() as I said.
>
>
> And you want a REAL number...not integer...though SQLite doesn't really care what you call it.  It's more for your own reference.
>
Assuming that second resolution is sufficient.

Would
  UPDATE table SET new_column = cast(strftime('%s', old_column) as integer);
make any difference?

/Roger


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Roger Andersson-2
  On 09/28/11 21:10, Black, Michael (IS) wrote:
>
> 'scuse me....I was wrong (again)...I guess strftime does return an
> integer....seems to me that belies the name as it's a mismatch to the
> unix function.
>
>
?
SQLite version 3.7.8 2011-09-19 14:49:19
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select typeof(strftime('%s','now','localtime'));
text

/Roger
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up FTS4

Petite Abeille-2
In reply to this post by Puneet Kishor-2

On Sep 28, 2011, at 9:00 PM, Puneet Kishor wrote:

> If I understand correctly, the *size* of the database should not matter. Or, at least not matter as much.

So she said. But contrary to popular believe, size does matter.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
12