Sqlite low level and Speed.

Sqlite low level and Speed.

Yuriy-6
Hello sqlite-users,

Sqlite low level and Speed.

Sorry for my bad English.

I am trying to use SQLite for a FAST string-grouping operation (deleting duplicates).

I have an input array of strings:

String1
String2
String3
String1
String2
…
StringN


I need to delete the duplicates.

Output database:
String1
String2
String3
…
StringN

1. First attempted solution

CREATE TABLE testtable (val TEXT);
CREATE INDEX index_val ON testtable(val);
PRAGMA synchronous = OFF;

for i := 1 to 10000000 do
begin
  // SELECT * FROM testtable WHERE val = StringN;
  // if no row was found, INSERT StringN INTO testtable
end;

This is very slow.

2. Second attempted solution


CREATE TABLE testtable (val TEXT, hashval INTEGER);
CREATE INDEX index_hashval ON testtable(hashval);
PRAGMA synchronous = OFF;

for i := 1 to 10000000 do
begin
  // SELECT * FROM testtable WHERE hashval = hash(StringN) AND val = StringN;
  // if no row was found, INSERT StringN INTO testtable
end;

Also very slow.


3. Third attempted solution

I found a good example using SQLite's low-level functions (SQLite vs. BDB):

http://rganesan.blogspot.com/

But this example uses functions such as:

sqlite3BtreeOpen
sqlite3BtreeInsert
sqlite3BtreeCursor

I use Windows and Delphi. These functions are not exported from sqlite3.dll.

Please help me. Do I perhaps need to recompile SQLite? I have Visual C++ and
C++Builder. If recompiling is needed, how is it done on the Windows platform?

Or is there perhaps another solution to this problem? I need more speed when grouping big arrays of strings.

Thank you.
 

--
Best regards,
 Yuriy                          mailto:[hidden email]


Re: Sqlite low level and Speed.

BertV
Use transactions; they speed things up a lot.
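
Something like this minimal sketch in C (assuming the sqlite3 C API and the table from your example; error handling omitted for brevity):

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char sql[128];
    int i;

    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db, "CREATE TABLE testtable (val TEXT);", 0, 0, 0);

    /* One transaction around the whole loop: one disk sync at COMMIT
       instead of one per INSERT, which is where the time goes. */
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    for (i = 1; i <= 1000000; i++) {
        sprintf(sql, "INSERT INTO testtable(val) VALUES('String%d');", i);
        sqlite3_exec(db, sql, 0, 0, 0);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);

    sqlite3_close(db);
    return 0;
}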

On Monday 20 June 2005 at 11:35, Yuriy wrote:

> [snip: original message quoted in full]

--
Kind regards
Bert Verhees
ROSA Software

Re[2]: Sqlite low level and Speed.

Yuriy-6
BV> Use transactions; they speed things up a lot.
Sorry, I already use transactions too, and the speed is still low :(

Begin;
c := 0;
for i := 1 to 10000000 do
begin
  // SELECT * FROM testtable WHERE val = StringN;
  // if no row was found, INSERT StringN INTO testtable

  inc(c);
  if c = 10000 then
  begin
    Commit;
    Begin;
    c := 0;
  end;
end;

Commit;

--
Best regards,
 Yuriy                            mailto:[hidden email]


Re: Sqlite low level and Speed.

Christian Smith
In reply to this post by Yuriy-6
On Mon, 20 Jun 2005, Yuriy wrote:

>Hello sqlite-users,
>
>Sqlite low level and Speed.
>
>Sorry for my bad English.
>
>I am trying to use SQLite for a FAST string-grouping operation (deleting duplicates).
>
>I have an input array of strings:
>
>String1
>String2
>String3
>String1
>String2
>…
>StringN
>
>
>I need to delete the duplicates.
>
>Output database:
>String1
>String2
>String3
>…
>StringN
>
>1. First attempted solution
>
>CREATE TABLE testtable (val TEXT);
>CREATE INDEX index_val ON testtable(val);

When creating testtable, specify val as unique, and specify what to do
with conflicts:
CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);

The conflict clauses are documented here:
http://www.sqlite.org/lang_conflict.html

Note, the following does not work as expected (as I expected, at least!)
CREATE TABLE testtable (val text);
CREATE INDEX index_val ON testtable(val) ON CONFLICT REPLACE;

The conflict clause appears to be ignored.
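
As a concrete illustration, a minimal sketch in C (the schema is the one suggested above; the input array and error handling are abbreviated):

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;
    const char *input[] = { "String1", "String2", "String1", "String3" };
    int i;

    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db,
        "CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);",
        0, 0, 0);

    /* One prepared INSERT reused for every string; duplicates are
       resolved by the conflict clause, so no SELECT is needed. */
    sqlite3_prepare(db, "INSERT INTO testtable(val) VALUES(?);",
                    -1, &stmt, 0);

    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    for (i = 0; i < 4; i++) {
        sqlite3_bind_text(stmt, 1, input[i], -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    /* testtable now holds String1, String2, String3 exactly once. */

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}

ON CONFLICT IGNORE would also de-duplicate, and it skips the conflicting insert instead of replacing the existing row.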


>[snip: rest of attempted solutions 1 and 2]
>
>
>3. Third attempted solution
>
>I found a good example using SQLite's low-level functions (SQLite vs. BDB):
>
>http://rganesan.blogspot.com/
>
>But this example uses functions such as:
>
>sqlite3BtreeOpen
>sqlite3BtreeInsert
>sqlite3BtreeCursor
>
>I use Windows and Delphi. These functions are not exported from sqlite3.dll.
>
>Please help me. Do I perhaps need to recompile SQLite? I have Visual C++ and
>C++Builder. If recompiling is needed, how is it done on the Windows platform?


The Btree functions are internal to SQLite and subject to change without
notice. Hence they are not exported.

If you want access to such functions, you must compile your own library
exporting the functions on Windows, and fix problems in the future if this
API changes at all.


>
>Or is there perhaps another solution to this problem? I need more speed when
>grouping big arrays of strings.


Group them by what criteria? How about just using an aggregate select or
group by?
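
For instance, a minimal sketch of the aggregate approach in C (schema as above; it folds duplicates and counts them in one statement):

#include <stdio.h>
#include <sqlite3.h>

/* Print one (val, count) pair per unique string. */
static int print_row(void *arg, int ncol, char **val, char **col)
{
    printf("%s  %s\n", val[0], val[1]);
    return 0;
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db,
        "SELECT val, COUNT(*) FROM testtable GROUP BY val;",
        print_row, 0, 0);
    sqlite3_close(db);
    return 0;
}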


>
>Thank you.
>

Christian


--
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \

Re[2]: Sqlite low level and Speed.

Yuriy-6
CS> When creating testtable, specify val as unique, and specify what to do
CS> with conflicts:
CS> CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);
CS> The conflict clauses are documented here:
CS> http://www.sqlite.org/lang_conflict.html
I tried it; for ~1,000,000 UNIQUE records it is very slow :(
I also need to count these UNIQUE records. For now I try to keep the count in
memory.

slDBPath := ExtractFilePath(Application.ExeName) + 'test.db';
sldb := TSQLiteDatabase.Create(slDBPath);
sldb.ExecSQL('PRAGMA synchronous = OFF;');

  sSQL := 'CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);';
  sldb.ExecSQL(sSQL);
//  sSQL := 'CREATE TABLE testtable (val TEXT);';
//  sldb.ExecSQL(sSQL);
//  sSQL := 'CREATE INDEX index_val ON testtable(val) ON CONFLICT IGNORE;';
//  sldb.ExecSQL(sSQL);
  GlobalTimer := Now();

  sldb.BeginTransaction;
  Randomize;
  c := 0;

  // one prepared INSERT, reused for every row
  sSQL := 'INSERT INTO testtable(val) VALUES (?);';
  if Sqlite3_Prepare(sldb.fDB, PChar(sSQL), -1, Stmt, NextSQLStatement) <> SQLITE_OK then
    ShowMessage('Error executing SQL');

  for i := 1 to 1000000 do
  begin
    // build a random dotted-quad string such as 12.34.56.78
    ip := IntToStr(Random(255)) + '.' + IntToStr(Random(255)) + '.' +
          IntToStr(Random(255)) + '.' + IntToStr(Random(255));
    sqlite3_bind_text(Stmt, 1, PChar(ip), Length(ip), nil);
    Sqlite3_Step(Stmt);
    SQLite3_Reset(Stmt);

    inc(c);
    if c = 10000 then
    begin
      sldb.Commit;
      sldb.BeginTransaction;
      c := 0;
    end;
  end;

  sldb.Commit;
  sldb.Free;


CS> If you want access to such functions, you must compile your own library
CS> exporting the functions on Windows, and fix problems in the future if this
CS> API changes at all.
Can I do this in Visual C++?
Maybe you have examples? Or steps?

CS> Group them by what criteria? How about just using an aggregate select or
CS> group by?
No; if the table has more than 1,000,000 rows, grouping is very slow.

But in my case I have ONE big table and one small grouping table, a mini OLAP :)



--
Best regards,
 Yuriy                            mailto:[hidden email]


Re: Sqlite low level and Speed.

Jay Sprenkle
In reply to this post by Yuriy-6
On 6/20/05, Yuriy <[hidden email]> wrote:

> 1. First attempted solution
>
> CREATE TABLE testtable (val TEXT);
> CREATE INDEX index_val ON testtable(val);
> PRAGMA synchronous = OFF;
>
> for i := 1 to 10000000 do
> begin
>   // SELECT * FROM testtable WHERE val = StringN;
>   // if no row was found, INSERT StringN INTO testtable
> end;
>
> This is very slow.

for i := 1 to 10000000 do
begin tran
  INSERT OR FAIL INTO testtable(val) VALUES (StringN);
commit
end for

Re: Sqlite low level and Speed.

Will Leshner

On Jun 20, 2005, at 7:42 AM, Jay Sprenkle wrote:

> for i := 1 to 10000000 do
> begin tran
>   INSERT OR FAIL INTO testtable(val) VALUES (StringN);
> commit
> end for
>
>


Shouldn't the transaction wrap the loop?

Re: Sqlite low level and Speed.

Jay Sprenkle
Yes. Sorry, haven't had my morning caffeine yet. ;)

On 6/20/05, Will Leshner <[hidden email]> wrote:

>
> On Jun 20, 2005, at 7:42 AM, Jay Sprenkle wrote:
>
> > for i := 1 to 10000000 do
> > begin tran
> >   INSERT OR FAIL INTO testtable(val) VALUES (StringN);
> > commit
> > end for
> >
>
>
> Shouldn't the transaction wrap the loop?
>


--
---
You a Gamer? If you're near Kansas City:
Conquest 36
https://events.reddawn.net

The Castles of Dereth Calendar: a tour of the art and architecture of
Asheron's Call
http://www.lulu.com/content/77264

Re[2]: Sqlite low level and Speed.

Christian Smith
In reply to this post by Yuriy-6
On Mon, 20 Jun 2005, Yuriy wrote:

>CS> When creating testtable, specify val as unique, and specify what to do
>CS> with conflicts:
>CS> CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);
>CS> The conflict clauses are documented here:
>CS> http://www.sqlite.org/lang_conflict.html
>I tried it; for ~1,000,000 UNIQUE records it is very slow :(
>I also need to count these UNIQUE records. For now I try to keep the count in
>memory.


What are you actually trying to do? And can you quantify "very slow" and
tell us what you actually expect or what would be acceptable?


> [snip sample]


Is this representative of what you are trying to do? Are you storing IP
addresses, and you want to discard duplicates? Using the "on conflict"
resolution is probably your fastest course of action.


>
>
>CS> If you want access to such functions, you must compile your own library
>CS> exporting the functions on Windows, and fix problems in the future if this
>CS> API changes at all.
>
>Can I do this in Visual C++?
>Maybe you have examples? Or steps?


I don't work with VC, but I presume you can just create a new DLL project
(or workspace?) and add the SQLite source files, add your required build
flags, and create a DLL def file to export the various Btree functions you
require.
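
For instance, the def file might look something like this (an untested sketch; add whichever internal functions you end up needing, keeping in mind the caveat above that this API can change):

LIBRARY sqlite3
EXPORTS
    sqlite3BtreeOpen
    sqlite3BtreeCursor
    sqlite3BtreeInsert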


Christian


--
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \

Re: Re[2]: Sqlite low level and Speed.

Paul G-3

----- Original Message -----
From: "Christian Smith" <[hidden email]>
To: "Christian Smith" <[hidden email]>; "Yuriy" <[hidden email]>
Sent: Monday, June 20, 2005 12:01 PM
Subject: Re[2]: [sqlite] Sqlite low level and Speed.


> On Mon, 20 Jun 2005, Yuriy wrote:
>
> >CS> When creating testtable, specify val as unique, and specify what to
> >CS> do with conflicts:
> >CS> CREATE TABLE testtable(val TEXT UNIQUE ON CONFLICT REPLACE);
> >CS> The conflict clauses are documented here:
> >CS> http://www.sqlite.org/lang_conflict.html
> >I tried it; for ~1,000,000 UNIQUE records it is very slow :(
> >I also need to count these UNIQUE records. For now I try to keep the count in
> >memory.
>
>
> What are you actually trying to do? And can you quantify "very slow" and
> tell us what you actually expect or what would be acceptable?
>
>
> > [snip sample]
>
>
> Is this representative of what you are trying to do? Are you storing IP
> addresses, and you want to discard duplicates? Using the "on conflict"
> resolution is probably your fastest course of action.

If all he's doing is discarding duplicate strings, with no requirement for
persistent storage, it is easily done with a primitive hash-table
implementation. It could probably be done efficiently in less than a hundred
lines of C, most of which could be adapted from example code.
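
For instance, a minimal sketch of such a table (fixed-size, chained buckets; it reads strings on stdin and prints each unique one once):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1048576   /* power of two, roughly one bucket per string */

struct node {
    struct node *next;
    char str[1];           /* string stored inline after the header */
};

static struct node *buckets[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h & (NBUCKETS - 1);
}

/* Returns 1 if s was new (and remembers it), 0 if already seen. */
static int add_unique(const char *s)
{
    unsigned h = hash(s);
    struct node *n;
    for (n = buckets[h]; n; n = n->next)
        if (strcmp(n->str, s) == 0) return 0;
    n = malloc(sizeof *n + strlen(s));
    strcpy(n->str, s);
    n->next = buckets[h];
    buckets[h] = n;
    return 1;
}

int main(void)
{
    char line[1024];
    while (fgets(line, sizeof line, stdin))
        if (add_unique(line))
            fputs(line, stdout);
    return 0;
}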

-p


Re[4]: Sqlite low level and Speed.

Yuriy-6
CS> What are you actually trying to do? And can you quantify "very slow" and
CS> tell us what you actually expect or what would be acceptable?
100,000 records: all OK, 7 seconds.
1,000,000 records: the software halts :(

CS> Is this representitive of what you are trying to do? Are you storing IP
CS> addresses, and you want to discard duplicates? Using the "on conflict"
CS> resolution is probably your fastest course of action.
I am writing a log analyzer and want to use SQLite as the database.

All the operations in my software group big lists of strings,
and I need the fastest possible speed.
If I use "group by" it is slow :(

>If all he's doing is discarding duplicate strings, with no requirement for
>persistent storage, it is easily done with a primitive hash-table
>implementation. It could probably be done efficiently in less than a hundred
>lines of C, most of which could be adapted from example code.
Yes, I need a disk-based hash or btree. And SQLite at the low level IS a
disk-based btree.


--
Best regards,
 Yuriy                            mailto:[hidden email]


Re: Re[4]: Sqlite low level and Speed.

Puneet Kishor

On Jun 20, 2005, at 11:24 AM, Yuriy wrote:

> CS> What are you actually trying to do? And can you quantify "very slow" and
> CS> tell us what you actually expect or what would be acceptable?
> 100,000 records: all OK, 7 seconds.
> 1,000,000 records: the software halts :(
>
> CS> Is this representative of what you are trying to do? Are you storing IP
> CS> addresses, and you want to discard duplicates? Using the "on conflict"
> CS> resolution is probably your fastest course of action.
> I am writing a log analyzer and want to use SQLite as the database.
>
> All the operations in my software group big lists of strings,
> and I need the fastest possible speed.
> If I use "group by" it is slow :(
>
>> If all he's doing is discarding duplicate strings, with no requirement for
>> persistent storage, it is easily done with a primitive hash-table
>> implementation. It could probably be done efficiently in less than a hundred
>> lines of C, most of which could be adapted from example code.

Or, a couple or three lines of Perl.

> Yes, I need a disk-based hash or btree. And SQLite at the low level IS a
> disk-based btree.
>

Pre-process the log file, creating a hash with the unique field as the
key. Then, loop over the hash and insert it in your db.

If memory is a constraint, don't bother even creating a hash. Loop over
the log file, create an array, sort the array, remove the duplicates,
then insert it in the db, making sure that you have AutoCommit off and
commits every 10k or 100k records.
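
For instance, the sort-and-remove-duplicates step might look like this in C (a sketch; the tiny inline array stands in for the strings parsed from the log, and each surviving string would then be inserted as described above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void)
{
    const char *lines[] = { "String1", "String2", "String1", "String3" };
    size_t n = sizeof lines / sizeof lines[0], i;

    qsort(lines, n, sizeof lines[0], cmpstr);

    /* after sorting, duplicates are adjacent: keep the first of each run */
    for (i = 0; i < n; i++)
        if (i == 0 || strcmp(lines[i], lines[i - 1]) != 0)
            printf("%s\n", lines[i]);   /* INSERT this value into the db */
    return 0;
}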

Should be done in a few seconds. To give you an idea, I once de-duped a
file with 320 million rows of duplicate email addresses in about 120
seconds on an ancient, creaking iBook. A million records should be a
piece of cake.


--
Puneet Kishor


Re[6]: Sqlite low level and Speed.

Yuriy-6
Hello Puneet,

PK> Pre-process the log file, creating a hash with the unique field as the
PK> key. Then, loop over the hash and insert it in your db.

PK> If memory is a constraint, don't bother even creating a hash. Loop over
PK> the log file, create an array, sort the array, remove the duplicates,
PK> then insert it in the db, making sure that you have AutoCommit off and
PK> commits every 10k or 100k records.


PK> Should be done in a few seconds. To give you an idea, I once de-duped a
PK> file with 320 million rows of duplicate email addresses in about 120
PK> seconds on an ancient, creaking iBook. A million records should be a
PK> piece of cake.

What if the array is very big, more than 1 GB? How do you sort it and remove
the duplicates then? External sorting?
Maybe there is a simple example or article?

Or do you sort in memory?

Thank you.

--
Best regards,
 Yuriy                            mailto:[hidden email]