generic advice for insert performance in-memory database optimization

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

generic advice for insert performance in-memory database optimization

Nelson, Erik - 2
I've got an in-memory database with a single table that I need to fill with ~500 million rows.  There are no indexes and the table definitions is

create table data(id int, path int, month int, val1 double, val2 double, val3 double... val20 double)

I'm running on linux with the OS page size configured to 4096 and ~380 GB of ram (much more than required for the table so I think I'm not swapping) and haven't altered the sqlite page size.  I am using sqlite version 3.13.0 and these pragmas immediately after database creation.

pragma temp_store = MEMORY
pragma journal_mode = off

With these settings I'm seeing nonlinear (in a bad way) times for the insert. Is that expected?  I've fiddled about with various performance-related settings like described on

https://blog.devart.com/increasing-sqlite-performance.html

with varying results but haven't managed to arrive at fairly-linear insert behavior.  It's a single-threaded insert on a prepared query with bound arguments in a tight loop.

Is linear-ish insert time a reasonable goal for an in-memory database?

----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended recipient, please delete this message.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generic advice for insert performance in-memory database optimization

Simon Slavin-3


On 21 Dec 2017, at 7:06pm, Nelson, Erik - 2 <[hidden email]> wrote:

> I'm running on linux with the OS page size configured to 4096 and ~380 GB of ram (much more than required for the table so I think I'm not swapping) and haven't altered the sqlite page size.  I am using sqlite version 3.13.0 and these pragmas immediately after database creation.
>
> pragma temp_store = MEMORY
> pragma journal_mode = off
>
> With these settings I'm seeing nonlinear (in a bad way) times for the insert. Is that expected?

Thank you for your detailed description of your setup, which saved a lot of back-and-forth.  Your blog page shows a good understanding of SQLite and I can find no obvious errors in your code or your reasoning.  Your Blog post includes a very nice report of comparative testing.

The inconsistency you’re seeing is the result of caching.  Once the amount of data busts the amount of cache memory available, the operations have to wait for storage access, which takes a lot longer than RAM access.  I’m not talking about the cache under control of SQLite, but the amount of memory allocated to the program by the OS.  The differences you show for the various PRAGMA changes are not unexpected and don’t suggest any fault with your setup.

I’m not familiar with Windows as a dev platform, but from what I see there, Windows is busting a cache (OS process memory ? storage ? both ?) near the upper end of your testing range.

As you say in your comments, your macOS and Windows platforms are not comparable because the hardware isn’t identical.  I thought it worth repeating that here before someone gets hooked up on the faster figures for the Mac.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generic advice for insert performance in-memory database optimization

Keith Medcalf
In reply to this post by Nelson, Erik - 2

Seems constant to me.  I'd show you, but Windows has no protection for the OS when system memory is exhausted so the whole computer go kaboom.

However, memory size growth was constant, and insert time was pretty constant (which includes the overhead of generating random values etc).  Go boom when all RAM is full at 31 GB (about 140,000,000 records).  Running it again and getting it to stop at 120,000,000 inserts showed as follows:

>testinsert.py
sys.version_info(major=3, minor=6, micro=3, releaselevel='final', serial=0)
Row(journal_mode='off')
Row(journal_mode='off')
Row(synchronous=0)
insert into data values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?);
   1000000     9.751389    1000000     9.751389
   2000000    19.518461    1000000     9.767072
   3000000    29.595865    1000000    10.077404
   4000000    40.991316    1000000    11.395450
   5000000    52.543077    1000000    11.551762
   6000000    64.178037    1000000    11.634959
   7000000    75.347824    1000000    11.169787
   8000000    86.437358    1000000    11.089534
   9000000    97.772551    1000000    11.335193
  10000000   109.154860    1000000    11.382308
  11000000   120.132103    1000000    10.977243
  12000000   131.221607    1000000    11.089504
  13000000   142.555831    1000000    11.334224
  14000000   153.595233    1000000    11.039402
  15000000   164.749915    1000000    11.154682
  16000000   175.282969    1000000    10.533054
  17000000   186.272268    1000000    10.989299
  18000000   197.274567    1000000    11.002299
  19000000   208.608760    1000000    11.334193
  20000000   219.676239    1000000    11.067479
  21000000   230.995386    1000000    11.319148
  22000000   242.520074    1000000    11.524688
  23000000   254.029814    1000000    11.509740
  24000000   265.071223    1000000    11.041409
  25000000   276.094553    1000000    11.023329
  26000000   286.886328    1000000    10.791775
  27000000   298.175561    1000000    11.289233
  28000000   308.891103    1000000    10.715541
  29000000   319.565535    1000000    10.674432
  30000000   330.200866    1000000    10.635331
  31000000   340.765001    1000000    10.564136
  32000000   351.553740    1000000    10.788738
  33000000   362.221151    1000000    10.667412
  34000000   373.168309    1000000    10.947158
  35000000   383.733089    1000000    10.564780
  36000000   394.125019    1000000    10.391931
  37000000   403.954467    1000000     9.829448
  38000000   414.158771    1000000    10.204304
  39000000   424.097607    1000000     9.938836
  40000000   434.161623    1000000    10.064016
  41000000   444.116177    1000000     9.954554
  42000000   454.414414    1000000    10.298238
  43000000   464.478468    1000000    10.064053
  44000000   474.776956    1000000    10.298488
  45000000   485.001610    1000000    10.224653
  46000000   495.143748    1000000    10.142138
  47000000   505.129763    1000000     9.986015
  48000000   515.271824    1000000    10.142061
  49000000   525.554411    1000000    10.282587
  50000000   536.149616    1000000    10.595204
  51000000   546.135350    1000000     9.985735
  52000000   556.402481    1000000    10.267131
  53000000   566.919720    1000000    10.517239
  54000000   578.046368    1000000    11.126648
  55000000   588.990274    1000000    10.943906
  56000000   599.987536    1000000    10.997262
  57000000   610.828424    1000000    10.840888
  58000000   621.324319    1000000    10.495896
  59000000   632.009688    1000000    10.685369
  60000000   642.563769    1000000    10.554081
  61000000   653.049639    1000000    10.485870
  62000000   664.206841    1000000    11.157203
  63000000   674.797024    1000000    10.590183
  64000000   685.523797    1000000    10.726773
  65000000   696.855562    1000000    11.331765
  66000000   708.156918    1000000    11.301356
  67000000   718.994999    1000000    10.838081
  68000000   729.723788    1000000    10.728789
  69000000   740.643086    1000000    10.919299
  70000000   751.140253    1000000    10.497167
  71000000   761.664491    1000000    10.524239
  72000000   772.482858    1000000    10.818367
  73000000   783.014121    1000000    10.531263
  74000000   794.438486    1000000    11.424365
  75000000   805.444951    1000000    11.006465
  76000000   816.032293    1000000    10.587342
  77000000   826.511347    1000000    10.479054
  78000000   837.076599    1000000    10.565252
  79000000   847.528613    1000000    10.452014
  80000000   857.831194    1000000    10.302581
  81000000   868.286178    1000000    10.454984
  82000000   878.765237    1000000    10.479059
  83000000   889.114724    1000000    10.349487
  84000000   900.053886    1000000    10.939162
  85000000   909.930211    1000000     9.876325
  86000000   919.790808    1000000     9.860597
  87000000   929.651311    1000000     9.860503
  88000000   939.527506    1000000     9.876195
  89000000   949.356789    1000000     9.829282
  90000000   959.186170    1000000     9.829381
  91000000   969.031142    1000000     9.844972
  92000000   978.891677    1000000     9.860535
  93000000   988.752464    1000000     9.860787
  94000000   998.597371    1000000     9.844907
  95000000  1008.457877    1000000     9.860506
  96000000  1018.318437    1000000     9.860560
  97000000  1028.163434    1000000     9.844996
  98000000  1038.024322    1000000     9.860888
  99000000  1047.854005    1000000     9.829683
 100000000  1057.761420    1000000     9.907415
 101000000  1067.637771    1000000     9.876351
 102000000  1078.089400    1000000    10.451629
 103000000  1088.756892    1000000    10.667492
 104000000  1099.582771    1000000    10.825879
 105000000  1110.237257    1000000    10.654486
 106000000  1121.181479    1000000    10.944222
 107000000  1132.092627    1000000    10.911147
 108000000  1142.934579    1000000    10.841952
 109000000  1153.866774    1000000    10.932196
 110000000  1164.808998    1000000    10.942224
 111000000  1175.667999    1000000    10.859001
 112000000  1186.503937    1000000    10.835938
 113000000  1197.440143    1000000    10.936207
 114000000  1208.192860    1000000    10.752717
 115000000  1218.891432    1000000    10.698572
 116000000  1229.737401    1000000    10.845968
 117000000  1240.846065    1000000    11.108665
 118000000  1251.648885    1000000    10.802819
 119000000  1262.131912    1000000    10.483027
 120000000  1273.303061    1000000    11.171149

Source code:

#! python3

from __future__ import print_function

import apsw
import apswrow
import random
import sys
import time

sql_create = """
create table data
(
    id int,
    path int,
    month int,
    val1 double,
    val2 double,
    val3 double,
    val4 double,
    val5 double,
    val6 double,
    val7 double,
    val8 double,
    val9 double,
    val10 double,
    val11 double,
    val12 double,
    val13 double,
    val14 double,
    val15 double,
    val16 double,
    val17 double,
    val18 double,
    val19 double,
    val20 double
);
"""
print(sys.version_info)
db = apsw.Connection(':memory:')
db.cursor().execute(sql_create)

for row in db.cursor().execute('pragma journal_mode=off; pragma synchronous=off;'):
    print(row)

for row in db.cursor().execute('pragma journal_mode; pragma synchronous;'):
    print(row)

stmt = 'insert into data values (' + ','.join('?'*23) + ');'
print(stmt)

st = time.time()
lt = time.time()
rows = 1000000

# db.cursor().execute('begin immediate;')
id = 0
while id < 120000000:
    id += 1
    data = [id, id, id]
    for i in range(20):
        data.append(random.random())
    db.cursor().execute(stmt, data)
    if id % rows == 0:
        # db.cursor().execute('commit;')
        ct = time.time()
        print('%10d %12.6f %10d %12.6f' % (id, ct-st, rows, ct-lt))
        lt = ct
        # db.cursor().execute('begin immediate;')

# db.cursor().execute('commit;')
db.close()

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Nelson, Erik - 2
>Sent: Thursday, 21 December, 2017 12:06
>To: SQLite mailing list
>Subject: [sqlite] generic advice for insert performance in-memory
>database optimization
>
>I've got an in-memory database with a single table that I need to
>fill with ~500 million rows.  There are no indexes and the table
>definitions is
>
>create table data(id int, path int, month int, val1 double, val2
>double, val3 double... val20 double)
>
>I'm running on linux with the OS page size configured to 4096 and
>~380 GB of ram (much more than required for the table so I think I'm
>not swapping) and haven't altered the sqlite page size.  I am using
>sqlite version 3.13.0 and these pragmas immediately after database
>creation.
>
>pragma temp_store = MEMORY
>pragma journal_mode = off
>
>With these settings I'm seeing nonlinear (in a bad way) times for the
>insert. Is that expected?  I've fiddled about with various
>performance-related settings like described on
>
>https://blog.devart.com/increasing-sqlite-performance.html
>
>with varying results but haven't managed to arrive at fairly-linear
>insert behavior.  It's a single-threaded insert on a prepared query
>with bound arguments in a tight loop.
>
>Is linear-ish insert time a reasonable goal for an in-memory
>database?
>
>---------------------------------------------------------------------
>-
>This message, and any attachments, is for the intended recipient(s)
>only, may contain information that is privileged, confidential and/or
>proprietary and subject to important terms and conditions available
>at http://www.bankofamerica.com/emaildisclaimer.   If you are not the
>intended recipient, please delete this message.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users