Thread: Casting?

    #31
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Oct 2003
    Location
    Germany
    Posts
    2,777
    Rep Power
    348
    Originally Posted by bjquinniii
    where do I find/subscribe to those?
    http://www.postgresql.org/list/
    I will neither read nor answer questions where the SQL code is messy and not formatted properly using [code] tags.
    http://forums.devshed.com/misc.php?do=bbcode#code

    Tips on how to ask better questions:
    http://tkyte.blogspot.de/2005/06/how-to-ask-questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html
    #32
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    10
    http://www.postgresql.org/list/
    #33
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    hstore


    Originally Posted by shammat
    Adding a new (nullable) column to an existing table is basically a no-op and is extremely fast:
    Code:
    psql (9.3.0, server 9.2.2)
    Type "help" for help.
    
    postgres=# select count(*)
    postgres-# from foo;
    
      count
    ----------
     29999999
    (1 row)
    
    
    Time: 3466,883 ms
    postgres=# ALTER TABLE foo ADD COLUMN bar varchar(20);
    ALTER TABLE
    Time: 1,458 ms
    postgres=#
    Another alternative to the "dynamic attributes" problem is the hstore datatype which is essentially a key/value store inside a single row:
    Code:
    create table ip_attributes
    (
      address inet not null primary key,
      attributes hstore
    );
    
    insert into ip_attributes (address)
    values
    ('10.1.1.1'),
    ('10.1.1.2'),
    ('10.1.1.9');
    
    select address, attributes -> 'ping' as ping, attributes -> 'nslookup' as nslookup
    from ip_attributes
    order by 1;
    
    address  | ping | nslookup
    ---------+------+---------
    10.1.1.1 |      |         
    10.1.1.2 |      |         
    10.1.1.9 |      |         
    
    update ip_attributes
      set attributes = hstore('ping', 'OK')
    where address = '10.1.1.1'::inet;
    
    address  | ping | nslookup
    ---------+------+---------
    10.1.1.1 | OK   |         
    10.1.1.2 |      |         
    10.1.1.9 |      |         
    
    update ip_attributes
      set attributes = hstore('nslookup', 'Zaphod')
    where address = '10.1.1.2'::inet;
    
    address  | ping | nslookup
    ---------+------+---------
    10.1.1.1 | OK   |         
    10.1.1.2 |      | Zaphod  
    10.1.1.9 |      |         
    
    update ip_attributes
      set attributes = hstore('nslookup', 'Arthur')
    where address = '10.1.1.1'::inet;
    
    select address, attributes -> 'ping' as ping, attributes -> 'nslookup' as nslookup
    from ip_attributes
    where attributes ? 'nslookup'
    order by 1;
    
    address  | ping | nslookup
    ---------+------+---------
    10.1.1.1 |      | Arthur  
    10.1.1.2 |      | Zaphod
    You can add any attribute without the need to alter the table. The column can be indexed, and queries testing the keys or the values are extremely fast (e.g. attributes ? 'nslookup' or attributes @> hstore('ping','OK')).
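    For example, a GIN index on the hstore column covers both the key-existence (?) and containment (@>) operators:
    Code:
    create index ix_ip_attributes_attrs
      on ip_attributes using gin (attributes);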

    More details in the manual: http://www.postgresql.org/docs/current/static/hstore.html
    How do I add the hstore module? I'm using 9.3 (64-bit Windows)...
    #34
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Oct 2003
    Location
    Germany
    Posts
    2,777
    Rep Power
    348
    Originally Posted by bjquinniii
    How do I add the hstore module? I'm using 9.3 (64-bit Windows)...
    Code:
    create extension hstore;
    http://www.postgresql.org/docs/current/static/sql-createextension.html
    #35
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1
    Originally Posted by shammat
    Code:
    create extension hstore;
    http://www.postgresql.org/docs/current/static/sql-createextension.html
    Wow, simple, thank you... it would have been a while before I stumbled on that (I was searching for things like "install module").
    #36
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    progress, I think...


    Originally Posted by shammat
    That is unnecessary, because that number never changes.

    Sorry, I don't get it. Can you add some sample queries where you would need this? Are you aware that Postgres' inet data type supports an "IP range" as well?

    Postgres also has pretty powerful range data types - I don't really understand your real problem, but they might be of some help as well:
    http://www.postgresql.org/docs/current/static/rangetypes.html
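    A quick illustration of a range type, in case it helps:
    Code:
    select int4range(10, 20) @> 15 as contained;  -- true: 15 falls in [10,20)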

    You still didn't describe the real problem you are trying to solve with this table. If you want to store attributes per IP address, I would probably only store those IP addresses where I know the additional attributes. The final CSV file could then be generated by "exploding" that table to all IP addresses using the generate_series() function. Something like this:

    Code:
    create table ip_attributes
    (
      address inet,
      ping_status varchar(10),
      whois       varchar(100)  
    );
    
    insert into ip_attributes
    values
    ('10.1.1.1', 'NOT OK', null),
    ('10.1.1.2', 'OK', 'Arthur Dent'),
    ('10.1.1.9', 'NOT OK', 'Zaphod Beeblebrox');
    
    with all_addresses (address) as 
    (
      select '10.1.1.0'::inet + i
      from generate_series(1,10) i
    ) 
    select aa.address, attr.ping_status, attr.whois
    from all_addresses aa
      left join ip_attributes attr on attr.address = aa.address
    ;
    will output the following:

    Code:
    address   | ping_status | whois            
    ----------+-------------+------------------
    10.1.1.1  | NOT OK      |                  
    10.1.1.2  | OK          | Arthur Dent      
    10.1.1.3  |             |                  
    10.1.1.4  |             |                  
    10.1.1.5  |             |                  
    10.1.1.6  |             |                  
    10.1.1.7  |             |                  
    10.1.1.8  |             |                  
    10.1.1.9  | NOT OK      | Zaphod Beeblebrox
    10.1.1.10 |             |
    By changing the parameter for the generate_series() call you can generate any number of IP addresses you like without actually storing them.
    So, I'm pretty excited. Started playing/testing with the inet and hstore types and the generate_series function...

    Code:
    create extension hstore;
    
    create table ipv4_test
    (
       ipv4 inet
      ,designation varchar(64)
      ,whois varchar(32)
      ,status varchar(16)
      ,country varchar(8)
      ,attributes hstore
     );
    
    insert into ipv4_test (ipv4, designation, status)
    select '0.0.0.0'::inet + i, 'IANA - Local Identification', 'RESERVED'
    from generate_series(0, 256*256*256-1) i;
    
    select count(*) from ipv4_test;
    The insert ran in about 6 1/2 minutes with a row count of 16,777,216.

    Code:
    select * from ipv4_test limit 10;
    "0.0.0.0";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.1";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.2";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.3";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.4";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.5";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.6";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.7";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.8";"IANA - Local Identification";"";"RESERVED";"";""
    "0.0.0.9";"IANA - Local Identification";"";"RESERVED";"";""

    And now I have the skeleton for the first block. I also created indexes on the ipv4, country, and attributes columns (trying GIN for the hstore column).
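    For reference, the index DDL was something like this (index names illustrative; the ipv4 index is made unique so it can back a primary key later):
    Code:
    create unique index ix_ipv4_test_ipv4 on ipv4_test (ipv4);
    create index ix_ipv4_test_country on ipv4_test (country);
    create index ix_ipv4_test_attributes on ipv4_test using gin (attributes);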

    Another 255 insert statements and I'll have the whole thing... well, apart from actually gathering up all of the attributes I want, oh, and partitioning enough space, and...
    #37
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    10
    Originally Posted by bjquinniii
    Another 255 insert statements and I'll have the whole thing... well, apart from actually gathering up all of the attributes I want, oh, and partitioning enough space, and...
    You should probably also tune PgSQL for transaction logs while you are inserting; otherwise your database may be very busy recycling logs that have absolutely no use during this phase of the operation.

    I would also partition before completing the inserts, at least well before you start indexing stuff.
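    On 9.3 that means inheritance-based partitioning; a rough sketch for a single /8 child table (names, ranges and the designation string are illustrative):
    Code:
    -- child table holding only the 10.0.0.0/8 block
    create table ipv4_test_10
    (
      check (ipv4 >= '10.0.0.0'::inet and ipv4 <= '10.255.255.255'::inet)
    ) inherits (ipv4_test);
    
    -- load that /8 directly into its child table
    insert into ipv4_test_10 (ipv4, designation, status)
    select '10.0.0.0'::inet + i, 'Private-Use Networks', 'RESERVED'
    from generate_series(0, 256*256*256 - 1) i;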
    #38
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    PgSQL tuning?


    Originally Posted by Vinny42
    You should probably also tune PgSQL for transaction logs while you are inserting; otherwise your database may be very busy recycling logs that have absolutely no use during this phase of the operation.

    I would also partition before completing the inserts, at least well before you start indexing stuff.
    Too late for this go-round, anyway. As of this morning, the first 101 insert blocks had finished (000/8 - 100/8) and I launched the next pile (101/8 - 200/8) to run while I'm at work.

    Right now I'm mainly monitoring size and query-by-IP-address performance (so far, so good). I'm retaining all of the build scripts so that I can redo the whole mess with better configuration settings (and possibly hardware) later.

    What do you mean by "tune PgSQL for transaction logs"? (Postgres newbie here)

    I created the indexes before doing any of the inserts, so those should be getting built as I go, right?
    One exception was that I did not understand the creation process for primary keys at the beginning, so that didn't get created until 000/8 - 050/8 were already in (it was created using the already built index on the ipv4 address field).
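    For reference, promoting an already-built unique index to the primary key is a one-liner (the index name here is hypothetical):
    Code:
    alter table ipv4
      add constraint pk_ipv4 primary key using index ix_ipv4_addr;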

    When I checked things this morning, I was showing just under 1.7 billion rows. The script running right now should get me about 4/5 of the way to the complete 4.3 billion rows...
    #39
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    10
    In short: PostgreSQL writes every change it's about to make to your tables in a log before it actually makes the change. Changes don't get into your table until PostgreSQL has verified that the change is in the log, and once the change has been made and verified in the actual table, the log is updated to mark the change as "done".

    This way, if there is a serious problem like a power cut or some exploding software, then at startup the database can compare the logs to the actual tables, decide which changes have not been completed in the tables, and either roll them forward or roll them back to bring the table into a stable state again.

    Writing these logs takes time and disk space; by default PostgreSQL writes chunks of 16 MB, and it only reserves three chunks or so. Given that you are inserting like a maniac right now, chances are that you don't have enough log space and PostgreSQL will try to recycle the logs while you are inserting, which takes time.

    It's usually better to increase the number of chunks so that you have enough 16 MB blocks to hold all the changes you are making during one transaction (PostgreSQL runs every query inside its own implicit transaction) or between checkpoints. A checkpoint is a moment in time at which the database is 100% certain that all the changes in the log have been processed, so the logs and the tables are in sync.

    Read these docs. It's a lot, but it can save you lots of time and effort.

    More about the Write-Ahead Log (WAL):
    http://www.postgresql.org/docs/9.3/static/wal-configuration.html

    What a Checkpoint is:
    http://www.postgresql.org/docs/9.3/static/sql-checkpoint.html

    General tuning, explains the parameters and how to set them (roughly)
    http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
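    For reference, the relevant 9.3-era knobs live in postgresql.conf; a minimal sketch, with illustrative values rather than recommendations:
    Code:
    # postgresql.conf (PostgreSQL 9.3) - WAL settings relevant to bulk loads
    checkpoint_segments = 32            # default is 3; each segment is 16 MB
    checkpoint_completion_target = 0.9  # spread checkpoint I/O over more time
    wal_buffers = 16MB                  # WAL held in shared memory before flush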
    #40
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1
    Originally Posted by Vinny42
    In short: PostgreSQL writes every change it's about to make to your tables in a log before it actually makes the change. [...]
    Interesting, thank you.
    #41
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    current status - 3.38 billion rows


    I'm fairly impressed (new to PostgreSQL) -

    3,383,420,000 rows

    Code:
    CREATE TABLE ipv4
    (
      ipv4 inet NOT NULL,
      designation character varying(64),
      whois character varying(32),
      status character varying(16),
      country character varying(8),
      attributes hstore,
      CONSTRAINT pk_ipv4 PRIMARY KEY (ipv4)
    )
    WITH (
      OIDS=FALSE
    );
    ALTER TABLE ipv4
      OWNER TO postgres;
    Right now, only ipv4, designation, whois and status are populated. But this is with a pretty much "newbie, out of the box" configuration:

    Code:
    select ipv4 from ipv4 order by ipv4 desc limit 10;/* 154 ms */
    select ipv4 from ipv4 order by ipv4 limit 10;/* 41 ms */
    select * from ipv4 where ipv4='192.168.5.101';/* 15 ms */
    Liking it so far
    #42
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    10
    Well, PostgreSQL does kick bottom :-)
    #43
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    Count and Update


    Originally Posted by Vinny42
    Well, PostgreSQL does kick bottom :-)
    Insert scripts finished. Running a count now (right-click count from the GUI) to see if I have the correct number of rows (the total should be 4,294,967,296 if my math is right; the estimated count only shows 4,151,620,000).

    My next step (once I have verification of the correct number of rows) will be to update the country column. What I have is a CIDR-format list of country allocations. What will be the most efficient way to update those rows? Should I do some kind of join update based on exploding each range using generate_series? Or just a simple update using a "where between" on the ipv4 field (which is type inet and is the primary key)?
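    For illustration, one way to phrase the join update is with the inet containment operator << against a staging table of CIDR blocks (table and column names here are hypothetical):
    Code:
    create table country_blocks
    (
      block   cidr not null,
      country varchar(8) not null
    );
    
    -- set the country on every row whose address falls inside a block
    update ipv4 i
      set country = cb.country
    from country_blocks cb
    where i.ipv4 << cb.block;
    Whether this beats a series of explicit "where ipv4 between ... and ..." updates is worth testing; the between form can use the primary-key index directly.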
    #44
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    10
    Well, I would definitely not store the country for each separate IP, because you have 4 billion IPs and only 200 or so countries, so you'd be wasting billions of values linking IPs to the same countries over and over.

    I'd keep the countries table as a range plus country. That also saves you a lot of maintenance when a range changes (a few records in the country table vs. millions of updates in the IP table).
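    Using the same hypothetical country_blocks table as in the sketch above, the country can then be resolved at query time with a containment join, so nothing is stored per IP:
    Code:
    select i.ipv4, cb.country
    from ipv4 i
      left join country_blocks cb on i.ipv4 << cb.block
    where i.ipv4 = '192.168.5.101';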
    #45
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1

    4.3 billion rows (just shy)


    Originally Posted by Vinny42
    Well, I would definitely not store the country for each separate IP, because you have 4 billion IPs and only 200 or so countries, so you'd be wasting billions of values linking IPs to the same countries over and over.

    I'd keep the countries table as a range plus country. That also saves you a lot of maintenance when a range changes (a few records in the country table vs. millions of updates in the IP table).
    Normally I would agree, but in this case my design goal is to keep query response time at a minimum. I think I would rather pay the storage cost of the two extra characters (country code) on each row as opposed to the processing cost of some sort of range comparison join on lookup.

    Even if I'm wrong, and I end up going with a separate table in the final production configuration, it is still an update I want to learn how to do...

    Speaking of which (query response time that is), the count finished and I do indeed have the expected number of rows (4,294,967,296). The same queries I ran before are now taking 10-11 ms to run.

    Code:
    select ipv4 from ipv4 order by ipv4 desc limit 10;
    select ipv4 from ipv4 order by ipv4 limit 10;
    select * from ipv4 where ipv4='192.168.5.101';
    Though there was some strange behavior. The first time I ran the first query, it took 23 seconds to complete. After that, all three queries dropped down to the 10 ms area. What would cause that? Was it because it was the first query I had run since adding the last billion or so rows and the index was in a funky state? Or might it have been because of the count operation?
