About Me

I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services), working on IBM DB2, Oracle and Microsoft SQL Server. I was selected as an IBM Champion for Data Management in 2011. I like a bit of photography, cycling and poetry, and I just love gadgets!!

Friday, December 12, 2014

Easing the Pain of Migrating from SQL Server to PostgreSQL


Migrating any kind of software from one vendor's product to another's is never easy or risk-free. You always have to go through proper testing cycles, and at times changes are required in the application working on top of the product (OS, database, app server, web server etc.) being migrated.

The same applies to relational databases. Though fundamentally all relational databases are supposed to adhere to the same relational theory, each has its own specific implementations of commands and functions. This leads to application changes, and with those changes comes the need for additional re-testing effort. The pain is eased quite a bit by the fact that relational databases try to adhere to ANSI SQL syntax, but each of them has a few exceptions in its compliance with the ANSI standards.

In my day job as a DB Solution Architect, I come across one or two new migration cases almost every month, and at times these are migrations from SQL Server to PostgreSQL. In my experience and opinion, SQL Server is the implementation of relational theory with the most deviations from ANSI syntax. For example, most RDBMSs use the double pipe [ || ] as the concatenation operator, whereas SQL Server (for some good or bad reason) chose + instead.

Another feature which I have seen being a pain while migrating is case-insensitive search. All databases support it if you apply the UPPER() function to strings before comparing them. But in SQL Server that is the default (and only) behavior if the database was created with a collation ending in CI_AS or CI_AI (meaning case insensitive and accent sensitive/insensitive). I have faced the challenge of migrating these customers.

When you look at the database while assessing it for migration, it all looks fine. The SQL Server database has only tables (no programmable objects, e.g. procedures or functions) and no 'funny' data types [again, the whole world uses timestamp to mean a date-time, while SQL Server uses it as a binary datatype for optimistic locking :) ]. You migrate it over (quite easily, in fact), and finally, when you connect the application, you start getting errors left, right and center!

Well, PostgreSQL is not just a stable alternative to proprietary databases; it also offers great flexibility, which helps make it a popular alternative. Let me share two examples wherein PostgreSQL can really make it easy for you to migrate over.
In PostgreSQL you can create user-defined casts and user-defined operators. So, to overcome the issues I have mentioned above, let me create user-defined operators and test them-


1. + as a concatenation operator-

Let's first create a function which can concatenate two strings-
___________________________________

CREATE FUNCTION 
           concat_with_plus(varchar, varchar) 
          RETURNS varchar
    AS $$ SELECT $1 || $2 $$
    LANGUAGE SQL;
___________________________________

Now let's create a custom operator which takes two arguments (left = string and right = string)-

___________________________________


CREATE OPERATOR + (
    LEFTARG = varchar,
    RIGHTARG = varchar ,
    PROCEDURE = concat_with_plus
);
___________________________________

Notice that the operator uses the function we defined above.

Here is how it would work-

___________________________________

postgres=# select 'sameer' + 'kumar';
  ?column?
-------------
 sameerkumar
(1 row)
___________________________________

2. Enable Case-insensitive search for all the tables and columns

All we need to do is create an operator '=' overriding the default behavior.
Before that, let's first define a function which matches two strings irrespective of their case.

______________________________________

CREATE FUNCTION 
varchar_equal_caseinsensitive(varchar, varchar) 
    RETURNS boolean
    AS $$ SELECT upper($1)=upper($2) $$
    LANGUAGE SQL;
______________________________________


Let's now create an operator '=' using this function-

______________________________________

CREATE OPERATOR = (
    LEFTARG = varchar,
    RIGHTARG = varchar,
    PROCEDURE = varchar_equal_caseinsensitive,
    COMMUTATOR = =,
    NEGATOR = !=,
    HASHES, MERGES
);
______________________________________

Here is how it would work-

______________________________________

postgres=# SELECT * FROM test_varchar 
postgres-# WHERE col1='SAMEER';
  col1
--------
 sameer
 Sameer
 SameeR
 SAmeeR
 Sameer
<<more rows in result>>
______________________________________

And if you define an index over UPPER(col1), 

______________________________________

CREATE INDEX upper_col_test_varchar ON 
      test_varchar(UPPER(col1));
______________________________________


it would be used for index scans/bitmap scans-
____________________________________________________________________________

postgres=# EXPLAIN SELECT * FROM test_varchar WHERE col1='SAMEER';
                                       QUERY PLAN                         
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_varchar  (cost=11.77..394.81 rows=432 width=10)
   Recheck Cond: (upper((col1)::text) = 'SAMEER'::text)
   ->  Bitmap Index Scan on upper_col_test_varchar  (cost=0.00..11.66 rows=432 width=0)
         Index Cond: (upper((col1)::text) = 'SAMEER'::text)
 Planning time: 2.685 ms
(5 rows)

____________________________________________________________________________
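
I mentioned user-defined casts above but only demonstrated operators. As a minimal sketch of the cast side (my illustrative addition, assuming an application that relies on SQL Server's implicit string-to-number conversions), you can let PostgreSQL implicitly convert varchar to integer through the types' text I/O:

___________________________________

CREATE CAST (varchar AS integer)
    WITH INOUT
    AS IMPLICIT;
___________________________________

With this in place, a predicate like int_col = varchar_col, which would otherwise fail with an "operator does not exist" error, resolves through the implicit cast. Use implicit casts sparingly though, as they can make operator resolution surprising.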


These are among the most frequently deployed tricks when I migrate customers from SQL Server to PostgreSQL. These wrappers are written inside your database. You can create them in the template1 database (template0 does not accept connections), and hence all databases created thereafter will have these operators and casts. This is not only easy but also does away with the requirement of changing application code. 

Stay tuned and we will be sharing more such tips and tricks which help you break away from the lock-in of proprietary database vendors and move to a much more cost-effective solution.

In case you are looking for professional services for migration, you may contact my employer, Ashnik (www.ashnik.com | success@ashnik.com). We offer a wide range of PostgreSQL services - Migration, Heterogeneous Replication, Hardening, Tuning, High Availability, DR Setup and anything that has to do with PostgreSQL.

Tuesday, November 18, 2014

Emulate redo Multiplexing in PostgreSQL


When it comes to reliability and recoverability, PostgreSQL is as good a database as any other RDBMS implementation we see in the market. Postgres complies very strongly with the ACID philosophy of relational databases. It has a multi-version concept (MVCC), where the older snapshot of rows/data is preserved as long as any transaction which may need it is alive (something similar to the UNDO tablespaces or rollback segments of Oracle). Database recoverability is ensured by WAL segments, which you can relate to the redo log files of Oracle or the transaction logs of SQL Server. These WAL files are used for crash recovery (after an OS or hardware crash). You can enable archiving of these WAL files, copy them to a separate disk or server, and use them for more complex recovery. I will talk about this at length in one of my next blogs.
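
As a quick aside, if you want to peek at this machinery on a 9.x server, you can check the current WAL write position and the segment file it maps to (a small illustrative sketch; these functions were renamed in later major versions):

___________________________________

-- current WAL write location
SELECT pg_current_xlog_location();
-- WAL segment file that holds this location
SELECT pg_xlogfile_name(pg_current_xlog_location());
___________________________________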

One thing which I have seen Oracle DBAs boast about is multiplexing of redo log files by creating multiple redo log groups. Well, that is something which is not there in PostgreSQL out of the box (mainly because it's not really needed, given that we have RAID devices and multiplexing/mirroring at the OS/storage level). Still, if you need to multiplex your WAL segments, you can do that quite easily from v9.2 onwards with pg_receivexlog.

pg_receivexlog is a utility based on streaming replication which can stream WAL files in real time. Unlike archive_command, it does not require WAL files to be completely filled (16MB) before they can be sent over. The replicated/streamed files can be used for archiving or replication.

You can set up a local stream, for which you need to-

1. Set max_wal_senders to a value of 1 or greater. If you have set up replication or are using pg_basebackup, please count those connections too.

                     max_wal_senders=2

2. Set the wal_level to archive or hot_standby
                     wal_level=hot_standby

3. Set pg_hba.conf to allow replication connections from local (UNIX Socket)

                      local  replication postgres        trust

4. Restart your DB server after these changes
                     pg_ctl restart -m fast

5. Start pg_receivexlog-
                    pg_receivexlog -D /usr/local/pgsql/wal_2 -U postgres

Make sure that PGHOST is not set in the session from which you are firing this command.

This command will copy (stream) the WALs from the local server (on the default port pointed to by PGPORT) and store them in a secondary WAL location - /usr/local/pgsql/wal_2.


Now if your server crashes and you lose your WAL directory, all you need to do is look for a file with the ".partial" extension in your secondary WAL location (/usr/local/pgsql/wal_2) and rename the file to remove the extension. Then copy the contents of the secondary WAL location to the primary location and restart the server. The last chunk of changes is present in this ".partial" file and would otherwise not have been archived.
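
As a minimal sketch of that recovery procedure (the segment name below is a made-up placeholder, and the primary WAL directory is assumed to be the default pg_xlog under the data directory):

___________________________________

# go to the secondary (streamed) WAL location
cd /usr/local/pgsql/wal_2
# drop the ".partial" extension; your file name will differ
mv 000000010000000000000042.partial 000000010000000000000042
# copy everything back to the primary WAL directory
cp * /usr/local/pgsql/data/pg_xlog/
# start the server again
pg_ctl start
___________________________________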


Note: There is one caveat though - the files in the secondary WAL directory will not get rotated or removed automatically. You may do that on a periodic basis with the pg_archivecleanup command.
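
For example (the segment name is again a placeholder; you pass the oldest WAL file you still need, and everything logically preceding it is removed - the -n flag does a dry run first):

___________________________________

# show what would be removed, without removing anything
pg_archivecleanup -n /usr/local/pgsql/wal_2 000000010000000000000042
# actually remove the older WAL files
pg_archivecleanup /usr/local/pgsql/wal_2 000000010000000000000042
___________________________________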


Wednesday, June 11, 2014

Postgres Hangout #2

Watch Ashnik’s Google Hangout event on PostgreSQL right here on 28th May 2014, 4.30 pm SGT! Talking on Tips & Tricks on Tuning Slow Running SQLs.
Are slow running SQLs slackening your application? Have a badly performing query at hand and don’t know what to do? Catch us live and keep learning!



Tips & Tricks on Tuning Slow Running SQLs in PostgreSQL



Watch the prequel over here
Follow @Ashnikbiz on Twitter to get the latest updates on the next Hangout.

Postgres Hangout #1

Watch Ashnik’s Video Blog showcasing ‘Busting that slow running SQL’

Watch the recording of the first ever Postgres Hangout and see how I took viewers through finding slow running queries. The event was hosted by Ashnik.

Follow @Ashnikbiz on Twitter to get the latest updates on the next Hangout.



Tuesday, June 3, 2014

25 Features your Developers are missing when not using Postgres


I recently met a few prospects who were keen to know what Postgres can offer their developers which they weren’t already getting from the databases they currently use. So I thought of putting together a list.
Though the list refers to features of Postgres Plus Advanced Server, a lot of these features are applicable to PostgreSQL too. I have found great resources and “how to use” documentation for most of these features, and hence I am not going to repeat that here. Let me share the list and some quick reference/user guides for each of them; a short SQL sketch after the list illustrates a few of them in action.
1.       JSON Datatype for storing non-relational data in JSON documents- More details http://www.postgresql.org/docs/9.3/static/datatype-json.html

2.       JSON Functions to convert relational data into JSON format- More details http://www.postgresql.org/docs/9.3/static/functions-json.html

3.       HSTORE datatype for storing non-relational data as key value pair- More details http://www.postgresql.org/docs/9.1/static/hstore.html

4.       Range Datatype- DATE Range and Integer Range for storing RANGE [minValue-maxValue] together instead of having to store minValue and maxValue in separate columns- For More details- http://www.postgresql.org/docs/9.2/static/rangetypes.html

5.       Boolean Datatype: Support for Boolean as datatype for columns- For more details- http://www.postgresql.org/docs/9.3/static/datatype-boolean.html

6.       Support for TIME datatype- As far as I remember, a standalone TIME data type is not present in Oracle (DB2 does support it)

7.       Infinite Cache for Application Performance- Exclusive to EnterpriseDB Postgres Plus

8.       Handling and playing with time series data is much simpler and easier in PostgreSQL: http://no0p.github.io/postgresql/2014/05/08/timeseries-tips-pg.html

9.       Rich Foreign Data Wrapper support for querying various different sources as foreign tables- More details here: http://wiki.postgresql.org/wiki/Foreign_data_wrappers

10.   UNLOGGED Tables: You can create unlogged tables which do not have the WAL (redo-log equivalent) overhead for write operations.

11.   Serializable Snapshot Isolation: SSI provides the guarantees of serializable transactions but without the performance impact of traditional locking. More details: http://wiki.postgresql.org/wiki/SSI and http://wiki.postgresql.org/wiki/Serializable

12.   NOT VALID Constraints: Add a check constraint/FK that is enforced for new incoming data while ignoring any inconsistency in existing data. This makes adding constraints faster. Later, you can clean up the data and validate the constraint

13.   Filter Based Indexes (partial indexes): You can index a particular column based on some filter. This can be useful for indexing only the rows that match a criterion, e.g. index on employee_table(dept) where hasLeft=false

14.   Create Recursive Views using shorthand syntax - More details here: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#Recursive_View_Syntax

15.   Define sort memory for each query- Gives you the flexibility to assign more resources to a single query which needs them. You can define work_mem at the user, transaction, query/statement or DB level.

16.   LATERAL JOIN: A LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list. More details: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#LATERAL_JOIN

17.   User Defined Operators- The flexibility to define your own operators. e.g. while migrating from SQL Server, one can define a new operator '+' with two string operands which works like SQL Server's concatenation operator (+).

18.   Rich Language Support: Create user-defined objects in PL/pgSQL, PL/SQL, PL/Perl, PL/Python, PL/Java etc. Your developers can choose a language of their choice for implementing stored procedures

19.   IS NOT NULL queries can make use of indexes- I have seen an RDBMS which boasts about its optimizer fail to do so

20.   GRANT/REVOKE access on all tables of a particular schema- More general than granting access on each table, but at the same time more granular than "SELECT ANY TABLE" and similar privileges

21.   LIMIT and OFFSET features- Limit the number of rows in your output using the LIMIT keyword. You can further set an offset from where to start fetching using OFFSET

22.   Granular security and logging: Set user privileges and logging parameters per user/role

23.   Full Text Search Features- More details here- http://www.postgresql.org/docs/9.3/static/textsearch.html

24.   Use indexes for getting statistics on the fly- This feature can help you get a better plan when you are reading data from a table which has been loaded with bulk data recently

25.   COPY JDBC API: Use COPY API to do bulk load into table via JDBC. Check my earlier post on this- http://pgpen.blogspot.com/2013/05/using-copy-in-your-jdbc-code.html
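
To make a few of the items above concrete, here is a minimal SQL sketch (a sketch only: table, column and role names such as staging_events, salary, emp_id, hr and reporting_role are hypothetical, while employee_table and hasLeft are borrowed from item 13):

___________________________________

-- 10. UNLOGGED table: skips WAL overhead for cheap, re-creatable data
CREATE UNLOGGED TABLE staging_events (id int, payload text);

-- 12. NOT VALID: enforce the check for new rows now, validate old rows later
ALTER TABLE employee_table
    ADD CONSTRAINT chk_salary CHECK (salary > 0) NOT VALID;
ALTER TABLE employee_table VALIDATE CONSTRAINT chk_salary;

-- 13. Filter based (partial) index: index only the rows you actually query
CREATE INDEX idx_active_dept ON employee_table (dept) WHERE hasLeft = false;

-- 15. Sort memory for just this session or query
SET work_mem = '256MB';

-- 20. Grant on every table of a schema in one statement
GRANT SELECT ON ALL TABLES IN SCHEMA hr TO reporting_role;

-- 21. LIMIT and OFFSET
SELECT * FROM employee_table ORDER BY emp_id LIMIT 10 OFFSET 20;
___________________________________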


Remember, I have not even added the awesome new features which we will see in PostgreSQL v9.4 [https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.4].

If you are a developer who is not using PostgreSQL, now you know what you have been missing all along. If you are a CIO/IT Manager or Product Manager, you now know what great features your developers have been deprived of. What are you waiting for?


If you are a happy PostgreSQL user and you think I missed any features, please help me update this list. Thanks in advance!