ActiveRecord: Better native types mappings for PostgreSQL

After watching, @pvh’s talk about PostgreSQL data types, I started looking what default types ActiveRecord uses for PostgreSQL when running a database migration.

By default, the ActiveRecord PostgreSQL adapter uses serial for primary keys, timestamp without time zone for datetime/timestamp, and varchar for strings (ref).

Based on the suggestions of the talk, You probably want to use bigserial instead of serial, because of the larger amount of serial values (uuids are another option), and always timestamp with time zones.

I was able to change the default through this initializer file in my application:

# config/initializers/active_record_better_native_database_types.rb
require "active_record/connection_adapters/postgresql_adapter"
module ActiveRecord
 module ConnectionAdapters
   class PostgreSQLAdapter
     NATIVE_DATABASE_TYPES.merge!(
       primary_key: "bigserial primary key",
       datetime:  { name: "timestamptz" },
       timestamp: { name: "timestamptz" }
       string:    { name: "text" }
     )
   end
 end
end

NOTE: varchar and text are the same. This change is just personal preference (because is short!). Varchar limit was removed in Rails 4.2.

Unfortunately, this patch is not backward compatible, so it’s better for new applications. Still, you can use these data types in new migrations:

class CreateUsers < ActiveRecord::Migration
 def change
   create_table :persons, id: :bigserial do |t|
     t.text :first_name
     t.text :last_name
     t.column :created_at, :timestamptz, null: false
     t.column :updated_at, :timestamptz, null: false
   end
 end
end

Update: There is an open pull request in the Rails’ GitHub repository to use bigserial as default: https://github.com/rails/rails/pull/24962. Looks like there are plans for timestamptz by default too.

Update 2: Rails 5 has a feature to change the default primary key type when generating database migrations: http://blog.mccartie.com/2015/10/20/default-uuid's-in-rails.html

B-Tree Index: Suffix Lookup

I’m watching PGCon 2016 talks and I really recommend you to see Peter van Hardenberg’s talk: Data Types in PostgreSQL. He gives a great overview over the data types shipped in Postgres and how to create your own. One little trick caught my eye, he showed how to support suffix matching in indexed text columns.

In Postgres, search terms with a prefix can use an index, like:

explain (costs false)
select
  *
from
  users
where
  name like 'Franc%'
QUERY PLAN
--------------------------------------------------------------------
Index Scan using index_users_on_name on users
   Index Cond: ((email ~>=~ 'Franc'::text)
                AND (email ~<~ 'Frand'::text))

In fact, the more selective the prefix before the wildcard is, the faster the index lookup will be.

However, if the search term starts with a wildcard the database has to do a full table scan.

explain (costs false)
select
  *
from
  users
where
  email like '%gmail.com'
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on users
   Filter: (email ~~ '%gmail.com'::text)

As mentioned in the talk, you can use a function based index:

create index index_users_on_email_suffix on users (reverse(email));

Then, suffix matches will be supported by this trick:

explain (costs false)
select
  *
from
  users
where
  reverse(email) like reverse('%gmail.com')
QUERY PLAN
--------------------------------------------------------------------
Bitmap Heap Scan on users
   Filter: (reverse(email) ~~ 'moc.liamg%'::text)
   ->  Bitmap Index Scan on index_users_on_email_suffix
         Index Cond: ((reverse(email) ~>=~ 'moc.liamg'::text)
           AND (reverse(email) ~<~ 'moc.liamh'::text))