Sunday, August 25, 2013

Unit testing SQLAlchemy apps

I help out a lot on the #sqlalchemy channel on Freenode IRC. When people ask questions there, one of the issues that comes up often is unit testing applications that use SQLAlchemy. Almost all developers instinctively use SQLite for testing due to its simplicity and the possibility to use an in-memory database that leaves no garbage behind to be cleaned up. It seems like a clean and easy solution, yes? Well, my answer is that if you don't plan on deploying with SQLite support, don't test on SQLite! Always use the production RDBMS for testing. Going to deploy with PostgreSQL? Then test with PostgreSQL! Why?

There are at least two good reasons why testing and deploying with the same RDBMS is a good idea. The first and foremost is that SQLite is vastly different from other RDBMS's. For one, it does not really enforce column types, so code that erroneously inputs data of the wrong type won't cause errors when it should. There are also many semantic differences on how embedded and your typical client-server RDBMS's work, so you may run into bugs that only occur in production while all the tests pass just fine. The second reason is that SQLite's rather modest design, which lets it fit into small memory spaces, is also a big hindrance since it can't support some more advanced database features like window functions or recursive queries. This shortcoming prevents you from taking full advantage of the features of your chosen RDBMS.

If I managed to convince you, then you'll probably be asking how testing should be done on RDBMS's other than SQLite. The answer boils down to whether your RDBMS supports two crucial features: nested transactions and transactional DDL. Nested transactions are savepoints within a transaction, to which you can roll back without losing any changes done before the savepoint. Transactional DDL means that in addition to normal data modification (INSERT, UPDATE, DELETE), schema changes are also transactional. That means they can be rolled back, which is a very nice thing to have when unit testing. According to the article linked to above, the following RDBMS's support transactional DDL: PostgreSQL, SyBase, Microsoft SQL Server, DB2, Informix and Firebird. Most notably, MySQL and Oracle do not support it.

If your RDBMS does support the aforementioned two features, then you can conveniently test your SQLAlchemy apps in the following manner:
  1. Make sure you have an empty database for unit testing
  2. Create the engine, create one connection and start a transaction in it
  3. Create the tables
  4. Optionally, insert test fixtures
  5. For every test, repeat:
    1. Create a savepoint
    2. Run the test
    3. Roll back to the savepoint
  6. Roll back the transaction
This way all the tests are isolated from each other. Every tests gets the same "blank slate" as far as the database state is concerned. After the tests are done, the database will look just as empty as before the test suite was run, even if the tests were interrupted.

So how to actually accomplish this in practice? I'll give you an example using PostgreSQL and nose. This example should be adaptable for other equally capable test runners. The following code should be placed in __init__.py of the root package of your tests.

from sqlalchemy.engine import create_engine
from sqlalchemy.orm.session import Session

from your.package import Base  # This is your declarative base class


def setup_module():
    global transaction, connection, engine

    # Connect to the database and create the schema within a transaction
    engine = create_engine('postgresql:///yourdb')
    connection = engine.connect()
    transaction = connection.begin()
    Base.metadata.create_all(connection)

    # If you want to insert fixtures to the DB, do it here


def teardown_module():
    # Roll back the top level transaction and disconnect from the database
    transaction.rollback()
    connection.close()
    engine.dispose()
 

class DatabaseTest(object):
    def setup(self):
        self.__transaction = connection.begin_nested()
        self.session = Session(connection)

    def teardown(self):
        self.session.close()
        self.__transaction.rollback()

To take advantage of this setup, your test class should inherit from DatabaseTest. If you override either setup() or teardown(), make sure you remember to call the superclass implementations too.

If you want to use this testing scheme with your web framework or other application framework, you should be aware of the requirement that the framework's SQLAlchemy extension/middleware/whatever must have a way to receive the connectable (Engine or Connection) as a Python object and not just as a connection URL.

9 comments:

  1. 100% not a unit test if it's hitting a real DB. Could be a very useful test, but not a unit test. Same goes for using sqlite, of course.

    ReplyDelete
    Replies
    1. Absolutely. While these tests are useful, and indeed best performed on the same RDBMS type and version as production, this is definitely not a unit test.

      A unit test tests your code, not your infrastructure.

      Delete
    2. My bad, should've said "integration testing" instead.

      Delete
  2. Great article! thanks for sharing.

    I wonder how do you go about seeding the test database, especifically application defaults like roles, permissions and alike?

    Do you create custom methods for the latest db migration? If so, how do you keep track of these method as the migrations move forward?

    ReplyDelete
    Replies
    1. Seeding the database is easy: create another fixture that adds the base data within the scope of the top level transaction.

      I don't understand your question about db migration methods. What do you mean by "method"?

      Delete
  3. Great article! Would you recommend setting the isolation_level to 'autocommit' when connecting to a postgres database for testing (see https://www.oddbird.net/2014/06/14/sqlalchemy-postgres-autocommit/ ) and if yes do you have recommendations for handling transactions when using SQLAlchemy Core (not ORM)

    ReplyDelete
  4. Also what is the difference between setting transaction, connection, engine as global instead of setting them as class parameters through cls.transation, cls.connection etc...

    ReplyDelete
  5. Im quite new to SQLalchemy , wouldn't you rather create a sql session and perform rollbacks on it?

    ReplyDelete
  6. Nope, it's not that simple. Rollbacks can happen in the app and you don't want that to mess with test isolation.

    ReplyDelete