Effective testing of database schema – the missing link

There is a certain contradiction which appears in modern projects concerning the unit testing strategy. On one hand, there is a powerful assertion stating that business logic testing should be completely disconnected from the database. This makes perfect sense in a certain way : the tests should check the business logic, not the database and/or the persistence layer. Then, generally the persistence layer is a fully-fledged product (such as the excellent Hibernate) or other JDO-esque solution which has its own testing suite – no need to check that it really works. Usually, the link between business objects and persistence is “faked” using mock objects. Basically, this means that testing the code doesn't need a running database (well, code testing doesn't need a database at all).

The database schema should also be tested – the only tool I am aware of is the excellent DbUnit. Although more targeted towards data testing, it copes quite well with schema testing. Nicely integrated with Ant, DbUnit is the right solution for your database testing needs. And yes you do need to test your database since it is supposed to evolve along with the code (there's a great article about Evolutionary database design on Martin Fowler's site).

Somehow, we instinctively feel that something is missing from this picture. We are testing the code, disconnected from the database – and also the database, in a independent manner. But how can we be sure that the persistence layer between the application model and the database is ok ? And I'm not talking about the persistence mechanics, but the data model itself. Basically, this goes down to mapping testing. I am aware of the fact that some special O/R bindings do not need mappings and there is a direct object-table correspondence, but I feel that this is generally a BadIdea^TM^ since it hampers the flexibility of both the application model structure and database schema.

In the small-to-medium-sized projects I've been working lately we didn't feel the need of mapping testing. This has a very simple reason : the person which is performing the change on the database schema is usually the same person which needs a certain modification in the application model. After performing the modification, quite often this same person starts the application and makes a functional test which implicitly checks the mapping. Most of the time this works just fine.

However, some nasty problems might appear when the project starts to grow :

- changing the mapping is more difficult, some kind of testing might give indications about the nature of the problem. - there is a certain “schema decay” when some foreign keys cannot be created at a certain point, then their creation is forgotten when the data is finally consistent. Further with schema evolution, more and more objectual data model relations will not be backed up by integrity constraints. - you may sometimes end up with unmapped and unused tables/views/columns.

A really useful testing tool should be able to check one or multiple mapping files against a database schema (via DbUnit, why not). The tool should :

- a) recognize different mapping formats (Hibernate, Castor, etc.) and different database types - b) match the mapping declarations with the tables from the database, check their existence also the type of primitive columns - c) warn if some constraints are wrong or missing (based on simple aggregation, cardinality or other hints from the mapping structure). - d) warn for unmapped tables/views/columns.

Here's the good news : a tool which is able to perform a) and b) does exist ! And the bad news (purists will jump with disgust) : just for a moment, you should forgot about testing your code without the database. The solution is quite simple, build a unit test which fires up the persistence layer and retrieves at least one of each type of mapped object from a test database. If no exceptions are encountered, the test is ok. This is a basic but effective approach and :

- be prepared to have a testing database different from the development database but with schema automatically synchronized. - harden your test case by inserting the most “exotic” test data you can find. If the data goes in via SQL (dbunit) but you have problems retrieving it via persistence layer, then look for missing schema constraints and sometimes some subtle mapping problems.

You could go one step further by performing update and deletion operations and check them via dbunit, but we have found that if the retrieval works, the persistence layer is perfectly able to perform updates and deletions. Now if your data layer is more complex, then just use some mock objects to test it – because it's a code issue and not a mapping issue.

If you are interested in the topic, just let me know by mail (still waiting for comment integration with FreeRoller). And yes, I'm still looking for a tool able to do a), b), c) and d).

Note : There is a simple technique that we are using currently. The idea is that, when the application starts, a simple retrieval is performed via the persistence layer for some objects that we know for sure must exist in all test and production datbases. It this succeeds you may be sure of two things : that the database connection really works and that the mapping is probably fine. This way, you don't have to wait the first persistence operation in order to see an error. Coupled with a nightly build and rerun, this little trick proved quite effective at keeping the mapping clean.