Posted on September 24, 2012 by Paolo Predonzani

Today we take for granted that a Content Management System must have an underlying database, especially in the Java world. In this article we'll look at the motivations that lead to choosing a database, as well as the disadvantages that follow from that choice. We'll also look at an alternative approach based on the file system.

OK, let's store everything in the database

What goes in the database and what goes on the file system is a design decision. Apart from configuration files, most CMSs store everything in the database: pages, articles, comments, other content and users. This reliance on the database is justified by the advantages of relational DBs, such as concurrency management and the ability to run queries easily.

But let's consider concurrency. Is it really necessary? Sometimes it is, sometimes it isn't. The general structure of a website's pages changes very slowly. Here we are considering not the content, but rather the hierarchical organization of the pages. This holds not only for smaller sites but also for portals of several hundred pages. Restructuring a website is a carefully planned activity and is rarely subject to concurrent changes.

Even in sites that manage thousands of articles (e.g. online newspapers), the page hierarchy is pretty much fixed. The homepage, the sub-sites and the sections are relatively stable, and even the pages that display the multitude of articles follow a fixed scheme.

On the other hand, it is true that comments change very frequently and that this kind of data is handled very well by a database.

We notice different frequencies of change, namely:

  • The page hierarchy changes very rarely
  • Articles change somewhat frequently
  • Comments change very frequently

We can correlate these types of data with the people who deal with them. The structure of pages is managed by the site developers (who may include designers and high-level editors). The other two (articles and comments) are managed by users. While it may be handy for users to have a database, keeping the part that pertains to developers in a database deserves closer analysis.
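The classification above (how often each kind of data changes, and who owns it) can be written down as a small Java sketch. The enum names and the storage policy it encodes (developer-owned data on the file system, user-owned data in the database) are our own illustration of the argument, not part of any CMS API.

```java
public class ContentClassification {

    /** How often a kind of data changes. */
    enum ChangeRate { VERY_RARELY, SOMEWHAT_FREQUENTLY, VERY_FREQUENTLY }

    /** Who maintains the data. */
    enum Owner { DEVELOPER, USER }

    /** Where the data is kept. */
    enum Storage { FILE_SYSTEM, DATABASE }

    enum ContentKind {
        PAGE_HIERARCHY(ChangeRate.VERY_RARELY, Owner.DEVELOPER),
        ARTICLE(ChangeRate.SOMEWHAT_FREQUENTLY, Owner.USER),
        COMMENT(ChangeRate.VERY_FREQUENTLY, Owner.USER);

        final ChangeRate rate;
        final Owner owner;

        ContentKind(ChangeRate rate, Owner owner) {
            this.rate = rate;
            this.owner = owner;
        }

        /** Illustrative policy: developer-owned data goes on the file system, under version control. */
        Storage storage() {
            return owner == Owner.DEVELOPER ? Storage.FILE_SYSTEM : Storage.DATABASE;
        }
    }

    public static void main(String[] args) {
        // Print each kind with its change rate, owner and resulting storage.
        for (ContentKind kind : ContentKind.values()) {
            System.out.println(kind + " (" + kind.rate + ", " + kind.owner + ") -> " + kind.storage());
        }
    }
}
```

Keying the policy on the owner rather than on the change rate reflects the point made here: what matters is who edits the data and how those edits must travel between environments.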

Versioning nightmares with the database

Often we have a new version of a site in a development environment and we want to apply it to production. In development we have some changes in the page hierarchy. In production we have some new articles and many, many new comments. We must selectively apply only the page hierarchy changes. Since everything is in the database, we need to find the relevant records in development and copy them into production.
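To make the selective copy concrete, here is a minimal Java sketch of the step that picks which development rows to promote. The `Row` record and the `"page"` type tag are hypothetical; a real CMS typically spreads pages over several tables linked by foreign keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SelectivePromotion {

    /** A hypothetical row from a CMS table: its type, primary key and payload. */
    record Row(String type, long id, String payload) {}

    /**
     * Returns the development rows of the given type that are missing in
     * production or differ from their production counterpart.
     * Assumes a row has the same primary key in both environments.
     */
    static List<Row> rowsToPromote(List<Row> dev, List<Row> prod, String type) {
        Map<Long, Row> prodById = prod.stream()
                .filter(r -> r.type().equals(type))
                .collect(Collectors.toMap(Row::id, Function.identity()));
        List<Row> result = new ArrayList<>();
        for (Row r : dev) {
            if (!r.type().equals(type)) continue; // leave articles, comments, etc. alone
            Row p = prodById.get(r.id());
            if (p == null || !Objects.equals(p.payload(), r.payload())) {
                result.add(r);
            }
        }
        return result;
    }
}
```

Even this simplified version relies on primary keys matching across environments, which auto-generated keys rarely guarantee, and it does not notice pages deleted in development. Those gaps are why this step usually ends up being done by hand.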

Let's consider another case. Two or more developers work on different sections of a site. Each developer has a local DB. When we move to production, we'll have to merge several databases, resolving possible conflicts manually each time.

As a variant of the previous case, there can be page hierarchy changes that are made directly in production. These must be propagated back to the development environments.

Finally, let's consider the case of a version that has already been published in production, but the site is not working correctly and needs to be rolled back to a previous version. We could restore the whole DB from a dump, but we'd lose all the comments published in the meantime; or we could remove/restore only the records that cause problems. Either way it's a manual operation and prone to human error.

It's true that some CMSs are aware of these problems and offer tools for staging, propagating changes and selectively importing/exporting pages. But these are CMS-specific tools. They lack generality and sometimes don't cover all the cases we've mentioned.

Versioning joys with the file system

All the problems we've mentioned would be solved immediately if the page hierarchy were stored on the file system. We have well-known and thoroughly tested Version Control Systems (VCSs): CVS, Subversion, Mercurial, Git and Bazaar are just a few open source examples.

Let's see what it means to apply changes from development to production. We run a single VCS command and all the relevant files are updated to the desired version. Easy!

Let's consider two or more developers working in parallel. If the CMS is well modularized, different sections of a site will correspond to different files and directories. The developers can work without interfering with each other.
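One way a CMS can achieve this modularity is to map each URL path directly to a directory. The sketch below assumes a hypothetical `pages` root and a hypothetical `directoryFor` helper; it is not taken from any particular CMS.

```java
import java.nio.file.Path;

public class SectionResolver {

    /**
     * Maps a URL path such as "/news/sports" to a directory under the pages
     * root, e.g. pages/news/sports. Empty segments (leading or doubled
     * slashes) are skipped; "." and ".." are rejected so a path cannot
     * escape the root.
     */
    static Path directoryFor(Path pagesRoot, String urlPath) {
        Path dir = pagesRoot;
        for (String segment : urlPath.split("/")) {
            if (segment.isEmpty()) continue;
            if (segment.equals(".") || segment.equals("..")) {
                throw new IllegalArgumentException("Invalid segment: " + segment);
            }
            dir = dir.resolve(segment);
        }
        return dir;
    }

    public static void main(String[] args) {
        Path root = Path.of("pages");
        System.out.println(directoryFor(root, "/news/sports"));
    }
}
```

With such a mapping, a developer working on /news/sports only changes files under pages/news/sports, while a colleague working on /products only touches pages/products, so the VCS can merge their commits without conflicts.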

Let's consider the case where an already published version doesn't work in production. Reverting to a previous version is a standard VCS feature. Admittedly, there may sometimes be conflicts, e.g. when developers touch a common XML file. Still, this is a problem we know how to solve, and there are adequate CMS-independent tools for it.

Finally, VCSs offer features like tagging and branching, which allow us to organize our work better.

Conclusions

In this article we started by observing that databases offer an advantage as far as concurrency management is concerned. However, we've noticed that for data that change infrequently and pertain to developers, a database is more of an obstacle than an advantage.

Conversely, a CMS that stores the page hierarchy on the file system has an advantage in managing concurrent work and versioning among developers.

The latter approach is the one we use in Portofino, where directories, XML files and other resources on the file system define the structure of a site/application. File systems bring other advantages besides improved versioning, which we'll see in future articles.
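As an illustration of how directories can define a site's structure, the following Java sketch rebuilds a page tree from the file system. The convention that a directory is a page when it contains a `page.xml` file is an assumption made for this example, not a description of Portofino's actual layout; the root directory is always treated as the home page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class PageTreeLoader {

    /** A node in the page hierarchy: its name, its directory and its child pages. */
    record Page(String name, Path dir, List<Page> children) {}

    /**
     * Builds the page hierarchy rooted at dir. Hypothetical convention: a
     * sub-directory is a child page if it contains a "page.xml" descriptor;
     * other directories (e.g. static assets) are ignored.
     */
    static Page load(Path dir) throws IOException {
        List<Path> subdirs;
        try (Stream<Path> entries = Files.list(dir)) {
            subdirs = entries
                    .filter(Files::isDirectory)
                    .filter(p -> Files.exists(p.resolve("page.xml")))
                    .sorted() // deterministic order of child pages
                    .toList();
        }
        List<Page> children = new ArrayList<>();
        for (Path sub : subdirs) {
            children.add(load(sub)); // recurse into each child page
        }
        return new Page(dir.getFileName().toString(), dir, List.copyOf(children));
    }
}
```

Because the tree is derived entirely from files, checking out a different VCS revision is enough to obtain the corresponding site structure.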

 
