Choosing an object-relational mapping tool by Fabrice Marguerie (http://weblogs.asp.net/fmarguerie)
When developing an application, it is almost unavoidable to write data access and data manipulation code. Many object-relational mapping tools are available to help. Choosing the right one is important.

Version fran�aise de cet article

Whether you are developing a small or a big application, you always have to deal with data. It's even a critical part of an application. Problem is this is tedious, repetitive work, which consumes a lot of the time we would prefer to spend on other parts of the application. Without forgetting that the less interesting the work is, the higher the risks of errors.
To solve these problems, multiple solutions exist. Their goal is to simplify the creation of data access layers, automate data access, or generate data access code.
We will focus on one of these solutions: object-relational mapping tools (O/RM).

The principle of object-relational mapping is to delegate to tools the management of persistence, and to work at code-level with objects representing a domain model, and not with data structures in the same format as the relational database (.NET's DataSet class). Object-relational mapping tools establish a bidirectional link with data in a relational database and objects in code, based on a configuration and by executing SQL queries (dynamic most of the time) on the database.
We won't get here into the details of how mapping tools work and their principles. You can refer to the links at the end of this article to learn more about that.
We should not forget that other solutions exist, such as those based on code generation. They all have their pros and cons, just as it's the case for mapping tools themselves of course. The perfect tool for all situations does not exist, or someone has to tell me about it :-)
We won't talk about these other approaches here; it's not the subject for today.

In terms of tools, the offer is huge. This is true for .NET as well as for Java, even if the offer for Java is more advanced for historical reasons. There is anyway an impressive quantity of tools for both sides! (see the list of tools for .NET at the end of this article)
Is quality always there? Are the tools suitable to your needs? These are the main questions.
The most important when searching for the right tool is to define precisely which criteria are essential for you. This is why I will present in this article the criteria for selecting an object-relational mapping tool.
The .NET and Java/J2EE platforms being so close, the needs being the same, these criteria can be considered universal.

Criteria specific to object-relational mapping

Basic features

  • Be able to use inheritance, create hierarchies between entities, and use polymorphism (we are using objects!). The tools can support a variety of combinations for tables and classes to allow these mechanisms.
  • Handle any type of relations (1-1, 1-n, n-n)
  • Support for transactions
  • Aggregates (equivalent to SQL's SUM, AVG, MIN, MAX, COUNT)
  • Support for grouping (SQL's GROUP BY)

Useful extended features

  • Supported databases. A big advantage of mapping tools is that they provide an abstraction of the underlying database engine. Most of them allow switching easily between RDBMSs (Relational Database Management Systems).
  • Query language (OQL - Object Query Language, OPath). We very frequently have to execute dynamic queries. It's the case at least with searches based on filters. It is important to be able to use a powerful query language. 
  • Support for DataBinding (to be able to bind data objects to visual components). Note some specificity exists with Windows Forms.

Flexibility

  • Customization of queries. We often need to go beyond what is possible with the provided query language. In these cases, we need to be able to provide custom SQL queries. HQL, which is a strong point of Hibernate/NHibernate, allows for this. We could also wish a dynamic mapping to be possible from developer provided SQL queries.
  • Support any type of SQL joins (inner join, outer join)
  • Concurrency management (support for optimistic and pessimistic approaches)
  • Support for the data types specific to the database management system (identity columns, sequences, GUIDs, autoincrements)
  • Be able to map a single object to data coming from multiple tables (joins, views). Most of the tools handle a direct mapping of a class to one table. We often need more.
  • Be able to dispatch the data from a single table to multiple objects.

Assistance, ease of use

  • GUI to set up the mapping. Such a graphical tool presents the relational data model and lets you specify the objects to be created or at least the links between the objects and the tables.
  • Generation of the classes. This can speed up the development, even if in a lot of cases we prefer to map the database to hand-coded classes or to classes generated from UML for example. Check which scenarios are supported by the tools.
  • Generation of the database schema. Some tools work only with a database they generated. This can be a big constraint, especially if you have to work with a legacy database of course! Otherwise, it all depends on whether you are an expert in database modeling, or if you prefer not to have to deal with the database schema. If you have a DBA who takes care of your databases, or if you prefer to design them by yourself, be sure to select a mapping tool that doesn't require its own data model.

Optimizations, performance, design

  • Global performance (good implementation of the object-relational mapping concept)
  • Lazy loading (the loading of some data is deferred until it's needed)
    • for the data through relations
    • for some columns. When we want to display just a list of names, we don't need all the columns of a table to be loaded. We may need the blob fields only at certain point, under certain conditions, and so it's better to load them only at that time.
  • Cache dynamically generated queries, so that they don't get rebuilt at each call.
  • Cache some data to avoid too many calls to the data source.
  • Optimized queries (update only the modified columns; detect situations where the number of executed queries can be reduced; ...)
  • Handle circular references without duplication of objects ("account == account.Client.Account")
  • Handle cascade updates. Deleting a master record should delete the linked details if wished so.
  • Bulk updates or deletions. When we want to update or delete thousands of records at a time, it's not possible to load all the objects in memory, while this can be easily and quickly done with a SQL query (DELETE FROM Customer WHERE Balance < 0). Support from the tool is welcome to handle such massive operations without having to deal with SQL. Hibernate is not very good on this point for example.

Evolution, compatibility

  • Maintainability (what happens if the database schema changes? If I need to add a new collection?)
  • Possibility to move to a new mapping tool (what would it imply? At what cost?)
  • Serialization. Serialization can be used to persist data outside of the database. Serialization can be done into a binary format, or more important, in XML (see the section on SOA below)
  • Distributed objects (remoting; web services; requires support for serialization)

Welcome additional features

  • Freedom in the design of the classes (no base class for the entities; no mandatory interface; no specific class for collections). Think POJO (Plain Old Java Object).
  • Less constraints as possible on the database schema (eg. support for composite data keys)
  • State information on data. It can be useful to know by looking at an object if the entity has been added, modified, deleted.
  • External mapping file or not? Attributes (annotations) in code or not?
    • Advantages of external files: mapping entirely externalized; no intrusion in the classes; can be generated
    • Disadvantages of external files: one or multiple additional files to deal with; a syntax to learn if no GUI is provided; understanding the links between the code and the database requires some effort; can become out of sync with the code
    • Advantages of attributes/annotations: everything at hand at code-level; the mapping is obvious since directly present on classes and class members; can be used to generate external mapping files using reflection if needed; if the mapping tool isn't used anymore, they are ignored
    • Disadvantages of attributes/annotations: the code is "polluted"; the code depends on a specific mapping framework
  • Advanced compatibility with the development platform. For example, compatibility with .NET's DataSets can be useful. It can be important to be able to convert object graphs into DataSets to interact with components that require them (reporting tools for example).
  • Support for disconnected mode (fill objects from a database, close the connection, the session, create/update/delete some objects, and apply this modifications back to the database later)
  • Interceptors and delegation mechanisms to be able to react when the handling of the persistence happens (eg. to be able to log what happens)
  • Support for stored procedures. The advantages of stored procedures compared to dynamic SQL queries make for a hot debate, but it is better to have the choice.
  • One advantage of some tools for .NET could be their popularity in the Java world. This is the case for iBatis.NET or NHibernate. NHibernate is a port of Hibernate, a tool widely used in Java, and so can benefit from a large community of developers and existing documentation.
  • Ability to specify constraints between objects and on properties (OCL - Object Constraint Language). This can avoid having to wait for the data to reach the database before being validated.
  • Filtering objects in memory (without having new queries executed on the database)
  • Be able to defer the updates on the database, and apply them at a given time using a specific method call, instead of having them systematically applied immediately.

Non O/RM specific criteria

A lot of criteria are common to the selection of any piece of software, and so are to be included in your checklist:

  • Price
  • Performance
  • Resource consumption (memory)
  • Scalability
  • Complexity (or simplicity...)
  • Ease of use, time to be up and running
  • Flexibility
  • Documentation. Using a mapping tool is not always a snap. Check the quality of the documentation and the provided samples.
  • Maturity. Many tools are relatively new, at least for .NET, and are still not mature. Some are still at the beta level, or even alpha.
  • Frequency of the updates, bug fixes, evolutions
  • Support, forums, community
  • Vendor's reputation and stability. This is a big problem today. There are a lot of tools, but natural selection is constantly at work. Which products will still be there tomorrow? Some have already fallen. Time will probably continue to eliminate some actors in this very competitive market.
  • Source code provided or not (important in case the vendor disappears in a black hole or in the Cayman Islands)
  • Support for multiple platforms (Java and .NET? Windows and Linux? .NET Framework and Mono?)

Specific case of a Service-Oriented Architecture (SOA)

Some criteria should particularly be taken into account when considering service-oriented architectures.

Object-relational mapping tools often base the development of your applications on the fact that the entities you handle are linked to a data source with which they communicate directly to reflect updates you perform on them. With SOA, the handling of persistence is delegated to a dedicated layer; the code for persistence is not in the entities themselves.

When considering a tool, you will ask yourself the following questions:

  • Can my objects be serialized in XML?
  • Can I persist my data objects into different formats?
  • Am I the one who decides when the calls to handle persistence are performed? Are these calls frequent? Are the moments when they happen well-defined?
  • Can I use my data objects from another platform? (eg. client in .NET, server in Java) Without the object-relational mapping tool?
  • Am I sure my business objects won't call directly the database from the presentation layer, by any chance?
  • What happens if I decide to move my presentation layer or my service's clients to a remote location?

The design of your applications will tell you what tool you should use, not the opposite...

Other tools

Mapping tools are not the only solution to handle persistence. The second big category of tools consists in code generators.
To keep it simple, the main differences between these two categories of tools can be as follow:

  • Dynamic queries (filters) are handled only by mapping tools
  • Mapping tools use reflection, which is slower than the compiled code produced by code generators

Code generation tools are numerous for .NET. See the links at the bottom of this article.

Depending on the situation, it can be code generation tools or mapping tools that are best fitted. An interesting point is that some tools combine code generation and object-relational techniques to offer the best of both worlds.

Some final words

As you can see, the criteria are numerous! You should start by defining which are critical for you (MUST HAVE) and which are less important (NICE TO HAVE).

You have to make your own opinion. As with any tool, don't hesitate to test extensively. Download evaluation versions. Do more than simply taking a look at the provided demos,  use the tools on a prototype in your own application domain.
Evaluate, compare, use, criticize and comment so we too know what you think about the different tools...


Who is Fabrice Marguerie?
Fabrice Marguerie is a .NET architect and a Microsoft MVP. Fabrice works as a consultant on design and implementation missions. He writes a weblog in English: http://weblogs.asp.net/fmarguerie and administrates the web sites http://sharptoolbox.com and http://javatoolbox.com

Resources