Whether you are developing a small or a
big application, you always have to deal with
data. It's even a critical part of an
application. Problem is this is tedious,
repetitive work, which consumes a lot of the
time we would prefer to spend on other parts of
the application. Without forgetting that the
less interesting the work is, the higher the
risks of errors.
To solve these problems, multiple solutions
exist. Their goal is to simplify the creation of
data access layers, automate data access, or
generate data access code.
We will focus on one of these solutions:
object-relational mapping tools (O/RM).
The principle of object-relational mapping is
to delegate to tools the management of
persistence, and to work at code-level with
objects representing a domain model, and not
with data structures in the same format as the
relational database (.NET's DataSet
class). Object-relational mapping tools
establish a bidirectional link with data in a
relational database and objects in code, based
on a configuration and by executing SQL queries
(dynamic most of the time) on the database.
We won't get here into the details of how
mapping tools work and their principles. You can
refer to the links at the end of this article to
learn more about that.
We should not forget that other solutions exist,
such as those based on code generation. They all
have their pros and cons, just as it's the case
for mapping tools themselves of course. The
perfect tool for all situations does not exist,
or someone has to tell me about it :-)
We won't talk about these other approaches here;
it's not the subject for today.
In terms of tools, the offer is huge. This is
true for .NET as well as for Java, even if the
offer for Java is more advanced for historical
reasons. There is anyway an impressive quantity
of tools for both sides! (see the list of tools
for .NET at the end of this article)
Is quality always there? Are the tools suitable
to your needs? These are the main questions.
The most important when searching for the right
tool is to define precisely which criteria are
essential for you. This is why I will present in
this article the criteria for selecting an
object-relational mapping tool.
The .NET and Java/J2EE platforms being so close,
the needs being the same, these criteria can be
considered universal.
Criteria specific to
object-relational mapping
Basic features
- Be able to use inheritance, create
hierarchies between entities, and use
polymorphism (we are using objects!). The
tools can support a variety of combinations
for tables and classes to allow these
mechanisms.
- Handle any type of relations (1-1,
1-n, n-n)
- Support for transactions
- Aggregates (equivalent to SQL's SUM,
AVG, MIN, MAX, COUNT)
- Support for grouping (SQL's GROUP
BY)
Useful extended features
- Supported databases. A big
advantage of mapping tools is that they
provide an abstraction of the underlying
database engine. Most of them allow switching
easily between RDBMSs (Relational Database
Management Systems).
- Query language (OQL - Object Query
Language, OPath). We very frequently have to
execute dynamic queries. It's the case at
least with searches based on filters. It is
important to be able to use a powerful query
language.
- Support for DataBinding (to be able
to bind data objects to visual components).
Note some specificity exists with Windows
Forms.
Flexibility
- Customization of queries. We often
need to go beyond what is possible with the
provided query language. In these cases, we
need to be able to provide custom SQL queries.
HQL, which is a strong point of Hibernate/NHibernate,
allows for this. We could also wish a dynamic
mapping to be possible from developer provided
SQL queries.
- Support any type of SQL joins (inner
join, outer join)
- Concurrency management (support for
optimistic and pessimistic approaches)
- Support for the data types specific to
the database management system (identity
columns, sequences, GUIDs, autoincrements)
- Be able to map a single object to data
coming from multiple tables (joins,
views). Most of the tools handle a direct
mapping of a class to one table. We often need
more.
- Be able to dispatch the data from a single
table to multiple objects.
Assistance, ease of use
- GUI to set up the mapping.
Such a graphical tool presents the
relational data model and lets you
specify the objects to be created or at
least the links between the objects and
the tables.
- Generation of the classes.
This can speed up the development, even
if in a lot of cases we prefer to map
the database to hand-coded classes or to
classes generated from UML for example.
Check which scenarios are supported by
the tools.
- Generation of the database
schema. Some tools work only with a
database they generated. This can be a
big constraint, especially if you have
to work with a legacy database of
course! Otherwise, it all depends on
whether you are an expert in database
modeling, or if you prefer not to have
to deal with the database schema. If you
have a DBA who takes care of your
databases, or if you prefer to design
them by yourself, be sure to select a
mapping tool that doesn't require its
own data model.
Optimizations, performance,
design
- Global performance (good
implementation of the object-relational
mapping concept)
- Lazy loading (the loading of some
data is deferred until it's needed)
- for the data through relations
- for some columns. When we want to
display just a list of names, we don't need
all the columns of a table to be loaded. We
may need the blob fields only at certain
point, under certain conditions, and so it's
better to load them only at that time.
- Cache dynamically generated queries,
so that they don't get rebuilt at each call.
- Cache some data to avoid too many
calls to the data source.
- Optimized queries (update only the
modified columns; detect situations where the
number of executed queries can be reduced;
...)
- Handle circular references without
duplication of objects ("account ==
account.Client.Account")
- Handle cascade updates. Deleting a
master record should delete the linked details
if wished so.
- Bulk updates or deletions. When we
want to update or delete thousands of records
at a time, it's not possible to load all the
objects in memory, while this can be easily
and quickly done with a SQL query (DELETE FROM
Customer WHERE Balance < 0). Support from the
tool is welcome to handle such massive
operations without having to deal with SQL.
Hibernate is not very good on this point for
example.
Evolution, compatibility
- Maintainability (what happens if
the database schema changes? If I need to add
a new collection?)
- Possibility to move to a new mapping
tool (what would it imply? At what cost?)
- Serialization. Serialization can be
used to persist data outside of the database.
Serialization can be done into a binary
format, or more important, in XML (see the
section on SOA below)
- Distributed objects (remoting; web
services; requires support for serialization)
Welcome additional features
- Freedom in the design of the classes
(no base class for the entities; no mandatory
interface; no specific class for collections).
Think POJO (Plain Old Java Object).
- Less constraints as possible on the
database schema (eg. support for composite
data keys)
- State information on data. It can
be useful to know by looking at an object if
the entity has been added, modified, deleted.
- External mapping file or not?
Attributes (annotations) in code or not?
- Advantages of external files: mapping
entirely externalized; no intrusion in the
classes; can be generated
- Disadvantages of external files: one or
multiple additional files to deal with; a
syntax to learn if no GUI is provided;
understanding the links between the code and
the database requires some effort; can
become out of sync with the code
- Advantages of attributes/annotations:
everything at hand at code-level; the
mapping is obvious since directly present on
classes and class members; can be used to
generate external mapping files using
reflection if needed; if the mapping tool
isn't used anymore, they are ignored
- Disadvantages of attributes/annotations:
the code is "polluted"; the code depends on
a specific mapping framework
- Advanced compatibility with the
development platform. For example,
compatibility with .NET's DataSets can
be useful. It can be important to be able to
convert object graphs into DataSets to
interact with components that require them (reporting
tools for example).
- Support for disconnected mode (fill
objects from a database, close the connection,
the session, create/update/delete some objects,
and apply this modifications back to the
database later)
- Interceptors and delegation mechanisms
to be able to react when the handling of the
persistence happens (eg. to be able to log
what happens)
- Support for stored procedures. The
advantages of stored procedures compared to
dynamic SQL queries make for a hot debate, but
it is better to have the choice.
- One advantage of some tools for .NET could
be their popularity in the Java world.
This is the case for iBatis.NET or NHibernate.
NHibernate is a port of Hibernate, a tool
widely used in Java, and so can benefit from a
large community of developers and existing
documentation.
- Ability to specify constraints
between objects and on properties (OCL -
Object Constraint Language). This can avoid
having to wait for the data to reach the
database before being validated.
- Filtering objects in memory (without
having new queries executed on the database)
- Be able to defer the updates on the
database, and apply them at a given time using
a specific method call, instead of having them
systematically applied immediately.
Non O/RM specific criteria
A lot of criteria are common to the selection
of any piece of software, and so are to be
included in your checklist:
- Price
- Performance
- Resource consumption (memory)
- Scalability
- Complexity (or simplicity...)
- Ease of use, time to be up and running
- Flexibility
- Documentation. Using a mapping tool
is not always a snap. Check the quality of the
documentation and the provided samples.
- Maturity. Many tools are relatively
new, at least for .NET, and are still not
mature. Some are still at the beta level, or
even alpha.
- Frequency of the updates, bug fixes,
evolutions
- Support, forums, community
- Vendor's reputation and stability.
This is a big problem today. There are a lot
of tools, but natural selection is constantly
at work. Which products will still be there
tomorrow? Some have already fallen. Time will
probably continue to eliminate some actors in
this very competitive market.
- Source code provided or not
(important in case the vendor disappears in a
black hole or in the Cayman Islands)
- Support for multiple platforms
(Java and .NET? Windows and Linux? .NET
Framework and Mono?)
Specific case of a
Service-Oriented Architecture (SOA)
Some criteria should particularly be taken
into account when considering service-oriented
architectures.
Object-relational mapping tools often base
the development of your applications on the fact
that the entities you handle are linked to a
data source with which they communicate directly
to reflect updates you perform on them. With
SOA, the handling of persistence is delegated to
a dedicated layer; the code for persistence is
not in the entities themselves.
When considering a tool, you will ask
yourself the following questions:
- Can my objects be serialized in XML?
- Can I persist my data objects into
different formats?
- Am I the one who decides when the calls to
handle persistence are performed? Are these
calls frequent? Are the moments when they
happen well-defined?
- Can I use my data objects from another
platform? (eg. client in .NET, server in Java)
Without the object-relational mapping tool?
- Am I sure my business objects won't call
directly the database from the presentation
layer, by any chance?
- What happens if I decide to move my
presentation layer or my service's clients to
a remote location?
The design of your applications will tell you
what tool you should use, not the opposite...
Other tools
Mapping tools are not the only solution to
handle persistence. The second big category of
tools consists in code generators.
To keep it simple, the main differences between
these two categories of tools can be as follow:
- Dynamic queries (filters) are handled only
by mapping tools
- Mapping tools use reflection, which is
slower than the compiled code produced by code
generators
Code generation tools are numerous for .NET.
See the links at the bottom of this article.
Depending on the situation, it can be code
generation tools or mapping tools that are best
fitted. An interesting point is that some tools
combine code generation and object-relational
techniques to offer the best of both worlds.
Some final words
As you can see, the criteria are numerous!
You should start by defining which are critical
for you (MUST HAVE) and which are less important
(NICE TO HAVE).
You have to make your own opinion. As with
any tool, don't hesitate to test extensively.
Download evaluation versions. Do more than
simply taking a look at the provided demos,
use the tools on a prototype in your own
application domain.
Evaluate, compare, use, criticize and comment so
we too know what you think about the different
tools...
Who is Fabrice Marguerie?
Fabrice Marguerie is a .NET architect and a
Microsoft MVP. Fabrice works as a consultant on
design and implementation missions. He writes a
weblog in English:
http://weblogs.asp.net/fmarguerie and
administrates the web sites
http://sharptoolbox.com and
http://javatoolbox.com
|