Part 2. The B-ORM Identification.

Oct 23, 2009 at 10:39 AM
Edited Oct 23, 2009 at 10:43 AM


Everything here is my opinion and may be incorrect.



When we design a database, we usually define a PRIMARY KEY for each table. The primary key distinguishes the rows of a table. Every time you meet a primary key value in your data (in a view, in a result set produced by an SQL statement, in a cell of another table), you can identify the row of the original table. A primary key value may occur several times in a set of data, but you know it always points to the same row.

But the relation between a primary key and the data set in which that key appears exists only in your mind. If you change a result set that contains a given primary key value, the data in the row with that primary key does not change. This is the nature of a database: it always copies data, and the data itself is immutable from the client's point of view. The table's data is mutable, but there is only one way to change it: by executing an UPDATE command.
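The copy semantics described above can be sketched with Python's built-in sqlite3 module. The `person` table and its columns are assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ann')")

# The result set is a copy of the table data.
rows = [list(r) for r in conn.execute("SELECT id, name FROM person")]
rows[0][1] = "Bob"  # changes only the local copy

name_after_local_edit = conn.execute(
    "SELECT name FROM person WHERE id = 1").fetchone()[0]

# Only an explicit UPDATE changes the row in the table.
conn.execute("UPDATE person SET name = ? WHERE id = ?", ("Bob", 1))
name_after_update = conn.execute(
    "SELECT name FROM person WHERE id = 1").fetchone()[0]
```

After the local edit the table still holds the old name; the new name appears only after the UPDATE.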

If we look at objects in memory, we find a different situation. There are no primary keys, only references. A reference is ultimately the memory address where the object's data is placed. In .NET, an object reference is held in a variable of a specific type. The important thing is that an object may have any number of references to it, and a reference is the only way to access the object. If we change object data via any reference (in .NET it is the only way to access and change the object data), we change the same instance, and the change is visible through every reference that points to this object.
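Python objects follow the same reference semantics as .NET reference types, so the behavior can be shown in a few lines. The `Person` class is a made-up example.

```python
class Person:
    def __init__(self, name):
        self.name = name

first = Person("Ann")
second = first       # a second reference to the same object, not a copy

second.name = "Bob"  # change made through one reference
```

Both `first` and `second` now see the new name, because they point to the same instance.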

One of the tasks of an ORM is to know about this difference and handle it correctly. Usually, one primary key value must have exactly one corresponding instance. I stress this rule because it is so important. In the ORM world this is called identity mapping. Identity mapping may be implemented in different ways, but in any case it must maintain a data structure that stores the correspondence between instances and primary key values. Usually this is a simple Dictionary<TPrimaryKey, TInstance>. If we need the instance with a specified primary key, we ask the identity map for it. If the identity map contains the instance, we take it and use it in any way we need: return it to user code, place it in a collection, put it into the data of another object, etc.
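A minimal sketch of such a map follows, with a Python dict playing the role of Dictionary<TPrimaryKey, TInstance>. The `IdentityMap` class, the `load_person` function and the load-on-miss behavior are assumptions for illustration, not OverStore's implementation.

```python
class Person:
    def __init__(self, person_id, name):
        self.person_id = person_id
        self.name = name

class IdentityMap:
    """Hypothetical identity map: one instance per primary key value."""

    def __init__(self, loader):
        self._instances = {}   # primary key -> instance
        self._loader = loader  # called only when the key is not yet mapped

    def get(self, key):
        if key not in self._instances:
            self._instances[key] = self._loader(key)
        return self._instances[key]

load_calls = []

def load_person(key):
    # Stand-in for reading a row from the database.
    load_calls.append(key)
    return Person(key, "person-%d" % key)

people = IdentityMap(load_person)
a = people.get(1)
b = people.get(1)  # served from the map, no second load
```

Both calls return the same instance, and the loader runs only once for key 1.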

Identity mapping is very important. It avoids conflicts during saving: if we had several copies of an object corresponding to the same table row and made changes to each of them, we would never know which changes are correct. Identity mapping also allows you to work with hierarchies. In one of the further posts we'll discuss hierarchies in more detail. Without identity mapping it is impossible to load an instance that has a reference to itself, or an instance that is part of a closed (cyclic) graph.
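The point about self-references and closed graphs can be sketched as follows. The `ROWS` data, the `Employee` class and the `load_employee` function are hypothetical; the key idea is that the instance is registered in the map before its references are followed, so the recursion stops when it reaches a row that is already loaded.

```python
# Hypothetical table contents: primary key -> row.
ROWS = {
    1: {"name": "Ann", "manager_id": 1},  # refers to itself
    2: {"name": "Bob", "manager_id": 3},
    3: {"name": "Cid", "manager_id": 2},  # 2 -> 3 -> 2 closes a cycle
}

class Employee:
    def __init__(self, name):
        self.name = name
        self.manager = None

identity_map = {}

def load_employee(key):
    if key in identity_map:
        return identity_map[key]
    row = ROWS[key]
    employee = Employee(row["name"])
    identity_map[key] = employee  # register before following references
    employee.manager = load_employee(row["manager_id"])
    return employee

ann = load_employee(1)
bob = load_employee(2)
```

Without the lookup at the top of `load_employee`, loading row 1 or row 2 would recurse without end.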


Identity mapping of course affects performance. It may seem that identity mapping always slows things down, but that is incorrect. It may slow the ORM, but this depends heavily on many factors.

Identity mapping always adds two costs:

  1. It requires memory to store the map between instances and identifiers.
  2. It requires an additional check on each access to an instance by its primary key. This check usually takes constant time and costs the same as a call to the Dictionary<TKey, TValue>.TryGetValue() method.

Object data and object identifier.

The primary key value stored in the identity map is called the object identifier (or instance identifier). It is fully independent of any instance data. Of course, when designing classes that store data from a table, the primary key column is usually also added as a class property. But with OverStore this is not required. The actual instance identifier is not accessible outside the persistence session. If a class has a property corresponding to the table's primary key, changing this property does not affect the instance identifier.
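The separation between identifier and instance data might look like the sketch below. The `Session` class and its methods are invented for illustration and are not OverStore's API; `identifier_of` exists only so the sketch can be inspected.

```python
class Person:
    def __init__(self, person_id, name):
        self.person_id = person_id  # copy of the primary key held in instance data
        self.name = name

class Session:
    """Hypothetical persistence session; not OverStore's actual API."""

    def __init__(self):
        self._instance_by_identifier = {}
        self._identifier_by_instance = {}  # keyed by id(instance)

    def attach(self, identifier, instance):
        self._instance_by_identifier[identifier] = instance
        self._identifier_by_instance[id(instance)] = identifier

    def get(self, identifier):
        return self._instance_by_identifier.get(identifier)

    def identifier_of(self, instance):
        return self._identifier_by_instance[id(instance)]

session = Session()
ann = Person(1, "Ann")
session.attach(1, ann)

ann.person_id = 99            # changes instance data only
still_ann = session.get(1)    # the identifier kept by the session is unchanged
missing = session.get(99)     # no instance is registered under 99
```

The session keeps its own copy of the identifier, so editing the property leaves the mapping intact.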

All of this holds until we save the instance. On saving we have a choice: change the row's primary key to the value contained in the instance data, or overwrite the instance's copy of the primary key with the original value of the instance identifier. This is configured for each repository separately. Please check the OverStore documentation for more information about configuration and identification.
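The two choices on saving can be sketched as below. The `KeyPolicy` names and the `save` function are hypothetical and only illustrate the idea; the real configuration is described in the OverStore documentation. A dict keyed by primary key stands in for the table, and a plain dict stands in for the instance.

```python
from enum import Enum

class KeyPolicy(Enum):
    # Hypothetical names, not OverStore configuration options.
    UPDATE_ROW_KEY = 1        # the row takes the key value held in instance data
    RESTORE_INSTANCE_KEY = 2  # the instance's copy is reset to the identifier

def save(table, identifier, instance, policy):
    """Save instance data and return the identifier in effect afterwards."""
    row = table.pop(identifier)
    row["name"] = instance["name"]
    if policy is KeyPolicy.UPDATE_ROW_KEY:
        new_identifier = instance["id"]
    else:
        instance["id"] = identifier
        new_identifier = identifier
    table[new_identifier] = row
    return new_identifier

table_a = {1: {"name": "Ann"}}
inst_a = {"id": 99, "name": "Ann"}
id_a = save(table_a, 1, inst_a, KeyPolicy.UPDATE_ROW_KEY)

table_b = {1: {"name": "Ann"}}
inst_b = {"id": 99, "name": "Ann"}
id_b = save(table_b, 1, inst_b, KeyPolicy.RESTORE_INSTANCE_KEY)
```

Under the first policy the row moves to key 99; under the second the instance's `id` is reset to 1 and the row stays where it was.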

For OverStore it does not matter what type the primary key has, as long as it supports equality comparison. It may be a reference type or a value type, a built-in type or a user-defined type, composite or simple, etc. It is strongly recommended that the primary key be an immutable class or structure, because the effect of mutating a key is unpredictable.
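Why a mutable key is risky can be shown with a hash-based map, here a Python dict standing in for the .NET dictionary. Both key classes are made-up examples of a composite key.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderLineKey:
    """Immutable composite key: equality and hash come from the fields."""
    order_id: int
    line_no: int

class MutableKey:
    """Same idea, but the fields can be changed after construction."""

    def __init__(self, order_id, line_no):
        self.order_id = order_id
        self.line_no = line_no

    def __eq__(self, other):
        return (isinstance(other, MutableKey)
                and (self.order_id, self.line_no)
                == (other.order_id, other.line_no))

    def __hash__(self):
        return hash((self.order_id, self.line_no))

safe_map = {OrderLineKey(10, 1): "line A"}
found_by_equal_key = safe_map.get(OrderLineKey(10, 1))

key = MutableKey(10, 1)
risky_map = {key: "line A"}
key.line_no = 2  # the key changes while it is stored in the map
lost = risky_map.get(MutableKey(10, 1))  # the stored key no longer equals (10, 1)
```

The entry is still inside `risky_map`, but a lookup with the original key value no longer finds it. The frozen key cannot be modified, so the same problem cannot occur.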