Database

Something that only Surrogate Keys can do

There are two kinds of primary keys (three if you count the smart key, but that third one is generally considered the worst option): the natural key and the surrogate key. Simply put, a surrogate key is an additional column, typically an int (or GUID), unique, used as the identity of the row. A natural key, on the other hand, is a natural attribute (or set of attributes) that distinguishes the row. Both have pros and cons that I don't want to discuss here; I just want to share that there is one (well, two) things that can only be done with a surrogate key.

First: code generation

This does not mean that natural keys cannot have code generation tools. A natural key can consist of multiple columns (a composite key) of differing data types. That non-standard format makes code generation tools harder to develop. A surrogate key, meanwhile, has a standard, consistent format, which makes code generation much easier.

Reference table and generic data operation

This is a case that I think can only be handled by a surrogate key (or at least a surrogate column). What I call a reference table here is "a table that holds references to records from several tables".

For example, say you have a table called "submission trail", responsible for holding all transaction activity related to records in all tables across all databases. Let's say it has the following content:

database_name | table_name | reference_key | action | utc_time
--------------|------------|---------------|--------|--------------------
db01          | t001       | 1             | Insert | 2015-07-08T00:00:00

The reference key refers to the primary key of the source table and acts as the identifier. Using a natural key, and especially a composite key, prevents you from creating such a table. The same reference-table pattern can also be used for other utility concerns, for example attachments or notifications.
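As an illustration, such a reference table could be sketched in T-SQL like this (the table follows the example above; the column sizes and types are my assumptions):

```sql
CREATE TABLE dbo.submission_trail (
    database_name VARCHAR(128) NOT NULL,
    table_name    VARCHAR(128) NOT NULL,
    -- This only works because every table exposes the same int surrogate key;
    -- a composite natural key could not fit into one consistent column.
    reference_key INT          NOT NULL,
    action        VARCHAR(16)  NOT NULL,
    utc_time      DATETIME2    NOT NULL
);
```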

Generic data operations also need consistent key columns and data types to operate on. What I mean by a generic data operation here is an operation that can be applied to any entity; a repository's GetByKey is one of them. You can define a generic GetByKey operation, give it a specific implementation, and have it return the specified entity type. None of this can be done without a consistent key column.
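To illustrate the idea, here is a minimal in-memory sketch of a generic GetByKey in Python (the Repository class and its method names are hypothetical; a real implementation would issue a keyed SELECT against the database):

```python
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")

class Repository(Generic[T]):
    """Generic repository: works for any entity type precisely because
    every table uses the same surrogate-key shape (a single int)."""

    def __init__(self) -> None:
        self._rows: Dict[int, T] = {}

    def add(self, key: int, entity: T) -> None:
        self._rows[key] = entity

    def get_by_key(self, key: int) -> Optional[T]:
        # One implementation serves every entity; with composite natural
        # keys of varying shapes, this single signature would not exist.
        return self._rows.get(key)

users = Repository[str]()
users.add(1, "alice")
print(users.get_by_key(1))  # alice
```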

Conclusion

In this post, we have seen two patterns/techniques that cannot be used with natural/composite keys (or, to be precise, that require a constant key format). Both are useful for utility operations and data-logging activities.


Microsoft SQL Server: what do you need to know as a beginner developer?

I have found Microsoft SQL Server easy to set up and use. Its UI (Management Studio) is friendly and pleasant. However, in the hands of a beginner developer, SQL Server will usually perform badly. As a beginner in the MS SQL Server world, what do you need to know?

In this series, I will provide hints and short descriptions of the topics you need to learn beforehand. If you need detailed explanations, you will have to do the detailed research yourself.

Database transaction isolation level and lock hints

This is the very first thing you need to know when developing a system with SQL Server as the database. Why? Because not knowing it makes deadlocks almost inevitable in a high-transaction system.

By default, MS SQL Server uses the READ COMMITTED isolation level, in which read queries (SELECT) take shared locks that block concurrent writers. The heaviest level you can go up to is SERIALIZABLE: the most secure, but also the most problematic, since it effectively locks the key range (or page) for every select/insert/update/delete. In heavy-read, light-write applications such as Stack Overflow, lock-based read isolation is a common source of blocking and deadlocks.

Microsoft recommends the Read Committed Snapshot Isolation (RCSI) level for common systems. But you still need to find for yourself the isolation level that best suits your application.
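A minimal sketch of turning RCSI on (the database name is made up; this assumes you can briefly roll back other sessions while altering):

```sql
-- Readers now see the last committed version instead of blocking on writers.
ALTER DATABASE MyAppDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```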

Auto-commit transaction

By default, MS SQL Server enables auto-commit, meaning every insert/update/delete that is not inside an explicit transaction is wrapped in its own transaction and committed immediately. This is bad for performance, because every commit must write to the database log. It happens especially with loop-generated statements (inserting/updating/deleting many rows from the application).

Basically, to avoid this you need to wrap your statements in a single transaction, or turn auto-commit off.
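A sketch of wrapping a loop-generated batch in one explicit transaction (the table and column names are made up):

```sql
BEGIN TRANSACTION;

INSERT INTO dbo.t001 (payload) VALUES ('row 1');
INSERT INTO dbo.t001 (payload) VALUES ('row 2');
-- ...many more generated statements...

COMMIT TRANSACTION;
```

One commit hardens the log once, instead of once per statement.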

Backup Recovery Model

By default, MS SQL Server databases use the FULL recovery model. In contrast to the SIMPLE recovery model, full recovery enables point-in-time recovery to any committed transaction, which simple recovery does not support. In short, set the model to SIMPLE if you don't think point-in-time recovery is required, especially for logging databases.
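Switching a database to the simple model is a one-liner (the database name is made up):

```sql
-- After this, the log truncates on checkpoint; log backups and
-- point-in-time recovery are no longer available for this database.
ALTER DATABASE LoggingDb SET RECOVERY SIMPLE;
```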

Indexing

Indexing is complex, but if you need to maintain database performance against large data volumes, you need to learn it. First, learn how to get the query execution plan. Next, learn about index seek vs. index scan, key lookups, and sargable queries.

In short, what I recommend is:

* Always define a primary key on every table,

* If you need to do a join query, in order of preference:
1) join on primary key/foreign key columns
2) make sure the joined columns have the same data type and length
3) avoid computations in the WHERE clause or on join columns, e.g. ISNULL(fieldA, 0) or WHERE fieldA < fieldB + 1
4) do not use a leading wildcard (percent sign) in string searches
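Points 3 and 4 can be sketched as follows (the tables, columns, and values are hypothetical):

```sql
-- Non-sargable: the function wraps the column, so the index cannot seek.
SELECT id FROM dbo.orders WHERE ISNULL(ship_date, '1900-01-01') = '2015-07-08';
-- Sargable rewrite: the column stands alone, so an index seek is possible.
SELECT id FROM dbo.orders WHERE ship_date = '2015-07-08';

-- Leading wildcard forces an index scan; a trailing wildcard can seek.
SELECT id FROM dbo.customers WHERE name LIKE '%son';   -- scan
SELECT id FROM dbo.customers WHERE name LIKE 'john%';  -- seek
```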

Connection overhead

Opening a connection and authenticating the user has some performance impact. It is not much, but it exists, especially when the application server is far from the database server. Note that the existence of connection overhead does not mean keeping the connection open forever is the solution. What you need to avoid is opening/closing connections inside an application loop; it's better to refactor that into something more like a set-based operation.

Scalar vs table-valued function

In MS SQL there are three types of functions: scalar functions, inline table-valued functions, and multi-statement table-valued functions. Execution-plan-wise, scalar and multi-statement table-valued functions behave similarly, while an inline table-valued function is treated more like accessing a view. So let's consider multi-statement table-valued functions the same as scalar functions for this topic.

A scalar function in a select column / join / where clause is executed row by row, effectively a loop. An inline table-valued function, by contrast, is expanded like a view and included in the query plan as a set operation. So, in short, if the function you define accesses any table, avoid defining it as a scalar function.
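A sketch of the two shapes, using a hypothetical order_lines table:

```sql
-- Scalar UDF that reads a table: evaluated once per row of the outer query.
CREATE FUNCTION dbo.fn_OrderTotal (@order_id INT)
RETURNS MONEY
AS
BEGIN
    RETURN (SELECT SUM(qty * price)
            FROM dbo.order_lines
            WHERE order_id = @order_id);
END;
GO

-- Inline table-valued equivalent: expanded into the calling plan like a view.
CREATE FUNCTION dbo.tvf_OrderTotal (@order_id INT)
RETURNS TABLE
AS
RETURN (SELECT SUM(qty * price) AS total
        FROM dbo.order_lines
        WHERE order_id = @order_id);
```

Calling the second one via CROSS APPLY lets the optimizer fold it into the surrounding query as a set operation.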

Conclusion

SQL Server is easy to use right after installation. But using it to build a complex system without knowing its behavior can cause problems. Before using it in a real system, I suggest you at least know the points described above.

However, even knowing these points does not mean I immediately know all of SQL Server or the best configuration for every scenario. I am not a database administrator, after all.

Why I avoid ORM in my enterprise architecture

Yesterday I had a nice, interesting chat with someone I had just met. In a discussion about software development and technology, he asked whether I use Entity Framework in my current architecture. I said clearly that I am not using any kind of ORM, nor developing one myself. Before I continue, I want to clarify that I don't hate ORMs or anything. With my current skill and experience, I'm just not ready for the edge cases (or, to be precise, corner cases) that can occur when using an ORM.

When asked that question, I had a hard time giving reasons. One reason is that it is harder to find developers with expert knowledge of a particular ORM than developers with expert knowledge of stored-procedure-based command execution. The other is that ORMs are leaky abstractions. I'm not saying that the other tools I have used so far have no leaky abstractions, but I don't think the workarounds for an ORM's corner cases are worth the effort compared to mature, traditional stored procedures and queries.

Not all LINQ expressions can be translated into SQL

This point is based primarily on Mark Seemann's article "IQueryable is a leaky abstraction". In it, he states that some expressions exposed by IQueryable are unsupported by providers, and that this violates the Liskov Substitution Principle (LSP). Moreover, a quick Google search turns up several Stack Overflow questions about NotSupportedException. One shows an inconsistency in LINQ to SQL, where using string.Contains makes the ORM throw an exception while an equivalent join executes fine; another post says the same about EF.

So, as we can see, the IQueryable interface used by ORMs gives us a false guarantee: the code compiles but produces a runtime exception later. The same can happen with SqlCommand, but there the error is clear, either: 1) a wrong SQL data type passed to SqlParameter, 2) an incorrect parameter provided, or 3) wrong SQL syntax. In-memory IEnumerable lambda/LINQ is the opposite case: its expressions are fully supported.

Now we have an additional step when measuring SQL performance

An ORM translates query expressions into SQL queries. If you have not mastered the specific ORM, you won't know what kind of SQL it produces; moreover, different ORMs produce different queries. So if we care about SQL performance, we now have two steps. The first is to determine exactly which SQL query the ORM generates; the second is the real measurement with indexes, execution plans, etc.

A common case is the N+1 selects issue, which is handled differently by different ORMs and impacts performance.

Data annotation breaks POCO and SRP

This one is specific to EF with data annotations. If you prefer to use the fluent API, good for you.

Data annotations break POCO, 'nuff said. Per Wikipedia, a POCO (Plain Old CLR Object) is a simple object with no dependency on any other framework, plugin, or tool. Even the Serializable and XmlIgnore annotations break POCO, and I've already stated that cluttering a class to make it XML-serializable is usually bad.

It also breaks the SRP: your POCO class now has another responsibility coupled to it, namely describing how the ORM maps the table to the object. That isn't wrong, but it isn't clean. The worst part is that you need to modify your POCO class to meet the mapping requirements.

The original underlying structure of database is relational

A post by Martin Fowler states that the root of the main problem is the difference in underlying structure between the application (OOP) and the database (RDBMS): the way they handle relations is different, and the way they process data is also different. Jeff Atwood has likewise said that the ORM problem only goes away if you remove either the "object" aspect or the "relational" aspect, either by following the table structure in the application or by using an ODBMS. An ODBMS is great, but it also has drawbacks compared to an RDBMS.

When, and how many times, is the query executed?

I don't know how good you are with the IQueryable abstraction. But even with in-memory IEnumerable, I have many times been caught by bugs where the iterator executes multiple times, producing inconsistent data and extra cost. With an ORM's IEnumerable/IQueryable, I don't know when, or how many times, the SQL query is actually executed, and that can impact performance considerably.
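The multiple-enumeration trap can be sketched in Python (the LazyQuery class is a made-up stand-in for deferred execution; imagine each enumeration issuing a SQL query):

```python
call_count = 0

def run_query():
    """Stand-in for a deferred query; imagine each call hitting the database."""
    global call_count
    call_count += 1
    return iter([10, 20, 30])

class LazyQuery:
    """Mimics deferred execution: every enumeration re-runs the query."""
    def __iter__(self):
        return run_query()

q = LazyQuery()
total = sum(q)        # first enumeration: the "query" runs once
count = len(list(q))  # second enumeration: the "query" runs again
# call_count is now 2: the database was hit twice without any visible hint
```

Materializing once (e.g. with list(q)) and reusing the result avoids the repeat round-trips.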

Additional sources that document the issues

These sources range from general statements to detailed technical ones, so I won't walk through them one by one. But they show you the list of problems currently faced by ORMs.

http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

http://programmers.stackexchange.com/questions/120321/is-orm-an-anti-pattern

Conclusion

As a developer/architect who cares about clean code, consistency, and coding standards and conventions, I don't think an ORM suits me. There are simply too many corner cases that current ORMs can't handle, and I don't like (and don't have time) to document every unhandled case along with its solution or workaround. It would be a pain to teach a new developer your ORM case-handling standard. Moreover, fixing problems caused by the ORM can cost you more time than the ORM's benefits save.

However, if you are confident that you can handle the corner cases, or you know for certain that your application won't hit them, then an ORM is a great tool to use.