Architecture

Software Modification Cost

In the IT world, we often see significant price differences between vendor A and vendor B for developing the same software. Several factors distinguish the quality of the software these vendors produce, such as bug tracking, the number of bugs, and support warranty. The factor discussed here, however, is how easily existing software can be changed or modified.

Let us call that ease of change the software modification cost: the burden of modifying software.

How to calculate modification cost

When I get a modification task, I habitually estimate its difficulty and the time it will take. In my experience, QA / testing takes roughly 40-50 percent of development time, depending on the complexity and on the quality of the code / architecture. Yet QA time is often ignored and often not allocated at all.

Besides the time required, every change also raises the risk of errors, whether or not they are caught in testing. Some of those risks have a direct financial impact, while others merely annoy users. That is why I treat the modification cost calculation as:

development in hours + QA in hours + increase in error risk

How functions / classes affect modification cost

First, let's look at the following PHP snippet:

echo $isDone === true ? "yes" : "no";

The code above prints “yes” when the variable $isDone is true, and “no” otherwise. The snippet is simple enough that it often gets copy-pasted straight into views. Everything looks fine and dandy.

But one day a modification request arrives: the words yes and no must become ya and tidak (Indonesian). How hard is that change? Say the code lives in 15 different views. The modification cost then becomes 15 * n, where n is the minutes / seconds needed for one change. That cost does not yet include the chance that a view is missed, or that a bad edit produces a syntax error.

Wrap in function

To reduce that modification cost, the programmer now writes a function that produces the value. The function is defined as follows:

function boolToYesNo($value){
  return $value === true ? "yes" : "no";
}
echo boolToYesNo($isDone);

With a modification request like the one above, the modification cost of this function can be 15 times lower than for the copy-pasted code. QA time and the error rate are also lower than before.
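The comparison between 15 copy-pasted snippets and one function can be put into numbers with the formula development + QA + risk. The following is only a rough sketch, written in Python for brevity. Every figure in it is a hypothetical assumption, and modelling risk as "expected rework hours" is my own simplification, not part of the original formula. The QA ratio of 0.45 is taken from the 40-50 percent estimate mentioned earlier.

```python
def modification_cost(edit_sites, dev_hours_per_site, qa_ratio,
                      mistake_chance, rework_hours):
    """development hours + QA hours + expected rework caused by mistakes."""
    dev = edit_sites * dev_hours_per_site
    qa = dev * qa_ratio
    # Assumption: each edited site has the same chance of a mistake,
    # and each mistake costs the same number of hours to fix.
    risk = edit_sites * mistake_chance * rework_hours
    return dev + qa + risk


# Hypothetical numbers: 6 minutes per edit, 5% chance of a mistake per edit.
copy_pasted = modification_cost(15, 0.1, 0.45, 0.05, 1.0)  # 15 views to edit
wrapped = modification_cost(1, 0.1, 0.45, 0.05, 1.0)       # one function to edit

print(f"copy-pasted: {copy_pasted:.3f} h, wrapped: {wrapped:.3f} h")
```

Because every term grows linearly with the number of places that must be edited, the ratio between the two costs equals the number of copies.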

Then one day a request arrives to support multiple languages. If the function above is changed to accept a language code, it looks roughly like this:

function boolToYesNo($value, $languagecode = "en"){
  // Use the $value parameter, not the caller's $isDone variable.
  if($languagecode == "en"){
    return $value === true ? "yes" : "no";
  }
  else if($languagecode == "id"){
    return $value === true ? "ya" : "tidak";
  }
}
echo boolToYesNo($isDone, $languagecode);

The modification cost? Back to where we started: 15 * the minutes / seconds needed + increased risk, because each of the 15 call sites must now pass the language code. And what if the text must be printed starting with a capital letter (proper case)? The modification cost cannot be reduced. So the programmer writes a class like the following:

class BoolToYesNo{
  public function __construct($context = NULL){
    $this->context = (object)array_merge(["languagecode" => "en"], (array)$context);
    $this->languages = [
      "en" => [true => "yes", false => "no"],
      "id" => [true => "ya", false => "tidak"]
    ];
  }
  private $context;
  private $languages;
  public function print($value, $options = NULL){
    $options = (object)array_merge(["case" => "lower"], (array)$options); // defaults first, so the caller's option wins
    if($options->case == "upper"){ return strtoupper($this->languages[$this->context->languagecode][$value]); }
    else if($options->case == "lower"){ return $this->languages[$this->context->languagecode][$value]; }
  }
}

Wow, that is a lot of complicated code just to print yes and no. Counted as up-front development cost, it is very high. Its modification cost, however, is very low.

Conclusion

Reducing software modification cost is not easy, and there is usually a trade-off between initial development time and modification cost. Programming experience and requirements that are clearer up front help reduce software modification cost.

This topic is also related, more or less, to technical debt.


Why use a framework?

This is a question I hear fairly often from young programmers (I'm not old yet, mind you) who are about to start learning a framework or have just finished learning one. As a quick flashback: around 2005, frameworks were not yet popular in Indonesia. Back then, the notion that a framework adds unnecessary code / size dominated the decision to build from scratch (without a framework).

Because of limited processing power and hard disk capacity, I too leaned toward not using a framework at the time. What I came to feel afterwards, however, is:

if you are skilled enough and do not use an available framework, you will inevitably build your own framework

Why is that?

Folder structure and file placement

For large projects with complex requirements, the folder structure and the placement / grouping of files become important. A messy file layout makes the code harder for other programmers to understand, lowers quality, and raises the risk of bugs.

Frameworks generally come with guidelines for placing / grouping files into particular folders. That standardization helps other programmers understand the code faster and find the code that runs a given process. Examples are the config, constants, and language folders.

If you do not follow / use a commonly available framework, I am fairly sure you will end up building your own standards and grouping rules.

Architecture

A framework usually follows a particular architecture. PHP frameworks, for example, generally use MVC or HMVC, because it suits a client-server architecture best. To take another example, for C# on the desktop, Prism is an MVVM framework for WPF.

As for programmers who understand the benefits of architecture in programming, I am fairly sure that if they do not use an available framework, they will build a similar architecture (MVC-like, for instance).

Class Library

A complex project needs a great many varied operations. Quite a few of them, however, are similar, repetitive, and can be grouped into a library / class library. Examples: logging, serialization, export to Excel/CSV, image manipulation, and many more.

Frameworks usually have their own / standard way of organizing those libraries. Laravel, for example, uses Composer as its package manager and PSR-4 for class autoloading.

Common cases / operation

Similar to class libraries, some cases / operations are common and come up often in application development. In PHP, examples are routing, login authentication, response types, and view templating. Frameworks generally support these operations already.

Development standards, updates, and community

One advantage of using an available framework is that it already has a development standard (code placement). Frameworks are also usually updated to keep up with the latest technology, and they have a user community that indirectly improves the framework's quality and reduces the likelihood of bugs.

Conclusion

There are many advantages to using a framework. And even if a skilled developer does not use an available framework, sooner or later the application will drift toward, and eventually produce, a home-made framework.

Password Policy : Keep it simple

Your password needs to be 8-16 characters long and contain a number, lowercase and uppercase letters, and a special character, with no repeating characters.

In my opinion, the policy above is a good policy for keeping new users away. If your site's content is not very valuable, I can guarantee it will drive them off. Why? Won't the policy protect users from being hacked?

A complicated password is not always more secure

It’s false security if you store passwords in plain text

Don’t store user passwords unencrypted (plain text). Doing so puts at risk not only every account on your site, but also the same users' accounts on other sites.

The most common authentication on websites is username/password or email/password. People usually reuse the same password across several accounts. So someone who obtains a user's username, email and password can try that combination on other accounts, such as webmail, PayPal, forums, etc. Rather than imposing a complicated policy, store your passwords encrypted (in practice, as salted hashes that cannot be reversed)!
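As a minimal sketch of what "not plain text" can look like, the snippet below stores a salted, slow hash instead of the password itself, using PBKDF2 from Python's standard library. The iteration count, the salt size and the function names are assumptions made for illustration, not a recommendation tuned for any particular site.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the digest are stored. If the database leaks, the attacker still has to guess each password, and the per-user salt means the same password on two accounts does not produce the same stored value.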

Prevent brute-force and dictionary attacks!

No matter how simple a user's password is, an attacker still has to guess it to match it. In hacking, the common way to guess a password is brute force, or the more organized dictionary attack. If your site can prevent brute force, you have already improved account security. If your site can be brute-forced, your site is the one to blame.

There are many ways to prevent brute-force attacks. Some examples are CAPTCHA and cookie locking.
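One more countermeasure, different from the two named above, is to lock an account for a while after repeated failed logins. The sketch below is an in-memory illustration only; the class name, the limit of 5 failures and the 300-second lock are all assumptions, and a real site would keep this state in shared storage.

```python
import time

MAX_FAILURES = 5    # assumed limit before the account is locked
LOCK_SECONDS = 300  # assumed lock duration


class LoginThrottle:
    """Tracks failed logins per username and reports whether logins are locked."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behaviour can be tested
        self._failures = {}   # username -> (failure count, time of last failure)

    def is_locked(self, username):
        count, last = self._failures.get(username, (0, 0.0))
        return count >= MAX_FAILURES and self._clock() - last < LOCK_SECONDS

    def record_failure(self, username):
        count, last = self._failures.get(username, (0, 0.0))
        if self._clock() - last >= LOCK_SECONDS:
            count = 0  # the previous window has expired, start counting again
        self._failures[username] = (count + 1, self._clock())

    def record_success(self, username):
        self._failures.pop(username, None)
```

The login handler would call is_locked before checking the password, then record_failure or record_success depending on the result. With such a limit in place, even a short password cannot be guessed at thousands of attempts per second.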

A simpler, longer password can be more secure

XKCD explained this amazingly. We now know that brute force and dictionary attacks are common hacking methods. Extra length gives much better protection against brute force than 8 characters made up of mixed letter case, numbers and special characters. That is because the number of combinations increases dramatically with every character added.

Even more amazing, the XKCD method already assumes that the attacker knows the password generation algorithm, and it still has about the same strength as a 7-character password made of a completely random mix of letters, numbers and symbols. Without that assumption, the effort needed to crack the password is astronomically higher.
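The arithmetic behind these comparisons is short enough to check. The sketch below assumes passwords are chosen uniformly at random, that the XKCD-style passphrase uses four words from a 2048-word list, and that "letters, numbers and symbols" means the 94 printable ASCII characters. Human-chosen passwords are usually weaker than these figures suggest.

```python
import math


def entropy_bits(symbols: int, length: int) -> float:
    """Bits of entropy of `length` independent uniform picks from `symbols` options."""
    return length * math.log2(symbols)


four_words = entropy_bits(2048, 4)      # passphrase, generation method known
seven_random = entropy_bits(94, 7)      # 7 random printable ASCII characters
eight_random = entropy_bits(94, 8)      # 8 random printable ASCII characters
twelve_lowercase = entropy_bits(26, 12) # 12 random lowercase letters

for label, bits in [("4 words", four_words), ("7 random", seven_random),
                    ("8 random", eight_random), ("12 lowercase", twelve_lowercase)]:
    print(f"{label}: {bits:.1f} bits")
```

Under these assumptions the four-word passphrase (about 44 bits) lands close to the 7-character random password (about 46 bits), and twelve lowercase letters (about 56 bits) beat eight fully mixed characters (about 52 bits). Each extra character adds log2 of the alphabet size, which is why length pays off so quickly.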

Please note that I'm not saying that adding lower/upper case, numbers and special characters is useless in password generation. My point is that those complicated rules may be replaced by something simpler without reducing the strength.

The importance of an account is determined by the user, unless…

How important an account is gets decided by the user. Exceptions apply if your site is involved in financial activity, such as PayPal, or if someone could easily impersonate another person, as on Facebook and Twitter. Other than that, the importance of a user account is determined by the user, and a complicated password policy is not needed.

So what policy is a good policy?

Personally, I think several policy rules are useful without bothering the user. A minimum length of 6 characters is so common in account authentication that it does no harm nowadays. That rule alone is not sufficient, since users can still pick common, easily guessed passwords like 123456, abcdef, qwerty, password, or yyyymmdd. The next useful rule is to stop users from choosing such easily guessed passwords. It won't bother the user, and at the same time it improves security.
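The two rules above fit in a few lines. This is a minimal sketch: the denylist holds only the examples from the text, whereas a real site would load a much larger list, and the function names are hypothetical.

```python
import re
from datetime import datetime

MIN_LENGTH = 6
# Only the examples named in the text; a real denylist would be far longer.
COMMON_PASSWORDS = {"123456", "abcdef", "qwerty", "password"}


def looks_like_date(password: str) -> bool:
    """True when the password is a valid calendar date written as yyyymmdd."""
    if not re.fullmatch(r"\d{8}", password):
        return False
    try:
        datetime.strptime(password, "%Y%m%d")
    except ValueError:
        return False
    return True


def policy_violations(password: str) -> list:
    """Return human-readable reasons for rejection; an empty list means accepted."""
    violations = []
    if len(password) < MIN_LENGTH:
        violations.append("too short")
    if password.lower() in COMMON_PASSWORDS:
        violations.append("too common")
    if looks_like_date(password):
        violations.append("looks like a date")
    return violations
```

Note that there is no maximum length and no character-class requirement, so a long passphrase passes without friction.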

Conclusion

Don’t use an overly complicated password policy. Brute-force countermeasures can prevent simple passwords from being cracked. Complicated passwords are useless if the site stores them in plain text and gets hacked. A simpler, longer password can provide the same or better security. Not every user will care if their account on some site gets hacked, and a few uncomplicated rules can improve security more than a complicated policy.

Enterprise system vs specific-purpose application

I have always wondered why I get more excited about developing one kind of application than another. Thinking about it now, I can divide applications into enterprise systems and specific-purpose applications. Let's see how I can describe the two.

Please note that even though I try to distinguish the two, that does not rule out an application serving as a specific-purpose application and an enterprise system at the same time. The characteristics can also cross over, meaning that an enterprise system can have a characteristic of a specific-purpose application.

Specific-purpose application

What I mean by a specific-purpose application is an application whose functions and features serve only one purpose. Calculator, Microsoft Excel, Word, Adobe Photoshop and Notepad++ fall into this category. They serve only one purpose, and their features revolve around that purpose. For example, MS Word's features focus on document creation and design, while MS Excel's features focus on tabular operations.

Enterprise system

Wikipedia says that:

Enterprise systems (ES) are large-scale application software packages that support business processes, information flows, reporting, and data analytics in complex organizations. While ES are generally packaged enterprise application software (PEAS) systems they can also be bespoke, custom developed systems created to support a specific organization’s needs.

From my point of view, it is a collection of integrated applications that primarily perform data operations. The main characteristics of an enterprise system are that its applications are integrated and that they share a single storage, or a fixed set of storages. The storage is fixed so that the other integrated applications can access the same data. The features of an enterprise system focus on data processing.

The difference in architecture

In a specific-purpose application, operations are defined in a smaller scope. Most of the time the operations do not conflict with or depend on each other. Usually the application is not customizable, meaning every user can expect the same behavior from any other installation. An operation may be customizable to some degree, but the customization is limited and predictable.

For example, the save operation in MS Word should behave the same in any other installation of MS Word, and even in similar applications such as MS Excel and Adobe Photoshop. The save operation is also not directly related to other operations such as copy/paste, delete, or change font. Developer editors such as Vim or Emacs, on the other hand, can be configured differently, for example in tab size and word wrap. Customizable, but predictable and limited.

Operations in an enterprise system are more dynamic and customizable. The same operation in the same system installed on another server can behave differently. Moreover, an operation in one part may affect other data because of the integration. Example: updating a purchase order in company A produces a material request, while updating one in company B does not.
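The purchase order example can be sketched as one operation whose side effects depend on per-installation configuration. Everything here is hypothetical: the hook table, the company keys and the function names only illustrate the idea and do not describe any particular product.

```python
def create_material_request(order):
    """Follow-up step that touches another part of the integrated system."""
    return f"material request for PO {order['id']}"


# Hypothetical per-installation configuration: follow-up steps after an update.
COMPANY_HOOKS = {
    "company_a": [create_material_request],
    "company_b": [],
}


def update_purchase_order(company, order, changes):
    """Apply the changes, then run whatever follow-up steps this company configured."""
    order.update(changes)
    return [hook(order) for hook in COMPANY_HOOKS[company]]


print(update_purchase_order("company_a", {"id": 7, "qty": 1}, {"qty": 5}))
print(update_purchase_order("company_b", {"id": 8, "qty": 1}, {"qty": 5}))
```

The same update_purchase_order call produces a material request in one installation and nothing in the other, which is exactly what makes the behavior of an enterprise system harder to predict than that of a save button.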

Conclusion

Specific-purpose applications and enterprise systems are different. They have different focuses and purposes.

Framework development is hard

Note: I intended this article only as a draft or brainstorming piece, but the ideas may be useful to some readers. It lacks supporting sources and theory, and its topics may be unordered. I'll try to publish another article later to rearrange the writing.

Have you ever tried to develop a framework, or even something as simple as a code library? Is it easy? Is it hard? Well, the answer is that it depends on the complexity and size of the framework / library you develop. I think it is fairly easy at the start. It gets harder once the framework is already large, especially when modifying existing features.

I can't imagine how hard it is for the developers of .NET / C# to develop a new version.

Backward Compatibility

It's hard. Jon Skeet also says that backward compatibility is hard. I have been developing a medium-sized server-side framework for ASP.NET MVC for around one year. In that time I have made around three breaking changes. Without a good architect / planner, the library will likely have a bad structure and need a lot of refactoring.

I don't think ASP.NET would have the utilities and flexibility it has now if it still supported backward compatibility with classic ASP. What would happen if a new C# version did not consider backward compatibility? What powerful features could a breaking change introduce? Here I will say that “without backward compatibility, the new version of a framework should be much more powerful”.

Defining names, namespace categorization

Naming is not easy. Often I don't know how to name a specific class / operation. In the worst case, the class name becomes very long in order to express the operation.

Categorizing the operation is even harder. Should it go under the Excel namespace? The App namespace? The Web namespace? JavaScript? Most of my refactoring happens because of an incorrect namespace category / project placement.

Flexibility / parameter definitions

Developing a framework / library is not easy. Developing a flexible library is even harder. Most breaking changes come from changed parameter definitions. Many can be handled with default parameters, while others require adding or modifying parameters, which introduces a breaking change.

Another problem when developing a library is the number of parameters an operation (function / method) requires. Often we find that an operation needs many parameters. That structure is not recommended, because it makes the operation harder to use. Encapsulating the parameters in one class may help, but it introduces another model. Moreover, needing many parameters is a sign of a “god object”, which is hard to maintain.
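Both ideas, default values that avoid a breaking change and a class that encapsulates the parameters, can be sketched together. The example is in Python rather than C# purely for brevity, and the names ExportOptions and export_rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportOptions:
    """Groups what would otherwise be a long parameter list."""
    delimiter: str = ","
    include_header: bool = True
    # Imagine this field was added in a later version: because it has a
    # default, callers written against the older version keep working.
    line_ending: str = "\n"


def export_rows(rows, header, options=ExportOptions()):
    """Join rows into delimited text according to the given options."""
    lines = []
    if options.include_header:
        lines.append(options.delimiter.join(header))
    for row in rows:
        lines.append(options.delimiter.join(str(value) for value in row))
    return options.line_ending.join(lines)


print(export_rows([[1, 2]], ["a", "b"]))
print(export_rows([[1, 2]], ["a", "b"], ExportOptions(delimiter=";", include_header=False)))
```

The cost mentioned above is visible too: ExportOptions is one more model that users of the library have to learn. The options object is frozen (immutable), so sharing the default instance between calls is safe.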

Parameter initialization / life cycle

Parameter initialization is another pain. Many times, when I want to use object A in class X, it has not been fully initialized yet. Or worse, object A can only be used in class X after operation Z has run.

The setup + parameter structure

Despite its limitations, I like the structure of some .NET libraries. XmlSerializer is one of them. The process for using XmlSerializer is: instantiate the object, define the setup, perform the operation (serialize). Inside the class there is a “setup” object that you can modify to change the operation's behavior, instead of passing all the setup objects yourself. Moreover, it has a “default” setup, so you need less effort than constructing the setup from scratch.

It has a limitation, though: mutability. Mutability in a service object is bad. You cannot safely reuse the same object over and over; one object serves one operation, unless the next operation needs the same setup. Don't use the setup + parameter structure with long-lived objects (such as static objects).
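The instantiate, set up, operate sequence and its mutability hazard can be shown with a small stand-in. This is not XmlSerializer; CsvWriter and its setup keys are hypothetical and exist only to illustrate the structure.

```python
class CsvWriter:
    """Service object with a mutable setup that starts from sensible defaults."""

    def __init__(self):
        # Default setup: callers change only what they need.
        self.setup = {"delimiter": ",", "quote_all": False}

    def write(self, row):
        cells = [f'"{cell}"' if self.setup["quote_all"] else str(cell) for cell in row]
        return self.setup["delimiter"].join(cells)


# Intended use: one object per operation.
writer = CsvWriter()
writer.setup["delimiter"] = ";"
print(writer.write([1, 2]))

# Hazard: a long-lived shared instance keeps the previous caller's setup.
shared = writer
print(shared.write(["a", "b"]))  # still uses ";" although this caller never asked for it
```

The second caller gets semicolons without having asked for them, which is the reason for keeping such objects short-lived.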

Separate context class from service class

I am a follower of the Anemic Domain Model (ADM) pattern, despite its being classified as an anti-pattern. I think ADM with POCO classes (DTO-style) really supports the Single Responsibility Principle (SRP), unlike the Rich Domain Model (RDM), with which we can easily fall into the god object anti-pattern. I'm not saying RDM tends to violate SRP; however, I find that in most cases it is easier for RDM to violate SRP than for ADM to cause the problems that earned it the anti-pattern label.

That's why I usually like to separate the context class (the data model) from the service class (the operation). The context is an object that holds all the data required during one “operation life cycle”. Example: AppContext holds all the data required while the application is running. A setup class is a context; it holds all the data required during the class's operation. In ASP.NET, RequestContext holds all the data required during the web request.

Meanwhile, a service class holds almost no data and is only responsible for processing the request passed in as parameters (the setup class being the exception). StringBuilder is one of them. I also like to separate the entity (domain model) class from its operations. That way, I can easily develop more classes to handle different business rules and cases.

The separation is intended to scope and encapsulate the operation / context. That scoping makes the class easier to modify: a modification in one service class won't break operations in other service classes.

Don’t refer to static context directly

I admit, I also like to do something like this:

public void Do(){
    string userId = App.Context.Current.User.UserId;
    if(userId == "....") { /*the operation*/ }
}

It is a bad design in general, because we cannot use the service without the context. However, passing the context as a parameter is out of the question, since it would increase the operation's complexity. Instead, you can expose the context as a property of the class and lazily load the default context.

private App.Context context;
public App.Context Context{
    set { this.context = value; }
    get { return (this.context ?? (this.context = App.Context.Current)); } // ?? falls back to the default only when no context was set
}

public void Do(){
    string userId = this.Context.User.UserId;
    if(userId == "....") { /*the operation*/ }
}

Now you can mock the context easily and provide it to the class before processing, which makes the class testable.

Conclusion

Framework development is hard.

Why I avoid ORM in my enterprise architecture

Yesterday I had a nice, interesting chat with someone I had just met. During that discussion about software development and technology, he asked whether I use Entity Framework in my current architecture. I said clearly that I'm not using any kind of ORM, nor developing one myself. Before I continue, I want to clarify that I don't hate ORMs. With my current skill and experience, I'm just not ready for the edge cases (or, to be precise, corner cases) that can occur when using one.

When asked that question, I had a hard time giving reasons. One reason is that it is harder to find a developer with expert knowledge of a particular ORM than one with expert knowledge of stored procedure-based command execution. The other is that ORMs are a leaky abstraction. I'm not saying that the tools I have used until now have no leaky abstractions, but I don't think working around an ORM's corner cases is worth the effort compared to mature, traditional stored procedures and queries.

Not all LINQ expressions can be translated into SQL

This statement is primarily based on an article by Mark Seemann: IQueryable is a leaky abstraction. In that article, he states that IQueryable exposes some unsupported expressions, and that the unsupported expressions themselves violate the Liskov Substitution Principle (LSP). Moreover, a quick Google search shows several Stack Overflow questions about NotSupportedException. One of them shows an inconsistency in LINQ to SQL, where using string.Contains makes the ORM throw an exception while the same expression written with a join executes. Another post says the same about EF.

So, as we can see, the IQueryable interface used by ORMs gives us a false guarantee, because the code compiles but throws a runtime exception later. The same can happen with SqlCommand. There, however, the error is clear; it is one of: 1) a different SQL data type provided to SqlParameter, 2) an incorrect parameter provided, 3) wrong SQL syntax. The opposite holds for in-memory IEnumerable lambda/LINQ, where the expressions are fully supported.

Now we have an additional step to measure SQL performance

An ORM translates query expressions into SQL queries. If you have not mastered the specific ORM, you won't know what kind of SQL query it produces. Moreover, different ORMs produce different queries. So if we care about SQL performance, we have two steps to take. The first is to determine exactly which SQL query the ORM generates; the second is the real measurement with indexes, plans, etc.

A common case is the N+1 selects issue, which different ORMs handle differently and which hurts performance.
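The N+1 pattern is easy to reproduce without any ORM. The sketch below uses Python's built-in sqlite3 module with a hypothetical orders / order_lines schema and a small wrapper that counts statements. It only mimics what lazy loading in an ORM does behind the scenes.

```python
import sqlite3


class CountingDb:
    """In-memory database wrapper that counts the SELECT statements it runs."""

    def __init__(self):
        self.count = 0
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
            CREATE TABLE order_lines (order_id INTEGER, item TEXT);
            INSERT INTO orders VALUES (1, 'A'), (2, 'B'), (3, 'C');
            INSERT INTO order_lines VALUES (1, 'pen'), (1, 'ink'), (2, 'paper'), (3, 'clip');
        """)

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def load_lazily(db):
    """One query for the orders, then one more per order: N+1 round trips."""
    result = {}
    for (order_id,) in db.run("SELECT id FROM orders ORDER BY id"):
        rows = db.run("SELECT item FROM order_lines WHERE order_id = ? ORDER BY rowid",
                      (order_id,))
        result[order_id] = [item for (item,) in rows]
    return result


def load_with_join(db):
    """The same data fetched in a single round trip."""
    result = {}
    rows = db.run("SELECT o.id, l.item FROM orders o "
                  "JOIN order_lines l ON l.order_id = o.id ORDER BY o.id, l.rowid")
    for order_id, item in rows:
        result.setdefault(order_id, []).append(item)
    return result
```

With three orders, load_lazily issues four statements and load_with_join issues one, and the gap grows with every additional order. With hand-written SQL the choice is visible in the code; with an ORM it depends on how that particular ORM was configured.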

Data annotations break POCO and SRP

This is specific to EF with data annotations. If you prefer the fluent API, good for you.

Data annotations break POCO, enough said. According to Wikipedia, a POCO, or Plain Old CLR Object, is a simple object with no dependency on any framework / plugin / tool. Even the Serializable and XmlIgnore attributes break POCO, and I have already stated that cluttering a class to make it XML serializable is usually bad.

They also break SRP. Your POCO class now has another responsibility coupled to it: handling how the ORM maps the table to the object. It isn't wrong, but it's not clean. The worst part is that you need to modify your POCO class to meet mapping requirements.

The original underlying structure of database is relational

The original post by Martin Fowler states that the root of the problem is the difference in underlying structure between the application (OOP) and the database (RDBMS). They handle relations differently and process data differently. Jeff Atwood also said that the ORM problem goes away if you remove either the “object” aspect or the “relational” aspect, by following the table structure in the application or by using an ODBMS. An ODBMS is great, but it also has drawbacks compared to an RDBMS.

When, and how many times, is the query executed?

I don't know how good you are with the IQueryable abstraction. Even with in-memory IEnumerable, I have often been caught by bugs where the iterator is executed multiple times, resulting in inconsistent data and increased cost. With an IEnumerable / IQueryable ORM, I don't know exactly when and how many times the SQL query is executed, and that can impact performance considerably.
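The multiple-execution problem can be imitated in a few lines. LazyQuery below is a hypothetical stand-in for a deferred query: like a lazy IEnumerable, it re-runs its source every time it is enumerated. It is a Python illustration of the behavior, not a model of how any specific ORM works.

```python
import itertools


class LazyQuery:
    """Re-runs its source function on every iteration, like a deferred query."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.executions = 0

    def __iter__(self):
        self.executions += 1
        return iter(self._fetch())


ids = itertools.count(1)


def fetch_rows():
    # Stand-in for a database round trip whose result changes between calls.
    return [next(ids), next(ids)]


query = LazyQuery(fetch_rows)
total = sum(query)    # first execution
largest = max(query)  # second execution, over different data
print(query.executions, total, largest)

rows = list(query)    # materialize once, then reuse the list
print(sum(rows), max(rows), query.executions)
```

The first two aggregates come from two separate executions and therefore from different data, which is the inconsistency described above. Materializing the result into a list fixes it, but only if you know that the object in your hands is lazy.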

Additional sources that document the issue

These sources range from general statements to detailed technical write-ups, so I won't go through them one by one. But they list the problems ORMs currently face.

http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

http://programmers.stackexchange.com/questions/120321/is-orm-an-anti-pattern

Conclusion

As a developer / architect who cares about clean code, consistency, coding standards and conventions, I don't think an ORM suits me. There are simply too many corner cases that current ORMs can't handle, and I don't like (and don't have time) to document every case that cannot be handled along with its solution or workaround. It would be a pain to teach new developers your ORM case-handling standard. Moreover, fixing the problems caused by the ORM can cost you more time than its benefits save.

However, if you are confident that you can handle the corner cases, or if you know for certain that the application won't face them, then an ORM is a great tool to use.