React Hooks, can it be better?

I saw the React Hooks proposal today, and while it solves some real problems, I am concerned with how it relies on universally imported functions (useState) rather than dependency-injecting them, which would be clearer. In this article, I exercise my library-design skills by trying to find another approach to those same problems.

First, let’s look at the point from the proposal’s introduction that “complex components become hard to understand”.

It pools subscribed effects and executes them afterwards

From the example in the following gist, I get the impression that the idea behind useEffect is simply to subscribe effects, pool them, and execute them in order once rendering is done. I know the implementation in the gist does not follow best practices, but the fact that it fully works concerns me.
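
The gist itself is not reproduced here, but to make the idea concrete, here is a naive sketch of that “subscribe, pool, execute after render” flow. It is only my own simplification (useEffectNaive and renderComponent are made-up names), not the gist’s code and not React’s actual implementation:

// a naive take on "subscribe, pool, then execute in order after render"
const pendingEffects = [];

const useEffectNaive = (effect) => {
    // during render, effects are only collected into the pool
    pendingEffects.push(effect);
};

const renderComponent = (component, props) => {
    const output = component(props);   // render phase: effects get pooled
    while (pendingEffects.length > 0) {
        pendingEffects.shift()();      // flush the pool in order, post-render
    }
    return output;
};

// usage
const Hello = (props) => {
    useEffectNaive(() => console.log(`mounted with name=${props.name}`));
    return `<p>Hello ${props.name}</p>`;
};

console.log(renderComponent(Hello, { name: "world" }));
// logs "mounted with name=world" first, then the markup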

Personally, I don’t yet know how that kind of workaround could be exploited, for good or for bad. But one thing is certain: the same trick is not possible when the component is declared as a class, or when the functionality is added through a HOC, so no such exploit can exist there.

useState in custom hooks blurs the scope

useState is imported from React, which is effectively a global-level module. I’m using the following gist for this example.

I’d like to know: at which scope does useState here reside? At first, based on the design, I assumed it was a singleton, managed globally. But looking at the implementation, its state is scoped per useState call.

If you run the gist’s code, the log appears 20 times: once per useState call, multiplied by each Elm rendered, then doubled because one pass happens on didMount and another on didUpdate (not strictly accurate, but close enough).
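
To picture that “scoped per useState call” behavior, here is a rough sketch of how a globally imported function could still keep per-call state: a module-level pointer to whichever instance is currently rendering, plus an array of slots indexed by call order. Again, this is only my illustration (useStateNaive and createInstance are made-up names), not React’s real bookkeeping:

// module-level pointer to whichever component instance is rendering right now
let currentInstance = null;

const useStateNaive = (initialValue) => {
    const instance = currentInstance;
    const index = instance.cursor++;              // slot chosen by call order
    if (!(index in instance.slots)) {
        instance.slots[index] = initialValue;     // only on the first render
    }
    const setValue = (next) => {
        instance.slots[index] = next;
        instance.render();                        // re-render on change
    };
    return [instance.slots[index], setValue];
};

const createInstance = (component) => {
    const instance = { slots: [], cursor: 0 };
    instance.render = () => {
        currentInstance = instance;
        instance.cursor = 0;                      // same call order -> same slots
        const output = component();
        currentInstance = null;
        return output;
    };
    return instance;
};

// usage: two useState calls in one component get two independent slots
const Counter = () => {
    const [count, setCount] = useStateNaive(0);
    const [label] = useStateNaive("clicks");
    console.log(`${label}: ${count}`);
    return setCount;
};

const counter = createInstance(Counter);
const setCount = counter.render();                // logs "clicks: 0"
setCount(1);                                      // re-renders, logs "clicks: 1"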

I know the scoping is documented. But as I stated before, the fact that a class declaration does not have this scoping issue makes the design somewhat confusing.

Can it be better?

“Better” is always relative to something. I prefer a design with more explicit declarations and a clearer flow. The trade-off is that it becomes more verbose and increases code size, which is worse in that respect.

Reusable Logic

First, the reusable logic for components envisioned by the React team more or less consists of these aspects:

  • can maintain its own state
  • can have parameters, which are independent of the component’s props
  • its state can change, and the attached component is able to watch it
  • its state can be changed asynchronously
  • can run some logic in the side-effect phase, e.g. on didMount

Functional Component

Then the functional component has these aspects:

  • can maintain its own state
  • can use / attach many pieces of reusable logic and watch their state
  • can run some logic in the side-effect phase, e.g. on didMount

The design

The gist below is an example of how I envision hooks being declared. For simplicity, I use the obsolete createClass method to indicate component composition. A rough, runnable sketch of the same shape follows after the breakdown below.

At useFriendStatus:

  • it accepts a self-managing state as a parameter
  • it then returns a function that accepts the parameters specified by the developer
  • the body of that function is where the logic resides
  • finally, it returns an object containing the state that can be listened to and the side effect (if any)
  • the effect can return another function, which will be executed on willUnmount

Then FriendStatus, the component, takes 2 parameters. The 2nd one is the component specification:

  • it specifies the initialState to be passed to the component
  • it also specifies which hooks to attach; it is injected with props (for the initial state) and createState, which can be used to create a self-managing state for each piece of reusable logic
  • the hooks specification returns an array of reusable logic, each already injected with its own self-managing state

The 1st parameter is the render part:

  • as usual, the normal props are injected as the 1st argument
  • the 2nd argument is the self-managing state, initialized with the initialState value from the specification
  • the 3rd argument is the list of hook states that can be watched; its order matches the order in which the hooks are defined in the specification
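
To make the breakdown above concrete, here is a toy, framework-free sketch of the shape I have in mind. None of it is real React: createComponent stands in for the createClass-style builder, createState and useFriendStatus are hypothetical names taken from the description above, and the “lifecycle” is simulated with plain function calls just to show the wiring.

// self-managing state: a reusable logic (or the component) owns one of these
const createState = (initialValue) => {
    let value = initialValue;
    const listeners = [];
    return {
        get: () => value,
        set: (next) => { value = next; listeners.forEach((l) => l(value)); },
        watch: (listener) => listeners.push(listener),
    };
};

// reusable logic: injected with its own state, then called with parameters;
// it returns { state, effect }, and the effect may return a willUnmount cleanup
const useFriendStatus = (state) => (friendId) => ({
    state,
    effect: () => {
        console.log(`subscribing to friend ${friendId}`);
        const timer = setTimeout(() => state.set(true), 100); // fake async update
        return () => {
            clearTimeout(timer);
            console.log(`unsubscribed from friend ${friendId}`);
        };
    },
});

// toy component builder: render first, specification second
const createComponent = (render, spec) => (props) => {
    const ownState = createState(spec.initialState);
    const hooks = spec.hooks(props, createState);  // executed once, explicit scope
    const rerender = () => render(props, ownState, hooks);
    ownState.watch(rerender);
    hooks.forEach(({ state }) => state.watch(rerender));
    rerender();                                                        // initial render
    const cleanups = hooks.map(({ effect }) => effect && effect());    // didMount
    return () => cleanups.forEach((cleanup) => cleanup && cleanup());  // willUnmount
};

// usage
const FriendStatus = createComponent(
    (props, ownState, [friendStatus]) =>
        console.log(`friend ${props.friendId} is ${friendStatus.state.get() ? "online" : "offline"}`),
    {
        initialState: {},
        hooks: (props, createState) => [
            useFriendStatus(createState(false))(props.friendId),
        ],
    }
);

const unmount = FriendStatus({ friendId: 1 }); // renders "offline" now, "online" after the fake update
setTimeout(unmount, 200);

Since useFriendStatus only depends on the state it is injected with, it can also be exercised in isolation with a hand-rolled fake state, which is part of why I find this shape easier to test.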

What is the difference?

To me, my version explicitly shows where each piece of state comes from. Its living scope is also clear, because it is defined in the hooks specification. And it doesn’t use any universal-level functions, except for the builder-pattern-like createClass.

Because the hooks specification is executed once and declared in one place, the declaration is more explicit. The reusable logic being a pure function also makes it easier to test, to mock, or to reuse with other libraries. Of course, side-effect and conditional hacks can still be applied, but that is another topic for later.

However, it is clearly worse in terms of code size, since it adds more detail to the code. It also reduces the number of manageable states to one per piece of reusable logic.

Conclusion

Tweaking an RFC concept and trying to find flaws and a better approach is a fun experience. It also serves as an exercise in designing a library API.

Mutability: Array

Mutability in programming is the ability of an object to have its state / value changed. An immutable object, on the other hand, cannot have its state changed. In PHP, JavaScript, C#, and Java, most commonly used variables / objects are mutable. Arrays are one of them.

Array mutability

Let’s see the following snippet of code:

let exec1 = function() {
    console.log("START: EXEC 1");
    let a = [3, 5, 7];
    let wrongOperation = function(arr){
        arr.push(8);
        arr[1] = 4;

        return arr;
    };

    let b = wrongOperation(a);
    console.log("b:");
    b[2] = 6; // mutation
    console.log(b); // results [ 3, 4, 6, 8]

    console.log("a:");
    console.log(a); // results [ 3, 4, 6, 8]
    console.log("DONE: EXEC 1");
    console.log();
};

We can see that any modification to b also changes the value of a, which sometimes is expected, and other times becomes a bug. If the result b is returned and modified by yet another caller, you may someday wonder why the content of a changed, and it will be hard to track down where that change happened.

A better design

A better design is to prevent any changes made to b from being reflected back to its original owner, a. This can be achieved by copying the array argument into an empty array, using concat in JavaScript or array_merge in PHP. See the example in the following snippet:


let exec2 = function() {
    console.log("START: EXEC 2");
    let a = [3, 5, 7];
    let correctOperation = function(arr){
        let result = [].concat(arr);
        result.push(8);
        result[1] = 4;

        return result;
    };

    let b = correctOperation(a);
    console.log("b:");
    console.log(b); // results [ 3, 4, 7, 8]

    console.log("a:");
    console.log(a); // results [ 3, 5, 7]
    console.log("DONE: EXEC 2");
    console.log();
};

The above example copies the argument array first, by concatenating it onto an empty array, before doing any operation. As a result, any further modification after that function does not reflect back to the original variable a.

Another example looks like this:


let exec3 = function() {
    console.log("START: EXEC 3");
    let a = [3, 5, 7];
    let anotherCorrectOperation = function(arr){
        let newData = [];
        for(let i = 2; i < 4; i++){
            newData.push(i);
        }
        return arr.concat(newData);
    };

    let b = anotherCorrectOperation(a);
    console.log("b:");
    b[1] = 4; // test mutability
    console.log(b); // results [ 3, 4, 7, 2, 3]

    console.log("a:");
    console.log(a); // results [ 3, 5, 7]
    console.log("DONE: EXEC 3");
    console.log();
};

The above example does its operations first, then returns the result concatenated with the existing array. This is preferable to the earlier approach of pushing directly into the argument array.

It’s still just the array that is copied

However, both preferred examples above copy, and un-reference, only the array itself. The contents are still the same objects and can still be modified. For example, if the array contains JavaScript objects, any modification to an array member will be reflected back to the original variable a:

let exec4 = function() {
    console.log("START: EXEC 4");
    let a = [{v: 1}, {v: 3}];
    let anotherCorrectOperation = function(arr){
        let newData = [];
        for(let i = 2; i < 4; i++){
            newData.push({v: i});
        }
        return arr.concat(newData);
    };

    let b = anotherCorrectOperation(a);
    console.log("b:");
    b[1].v = 4; // test mutability
    console.log(b); // results [ { v: 1 }, { v: 4 }, { v: 2 }, { v: 3 } ]

    console.log("a:");
    console.log(a); // results [ { v: 1 }, { v: 4 } ]
    console.log("DONE: EXEC 4");
    console.log();
};

So you still need to be careful when modifying values inside functions. At least, though, you no longer need to worry about modifications to the array itself.
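
If the members also need protection, one option is to copy each element as well. The following exec5 is my own addition (it is not in the article’s repository), sketching the idea with Object.assign; deeply nested structures would need a deeper copy:

let exec5 = function() {
    console.log("START: EXEC 5");
    let a = [{v: 1}, {v: 3}];
    let copyMembersOperation = function(arr){
        // copy the array AND shallow-copy each element
        return arr.map(item => Object.assign({}, item));
    };

    let b = copyMembersOperation(a);
    console.log("b:");
    b[1].v = 4; // test mutability
    console.log(b); // results [ { v: 1 }, { v: 4 } ]

    console.log("a:");
    console.log(a); // results [ { v: 1 }, { v: 3 } ], unchanged this time
    console.log("DONE: EXEC 5");
    console.log();
};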

Conclusion

Design your operations to be immutable and to return a copy by default, unless the other behavior is specifically desired. This helps make code easier to track and more modular, and prevents unnecessary bugs in the future. All of the code shown in this article can be retrieved from my github repository.

Why you should give buffer to project estimations

“Why does this enhancement need 1 month? Can’t it be done in just 3 weeks?”

This is something project managers often say to developers. They like to squeeze estimations as short as possible, then wonder in the end why the project is running late. Overtime happens, code quality drops, and bugs appear everywhere from development until launch. To me, as a developer, this is one mark of an “incapable” project manager. Clients often share the same vision: make the estimation as short as possible. Little do they know that a buffered estimation brings many more benefits than a shorter one.

Less overtime

Overtime is unproductive. This has been researched many times, and you can easily look it up on the internet or in forums. It is obvious that more buffer in the estimation leads to less overtime, which in turn prevents a long-term drop in productivity.

More often than not, I find that unjustified overtime doesn’t bring better results over a week / month time span compared to working normally. That’s because work done in overtime is usually either wrong, of poor quality, or off-requirement, and ends up being fixed the following week / month anyway.

On the other hand, justified overtime is a necessity, like when there is a critical bug in production or a bug found at the last moment before launch.

No spare time for changes

Clients are very bad at giving requirements. They don’t know what solution they want; they may not even know what problem they are facing. In my experience, there is a 100% chance that at least one small requirement change, and at least a 50% chance that one medium / big requirement change, will happen during the project timeline.

To this day, prototyping is very useful, especially in the programming world. Prototypes are usually made to give the client a clear picture of how the program will work and how it will help solve their problem. Requirement changes are very likely to happen at this point.

A tight estimation is a disaster at this stage. With a tight estimation, changes can’t be integrated into the timeline, since everything has already been accounted for. This can only lead to bad things: a missed deadline, requirements that can’t be applied, or poor quality.

No room for mistake

Mistakes happen; there is not a single piece of software in this world that is bug-free. Even the business flow, a requirement given by the client, may be incorrect. Setting a tight estimation without accounting for mistakes and fixes is planning to fail. More buffer in your estimation means the application can be tested more, and more problems can be fixed in the meantime.

You will never deliver “earlier”

There are two possible situations when delivering: delivering “early” or delivering “late”. Delivering exactly “on time” is almost impossible; usually the project is already done and ready to be delivered before the specified time, so it counts as “early”.

Now, which one is better: a “longer estimation, delivered ‘early’” or a “shorter estimation, delivered ‘late’”? It comes down to personal preference, but I usually find that delivering “early” leaves a better impression than delivering “late”. An “early” delivery is usually perceived as faster than the opposite; it’s a psychological thing.

With a tight estimation, you are setting yourself up to deliver “late” and never “early”.

It is indeed hard on the “client” side

Clients are hard to deal with. More often than not, they aren’t satisfied with the estimation given, no matter how tight it is. Even though they want a tighter estimation, they cannot give clear requirements. “Just make it so that we input this, that and that, and it’s saved! How hard is it?” is a typical remark, made without considering what must happen if a mistaken input needs to be edited, what happens when things change, which systems are affected by the change, and so on.

That’s why a good client brings good software as well. If you want to teach a client why a tight estimation is bad, give them a tight estimation. Then, every time there is a change and every time a requirement discussion happens, account for all of it. After delivery, hold an evaluation session with the client and show them how all of those things delayed the estimate and made the project late.

The delivery time will be the same after all

Many times I find that a tight estimation ends up late, and the actual delivery time is usually the same as if I had added a buffer to the estimation in the first place. Inexperienced developers and managers prefer to give tighter estimations, underestimating how much time changes and fixes will take. The question is, how much buffer is needed?

In my previous job, I liked to add a 30-50% buffer to my estimations. My PM would then bargain it down by 20-30%, and give some buffer back for the QA and fixing phase. In the end I assume it comes to around a 25-40% buffer. With that, I usually delivered 10-20% early, which suggests that 20-40% is the sweet spot, depending on how complex and big the project is. This is just my preference and personal experience; do not take it as guidance, since everyone estimates differently.

So if the outcome is the same after all, why not give a longer estimation and allow more flexibility in requirements and development? It will provide a better foundation and better code quality in the end.

Summary

Give your estimation a buffer. Its benefits far outweigh the false perception that a “shorter estimation is faster”. You won’t have flexibility or good quality if the estimation is tight.

PHP Nested Ternary Operator Order

Now let’s say we have the following pseudo-code that we want to implement in several programming languages:

bool condition = true;
string text = condition == true ? "A" : condition == false ? "B" : "C";
print text;

Here is the implementation in JavaScript:

let st = true;
let text = st === true ? "A": st === false ? "B" : "C";
document.write(text);

Here is the implementation in PHP:

$st = true; 
$text = $st === true ? "A": $st === false ? "B" : "C"; 
echo $text;

As a bonus, I also tried the same pseudocode in C#:

bool st = true;
string text = (st == true) ? "A" : (st == false) ? "B" : "C";
Console.WriteLine(text);

Now, what is the value of the variable text? The expected value is A, which is what the other languages produce, but PHP produces B. The reason is that PHP’s ternary operator is left-associative, so the unparenthesized chain is grouped as ($st === true ? "A" : $st === false) ? "B" : "C"; the inner expression evaluates to "A", which is truthy, so the outer ternary picks "B". This is not a great discovery, but many PHP developers may miss it, so I think it’s worth archiving. Parentheses would fix it, but they make things ugly, so it looks like it’s better to stick with an if-else statement.
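
To make the grouping visible, both readings can be written out explicitly. I am using JavaScript here purely to show the evaluation order; the left-associative version mirrors how PHP parses the unparenthesized chain:

let st = true;

// right-associative grouping: what JavaScript and C# do, and what we expect
let rightAssoc = st === true ? "A" : (st === false ? "B" : "C");

// left-associative grouping: how PHP reads the unparenthesized chain
let leftAssoc = (st === true ? "A" : st === false) ? "B" : "C";

console.log(rightAssoc); // "A"
console.log(leftAssoc);  // "B" -- "A" is truthy, so the outer ternary picks "B"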

Security is hard

The recent issue of the Meltdown and Spectre attacks shows how hard security implementation is. In short, those two attacks take advantage of the CPU’s speculative execution and fault handling to read memory addresses they are not authorized to access. Patches were published by the respective vendors and OSes right after. However, the real issue is that the applied patches can bring performance down by up to 30%! That is what I want to discuss in this article.

Trade-off

Ignoring programmer effort and development cost, a security implementation may or may not come with a trade-off, but more often than not it does.

Take, for example, a security token for online banking. It is a security implementation that degrades UX (user experience) by adding one step of verification. In this case, though, the trade-off is worth it: it helps the user verify their input and prevents wrong transactions that would otherwise be too easy to make.

Asking the user for a username and password on every login is also a UX trade-off, which is why alternatives like “login with facebook”, “login with twitter” and so on have appeared lately. And the majority of trade-offs, as in the recent Meltdown case, come as a performance drop due to an extra step of verification.

Trade-off vs Risks

Security flaws, after all, are just risks. It is only when an attack is actually executed that a security flaw becomes a loss. Usually the fix for a flaw brings only a negligible trade-off (a small performance drop), so it’s better to implement it than not. For example, preventing SQL injection and XSS, storing salted one-way password hashes, and using HTTPS are common practice. They should be enforced, because otherwise it is far too easy for an attacker to exploit the flaw and take advantage of it.

However, with a performance drop of up to 30% in the latest case, and given how complex a successful Meltdown attack is and how many preconditions it needs, the trade-off-to-risk ratio can be considered high. In such a case there is an “advantage” in not fixing the security flaw, and simply hoping that the attacker does not target you, does not attempt that specific attack method, or simply isn’t interested enough to waste their time on it.

However, the risk will always be there, and attackers may get better and better tools to exploit the flaw, while at the same time we can hope for better and better fixes with a lower trade-off. In the end, it is up to top-level management and the developers to decide whether it is better to patch right away or leave things as they are.

After all, security is hard.

Debugging / learning is a scientific cycle

This is a little shower-thought I had a while ago: when debugging, you are more or less performing a scientific cycle, albeit a simpler one than the real thing. A simple full scientific cycle consists of: make observations, form a theory / hypotheses, perform an experiment, observe the results of that experiment, and repeat. I know the actual scientific process and real debugging are both more complicated than that, but the general flow follows that cycle.

Do observation

When you encounter a bug or abnormality in a process, the very first thing to do is observe it. You will need to look at some (or all) of these things:

  • what happened during the specific process
  • what was the process doing
  • what is affected
  • what is the current result
  • what is the desired result
  • where is the difference
  • what is written in the log
  • is there any more information that you can gather?

The observation process leads to the next step: making hypotheses.

Making hypotheses

You will craft some hypotheses from all of the information gathered during the observation step. Some of the hypotheses can be:

  • it only occurs for requests with a specific value in a specific field
  • only when the user performs those specific steps
  • when the machine is under high load
  • when there is an issue with the internet connection
  • when the machine’s recommended specification is not met
  • when some of the apps are outdated
  • and so on…

If insufficient information was acquired in the previous step, the weakest available hypothesis is: the bug will happen again if you perform the same steps, with the same data, on a system with the same configuration, perhaps at a specific time. No matter what your hypotheses are, the best experiment to perform next is to reproduce the issue.

Perform an experiment: reproduce the issue

This is one of the hardest steps of debugging: creating an environment that reproduces the issue consistently. Many issues are hardware specific, race-condition specific, network specific, tied to hardware failure, or triggered by other complex situations. But the hard effort brings big rewards: you will understand the process better, it will be easier to pin down the cause, and you can be confident that the fix really solves the issue.

Once you can reproduce the issue consistently, you can move to the next step: add more logging, set up debugging tools, and then continue with observation.

Do observation, make hypotheses and experiment again

With more information and the ability to reproduce the issue, you can run the cycle repeatedly. Observation produces information, the information is used to form hypotheses, you make a fix based on the hypotheses, observe whether the fix really solves the problem, form another hypothesis if the problem is still there, apply another fix, and repeat.

In the last iterations, you may observe that a change to the application has fixed the problem. Then you start forming theories (hypotheses supported by facts from tests) and run more experiments to prove them. For example, you can revert the change to reproduce the same error under a different condition, or repeat the same steps with different data to ensure the fix is correct. If your theories are confirmed by further tests, the debugging process is complete.

Unit test

We can now see that the debugging process above is complex and time-consuming, especially when you need to re-create the environment to reproduce the error every time. Unit tests are an amazing tool in this case.

With unit tests, you can experiment in a more isolated environment, easily tinker with the data using mock objects, replicate a situation or configuration (set the time to a specific value, perhaps), and much more, in order to reproduce the issue. Once the issue has been reproduced, the test results in a failure / error.

The fix you make is then tested again until it produces the expected result, and the other existing unit tests help ensure it doesn’t break anything elsewhere. Amazing, right?
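
As a minimal illustration (plain Node with the built-in assert module, no test framework; the function names are made up for this example), a failing test can reproduce a bug such as an accidental array mutation, and the same test then proves the fix:

const assert = require("assert");

// buggy implementation: mutates and returns the argument array
const addItemBuggy = (arr, item) => {
    arr.push(item);
    return arr;
};

// fixed implementation: returns a copy, leaving the argument untouched
const addItemFixed = (arr, item) => arr.concat([item]);

// "reproduce the issue" as a test: the original array must not change
const testDoesNotMutateInput = (addItem) => {
    const input = [1, 2, 3];
    addItem(input, 4);
    assert.deepStrictEqual(input, [1, 2, 3], "input array was mutated");
};

// testDoesNotMutateInput(addItemBuggy); // throws: reproduces the bug
testDoesNotMutateInput(addItemFixed);    // passes: the fix solves the issue
console.log("test passed");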

Conclusion

Debugging is more or less similar to how you would perform a scientific experiment / research. It is a repeating cycle of observation, hypotheses, and experiment. Unit testing helps the process greatly, since it creates an enclosed environment in which you can run experiments under artificial conditions.