There are many important rules that a good developer should follow, and one of them is the DRY (Don’t Repeat Yourself) principle. It is one of the many so-called ‘clean code’ principles that a developer should know, understand, and apply in their code. The rule was introduced by Andy Hunt and Dave Thomas in their book The Pragmatic Programmer.
Basically, it says that as developers we should always try to avoid duplicated code; it is as simple as that. To be more precise, we should think about duplication not only from the ‘code/line’ perspective, but also (mainly) from the business perspective.
Our goal is to have each feature or piece of knowledge in exactly one code representation. This is easiest to understand with an example. Let’s imagine we have a web application, an online bookstore. The application lets clients enter their email address in various places, e.g. in:
- user information/data
- newsletter
- if a product is not available, the user can leave an email address to be notified when it becomes available
- invite a friend and get a 5% discount
- etc…
Before an email is saved in the application, we must validate it. This can be done with the so-called ‘copy-paste’ design pattern, or by providing a single code representation: a class, or a single method, that handles this validation on the client side.
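To make the ‘single code representation’ concrete, here is a minimal sketch in Java. The class names (EmailValidator, NewsletterSignup, FriendInvitation) and the regular expression are my own assumptions for illustration; the pattern is deliberately simplistic and is not a complete email check. The same idea applies in whatever language your form code is written in.

```java
import java.util.regex.Pattern;

// Hypothetical single home for the email rule; every feature calls this class.
final class EmailValidator {
    // Deliberately simple pattern for illustration only:
    // something@something.something, with no spaces and only one '@'.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    private EmailValidator() {
    }

    static boolean isValid(String email) {
        return email != null && SIMPLE_EMAIL.matcher(email).matches();
    }
}

// Hypothetical feature: newsletter signup reuses the shared rule.
final class NewsletterSignup {
    static boolean subscribe(String email) {
        if (!EmailValidator.isValid(email)) {
            return false;
        }
        // ... store the address (omitted in this sketch)
        return true;
    }
}

// Hypothetical feature: friend invitation reuses the very same rule.
final class FriendInvitation {
    static boolean invite(String friendEmail) {
        if (!EmailValidator.isValid(friendEmail)) {
            return false;
        }
        // ... send the invitation (omitted in this sketch)
        return true;
    }
}
```

Both features delegate to EmailValidator.isValid, so a new validation requirement or a bug fix is made in exactly one place.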
What we gain by not duplicating this code in several places:
- we reduce the cost of development: if there is a new requirement for the validation process, we make our change in just one place.
- let’s imagine someone has reported a bug in the validation process for adding an email to the newsletter. We fix it, but only in that place; that is what the ticket was created for, right? Two weeks later the same bug is reported again. What is going on?! We have already fixed that… but not everywhere. A single representation of the code allows us to fix the problem in just one place, and the fix applies to all the features that use it.
- we test it only once: we test our logic in the validator method/class.
- there is a much lower probability of making a mistake. If we do not ‘copy-paste’ the code, but write it manually in 10 different places, we can easily make a simple mistake, e.g. a typo in a regular expression.
- we refactor the code only once: maybe we have decided to use some nice feature of the latest frameworks in the validation process; we can do that in just one place.
Why do we sometimes have problems following this principle? One reason is long methods. We have all seen a 500-line method or a 2000-line class; we have all been there. When you write a new feature that is similar to something already written, you tell yourself: ‘Ooo! Those 15 lines of code that start at line 834, I need them as well’. And what do we do? We copy-paste this code into our new feature and make an even bigger mess. Not only do we fail to follow the DRY principle, we certainly do not follow the ‘boy-scout rule’ either: ‘Always leave the code better than you found it.’
It is very tempting to do as shown above, a simple copy-paste, isn’t it? What should we have done in the example above? Maybe use the ‘extract method’ refactoring. Just create a new method, move the duplicated code into it, and call it both from our new feature and from the old place.
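As a rough sketch of what that extraction could look like, assume the duplicated lines computed a discounted price in the bookstore. Everything here (OrderPricing, discountedCents, checkoutTotalCents, wishlistPreviewCents) is a made-up name for illustration, not code from a real system.

```java
// Hypothetical example of the 'extract method' refactoring.
final class OrderPricing {
    private OrderPricing() {
    }

    // The extracted method: the lines that would otherwise be copy-pasted
    // into the new feature now live here, once.
    static long discountedCents(long priceCents, int discountPercent) {
        if (priceCents < 0 || discountPercent < 0 || discountPercent > 100) {
            throw new IllegalArgumentException("invalid price or discount");
        }
        // Integer arithmetic on cents; any fraction of a cent is truncated.
        return priceCents - (priceCents * discountPercent) / 100;
    }

    // The old place: it used to contain the calculation inline.
    static long checkoutTotalCents(long priceCents, boolean invitedFriend) {
        return discountedCents(priceCents, invitedFriend ? 5 : 0);
    }

    // The new feature: it calls the same method instead of copying the lines.
    static long wishlistPreviewCents(long priceCents) {
        return discountedCents(priceCents, 5);
    }
}
```

The old place and the new feature now share one implementation, so a rounding fix or a new validation rule lands in both at once.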
But should we always follow the DRY principle? Should we always refactor and use, e.g., the ‘extract method’ refactoring? IMHO, no. Remember to be very cautious when using the words ‘always’ and ‘never’; they are very dangerous.
Let’s imagine this situation. We have a report-generating system that is very important to our client. Its requirements were defined a few years ago, according to some government specifications and regulations. There are plenty of already printed papers stored in the archive, and many generated PDFs stored on backup hard drives.
Now your goal is to provide a new way of reporting that follows new government regulations. You notice that the code you wrote is very similar to the code the old reporting engine uses. You are thinking about refactoring; it could be done just by using the ‘extract method’ refactoring, e.g. by providing a new method with two arguments. You want to leave the code better than you found it, right? Remember, there are thousands of already printed documents and reports; if you make a mistake in the old reporting system, there could be serious consequences.
In that example I would vote for duplication. The risk is simply too high, especially if the old system does not have any tests.
Some duplication is good, and some is even a must-have in the application. Let’s go back to the system I mentioned before, the online bookstore. We were working on the validation of the user’s email. Is validating this email only on the client side enough? No: in this case you should provide validation both on the client side (the form) and on the backend side, because client-side checks can be bypassed. We duplicate the logic, but it is necessary, as simple as that.
These days modern IDEs can actually show you duplicated code, so refactoring is much easier. IntelliJ has this feature, and it works quite nicely, I must say.
There is one thing I often see in code that can become very problematic in the future. Again, an example. Imagine we have a web application that uses a ‘create user form’ feature in three places. All those places look very similar; they are almost identical forms with some very tiny differences. So what do we do to avoid duplicating our code? We provide a parent class for our three form classes: we put everything that is common in the parent class, and everything that is specific in the child classes.
Now, what happens a year later, when the application grows? We get some new requirements, and our three very similar classes start to behave a bit differently. So what happens now… we add ‘instanceof’ checks in the parent class to say: do this for this child, and that for that child. Please remember: we use inheritance not to factor the common part out in front of the parentheses (to speak in mathematical terms) but to support polymorphism, and polymorphism means different behaviour for different instances. I will try to cover the dangers of inheritance in future posts, but please do not do that.
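Here is a rough sketch of the polymorphic alternative, assuming hypothetical form classes and a made-up difference between them (which fields are required). The parent never asks which child it is dealing with; each child answers for itself.

```java
import java.util.List;

// Hypothetical parent for the 'create user' forms.
abstract class CreateUserForm {
    // Shared behaviour lives here and never inspects the concrete type,
    // so no 'instanceof' checks are needed.
    final boolean isComplete(List<String> filledFields) {
        return filledFields.containsAll(requiredFields());
    }

    // Behaviour that differs per form is delegated to the subclass.
    abstract List<String> requiredFields();
}

// Hypothetical form used when a visitor registers on their own.
final class SelfRegistrationForm extends CreateUserForm {
    @Override
    List<String> requiredFields() {
        return List.of("email", "password");
    }
}

// Hypothetical form used when a user is created from a newsletter signup.
final class NewsletterUserForm extends CreateUserForm {
    @Override
    List<String> requiredFields() {
        return List.of("email");
    }
}
```

When a new requirement affects only one form, the change goes into that subclass and the parent stays untouched.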
Remember that the DRY principle can be applied not just to code. Think about all the processes you deal with while developing and deploying the application.
- instead of manual testing that is repeated every time a new feature is implemented, you should think about providing unit tests, maybe Selenium tests, or integration tests.
- instead of manually uploading a WAR to the server, you should think about CI, Jenkins for example
- etc…
I hope that after reading this article you will think twice before you copy-paste some ‘if statements’ from existing code.