Philosophies In Software Engineering

Unix: Do One Thing and Do It Well

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

... the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools.

The UNIX Philosophy:

  1. Small is beautiful.
  2. Make each program do one thing well.
  3. Build a prototype as soon as possible.
  4. Choose portability over efficiency.
  5. Store data in flat text files.
  6. Use software leverage to your advantage.
  7. Use shell scripts to increase leverage and portability.
  8. Avoid captive user interfaces.
  9. Make every program a filter.
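
As a rough illustration of "make every program a filter", here is a minimal Python sketch. The names `grep`, `count_lines`, and `pipe` are made up for this example; they only mimic the shape of the shell pipeline `grep error | wc -l` and are not the real tools. Each filter reads a text stream and writes a text stream, so any of them can be chained.

```python
import io


def grep(pattern, src, dst):
    """Copy only the lines of src that contain pattern (a plain substring)."""
    for line in src:
        if pattern in line:
            dst.write(line)


def count_lines(src, dst):
    """Write the number of lines in src, roughly like `wc -l`."""
    dst.write(f"{sum(1 for _ in src)}\n")


def pipe(text, *filters):
    """Feed text through each filter in turn, like `a | b | c` in a shell."""
    for run_filter in filters:
        src, dst = io.StringIO(text), io.StringIO()
        run_filter(src, dst)
        text = dst.getvalue()
    return text


# Hypothetical log text, used only for the demonstration.
log = "ok start\nerror disk full\nok retry\nerror timeout\n"
result = pipe(log, lambda src, dst: grep("error", src, dst), count_lines)
print(result, end="")
```

Because each filter only knows about its input and output streams, passing `sys.stdin` and `sys.stdout` instead of the in-memory buffers would let the same function sit inside a real shell pipeline.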

Worse is better: simplicity of both the interface and the implementation is more important than any other attribute of the system, including correctness, consistency, and completeness.


>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Maven: Convention over configuration

Convention over configuration: to decrease the number of decisions that a developer using the framework is required to make without necessarily losing flexibility.
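
The idea can be sketched in a few lines of Python. This is only an illustration, not how Maven is implemented: `ProjectLayout` and `resolve_layout` are hypothetical names, and the defaults simply mirror Maven's standard directory layout. The developer declares nothing unless the project deviates from the convention.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProjectLayout:
    # Defaults mirror Maven's standard directory layout.
    source_dir: str = "src/main/java"
    test_dir: str = "src/test/java"
    output_dir: str = "target"


def resolve_layout(overrides=None):
    """Start from the convention and apply only the explicit overrides."""
    return replace(ProjectLayout(), **(overrides or {}))


# A conventional project makes no decisions at all.
conventional = resolve_layout()

# An unusual project overrides one setting and keeps the rest.
custom = resolve_layout({"source_dir": "src"})

print(conventional)
print(custom)
```

Flexibility is not lost: every setting can still be overridden, but the common case costs zero lines of configuration.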

Maven vs sbt

I once loved Maven. Not only because its philosophy, “convention over configuration”, struck me so much at the time, but also because it made me realize that there is (or should be) a philosophy behind everything. However, Maven only solves a tiny part of the problem: directory layout. It still takes dozens of lines of XML to perform even the most basic tasks. Then came sbt. No more XML: the whole build process is described in Scala code. At first it felt bizarre; things came out of nowhere but worked like magic. Then one day I pressed cmd+B and realized that libraryDependencies is just a predefined variable (actually a val) provided by sbt, and that the weird operators := and %% are just methods.
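
The "operators are just methods" realization is not specific to Scala. The following Python sketch is an analogy only: `Module` is a hypothetical class, not part of sbt, and Python's `%` stands in for sbt's operators. It shows that an expression that looks like special build syntax is an ordinary method call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical dependency coordinate: group, artifact, version."""
    group: str
    artifact: str = ""
    version: str = ""

    def __mod__(self, part):
        # `module % "x"` is nothing more than module.__mod__("x").
        if not self.artifact:
            return Module(self.group, part)
        return Module(self.group, self.artifact, part)


# Looks like a tiny build DSL...
dep = Module("org.example") % "demo-lib" % "1.0.0"

# ...but it is the same as chaining plain method calls.
same = Module("org.example").__mod__("demo-lib").__mod__("1.0.0")

print(dep)
print(dep == same)
```

Once the operators are seen as methods, the build file stops being magic: it is regular code that an IDE can navigate and a compiler can check.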

Grunt vs Gulp

I learned about Grunt when I was trying to find a Maven equivalent in the JavaScript world. I could not tell whether there was anything wrong with Grunt, maybe because I used it so sparingly. Then recently, when I started looking into Polymer as my new toy, I noticed that its starter kit uses gulp. The slideshow did a really good job of explaining “why gulp” rather than Grunt: code over configuration. The solution is simple and flexible, and building up a stream of tasks feels natural.

Code vs Configuration in Data Science Tools

I had already vaguely sensed this trend in another task I’ve been working on. Now I’m convinced.

That task was to build a data mining pipeline for data scientists. We encapsulated some of the common functionality and provided a single configuration file as the user interface. Originally it was JSON (I was never a fan of XML), but quoting and trailing commas caused headaches for the users. Then we tried Java properties, which were not flexible enough to define nested configs, and escaping special characters also became a chore. YAML was another option, though we did not have the confidence to spend time on it. Eventually we created a home-grown format, and then educating the users turned out to be the biggest overhead.
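
To make the JSON headaches concrete, here is a small sketch using Python's standard `json` module. The config keys (`trees`, `depth`) and the helper `try_parse` are invented for the example. A strict parser rejects both a trailing comma and single-quoted strings, which are exactly the mistakes that creep in when people edit a config by hand.

```python
import json

# Hypothetical config snippets, as a user might type them by hand.
strict = '{"trees": 100, "depth": 5}'
trailing_comma = '{"trees": 100, "depth": 5,}'
single_quotes = "{'trees': 100}"


def try_parse(text):
    """Return (parsed_value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, err.msg


for label, text in [
    ("strict", strict),
    ("trailing comma", trailing_comma),
    ("single quotes", single_quotes),
]:
    value, error = try_parse(text)
    print(f"{label}: value={value!r} error={error!r}")
```

Only the first snippet parses; the other two fail with a decoding error that means little to someone who is not a programmer.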

Things got worse when more users came on board. Because data science is by nature exploratory and experimental, it was hard to provide all the tweaks and tricks our scientists used. Even if we could encapsulate all of those requirements, the configuration manual would run to hundreds of pages.

The big question now: is “configuration” the correct solution, or the correct level of abstraction, for data science? Maybe we should take one step back and let users interact with code directly, as with R, SAS, scikit-learn for Python and, oh yes, Spark.
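
A minimal sketch of what "interacting with code directly" could look like, in plain Python. None of these names come from the pipeline described above; `run_pipeline`, `drop_missing`, `scale`, and `clip_outliers` are hypothetical. The library provides small composable steps, and a one-off tweak is just another function rather than a new configuration option.

```python
def run_pipeline(rows, *steps):
    """Apply each step to the rows in order and return the final list."""
    for step in steps:
        rows = step(rows)
    return list(rows)


# Steps a library might ship with.
def drop_missing(rows):
    return (r for r in rows if r["value"] is not None)


def scale(factor):
    def step(rows):
        return ({**r, "value": r["value"] * factor} for r in rows)
    return step


# A one-off experiment written by the user; no config format has to anticipate it.
def clip_outliers(rows):
    return ({**r, "value": min(r["value"], 100)} for r in rows)


# Made-up sample data for the demonstration.
data = [
    {"id": 1, "value": 10},
    {"id": 2, "value": None},
    {"id": 3, "value": 80},
]

result = run_pipeline(data, drop_missing, scale(2), clip_outliers)
print(result)
```

The trade-off is that users must be comfortable writing a little code, but in exchange the "manual" is the language itself.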

Worse Is Better


Developer progression

  1. Simple and wrong
  2. Complicated and wrong
  3. Complicated and right
  4. Simple and right

Choose Boring Technology