2014-08-11

To Innovate or to Stabilize - That is the Question!

There has been a lively thread on the openstack-dev mailing list these past few days, largely to do with GBP. I don't want to go into the intricacies of what exactly sparked the discussion, but rather to discuss one of the by-products that came out of it.

OpenStack is now four years old, and on its ninth cycle of development. A huge amount of innovation has gone into the product, and I think the community is now reaching a stage where the growing pains are starting to show.

I think it is part of human nature to like the next shiny thing. We are always looking for the next best thing.

Let me give you a classic example. What was wrong with the iPhone 4? Of course there will be those who say that the new iPhone 6, which will be released in the not too distant future, is better, faster, more powerful, looks nicer, etc.

But if you look at it from another perspective, it is just a nice and shiny new toy that basically does the same as your old phone.

Did you really need to get a new one? Probably not. Is it nice to have a new one? I would definitely say so.

But have those annoying things from your old phone ever been fixed? Does your battery last longer or shorter with your new phone? Are there annoying bugs that have been around forever and have never been fixed?

Here is another example. There has been a cosmetic but annoying bug in the vSphere client since forever. I am sure you have all come across it. When provisioning a new VM, the window focus will always jump to the location field instead of staying in the first field, which is the VM name. Annoying as hell! In the time this bug has been around, VMware have developed the dvSwitch, SIOC, NIOC, VSAN, vCloud, VCHS and more. But that bug has never been fixed.

Balance

I see this again as a question of innovation (the next bright and shiny thing) versus fixing those annoying bugs. Of course innovation will always trump the mundane work of maintenance. I have done it myself, more than once: focused on new projects instead of taking care of the stuff that needs to be fixed.

I do think that it is very hard to keep a proper balance between the two. Again, human nature.

This is no different if you are a developer. Do I stabilize my code and improve it so that it works more efficiently, or do I let it chug away in the same old manner and add a new feature that improves the product as a whole? It is a serious dilemma.

OpenStack is currently at a stage where there are some fundamental issues with the software components. A number of issues are preventing full adoption in the enterprise market – and yet a large portion of the development effort is dedicated to the new and shiny stuff, and not to the stability of the product. Quite a while ago I wrote a post about release cycles – and why we are chasing our tails – and with OpenStack, which has a release cycle of once every six months, this is amplified ten times over.

I would like to share with you something that Thierry Carrez (Chair of the Technical Committee and Release Manager for OpenStack) wrote as part of this discussion.

Hi everyone, With the incredible growth of OpenStack, our development community is facing complex challenges. How we handle those might determine the ultimate success or failure of OpenStack.

With this cycle we hit new limits in our processes, tools and cultural setup. This resulted in new limiting factors on our overall velocity, which is frustrating for developers. This resulted in the burnout of key firefighting resources. This resulted in tension between people who try to get specific work done and people who try to keep a handle on the big picture.

It all boils down to an imbalance between strategic and tactical contributions. At the beginning of this project, we had a strong inner group of people dedicated to fixing all loose ends. Then a lot of companies got interested in OpenStack and there was a surge in tactical, short-term contributions. We put on a call for more resources to be dedicated to strategic contributions like critical bugfixing, vulnerability management, QA, infrastructure... and that call was answered by a lot of companies that are now key members of the OpenStack Foundation, and all was fine again. But OpenStack contributors kept on growing, and we grew the narrowly-focused population way faster than the cross-project population.

At the same time, we kept on adding new projects to incubation and to the integrated release, which is great... but the new developers you get on board with this are much more likely to be tactical than strategic contributors. This also contributed to the imbalance. The penalty for that imbalance is twofold: we don't have enough resources available to solve old, known OpenStack-wide issues; but we also don't have enough resources to identify and fix new issues.

We have several efforts under way, like calling for new strategic contributors, driving towards in-project functional testing, making solving rare issues a more attractive endeavor, or hiring resources directly at the Foundation level to help address those. But there is a topic we haven't raised yet: should we concentrate on fixing what is currently in the integrated release rather than adding new projects ?

We seem to be unable to address some key issues in the software we produce, and part of it is due to strategic contributors (and core reviewers) being overwhelmed just trying to stay afloat of what's happening. For such projects, is it time for a pause ? Is it time to define key cycle goals and defer everything else ?

On the integrated release side, "more projects" means stretching our limited strategic resources more. Is it time for the Technical Committee to more aggressively define what is "in" and what is "out" ? If we go through such a redefinition, shall we push currently-integrated projects that fail to match that definition out of the "integrated release" innercircle ?

The TC discussion on what the integrated release should or should not include has always been informally going on. Some people would like to strictly limit to end-user-facing projects. Some others suggest that "OpenStack" should just be about integrating/exposing/scaling smart functionality that lives in specialized external projects, rather than trying to outsmart those by writing our own implementation. Some others are advocates of carefully moving up the stack, and to resist from further addressing IaaS+ services until we "complete" the pure IaaS space in a satisfactory manner. Some others would like to build a roadmap based on AWS services. Some others would just add anything that fits the incubation/integration requirements.

On one side this is a long-term discussion, but on the other we also need to make quick decisions. With 4 incubated projects, and 2 new ones currently being proposed, there are a lot of people knocking at the door.

Thanks for reading this braindump this far. I hope this will trigger the open discussions we need to have, as an open source project, to reach the next level.

Cheers,

-- Thierry Carrez (ttx)

So I go back to the question – the title of this post.

Innovate or Stabilize?

I do not have a clear and definitive answer to this dilemma. On the one hand, if you do not innovate, you will get left behind and your competition will beat you – because they have the next bright and shiny thing, and you are the dinosaur.

But if you only innovate and do not fix things that are broken, you will not be seen as a trustworthy company – because you always let the broken things “stay broken”.

It is – like most things in life – a delicate balance that you need to find. You cannot do only one or the other. It will be a mixture of both: sometimes one will take precedence over the other, and there will be times when that is reversed. In order to stay relevant, this mixture should be evaluated on a regular basis and the focus changed when need be.

OpenStack specific - I personally think that the time has perhaps come for a different mindset. One option that comes to mind is to dedicate one in four OpenStack releases to only fixing stuff that is broken; no new features would be added unless everything else that was supposed to be fixed has been dealt with. It could be that one in four is too often, or maybe not often enough, but running at the current pace is not good for the community, not good for the operators, and in the end will not be good for those who will use OpenStack.

(Even Red Hat – “the mother of all open source” – only releases a major version once every 3-4 years.)

Just by the way – I assumed that the OpenStack projects were run with a more Agile-oriented mindset. Evidently they (as do we all) still have a great deal to learn.

I would be very interested in hearing your thoughts and suggestions on this subject. Please feel free to leave them in the comments below.