
Managing for DevOps

This was originally published on the Amalgam Insights site in July of 2018

I am constantly asked “What does one have to do to implement DevOps?” or some variant. Most people who ask say they have spent time searching for an answer. The pat answers they encounter are typically either technology-based (“buy these products and achieve DevOps magic”) or management-based (“create a DevOps culture”). Both are vague, flippant, and decidedly unhelpful.

My response is twofold. First, technology and tools follow management and culture. Tools do not make culture and a technology solution without management change is a waste. So, change the culture and management first. Unfortunately, that’s the hard part. When companies talk about changing culture for DevOps they often mean implementing multifunction teams or something less than that. Throwing disparate disciplines into an unregulated melting pot doesn’t help. These teams can end up as dysfunctional as with any other management or project structure. Team members will bicker over implementation and try to protect their hard-won territory.

As the old adage goes, “everything old is new again” and so-called DevOps culture is no different. Multi-functional teams are just a flavor of matrix management which has been tried over and over for years. They suffer from the same problems. Team members have to serve two masters and managers act like a group of dogs with one tree among them. Trying to please both the project leader and their functional management creates inherent conflicts.

Another view of creating DevOps culture is what I think of as the “CEO Buy-in Approach.” Whenever there is new thinking in IT, there always seems to be advocacy for a top-down approach that starts with the CEO or CIO “buying in” to the concept. After that, magic happens and everyone holds hands and sings together. Except that they don’t. This approach is heavy-handed and reflects an unrealistic view of how companies, especially large companies, operate. If simply ordering people to work well together were all it took, there would be no dysfunctional companies or departments.

A variation on this theme advocates picking a leader (or two if you have two-in-the-box leadership) to make everyone work together happily. Setting aside the fact that finding people with broad enough experience to lead multi-disciplinary teams is difficult, this leads to what I have always called “The Product Manager Problem.” The problem that all new product managers face is the realization that they have all the responsibility and none of the power to accomplish their mission. That’s because responsibility for the product concentrates in one person, the product manager, while all other managers can diffuse their responsibility across many products or functions.

Having a single leader responsible for making multi-functional teams work creates a lack of individual accountability. The leader, not the team, is held accountable for the project while the individual team members are still accountable to their own managers. This may work when the managers and project team leaders all have great working relationships. In that case, you don’t need a special DevOps structure. Instead, a model that creates a separate project team leader or leaders enables team dysfunction and the ability to maintain silos through lack of direct accountability. You see this when you have a Scrum Master, Product Owner, or Release Manager who has all the responsibility for a project.

The typical response to this criticism of multi-functional teams (and the no-power Product Manager) is that leaders should be able to influence and cajole the team, despite having no real authority. This is ridiculous and refuses to accept that individual managers and the people who work for them are motivated to maintain their own power. Making the boss look good works well when the boss is signing your evaluation and deciding on your raise. Sure, project and team leaders can be made part of the evaluation process, but really, who has the real power here? The functional manager in control of many people and resources or the leader of one small team?

One potential solution to the DevOps cultural conundrum is collective responsibility. In this scheme, all team members benefit from the success of the project or are hurt by its failure. Think of this as the combined arms combat team model. In the Army, multi-functional combined arms teams are put together for specific missions. The team is held responsible for the overall mission. They are responsible collectively and individually. While the upper echelons hold the combined arms combat team responsible for the mission, the team leader has the ability to hold individuals accountable. Can anyone imagine an Army or Marine leader being let off the hook for mission failure because one of their people didn’t perform? Of course not, but they also have mechanisms for holding individual soldiers accountable for their performance.

In this model, DevOps teams would be held collectively responsible for on-time completion of the entire project, as would the entire management chain. Individual team members would have much of their evaluation based on this, and the team leader would have the power to remediate nonperformance, including removing a team member who is not doing their job (i.e. firing them). The team would need the ability to train up and fill one type of role with a person from another function if the person performing that role wasn’t up to snuff or had to be removed. It would still be up to the “chain of command” to provide a reasonable mission with appropriate resources.

Ultimately, anyone on the team could rise up and lead this or another team no matter their specialty. There would be nothing holding back an operations specialist from becoming the Scrum Master. If they could learn the job, they could get it. The very idea of a specialist would lose power, allowing team members to develop talents no matter their job title.

I worked in this model years ago and it was successful and rewarding. Everyone helped everyone else and had a stake in the outcome. People learned each other’s jobs, so they could help out when necessary, learning new skills in the process. It wasn’t called DevOps but it’s how it operated. It’s not a radical idea but there is a hitch – silo managers would either lose power or even cease to exist. There would be no Development Manager or Security Manager. Team members would win, the company would win, but not everyone would feel like this model works for them.

This doesn’t mean that all silos would go away. There will still be operations and security functions that maintain and monitor systems. The security and ops people who work on development projects just wouldn’t report into them. They would only be responsible to the development team but with full power (and resources) to make changes in production systems.

Without collective responsibility, free of influence from functional managers, DevOps teams will never be more than a fresh coat of paint on rotting wood. It will look pretty but underneath, it’s crumbling.

Monitoring Containers: Do you know what’s happening inside your cluster?


This was originally published on May 18th on the Amalgam Insights site. For reasons I can’t fathom, I forgot to push the publish button.

It’s not news that there is a lot of buzz around containers. As companies begin to widely deploy microservices architectures, containers are the obvious choice with which to implement them. As companies deploy container clusters into production, however, one issue has to be dealt with immediately: container architectures have a lot of moving parts. The whole point of microservices is to break apart monolithic components into smaller services. This means that what was once a big process running on a resource-rich server is now multiple processes spread across one or many servers. On top of the architecture change, a container cluster usually encompasses a variety of containers that are not application code. These include security, load balancing, network management, web servers, etc. Entire frameworks, such as NGINX Unit 1.0, may be deployed as infrastructure for the cluster. Services that used to be centralized in a network are now incorporated into the application itself as part of the container network.

Because an “application” is now really a collection of smaller services running in a virtual network, there’s a lot more that can go wrong. The more containers, the more opportunities for misbehaving components. For example:

  • Network issues. No matter how the network is actually implemented, there are opportunities for typical network problems to emerge including deadlocked communication and slow connections. Instead of these being part of monolithic network appliances, they are distributed throughout a number of local container clusters.
  • Apps that are slow and make everything else slower. Poor performance of a critical component in the cluster can drag down overall performance. With microservices, the entire app can be waiting on a service that is not responding quickly.
  • Containers that are dying and respawning. A container can crash which may cause an orchestrator such as Kubernetes to respawn the container. A badly behaving container may do this multiple times.
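The last item in the list above, containers that die and respawn repeatedly, can be spotted from restart counts. The following is a minimal sketch, not a production monitor. It assumes pod data has already been fetched and shaped like the Kubernetes Pod object, where each entry in status.containerStatuses carries a restartCount. The function name, the threshold of 3, and the sample pods are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def find_restart_loops(pods: List[dict], threshold: int = 3) -> Dict[str, int]:
    """Return {"namespace/pod/container": restarts} for containers whose
    restart count is at or above the (hypothetical) threshold."""
    flagged: Dict[str, int] = {}
    for pod in pods:
        meta = pod.get("metadata", {})
        # containerStatuses can be missing or None for pods that never started.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            restarts = status.get("restartCount", 0)
            if restarts >= threshold:
                namespace = meta.get("namespace", "default")
                pod_name = meta.get("name", "?")
                container = status.get("name", "?")
                flagged[f"{namespace}/{pod_name}/{container}"] = restarts
    return flagged


# Hypothetical sample data, trimmed to the fields the function reads.
SAMPLE_PODS = [
    {
        "metadata": {"namespace": "shop", "name": "cart-7d9f"},
        "status": {"containerStatuses": [{"name": "cart", "restartCount": 7}]},
    },
    {
        "metadata": {"namespace": "shop", "name": "web-5c2a"},
        "status": {"containerStatuses": [{"name": "web", "restartCount": 0}]},
    },
    {
        "metadata": {"namespace": "shop", "name": "pending-1"},
        "status": {},
    },
]

if __name__ == "__main__":
    for key, restarts in find_restart_loops(SAMPLE_PODS).items():
        print(f"{key} has restarted {restarts} times")
```

A raw restart count says nothing about how quickly the restarts happened, so a real check would likely also look at timestamps or the rate of change.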

These are just a few examples of the types of problems that a container cluster can have that negatively affect a production system. None of these are new to applications in general. Applications and services can fail, lock up, or slow down in other architectures. There are just a lot more parts in a container cluster, creating more opportunities for problems to occur. In addition, typical application monitoring tools aren’t necessarily designed for container clusters. There are events that traditional application monitoring will miss, especially issues with containers and Kubernetes themselves.

To combat these issues, a generation of products and open source projects is emerging that are retrofitted or purpose-built for container clusters. In some cases, app monitoring has been extended to include containers (New Relic comes to mind). New companies, such as LightStep, have also entered the market for application monitoring but with containers in mind from the outset. Just as exciting are the open source projects that are gaining steam. Prometheus (for application monitoring), OpenTracing (a vendor-neutral API for distributed tracing), and Jaeger (transaction tracing) are some of the open source projects that help gather data about the functioning of a cluster.

What makes these projects and products interesting is that they place monitoring components in the clusters, close to the application components, and take advantage of container and Kubernetes APIs. This helps sysops to have a more complete view of all the parts and interactions of the container cluster. Information that is unique to containers and Kubernetes is available alongside traditional application and network monitoring data.
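To make “monitoring close to the application” a little more concrete, here is a rough sketch of the kind of text a service might expose for a Prometheus server to scrape. It hand-renders a counter in the Prometheus text exposition format using only the standard library. The metric name, labels, and values are hypothetical, and a real service would more likely use an official client library than build these strings by hand.

```python
from typing import Dict, List, Tuple


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(
    name: str,
    help_text: str,
    samples: List[Tuple[Dict[str, str], float]],
) -> str:
    """Render one counter metric as Prometheus text-format lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is stable from one scrape to the next.
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric for a hypothetical "cart" service.
    print(
        render_counter(
            "http_requests_total",
            "Total HTTP requests.",
            [
                ({"service": "cart", "code": "200"}, 42),
                ({"service": "cart", "code": "500"}, 3),
            ],
        ),
        end="",
    )
```

Because each container serves its own metrics, the monitoring system can attribute a slow or failing request to a specific service rather than to the application as a whole.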

As IT departments start to roll scalable container clusters into production, knowing what is happening within is essential. Thankfully, the ecosystem for monitoring is evolving quickly, driven equally by companies and open source communities.