
Monitoring Containers: Do you know what’s happening inside your cluster?


This was originally published on May 18th on Amalgam Insights. For reasons I can’t fathom, I forgot to push the publish button here.


It’s not news that there is a lot of buzz around containers. As companies begin to deploy microservices architectures widely, containers are the obvious choice for implementing them. As companies deploy container clusters into production, however, one issue has to be dealt with immediately: container architectures have a lot of moving parts. The whole point of microservices is to break apart monolithic components into smaller services. This means that what was once a big process running on a resource-rich server is now multiple processes spread across one or many servers. On top of the architecture change, a container cluster usually encompasses a variety of containers that are not application code. These include security, load balancing, network management, web servers, etc. Entire frameworks, such as NGINX Unit 1.0, may be deployed as infrastructure for the cluster. Services that used to be centralized in a network are now incorporated into the application itself as part of the container network.

Because an “application” is now really a collection of smaller services running in a virtual network, there’s a lot more that can go wrong. The more containers, the more opportunities for misbehaving components. For example:

  • Network issues. No matter how the network is actually implemented, there are opportunities for typical network problems to emerge, including deadlocked communication and slow connections. Instead of being confined to monolithic network appliances, these problems are distributed throughout a number of local container clusters.
  • Apps that are slow and make everything else slower. Poor performance of a critical component in the cluster can drag down overall performance. With microservices, the entire app can be waiting on a service that is not responding quickly.
  • Containers that are dying and respawning. A container can crash, which may cause an orchestrator such as Kubernetes to respawn it. A badly behaving container may do this multiple times.
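
To make the last item concrete, here is a minimal sketch of how a crash-and-respawn loop might be spotted from periodic samples of per-container restart counts. Kubernetes reports a restart count for each container in a pod's status, but the function name, the threshold, and the sample data below are all hypothetical and only illustrate the idea.

```python
def find_restart_loops(snapshots, min_restarts=3):
    """Return the names of containers whose restart count grew by at
    least min_restarts between their first and last sample.

    snapshots: list of dicts mapping container name -> restart count,
    in time order. The shape of this data is an assumption.
    """
    first = {}
    last = {}
    for snapshot in snapshots:
        for name, restart_count in snapshot.items():
            first.setdefault(name, restart_count)  # remember earliest value
            last[name] = restart_count             # keep overwriting with latest
    return sorted(
        name for name in last
        if last[name] - first[name] >= min_restarts
    )


# Hypothetical restart counts sampled at three points in time.
samples = [
    {"web": 0, "auth": 1, "cache": 0},
    {"web": 0, "auth": 3, "cache": 0},
    {"web": 1, "auth": 6, "cache": 0},
]
print(find_restart_loops(samples))  # ['auth']
```

A single restart, like the one for "web" above, is often harmless. It is the repeated respawning that is worth an alert, which is why the sketch compares growth against a threshold instead of flagging any nonzero count.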

These are just a few examples of the types of problems that can negatively affect a container cluster in production. None of these are new to applications in general. Applications and services can fail, lock up, or slow down in other architectures. There are just a lot more parts in a container cluster, creating more opportunities for problems to occur. In addition, typical application monitoring tools aren’t necessarily designed for container clusters. There are events that traditional application monitoring will miss, especially issues with containers and Kubernetes themselves.

To combat these issues, a generation of products and open source projects is emerging that are retrofitted or purpose-built for container clusters. In some cases, app monitoring has been extended to include containers (New Relic comes to mind). New companies, such as LightStep, have also entered the market for application monitoring but with containers in mind from the outset. Just as exciting are the open source projects that are gaining steam. Prometheus (for application monitoring), OpenTracing (network tracing), and Jaeger (transaction tracing) are some of the open source projects that help gather data about the functioning of a cluster.

What makes these projects and products interesting is that they place monitoring components in the clusters, close to the application components, and take advantage of container and Kubernetes APIs. This helps sysops get a more complete view of all the parts and interactions of the container cluster. Information that is unique to containers and Kubernetes is available alongside traditional application and network monitoring data.
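
As a rough illustration of what a monitoring component living next to the application might hand to a collector, the sketch below renders a counter in the plain-text exposition format that Prometheus scrapes. This is not the Prometheus client library, just a hand-rolled approximation. The metric name, label, and values are hypothetical, and label values are assumed to need no escaping.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to a number.
    Assumes label values contain no quotes, backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: restarts observed per service.
text = render_counter(
    "container_restarts_total",
    "Restarts seen per service.",
    {
        (("service", "auth"),): 6,
        (("service", "web"),): 1,
    },
)
print(text, end="")
```

In a real deployment a client library would normally produce this output and serve it over HTTP. The point here is only that container-level facts, such as restarts, can sit in the same feed as ordinary application metrics.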

As IT departments start to roll scalable container clusters into production, knowing what is happening within them is essential. Thankfully, the ecosystem for monitoring is evolving quickly, driven equally by companies and open source communities.

The Specter of Spectre (and Meltdown)?


About a month ago, my desktop computer started having problems. Weird problems. For example, loading an uncached website was exceedingly slow. I thought that maybe that was because of a network issue or that something was up with Firefox. After a bit of digging around, I realized that was unlikely. The same problems happened with both Microsoft Edge and Google Chrome. Speed tests from my browser and router were within normal parameters.

About the same time, I began to have issues with Outlook. Specifically, Outlook would be very slow to sync IMAP accounts. I have several email accounts that I access via IMAP, and updating them all would often be so slow that Outlook would hang or crash.

There were other applications affected, especially when running a number of Office applications or streaming music from Spotify while working. Could it be a virus? Unlikely. Unless it’s a very new and stealthy one, my anti-virus would have picked it up. Hardware problem? Everything checked out. Something in Windows corrupt? I’m not seeing the usual signs of that, and the Ubuntu Bash shell is also running pretty slow. It would have to be a pretty deep DLL issue to have that type of effect.

What I finally realized is that my problems began after a recent Windows update. This would be the update with the patch for Meltdown and Spectre. Could the patch have affected the performance of my PC this dramatically? It’s certainly possible. A patch that changes how the processor handles speculative execution, and that makes transitions between applications and the kernel more expensive, would make sense as the culprit. Programs that download multiple components from the internet or that try to sync several folders from multiple IMAP accounts at once would also be hit harder by kernel-level patches.
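
One rough way to test this hunch is to time a loop that keeps crossing into the kernel against a loop that stays in user space, and to run the same script before and after the update. The sketch below assumes that kernel transitions are where the extra cost shows up. The function names and iteration count are arbitrary, and the numbers mean little on their own without a before-and-after comparison on the same machine.

```python
import os
import time


def time_loop(work, iterations):
    """Return the seconds taken to call work() the given number of times."""
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    return time.perf_counter() - start


def syscall_heavy():
    os.stat(".")  # each call asks the operating system for file metadata


def compute_only():
    sum(range(50))  # pure arithmetic, no operating system involvement


def compare(iterations=20000):
    return {
        "syscall_seconds": time_loop(syscall_heavy, iterations),
        "compute_seconds": time_loop(compute_only, iterations),
    }


result = compare()
print(result)
```

If the syscall-heavy timing gets noticeably worse after the patch while the compute-only timing stays flat, that would at least be consistent with the patch being the cause. It would not prove it.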

I can’t tell. Does anyone have solid evidence of Windows 10 patches slowing down basic applications like browsers or Outlook? Who else is seeing these same issues? Let me know.