We’re Not Building a Piano: Design Patterns for Resilient Application Infrastructure – Part 1

When I was working as a general contractor and building a house, one of the teams I worked with was a father-and-son pair who were helping me and the owner build it. The son, nicknamed “Lumpy”, was learning the trade. Whenever Lumpy hammered in a nail that went crooked and folded over, he began pulling the nail out. This happened a couple of times, and his father responded with a phrase that I will never forget (apologies for the salty language):

For Christ’s sake, Lumpy, we’re not building a f%&*ing piano

The choice of wording aside, the issue is that too much time was being spent trying to make each part of the building process ideal, which was likened to a finely-tuned piano. Lumpy was spending 3-4 times as long trying to remove the bent nail as he would have spent just hammering it in crooked and then putting another nail beside it.

This series takes the same concept and puts it into practice for infrastructure design. Only once we understand the design patterns can we apply them to where it really matters, which is in service of application availability.

Infrastructure Resiliency – Stop Building Pianos

Crooked nails will be all over your virtual and physical infrastructure. They should be. What we need to focus on as infrastructure operations teams is moving further up the proverbial stack with our understanding of resiliency. We should be designing infrastructure using the same patterns that the “at-scale” folks use where possible. This is the concept behind what Alex Polvi (founder of CoreOS) calls GIFEE (Google Infrastructure for Everyone Else).

You don’t need to be Google to think like them. The term SRE (Site Reliability Engineer) may seem like a buzzword title, but the concepts are sound and the fundamentals can be adopted more easily than we realize. It takes a little rethinking of what the goal of resiliency in infrastructure really is.

Bottom Up – Understanding N+1 and N+2

We use the phrase N+n to illustrate systems component resiliency. When we talk about things like server availability for virtualization clustering, N+1 and N+2 are often confused in their meaning. N+n is a measure of how many single components in a system can fail before critically affecting that system. If you have 12 hosts in a cluster, N+1 would indicate you have enough resources to survive a single node (virtualization host) failure and to continue to service the remaining workloads. N+2 becomes a 2-node loss, and so on.

For host clustering and N+n illustration, we measure the percentage of total resources that is left on the surviving nodes. A single-node system obviously cannot sustain any loss at all. A 2-node cluster can sustain a single node failure and survive but will have 50% of the total compute resources.

This is just a sample showing N+1, N+2, and N+3 node loss effects in clusters up to 7-nodes. The calculations can be made quite easily using a simple formula:

Remaining Resource Percentage = ( ( Nodes Available - Nodes Lost ) / Nodes Available ) * 100
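To make the arithmetic concrete, here is a minimal Python sketch that computes the percentage of cluster resources left after losing nodes, and prints the N+1, N+2, and N+3 effects for clusters of 1 to 7 nodes. It assumes every node contributes equal resources, and the function names are my own for illustration.

```python
def remaining_resource_percentage(total_nodes: int, nodes_lost: int) -> float:
    """Percentage of total cluster resources left after losing nodes.

    Assumes every node contributes the same amount of resources.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= nodes_lost <= total_nodes:
        raise ValueError("nodes_lost must be between 0 and total_nodes")
    return (total_nodes - nodes_lost) / total_nodes * 100


def loss_table(max_nodes: int = 7, max_lost: int = 3) -> dict:
    """Map cluster size to remaining percentages for 1..max_lost lost nodes.

    A cell is None when the cluster has fewer nodes than the loss count.
    """
    table = {}
    for nodes in range(1, max_nodes + 1):
        table[nodes] = [
            round(remaining_resource_percentage(nodes, lost), 1)
            if lost <= nodes
            else None
            for lost in range(1, max_lost + 1)
        ]
    return table


if __name__ == "__main__":
    print("Cluster    N+1    N+2    N+3")
    for nodes, row in loss_table().items():
        cells = ["  n/a" if value is None else f"{value:5.1f}" for value in row]
        print(f"{nodes} nodes  " + "  ".join(cells))
```

The 2-node case comes out at 50% after a single failure, which matches the example above. The real world is messier because hosts are rarely identical, so treat the output as a starting point for sizing rather than a guarantee.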

The interesting thing with cluster sizing is that we spend a surprising amount of time designing the clusters and then forget to keep track of the dynamic workloads. That’s another blog post all unto itself. Our goal in this series is to uncover the upper layers in which we can understand resiliency. Even if you do not directly affect these layers yourself as an operations admin or IT architect, it’s my belief that we have a responsibility to know more in order to truly design and build resilient application infrastructure.

Understanding the True Full-Stack Infrastructure Resilience Approach

When somebody is described as a full-stack application designer, it usually means they are competent in both front-end (visual) and back-end (application logic and data) design. For full-stack infrastructure architects, there are a lot more layers. An IT architect needs to understand the physical layer (servers, storage, network), the data layers (relational databases, NoSQL databases), the application layers (application logic, code logic and code deployment), and the access layers (front-end load balancing and caching). All of these also need to be understood on traditional virtualization and on private or public cloud infrastructure. Yikes!

Have no fear, we are going to take these topics on in some simple and meaningful examples, and you will have a crash course in resilient application infrastructure. Using these fundamentals will give us the foundation to then apply these patterns to specific infrastructure deployments like AWS, Microsoft Azure and private cloud products.

Strap in and enjoy the ride, and I hope that you find this series to be helpful!

NEXT POST: Full-Stack Infrastructure Understanding

Quick to the point – VMware Tools Installation throwing vix error code 21012 – yeah, now what?!?

Ah, the subtleties of upgrading a vSphere environment – tackle the vCenter, tackle the hosts, and then take care of the VMware Tools and Compatability of the VMs themselves.  It’s not rocket science but for some reason, I always seem to run into an issue somewhere along the line.  This time, it was during the VMware […]

The post Quick to the point – VMware Tools Installation throwing vix error code 21012 – yeah, now what?!? appeared first on mwpreston.net.

Quick to the point – Unlocking the default vmware account in vRealize Orchestrator

I use vRO pretty religiously within my day job – however, I find that when in the lab I don’t really spawn up that wonderful Java client that much (<- there is sarcasm if you can’t detect it).  Anyways, it seems every time I go to use vRO in the lab I can never remember […]

The post Quick to the point – Unlocking the default vmware account in vRealize Orchestrator appeared first on mwpreston.net.

A newbies guide to ELK – Part 3 – Logstash Structure & Conditionals

Now that we have looked at how to get data into our logstash instance it’s time to start exploring how we can interact with all of the information being thrown at us using conditionals.  But, before we get too far into what conditionals are we are best to first have a look at the overall structure […]

The post A newbies guide to ELK – Part 3 – Logstash Structure & Conditionals appeared first on mwpreston.net.

Visualizing your Solutions: Mind Maps and Wireframe Diagrams

Let me start this post out with a huge thanks to Rene (aka @vcdx133) and Melissa (aka @vmiss33), who have been very helpful in getting me from idea to diagram/document using these tips. Having a simple template to start things off with becomes the best way to get started. Visualization helps your ideas become clearer because it forces you to see the relationships between things, and to go through the physical process of drawing them out on paper and/or using a digital platform. Before you think you need to be an AutoCAD or even a Visio expert, you have to learn to quickly get ideas drafted out.

There have been many days where I stared at a blank diagram software screen and fought with how to get it to work in a nice way using the product, when what I should have done was start by sketching it out in rough format first. This goes to the classic phrase “don’t let perfection get in the way of good enough.” When you need to take the thought process from ideation to visualization, there are many tools and techniques that can help you. The ones I use nearly every day are:

  • Paper sketches
  • Mind Maps
  • Diagram Tools: Visio, OmniGraffle, PowerPoint

Each has a distinct purpose in the process.

Paper Sketches

This is one that Melissa (aka @vmiss33) has taught me to leverage more and more. When you want to get started on an idea, just break out a pencil or pen, and some paper. Scratch diagrams and sketches take your idea and put it into a visual form. This helps you think about how to visualize it before you go diving into OmniGraffle or Visio and find yourself searching shape catalogs for hours and getting frustrated. Scratch pads and notebooks are excellent for both words and diagrams. As you write and sketch things out, your mind is forced to connect the physical motor act with the thought process. This helps to enhance learning and gets you closer to a result with your ideas. I’ve also gotten some really nice notebooks which I enjoy using. Rhodia is one brand that has very nice paper and lots of different styles. My favourite to use is engineering paper or graph paper style.

Mind Maps

Whether it’s a site map you want to work out, some ideas and related content/thoughts, or just general brainstorming, mind maps are also a great tool for taking verbal and thought processes and putting them to paper easily. Start with your core idea/thought and then branch out from there using simple mind map diagrams. There are lots of resources online to help you as you learn to use this technique to expand on your ideas. MindNode is a product I use on the Mac, but there are many different products which you can find online. The goal is really just to adopt the practice first, and then you can use it for both self-ideation and collaboration. A project manager who I worked with for years taught me the value of quickly scribing down discussion ideas for project planning using a mind map, a technique that has served me well over the years.

Diagram Tools

Before you think you need to be creating perfect diagrams with visually-stunning graphics, start with the basics. Wireframe diagrams can be easily drafted out as a digital version of your earlier sketches. You can choose the level you want your graphic quality to be, but the best diagrams I’ve used and created are ones that I modelled after a template that I got from Rene Van Den Bedem (aka @vcdx133). Using a seemingly simple diagram format means you concentrate on the content. Once the content is completed and your idea is committed to a diagram, you can then tune the graphic style all you want. The first step is moving from the concept in your head to the concept in a diagram. Products I’ve used include OmniGraffle and Microsoft Visio, and even Microsoft PowerPoint can be quite handy for doing such diagrams. Hopefully these tips are as helpful for you as they were for me.

Using Touch ID on MacBook Pro for sudo Authentication

Full credit goes to Cabel Sasser (@cabel) on this one for sharing the original tip. I’m simply sharing it here and showing the process to prove the awesomeness of this capability.

If you run a MacBook Pro with the Touch ID option, you have already discovered the speed at which you can authenticate for a number of GUI-driven products. Running sudo in the command line does not give you that luxury, usually.

By making a small change to the PAM configuration for sudo, you can get that same luxury in the terminal.

First, you have to edit the /etc/pam.d/sudo file with your editor of choice. It’s a read only file and you need admin privileges to do so. Oh the irony!

I’m going to use sudo vim /etc/pam.d/sudo to open up the file. This prompts me for credentials in the terminal session, as it should:

Add the following as the first line in the file after the comment:

auth sufficient pam_tid.so

You can space it out for consistency with the other lines:

Save the file. Because it’s read-only, you have to use :w! in vim to force the write. Then exit back to the shell and close your terminal.
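If you would rather script the change than make it by hand, here is a rough Python sketch of the same edit expressed as a text transformation. It only works on a string, so reading and writing /etc/pam.d/sudo (with admin privileges) is left to you. The function name and the sample file content in the usage lines are my own assumptions for illustration, not a copy of any particular macOS release.

```python
# The line from the tip, spaced out to line up with typical PAM columns.
TOUCH_ID_LINE = "auth       sufficient     pam_tid.so"


def add_touch_id(pam_text: str) -> str:
    """Return pam_text with the pam_tid.so line placed after the leading comments."""
    lines = pam_text.splitlines()

    # If the entry is already there, return the text unchanged so the
    # edit is safe to run more than once.
    for line in lines:
        if line.split()[:3] == ["auth", "sufficient", "pam_tid.so"]:
            return pam_text

    # Skip over the comment line(s) at the top of the file.
    index = 0
    while index < len(lines) and lines[index].lstrip().startswith("#"):
        index += 1

    lines.insert(index, TOUCH_ID_LINE)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample content, used only to show the transformation.
    sample = (
        "# sudo: auth account password session\n"
        "auth       required       pam_opendirectory.so\n"
    )
    print(add_touch_id(sample), end="")
```

Keeping the edit as a pure function means you can check the result before anything touches the real file, which is worth doing since a broken PAM file can lock you out of sudo.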

Launch a new terminal session so that you have no cached sudo session credentials and try a new sudo command such as sudo vim /etc/hosts and watch the magic happen:

This should be a nice time saver for you, especially when you use complex passwords…like you should 🙂

Resetting vSphere 6.x ESXi Account Lockouts via SSH

VMware vSphere has included a good security feature since ESXi 6.0: a root account lockout. After a number of failed login attempts, the server will trigger a lockout. This is a good safety measure for when you have public facing servers and is even important for internally exposed servers on your corporate network. We can’t always assume that external bad actors are the only ones attempting to breach your devices.

Using the vSphere web client shows us the settings which are used to define the lockout count and duration. The parameters under the Advanced settings are as follows:


Resetting your Failed Login Attempts with pam_tally2

There is a rather simple but effective tool to help you do this. It’s called pam_tally2 and is baked in with your ESXi installation. The command line to clear the lockout status and reset the count to zero for an account is shown here with the root account as an example:

pam_tally2 --user root --reset

To do this, you will need to have SSH access or console access to your server. Console access could be at a physical or virtual console. For SSH access, you need to use SSH keys to make sure that you won’t fall victim to the lockouts for administrative users. In fact, this should be a standard practice. Setting up the SSH keys is relatively simple and is nicely documented in the Knowledge Base article Allowing SSH access to ESXi/ESX hosts with public/private key authentication (1002866).


Uploading a key can be done with the vifs command as shown here:


The real question will come as to why you have the interface exposed publicly. This is a deeper question that we have to make sure to ask ourselves at all times. It’s generally not recommended as you can imagine. Ensuring you always use complex passwords and 2-factor authentication is another layer which we will explore. Hopefully this quick tip to safely reset your accounts for login is a good first step.

Installing PowerCLI 6.5.x on Windows Server 2012 R2 after Find-Module Error

Now that PowerCLI is part of the PowerShell Gallery, you can install it using the native module installer…but there’s a catch. Windows Server 2012 R2 requires a couple of minor updates to get this process underway. You’ll know really quickly if you open up your PowerShell terminal or PowerShell ISE (as Administrator) and try the following command:

Find-Module -name VMware.PowerCLI

The issue is easily solved by deploying a more recent installer for the PackageManagement PowerShell Modules. Download the installer using this link and run the install:


Select the Download within the page once you’re there:

Choose the x64 version (assuming you’re running a 64-bit OS):

Run through the installation and accept the defaults. Nothing significant to worry about with this file as it’s a necessary update for what we need to do.

If you run the Find-Module command again, you’ll see a much better result. You’ll be prompted to update your NuGet components which are used to pull resources from the PowerShell Gallery. Accept the update and then we can keep going:

Time to get back to the issues. Just relaunch your PowerShell terminal or ISE as an Administrator. We are running as Administrator so that we install the module for all users of the server. If you only want to run for your user then run your PowerShell session as your regular user and add -scope CurrentUser to the Install-Module command. Run the following to install for all users:

Install-Module -name VMware.PowerCLI

Now we have to import the module into our session using the Import-Module -name VMware.PowerCLI command:

Just like that, you’re up to date and running the latest and greatest PowerCLI goodness. Happy scripting!