Infrastructure: Family Pet or Farm Animal?

For the last few months I have been thinking a lot about this issue.   Let me start by introducing our family pet, Alice.   She was a rescue from the local shelter that we added to our family about two years ago.   At the time of rescue she badly needed a haircut… and a bath.   I remember when we brought her home the whole family washed her and babied her to death.   She has turned out to be a great family pet, providing a number of benefits, including exercising our four-year-old with hours of chase-the-toy and tug of war.   She provides a valuable service to my family.   She also hops up on my lap at exactly 5:30 PM every night and lick/annoys me until she gets her dinner.   At 7:30 PM she is restless and expects to be let outside for the sole purpose of receiving her daily chew stick.   She requires two walks a day or she has an “accident” in my back yard.   She is a lot of work.   No less valuable, but a lot of work.   The real problem with a family pet is they don’t scale… from time to time my children ask about getting a “insert some type of house pet here,” to which my response is the age-tested trick of “well, if you walked the dog… maybe…”   I am becoming such an old man.   I have no intention of getting any more family pets because it’s just one more thing I have to manage and I don’t have time for it.   There is no way I can manage seven family pets, let alone 200.

What happened to Virtual Me and why are you talking about pets?

This is a valid question.  Has someone stolen my password?   Nope.   I think we can learn a lot from Alice.  I think she has me trained.   Everyone has infrastructure like Alice.  It provides valued services but requires lots of care and feeding.  Early in my professional life, after an older admin retired, we kept having issues with a server that forced reboots and outages.   When I asked my retired friend about the problem his response was “that’s because you don’t reboot it every morning.”   I came to learn he had been rebooting the server every morning, including weekends, for five years.   I think the server had him trained pretty well.  Again, the problem is it does not scale.   Imagine having four dogs that all require bones at different times… you could not possibly manage the nightmare.

OK, so how did all my servers become pets?

One word: customization.   It’s the bane of human existence.   Imagine taking a boat and turning it into a car… it may operate and provide the basic function of ground transportation, but maintenance will be a huge pain.   Every systems administrator is a tweaker… they love to tinker and make things better.   All of these little changes provide a 2% benefit but cannot be supported by a larger group.   Think of my dog again… if you watch my dog and at 7:30 PM she goes to the back door, you would let her out.   When she does not get her bone she will again ask to go out, and you will let her… rinse and repeat this process about 40 more times and you start to wonder if my dog is crazy.   What has happened is she has an undocumented tweak to the process.   Going outside has nothing to do with getting a bone, except for her they have always followed in succession (in fact they just did again).   These tweaks are almost never documented, and even if they are you will have to read for an hour to understand them… it’s easy for me to write “7:30 PM let dog out, give her bone”… but you might ask why?  Customization cannot be your friend if you expect to scale out.

I am ready to give up all customization, but my customer might need some

You are correct… customization is what we do… but I just told you to give it up.    Here is where farm cattle come into play.   Dairy cows on a dairy farm live out their lives for a single purpose from a business perspective: to provide milk.   Everything is handled together… when it’s washing time they all get hosed off… when it’s milking time they all go to the barn.   When a cow stops producing product it is removed and replaced with a younger cow.    We need to get to servers like cows.   They provide a business function or service, and if that function stops working they are replaced, not diagnosed or tweaked or rebooted once a day.   They are removed.   Ten years ago you would have called me crazy, but virtualization has enabled the rip-and-replace model.   Yet in IT we are still caring for family pets… we love them and they love us… guess what, we are starting to look like the crazy cat lady.   We have more pets than we can handle and we scare the business units.   To make matters worse, our cats cost a lot of money, are slow to deploy, and come in some crazy colors.   The business wants a dairy farm while we run the cat lady’s house.




What can the auto industry teach us?

The auto industry has been facing this problem for a long time.   They started with handmade cars and moved to automation.  They learned you cannot customize everything, but you can offer options.  We need to offer a few models of servers in different types… some are minivans, others sports cars.   We can offer customization, but we need to automate and use modules.   Let me say that again: we need to automate and use modules.   We have to move build times from weeks or days to hours or seconds.   We need to automate the life cycle of our services.   If we don’t, we will become the cat lady, not the future-oriented technology experts.


What do you think?

Did I just force you to lose a portion of your soul?   Where do you think IT is heading?  Share and let me know.   I definitely don’t have all the answers… either way, thanks for reading and letting me rant… I will return you to your regularly scheduled content.

Brocade Zoning via Scripting for FOS 7

About four years ago I wrote about how to do fiber channel zoning on Brocade switches using scripts.   The CLI on Brocade is really feature-rich but not well documented… most people use the GUI.     Well, the times have changed and so have the commands, so here is the super duper updated command set for FOS 7.   You can read the old post here.

Assume that we are making a single zone with an HBA and a storage system:

Storage_SPA  50:01:43:81:02:45:DE:47
Server_HBA  50:01:23:45:FE:34:52:12

Steps at a glance:

  1. Use alicreate to create aliases
  2. Use zonecreate to create zones
  3. Use zoneadd to add an additional alias
  4. Use cfgadd to add new zone to active set
  5. Use cfgsave to save active set
  6. Use cfgenable to enable set


Step 1: alicreate "NAME", "WWN"

alicreate "Storage_SPA", "50:01:43:81:02:45:DE:47"

alicreate "Server_HBA", "50:01:23:45:FE:34:52:12"


Step 2: zonecreate "NAME", "First_Alias"

zonecreate "Server_To_Storage_SPA", "Storage_SPA"


Step 3: zoneadd "NAME", "Second_Alias"

zoneadd "Server_To_Storage_SPA", "Server_HBA"

(repeat to add more aliases)


Step 4: cfgadd "Your_Config_Name", "Zone_Name"

cfgadd "production_cfg", "Server_To_Storage_SPA"


Step 5: cfgsave



Step 6: cfgenable Your_Config_Name

cfgenable production_cfg


You can also check your work with

zoneshow "Your_Zone_Name"
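If you have to zone more than a handful of servers, the steps above are easy to generate rather than type. Here is a small sketch (a hypothetical helper, not a Brocade tool) that builds the FOS 7 command sequence from a zone definition; you would still paste or SSH the output to the switch yourself:

```python
# Hypothetical helper: builds the FOS 7 zoning command sequence for a
# single zone from a dict of aliases. It only generates the text --
# running the commands on the switch is up to you.

def zoning_commands(cfg, zone, aliases):
    """aliases maps alias name -> WWN; cfg is the active config name."""
    cmds = []
    for name, wwn in aliases.items():
        cmds.append(f'alicreate "{name}", "{wwn}"')
    alias_names = list(aliases)
    # zonecreate takes the first alias; zoneadd appends the rest
    cmds.append(f'zonecreate "{zone}", "{alias_names[0]}"')
    for name in alias_names[1:]:
        cmds.append(f'zoneadd "{zone}", "{name}"')
    cmds.append(f'cfgadd "{cfg}", "{zone}"')
    cmds.append('cfgsave')
    cmds.append(f'cfgenable {cfg}')
    return cmds

for c in zoning_commands(
        "production_cfg",
        "Server_To_Storage_SPA",
        {"Storage_SPA": "50:01:43:81:02:45:DE:47",
         "Server_HBA": "50:01:23:45:FE:34:52:12"}):
    print(c)
```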


Thanks for reading

Update: @MartinMune2341 provided the link to the latest CLI reference guide here.  Thank you, sir.

Testing MTU in vSphere

Well, I have been playing around with VXLAN more than I care to admit.   It’s a painful process.  One key component of VXLAN is an increased MTU of 1600 in order to support the encapsulation.  You can verify that you don’t have an MTU issue the following way:

Log in to your ESXi host (I like SSH, but it’s up to you).

Identify the vmknic with your MTU settings:

esxcfg-vmknic -l

You should see a list of vmknics and their MTU settings.   Then check to make sure your local switch also has an MTU setting >= the vmknic setting:

esxcfg-vswitch -l

Check the MTU of the switch.   If everything looks OK you can use vmkping to send a packet.  Test basic connectivity first:

vmkping IP_Address_of_local_interface
vmkping IP_address_of_remote_interface

This should return with pings unless you are using 5.5 (see below for more 5.5 stuff).   If this fails you have basic connectivity issues: a firewall, a subnet mismatch, or some other layer 2 problem.  Now test for a 1600-byte packet.  vmkping does not account for the 28 bytes of IP and ICMP header overhead, so use a 1572-byte payload:

5.0 (-d means do not fragment, -s sets the payload size)

vmkping -d -s 1572 IP_Address_of_local_interface
vmkping -d -s 1572 IP_address_of_remote_interface

5.1 (-I allows you to identify the vmknic to use)
vmkping -I vmknic# -d -s 1572 IP_Address_of_local_interface
vmkping -I vmknic# -d -s 1572 IP_address_of_remote_interface

5.5 (this one is different: it actually shoots the packet out through the VXLAN network stack, a true test of VXLAN)

vmkping ++netstack=vxlan vmknic_IP -d -s 1572


esxcli network diag ping --netstack=vxlan --host vmknic_IP --df --size=1572
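The 1572 rule is just arithmetic on the header overhead. A quick sanity check, assuming the standard 20-byte IP header plus 8-byte ICMP header:

```python
# vmkping's -s flag sets the ICMP payload size; the IP header (20 bytes)
# and ICMP header (8 bytes) ride on top, so to exercise a full MTU you
# subtract 28 from the MTU you want to test.

IP_HEADER = 20
ICMP_HEADER = 8
OVERHEAD = IP_HEADER + ICMP_HEADER  # 28 bytes

def vmkping_size(mtu):
    """Payload size to pass to vmkping -s for a given target MTU."""
    return mtu - OVERHEAD

print(vmkping_size(1600))  # VXLAN transport MTU -> 1572
print(vmkping_size(9000))  # jumbo frames -> 8972
```

The same rule gives you 8972 when you are validating jumbo frames for storage traffic.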

Enjoy your testing and remember the 1572 rule.

2014 Top Virtualization Blogs


So it’s that time of year again: time to vote for your favorite VMware blogs.  This year I selfishly added myself… but there was a name snafu, so it ended up as voting for my name :)   You can read the results on the official site.  I just wanted to thank the 9 people who voted for me, and the one person who voted for me as number 1 (it was not me; I voted for Yellow-Bricks, as always).   Thanks again, and I promise to add more content this year.  I have a bit of a secret project, and when it’s done in about a month I will be back with lots and lots of posts around design.

Enjoy my favorite cat.

Radically simple networking design with VMware


VMware has lots of great options and features.  Filtering through all the best practices combined with legacy knowledge can be a real challenge.  I envy people starting with VMware now; they don’t carry knowledge of all the things that were broken in 3.5, 4.0, 4.1, etc.  It’s been a great journey, but you have to be careful not to let legacy knowledge influence the design of today.   In this design I will provide a radically simple solution to networking with VMware.


Design overview:

You have been given a VMware cluster running on HP blades.  Each blade has a total of 20Gb of potential bandwidth that can be divided any way you want.   You should make management of this solution easy and provide as much bandwidth as possible to each traffic type.  You have the following traffic types:

  • Management
  • vMotion
  • Fault Tolerance
  • Backup
  • Virtual machine

Your storage is fiber channel and not in scope for the network design.   Your chassis is connected to two upstream switches that are stacked.  You cannot configure the switches beyond assigning VLANs.


This design takes into account the following assumptions:

  • Etherchannel and LAG are not desired or available
  • You have Enterprise Plus licensing and vCenter

Physical NIC/switch Design:

We want a simple solution with maximum available bandwidth.  This means we should use two 10Gb NICs on our blades.   The connections to the switch for each NIC should be identical (exact same VLANs) and include the VLANs for management, FT, vMotion, backup, and all virtual machines, each with its own VLAN ID for security purposes.  This solution provides the following benefits:

  • Maximum bandwidth available to all traffic types
  • Easy configuration on the switch and NICs (identical configuration)

The one major drawback to this solution is that some environments require physical separation of traffic types onto dedicated NICs.

Virtual Switch Design:

On the virtual switch side we will use a vDS.  In the past there have been major concerns with using a vDS for management and vCenter.  There are a number of chicken-and-egg scenarios that come into play.   If you still have concerns, make the port group for vCenter ephemeral so it does not need vCenter to allocate ports.   Otherwise the vDS brings a lot to the table over standard switches, including:

  • Centralized consistent configuration
  • Traffic Shaping with NIOC
  • Load based teaming
  • Netflow
  • vDS automatic health check


Traffic Shaping:

The first thing to understand about traffic shaping with NIOC is that it only affects egress (outbound) traffic and is applied per host.   We use a numeric value known as a share to enforce traffic shaping.  By default these share values are only used during times of contention.   This ability allows you to ensure nothing uses 100% of a link while other neighbors want access to the link.   It is a unique and awesome feature that automates traffic policing in VMware solutions.  You can read about the default NIOC pools here.   I suggest you leave the default pools in place with their default values and then add a custom pool for backup.   Shares are assigned a value from 1 to 100.  Another design factor is that traffic types with no active traffic are not counted in the share calculation.   For example, assume the following share values: Management 10, FT 25, vMotion 25, and Virtual Machine 50.


You would assume that the total shares would be 10+25+25+50 = 110, but if no FT traffic is flowing then it’s 10+25+50 = 85.  Either way, the link bandwidth is divided by the total shares, so the worst-case scenario with contention across all traffic types would get the following:

  • Management (20/110 × 10) ≈ 1.8 Gb
  • FT (20/110 × 25) ≈ 4.5 Gb
  • vMotion (20/110 × 25) ≈ 4.5 Gb
  • Virtual machine (20/110 × 50) ≈ 9 Gb

And remember, this is per host.   You will want to adjust the default settings to fit your requirements and traffic patterns.
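The arithmetic above is easy to script. Here is a small sketch using the example share values (your pools and link speed will differ); it computes the worst-case bandwidth per pool under full contention, and shows how an idle pool drops out of the calculation:

```python
# Worst-case bandwidth per NIOC pool under full contention:
# each pool with active traffic gets (its shares / total active shares)
# of the link bandwidth.

def nioc_bandwidth(shares, link_gb, active=None):
    """shares: pool name -> share value; active: pools carrying traffic."""
    if active is None:
        active = shares.keys()
    total = sum(shares[p] for p in active)
    return {p: round(link_gb * shares[p] / total, 2) for p in active}

pools = {"Management": 10, "FT": 25, "vMotion": 25, "VM": 50}
print(nioc_bandwidth(pools, 20))
# With no FT traffic, only 85 shares are in play and everyone gets more:
print(nioc_bandwidth(pools, 20, active=["Management", "vMotion", "VM"]))
```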

This design has some real advantages:

  • The vMotion vmknic sits on a 10Gb uplink, which means you can do 8 concurrent vMotions at the same time
  • No more wasted bandwidth
  • Easy to setup and forget about

Load balancing:

Load balancing algorithms in vSphere each have their own personality and physical requirements.   But we want simple above everything else, so we choose Load Based Teaming (LBT), shown as “Route based on physical NIC load” in the vDS.  This is a great choice for Enterprise Plus customers.  When any one uplink reaches 75% utilization, some of the traffic is moved over to the next uplink.  This configuration will work with any number of uplinks without any configuration on the physical switch.  We avoid loops because a given virtual machine’s traffic does not span uplinks; for example, virtual machine 1 will use uplink1 exclusively while virtual machine 2 uses uplink2.   With this load balancing method we don’t have to assign different uplink priorities to port groups in order to balance traffic, just let LBT handle it.    It is 100% fire and forget.  If you find you need more bandwidth, just add more uplinks to the switch and you will be using them.
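To illustrate the idea behind LBT, here is a toy model (this is not VMware’s actual algorithm, and the threshold and migration choices are simplified): each VM’s traffic is pinned to one uplink, and when an uplink stays over the threshold a flow is moved to the least-loaded uplink.

```python
# Toy sketch of Load Based Teaming: each VM's traffic uses exactly one
# uplink; when an uplink exceeds the utilization threshold, move a flow
# to the least-loaded other uplink. Assumes at least two uplinks.
# Not VMware's real algorithm -- just the idea.

THRESHOLD = 0.75  # vDS moves flows when an uplink stays above ~75%

def rebalance(uplinks, capacity_gb):
    """uplinks: list of {vm_name: gbps} dicts, one per physical NIC."""
    for i, link in enumerate(uplinks):
        while sum(link.values()) / capacity_gb > THRESHOLD and len(link) > 1:
            # pick the other uplink with the lowest current load
            target = min((l for j, l in enumerate(uplinks) if j != i),
                         key=lambda l: sum(l.values()))
            vm = min(link, key=link.get)  # move the smallest flow
            target[vm] = link.pop(vm)
    return uplinks

links = [{"vm1": 6.0, "vm2": 3.0}, {"vm3": 1.0}]  # two 10Gb uplinks
print(rebalance(links, 10))  # vm2 ends up on the second uplink
```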

Radically simple networking

It’s simple and it works.  Here is a simple diagram of the solution:



Once set up it scales and provides for all your needs.   It’s consistent, clean, and designed around possible failures.  It allows all traffic types to use as much network as needed unless contention is present.   Just think of it as DRS for networking.   I just wish I could handle my physical switches this way… maybe someday with NSX.

VMware predefined NIOC settings what do they mean?

Recently I was setting up a new 5.5 cluster with NIOC and I noticed all the new pre-built NIOC categories:



Some are obvious but others are a little more questionable.  After a great discussion with VMware support I found out the following:

  • NFS traffic – Traffic using the NFS bindings in ESXi (not guest NFS traffic); only ESXi NFS traffic
  • Management traffic – ESXi management traffic only; connections between vCenter and ESXi
  • vMotion traffic – vMotion and heartbeats
  • vSphere Storage Area Network traffic – I had a lot of questions on this one, but it turned out to be simple: VSAN traffic only
  • vSphere Replication traffic – Traffic coming from the vSphere Replication appliance only; no other replication traffic
  • iSCSI traffic – As expected, traffic to ESXi that is iSCSI using a hardware or software initiator
  • Virtual machine traffic – Traffic out of guest virtual machines
  • Fault Tolerance traffic – Traffic specific to VMware FT

Those are all the predefined pools… but what if I create a user-defined pool and assign it to my NFS port group, so two pools could apply?  Which one wins?  Simple: the one with the larger share.

How does storage multipathing work?

Every week I spend some time answering questions on the VMware forums.  It also provides me great ideas for blog posts, just like this one.   It started with a simple question: how does multipathing work?   Along with a lot of well-thought-out specific questions.   I tried to answer them but figured it would be best with some diagrams and a blog post.    I will focus this post on fiber channel multipathing.  First, it’s important to understand that fiber channel is nothing more than L2 communication using frames to push SCSI commands.   Fiber channel switches are tuned to pass SCSI frames as fast as possible.

Types of Arrays

There are really three types of connectivity with fiber channel (FC) arrays:

  • Active/Active – I/O can be sent to a LUN via any of the array’s storage processors (SPs) and ports.  Normally this is implemented in larger arrays with lots of cache.  Writes are sent to the cache and then destaged to disk.   Since everything is delivered to cache, the SP and port do not matter.
  • Active/Passive – I/O is sent down to a single SP and port that owns the LUN.  If I/O is sent down any other path it is denied by the array.
  • Pseudo Active/Active – I/O can be sent down any SP and port, but there is an SP and port combination that owns the LUN.  Traffic sent to the owner of the LUN is much faster than traffic sent to non-owners.

The most common implementation of pseudo active/active is asymmetric logical unit access (ALUA), defined in the SCSI-3 protocol.  In ALUA the SP identifies the owner of a LUN with SCSI sense codes.

Access States

ALUA has a few possible access states for any SP/port combination:

  • Active/Optimized (AO) – the SP and port that own the LUN; the best possible path to use for performance
  • Active/Non-Optimized (ANO) – an SP and port that can be used to access the LUN, but slower than the AO path
  • Transitioning – the LUN is changing from one state to another and is not available for I/O; not used in most ALUA arrays now
  • Standby – not active but available; not used in most ALUA arrays now
  • Unavailable – SP and port not available

In an active/active array the following states exist:

  • Active – all SPs and ports should be in this state
  • Unavailable – SP and port not available

In an active/passive array the following states exist:

  • Active – the SP and port used to access the LUN (single owner)
  • Standby – SP and port available if the active one is gone
  • Transitioning – switching to Active or Standby

In ALUA arrays you also have target port groups (TPGs), which are SPs and ports that share a common state.  For example, all the ports on a single SP may form a TPG, since the LUN is owned by that SP.

How does your host know what the state is?

Great question.  Using SCSI commands, a host and array communicate state.   There are lots of commands in the standard.  I will show three management commands from ALUA arrays, since they are the most interesting:

  • Inquiry – ask a SCSI question
  • Report Target Port Groups – reports which TPG has the optimized path
  • Set Target Port Groups – asks the array to switch target port group ownership


This brings up some fun scenarios: who can initiate these commands, and when…  All of these assume an ALUA array.


So we have a server with two HBAs connected to SAN switches.  In turn, the SPs are connected to the SAN switches.  SPa owns LUN1 (AO) and SPb owns LUN2 (AO).



Consider the following failures:

  • HBA1 fails – assuming the pathing software in the OS is set correctly (more on this later), the operating system accesses LUN1 via the ANO path to SPb to continue to access storage.  Then it initiates a Set Target Port Groups command to SPb, asking it to take over LUN1.  This is fulfilled, and the array tells all known systems via Report Target Port Groups that they should use SPb as the AO path to LUN1.
  • SPa fails – assuming the pathing in the OS is good, access to LUN1 via SPa fails, and the OS fails over to SPb and initiates the LUN failover.

This is simplified just to show the interaction; in a real environment you would want SAN switches A and B both connected to SPa and SPb, if possible, for redundancy.
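The ownership handoff above can be sketched as a toy model (hypothetical names, not a real SCSI implementation): the array tracks which SP is Active/Optimized for each LUN, a Set Target Port Groups request moves ownership, and Report Target Port Groups reflects the change.

```python
# Toy model of the ALUA failover scenario: the array tracks which SP
# owns (is AO for) each LUN. Hypothetical class, not a SCSI library.

class AluaArray:
    def __init__(self, ownership):
        self.ownership = dict(ownership)  # LUN -> owning SP (the AO path)

    def report_target_port_groups(self, lun):
        """Which SP is currently Active/Optimized for this LUN."""
        return self.ownership[lun]

    def set_target_port_groups(self, lun, sp):
        """Host asks the array to move AO ownership (e.g. after an HBA failure)."""
        self.ownership[lun] = sp

san_array = AluaArray({"LUN1": "SPa", "LUN2": "SPb"})
# HBA1 fails: the host can only reach SPb, so it requests a takeover of LUN1
san_array.set_target_port_groups("LUN1", "SPb")
print(san_array.report_target_port_groups("LUN1"))  # SPb
```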

How does ESXi deal with paths?

ESXi has three possible path states:

  • Active
  • Standby
  • Dead – cable unplugged, bad connection, or switch failure

It will always try to access the LUN via any path available.

Why does path selection policy matter?

The path selection policy can make a huge difference.  For example, if you have an ALUA array you would not use the round robin path selection policy: doing so would cause at least half your I/Os to go down ANO paths, which would be slow.   ESXi supports three policies out of the box:

  • Fixed – honors the preferred (AO) path while it is available; most commonly used with ALUA arrays
  • Most Recently Used (MRU) – ignores the preferred path and uses the most recently used path until it’s dead (used with active/passive arrays)
  • Round Robin (RR) – sends a fixed number of I/Os or bytes down a path, then switches to the next path; ignores AO.  Normally used with active/active arrays

The number of I/Os or bytes sent before switching in RR is configurable, but defaults to 1000 I/Os and 10485760 bytes.
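The RR rotation is simple enough to sketch. Here is a toy model (not ESXi’s actual NMP code; the path names are illustrative) showing the default behavior of switching paths every 1000 I/Os:

```python
# Toy model of the Round Robin PSP: send a fixed number of I/Os down one
# path, then rotate to the next. ESXi defaults to 1000 I/Os per path and
# can also rotate on bytes transferred; this sketch only models I/O count.

class RoundRobin:
    def __init__(self, paths, iops_limit=1000):
        self.paths = paths
        self.iops_limit = iops_limit
        self.current = 0  # index of the path in use
        self.count = 0    # I/Os sent down the current path

    def next_path(self):
        """Return the path to use for the next I/O."""
        if self.count >= self.iops_limit:
            self.count = 0
            self.current = (self.current + 1) % len(self.paths)
        self.count += 1
        return self.paths[self.current]

rr = RoundRobin(["vmhba1:C0:T0:L1", "vmhba2:C0:T0:L1"])
used = [rr.next_path() for _ in range(2000)]
print(used[0], used[999], used[1000])  # rotates after 1000 I/Os
```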

Which policy should you use?  That depends on your storage array, and you should work with your vendor to understand their best practices.  In addition, a number of vendors have their own multipathing software that you should use (for example, EMC’s PowerPath).