VMkernel types updated with design guidance for multi-site

Holy crap, what do all these VMware VMkernel types mean?  I started this article and then realized I had already written one here.  It's sad when Google leads you to something you wrote… looks like I don't remember too well.  Perhaps I should just go yell at the kids to get off my lawn now.  I wanted to take a minute to revise my post with some new things I have learned and some guidance.


From my previous post:

  • vMotion traffic – Required for vMotion – moves the state of virtual machines (active data disk for svMotion, active memory, and execution state) during a vMotion
  • Provisioning traffic – Not required; will use the management network if not set up – cold migration, cloning, and snapshot creation (powered-off virtual machines = cold)
  • Fault tolerance traffic (FT) – Required for FT – enables fault tolerance traffic on the host; only a single adapter may be used for FT per host
  • Management traffic – Required – management traffic for the host and vCenter Server
  • vSphere replication traffic – Only needed if using vSphere Replication – outgoing replication data from the ESXi host to the vSphere Replication server
  • vSphere replication NFC traffic – Only needed if using vSphere Replication – handles incoming replication data on the target replication site
  • Virtual SAN – Required for VSAN – Virtual SAN traffic on the host
  • VXLAN – Used for NSX; not controlled from the add-VMkernel-interface wizard

I wanted to provide a little better explanation around design elements for some of these interfaces.  Specifically I want to focus on vMotion and Provisioning traffic.  Let's create a few scenarios and see which interface is used, assuming I have all the VMkernel interfaces listed above:

  1. VM1 is running and we want to migrate from host1 to host2 at datacenter1 – vMotion
  2. VM1 is running with a snapshot and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (if it does not exist management network is used)
  3. VM1 is running with a snapshot and we want to storage migrate from host1 DC1 to host4 DC3 – storage vMotion – Provisioning traffic (if it does not exist management network is used)
  4. VM1 is not running and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (very low bandwidth used)
  5. VM1 is not running and has a snapshot and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (very low bandwidth used)
  6. VM2 is being created at datacenter1 – Provisioning traffic

 


So, design guidance: in a multi-site implementation you should have the following interfaces if you wish to separate the TCP/IP stacks or use Network I/O Control to avoid noisy-neighbor situations.  (Or you could just assign it all to the management vmk and go nuts on that one interface = bad idea.)

  • Management
  • vMotion
  • Provisioning

Use of the other VMkernel interfaces depends on whether you are using replication, vSAN, or NSX.

Should you have multi-NIC vMotion?

Multi-NIC vMotion enables faster evacuation of multiple virtual machines off a host (as long as they don't have snapshots).  It is still a good idea if you have large VMs or lots of VMs on a host.

Should you have multi-NIC Provisioning?

I have no idea if it's even supported, let alone a good idea.  The Provisioning network is used for long-distance vMotion, so the idea might have merit… but I would not use it today.

Should IT build a castle or a mobile home?

So I have many hobbies to keep my mind busy during idle times… like when driving a car.  One of my favorite hobbies is identifying the best candidate locations to live in if the zombie apocalypse were to happen.  As I drive between locations I see many different buildings, and I attempt to rate the large ones by their zombie-proof nature.  There are many things to consider in the perfect zombie defense location, for example:

  • Avoiding buildings with large amounts of windows or first floor windows
  • Buildings made of materials that cannot be bludgeoned open, for example stone
  • More than one exit but not too many exits
  • A location that can be defended on all sides and allows visible approach

There are many other considerations, like proximity to water and food, but basically I am looking for the modern equivalent of a castle.

OK, what does this have to do with IT?

Traditional infrastructure is architected like a castle: its primary goal is to secure the perimeter and be very imposing to keep people out.  During a zombie attack this model is great until they get in; then it becomes a graveyard.  IT architects, myself included, spend a lot of time considering all the factors that are required to build the perfect castle.  There are considerations like:

  • Availability
  • Recoverability
  • Manageability
  • Performance
  • Security

All of these have to be considered, and as you add another wing to your castle every one of these design elements must be reconsidered for the whole castle.  We cannot add a new wing that bridges the moat without extending the moat, and so on.  Our drive to build the perfect castle has created a monolithic drag.  While development teams move from annual releases to quarters or weeks or days, we continue to attempt to control the world from a perimeter-design perspective.  If we could identify all possible additions to the castle at the beginning, we could potentially account for them.  This was true in the castle days: there were only so many ways into a castle and so many methods to break in.  Even worse, the castle provided lots of nooks for zombies to hide in and attack me when I least expected it.  This is the challenge with a zombie attack: they don't follow the rules; they just might create a ladder out of zombie bodies and get into your castle (World War Z style).  If we compare zombies to the challenges being thrown at IT today, the story holds up.  How do we deal with constant change and the unknown?  How do we become agile in the face of change?  Is it by building a better castle?

Introducing the mobile home


Today I realized that the perfect solution to my zombie question was the mobile home.  We can all assume that I need a place to sleep: something I can secure with reasonable assurance.  I can reinforce the walls and windows of a mobile home, and I gain something I don't have with a castle: mobility.  I can move my secured location and goods to new locations.  My mobile home is large enough to provide for my needs without offering too many places for zombies to hide.  IT needs this type of mobility.  Cloud has provided faster time to market for many enterprises, but in reality you are only renting space in someone else's castle.  There are all types of methods to secure your valuables from mine, but in reality we are at the mercy of the castle owner.  What if my service could become a secured mobile home?  That would provide the agility I need in the long run.  The roach motel is alive and well in cloud providers today: many providers have no cross-provider capabilities, while others provide tools to transform the data between formats.  My mobile home needs to be secure and not reconfigured each time I move between locations while looking for resources or avoiding attack.  We need to reconsider IT as a secured mobile home and start to build this model.  Some functions to consider in my mobile home:

  • Small enough to provide the required functions (bathroom, kitchen and sleeping space or in IT terms business value) and not an inch larger than required
  • Self-contained security that encircles the service
  • Mobility without interruption of services

Thanks for reading my rant.  Please feel free to provide your favorite zombie hiding location or your thoughts on the future of IT.

 

Breaking out a SSO/PSC to enable enhanced linked mode

Single sign-on used to be a fairly painless portion of vCenter (once we got to 5.5; in 5.0 it was a major pain).  It was essentially a lightweight directory (vsphere.local) and a gateway to Active Directory.  The Platform Services Controller (PSC) of vCenter 6 is a completely different animal.  It performs a lot of new functions that are not easy to transfer between instances.  For example, the PSC does the following:

  • Handles and stores SSL certificates
  • Handles and stores license keys
  • Handles and stores permissions via global permissions layer
  • Handles and stores replication of Tags and Categories
  • Built-in automatic replication between different sites

Why does it do all this and why do I care?

Well, VMware has come to understand that virtual machines cannot be bound to a specific location; more and more customers want hybrid and multi-site capabilities while keeping the same management.  A lot of the management functions are based around tags and permissions, so having an overarching layer to provide that functionality is huge.  I assume that we are going to see more features pushed up to the PSC layer in order to make cross-site / cross-vCenter features available.

Architectural change

In 6.0 VMware changed the architecture to make external PSCs the preferred mode of operation.  In fact they support up to 8 replicated PSCs, and there are two constructs that matter:

  • Domain (traditionally this has been vsphere.local)
  • Sites (Physical locations)

Site designation changes how the PSCs, which are multi-masters, replicate (replication goes to a single instance at each site, and that instance then replicates to its local nodes).

The change to external PSCs is a challenge for many users.  First, let me be clear about one constraint: you can only have one domain, and merging domains is not supported.  Once you get to 6 you cannot leave a domain and join a different one; I have not seen instructions for doing it and it does not seem to be supported.  In 5 you can leave an SSO domain and join a different one, so if you are still on 5 and wish to join multiple machines to the same domain, do it while on 5 using SSO.  If you wish to move from an embedded PSC to an external PSC, the process is pretty simple:

  1. Install a new PSC (can be windows or Linux) joined to the embedded PSC
  2. Repoint the vCenter to the new PSC (instructions here)
  3. Remove the old PSC

The key takeaway, for all of you who might have dozed off during this article, is this: make any topology changes to vCenter domains before upgrading to 6.

 

Long Distance Cross vCenter vMotion requirements

The ability to move running virtual machines long distances between two datacenters seems like the key example of the power of abstraction.  VMware has enabled this feature, but it has a number of requirements that make the cost of ownership a little high.  All of these requirements are listed in VMware KB articles, but you have to mine them for the details to ensure you are compatible.  Having recently been stung by these requirements, I thought I would collect them into a single location.

Assumptions:

The following assumptions are made:

  • You are running two vCenters one at each site
  • You are running virtual distributed switches at each site

KB Articles mined for the data

Requirements

  • The source and destination vCenter server instances and ESXi hosts must be running version 6.0 or later.
  • Requires Enterprise Plus licensing
  • When initiating the moves in the web client both source and destination vCenter instances must be in Enhanced Linked mode and in the same vCenter Single Sign-On domain (When using API this is not a requirement)
  • Both vCenter Servers must be time synced for SSO to work
  • For migration of compute resources only, both ESXi hosts must be connected to the shared virtual machine storage.
  • When using the vSphere APIs/SDK, both vCenter Server instances may exist in separate vSphere Single Sign-On domains. Additional parameters are required when performing a non-federated cross vCenter Server vMotion.
  • MAC addresses must not conflict (different vCenter IDs will ensure this)
  • vMotion cannot take place from distributed switch to standard switch
  • vMotion cannot take place between distributed switches of different versions (source and destination vDS must be the same version)
  • RTT (round-trip time) latency of 150 milliseconds or less, between hosts
  • You must create a routable network for cold migration traffic (the Provisioning network from the VMkernel types)
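The list above can be summarized as a toy pre-flight check (illustrative only; the names, the lexicographic version compare, and the simplified switch model are my own):

```python
def cross_vc_vmotion_ok(src_ver: str, dst_ver: str, rtt_ms: float,
                        src_switch: str, dst_switch: str,
                        same_vds_version: bool) -> bool:
    """Toy check mirroring the requirement list above."""
    if src_ver < "6.0" or dst_ver < "6.0":        # both ends need 6.0+
        return False
    if rtt_ms > 150:                              # RTT must be <= 150 ms
        return False
    if src_switch == "vds" and dst_switch == "vss":
        return False                              # no vDS -> standard switch
    if src_switch == "vds" and dst_switch == "vds" and not same_vds_version:
        return False                              # vDS versions must match
    return True
```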

 

These requirements can really bite you if you are not careful.  Notice there is no constraint on vMotioning from a standard switch to a distributed switch, which helps you get around version differences.  The truth is that vMotion is a miracle of engineering, and cross-vCenter vMotion is an even better miracle, but it comes at a cost.  Essentially, in the best case, you have to have two vCenters in Enhanced Linked Mode, on the same version of ESXi, with the same hardware type or EVC, and with the same version of distributed switches.  That's a lot of asks to enable the feature, and something to consider if you're planning on using long-distance cross-vCenter vMotion.

 

Configuring a NSX load balancer from API

A customer asked me this week if there were any examples of configuring the NSX load balancer via vRealize Automation.  I was surprised when Google didn't turn up any.  The NSX API guide (which is one of the best guides around) provides the details for how to call each element; you can download it here.  Once you have the PDF you can navigate to page 200, which is the start of the load balancer section.

Too many Edge devices

NSX load balancers are Edge services gateways.  A normal NSX environment may have a few Edges while others may have hundreds, but not all of them are load balancers.  A quick API lookup of all Edges provides this information (my NSX Manager is 192.168.10.28, hence its usage in all examples):
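The screenshots from the original post do not reproduce well here, but the lookup itself is just an authenticated GET; a sketch using only the Python standard library (credentials are placeholders, and the request is built but not sent):

```python
import base64
import urllib.request

NSX_MGR = "192.168.10.28"  # the NSX Manager used throughout this post

def get_request(path: str, user: str = "admin", password: str = "password"):
    """Build (but do not send) an authenticated GET against NSX Manager."""
    req = urllib.request.Request(f"https://{NSX_MGR}{path}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# Lists every Edge deployed in the environment (a pagedEdgeList document)
all_edges = get_request("/api/4.0/edges")
```

Passing the request to `urllib.request.urlopen` (with certificate handling for NSX Manager's self-signed certificate) returns the XML inventory of Edges.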

 

This is for a single Edge gateway; in my case I have 57 Edges deployed over the life of my NSX environment and 15 active right now, but only edge-57 is a load balancer.  This report does not provide anything that can be used to distinguish a load balancer Edge from a firewall Edge.  In order to identify whether it's a load balancer I have to query its load balancer configuration using:
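The query in question (per the NSX-v API guide's load balancer section) is the Edge-specific load balancer config endpoint; a minimal sketch of the URL:

```python
# Load balancer configuration endpoint for one specific Edge (NSX-v API)
EDGE_ID = "edge-57"
url = f"https://192.168.10.28/api/4.0/edges/{EDGE_ID}/loadbalancer/config"
```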

Notice the addition of the edge-57 name to the query.   It returns:

Notice that this Edge has the load balancer enabled as true, with some default monitors.  To compare, here is an Edge without the feature enabled:

Enabled is false with the same default monitors.  So now we know how to identify which Edges are load balancers:

  • Get the list of all Edges via the API and pull out the id element
  • Query each id element for its load balancer config and match on enabled = true
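That check can be sketched in a few lines; the XML here is abbreviated to just the element that matters (real responses also carry the monitors and more):

```python
import xml.etree.ElementTree as ET

def is_load_balancer(config_xml: str) -> bool:
    """True when an Edge's load balancer config has <enabled>true</enabled>."""
    root = ET.fromstring(config_xml)
    return (root.findtext("enabled") or "").strip() == "true"

# Abbreviated stand-ins for the two responses shown above
lb_edge = "<loadBalancer><enabled>true</enabled></loadBalancer>"
fw_edge = "<loadBalancer><enabled>false</enabled></loadBalancer>"
```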

Adding virtual servers

You can add virtual servers, assuming the application profile and pools are already in place, with a POST command and an XML body payload like this (the virtual server IP must already be assigned to the Edge as an interface):
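The payload screenshot does not reproduce here; below is a minimal example body, with names, IDs, and addresses that are placeholders (field names per the NSX-v API guide), POSTed to `/api/4.0/edges/edge-57/loadbalancer/config/virtualservers`:

```xml
<virtualServer>
    <name>web-vip</name>
    <enabled>true</enabled>
    <ipAddress>192.168.10.200</ipAddress>
    <protocol>http</protocol>
    <port>80</port>
    <applicationProfileId>applicationProfile-1</applicationProfileId>
    <defaultPoolId>pool-1</defaultPoolId>
</virtualServer>
```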


You can see it’s been created.  A quick query:

 

Shows it's been created.  To delete, just take the virtualServerId and pass it to a DELETE call.

 

Pool Members

For pools you have to PUT the full configuration to add a backend member, or for that matter to remove one.  So you first query it:

Then you form your PUT with the data elements you need (taken from the API guide).
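The read-modify-write can be sketched with the standard library; the pool XML below is abbreviated and the member field names follow the API guide (real pool configs carry the algorithm, monitors, and more):

```python
import xml.etree.ElementTree as ET

def add_member(pool_xml: str, name: str, ip: str, port: int) -> str:
    """Append a backend member to a pool config fetched via GET;
    the returned document is what you PUT back to the pool endpoint."""
    pool = ET.fromstring(pool_xml)
    member = ET.SubElement(pool, "member")
    ET.SubElement(member, "name").text = name
    ET.SubElement(member, "ipAddress").text = ip
    ET.SubElement(member, "port").text = str(port)
    return ET.tostring(pool, encoding="unicode")

# Abbreviated pool document as returned by the GET above
existing = "<pool><poolId>pool-1</poolId><name>web-pool</name></pool>"
updated = add_member(existing, "web02", "192.168.10.32", 80)
```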

In the client we see a member added:


Tie it all together

Each of these actions has corresponding update, delete, and query functions.  The real challenge is taking the API inputs and turning them into user-friendly vRealize inputs.  NSX continues to amaze me as a great product with a very powerful and well-documented API.  I have run into very few issues trying to figure out how to do anything in NSX with the API.  In a future post I may provide some vRealize Orchestrator actions to speed up configuration of load balancers.

vSphere 6.5 features that are exciting to me

Well, yesterday VMware announced vSphere 6.5 and VSAN 6.5; both are huge leaps forward in technology.  They address some major challenges my customers face, and I wanted to share a few features that I think are awesome:

vSphere 6.5

  • High availability in the vCenter appliance – If you wanted a reason to switch to the appliance, this has to be it… for years I have asked for high availability for vCenter, and now we have it.  I look forward to testing and blogging about failure scenarios with this new version.  This has to be my #1 ask for the platform over the last three years!  Note we are not talking about VMware HA; we are talking about active/standby appliances.
  • VM Encryption – Notice this is a feature of vSphere, not VSAN – this is huge: the hypervisor can encrypt virtual machines at rest and while being vMotioned.  This is a huge enabler for public cloud, allowing you to ensure your data is secure with your own encryption keys.  This is going to make a lot of compliance folks happy and enable some serious hybrid cloud.
  • Integrated Containers – A Docker-compatible interface for containers in vSphere, allowing you to spawn stateless containers while enforcing security, compliance, and monitoring with vSphere tools (NSX, etc.) – this allows you to run traditional and next-generation applications side by side.

VSAN 6.5

  • iSCSI support – VSAN will be able to act as an iSCSI target for physical workloads, e.g. SQL Server failover clustering and Oracle RAC.  This is huge: VSAN can now be an iSCSI server with easy policy-based management and scalable performance.

There are a lot more announcements, but these features are just awesome.  You can read more about vSphere 6.5 here and VSAN 6.5 here.

vRO scriptable task to return top level folder of a VM

Every so often you have nested folders in a vCenter and want to return only the top-level folder.  Here is a function to return just the top-level folder:
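The logic is just a walk up the parent chain until you reach the datacenter's root VM folder.  A Python sketch of that walk-up logic (a vRO scriptable task would express the same in JavaScript against the inventory objects; `parent_of` and `root_folder` are hypothetical stand-ins for the inventory's parent references):

```python
def top_level_folder(vm, parent_of, root_folder):
    """Walk up the folder chain and return the folder directly under
    the datacenter's root VM folder, or None if the VM sits in the
    root itself.  `parent_of` maps each object to its parent folder."""
    node = parent_of.get(vm)
    if node == root_folder:
        return None  # VM lives directly in the root folder
    while parent_of.get(node) != root_folder:
        node = parent_of[node]  # keep climbing (assumes an intact chain)
    return node
```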