How to add undue stress to your L3 switch without warning

Same L3 switch, new problem. Some vital stats for reference:

  • Cisco Catalyst 3750E
  • 250+ VLAN interfaces
  • 100+ Gig/10GE interfaces
  • OSPF
  • PBR is used to specify a different next-hop
  • The same route-map is applied to each interface so that the route-map only needs one corresponding access-list for the match clause.

Our monitoring alerted us to consistently high CPU load. We followed the checklist for troubleshooting 3750 CPU load here and didn't see anything that particularly caught our eye. Checking revision control (you have configuration revision control, don't you?), we found a commit that correlated perfectly with the increase in CPU utilisation. It turns out that Cisco's checklist was relevant after all, and this particular section explained it:

When configuring match criteria in a route map, follow these guidelines:
  • Do not match ACLs with deny ACEs. Packets that match a deny ACE are sent to the CPU, which could cause high CPU utilization.

The ACL in use for the route-map had four deny lines at the start, with a combined match count in the hundreds of millions.
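A check like this is easy to automate against the configs you already keep in revision control. What follows is only a rough sketch, not a real IOS parser: it assumes `show running-config` style text, handles just the plain `match ip address <acl>` form, and every name in it (`PBR-MATCH`, `MGMT`, `PBR-NEXTHOP`, the function names, the sample addresses) is made up for illustration. It lists the deny ACEs in any ACL that a route-map matches on.

```python
from __future__ import annotations

import re

# Hypothetical sample config; ACL and route-map names are invented.
SAMPLE_CONFIG = """\
ip access-list extended PBR-MATCH
 deny   ip any 10.0.0.0 0.255.255.255
 deny   ip any 192.168.0.0 0.0.255.255
 permit ip any any
!
ip access-list extended MGMT
 deny   ip any any
!
route-map PBR-NEXTHOP permit 10
 match ip address PBR-MATCH
 set ip next-hop 192.0.2.1
"""


def acls_matched_by_route_maps(config: str) -> set[str]:
    """Collect ACL names/numbers referenced by 'match ip address' clauses."""
    acls: set[str] = set()
    for line in config.splitlines():
        m = re.match(r"\s+match ip address\s+(.+)", line)
        if not m:
            continue
        tokens = m.group(1).split()
        if tokens[0] == "prefix-list":  # not an ACL reference
            continue
        acls.update(tokens)
    return acls


def deny_aces(config: str) -> dict[str, list[str]]:
    """Map each ACL (named or numbered) to its deny entries."""
    found: dict[str, list[str]] = {}
    current = None  # named ACL whose indented entries we are reading
    for line in config.splitlines():
        stripped = " ".join(line.split())  # normalise whitespace
        if not line[:1].isspace():
            # Top-level line: leave any ACL block we were in.
            current = None
            named = re.match(r"ip access-list (?:standard|extended) (\S+)", stripped)
            numbered = re.match(r"access-list (\d+) (deny\b.*)", stripped)
            if named:
                current = named.group(1)
            elif numbered:
                found.setdefault(numbered.group(1), []).append(numbered.group(2))
        elif current and re.match(r"(?:\d+ )?deny\b", stripped):
            found.setdefault(current, []).append(stripped)
    return found


def risky_pbr_acls(config: str) -> dict[str, list[str]]:
    """Deny ACEs in ACLs that a route-map match clause refers to."""
    matched = acls_matched_by_route_maps(config)
    return {acl: aces for acl, aces in deny_aces(config).items() if acl in matched}


if __name__ == "__main__":
    for acl, aces in risky_pbr_acls(SAMPLE_CONFIG).items():
        print(f"{acl}: {len(aces)} deny ACE(s) in a route-map match ACL")
        for ace in aces:
            print(f"  {ace}")
```

In the sample, `MGMT` also contains a deny line but is not reported, because no route-map matches on it. Run against each commit, something along these lines would have flagged our change before the CPU graph did.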

Lesson learned: when you think you've read all of the best practice guides, there'll be another one you haven't had to read until something goes wrong.

VMware ESX vCPU: Cores vs Sockets

Simply put, by default each vCPU you configure a guest with is presented to the guest OS as a socket. Various benchmarks suggest there's no performance difference between (for instance) a VM with four single-core sockets and a VM with one four-core socket, which is fine and dandy.

Unless you use fancy software that is licensed per CPU socket.

This post has a great explanation, with anecdotal evidence that you can fix this, albeit experimentally, in ESX 3.5u2 and beyond.

Also, something I found today that is the (now second-) most useful thing I'll find all day: