
OVS Orbit podcast with Ben Pfaff

OVS Orbit Episode 6 is a wide ranging discussion between Ben Pfaff and Peter Phaal covering the industry standard sFlow measurement protocol, the implementation of sFlow in Open vSwitch, and the network analytics use cases and application areas supported by sFlow, including: OpenStack, Open Network Virtualization (OVN), DDoS mitigation, ECMP load balancing, Elephant and Mice flows, Docker containers, Network Function Virtualization (NFV), and microservices.

Follow the link to listen to the podcast, read the extensive show notes, follow related links, and subscribe to the podcast.

Internet of Things (IoT) telemetry

The internet of things (IoT) is the network of physical objects—devices, vehicles, buildings and other items—embedded with electronics, software, sensors, and network connectivity that enables these objects to collect and exchange data. - ITU

The recently released Raspberry Pi Zero (costing $5) is an example of the type of embedded low power computer enabling IoT. These small devices are typically wired to one or more sensors (measuring temperature, humidity, location, acceleration, etc.) and embedded in or attached to physical devices.

Collecting real-time telemetry from large numbers of small devices that may be located within many widely dispersed administrative domains poses a number of challenges, for example:
  • Discovery - How are newly connected devices discovered?
  • Configuration - How can the numerous individual devices be efficiently configured?
  • Transport - How efficiently are measurements transported and delivered?
  • Latency - How long does it take before measurements are remotely accessible? 
This article will use the Raspberry Pi as an example to explore how the architecture of the industry standard sFlow protocol and its implementation in the open source Host sFlow agent provide a method of addressing the challenges of embedded device monitoring.

The following steps describe how to install the Host sFlow agent on Raspbian Jessie (the Debian Linux based Raspberry Pi operating system).
sudo apt-get update
sudo apt-get install libpcap-dev
git clone https://github.com/sflow/host-sflow
cd host-sflow
make
sudo make install
The resulting Host sFlow binary is extremely small (only 163,300 bytes in this case):
pi@raspberrypi:~ $ ls -l /usr/sbin/hsflowd 
-rwx------ 1 root root 163300 Jun 1 17:18 /usr/sbin/hsflowd
Next, create the /etc/hsflowd.conf file for the device:
sflow {
  agent = eth0
  agent.cidr = ::/0
  DNSSD = on
  DNSSD_domain = .sf.inmon.com
  jsonPort = 36343
  pcap { dev = eth0 }
}
There are a number of important points to note about this configuration:
  • The configuration is not device specific - this same configuration can be pre-loaded in every device.
  • Prefer IPv6 addresses as a way of identifying the agent since they are more likely to be globally unique.
  • DNS Service Discovery (DNS-SD) is used to retrieve dynamic configuration on startup and to periodically refresh the configuration. Hosting a single copy of the configuration (in the form of SRV and TXT records on the DNS server responsible for the sf.inmon.com domain) minimizes the complexity of managing large numbers of devices.
  • Network visibility provides a way to monitor the interactions between the devices on the network. The pcap entry enables a Berkeley Packet Filter to efficiently sample network traffic using instrumentation built into the Linux kernel.
  • Custom Metrics can be sent along with the extensive set of standard sFlow metrics by including the jsonPort entry.
Now start the daemon:
sudo /etc/init.d/hsflowd start
Now add an entry to the sf.inmon.com.zone file on the DNS server:
_sflow._udp   30  SRV     0 0 6343  collector.sf.inmon.com.
In this case, the SRV record specifies that sFlow records should be sent via UDP to collector.sf.inmon.com on port 6343. The TTL is set to 30 seconds so that agents will pick up any changes within 30 seconds. A larger TTL should be used to improve scalability if there are large numbers of devices.
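Additional agent settings can be distributed in the same way by publishing a TXT record alongside the SRV record. The following entry is an illustrative sketch only; it assumes TXT options that mirror hsflowd.conf settings such as the sampling and polling rates:
_sflow._udp   30  TXT     (
"txtvers=1"
"sampling=400"
"polling=20"
)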

The following example shows how Custom Metrics can be used to export sensor data. The temp.py script exports the CPU temperature:
#!/usr/bin/env python

import json
import socket

# read the CPU temperature (millidegrees C) and convert to degrees C
tempC = int(open('/sys/class/thermal/thermal_zone0/temp').read()) / 1e3
msg = {
  "rtmetric": {
    "datasource": "sensors",
    "tempC": { "type": "gaugeFloat", "value": tempC }
  }
}
# send the custom metric to the local Host sFlow agent's jsonPort
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
sock.sendto(json.dumps(msg).encode('utf-8'),("127.0.0.1",36343))
The following crontab entry runs the script every minute:
* * * * * /home/pi/temp.py
The Host sFlow agent will automatically pick up the configuration via a DNS request, start making measurements, and immediately send them in standard sFlow UDP datagrams to the designated sFlow collector, collector.sf.inmon.com. sFlow's immediate transmission of measurements minimizes the memory requirements on the agent (since data doesn't have to be stored for later retrieval) and minimizes the latency before measurements are accessible on the collector (and can be acted on).

It should also be noted that all communication is initiated by the device (DNS requests and transmission of telemetry via sFlow). This means that the radio on the device can be powered down between transmissions to save power (and extend battery life if the device is battery powered).
Raspberry Pi real-time network analytics describes how to build a low cost sFlow analyzer using a Raspberry Pi Model 3 B and sFlow-RT real-time analytics software. The following command queries the sFlow-RT REST API to show the set of standard metrics being exported by the agent (2001:470:67:27d:d811:aa7e:9e54:30e9):
pi@raspberrypi:~ $ curl http://localhost:8008/metric/2001:470:67:27d:d811:aa7e:9e54:30e9/json
{
"2.1.bytes_in": 3211.5520193372945,
"2.1.bytes_out": 462.2822036458858,
"2.1.bytes_read": 0,
"2.1.bytes_written": 4537.818511431161,
"2.1.contexts": 7006.546480008057,
"2.1.cpu_guest": 0,
"2.1.cpu_guest_nice": 0,
"2.1.cpu_idle": 99.17638114546376,
"2.1.cpu_intr": 0,
"2.1.cpu_nice": 0,
"2.1.cpu_num": 4,
"2.1.cpu_sintr": 0.025342118601115054,
"2.1.cpu_speed": 0,
"2.1.cpu_steal": 0,
"2.1.cpu_system": 0.456158134820071,
"2.1.cpu_user": 0.3294475418144957,
"2.1.cpu_utilization": 0.8236188545362393,
"2.1.cpu_wio": 0.012671059300557527,
"2.1.disk_free": 24435570688,
"2.1.disk_total": 29627484160,
"2.1.disk_utilization": 17.523977160453796,
"2.1.drops_in": 0,
"2.1.drops_out": 0,
"2.1.errs_in": 0,
"2.1.errs_out": 0,
"2.1.host_name": "raspberrypi",
"2.1.icmp_inaddrmaskreps": 0,
"2.1.icmp_inaddrmasks": 0,
"2.1.icmp_indestunreachs": 0,
"2.1.icmp_inechoreps": 0,
"2.1.icmp_inechos": 0,
"2.1.icmp_inerrors": 0,
"2.1.icmp_inmsgs": 0,
"2.1.icmp_inparamprobs": 0,
"2.1.icmp_inredirects": 0,
"2.1.icmp_insrcquenchs": 0,
"2.1.icmp_intimeexcds": 0,
"2.1.icmp_intimestamps": 0,
"2.1.icmp_outaddrmaskreps": 0,
"2.1.icmp_outaddrmasks": 0,
"2.1.icmp_outdestunreachs": 0,
"2.1.icmp_outechoreps": 0,
"2.1.icmp_outechos": 0,
"2.1.icmp_outerrors": 0,
"2.1.icmp_outmsgs": 0,
"2.1.icmp_outparamprobs": 0,
"2.1.icmp_outredirects": 0,
"2.1.icmp_outsrcquenchs": 0,
"2.1.icmp_outtimeexcds": 0,
"2.1.icmp_outtimestampreps": 0,
"2.1.icmp_outtimestamps": 0,
"2.1.interrupts": 4438.56380300131,
"2.1.ip_defaultttl": 64,
"2.1.ip_forwarding": 2,
"2.1.ip_forwdatagrams": 0,
"2.1.ip_fragcreates": 0,
"2.1.ip_fragfails": 0,
"2.1.ip_fragoks": 0,
"2.1.ip_inaddrerrors": 0,
"2.1.ip_indelivers": 9.165072011280088,
"2.1.ip_indiscards": 0,
"2.1.ip_inhdrerrors": 0,
"2.1.ip_inreceives": 9.215429549803606,
"2.1.ip_inunknownprotos": 0,
"2.1.ip_outdiscards": 0,
"2.1.ip_outnoroutes": 0,
"2.1.ip_outrequests": 2.1653741565112297,
"2.1.ip_reasmfails": 0,
"2.1.ip_reasmoks": 0,
"2.1.ip_reasmreqds": 0,
"2.1.ip_reasmtimeout": 0,
"2.1.load_fifteen": 0.05,
"2.1.load_fifteen_per_cpu": 0.0125,
"2.1.load_five": 0.02,
"2.1.load_five_per_cpu": 0.005,
"2.1.load_one": 0,
"2.1.load_one_per_cpu": 0,
"2.1.machine_type": "arm",
"2.1.mem_buffers": 52133888,
"2.1.mem_cached": 383287296,
"2.1.mem_free": 238026752,
"2.1.mem_shared": 0,
"2.1.mem_total": 970506240,
"2.1.mem_used": 297058304,
"2.1.mem_utilization": 30.608591437339783,
"2.1.os_name": "linux",
"2.1.os_release": "4.4.9-v7+",
"2.1.page_in": 0,
"2.1.page_out": 2.2157316950347465,
"2.1.part_max_used": 31.74,
"2.1.pkts_in": 11.582233860408904,
"2.1.pkts_out": 2.266089233558264,
"2.1.proc_run": 1,
"2.1.proc_total": 237,
"2.1.read_time": 0,
"2.1.reads": 0,
"2.1.swap_free": 104853504,
"2.1.swap_in": 0,
"2.1.swap_out": 0,
"2.1.swap_total": 104853504,
"2.1.tcp_activeopens": 0,
"2.1.tcp_attemptfails": 0,
"2.1.tcp_currestab": 6,
"2.1.tcp_estabresets": 0,
"2.1.tcp_incsumerrs": 0,
"2.1.tcp_inerrs": 0,
"2.1.tcp_insegs": 2.568234464699365,
"2.1.tcp_maxconn": 4294967295,
"2.1.tcp_outrsts": 0,
"2.1.tcp_outsegs": 1.913586463893645,
"2.1.tcp_passiveopens": 0,
"2.1.tcp_retranssegs": 0,
"2.1.tcp_rtoalgorithm": 1,
"2.1.tcp_rtomax": 120000,
"2.1.tcp_rtomin": 200,
"2.1.udp_incsumerrors": 0,
"2.1.udp_indatagrams": 6.596837546580723,
"2.1.udp_inerrors": 0,
"2.1.udp_noports": 0,
"2.1.udp_outdatagrams": 0.2517876926175849,
"2.1.udp_rcvbuferrors": 0,
"2.1.udp_sndbuferrors": 0,
"2.1.uptime": 46603,
"2.1.uuid": "22c4ce8c-067e-4517-8c00-8d822efc4897",
"2.1.write_time": 8.333333333333334,
"2.1.writes": 0.6042904622822036,
"2.ifadminstatus": "up",
"2.ifdirection": "full-duplex",
"2.ifindex": "2",
"2.ifindiscards": 0,
"2.ifinerrors": 0,
"2.ifinoctets": 3241.1101037574294,
"2.ifinpkts": 11.483831973405863,
"2.ifinucastpkts": 11.483831973405863,
"2.ifinutilization": 0.025928880830059432,
"2.ifname": "eth0",
"2.ifoperstatus": "up",
"2.ifoutdiscards": 0,
"2.ifouterrors": 0,
"2.ifoutoctets": 406.6686813740304,
"2.ifoutpkts": 1.8636043114737586,
"2.ifoutucastpkts": 1.8636043114737586,
"2.ifoututilization": 0.0032533494509922435,
"2.ifspeed": 100000000,
"2.iftype": "ethernetCsmacd",
"sensors.tempC": 49.388
}
Note the custom temperature metric at the end of the list.
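Since the custom metric appears in the same metric dump, it can also be retrieved programmatically. The following Python sketch, run on the collector, polls the REST endpoint shown above and prints an alert when the temperature exceeds a threshold (the 60 degree threshold is an illustrative value):
#!/usr/bin/env python
import requests

rt = 'http://localhost:8008'
agent = '2001:470:67:27d:d811:aa7e:9e54:30e9'
max_tempC = 60

# fetch the full set of metrics for the agent
metrics = requests.get(rt + '/metric/' + agent + '/json').json()
tempC = metrics.get('sensors.tempC')
if tempC is not None and tempC > max_tempC:
  print('temperature alarm: %s C' % tempC)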

In addition, enabling traffic monitoring in the Host sFlow agent provides detailed flow information along with the metrics, giving visibility into interactions between the devices on the network. In this case the wired Ethernet interface (eth0) is being monitored, but monitoring the wireless interface (wlan0) would be a way to gain visibility into messages exchanged over an ad-hoc wireless mesh network connecting devices. RESTflow describes how to perform flow analytics using sFlow-RT.

In conclusion, sFlow provides a standard way to export metrics and traffic information. Most network equipment vendors already provide sFlow support and the technology has a number of architectural features that are well suited to addressing the challenges of extending visibility to and gathering telemetry from large scale IoT deployments.

Streaming telemetry

The OpenConfig project has been getting a lot of attention lately. A number of large network operators, led by Google, are developing "a consistent set of vendor-neutral data models (written in YANG) based on actual operational needs from use cases and requirements from multiple network operators."

The OpenConfig project extends beyond configuration, "Streaming telemetry is a new paradigm for network monitoring in which data is streamed from devices continuously with efficient, incremental updates. Operators can subscribe to the specific data items they need, using OpenConfig data models as the common interface."

Anees Shaikh's Network Field Day talk provides an overview of OpenConfig and includes an example that demonstrates how configuration and state are combined in a single YANG data model. In the example, read/write config attributes used to configure a network interface (name, description, MTU, operational state) are combined with the state attributes needed to verify the configuration (MTU, name, description, oper-status, last-change) and collect metrics (in-octets, in-ucast-pkts, in-broadcast-pkts, ...).

Anees positions the OpenConfig streaming telemetry mechanism as an attractive alternative to polling for metrics using the Simple Network Management Protocol (SNMP) - see Push vs Pull for a detailed comparison between pushing (streaming) and pulling (polling) metrics.

Streaming telemetry is not unique to OpenConfig. Industry standard sFlow is a streaming telemetry alternative to SNMP that has seen rapid vendor adoption over the last decade. Drivers for growth discusses how the rise of merchant silicon and virtualization have accelerated adoption of sFlow, particularly in data centers.

                | sFlow                     | OpenConfig Telemetry            | SNMP
Organization    | sFlow.org                 | OpenConfig.net                  | IETF
Users           | General Purpose           | Large Service Providers         | General Purpose
Scope           | Data Plane, Control Plane | Management Plane, Control Plane | Control Plane
Vendor Support  | 40+ (see sFlow.org)       | 1 (Cisco IOS XR)                | Near universal
Models          | structure definitions     | YANG models                     | Management Information Base
Encoding        | XDR (RFC 4506)            | protobufs, JSON, NetConf        | ASN.1
Transport       | UDP                       | UDP, HTTP                       | UDP
Mode            | Push                      | Push                            | Pull

The table compares sFlow and OpenConfig Telemetry. There are a number of similarities: sFlow and OpenConfig are both driven by participation based organizations that publish standards to ensure multi-vendor interoperability, and both push standard sets of metrics using standard encodings over widely supported transport protocols.

However, important differences result from OpenConfig's exclusive focus on large service provider configuration and monitoring requirements. Telemetry is tied to the hierarchical YANG configuration models, making it easy to correlate operational and configured state, but limiting the scope of monitoring to the management and control planes of devices that are configured using OpenConfig.

In contrast, sFlow is a management and control plane agnostic monitoring technology (i.e. a device may be configured using CLI, NetConf, JSON RPC, OpenConfig, etc. and use any control plane: OpenFlow, BGP, OSPF, TRILL, spanning tree, etc.). In addition, sFlow is primarily concerned with the data and control planes, i.e. capturing information about packets and forwarding actions.
The article, Management, control and data planes in network devices and systems, by Ivan Pepelnjak, provides background on data, control, and management plane terminology.
Gathering data plane telemetry requires hardware support and merchant silicon vendors (Broadcom, Cavium/XPliant, Intel/Fulcrum, Marvell etc.) include sFlow instrumentation in their switching/routing ASICs. Embedded hardware support allows sFlow to efficiently stream standardized data plane telemetry from all devices in a large multi-vendor network.

To conclude, sFlow and OpenConfig shouldn't be viewed as competing technologies. Instead, their complementary capabilities and shared architectural model make it easy to combine sFlow and OpenConfig into an integrated management solution that unifies monitoring and control of the management, control, and data planes.

Docker networking with IPVLAN and Cumulus Linux

Macvlan and Ipvlan Network Drivers are being added as Docker networking options. The IPVlan L3 Mode shown in the diagram is particularly interesting since it dramatically simplifies the network by extending routing to the hosts and eliminating switching entirely.

Eliminating the complexity associated with switching broadcast domains, VLANs, spanning tree, etc. allows a purely routed network to be easily scaled to very large sizes. However, there are some challenges to overcome:
IPVlan will require routes to be distributed to each endpoint. The driver only builds the Ipvlan L3 mode port and attaches the container to the interface. Route distribution throughout a cluster is beyond the initial implementation of this single host scoped driver. In L3 mode, the Docker host is very similar to a router starting new networks in the container. They are on networks that the upstream network will not know about without route distribution.
Cumulus Networks has been working to simplify routing in the ECMP leaf and spine networks and the white paper Routing on the Host: An Introduction shows how the routing configuration used on Cumulus Linux can be extended to the hosts.
This article explores the combination of Cumulus Linux networking with Docker IPVLAN using a simple test bed built using free software: VirtualBox, CumulusVX switches, and Ubuntu 16.04 servers. This setup should result in a simple, easy to manage, easy to monitor networking solution for Docker since all the switches and servers will be running Linux, allowing the same routing, monitoring, and orchestration software to be used throughout.
Using Cumulus VX with VirtualBox and Creating a Two-Spine, Two-Leaf Topology provide detailed instructions on building and configuring the leaf and spine network shown in the diagram. However, BGP was configured as the routing protocol instead of OSPF (see BGP configuration made simple with Cumulus Linux). For example, the following commands configure BGP on leaf1:
interface swp1
ipv6 nd ra-interval 10
link-detect
!
interface swp2
ipv6 nd ra-interval 10
link-detect
!
interface swp3
ipv6 nd ra-interval 10
link-detect
!
router bgp 65130
bgp router-id 192.168.0.130
bgp bestpath as-path multipath-relax
neighbor EBGP peer-group
neighbor EBGP remote-as external
neighbor swp1 interface peer-group EBGP
neighbor swp1 capability extended-nexthop
neighbor swp2 interface peer-group EBGP
neighbor swp2 capability extended-nexthop
neighbor swp3 interface peer-group EBGP
neighbor swp3 capability extended-nexthop
Auto-configured IPv6 link local addresses dramatically simplify the configuration of equal cost multi-path (ECMP) routing, eliminating the need to assign IP addresses and subnets to the routed switch ports. The simplified configuration is easy to template, all the switches have a very similar configuration, and an orchestration tool like Puppet, Chef, Ansible, Salt, etc. can be used to automate the process of configuring the switches.
Two Ubuntu 16.04 hosts were created, attached to leaf1 and leaf2 respectively. Each server has two network adapters: enp0s3 connected to the out of band network used to manage the switches and enp0s8 connected to the respective leaf switch.
Why the strange Ethernet interface names (e.g. enp0s3 instead of eth0)? They are the result of the predictable network interface names mechanism that is the default in Ubuntu 16.04. I can't say I am a fan: predictable interface names are difficult to read, push the problem of device naming up the stack, and make it difficult to write portable orchestration scripts.
Quagga is used for BGP routing on the Cumulus Linux switches and on the Ubuntu hosts. The host BGP configurations are virtually identical to the switch configurations, but with an added redistribute stanza to automatically advertise locally attached addresses and subnets:
redistribute connected route-map IPVLAN
The IPVLAN route-map is used to control the routes that are advertised, limiting them to the range of addresses that have been allocated to the IPVLAN orchestration system. Route filtering is an example of the flexibility that BGP brings as a routing protocol: top of rack switches can filter routes advertised by the hosts to protect the fabric from misconfigured hosts, and hosts can be configured to selectively advertise routes.
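As an illustration, the route-map might be built from a prefix-list covering the allocated range. The following Quagga snippet is a sketch using an assumed 172.16.0.0/16 allocation, not the exact configuration used in this test bed:
ip prefix-list IPVLAN seq 10 permit 172.16.0.0/16 le 24
!
route-map IPVLAN permit 10
 match ip address prefix-list IPVLAN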

At this point, configuring routes between the hosts is easy. Configure a network on host1:
user@host1:~$ sudo ip address add 172.16.134.1/24 dev enp0s8
And a route immediately appears on host2:
user@host2:~$ ip route
default via 10.0.0.254 dev enp0s3 onlink
10.0.0.0/24 dev enp0s3 proto kernel scope link src 10.0.0.135
172.16.134.0/24 via 169.254.0.1 dev enp0s8 proto zebra metric 20 onlink
Add a network to host2:
user@host2:~$ sudo ip address add 172.16.135.1/24 dev enp0s8
And it appears on host1:
cumulus@host1:~$ ip route
default via 10.0.0.254 dev enp0s3 onlink
10.0.0.0/24 dev enp0s3 proto kernel scope link src 10.0.0.134
172.16.134.0/24 dev enp0s8 proto kernel scope link src 172.16.134.1
172.16.135.0/24 via 169.254.0.1 dev enp0s8 proto zebra metric 20 onlink
Connectivity across the leaf and spine fabric can be verified with a ping test:
user@host2:~$ ping 172.16.134.1
PING 172.16.134.1 (172.16.134.1) 56(84) bytes of data.
64 bytes from 172.16.134.1: icmp_seq=1 ttl=61 time=2.60 ms
64 bytes from 172.16.134.1: icmp_seq=2 ttl=61 time=2.59 ms
Ubuntu 16.04 was selected as the host operating system since it has built-in IPVLAN support. The Docker Experimental Features distribution, which includes the IPVLAN Docker networking plugin, is installed on the two hosts.

The following command creates a Docker IPVLAN network on host1:
user@host1:~$ docker network create -d ipvlan --subnet=172.16.134.0/24 \
-o parent=enp0s8 -o ipvlan_mode=l3 ipv3
Note: Route(s) to the IPVLAN network would be automatically distributed if the above command had an option to attach the subnets to the parent interface.
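The settings of the newly created network (subnet, parent interface, and IPVLAN mode) can be confirmed with Docker's inspect command (output omitted here):
user@host1:~$ docker network inspect ipv3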

The following commands start a container attached to the network and show the network settings as seen by the container:
user@host1:~$ docker run --net=ipv3 -it --rm alpine /bin/sh
/ # ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:70:FA:B5
inet addr:172.16.134.2 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe70:fab5%32582/64 Scope:Link
UP BROADCAST RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:2 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1%32582/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

/ # ip route
default dev eth0
172.16.134.0/24 dev eth0 src 172.16.134.2
Connectivity between containers attached to the same IPVLAN can be verified by starting a second container and performing a ping test:
user@host1:~$ docker run --net=ipv3 -it --rm alpine /bin/sh
/ # ping 172.16.134.2
PING 172.16.134.2 (172.16.134.2): 56 data bytes
64 bytes from 172.16.134.2: seq=0 ttl=64 time=0.085 ms
64 bytes from 172.16.134.2: seq=1 ttl=64 time=0.077 ms
Unfortunately, connecting containers to the IPVLAN network interferes with the auto-configured IPv6 link local BGP connection between host1 and leaf1, which results in host1 being disconnected from the leaf and spine fabric, and losing connectivity to host2. Assigning static IPv4 addresses to leaf1 and host1 complicates the configuration but solves the problem. For example, here is the Quagga BGP configuration on host1:
router bgp 65134
bgp router-id 192.168.0.134
redistribute connected route-map NON-MGMT
neighbor 192.168.1.130 remote-as 65130
Now it is possible to connect to the containers remotely from host2:
user@host2:~$ ping 172.16.134.2
PING 172.16.134.2 (172.16.134.2) 56(84) bytes of data.
64 bytes from 172.16.134.2: icmp_seq=1 ttl=61 time=2.72 ms
64 bytes from 172.16.134.2: icmp_seq=2 ttl=61 time=2.79 ms
Now that we have end to end connectivity, how can we monitor traffic?
The open source Host sFlow agent can be installed on the switches and Docker hosts to provide traffic visibility throughout the fabric.
The sFlow-RT real-time analytics software can be installed on a host, or spun up in a container using a simple Dockerfile:
FROM   centos:centos6
RUN yum install -y wget
RUN yum install -y java-1.7.0-openjdk
RUN wget http://www.inmon.com/products/sFlow-RT/sflow-rt.tar.gz
RUN tar -xzf sflow-rt.tar.gz
EXPOSE 8008 6343/udp
CMD ./sflow-rt/start.sh
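For example, assuming the Dockerfile above is saved in the current directory, the image can be built and a container started with commands along the following lines (the image name is arbitrary):
user@host1:~$ docker build -t sflow-rt .
user@host1:~$ docker run -d -p 8008:8008 -p 6343:6343/udp sflow-rt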
Writing Applications provides an overview of sFlow-RT's REST API. The following Python script demonstrates end to end visibility by reporting the specific path that a large "Elephant" flow takes across the leaf and spine fabric:
#!/usr/bin/env python
import requests
import json

rt = 'http://127.0.0.1:8008'

# define a flow that records the switch/host and the IP TTL of sampled packets
flow = {'keys':'node:inputifindex,ipsource,ipdestination,ipttl','value':'bytes'}
requests.put(rt+'/flow/elephant/json',data=json.dumps(flow))

# trigger an event when a flow exceeds 100Mbit/s (expressed in bytes per second)
threshold = {'metric':'elephant','value':100000000/8,'byFlow':True,'timeout':600}
requests.put(rt+'/threshold/elephant/json',data=json.dumps(threshold))

# tail the event log and print the flow keys for each event
eventurl = rt+'/events/json?thresholdID=elephant&maxEvents=10&timeout=60'
eventID = -1
while 1 == 1:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script and generating a large flow from host1 (172.16.135.1) to an IPVLAN connected container on host2 (172.16.134.2) gives the following output:
$ ./elephant.py 
spine2,172.16.135.1,172.16.134.2,62
leaf1,172.16.135.1,172.16.134.2,61
host2,172.16.135.1,172.16.134.2,64
leaf2,172.16.135.1,172.16.134.2,63
host1,172.16.135.1,172.16.134.2,61
The last field is the IP TTL (time to live). Ordering the events by TTL shows the path across the network (since the TTL is decremented by each switch along the path):
host2 -> leaf2 -> spine2 -> leaf1 -> host1

Monitoring resources on the physical switches is critical to large scale IPVLAN deployments since the number of routes must be kept within the table sizes supported by the hardware. Broadcom ASIC table utilization metrics, DevOps, and SDN describes how sFlow exports hardware routing table metrics.

The use of Cumulus Linux greatly simplified the configuration of the ECMP fabric and allowed a common set of routing, monitoring, and orchestration software to be used for both hosts and switches. A common set of tools dramatically simplifies the task of configuring, testing, and managing end to end Docker networking.

Finally, end to end IP routing using the new Docker IPVLAN networking plugin is very promising. Eliminating overlays and bridging improves scalability, reduces operational complexity, and facilitates automation using a mature routing control plane (BGP).

Merchant silicon based routing, flow analytics, and telemetry

Drivers for growth describes how switches built on Broadcom merchant silicon ASICs dominate the current generation of data center switches, reduce hardware costs, and support an open ecosystem of switch operating systems (Cumulus Linux, OpenSwitch, Dell OS10, Broadcom FASTPATH, Pica8 PicOS, Open Network Linux, etc.).

The router market is poised to be similarly disrupted with the introduction of devices based on Broadcom's Jericho ASIC, which has the capacity to handle over 1 million routes in hardware (the full Internet routing table is currently around 600,000 routes).
An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive. - Leaving Fixed Function Switches Behind For Universal Leafs
Broadcom Jericho ASICs are currently available in Arista 7500R/7280R routers and in Cisco NCS 5000 series routers. Expect further disruption to the router market when white box versions of the 1U router hardware enter the market.
There was general enthusiasm for Broadcom Jericho based routers in a recent discussion on the North American Network Operators' Group (NANOG) mailing list, Arista Routing Solutions, so merchant silicon based routers should be expected to sell well.
The Broadcom Jericho ASICs also include hardware instrumentation to support industry standard sFlow traffic monitoring and streaming telemetry. For example, the following commands enable sFlow on all ports on an Arista router:
sflow source 170.1.1.1
sflow destination 170.1.1.11
sflow polling-interval 30
sflow sample 65535
sflow run
See EOS System Configuration Guide for details.
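Once enabled, the settings can be checked from the EOS CLI, for example with the following command (output will vary by release; see the configuration guide):
switch# show sflow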

Cisco supports standard sFlow on its merchant silicon based switch platforms - see Cisco adds sFlow support, Cisco adds sFlow support to Nexus 9K series, and Cisco SF250, SG250, SF350, SG350, SG350XG, and SG550XG series switches. Unfortunately, IOS XR on Cisco's Jericho based routers doesn't yet support sFlow. Instead, a complex set of commands is required to configure Cisco's proprietary NetFlow and streaming telemetry protocols:
RP/0/RP0/CPU0:router#config
RP/0/RP0/CPU0:router(config)#flow exporter-map exp1
RP/0/RP0/CPU0:router(config-fem)#version v9
RP/0/RP0/CPU0:router(config-fem-ver)#options interface-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#options sampler-table timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template data timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#template options timeout 300
RP/0/RP0/CPU0:router(config-fem-ver)#exit
RP/0/RP0/CPU0:router(config-fem)#transport udp 12515
RP/0/RP0/CPU0:router(config-fem)#source Loopback0
RP/0/RP0/CPU0:router(config-fem)#destination 170.1.1.11
RP/0/RP0/CPU0:router(config-fmm)#exit
RP/0/RP0/CPU0:router(config)#flow monitor-map MPLS-IPv6-fmm
RP/0/RP0/CPU0:router(config-fmm)#record mpls ipv6-fields labels 3
RP/0/RP0/CPU0:router(config-fmm)#exporter exp1
RP/0/RP0/CPU0:router(config-fmm)#cache entries 10000
RP/0/RP0/CPU0:router(config-fmm)#cache permanent
RP/0/RP0/CPU0:router(config-fmm)#exit
RP/0/RP0/CPU0:router(config)#sampler-map FSM
RP/0/RP0/CPU0:router(config-sm)#random 1 out-of 65535
RP/0/RP0/CPU0:router(config-sm)# exit
And further commands are needed to enable monitoring on each interface (and there can be a large number of interfaces given the high port density of these routers):
RP/0/RP0/CPU0:router(config)#interface HundredGigE 0/3/0/0
RP/0/RP0/CPU0:router(config-if)#flow mpls monitor MPLS-IPv6-fmm sampler FSM ingress
See Netflow Configuration Guide for Cisco NCS 5500 Series Routers, IOS XR Release 6.0.x for configuration details and limitations.

We are still not done; further steps are required to enable the equivalent of sFlow's streaming telemetry.

Create a policy file defining the counters to export:
{
  "Name": "Test",
  "Metadata": {
    "Version": 25,
    "Description": "This is a sample policy",
    "Comment": "This is the first draft",
    "Identifier": "data that may be sent by the encoder to the mgmt stn"
  },
  "CollectionGroups": {
    "FirstGroup": {
      "Period": 30,
      "Paths": [
        "RootOper.InfraStatistics.Interface(*).Latest.GenericCounters"
      ]
    }
  }
}
Copy the policy file to the router:
$ scp Test.policy cisco@170.1.1.1:/telemetry/policies
Finally, configure the JSON encoder:
Router# configure
Router(config)#telemetry encoder json
Router(config-telemetry-json)#policy group FirstGroup
Router(config-policy-group)#policy Test
Router(config-policy-group)#destination ipv4 170.1.1.11 port 5555
Router(config-policy-group)#commit
See Cisco IOS XR Telemetry Configuration Guide for details.
Software defined analytics describes how the sFlow architecture disaggregates the flow analytics pipeline and integrates telemetry export to reduce complexity and increase flexibility. The reduced configuration complexity is clearly illustrated by the two configuration examples above.

Unlike the complex and disparate monitoring mechanisms in IOS XR, sFlow offers a simple, flexible and unified monitoring solution that exposes the full monitoring capabilities of the Broadcom Jericho ASIC. Expect a future release of IOS XR to add sFlow support, since sFlow is a natural fit for the hardware capabilities of Jericho based router platforms and the addition of sFlow support will provide feature parity with Cisco's merchant silicon based switches.

Finally, the real-time visibility provided by sFlow supports a number of important use cases for high performance routers, including:
  • DDoS mitigation
  • Load balancing ECMP paths
  • BGP route analytics
  • Traffic engineering
  • Usage based accounting
  • Enforcing usage quotas

Programmable hardware: Barefoot Networks, PISA, and P4

Barefoot Networks recently came out of stealth to reveal their  Tofino 6.5Tbit/second (65 X 100GE or 260 X 25GE) fully user-programmable switch. The diagram above, from the talk Programming The Network Data Plane by Changhoon Kim of Barefoot Networks, shows the Protocol Independent Switch Architecture (PISA) of the programmable switch silicon.
A logical switch data-plane described in the P4 language is compiled to program the general purpose PISA hardware. For example, the following P4 code snippet is part of a P4 sFlow implementation:
table sflow_ing_take_sample {
  /* take_sample > MAX_VAL_31 and valid sflow_session_id => take the sample */
  reads {
    ingress_metadata.sflow_take_sample : ternary;
    sflow_metadata.sflow_session_id : exact;
  }
  actions {
    nop;
    sflow_ing_pkt_to_cpu;
  }
}
Network visibility is one of the major use cases for P4 based switches. Improving Network Monitoring and Management with Programmable Data Planes describes how P4 can be used to collect information about latency and queueing in the switch forwarding pipeline.
The document also describes an architecture for In-band Network Telemetry (INT) in which the ingress switch is programmed to insert a header containing measurements to packets entering the network. Each switch in the path is programmed to append additional measurements to the packet header. The egress switch is programmed to remove the header so that the packet can be delivered to its destination. The egress switch is responsible for processing the measurements or sending them on to analytics software.

In-band telemetry is an interesting example of the flexibility provided by P4 programmable hardware and the detailed information that can be gathered about latency and queueing from the hardware forwarding pipeline. However, there are practical issues that should be considered with this approach:
  1. Transporting measurement headers is complex and a different encapsulation is needed for each transport protocol: Geneve, VxLAN, etc.
  2. Addition of headers increases the size of packets and risks causing traffic to be dropped downstream due to maximum transmission unit (MTU) restrictions.
  3. The number of measurements that can be added by each switch and the number of switches adding measurements in the path needs to be limited.
  4. In-band telemetry cannot be incrementally deployed. Ideally, all devices need to participate, or at a minimum, the ingress and egress devices need to be in-band telemetry aware.
  5. In-band telemetry transports data from the data plane to the control/management planes, providing a potential attack surface that could be exploited by crafting malicious packets with fake measurement headers.
The sFlow architecture provides an out of band alternative for transporting the per packet forwarding plane measurements defined by INT. Instead of adding the measurements to the egress packet, measurements can be attached as metadata to the sampled packets that are handed to the switch CPU. The sFlow agent immediately forwards the additional packet metadata as part of the standard sFlow telemetry stream to a centralized collector. Using sFlow as the telemetry transport has a number of benefits:
  1. Simple to deploy since there is no modification of packets (no issues with encapsulations, MTU, number of measurements, path length, incremental deployment, etc.)
  2. Extensibility of sFlow protocol allows additional forwarding plane measurements to augment standard sFlow measurements, fully integrating the new measurements with sFlow data exported from other switches in the network (sFlow is supported by over 40 switch vendors and is a standard feature of switch ASICs).
  3. sFlow is a unidirectional telemetry transport protocol that originates from the device management plane and can be sent out of band, limiting possible attack surfaces.
The great thing about programmable hardware is that behavior can be modified by changing the software. Implementing out of band telemetry is a matter of combining measurements from the P4 INT code with the P4 sFlow agent code. Compiling and installing out of band sFlow telemetry code reprograms the hardware to implement the new scheme.

The advent of P4 and programmable hardware opens up exciting possibilities for defining additional packet sample metadata, counters, and gauges to augment the sFlow telemetry stream and gain additional insight into the performance of production traffic in large scale, high capacity, networks.

The real-time sFlow streaming telemetry can be used to drive automated controls, for example, to block DDoS attacks, or load balance large "Elephant" flows across multiple paths. Here again, P4 combined with programmable hardware makes it possible to create additional control capabilities. For example, to block the large numbers of source addresses involved in a DDoS attack, sFlow analytics would be used to identify the attackers and their points of ingress, and each switch would be programmed with filters based on its location in the network. The ability to customize the hardware to address specific tasks makes more efficient use of hardware resources than is possible with fixed function devices. In this case, defining a specialized DDoS drop table would allow for a much larger number of filters than would be possible with a general purpose ACL table.

Cisco Tetration analytics

Cisco Tetration Analytics: the most Comprehensive Data Center Visibility and Analysis in Real Time, at Scale, June 15, 2016, announced the new Cisco Tetration Analytics platform. The platform collects telemetry from proprietary agents on servers and embedded in hardware on certain Nexus 9k switches, analyzes the data, and presents results via Web GUI, REST API, and as events.

Cisco Tetration Analytics Data Sheet describes the hardware requirements:
Platform Hardware                                   | Quantity
Cisco Tetration Analytics computing nodes (servers) | 16
Cisco Tetration Analytics base nodes (servers)      | 12
Cisco Tetration Analytics serving nodes (servers)   | 8
Cisco Nexus 9372PX Switches                         | 3

And the power requirements:
Property                                                                      | Cisco Tetration Analytics Platform
Peak power for Cisco Tetration Analytics Platform (39-RU single-rack option)  | 22.5 kW
Peak power for Cisco Tetration Analytics Platform (39-RU dual-rack option)    | 11.25 kW per rack (22.5 kW total)

No pricing is given, but based on the hardware, data center space, power and cooling requirements, this brute force approach to analytics will be reassuringly expensive to purchase and operate.
A much less expensive alternative is to use industry standard sFlow agents embedded in Cisco Nexus 9k/3k switches and in switches from over 40 other vendors. The open source Host sFlow agent extends visibility to servers and applications by streaming telemetry from Linux, Windows, FreeBSD, Solaris, and AIX operating systems, hypervisors, Docker containers, web servers (Apache, NGINX, Tomcat, HAproxy) and Java application servers.

The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure, delivers analytics through APIs, and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.

Minimizing cost of visibility describes why lightweight monitoring is critical to realizing the value that telemetry can bring to improving operational efficiency. In the case of the sFlow based solution, the critical data path instrumentation is built into the switch ASICs and in the Linux kernel, ensuring that there is negligible impact on operational performance.

The sFlow-RT analytics software shown in the diagram provides real-time (sub second) visibility for 5,000 unique end points (virtual machines or bare metal servers), the upper limit of scalability in the Tetration data sheet, using a single virtual machine or Docker container with 4 GBytes of RAM and 4 CPU cores. With additional memory and CPU the solution easily scales to 100,000 unique end points.
How can sFlow provide real-time visibility at scale and consume so few resources? Shrink ray describes how advanced statistical techniques are used to select and analyze measurements that capture the essential features of network and system performance. A statistical approach yields fast, accurate answers, while minimizing the resources required to measure, transport and analyze the data.
The sFlow-RT analytics platform was selected as an example because of the overlap in capabilities with the Cisco Tetration analytics platform. However, sFlow is non-proprietary and there are many other open source and commercial sFlow analytics solutions listed on sFlow.org.

The Cisco press release states, "Available in July 2016, the first Tetration platform will be a full rack appliance that is deployed on-premise at the customer’s data center." On the other hand, the sFlow based solution described here is available today and can be installed and running in minutes on a virtual machine or in a Docker container.

Configuring OpenSwitch

The following configuration enables sFlow monitoring of all interfaces on a white box switch running the OpenSwitch operating system, sampling packets at 1-in-4096, polling counters every 20 seconds and sending the sFlow to an analyzer (10.0.0.50) on UDP port 6343 (the default sFlow port):
switch(config)# sflow collector 10.0.0.50
switch(config)# sflow sampling 4096
switch(config)# sflow polling 20
switch(config)# sflow enable
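If the collector at 10.0.0.50 is running sFlow-RT, receipt of the telemetry can be quickly verified by listing the agents that have been heard from (this assumes the /agents/json REST endpoint, the JSON counterpart of sFlow-RT's agents page):
curl http://10.0.0.50:8008/agents/json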
A previous posting discussed the selection of sampling rates.  Additional information can be found in the OpenSwitch sFlow User Guide.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Real-time BGP route analytics

The diagram shows how sFlow-RT real-time analytics software can combine BGP route information and sFlow telemetry to generate route analytics. Merging sFlow traffic with BGP route data significantly enhances both data streams:
  1. sFlow real-time traffic data identifies active BGP routes
  2. BGP path attributes are available in flow definitions
The following example demonstrates how to configure sFlow / BGP route analytics. In this example, the switch IP address is 10.0.0.253, the router IP address is 10.0.0.254, and the sFlow-RT address is 10.0.0.162.

Setup

First download sFlow-RT. Next create a configuration file, bgp.js, in the sFlow-RT home directory with the following contents:
var reflectorIP  = '10.0.0.254';
var myAS = '65162';
var myID = '10.0.0.162';
var sFlowAgentIP = '10.0.0.253';

// allow BGP connection from reflectorIP
bgpAddNeighbor(reflectorIP,myAS,myID);

// direct sFlow from sFlowAgentIP to reflectorIP routing table
// calculate a 60 second moving average byte rate for each route
bgpAddSource(sFlowAgentIP,reflectorIP,60,'bytes');
The following sFlow-RT System Properties load the configuration file and enable BGP:
  • script.file=bgp.js
  • bgp.start=yes
Start sFlow-RT and the following log lines will confirm that BGP has been enabled and configured:
$ ./start.sh 
2016-06-28T13:14:34-0700 INFO: Listening, BGP port 1179
2016-06-28T13:14:35-0700 INFO: Listening, sFlow port 6343
2016-06-28T13:14:35-0700 INFO: Starting the Jetty [HTTP/1.1] server on port 8008
2016-06-28T13:14:35-0700 INFO: Starting com.sflow.rt.rest.SFlowApplication application
2016-06-28T13:14:35-0700 INFO: Listening, http://localhost:8008
2016-06-28T13:14:36-0700 INFO: bgp.js started
2016-06-28T13:14:36-0700 INFO: bgp.js stopped
Configure the switch (10.0.0.253) to send sFlow to the sFlow-RT instance (10.0.0.162); see Switch configurations for vendor specific configurations. Check the sFlow-RT /agents/html page to verify that sFlow telemetry is being received from the agent.

Next, configure the router (10.0.0.254) to reflect BGP routes to the sFlow-RT instance (10.0.0.162):
router bgp 65254
bgp router-id 10.0.0.254
neighbor 10.0.0.162 remote-as 65162
neighbor 10.0.0.162 port 1179
neighbor 10.0.0.162 timers connect 30
neighbor 10.0.0.162 route-reflector-client
neighbor 10.0.0.162 activate
The following sFlow-RT log entry confirms that a BGP session has been established:
2016-06-28T13:20:17-0700 INFO: BGP open 10.0.0.254 53975

Query active routes

The following cURL command uses the REST API to identify the top 5 IPv4 prefixes ranked by traffic (measured in bytes/second):
curl "http://10.0.0.162:8008/bgp/topprefixes/10.0.0.254/json?maxPrefixes=5
{
  "as": 65254,
  "direction": "destination",
  "id": "10.0.0.254",
  "learnedPrefixesAdded": 691838,
  "learnedPrefixesRemoved": 0,
  "nPrefixes": 691838,
  "pushedPrefixesAdded": 0,
  "pushedPrefixesRemoved": 0,
  "startTime": 1467322582093,
  "state": "established",
  "topPrefixes": [
    {
      "aspath": "NNNN-NNNN-NNNNN-NNNNN",
      "localpref": 100,
      "med": 1,
      "nexthop": "NNN.NNN.NNN.N",
      "origin": "IGP",
      "prefix": "NN.NNN.NN.0/24",
      "value": 9.735462342126082E7
    },
    {
      "aspath": "NNN-NNNN",
      "localpref": 100,
      "med": 1,
      "nexthop": "NNN.NNN.NNN.N",
      "origin": "IGP",
      "prefix": "NN.NNN.NNN.0/24",
      "value": 7.347515546153101E7
    },
    {
      "aspath": "NNNN-NNNNNN-NNNNN",
      "localpref": 100,
      "med": 1,
      "nexthop": "NNN.NNN.NNN.N",
      "origin": "IGP",
      "prefix": "NN.NNN.NN.N/24",
      "value": 4.26137765317916E7
    },
    {
      "aspath": "NNNN-NNNN-NNNN",
      "localpref": 100,
      "med": 1,
      "nexthop": "NNN.NNN.NNN.N",
      "origin": "IGP",
      "prefix": "NNN.NN.NNN.0/24",
      "value": 2.6633190792947102E7
    },
    {
      "aspath": "NNNN-NNN-NNNNN",
      "localpref": 100,
      "med": 10001,
      "nexthop": "NNN.NNN.NNN.NN",
      "origin": "IGP",
      "prefix": "NN.NNN.NNN.0/24",
      "value": 1.5500941476103483E7
    }
  ],
  "valuePercentCoverage": 71.38452058755995,
  "valueTopPrefixes": 2.55577687683634E8,
  "valueTotal": 3.5802956380458355E8
}
In addition to returning the top prefixes, the query returns information about the amount of traffic covered by these prefixes. In this case, the valuePercentCoverage of 71.38 indicates that 71.38% of the traffic is covered by the top 5 prefixes.
Note: Identifying numeric digits have been substituted with the letter N to protect privacy.
Additional arguments can be used to refine the top prefixes query:
  • maxPrefixes, maximum number of prefixes in the result 
  • minValue, only include entries with a value greater than the threshold
  • direction, specify "ingress" for traffic arriving from remote networks and "egress" for traffic destined for remote networks
  • minPrefix, exclude shorter prefixes, e.g. minPrefix=1 would exclude 0.0.0.0/0.
  • includeCovered, set to "true" to also include prefixes that are covered by the top prefix, but wouldn't otherwise make the list. For example, if 10.1.0.0/16 was included, then 10.1.3.0/24 would also be included if it were in the set of prefixes advertised by the router.
  • pruneCovered, set to "true" to eliminate covered prefixes that share the same next hop.
IPv6 prefixes can be queried using /bgp/topprefixes6/{router}/json, which takes the same arguments as the topprefixes query shown above.
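For example, the following query simply combines the arguments listed above to return the top 10 prefixes for traffic arriving from remote networks, excluding the default route:
curl "http://10.0.0.162:8008/bgp/topprefixes/10.0.0.254/json?maxPrefixes=10&direction=ingress&minPrefix=1"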

Writing Applications describes how to build analytics driven controller applications using sFlow-RT's REST and embedded JavaScript APIs. For example, SDN router using merchant silicon top of rack switch, White box Internet router PoC, and Active Route Manager demonstrate how real-time identification of active routes can be used to efficiently manage limited hardware resources in commodity white box switches in order to handle a full Internet routing table of over 600,000 routes.

Defining Flows

The following flow attributes learned from the BGP session are merged with sFlow data received from switch 10.0.0.253:
  • ipsourcemaskbits
  • ipdestinationmaskbits
  • bgpnexthop
  • bgpnexthop6
  • bgpas
  • bgpsourceas
  • bgpsourcepeeras
  • bgpdestinationas
  • bgpdestinationpeeras
  • bgpdestinationaspath
  • bgpcommunities
  • bgplocalpref
The sFlow-RT /flowkeys/html page can be queried to verify that the attributes have been merged and to see the full set of attributes that are available from the sFlow feed.

Writing Applications describes how to program sFlow-RT flow caches, using the flow keys to select and identify traffic flows. For example, the following Python script uses the REST API to identify the source networks associated with a UDP amplification DDoS attack:
#!/usr/bin/env python
import requests
import json

# DNS port
reflector_port = '53'
max_pps = 100000

rest = 'http://localhost:8008'

# define flow
flow = {'keys':'mask:ipsource,bgpsourceas',
        'filter':'udpsourceport='+reflector_port,
        'value':'frames'}
requests.put(rest+'/flow/ddos/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'ddos', 'value': max_pps, 'byFlow':True}
requests.put(rest+'/threshold/ddos/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=ddos&maxEvents=10&timeout=60'
eventID = -1
while 1 == 1:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates a log of the source network and AS number that exceed 100,000 packets per second of DNS response traffic (again, identifying numeric digits have been substituted with the letter N to protect privacy):
$ ./ddos.py 
NNN.NNN.0.0/13,NNNN
NNN.NNN.NNN.NNN/27,NNNN
NNN.NN.NNN.NNN/28,NNNNN
NNN.NNN.NN.0/24,NNNNN
A variation on the script can be used to identify large "Elephant" flows and their destination AS paths (showing the list of networks that packets traverse en route to their destination):
#!/usr/bin/env python
import requests
import json

max_Bps = 1000000000/8

rest = 'http://localhost:8009'

# define flow
flow = {
  'keys':'ipsource,ipdestination,tcpsourceport,tcpdestinationport,bgpdestinationaspath',
  'value':'bytes'}
requests.put(rest+'/flow/elephant/json',data=json.dumps(flow))

# set threshold
threshold = {'metric':'elephant', 'value': max_Bps, 'byFlow':True}
requests.put(rest+'/threshold/elephant/json',data=json.dumps(threshold))

# tail event log
eventurl = rest+'/events/json?thresholdID=elephant&maxEvents=10&timeout=60'
eventID = -1
while 1 == 1:
  r = requests.get(eventurl + "&eventID=" + str(eventID))
  if r.status_code != 200: break
  events = r.json()
  if len(events) == 0: continue

  eventID = events[0]["eventID"]
  events.reverse()
  for e in events:
    print(e['flowKey'])
Running the script generates real-time notification of the Elephant flows (flows exceeding 1Gbit/s) along with their destination AS paths:
$ ./elephant.py 
NNN.NN.NN.NNN,NNN.NNN.NN.NN,60789,25,NNNNN
NNN.NN.NNN.NN,NNN.NN.NN.NNN,443,38016,NNNNN-NNNNN-NNNNN-NNNNN
NN.NNN.NNN.NNN,NNN.NNN.NN.NN,37030,10059,NNNN-NNN-NNNN
NNN.NN.NN.NNN,NN.NN.NNN.NNN,34611,25,NNNN
SDN and large flows describes how a small number of Elephant flows typically consume most of the bandwidth, even though they are greatly outnumbered by small (Mice) flows. Dynamic policy based routing can be targeted at Elephant flows to significantly improve performance and manage network resources: Leaf and spine traffic engineering using segment routing and SDN and WAN optimization using real-time traffic analytics are two examples.
Finally, the real-time BGP analytics don't exist in isolation. The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure, delivers analytics through APIs, and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.

Network, host, and application monitoring for Amazon EC2

Microservices describes how visibility into network traffic is the key to monitoring, managing and securing applications that are composed of large numbers of communicating services running in virtual machines or containers.

Amazon Virtual Private Cloud (VPC) Flow Logs can be used to monitor network traffic:
However, there are limitations on the types of traffic that are logged, a 10-15 minute delay in accessing flow records, and costs associated with using VPC and storing the logs in CloudWatch (currently $0.50 per GB ingested, $0.03 per GB archived per month, and possible additional Data Transfer OUT charges).

In addition, collecting basic host metrics at 1 minute granularity using CloudWatch is an additional $3.50 per instance per month.

The open source Host sFlow agent offers an alternative:
  1. Lightweight, requiring minimal CPU and memory on EC2 instances.
  2. Real-time, up to the second, network visibility.
  3. Efficient export of an extensive set of host metrics every 10-60 seconds (configurable).
This article will demonstrate how to install Host sFlow on an Amazon Linux instance:
$ cat /etc/issue
Amazon Linux AMI release 2016.03
The following commands build the latest version of the Host sFlow agent from sources:
sudo yum install libcap-devel libpcap-devel
git clone https://github.com/sflow/host-sflow
cd host-sflow
make
sudo make install
You can also make an RPM package (make rpm) so that the Host sFlow agent can be installed on additional EC2 instances without compiling.

Edit the Host sFlow configuration file, /etc/hsflowd.conf, to specify an sFlow collector, sampling rate, polling interval, and interface(s) to monitor:
sflow {
  agent = eth0
  DNSSD = off
  polling = 20
  sampling = 400
  collector { ip = 10.117.46.49 }
  pcap { dev = eth0 }
}
Note: The same configuration file can be used for all EC2 instances.

Finally, start the Host sFlow daemon:
sudo service hsflowd start
The above steps are easily automated using Puppet, Chef, Ansible, etc. to deploy Host sFlow agents on all your EC2 instances.

There are a variety of open source and commercial software packages listed on sFlow.org that can be used to analyze the telemetry stream. The sFlow-RT analyzer has APIs that provide similar functionality to the Amazon VPC and CloudWatch APIs, but with sub-second response times.
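As a rough sketch of how such a query replaces a flow log search, the following Python example defines a flow and then polls for the current top connections. It assumes an sFlow-RT instance listening on localhost:8008 and the /flow and /activeflows REST endpoints; the flow name and keys are illustrative:
#!/usr/bin/env python
import requests
import json
import time

rt = 'http://localhost:8008'

# define a flow cache tracking bytes by connection
flow = {'keys':'ipsource,ipdestination,tcpdestinationport','value':'bytes'}
requests.put(rt+'/flow/conns/json',data=json.dumps(flow))

# allow the flow cache a few seconds to accumulate samples, then list top connections
time.sleep(10)
top = requests.get(rt+'/activeflows/ALL/conns/json?maxFlows=5').json()
for f in top:
  print('%s %s' % (f['key'], f['value']))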
The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure, delivers analytics through APIs, and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.

Download and install sFlow-RT in an EC2 instance. Related articles on this blog provide examples of integrations.
Industry standard sFlow is easily deployed, highly scalable, and provides a low cost, low latency alternative to Amazon VPC flow logging for gaining visibility into EC2 microservice deployments. Using sFlow for visibility allows a common monitoring technology to be used in public, private and hybrid cloud deployments, and to extend visibility into physical and virtual networks.

Internet router using merchant silicon

SDN router using merchant silicon top of rack switch and Dell OS10 SDN router demo discuss how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. The key to this solution is the observation that, while the full Internet routing table of over 600,000 routes is too large to fit in white box switch hardware, only a small fraction of the routes carry most of the traffic. Traffic analytics allows the active routes to be identified and installed in the hardware.

This article describes a simple self contained solution that uses standard APIs and should be able to run on a variety of Linux based network operating systems, including: Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS. The distinguishing feature of this solution is its real-time response: where previous solutions respond to changes in traffic within minutes or hours, this solution updates hardware routes within seconds.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance of sFlow-RT running on the switch. The Bird routing daemon is used to handle the BGP peering sessions and to install routes in the Linux kernel using the standard netlink interface. The network operating system in turn programs the switch ASIC with the kernel routes so that packets are forwarded by the switch hardware and not by the kernel software.

The key to this solution is Bird's multi-table capability. The full Internet routing table learned from BGP peers is installed in a user space table that is not reflected into the kernel. A BGP route reflector session between sFlow-RT and Bird allows sFlow-RT to see the full routing table and combine it with the sFlow telemetry to perform real-time BGP route analytics that identify the currently active routes. A second BGP session allows sFlow-RT to push routes to Bird, which in turn pushes the active routes to the kernel, programming the ASIC.

In this example, the following Bird configuration, /etc/bird/bird.conf, was installed on the switch:
# Please refer to the documentation in the bird-doc package or BIRD User's
# Guide on http://bird.network.cz/ for more information on configuring BIRD and
# adding routing protocols.

# Change this into your BIRD router ID. It's a world-wide unique identification
# of your router, usually one of router's IPv4 addresses.
router id 10.0.0.136;

# The Kernel protocol is not a real routing protocol. Instead of communicating
# with other routers in the network, it performs synchronization of BIRD's
# routing tables with the OS kernel.
protocol kernel {
learn;
scan time 2;
import all;
export all; # Actually insert routes into the kernel routing table
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel.
protocol device {
scan time 60;
}

# Create a new table (disconnected from kernel/master) for peering routes
table peers;

# Create BGP sessions with peers
protocol bgp peer_65134 {
table peers;
igp table master;
local as 65136;
neighbor 10.0.0.134 as 65134;
import all;
export all;
}

protocol bgp peer_65135 {
table peers;
igp table master;
local as 65136;
neighbor 10.0.0.135 as 65135;
import all;
export all;
}

# Copy default route from peers table to master table
protocol pipe {
table peers;
peer table master;
import none;
export filter {
if net ~ [ 0.0.0.0/0 ] then accept;
reject;
};
}

# Reflect peers table to sFlow-RT
protocol bgp to_sflow_rt {
table peers;
igp table master;
local as 65136;
neighbor 127.0.0.1 port 1179 as 65136;
rr client;
import all;
export all;
}

# Receive active prefixes from sFlow-RT
protocol bgp from_sflow_rt {
local as 65136;
neighbor 10.0.0.136 port 1179 as 65136;
import all;
export none;
}
The open source Active Route Manager (ARM) application has been installed in sFlow-RT and the following sFlow-RT configuration, /usr/local/sflow-rt/conf.d/sflow-rt.conf, enables the BGP route reflector and control sessions with Bird:
bgp.start=yes
arm.reflector.ip=127.0.0.1
arm.reflector.as=65136
arm.reflector.id=0.0.0.1
arm.sflow.ip=10.0.0.136
arm.target.ip=10.0.0.136
arm.target.as=65136
arm.target.id=0.0.0.2
arm.target.prefixes=10000
Once configured, operation is entirely automatic. As soon as traffic starts flowing to a new route, the route is identified and installed in the ASIC. If the route later becomes inactive, it is automatically removed from the ASIC to be replaced with a different active route. In this case, the maximum number of routes allowed in the ASIC has been specified as 10,000. This number can be changed to reflect the capacity of the hardware.
The Active Route Manager application has a web interface that provides up-to-the-second visibility into the number of routes, the routes installed in hardware, the amount of traffic, and hardware and software resource utilization. In addition, the sFlow-RT REST API can be used to make additional queries.

World map

Internet router using Cumulus Linux

Internet router using merchant silicon describes how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. This article will describe the steps needed to install the software on an x86 based white box switch running Cumulus Linux 3.0.

First, add the Debian Jessie repository:
sudo sh -c 'echo "deb http://ftp.us.debian.org/debian jessie main contrib"> \
/etc/apt/sources.list.d/deb.list'
Next, install Host sFlow, Java, and Bird:
sudo apt-get update
sudo apt-get install hsflowd
sudo apt-get install unzip
sudo apt-get install default-jre-headless
sudo apt-get install bird
Install sFlow-RT (the latest version is available at sFlow-RT.com):
wget http://www.inmon.com/products/sFlow-RT/sflow-rt_2.0-1116.deb
sudo dpkg -i sflow-rt_2.0-1116.deb
Increase the default virtual memory limit for sflowrt (it needs to be greater than 1/3 of the RAM on the system in order to start the Java virtual machine, see Giant Bug: Cannot run java with a virtual mem limit (ulimit -v)):
sudo sh -c 'echo "sflowrt soft as 2000000"> \
/etc/security/limits.d/99-sflowrt.conf'
Note: Maximum Java heap memory has a default of 1G and is controlled by settings in /usr/local/sflow-rt/conf.d/sflow-rt.jvm file.

Install the Active Route Manager application:
sudo sh -c "cd /usr/local/sflow-rt; ./get-app.sh sflow-rt active-routes"
Cumulus Networks, sFlow and data center automation describes how to configure the sFlow agent (hsflowd). The sFlow collector address should be set to 127.0.0.1.

Finally, configure Bird and sFlow-RT as described in Internet router using merchant silicon.

The instructions were tested on a Cumulus VX virtual machine, but should work on physical switches. Cumulus VX is free and provides a convenient way to try out Cumulus Linux and create virtual networks to test configurations.

If you are going to experiment with the solution on Cumulus VX then the following command is needed to enable sFlow traffic monitoring:
sudo iptables -I FORWARD -j NFLOG --nflog-group 1 --nflog-prefix SFLOW
On physical switches the sFlow agent automatically configures packet sampling in the ASIC and is able to monitor all packets (not just the routed packets captured by the iptables command above).

Network and system analytics as a Docker service

The diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage the sFlow-RT analytics service to gain real-time visibility. Network visibility with Docker describes how to install open source sFlow agents to monitor network activity in a Docker environment in order to gain visibility into Docker Microservices.

The sFlow-RT analytics software is now on Docker Hub, making it easy to deploy real-time sFlow analytics as a Docker service:
docker run -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Configure standard sFlow Agents to stream telemetry to the analyzer and retrieve analytics using the REST API on port 8008.

Increase memory from default 1G to 2G:
docker run -e "RTMEM=2G" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Set System Property to enable country lookups when Defining Flows:
docker run -e "RTPROP=-Dgeo.country=resources/config/GeoIP.dat" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
Run sFlow-RT Application. Drop the -d option while developing an application to see output of logging commands and use control-c to stop the container.
docker run -v /Users/pp/my-app:/sflow-rt/app/my-app -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
A simple Dockerfile can be used to generate a new image that includes the application (the COPY source path is relative to the Docker build context):
FROM sflow/sflow-rt:latest
COPY my-app /sflow-rt/app/my-app
Similarly, a Dockerfile can be used to generate a new image from published applications. Any required System Properties can also be set in the Dockerfile.
FROM sflow/sflow-rt:latest
ENV RTPROP="-Dgeo.country=resources/config/GeoIP.dat"
RUN /sflow-rt/get-app.sh sflow-rt top-flows
This solution is extremely scalable: a single sFlow-RT instance can monitor thousands of servers and the network devices connecting them.

Real-time web analytics

The diagram shows a typical scale out web service with a load balancer distributing requests among a pool of web servers. The sFlow HTTP Structures standard is supported by commercial load balancers, including F5 and A10, and open source load balancers and web servers, including HAProxy, NGINX, Apache, and Tomcat.
The simplest way to try out the examples in this article is to download sFlow-RT and install the Host sFlow agent and Apache mod-sflow instrumentation on a Linux web server.

The following sFlow-RT metrics report request rates based on the standard sFlow HTTP counters:
  • http_method_option
  • http_method_get
  • http_method_head
  • http_method_post
  • http_method_put
  • http_method_delete
  • http_method_trace
  • http_method_connect
  • http_method_other
  • http_status_1xx
  • http_status_2xx
  • http_status_3xx
  • http_status_4xx
  • http_status_5xx
  • http_status_other
  • http_requests
In addition, mod-sflow exports the following standard thread pool metrics:
  • workers_active
  • workers_idle
  • workers_max
  • workers_utilization
  • req_delayed
  • req_dropped
Cluster performance metrics describes how sFlow-RT's REST API is used to compute summary statistics for a pool of servers. For example, the following query calculates the cluster-wide total request rates:
http://localhost:8008/metric/ALL/sum:http_method_get,sum:http_method_post/json
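This query can also be issued programmatically. The following is a minimal Python sketch, assuming sFlow-RT is running on localhost and that, as with the script API used elsewhere on this blog, each result carries metricName and metricValue fields:
#!/usr/bin/env python
# Hedged sketch: retrieve cluster-wide GET and POST request rates from sFlow-RT.
import requests

url = 'http://localhost:8008/metric/ALL/sum:http_method_get,sum:http_method_post/json'
for metric in requests.get(url).json():
    print(metric['metricName'] + ' = ' + str(metric['metricValue']))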
More interesting is that the sFlow telemetry stream also includes randomly sampled HTTP request records with the following attributes:
  • protocol
  • serveraddress
  • serveraddress6
  • serverport
  • clientaddress
  • clientaddress6
  • clientport
  • proxyprotocol
  • proxyserveraddress
  • proxyserveraddress6
  • proxyserverport
  • proxyclientaddress
  • proxyclientaddress6
  • proxyclientport
  • httpmethod
  • httpprotocol
  • httphost
  • httpuseragent
  • httpxff
  • httpauthuser
  • httpmimetype
  • httpurl
  • httpreferer
  • httpstatus
  • bytes
  • req_bytes
  • resp_bytes
  • duration
  • requests
The sFlow-RT analytics pipeline is programmable. Defining Flows describes how to compute additional metrics based on the sampled requests. For example, the following flow definition creates a new metric called image_bytes that tracks the volume of image data in HTTP responses as a bytes/second value calculated over a 10 second window:
setFlow('image_bytes', {value:'resp_bytes',t:10,filter:'httpmimetype~image/.*'});
The new metric can be queried in exactly the same way as the counter based metrics above, e.g.:
http://localhost:8008/metric/ALL/sum:image_bytes/json
The uri: function is used to extract parts of the httpurl or httpreferer URL fields. The following attributes can be extracted:
  • normalized
  • scheme
  • user
  • authority
  • host
  • port
  • path
  • file
  • extension
  • query
  • fragment
  • isabsolute
  • isopaque
For example, the following flow definition creates a metric called games_reqs that tracks the requests/second hitting URL paths with the prefix /games:
setFlow('games_reqs', {value:'requests',t:10,filter:'uri:httpurl:path~/games/.*'});
Define flow keys to identify slowest requests, most popular URLs, etc. For example, the following definition tracks the top 5 longest duration requests:
setFlow('slow_reqs', {keys:'httpurl',value:'duration',t:10,n:5});
The following query retrieves the result:
$ curl "http://localhost:8008/activeflows/ALL/slow_reqs/json?maxFlows=5"
[
{
"dataSource": "3.80",
"flowN": 1,
"value": 117009.24305622398,
"agent": "10.0.0.150",
"key": "/login.php"
},
{
"dataSource": "3.80",
"flowN": 1,
"value": 7413.476263017302,
"agent": "10.0.0.150",
"key": "/games/animals.php"
},
{
"dataSource": "3.80",
"flowN": 1,
"value": 4486.286259806839,
"agent": "10.0.0.150",
"key": "/games/puzzles.php"
},
{
"dataSource": "3.80",
"flowN": 1,
"value": 2326.33482623333,
"agent": "10.0.0.150",
"key": "/sales/buy.php"
},
{
"dataSource": "3.80",
"flowN": 1,
"value": 276.3486100676183,
"agent": "10.0.0.150",
"key": "/index.php"
}
]
Sampled records are a useful complement to counter based metrics, making it possible to disaggregate counts and identify root causes. For example, suppose a spike in errors is identified through the http_status_4xx or http_status_5xx metrics. The following flow definition breaks out the most frequent failed requests by specific URL and error code:
setFlow('err_reqs', {keys:'httpurl,httpstatus',value:'requests',t:10,n:5,
filter:'range:httpstatus:400=true'});
Finally, the real-time HTTP analytics don't exist in isolation. The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from sFlow instrumentation built into network, server and application infrastructure, delivers analytics through APIs, and can easily be integrated with a wide variety of on-site and cloud orchestration, DevOps and Software Defined Networking (SDN) tools.

Triggered remote packet capture using filtered ERSPAN

Packet brokers are typically deployed as a dedicated network connecting network taps and SPAN/mirror ports to packet analysis applications such as Wireshark, Snort, etc.

Traditional hierarchical network designs were relatively straightforward to monitor using a packet broker since traffic flowed through a small number of core switches and so a small number of taps provided network wide visibility. The move to leaf and spine fabric architectures eliminates the performance bottleneck of core switches to deliver low latency and high bandwidth connectivity to data center applications. However, traditional packet brokers are less attractive since spreading traffic across many links with equal cost multi-path (ECMP) routing means that many more links need to be monitored.

This article will explore how the remote Selective Spanning capability in Cumulus Linux 3.0 combined with industry standard sFlow telemetry embedded in commodity switch hardware provides a cost effective alternative to traditional packet brokers.

Cumulus Linux uses iptables rules to specify packet capture sessions. For example, the following rule forwards packets with source IP 20.0.0.2 and destination IP 20.0.1.2 to a packet analyzer on host 20.0.2.2:
-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20.0.1.2 -j ERSPAN --src-ip 90.0.0.1 --dst-ip 20.0.2.2
REST API for Cumulus Linux ACLs describes a simple Python wrapper that exposes IP tables through a RESTful API. For example, the following command remotely installs the capture rule on switch 10.0.0.233:
curl -H "Content-Type:application/json" -X PUT --data \
'["[iptables]","-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20.0.1.2 -j ERSPAN --src-ip 90.0.0.1 --dst-ip 20.0.2.2"]' \
http://10.0.0.233:8080/acl/capture1
The following command deletes the rule:
curl -X DELETE http://10.0.0.233:8080/acl/capture1
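The same API is easy to drive from a script. The following Python sketch installs and then removes the capture session, using the same switch address, rule, and ACL name as the curl examples above:
#!/usr/bin/env python
# Sketch: manage an ERSPAN capture session through the REST API for Cumulus Linux ACLs.
import json
import requests

switch = 'http://10.0.0.233:8080'
acl = [
    '[iptables]',
    '-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20.0.1.2'
    ' -j ERSPAN --src-ip 90.0.0.1 --dst-ip 20.0.2.2'
]

# install the capture session
requests.put(switch + '/acl/capture1',
             headers={'Content-Type': 'application/json'},
             data=json.dumps(acl))

# delete the capture session when it is no longer needed
requests.delete(switch + '/acl/capture1')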
Selective Spanning makes it possible to turn every switch and port in the network into a capture device. However, it is important to carefully select which traffic to capture since the aggregate bandwidth of an ECMP fabric is measured in Terabits per second - far more traffic than can be handled by typical packet analyzers.
SDN packet broker compares the role that sFlow plays in steering the capture network to that of a finderscope, the small wide-angle telescope used to provide an overview of the sky and guide a telescope to its target. The article goes on to describe some of the benefits of combining sFlow analytics with selective packet capture:
  1. Offload The capture network is a limited resource, both in terms of bandwidth and in the number of flows that can be simultaneously captured.  Offloading as many tasks as possible to the sFlow analyzer frees up resources in the capture network, allowing the resources to be applied where they add most value. A good sFlow analyzer delivers data center wide visibility that can address many traffic accounting, capacity planning and traffic engineering use cases. In addition, many of the packet analysis tools (such as Wireshark) can accept sFlow data directly, further reducing the cases where a full capture is required.
  2. Context Data center wide monitoring using sFlow provides context for triggering packet capture. For example, sFlow monitoring might show an unusual packet size distribution for traffic to a particular service. Queries to the sFlow analyzer can identify the set of switches and ports involved in providing the service and identify a set of attributes that can be used to selectively capture the traffic.
  3. DDoS Certain classes of event such as DDoS flood attacks may be too large for the capture network to handle. DDoS mitigation with Cumulus Linux frees the capture network to focus on identifying more serious application layer attacks.
The diagram at the top of this article shows an example of using sFlow to target selective capture of traffic to blacklisted addresses. In this example sFlow-RT is used to perform real-time sFlow analytics. The following emerging.js script instructs sFlow-RT to download the Emerging Threats blacklist and identify any local hosts that are communicating with addresses in the blacklist. A full packet capture is triggered when a potentially compromised host is detected:
var wireshark = '10.0.0.70';
var idx=0;
function capture(localIP,remoteIP,agent) {
var acl = [
'[iptables]',
'# emerging threat capture',
'-A FORWARD --in-interface swp+ -s '+localIP+' -d '+remoteIP
+' -j ERSPAN --src-ip '+agent+' --dst-ip '+wireshark,
'-A FORWARD --in-interface swp+ -s '+remoteIP+' -d '+localIP
+' -j ERSPAN --src-ip '+agent+' --dst-ip '+wireshark
];
var id = 'emrg'+idx++;
logWarning('capturing '+localIP+' rule '+id+' on '+agent);
http('http://'+agent+':8080/acl/'+id,
'PUT','application/json',JSON.stringify(acl));
}

var groups = {};
function loadGroup(name,url) {
try {
var res, cidrs = [], str = http(url);
var reg = /^(\d{1,3}\.){3}\d{1,3}(\/\d{1,2})?$/mg;
while((res = reg.exec(str)) != null) cidrs.push(res[0]);
if(cidrs.length > 0) groups[name]=cidrs;
} catch(e) {
logWarning("failed to load " + url + ", " + e);
}
}

loadGroup('compromised',
'https://rules.emergingthreats.net/blockrules/compromised-ips.txt');
loadGroup('block',
'https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt');
setGroups('emerging',groups);

setFlow('emerging',
{keys:'ipsource,ipdestination,group:ipdestination:emerging',value:'frames',
log:true,flowStart:true});

setFlowHandler(function(rec) {
var [localIP,remoteIP,group] = rec.flowKeys.split(',');
try { capture(localIP,remoteIP,rec.agent); }
catch(e) { logWarning("failed to capture " + e); }
});
Some comments about the script:
  1. The script uses sFlow telemetry to identify the potentially compromised host and the location (agent) observing the traffic.
  2. The location information is required so that the capture rule can be installed on a switch that is in the traffic path.
  3. The application has been simplified for clarity. In production, the blacklist information would be periodically updated and the capture sessions would be tracked so that they can be deleted when they are no longer required.
  4. Writing Applications provides an introduction to sFlow-RT's API.
Configure sFlow on the Cumulus switches to stream telemetry to a host running Docker. Next, log into the host and run the following command in a directory containing the emerging.js script:
docker run -v "$PWD/emerging.js":/sflow-rt/emerging.js \
-e "RTPROP=-Dscript.file=emerging.js" -p 6343:6343/udp sflow/sflow-rt
Note: Deploying analytics as a Docker service is a convenient method of packaging and running sFlow-RT. However, you can also download and install sFlow-RT as a package.

Once the software is running, you should see output similar to the following:
2016-09-17T22:19:16+0000 INFO: Listening, sFlow port 6343
2016-09-17T22:19:16+0000 INFO: Listening, HTTP port 8008
2016-09-17T22:19:16+0000 INFO: emerging.js started
2016-09-17T22:19:44+0000 WARNING: capturing 10.0.0.162 rule emrg0 on 10.0.0.253
The last line shows that traffic from host 10.0.0.162 to a blacklisted address has been detected and that selective spanning session has been configured on switch 10.0.0.253 to capture packets and send them to the host running Wireshark (10.0.0.70) for further analysis.

Asynchronous Docker metrics

Docker allows large numbers of lightweight containers to be started and stopped within seconds, creating an agile infrastructure that can rapidly adapt to changing requirements. However, the rapidly changing population of containers poses a challenge to traditional methods of monitoring, which struggle to keep pace with the changes. For example, periodic polling methods take time to detect new containers and can miss short lived containers entirely.

This article describes how the latest version of the Host sFlow agent is able to track the performance of a rapidly changing population of Docker containers and export a real-time stream of standard sFlow metrics.
The diagram above shows the life cycle status events associated with a container. The Docker Remote API provides a set of methods that allow the Host sFlow agent to communicate with the Docker daemon to list containers and receive asynchronous container status events. The Host sFlow agent uses the events to keep track of running containers and periodically exports cpu, memory, network and disk performance counters for each container.

The diagram at the beginning of this article shows the sequence of messages, going from top to bottom, required to track a container. The Host sFlow agent first registers for container lifecycle events before asking for all the currently running containers. Later, when a new container is started, Docker immediately sends an event to the Host sFlow agent, which requests additional information (such as the container process identifier - PID) that it can use to retrieve performance counters from the operating system. Initial counter values are retrieved and exported along with container identity information as an sFlow counters message and a polling task for the new container is initiated. Container counters are periodically retrieved and exported while the container continues to run (2 polling intervals are shown in the diagram). When the Host sFlow agent receives an event from Docker indicating that the container is being stopped, it retrieves the final values of the performance counters, exports a final sFlow message, and removes the polling task for the container.
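To make the message sequence concrete, the following Python sketch subscribes to container lifecycle events using the Docker Remote API. It is not the Host sFlow implementation (which talks to the local Docker socket); it assumes a daemon listening on tcp://localhost:2375 and newline-delimited JSON events:
#!/usr/bin/env python
# Hedged sketch: track container start/stop events via the Docker Remote API.
import json
import requests

docker = 'http://localhost:2375'

# list containers that are already running
for c in requests.get(docker + '/containers/json').json():
    print('running ' + c['Id'][:12])

# subscribe to asynchronous lifecycle events
events = requests.get(docker + '/events', stream=True)
for line in events.iter_lines():
    if not line:
        continue
    evt = json.loads(line)
    if evt.get('status') in ('start', 'die'):
        # a real collector would start or stop a counter polling task here
        print(evt['status'] + ' ' + evt['id'][:12])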

This method of asynchronously triggered periodic counter export allows an sFlow collector to accurately track rapidly changing container populations in large scale deployments. The diagram only shows the sequence of events relating to monitoring a single container. Docker network visibility demonstration shows the full range of network traffic and system performance information being exported.

Detailed real-time visibility is essential for fully realizing the benefits of agile container infrastructure, providing the feedback needed to track and automatically optimize the performance of large scale microservice deployments.

Docker 1.12 swarm mode elastic load balancing


Docker Built-In Orchestration Ready For Production: Docker 1.12 Goes GA describes the native swarm mode feature that integrates cluster management, virtual networking, and policy based deployment of services.

This article will demonstrate how real-time streaming telemetry can be used to construct an elastic load balancing solution that dynamically adjusts service capacity to match changing demand.

Getting started with swarm mode describes the steps to configure a swarm cluster. For example, the following command issued on any of the Manager nodes deploys a web service on the cluster:
docker service create --replicas 2 -p 80:80 --name apache httpd:2.4
And the following command raises the number of containers in the service pool from 2 to 4:
docker service scale apache=4
Asynchronous Docker metrics describes how sFlow telemetry provides the real-time visibility required for elastic load balancing. The diagram shows how streaming telemetry allows the sFlow-RT controller to determine the load on the service pool so that it can use the Docker service API to automatically increase or decrease the size of the pool as demand changes. Elastic load balancing of the service pools ensures consistent service levels by adding additional resources if demand increases. In addition, efficiency is improved by releasing resources when demand drops so that they can be used by other services. Finally, global visibility into all resources and services makes it possible to load balance between services, reducing service pools for non-critical services to release resources during peak demand.

The first step is to install and configure Host sFlow agents on each of the nodes in the Docker swarm cluster. The following /etc/hsflowd.conf file configures Host sFlow to monitor Docker and send sFlow telemetry to a designated collector (in this case 10.0.0.162):
sflow {
sampling = 400
polling = 10
collector { ip = 10.0.0.162 }
docker { }
pcap { dev = docker0 }
pcap { dev = docker_gwbridge }
}
Note: The configuration file is identical for all nodes in the cluster making it easy to automate the installation and configuration of sFlow monitoring using  Puppet, Chef, Ansible, etc.

Verify that the sFlow measurements are arriving at the collector node (10.0.0.162) using sflowtool:
docker run -p 6343:6343/udp sflow/sflowtool
The following elb.js script implements elastic load balancer functionality using the sFlow-RT real-time analytics engine:
var api = "https://10.0.0.134:2376";
var certs = '/tls/';
var service = 'apache';

var replicas_min = 1;
var replicas_max = 10;
var util_min = 0.5;
var util_max = 1;
var bytes_min = 50000;
var bytes_max = 100000;
var enabled = false;

function getInfo(name) {
var info = null;
var url = api+'/services/'+name;
try { info = JSON.parse(http2({url:url, certs:certs}).body); }
catch(e) { logWarning("cannot get " + url + " error=" + e); }
return info;
}

function setReplicas(name,count,info) {
var version = info["Version"]["Index"];
var spec = info["Spec"];
spec["Mode"]["Replicated"]["Replicas"]=count;
var url = api+'/v1.24/services/'+info["ID"]+'/update?version='+version;
try {
http2({
url:url, certs:certs, method:'POST',
headers:{'Content-Type':'application/json'},
body:JSON.stringify(spec)
});
}
catch(e) { logWarning("cannot post to " + url + " error=" + e); }
logInfo(service+" set replicas="+count);
}

var hostpat = service+'\\.*';
setIntervalHandler(function() {
var info = getInfo(service);
if(!info) return;

var replicas = info["Spec"]["Mode"]["Replicated"]["Replicas"];
if(!replicas) {
logWarning("no active members for service=" + service);
return;
}

var res = metric(
'ALL', 'avg:vir_cpu_utilization,avg:vir_bytes_in,avg:vir_bytes_out',
{'vir_host_name':[hostpat],'vir_cpu_state':['running']}
);

var n = res[0].metricN;

// we aren't seeing all the containers (yet)
if(replicas !== n) return;

var util = res[0].metricValue;
var bytes = res[1].metricValue + res[2].metricValue;

if(!enabled) return;

// load balance
if(replicas < replicas_max && (util > util_max || bytes > bytes_max)) {
setReplicas(service,replicas+1,info);
}
else if(replicas > replicas_min && util < util_min && bytes < bytes_min) {
setReplicas(service,replicas-1,info);
}
},2);

setHttpHandler(function(req) {
enabled = req.query && req.query.state && req.query.state[0] === 'enabled';
return enabled ? "enabled" : "disabled";
});
Some notes on the script:
  1. The setReplicas(name,count,info) function uses the Docker Remote API to implement functionality equivalent to the docker service scale name=count command shown earlier. The REST API is accessible at https://10.0.0.134:2376 in this example.
  2. The setIntervalHandler() function runs every 2 seconds, retrieving metrics for the service pool and scaling the number of replicas in the service up or down based on thresholds.
  3. The setHttpHandler() function exposes a simple REST API for enabling / disabling the load balancer functionality. The API can easily be extended to allow thresholds to be set, to report statistics, etc.
  4. Certificates, key.pem, cert.pem, and ca.pem, required to authenticate API requests must be present in the /tls/ directory.
  5. The thresholds are set to unrealistically low values for the purpose of this demonstration.
  6. The script can easily be extended to load balance multiple services simultaneously.
  7. Writing Applications provides additional information on sFlow-RT scripting.
Run the controller:
docker run -v `pwd`/tls:/tls -v `pwd`/elb.js:/sflow-rt/elb.js \
-e "RTPROP=-Dscript.file=elb.js" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
The autoscaling functionality can be enabled:
curl "http://localhost:8008/script/elb.js/json?state=enabled"
and disabled:
curl "http://localhost:8008/script/elb.js/json?state=disabled"
using the REST API exposed by the script.
The chart above shows the results of a simple test to demonstrate the elastic load balancer function. First, ab - the Apache HTTP server benchmarking tool - was used to generate load on the apache service running under Docker swarm:
ab -rt 60 -n 300000 -c 4 http://10.0.0.134/
Next, the test was repeated with the elastic load balancer enabled. The chart clearly shows that the load balancer is keeping the average network load on each container under control.
2016-09-24T00:57:10+0000 INFO: Listening, sFlow port 6343
2016-09-24T00:57:10+0000 INFO: Listening, HTTP port 8008
2016-09-24T00:57:10+0000 INFO: elb.js started
2016-09-24T01:00:17+0000 INFO: apache set replicas=2
2016-09-24T01:00:23+0000 INFO: apache set replicas=3
2016-09-24T01:00:27+0000 INFO: apache set replicas=4
2016-09-24T01:00:33+0000 INFO: apache set replicas=5
2016-09-24T01:00:41+0000 INFO: apache set replicas=6
2016-09-24T01:00:47+0000 INFO: apache set replicas=7
2016-09-24T01:00:59+0000 INFO: apache set replicas=8
2016-09-24T01:01:29+0000 INFO: apache set replicas=7
2016-09-24T01:01:33+0000 INFO: apache set replicas=6
2016-09-24T01:01:35+0000 INFO: apache set replicas=5
2016-09-24T01:01:39+0000 INFO: apache set replicas=4
2016-09-24T01:01:43+0000 INFO: apache set replicas=3
2016-09-24T01:01:45+0000 INFO: apache set replicas=2
2016-09-24T01:01:47+0000 INFO: apache set replicas=1
The sFlow-RT log shows that containers are added to the apache service to handle the increased load and removed once demand decreases.

This example relied on a small subset of the information available from the sFlow telemetry stream. In addition to container resource utilization, the Host sFlow agent exports an extensive set of metrics from the nodes in the Docker swarm cluster. If the nodes are virtual machines running in a public or private cloud, the metrics can be used to perform elastic load balancing of the virtual machine pool making up the cluster, increasing the cluster size if demand increases and reducing cluster size when demand decreases. In addition, poorly performing instances can be detected and removed from the cluster (see Stop thief! for an example).
The sFlow agents also efficiently report on traffic flowing within and between microservices running on the swarm cluster. For example, the following command:
docker run -p 6343:6343/udp -p 8008:8008 -d sflow/top-flows
launches the top-flows application to show an up to the second view of active flows in the network.

Comprehensive real-time analytics is critical to effectively managing agile container-based infrastructure. Open source Host sFlow agents provide a lightweight method of instrumenting the infrastructure that unifies network and system monitoring to deliver a full set of standard metrics to performance management applications.

Collecting Docker Swarm service metrics

This article demonstrates how to address the challenge of monitoring dynamic Docker Swarm deployments and track service performance metrics using existing on-premises and cloud monitoring tools like Ganglia, Graphite, InfluxDB, Grafana, SignalFX, Librato, etc.

In this example, Docker Swarm is used to deploy a simple web service on a four node cluster:
docker service create --replicas 2 -p 80:80 --name apache httpd:2.4
Next, the following script tests the agility of monitoring systems by constantly changing the number of replicas in the service:
#!/bin/bash
while true
do
docker service scale apache=$(( ( RANDOM % 20 ) + 1 ))
sleep 30
done
The above test is easy to set up and is a quick way to stress test monitoring systems and reveal accuracy and performance problems when they are confronted with container workloads.

Many approaches to gathering and recording metrics were developed for static environments and have a great deal of difficulty tracking rapidly changing container-based service pools without missing information, leaking resources, and slowing down. For example, each new container in Docker Swarm has a unique name, e.g. apache.16.17w67u9157wlri7trd854x6q0. Monitoring solutions that record container names, or even worse, index data by container name, will suffer from bloated databases and resulting slow queries.

The solution is to insert a stream processing analytics stage in the metrics pipeline that delivers a consistent set of service level metrics to existing tools.
The asynchronous metrics export method implemented in the open source Host sFlow agent is part of the solution, sending a real-time telemetry stream to a centralized sFlow collector which is then able to deliver a comprehensive view of all services deployed on the Docker Swarm cluster.

The sFlow-RT real-time analytics engine completes the solution by converting the detailed per instance metrics into service level statistics which are in turn streamed to a time series database where they drive operational dashboards.

For example, the following swarmmetrics.js script computes cluster and service level metrics and exports them to InfluxDB:
var docker = "https://10.0.0.134:2376/services";
var certs = '/tls/';

var influxdb = "http://10.0.0.50:8086/write?db=docker"

var clustermetrics = [
'avg:load_one',
'max:cpu_steal',
'sum:node_domains'
];

var servicemetrics = [
'avg:vir_cpu_utilization',
'avg:vir_bytes_in',
'avg:vir_bytes_out'
];

function sendToInfluxDB(msg) {
if(!msg || !msg.length) return;

var req = {
url:influxdb,
operation:'POST',
headers:{"Content-Type":"text/plain"},
body:msg.join('\n')
};
req.error = function(e) {
logWarning('InfluxDB POST failed, error=' + e);
}
try { httpAsync(req); }
catch(e) {
logWarning('bad request ' + req.url + '' + e);
}
}

function clusterMetrics(nservices) {
var vals = metric(
'ALL', clustermetrics,
{'node_domains':['*'],'host_name':['vx*host*']}
);
var msg = [];
msg.push('swarm.services value='+nservices);
msg.push('nodes value='+(vals[0].metricN || 0));
for(var i = 0; i < vals.length; i++) {
let val = vals[i];
msg.push(val.metricName+' value='+ (val.metricValue || 0));
}
sendToInfluxDB(msg);
}

function serviceMetrics(name, replicas) {
var vals = metric(
'ALL', servicemetrics,
{'vir_host_name':[name+'\\.*'],'vir_cpu_state':['running']}
);
var msg = [];
msg.push('replicas_configured,service='+name+' value='+replicas);
msg.push('replicas_measured,service='+name+' value='+(vals[0].metricN || 0));
for(var i = 0; i < vals.length; i++) {
let val = vals[i];
msg.push(val.metricName+',service='+name+' value='+(val.metricValue || 0));
}
sendToInfluxDB(msg);
}

setIntervalHandler(function() {
var i, services, service, spec, name, replicas, res;
try { services = JSON.parse(http2({url:docker, certs:certs}).body); }
catch(e) { logWarning("cannot get " + url + " error=" + e); }
if(!services || !services.length) return;

clusterMetrics(services.length);

for(i = 0; i < services.length; i++) {
service = services[i];
if(!service) continue;
spec = service["Spec"];
if(!spec) continue;
name = spec["Name"];
if(!name) continue;

replicas = spec["Mode"]["Replicated"]["Replicas"];
serviceMetrics(name, replicas);
}
},10);
Some notes on the script:
  1. Only a few representative metrics are being monitored, many more are available, see Metrics.
  2. The setIntervalHandler function is run every 10 seconds. The function queries the Docker REST API for the current list of services and then calculates summary statistics for each service. The summary statistics are then pushed to InfluxDB via a REST API call (see the sketch after this list).
  3. Cluster performance metrics describes the set of summary statistics that can be calculated.
  4. Writing Applications provides additional information on sFlow-RT scripting and REST APIs.
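For reference, the InfluxDB write performed by sendToInfluxDB() is just an HTTP POST of line protocol records. A minimal Python sketch using the same address, database, and measurement names as the script above (the values are illustrative):
#!/usr/bin/env python
# Hedged sketch: post service level metrics to InfluxDB using the line protocol.
import requests

influxdb = 'http://10.0.0.50:8086/write?db=docker'
lines = [
    'replicas_configured,service=apache value=4',
    'replicas_measured,service=apache value=4',
    'avg:vir_cpu_utilization,service=apache value=0.12'
]
r = requests.post(influxdb, headers={'Content-Type': 'text/plain'},
                  data='\n'.join(lines))
print(r.status_code)  # InfluxDB returns 204 on success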
Start gathering metrics:
docker run -v `pwd`/tls:/tls -v `pwd`/swarmmetrics.js:/sflow-rt/swarmmetrics.js \
-e "RTPROP=-Dscript.file=swarmmetrics.js" \
-p 8008:8008 -p 6343:6343/udp sflow/sflow-rt
The results are shown in the Grafana dashboard at the top of this article. The charts show 30 minutes of data. The top Replicas by Service chart compares the number of replicas configured for each service with the number of container instances that the monitoring system is tracking. The chart demonstrates that the monitoring system is accurately tracking the rapidly changing service pool and able to deliver reliable metrics. The middle Network IO by Service chart shows a brief spike in network activity whenever the number of instances in the apache service is increased. Finally, the bottom Cluster Size chart confirms that all four nodes in the Swarm cluster are being monitored.

This solution is extremely scalable. For example, increasing the size of the cluster from 4 to 1,000 nodes increases the amount of raw data that sFlow-RT needs to process to accurately calculate service metrics, but has no effect on the amount of data sent to the time series database, and so there is no increase in storage requirements or query response time.
Pre-processing the stream of raw data reduces the cost of the monitoring solution, either in terms of the resources required by an on-premises monitoring solutions, or the direct costs of cloud based solutions which charge per data point per minute per month. In this case the raw telemetry stream contains hundreds of thousands of potential data points per minute per host - filtering and summarizing the data reduces monitoring costs by many orders of magnitude.
This example can easily be modified to send data into any on-premises or cloud based backend, examples in this blog include: SignalFX, Librato, Graphite and Ganglia. In addition, Docker 1.12 swarm mode elastic load balancing describes how the same architecture can be used to dynamically resize service pools to meet changing demand.

Real-time domain name lookups

A reverse DNS request returns the domain name associated with an IP address, for example providing the name google-public-dns-a.google.com for IP address 8.8.8.8. This article demonstrates how the sFlow-RT engine incorporates domain name lookups in real-time flow analytics.

First, the dns.servers System Property is used to specify one or more DNS servers to handle the reverse lookup requests. For example, the following command uses Docker to run sFlow-RT with DNS lookups directed to server 10.0.0.1:
docker run -e "RTPROP=-Ddns.servers=10.0.0.1" \
-p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt
The following Python script dnspair.py uses the sFlow-RT REST API to define a flow and log the resulting flow records:
#!/usr/bin/env python
import requests
import json

flow = {'keys':'dns:ipsource,dns:ipdestination',
        'value':'bytes','activeTimeout':10,'log':True}
requests.put('http://localhost:8008/flow/dnspair/json',data=json.dumps(flow))
flowurl = 'http://localhost:8008/flows/json?name=dnspair&maxFlows=10&timeout=60'
flowID = -1
while 1 == 1:
  r = requests.get(flowurl + "&flowID=" + str(flowID))
  if r.status_code != 200: break
  flows = r.json()
  if len(flows) == 0: continue

  flowID = flows[0]["flowID"]
  flows.reverse()
  for f in flows:
    print json.dumps(f,indent=1)
Running the script generates the following output:
$ ./dnspair.py
{
"value": 233370.92322668363,
"end": 1476234478177,
"name": "dnspair",
"flowID": 1523,
"agent": "10.0.0.20",
"start": 1476234466195,
"dataSource": "10",
"flowKeys": "xenvm11.sf.inmon.com.,dhcp20.sf.inmon.com."
}
{
"value": 39692.88754760739,
"end": 1476234478177,
"name": "dnspair",
"flowID": 1524,
"agent": "10.0.0.20",
"start": 1476234466195,
"dataSource": "10",
"flowKeys": "xenvm11.sf.inmon.com.,switch.sf.inmon.com."
}
The token dns:ipsource in the flow definition is an example of a Key Function. Functions can be combined to define flow keys or used in filters, for example:
  • or:[dns:ipsource]:ipsource - Returns a DNS name if available, otherwise the original IP address is returned.
  • suffix:[dns:ipsource]:.:3 - Returns the last 2 parts of the DNS name, e.g. xenvm11.sf.inmon.com. becomes inmon.com.
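These functions can be used in flow definitions in the same way as dns:ipsource. For example, the following Python sketch (mirroring dnspair.py, with an arbitrary flow name of domains) aggregates traffic by destination domain and retrieves the current top domains:
#!/usr/bin/env python
# Hedged sketch: aggregate traffic by destination domain using the suffix: key function.
import requests
import json

flow = {'keys':'suffix:[dns:ipdestination]:.:3',
        'value':'bytes','activeTimeout':10}
requests.put('http://localhost:8008/flow/domains/json',data=json.dumps(flow))

# retrieve the current top domains
top = requests.get('http://localhost:8008/activeflows/ALL/domains/json?maxFlows=5')
print(json.dumps(top.json(),indent=1))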

DNS results are cached by the dns: function in order to provide real-time lookups and reduce the load on the backend name server(s). Cache size and timeout settings are tunable using System Properties.