Topology aware flow analytics with NVIDIA NetQ
NVIDIA Cumulus Linux 5.11 for AI / ML describes how NVIDIA 400/800G Spectrum-X switches combined with the latest Cumulus Linux release deliver enhanced real-time telemetry that is particularly relevant...
Replay pcap files using sflowtool
It can be very useful to capture sFlow telemetry from production networks so that it can be replayed later to perform off-line analysis, or to develop or evaluate sFlow collection tools. sudo tcpdump...
AI Metrics
AI Metrics is available on GitHub. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication...
Capture to pcap file using sflowtool
Replay pcap files using sflowtool describes how to capture sFlow datagrams using tcpdump and replay them in real time using sflowtool. However, using tcpdump for the capture has the downside of...
Comparing AI / ML activity from two production networks
AI Metrics describes how to deploy the open source ai-metrics application. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks...
Dropped packet notifications with Cisco 8000 Series Routers
The availability of the Cisco IOS XR Release 25.1.1 brings sFlow dropped packet notification support to Cisco 8000 series routers, making it easy to capture and analyze packets dropped at router...
AI Metrics with Prometheus and Grafana
The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library...
Multi-vendor support for dropped packet notifications
The sFlow Dropped Packet Notification Structures extension was published in October 2020. Extending sFlow to provide visibility into dropped packets offers significant benefits for network...
AI Metrics with Grafana Cloud
The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library...
AI network performance monitoring using containerlab
AI Metrics is available on GitHub. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication...
AI Metrics with InfluxDB Cloud
The InfluxDB AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library...
Tracing network packets with eBPF and pwru
pwru (packet, where are you?) is an open source tool from Cilium that uses eBPF instrumentation in recent Linux kernels to trace network packets through the kernel. In this article we will use...
Linux packet sampling using eBPF
Linux 6.11+ kernels provide TCX attachment points for eBPF programs to efficiently examine packets as they ingress and egress the host. The latest version of the open source Host sFlow agent includes...