Network Diagnostics

From Pulsed Media Wiki
Revision as of 12:27, 29 December 2023 by Nucode

Network Diagnostics Comprehensive Guide

Network performance is crucial for the smooth operation of dedicated servers, yet diagnosing network issues can be very challenging. This document guides you through effective network testing methods for identifying potential issues. Understanding the nature of the internet, a vast network with numerous interconnected routes and nodes, is key to recognizing why network issues are often not within the immediate control of your hosting provider.

Why In-Depth Network Testing is Essential

Internet's Complex Nature

  1. The internet is a complex web of interconnected networks. It's common for routes to experience disruptions due to factors such as maintenance, outages, or heavy traffic (congestion).
  2. Always broken: the internet is always broken somewhere, all the time. Fixes can sometimes take months or even years to implement, at extreme cost. Be patient.
  3. Different connections follow different paths across the network, leading to varied experiences for different users.
  4. Every connection can involve tens of thousands of components; an issue with even one of them can cause disruptions.

Identifying the True Source of the Issue

  1. In most cases (over 90%), reported network issues are not directly related to the server or hosting provider.
  2. Problems often lie in the route your data takes through the internet, which involves third-party networks. These networks are not under the control of your hosting provider.
  3. Network issues are notoriously difficult to diagnose, so proper testing is a must. Do not waste a network engineer's time with "no work" or "bad speed" messages.

No Data; No Solution is Possible

Without data, no solution is possible. The onus is always on the user to do the initial basic testing; every large network operator will ignore requests that lack exact, well-collated data and a synopsis of the issue. Reports along the lines of "xyz is slow!" typically go to /dev/null.

Do not waste a professional's time without first doing the basic testing, and if more information is requested, never ignore the request; just complete the testing. Otherwise your report will again go to /dev/null.

Even when the issue is finally identified, it will most likely still be ignored if the report comes from a third party. Do not ask your hosting provider to fix other operators' networks, or hold it responsible for them. If your testing places the issue somewhere other than your hosting provider, such as at your ISP, contact your ISP directly.

Understanding Key Network Metrics: Jitter, Ping, Packet Loss, and Routes

For comprehensive network diagnostics, it’s essential to understand the significance of various network metrics. Each of these metrics offers valuable insights into the quality and reliability of a network connection.

Ping (Latency)

  1. Ping, or latency, measures the time it takes for data to travel from the source to the destination and back (the round-trip time).
  2. It's crucial to have low latency for activities requiring real-time response, such as video conferencing and gaming.
  3. Latency directly limits potential network throughput because of TCP window sizes and TCP ACK delays: a sender can have at most one window of unacknowledged data in flight per round trip, so higher latency requires more data in flight to reach the same speed.
  4. The ping tool uses ICMP Echo packets to measure latency.
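
As a rough illustration of point 3, the ceiling that latency puts on a single TCP connection can be worked out from the window size. This is a minimal sketch, not a measurement tool: the function names are made up for this example, and it assumes throughput is limited only by the window, with no loss or congestion.

```python
import math


def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection: one window of data per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # bits per round trip / (rtt in seconds) / 1e6  ==  bits / (rtt_ms * 1000)
    return window_bytes * 8 / (rtt_ms * 1000)


def required_window_bytes(target_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to sustain target_mbps."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # Mbit/s * ms = 1000 bits = 125 bytes
    return math.ceil(target_mbps * rtt_ms * 125)
```

For example, a 65535-byte window over a 100 ms round trip cannot exceed roughly 5.2 Mbit/s, while sustaining 1 Gbit/s over the same path needs about 12.5 MB in flight.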

Jitter

  1. Jitter refers to the variation in delay (latency, "ping") of received packets.
  2. High jitter can cause issues in real-time applications like VoIP or online gaming, where consistent timing is crucial.
  3. Jitter is measured by observing the time difference between successive packets. Consistent packet timing leads to low jitter, which is desirable for a stable connection.
  4. High jitter on a high-latency link may trigger TCP retransmission timeouts and TCP window issues, reducing total throughput.
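
One simple way to put a number on jitter is the average absolute difference between successive round-trip times. The sketch below assumes that definition and a hypothetical list of RTT samples in milliseconds; other tools use other definitions (MTR's report, for instance, shows a standard deviation), so figures are not directly comparable between tools.

```python
from statistics import mean


def jitter_ms(rtts_ms):
    """Mean absolute difference between successive RTT samples, in milliseconds."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    # Pair each sample with the one that follows it and measure the change.
    diffs = [abs(later - earlier) for earlier, later in zip(rtts_ms, rtts_ms[1:])]
    return mean(diffs)
```

A perfectly steady connection gives 0; samples of 20, 22, 21 and 25 ms give a little over 2.3 ms.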

Packet Loss

  1. Packet loss occurs when data packets transmitted across a network fail to reach their destination.
  2. It can be caused by network congestion, hardware failures, or signal degradation.
  3. High packet loss leads to interruptions and degraded service quality, particularly affecting streaming, downloading, and online gaming.
  4. Packet loss causes TCP to shrink its window, resulting in low throughput. The higher the latency, the greater the effect.
  5. Typically tested with ping. Individual internet routers may appear to show packet loss because replying to ICMP Echo packets is handled at the lowest priority by the router's CPU.
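
The interaction between loss and latency in point 4 is often approximated with the Mathis formula: throughput is roughly (MSS / RTT) * (C / sqrt(p)), where p is the loss rate and C is about 1.22. The sketch below applies that approximation. It is a simplified model of classic loss-based TCP, not a prediction for modern congestion control, and the function name and the default MSS of 1460 bytes are assumptions made for this example.

```python
import math


def loss_limited_throughput_mbps(rtt_ms: float, loss_rate: float,
                                 mss_bytes: int = 1460) -> float:
    """Mathis approximation of TCP throughput under random packet loss.

    loss_rate is a fraction (0.01 means 1% loss); mss_bytes is an assumed default.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)  # constant from the Mathis model, about 1.22
    bytes_per_second = (mss_bytes / (rtt_ms / 1000.0)) * (c / math.sqrt(loss_rate))
    return bytes_per_second * 8 / 1_000_000
```

With 1% loss the estimate is roughly 1.4 Mbit/s at 100 ms but roughly 14 Mbit/s at 10 ms, which is why the same loss hurts far more on long-distance paths.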

Routes

  1. The path or route taken by data packets across the network significantly affects overall network performance.
  2. Data can traverse multiple routers and networks, each potentially impacting speed and reliability.
  3. Understanding the routes helps in pinpointing where delays or packet losses are occurring, especially when troubleshooting network issues.
  4. Routes are commonly different in each direction, so testing both ways is important.
  5. A network provider cannot really influence which route a third party uses to send packets towards it; this affects downstream latency and throughput.
  6. A network provider can control the route by which packets leave its network towards a third party; this affects upstream latency and throughput.
  7. Routes are typically dynamic and can change often; they are rarely manually optimized per target.

By monitoring and analyzing these metrics, users can better understand the health and performance of their network connections. This knowledge is essential for diagnosing issues and optimizing network performance.


Testing methodologies

Speed Tests; Inherently Unreliable

Speed tests, while popular, are an unreliable measure of network health: network conditions constantly fluctuate, and third-party test servers are often busy and limited in capacity. The key reasons not to rely on speed tests alone are:

  1. Third-Party Networks: Speed tests often involve data traveling through networks outside the control of your hosting provider. These third-party networks can have varying performance due to their own traffic management policies and network health.
  2. Transits and Peerings: The path data takes typically goes through several links and networks, each potentially affecting speed and performance. The complexity of these routes means that a speed test to one location can yield completely different results from a test to another, even if both are equidistant.
  3. Inconsistent Results Across Different Tests: Due to the complexities of internet routing, different speed tests can yield varying results. Each test may involve data traveling through distinct paths, encountering unique network conditions along the way.
  4. Network Variability: The internet's network conditions are in constant flux. This variability can result from traffic congestion, maintenance activities, and outages, all of which can temporarily impact speed test results.
  5. Third-Party Server Limitations: Most speed test results depend on the performance of third-party servers. These servers can be busy or limited in capacity, especially for high-speed connections. Most third-party testing servers have a maximum capacity of 10Gbps, shared among everyone testing at the same time, which can be insufficient for accurately measuring speeds over 1Gbps.
  6. Indication of Server Performance: Despite their limitations, if at least one or a few speed tests show good speeds, it's a strong indication that the server itself is functioning properly. Consistently high speeds in multiple tests, especially from different testing platforms, further reinforce this.
  7. Client-Side Factors: The accuracy of speed tests can also be influenced by factors on the user's end, such as local network issues, the performance of the testing device, and the browser or application used for the test. The most typical culprit is Wi-Fi. If you are using Wi-Fi, start diagnosing there. Experience shows that "to home" speed issues are almost always due to Wi-Fi.
  8. Limited Scope of Testing: Speed tests primarily measure bandwidth and latency but do not provide comprehensive insights into other critical aspects of network performance, such as packet loss, jitter, and the stability of the connection over time.
  9. User's Server Configuration: The configuration of your server plays a crucial role in network performance. Non-standard kernel configurations, especially those related to TCP window sizes and MTU, can significantly skew test results. TCP window sizes are vital because they determine how much data can be sent before an acknowledgment is required; with higher latency, the impact of incorrectly set window sizes becomes more pronounced, potentially leading to reduced throughput and performance issues.
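
On Linux, the kernel's TCP buffer limits from point 9 can be compared against what a given path needs. The following is a rough sketch under stated assumptions: it reads the standard tcp_rmem and tcp_wmem files under /proc/sys/net/ipv4, treats the maximum buffer as the largest possible window (the usable window is somewhat smaller in practice), and uses helper names invented for this example.

```python
from pathlib import Path


def parse_tcp_mem(text: str) -> dict:
    """Parse the 'min default max' triple used by tcp_rmem and tcp_wmem."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError("expected three whitespace-separated integers")
    minimum, default, maximum = (int(p) for p in parts)
    return {"min": minimum, "default": default, "max": maximum}


def buffer_covers_path(max_buffer_bytes: int, target_mbps: float, rtt_ms: float) -> bool:
    """True if the buffer is at least the bandwidth-delay product of the path."""
    bdp_bytes = target_mbps * rtt_ms * 125  # Mbit/s * ms = 125 bytes
    return max_buffer_bytes >= bdp_bytes


def read_kernel_tcp_buffers(proc_root: str = "/proc/sys/net/ipv4") -> dict:
    """Read the current limits on a Linux host (these paths are Linux-specific)."""
    root = Path(proc_root)
    return {name: parse_tcp_mem((root / name).read_text())
            for name in ("tcp_rmem", "tcp_wmem")}
```

A maximum of 6291456 bytes, for instance, is enough for 1 Gbit/s at 10 ms but not at 100 ms.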

Speed tests can offer some insights into network performance, but they should only be used as part of a broader diagnostic strategy. For a more accurate assessment of network health, combining speed tests with other tools like MTR analysis is recommended. This approach helps in identifying whether network issues are indeed related to the server or if they lie elsewhere in the complex web of internet connectivity.

Throughput, aka Speed Test Tools

There are many tools for testing; these are the most common.

Yabs.sh

A popular tool for general server performance testing. While limited, it is what many people run by default. Because everyone runs the same tests, it gives a decent indication of relative server performance for the most part. The limited set of network speed test servers that yabs.sh uses is well known to be often congested at the time of writing.

Run yabs.sh:

wget -qO- yabs.sh | bash

network-speed.xyz

This is like yabs.sh but dedicated to network tests only: it runs a larger number of tests and allows choosing the region, among other options.

Run network-speed.xyz test:

wget -qO- network-speed.xyz | bash

Speedtest by Ookla

The Speedtest CLI by Ookla, or the speedtest.net website, is another very common tool. It tests against a single server only, the closest one it can find. Thanks to this geolocation awareness, it gives one of the more reliable results, since the server is nearby. These servers are sometimes congested as well, so if the first one gives bad results, test against several; that particular test server may simply be congested.

Do not use the Python version (speedtest-cli): it is known to have measurement issues.

Iperf3

The best tool for point-to-point measurement: it is heavily optimized and has many options for various testing methods and parallelism.

Installing it on Debian / Ubuntu and starting a server is simple:

apt install -y iperf3
# Start the server side (listens on port 5201 by default)
iperf3 -s
# Perform a test from the other machine:
iperf3 -c [Server IP Address]
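
iperf3 can print its results as JSON with the -J flag, which makes it easy to log and compare runs. The sketch below pulls the headline numbers out of that output for a TCP test. The function name is made up for this example, and the fields it reads (end.sum_sent, end.sum_received, retransmits) are assumed from iperf3's TCP output; check them against your iperf3 version.

```python
import json


def summarize_iperf3(json_text: str) -> dict:
    """Extract sender/receiver throughput (Mbit/s) from iperf3 -J output (TCP test)."""
    end = json.loads(json_text)["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sent_mbps": sent["bits_per_second"] / 1e6,
        "received_mbps": received["bits_per_second"] / 1e6,
        # Retransmit counts are not reported on every platform, so allow None.
        "retransmits": sent.get("retransmits"),
    }
```

Typical use would be along the lines of running iperf3 -c [Server IP Address] -J > result.json and passing the file contents to summarize_iperf3. A high retransmit count alongside low throughput points towards packet loss on the path.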

Network Diagnostics; Route, Ping, Jitter

Tools

The basic network diagnostic tools are largely the same everywhere. Others exist, but these are the ones most typically used.

MTR, WinMTR

This is the most important and essential tool; see Network Troubleshooting with MTR for a comprehensive guide. If you suspect an issue, always run an MTR test in both directions, with a minimum of 1000 packets each.

MTR gives you all the typical information in one tool: latency, packet loss, jitter, and route.
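
When reading MTR results, loss shown at an intermediate hop that does not carry through to the final hop is usually just that router deprioritizing ICMP replies, not real loss. The sketch below applies that rule of thumb to the per-hop data from mtr's JSON report (mtr --json). The function names are invented for this example, and the field names (report, hubs, host, Loss%) are assumed from recent mtr versions, so verify them against your own output.

```python
import json


def load_hubs(json_text: str) -> list:
    """Return the ordered per-hop entries from an mtr JSON report (assumed layout)."""
    return json.loads(json_text)["report"]["hubs"]


def first_hop_with_persistent_loss(hubs: list, threshold_pct: float = 1.0):
    """Return the hop where loss begins and continues through the final hop.

    Returns None when the final hop is clean, i.e. any loss seen earlier
    did not persist to the destination.
    """
    start = None
    # Walk backwards from the destination while hops keep showing loss.
    for hub in reversed(hubs):
        if hub["Loss%"] >= threshold_pct:
            start = hub
        else:
            break
    return start
```

The hop this returns is only a starting point for investigation; compare it with the MTR run taken in the opposite direction before drawing conclusions.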

Ping

Tests latency between two endpoints. A simple, basic test to quickly check whether you get a response.

To test on Linux or Windows:

ping [server ip or hostname, e.g. google.com]
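
For collecting ping results into a report, the summary lines can be parsed automatically. This is a minimal sketch that assumes the summary format printed by Linux ping (iputils); Windows ping prints a different format and is not handled. The function name is made up for this example.

```python
import re

_COUNTS = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
_LOSS = re.compile(r"([\d.]+)% packet loss")
_RTT = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(output: str) -> dict:
    """Extract packet counts, loss and RTT statistics from Linux ping output."""
    counts = _COUNTS.search(output)
    loss = _LOSS.search(output)
    if counts is None or loss is None:
        raise ValueError("no ping summary found in output")
    result = {
        "sent": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_pct": float(loss.group(1)),
    }
    rtt = _RTT.search(output)
    if rtt:  # the rtt line is absent when every packet was lost
        names = ("min_ms", "avg_ms", "max_ms", "mdev_ms")
        result.update(zip(names, map(float, rtt.groups())))
    return result
```

Including the parsed loss percentage and average latency in a report, together with the raw output, gives whoever reads it the exact data rather than an impression.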

Traceroute

Traces the route to the other server: which network hops it passes through and how the packets are being routed _to_ the target. This only tests the direction from you to the target; as with MTR, you need to run it from the target as well to get the complete picture.

To test on Linux:

traceroute [server ip or hostname, e.g. google.com]

To test on Windows:

tracert [server ip or hostname, e.g. google.com]