Network Diagnostics

From Pulsed Media Wiki
Revision as of 12:57, 29 December 2023 by Nucode (talk | contribs)

Network Diagnostics Comprehensive Guide

Network performance is crucial for the smooth operation of dedicated servers, but diagnosing network issues can be very challenging. This document guides you through effective network testing methods for identifying potential issues. Understanding the nature of the internet - a vast network with numerous interconnected routes and nodes - is key to recognizing why network issues are often not within the immediate control of your hosting provider.

Why In-Depth Network Testing is Essential

Internet's Complex Nature

  1. The internet is a complex web of interconnected networks. It's common for routes to experience disruptions due to various factors like maintenance, outages, or heavy traffic (congestion).
  2. Different connections follow different paths across the network, leading to varied experiences for different users.
  3. Every connection can potentially involve tens of thousands of components; an issue with even one of them can cause disruptions.

Identifying the True Source of the Issue

  1. In most cases (over 90%), reported network issues are not directly related to the server or hosting provider.
  2. Problems often lie in the route your data takes through the internet, which involves third-party networks. Third-party networks are not under the control of your hosting provider.
  3. Network issues are notoriously difficult to diagnose, so proper testing is a must. Do not waste a network engineer's time with "no work" or "bad speed" messages.

Tools for Network Testing

Speed Tests Are Inherently Unreliable

Speed tests, while popular, are an unreliable measure of network health. Network conditions fluctuate constantly, and third-party testing servers are often very busy and limited in capacity. Speeds over 1Gbps are especially difficult to measure, since most third-party testing servers top out at 10Gbps. Here are the key reasons why relying on speed tests alone is not advisable:

  1. Third-Party Networks: Speed tests often involve data traveling through networks outside the control of your hosting provider. These third-party networks can have varying performance due to their own traffic management policies and network health.
  2. Transits and Peerings: The path data takes typically goes through several links and networks, each potentially affecting speed and performance. Because of the complexity of these routes, a speed test to one location can yield completely different results from a test to another, even if both are equidistant.
  3. Inconsistent Results Across Different Tests: Due to the complexities of internet routing, different speed tests can yield varying results. Each test may involve data traveling through distinct paths, encountering unique network conditions along the way.
  4. Network Variability: The internet's network conditions are in constant flux. This variability can result from traffic congestion, maintenance activities, and outages, all of which can temporarily impact speed test results.
  5. Third-Party Server Limitations: Most speed test results depend on the performance of third-party servers. These servers can be busy or limited in capacity, especially for high-speed connections. The majority of third-party testing servers have a maximum capacity of 10Gbps, and that capacity is shared with everyone else testing at the same time, so it can be insufficient for accurately measuring speeds over 1Gbps.
  6. Indication of Server Performance: Despite their limitations, if at least one or a few speed tests show good speeds, it's a strong indication that the server itself is functioning properly. Consistently high speeds in multiple tests, especially from different testing platforms, further reinforce this.
  7. Client-Side Factors: The accuracy of speed tests can also be influenced by factors on the user's end, such as local network issues, the performance of the testing device, and the browser or application used for the test. The most typical cause is Wi-Fi. If you are using Wi-Fi, start diagnosing from there. Experience shows that "to home" speed issues are almost always due to Wi-Fi.
  8. Limited Scope of Testing: Speed tests primarily measure bandwidth and latency but do not provide comprehensive insights into other critical aspects of network performance, such as packet loss, jitter, and the stability of the connection over time.
  9. User's Server Configuration: The configuration of your server plays a crucial role in network performance. Non-standard kernel configurations, especially those related to TCP window sizes and MTU, can significantly skew test results. The TCP window size is vital because it determines how much data can be sent before an acknowledgment is required. At higher latency, the impact of an incorrectly set window size becomes more pronounced, potentially leading to reduced throughput and performance issues.
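The relationship between window size, latency, and throughput in item 9 can be made concrete with a little arithmetic. The following is a minimal sketch, not a tuning tool: it assumes a single TCP stream with no packet loss, where at most one window of data can be in flight per round trip. The function names are made up for this illustration.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput in Mbit/s.

    At most one full window can be sent per round trip, so the ceiling
    is window / RTT. Real throughput is lower when there is packet loss.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    kbits_per_second = window_bytes * 8 / rtt_ms  # bits per millisecond
    return kbits_per_second / 1000


def required_window_bytes(target_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: window needed to sustain target_mbps."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_in_flight = target_mbps * 1000 * rtt_ms  # Mbit/s * ms = kbit
    return round(bits_in_flight / 8)


if __name__ == "__main__":
    # A 64 KiB-class window (65535 bytes) at increasing latency.
    for rtt in (5, 50, 150):
        ceiling = max_tcp_throughput_mbps(65535, rtt)
        print(f"RTT {rtt:>3} ms: at most {ceiling:.1f} Mbit/s")
```

With a 65535-byte window the ceiling is roughly 105 Mbit/s at 5 ms but only about 10 Mbit/s at 50 ms, which is why a window that looks fine on a nearby test server can cripple a long-distance transfer.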

Speed tests can offer some insights into network performance, but they should only be used as part of a broader diagnostic strategy. For a more accurate assessment of network health, combining speed tests with other tools like MTR analysis is recommended. This approach helps in identifying whether network issues are indeed related to the server or if they lie elsewhere in the complex web of internet connectivity.
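Packet loss and jitter, which speed tests do not report, can be estimated from a series of round-trip times collected by any probing tool. The sketch below is illustrative only: the function name and the input format (a list of RTTs in milliseconds, with `None` for a probe that got no reply) are assumptions for this example, and jitter is computed here as the mean absolute difference between consecutive replies, which is one common definition among several.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize probe results; None marks a probe with no reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    # Simplification: differences are taken between consecutive replies,
    # even when a lost probe sits between them.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "loss_pct": loss_pct,
        "avg_ms": mean(replies) if replies else None,
        "jitter_ms": mean(deltas) if deltas else None,
    }


if __name__ == "__main__":
    # Made-up sample values, not a real measurement.
    print(summarize_probes([20.0, 22.0, None, 21.0, 25.0]))
```

A connection can show good bandwidth in a speed test and still have loss or jitter high enough to cause problems, so these figures are worth recording alongside any speed result.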

Speed Test Tools

Yabs.sh

A popular tool for general server performance testing. While limited, this is what a lot of people run by default. Since every run performs the same tests, it gives a decent indication of relative server performance for the most part. The small number of network speed test servers that yabs.sh uses are known to be often congested at the time of writing.

network-speed.xyz

This is like yabs.sh but dedicated to network tests only. It runs a larger number of tests and lets you choose the region, among other options.

Speedtest by Ookla

speedtest-cli, or speedtest.net, is another very common tool. It tests against a single server only, the closest one it can find. Because it is geolocation-aware and picks a nearby server, it actually gives one of the more reliable results. These servers are sometimes congested as well, so if the first one gives you bad results, test against several; it could be that the particular test server is congested.

Do not use the Python version -- it is known to have measurement issues.
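Since a single congested test server can make a healthy server look slow, it helps to judge a set of results together rather than one at a time. The following is a rough sketch of that reasoning: if even one test reaches a good share of the link speed, the server itself is probably fine and the slow results point at the path or the test server. The function name, the server names, and the 80% threshold are assumptions chosen for this example, not values from any tool.

```python
from statistics import median


def assess_speed_tests(results_mbps: dict, link_mbps: float,
                       good_fraction: float = 0.8) -> dict:
    """Judge a batch of speed test results against the link speed.

    results_mbps maps a test server name to its measured Mbit/s.
    good_fraction is an arbitrary cut-off for what counts as "good".
    """
    if not results_mbps:
        raise ValueError("need at least one result")
    threshold = good_fraction * link_mbps
    best = max(results_mbps.values())
    slow = sorted(name for name, mbps in results_mbps.items()
                  if mbps < threshold)
    return {
        "best_mbps": best,
        "median_mbps": median(results_mbps.values()),
        "server_likely_ok": best >= threshold,
        "slow_targets": slow,
    }


if __name__ == "__main__":
    # Made-up results for a 1Gbps server, for illustration only.
    sample = {"test-server-a": 940.0, "test-server-b": 310.0,
              "test-server-c": 875.0}
    print(assess_speed_tests(sample, link_mbps=1000))
```

In the sample above, one slow target out of three would suggest a congested test server or route rather than a fault on the server being tested.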

Iperf

The best tool for measuring point-to-point throughput. It is heavily optimized and has many options for different testing methods and parallelism.
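As a hedged illustration of a point-to-point test with parallel streams, the sketch below assumes iperf3 specifically, with an iperf3 server (`iperf3 -s`) already running on a host you control at the other end. It uses the `-c`, `-P`, `-t`, `-J`, and `-R` options, and reads the `end.sum_received.bits_per_second` field of iperf3's JSON report for a TCP test; check that field against the output of your own iperf3 version. The function names and the example address are made up for this illustration.

```python
import json
import subprocess


def build_iperf3_command(host: str, streams: int = 4, seconds: int = 10,
                         reverse: bool = False) -> list:
    """Build an iperf3 client command with JSON output."""
    cmd = ["iperf3", "-c", host, "-P", str(streams),
           "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # server sends, client receives
    return cmd


def received_mbps(report_json: str) -> float:
    """Extract the receiver-side total throughput in Mbit/s."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1_000_000


def run_iperf3(host: str, **options) -> float:
    """Run the test and return Mbit/s; raises if iperf3 exits non-zero."""
    proc = subprocess.run(build_iperf3_command(host, **options),
                          capture_output=True, text=True, check=True)
    return received_mbps(proc.stdout)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; replace it with your own
    # iperf3 server before running.
    print(" ".join(build_iperf3_command("192.0.2.10", streams=8)))
```

Running the test in both directions (with and without `reverse=True`) and with different stream counts helps separate a single-stream limit, such as a small TCP window, from a limit on the path itself.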