Modern networks have become complex systems with many different devices and applications, and it is almost impossible to keep track of what is happening in them manually. Dedicated anomaly detection systems help here.

What is a network anomaly?
Problems in computer networks are detected by the traffic anomalies they cause. An anomaly is something that goes against expectations. For example, a damaged switch may create unexpected traffic in another part of the network, or new error codes may start to appear when a service is disabled. Network troubleshooting is based on network anomalies.

The first method of classifying anomalies is based on how they differ from normal communication. Anomalies can be distinguished by the type of data transmitted, by the amount of data transmitted, or both. Another way to classify anomalies is by their cause:

Non-human error - equipment failure or interruption of radio communication due to weather conditions;
Human error - misconfiguration of equipment or accidental disconnection of a network cable;
Malicious human activity - an internal attack, in which a company employee deliberately damages the system, or an external attack, in which an attacker tries to disable the network and cause damage.

What is an anomaly detection system?
Anomaly detection requires constant monitoring and analysis of selected network metrics. An anomaly detection system handles the case in which something unexpected is observed: it evaluates the event and, if it qualifies as an anomaly, reports it to the network administrator.

There are two main categories of network monitoring that allow you to detect anomalies:

Passive network monitoring
The network contains sensors that receive data from the network and evaluate it. This data may be addressed directly to the sensors (for example, events sent via SNMP), or it may be a copy of production traffic, which flows through the network whether or not a sensor is connected.

Active network monitoring
The network may also contain sensors that generate additional traffic and send it through the network. This traffic makes it possible to continuously check the availability or general parameters of the tested services, network links and devices.

Differences between active and passive network monitoring
It might seem that active monitoring complements passive monitoring, automatically making it a better option. However, the problem with active monitoring is that it generates additional data on the network. In addition, the monitoring devices become part of the production network, which entails security risks, so such monitoring is not completely secure.

Another potential problem is that the monitoring traffic itself can affect how the network functions and cause anomalies (for example, by overloading a server). Given these shortcomings, this article focuses on passive monitoring of network anomalies.

In general, anomaly detection can be divided into several main components:

Parameterization - the monitored data is extracted from the input data in a form suitable for further processing;
Training - in this mode, the model of the network (the trained state) is updated. The update can be performed either automatically or manually;
Detection - data from the monitored network is compared with the trained model. If the data meets certain criteria, an anomaly report is generated.
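
The three components listed above can be sketched in a few lines of Python. This is only an illustrative toy, not a description of any particular product: the class name PortAnomalyDetector, the flow record layout (a dict with src_ip and dst_port) and the choice of "set of known destination ports" as the model are all assumptions made for the example.

```python
class PortAnomalyDetector:
    """Toy detector whose model is the set of destination ports seen in training."""

    def __init__(self):
        self.known_ports = set()  # the learned model (trained state)
        self.training = True      # training mode, switched manually in this sketch

    @staticmethod
    def parameterize(flow):
        """Parameterization: keep only the attribute that is monitored."""
        return flow["dst_port"]

    def process(self, flow):
        """Return an anomaly report (a string) or None."""
        port = self.parameterize(flow)
        if self.training:
            # Training: update the model with traffic that is considered normal.
            self.known_ports.add(port)
            return None
        # Third step: compare observed data with the trained model.
        if port not in self.known_ports:
            return f"anomaly: unexpected destination port {port} from {flow['src_ip']}"
        return None


detector = PortAnomalyDetector()
for port in (80, 443):
    detector.process({"src_ip": "192.0.2.10", "dst_port": port})

detector.training = False  # stop learning, start comparing
print(detector.process({"src_ip": "192.0.2.99", "dst_port": 23}))
```

Here training is switched off by hand; an automatic variant could, for example, end training after a fixed observation period.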


What anomalies can be found?
Ransomware - by searching for the signature of an executable file;
DDoS attack - by comparing the volume of current traffic with the expected volume;
Botnet activity - by checking connections against a list of known botnet command and control (C&C) servers;
Dictionary attack - by comparing the number of login attempts against thresholds;
Link failure - by a noticeable increase in the number of connections on the backup link;
Incorrect application configuration - by an increase in the number of error codes in application connections;
Server overload - by a decrease in the quality of the services provided by the server;
Suspicious device behavior - by creating behavior profiles and checking for device behavior that falls outside them.
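
Two of the checks in the list above are simple enough to sketch directly. The following Python example is a minimal illustration under stated assumptions: the C&C addresses are placeholders from the IP ranges reserved for documentation, the threshold of 5 failed logins is arbitrary, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical values chosen for illustration only.
KNOWN_CC_SERVERS = {"203.0.113.7", "198.51.100.23"}
MAX_FAILED_LOGINS = 5  # allowed failed logins per source in one time window


def botnet_suspects(connections):
    """Return hosts that contacted a known C&C address.

    connections is an iterable of (src_ip, dst_ip) pairs.
    """
    return sorted({src for src, dst in connections if dst in KNOWN_CC_SERVERS})


def dictionary_attack_sources(failed_logins, threshold=MAX_FAILED_LOGINS):
    """Return sources with more failed logins in the window than the threshold.

    failed_logins is an iterable of source IPs, one entry per failed attempt.
    """
    counts = Counter(failed_logins)
    return sorted(ip for ip, n in counts.items() if n > threshold)


connections = [("192.0.2.10", "198.51.100.80"), ("192.0.2.11", "203.0.113.7")]
print(botnet_suspects(connections))

failed = ["192.0.2.50"] * 20 + ["192.0.2.51"] * 2
print(dictionary_attack_sources(failed))
```

In practice the list of C&C servers would come from a regularly updated threat intelligence feed, and the login threshold would be tuned to the monitored service.
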
Anomaly detection methods
Signatures or knowledge-based methods
A signature describes exactly what type of data the system is looking for. An example of a signature is a rule that matches a packet with identical source and destination IP addresses, or one that matches specific content in a packet.
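
As a rough sketch, both kinds of signature mentioned above can be written as small predicate functions. The packet representation (a dict with src_ip, dst_ip and payload), the signature names and the byte pattern are assumptions made for this example and are not taken from any real rule set.

```python
def same_source_and_destination(packet):
    """Signature: source and destination IP addresses are identical."""
    return packet["src_ip"] == packet["dst_ip"]


def contains_bytes(pattern):
    """Build a signature that matches specific content in the payload."""
    def check(packet):
        return pattern in packet["payload"]
    return check


# Hypothetical signature set; the names and the pattern are illustrative.
SIGNATURES = {
    "same-src-dst-ip": same_source_and_destination,
    "suspicious-payload": contains_bytes(b"/etc/passwd"),
}


def match_signatures(packet, signatures=SIGNATURES):
    """Return the names of all signatures that the packet matches."""
    return [name for name, check in signatures.items() if check(packet)]


packet = {"src_ip": "192.0.2.1", "dst_ip": "192.0.2.1", "payload": b"hello"}
print(match_signatures(packet))
```

A packet that matches no signature yields an empty list, which is exactly the limitation of this method: traffic without a known signature passes unnoticed.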

Baseline (statistical) methods
The baseline describes the amount of transferred data that shares certain general characteristics. For example, this could be the number of TCP connections detected every 5 minutes. An anomaly occurs when the current value (the number of connections in the last 5 minutes) deviates significantly from the learned baseline.
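
The TCP connection example can be made concrete with a small calculation. The sketch below assumes that "deviates significantly" means "lies more than three standard deviations from the mean of past counts"; this is one common convention, not the only possible one, and the sample counts are invented for illustration.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Learn mean and standard deviation from past 5-minute connection counts.

    history must contain at least two values.
    """
    return mean(history), stdev(history)


def deviation_score(current, baseline):
    """Distance of the current count from the mean, in standard deviations."""
    avg, std = baseline
    if std == 0:
        return 0.0 if current == avg else float("inf")
    return abs(current - avg) / std


def is_anomaly(current, baseline, max_score=3.0):
    """Flag the current count if it deviates too far from the baseline."""
    return deviation_score(current, baseline) > max_score


# Invented counts of TCP connections in five earlier 5-minute intervals.
baseline = learn_baseline([100, 110, 90, 105, 95])
print(is_anomaly(104, baseline))  # ordinary interval
print(is_anomaly(300, baseline))  # sudden burst of connections
```

With these numbers the mean is 100 and the standard deviation is about 7.9, so 104 connections stay well within the baseline while 300 connections are reported.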

Figure: Anomaly detection based on a change in the number of detected TCP connections.

Another example is looking for changes in the distribution of packets across the ports to which they are directed. The figure shows a case in which the anomaly manifests itself as an increase in the number of packets sent to one destination port.

Figure: Change in the distribution of packets by destination port. When a large amount of data is transferred to one port, the distribution changes significantly.
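
One possible way to quantify such a change in the port distribution is to compare the learned and the current distribution with a distance measure. The sketch below uses the total variation distance, which is a choice made for this example rather than a method named in the text; the packet counts are invented.

```python
from collections import Counter


def port_distribution(ports):
    """Share of packets per destination port, as a dict whose values sum to 1."""
    counts = Counter(ports)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {port: n / total for port, n in counts.items()}


def distribution_distance(p, q):
    """Total variation distance: 0 for identical, 1 for disjoint distributions."""
    ports = set(p) | set(q)
    return 0.5 * sum(abs(p.get(port, 0.0) - q.get(port, 0.0)) for port in ports)


# Learned traffic mix: mostly web traffic plus some DNS.
normal_ports = [80] * 50 + [443] * 40 + [53] * 10
# Current window: the same mix plus a large transfer to port 22.
current_ports = normal_ports + [22] * 900

distance = distribution_distance(
    port_distribution(normal_ports), port_distribution(current_ports)
)
print(round(distance, 3))
```

For this data the distance comes out at about 0.9, far from the value of 0 that an unchanged distribution would give. The alert threshold would have to be chosen for the monitored network.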

The difference between network anomaly detection with signatures and baseline
The table below compares signature and baseline usage.

While signature-based anomaly detection is fast and accurate, it works only for traffic anomalies whose signature is known. Baseline detection based on machine learning is slower and produces more false positives, but it can detect new anomalies for which no signatures exist. Therefore, a balanced approach that uses both is usually recommended.

Using Machine Learning for Anomaly Detection
To detect known anomalies in the network accurately, signatures must be created manually using knowledge of each problem or attack. The main benefit of machine learning is that the baseline can change over time according to the data actually observed, so the system learns from previous results. ML algorithms are used by anomaly-based intrusion detection systems, which work by finding deviations from the learned norm.

Another advantage of machine learning is that it does not require any prior knowledge of the monitored network: the model learns the expected behavior and identifies deviations from it. However, if a fault manifests itself as a gradual increase in some attributes, no anomaly will be detected, because the ML model adapts to the increase.
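
The blind spot for gradual change can be reproduced with a very small adaptive baseline. The sketch below stands in for an ML model with an exponentially weighted moving average, which is a deliberate simplification; the parameters alpha and tolerance and the traffic values are assumptions made for the example.

```python
def adaptive_detector(values, alpha=0.2, tolerance=0.5):
    """Return indices of values that deviate from an adaptive baseline.

    The baseline is an exponentially weighted moving average. A value is
    flagged when it differs from the baseline by more than `tolerance`
    (as a fraction of the baseline). Values that are not flagged are
    learned, so the baseline follows slow changes in the data.
    values must be a non-empty sequence of positive numbers.
    """
    baseline = values[0]
    flagged = []
    for i, value in enumerate(values[1:], start=1):
        if abs(value - baseline) > tolerance * baseline:
            flagged.append(i)
            continue  # do not learn from a value reported as an anomaly
        baseline = alpha * value + (1 - alpha) * baseline
    return flagged


# Sudden jump: one interval with ten times the usual traffic.
sudden = [100] * 10 + [1000] + [100] * 10
print(adaptive_detector(sudden))

# Gradual growth: 5 % more traffic in every interval.
gradual = [100 * 1.05 ** i for i in range(60)]
print(adaptive_detector(gradual))
```

The sudden jump is reported, while the gradual series ends at more than 17 times its starting value without a single report, because the baseline keeps following the data.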

Anomaly Detection Challenges
The reality of anomaly detection is not as simple as it might seem. Several problems arise when monitoring a network, and they can severely limit detection capabilities.

False positive detection
Distinguishing normal activity from an anomaly is not always easy. What was an ordinary event yesterday may become an anomaly tomorrow, because the transmitted data changes regardless of whether there is a problem (anomaly) in the network or not. That is why detection is based on a probability estimate rather than a definite verdict. Each detected event is assigned a score; if this score exceeds a predetermined threshold, the event is marked as an anomaly.

The threshold determines the sensitivity of the detection. If the sensitivity is too high, problems will be detected quickly, but at the cost of more false positives. If the sensitivity is too low, the number of false positives is reduced, but the number of correctly detected anomalies is also reduced, allowing the cybercriminal to remain undetected.
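
The trade-off can be illustrated by evaluating the same scored events against two thresholds. In the sketch below, the scores and the labels saying which events were real anomalies are invented test data; in live traffic such labels are not available.

```python
def evaluate_threshold(scored_events, threshold):
    """Count detection outcomes for a given score threshold.

    scored_events is a list of (score, is_real_anomaly) pairs.
    An event is reported when its score exceeds the threshold.
    """
    detected = sum(1 for s, real in scored_events if s > threshold and real)
    false_positives = sum(1 for s, real in scored_events if s > threshold and not real)
    missed = sum(1 for s, real in scored_events if s <= threshold and real)
    return {"detected": detected, "false_positives": false_positives, "missed": missed}


# Invented events: (anomaly score, whether it really was an anomaly).
events = [
    (0.10, False), (0.20, False), (0.35, False), (0.55, False),
    (0.60, True), (0.70, False), (0.80, True), (0.95, True),
]

print(evaluate_threshold(events, 0.30))  # low threshold = high sensitivity
print(evaluate_threshold(events, 0.75))  # high threshold = low sensitivity
```

With the low threshold all three real anomalies are detected at the cost of three false positives; with the high threshold the false positives disappear, but one real anomaly is missed.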

Examples of false positive events are an unexpected operating system update that transmits a large amount of data, or an unusually large number of customers connecting to an online store at the same time.

In other words, it is impossible to guarantee that all anomalies in the network will be detected and that there will be no false positives at the same time. False positives are a real problem because, during automatic event processing, legitimate traffic or a legitimate service can be identified as problematic and restricted. Manual processing and analysis of these anomalies, on the other hand, requires a huge investment of time and effort.

Monitoring of encrypted traffic as a replacement for the outdated anomaly detection method
For reasons of privacy and security in computer networks, data encryption is being expanded and improved. Encrypted communication also affects anomaly detection because encryption reduces the amount of data that monitoring and analysis can work with. For example, when monitoring encrypted email, email addresses become unavailable.

It is important to know at what level encryption takes place. Most communications are only encrypted at the application level, which means that statistical analysis of IP addresses, destination ports, etc. can still be performed. Thus, encryption does not prevent detection, but it greatly limits the types of anomalies that can be detected. Unfortunately, attackers are aware of this fact and hide their activity in encrypted traffic to avoid detection.
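
As a small illustration of what remains visible when only the application payload is encrypted, the following sketch aggregates traffic volume per destination using header fields alone. The flow record layout (a dict with dst_ip, dst_port and bytes), the function names and the volume limit are assumptions made for this example.

```python
from collections import defaultdict


def bytes_per_destination(flows):
    """Sum transferred bytes per (dst_ip, dst_port) without reading any payload."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["dst_ip"], flow["dst_port"])] += flow["bytes"]
    return dict(totals)


def heavy_destinations(flows, max_bytes):
    """Return destinations whose total volume exceeds max_bytes."""
    totals = bytes_per_destination(flows)
    return sorted(dest for dest, total in totals.items() if total > max_bytes)


# Invented flow records; the payload is encrypted and never inspected.
flows = [
    {"dst_ip": "198.51.100.5", "dst_port": 443, "bytes": 4000},
    {"dst_ip": "198.51.100.5", "dst_port": 443, "bytes": 6000},
    {"dst_ip": "203.0.113.9", "dst_port": 443, "bytes": 500},
]
print(heavy_destinations(flows, max_bytes=5000))
```

Volume and destination statistics of this kind stay available, whereas anything that depends on message content, such as the email addresses mentioned above, does not.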