This short article covers some tips for conducting log analysis (especially on getting information about top talkers and coming up with a comprehensive report). It is not a formal or standard procedure that must be strictly followed. It is a guide to help you approach a log analysis problem systematically, which in turn leads to a better and easier-to-understand log analysis report. The guide doesn’t focus on any proprietary log format or vendor-specific network device, so it can be applied to multiple platforms (such as Cisco, FortiGates, etc.).
Logs Logs And More Logs.
Logs will always give you some idea of what has happened, and it is up to you how to interpret that information. I also like to think that logs can tell you more than what they actually say, because log analysis is not just looking at a particular log entry but also correlating the information from multiple log entries. Asking the right questions for each scenario and knowing what to look for in the logs is the key to finding your answers.
In conducting log analysis, you always start with a hypothesis or a problem (we will call either one a scenario), which is to be proven or solved by your analysis. Some scenarios are about finding the cause of a network or device anomaly. Others are about figuring out whether there are any signs of malware infection or botnet activity. Each scenario has its own data requirements for arriving at a conclusion. Nevertheless, the scenario alone can already give you a clue as to which elements in the logs to look for. Knowing what they are keeps you from getting lost while conducting log analysis.
A systematic approach is what keeps us from messing up. It complements our knowledge of which elements in the logs to look for and what they mean to us, and it greatly improves our efficiency in coming up with an analysis. Now off we go to the main part: below are the steps to help you systematically tackle any log analysis problem.
1. Identify the time frame
2. Identify the hosts that are involved
3. Identify the choke point
4. Produce the lists of essential data from the logs
5. Dig deeper into the elements of each log entry
Below is a discussion of each step enumerated above.
Identify The Time Frame.
For any case, we should make clear which time frame we have to look into. This reduces the amount of log data to go through and speeds up the entire log analysis process. The key is to determine when the anomaly or problem was noticed.
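As a rough illustration of narrowing the logs to a time frame, here is a minimal Python sketch. It assumes the log has already been parsed into dictionaries; the field names (`time`, `src_ip`, `dst_ip`), the sample entries, and the one-hour window are all hypothetical and would need to match your own log format and case.

```python
from datetime import datetime, timedelta

def in_time_frame(entries, start, end):
    """Keep only entries whose timestamp falls inside [start, end]."""
    return [e for e in entries if start <= e["time"] <= end]

# Hypothetical, already-parsed log entries (field names are assumptions).
entries = [
    {"time": datetime(2024, 1, 10, 8, 55), "src_ip": "10.0.0.5", "dst_ip": "198.51.100.7"},
    {"time": datetime(2024, 1, 10, 9, 20), "src_ip": "10.0.0.8", "dst_ip": "203.0.113.9"},
    {"time": datetime(2024, 1, 10, 11, 45), "src_ip": "10.0.0.5", "dst_ip": "203.0.113.9"},
]

# Suppose the anomaly was noticed at 09:30; look one hour either side of it.
noticed = datetime(2024, 1, 10, 9, 30)
window = timedelta(hours=1)
selected = in_time_frame(entries, noticed - window, noticed + window)
```

Starting from the moment the problem was noticed and widening the window only when needed keeps the working set small.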
Identify The Hosts Involved.
Identifying the hosts involved is most applicable to cases of malware infection or host/device misconfiguration. Determining which hosts are involved narrows down your scope again. If you can identify the affected hosts’ IPs or host names, you can focus your analysis on those: extract all the entries for those hosts from the whole log file and conduct your analysis on them. Your search keys will be the hosts’ IPs or host names.
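Extracting the entries for the affected hosts could look something like the sketch below. Again, the parsed-dictionary layout and the field names `src_ip` and `dst_ip` are assumptions for illustration, not part of any particular log format.

```python
def entries_for_hosts(entries, hosts):
    """Return entries where a host of interest is the source or the destination."""
    wanted = set(hosts)
    return [e for e in entries if e["src_ip"] in wanted or e["dst_ip"] in wanted]

# Hypothetical, already-parsed log entries.
entries = [
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.7"},
    {"src_ip": "10.0.0.8", "dst_ip": "203.0.113.9"},
    {"src_ip": "203.0.113.9", "dst_ip": "10.0.0.5"},
]

# The search key is the IP of the host suspected to be affected.
suspect_entries = entries_for_hosts(entries, ["10.0.0.5"])
```

Matching on both source and destination matters here, since an affected host may show up on either side of a connection.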
Identify The Choke Point.
The choke points are the devices sitting between the internal network and the Internet, usually firewalls or inline IDSes. Identifying them helps us understand what’s going on in the network, since they can inform you through either logs or packet captures.
The choke points can aid us in deducing a network problem (such as a bandwidth issue, a firewall dropping packets, or a loss of connectivity). You can gather logs from the device to look for problems. Also check the configuration and health status of the device itself (this requires logging in directly to the device). Sometimes the problem lies in the configuration (for example, when someone breaks the ACL), and sometimes the health status tells us that the device is so overwhelmed with traffic that it can’t function.
Produce The Lists of Data From The Logs.
Producing these lists gives you an overview of what is going on in the network. Produce them in a structured, orderly manner, since each list can already lead you to some conclusions. Each list can be as simple as a Top Ten, or as long as you want (e.g. Top 30).
1. Produce a list of source IPs and destination IPs
The source IP list gives you an idea of which hosts are the most active in the network or consume the most network resources. The destination IP list gives you an idea of which domains or external hosts your network hosts are talking to.
By getting these two lists and resolving the public IPs, you can tell if there are any suspicious/malicious connections to external hosts.
2. Produce a list of source ports and destination ports
Source and destination ports give you an idea of the flow of the traffic and the services being used. Moreover, they show you whether the network is using only the necessary services (like HTTP, SMTP) or whether any suspicious ports are open and listening (like those of backdoors or Trojans). Any unwanted port connections in or out of your network (e.g. a NetBIOS port connecting to an external host, or a backdoor installation) can be detected through this list.
3. Produce a list of top source and destination IPs with their respective ports.
This list helps with your correlation. It can pinpoint which particular host is running which service. If you determine that there is an anomalous port, this list can directly show you who is using it.
4. Produce a list of top sessions (source IP <-> destination IP).
This list also aids your correlation. It shows you the frequency of each session. With it, you can tell whether a particular host is frequently visiting Facebook or a malicious web site, or whether a host is frequently connecting to an unknown remote IP.
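The four lists above can all be produced with the same counting routine. The following is a minimal Python sketch using `collections.Counter`; the helper name `top`, the field names, and the sample entries are hypothetical, and it assumes the log has already been parsed into dictionaries.

```python
from collections import Counter

def top(entries, fields, n=10):
    """Count how often each combination of `fields` occurs; return the n most common."""
    counts = Counter(tuple(e[f] for f in fields) for e in entries)
    return counts.most_common(n)

# Hypothetical, already-parsed log entries.
entries = [
    {"src_ip": "10.0.0.5", "src_port": 50211, "dst_ip": "203.0.113.9", "dst_port": 80},
    {"src_ip": "10.0.0.5", "src_port": 50212, "dst_ip": "203.0.113.9", "dst_port": 80},
    {"src_ip": "10.0.0.8", "src_port": 50900, "dst_ip": "198.51.100.7", "dst_port": 25},
    {"src_ip": "10.0.0.5", "src_port": 50213, "dst_ip": "198.51.100.7", "dst_port": 5555},
]

top_sources = top(entries, ["src_ip"])                   # list 1: source IPs
top_destinations = top(entries, ["dst_ip"])              # list 1: destination IPs
top_dst_ports = top(entries, ["dst_port"])               # list 2: destination ports
top_dst_with_port = top(entries, ["dst_ip", "dst_port"]) # list 3: IP with its port
top_sessions = top(entries, ["src_ip", "dst_ip"])        # list 4: sessions
```

Passing `n=30` gives a Top 30 instead of a Top Ten. Each result is a list of `(key_tuple, count)` pairs, so the same output can feed straight into a report table.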
Now that we have the necessary lists that we need and have a general idea of what is happening in the network, we’re now off to look at the log entries in detail.
Dig Deeper Into The Elements of Each Log Entry.
To detect anomalies in your network, you have to look at some of the elements of the log entries. Each element can tell you whether something deviates from normal operation. By correlating all the information together, you get a bigger picture of whether what is happening in the network is abnormal. Below are the elements that you should consider looking at:
a. Interfaces – interfaces in the log file can help you determine if the traffic is inbound or outbound
b. Ports – ports determine the services that are being used. They can also indicate the presence of malware (the port fingerprint of a malware). You can search the web for a suspicious-looking port (e.g. 5555) and see whether it returns links to malware description sites.
c. Payload – the payload (if available) can validate your assumptions about an attack (be it an infection, an intrusion, etc.). A good example would be seeing a non-HTTP payload on HTTP sessions. Another would be seeing shellcode in the payload.
d. IDS Signature – IDS signature that is shown in the log can tell you about the details and severity of the attack. If it is possible for you to get the structure of the signature, you’ll be able to know what causes the signature to trip.
e. FW Policy – if the logs are that of a firewall, you should verify the firewall’s policy to check if there are any misconfigurations in the ACL.
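For the port element in particular, one simple way to surface candidates for a closer look is to flag entries whose destination port is not among the services you expect. This is only a sketch: the set `EXPECTED_PORTS` is an assumed policy for a hypothetical network, and the field names and sample entries are made up for illustration.

```python
# Assumption: the services this hypothetical network is expected to use
# (SMTP, DNS, HTTP, HTTPS). Replace with your own policy.
EXPECTED_PORTS = frozenset({25, 53, 80, 443})

def unexpected_port_entries(entries, expected=EXPECTED_PORTS):
    """Return entries whose destination port is not an expected service."""
    return [e for e in entries if e["dst_port"] not in expected]

# Hypothetical, already-parsed log entries.
entries = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "dst_port": 443},
    {"src_ip": "10.0.0.8", "dst_ip": "198.51.100.7", "dst_port": 5555},
    {"src_ip": "10.0.0.8", "dst_ip": "198.51.100.7", "dst_port": 139},
]

flagged = unexpected_port_entries(entries)
```

A flagged entry is not proof of anything by itself; it is a pointer to a port worth researching and correlating with the other elements.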
It is important to know who is talking to whom and whether the traffic is inbound or outbound. The common notion is that outbound traffic is less of a threat than inbound traffic, but this is not always the case. In scenarios of malware infection, backdoor access, or outbound D/DoS attacks, outbound traffic can be just as malicious as inbound traffic. So we definitely have to take into consideration the scenario being investigated, because different scenarios give different weights to the inbound/outbound factor. Some scenarios are discussed below:
Malware infection – inbound traffic means attempts to infect vulnerable internal systems, and outbound traffic can indicate that an internal host is most likely infected and is trying to infect others too. It could also be a bot trying to connect to a CnC server to update its configuration and send information stolen from the infected host.
Back door access – inbound traffic indicates complete access to a compromised host, and outbound traffic could mean that someone internal is using company computers for malicious activities.
D/DoS attacks – inbound traffic indicates some malicious remote host is targeting us (and should be blocked at the firewall). Outbound traffic could entail that there is either some misconfiguration on some internal host, or an infected host is part of a botnet and is participating in a DDoS attack.
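When the logs do not carry interface fields, the direction of an entry can be approximated from its addresses instead. The sketch below does that, which is a swapped-in technique rather than the interface-based check described earlier. The `INTERNAL_NETS` ranges are an assumption about a hypothetical network and must be replaced with your own internal address space.

```python
import ipaddress

# Assumption: the internal address space of this hypothetical network.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_internal(ip):
    """True if the address belongs to one of the assumed internal networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)

def direction(entry):
    """Label an entry as inbound, outbound, internal, or external."""
    src_in = is_internal(entry["src_ip"])
    dst_in = is_internal(entry["dst_ip"])
    if src_in and not dst_in:
        return "outbound"
    if dst_in and not src_in:
        return "inbound"
    return "internal" if src_in else "external"

# Hypothetical entries: one leaving the network, one entering it.
leaving = direction({"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9"})
entering = direction({"src_ip": "198.51.100.7", "dst_ip": "192.168.1.20"})
```

With the direction attached to each entry, the scenario decides how much weight it gets, for example treating repeated outbound sessions to one unknown host as a possible CnC check-in.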
The key elements are not limited to the ones discussed above. Other elements can be used too, as they offer additional information. I have highlighted these because they are almost standard in all log formats and, in most cases, are already sufficient to answer our hypothesis.
By going through the logs in a systematic order, we work efficiently and cover all possible areas. Somewhere along the way, as you put all these pieces of data together and look closely at the key elements, you will find the answers that confirm your hypothesis.