In some CTFs, working with logs is part of the challenge. While you can certainly use your favorite text editor to find things, I believe it is better to use Linux command-line utilities to acquire the flags quickly. Arming yourself with Linux skills is paramount to your success in CTFs and in the real world. This post covers a few of the Linux command-line utilities I typically use in CTFs.
This post contains affiliate links. If you use these links to buy something I may earn a commission. Full disclosure here.
Word count
Some low-difficulty CTF questions ask for the number of lines in a file, or the combined line count of several files. You can certainly do this with your favorite GUI-based text editor, but I think the fastest way to get the answer is the wc utility.
Here is an example of finding the number of lines of two files combined.
andrew@kali:~$ wc -l access.log query.log
  17669 access.log
  97351 query.log
 115020 total
Another example
In this example, the task is to figure out how many unique IP addresses are in a log. There are different ways of tackling this challenge, but I will show you two. First, I look at the log's format before performing any filtering. I use the head utility for this, which displays only the first ten lines of a file by default.
andrew@kali:~$ head access.log
191.101.112.228 - - [19/Oct/2017:20:35:52 -0400] "GET /SOC-Webcast-2015v3.pdf HTTP/1.0" 200 2021695 "https://files.andrewroderos.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393"
100.16.83.174 - - [04/Mar/2018:14:14:22 -0500] "GET /Sample.pdf HTTP/1.1" 200 23380 "https://r.search.yahoo.com/_ylt=A0LEVxp1RZxauUcA9AhXNyoA;_ylu=X3oDMTEyOTNiamZnBGNvbG8DYmYxBHBvcwM0BHZ0aWQDQjUyODVfMQRzZWMDc3I-/RV=2/RE=1520219637/RO=10/RU=https%3a%2f%2ffiles.andrewroderos.com%2fSample.pdf/RK=2/RS=9SQv33mypxNQcsz7XCl0J9xagIM-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36"
52.30.16.188 - - [01/Jan/2021:01:10:02 -0500] "GET /wp-login.php HTTP/1.1" 301 533 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36"
52.30.16.188 - - [01/Jan/2021:01:10:02 -0500] "GET /wp-login.php HTTP/1.1" 404 3561 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36"
13.66.139.67 - - [01/Jan/2021:01:18:31 -0500] "GET / HTTP/1.1" 200 3654 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
114.119.142.126 - - [01/Jan/2021:01:24:34 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
111.202.101.66 - - [01/Jan/2021:01:36:12 -0500] "GET / HTTP/1.1" 301 528 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
111.202.101.66 - - [01/Jan/2021:01:36:34 -0500] "GET / HTTP/1.1" 200 3622 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
114.119.132.61 - - [01/Jan/2021:02:16:16 -0500] "GET /robots.txt HTTP/1.1" 301 585 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
114.119.132.61 - - [01/Jan/2021:02:16:17 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
Now that I know what the log looks like, I can manipulate how the shell shows me the output. Since the task asks for unique IP addresses, displaying only the IP address field is the second step. I typically use the cut utility for this.
andrew@kali:~$ cut -d ' ' -f 1 access.log
191.101.112.228
100.16.83.174
52.30.16.188
52.30.16.188
13.66.139.67
<-- Output omitted for brevity -->
Let me explain the parameters so you do not need to look at the help menu. The cut utility cuts parts of lines from a specified file or piped data and prints the result to standard output. The -d flag sets the delimiter. If you look at the log format, there are spaces between fields, so -d ' ' tells cut to use a space as the character between fields. Finally, -f 1 displays only the first field.
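The same behavior can be checked on throwaway data. This is a minimal sketch using fabricated log lines (not the real access.log) in the same space-separated layout:

```shell
# Two fabricated log lines in the same space-separated layout as access.log.
printf '%s\n' \
  '10.0.0.1 - - [01/Jan/2021:00:00:01 -0500] "GET / HTTP/1.1" 200 512' \
  '10.0.0.2 - - [01/Jan/2021:00:00:02 -0500] "GET /a HTTP/1.1" 404 128' > sample.log

# -d ' ' makes the space the field separator; -f 1 keeps only the first field (the IP).
ips=$(cut -d ' ' -f 1 sample.log)
printf '%s\n' "$ips"
```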
The next step is to display unique IP addresses only. I typically use the sort utility for this. It prints the lines of a file in a given order, and its -u flag filters out repeated lines so only unique ones are displayed.
andrew@kali:~$ cut -d ' ' -f 1 access.log | sort -u
100.16.246.94
100.16.59.121
100.16.83.174
100.24.206.129
100.24.74.62
<-- Output omitted for brevity -->
The last step is to count how many unique IP addresses are in the log file. I used the wc utility again for this.
andrew@kali:~$ cut -d ' ' -f 1 access.log | sort -u | wc -l
3305
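The whole dedupe-and-count pattern can be sanity-checked on throwaway data. A minimal sketch, using made-up addresses:

```shell
# Four fabricated addresses, two of them repeated; sort -u sorts and drops duplicates in one step.
printf '%s\n' 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.1 > ips.txt
unique=$(sort -u ips.txt)
count=$(sort -u ips.txt | wc -l | tr -d ' ')  # tr strips the padding some wc builds add
```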
Alternative solution
Another way to solve this is with the awk utility instead of cut. I typically forget the exact syntax, so I do not use awk much. The awk program below simply says to print only the first field of each line.
andrew@kali:~$ awk '{print $1}' access.log | sort -u | wc -l
3305
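On this log format the two approaches are interchangeable, which a quick check on fabricated lines confirms. A minimal sketch:

```shell
# Sanity check on fabricated lines: awk splits on runs of whitespace by default,
# so '{print $1}' matches what cut -d ' ' -f 1 produces here.
printf '%s\n' '10.0.0.1 - - x' '10.0.0.2 - - y' > fields.log
via_awk=$(awk '{print $1}' fields.log)
via_cut=$(cut -d ' ' -f 1 fields.log)
```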
Unique
The uniq command-line utility reports or filters out repeated lines in a file. As you can see from the previous example, I used sort -u to sort the lines and filter out duplicates. An equivalent alternative is sort | uniq.
One of the cool things about the uniq utility is that it can count the number of times each line occurs in the input. This is handy when you are interested in how many times a string occurred. For example, a CTF challenge tasked you with finding the filename behind exactly 73 GET requests that returned an HTTP 404 status code.
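Here is a minimal sketch of that counting behavior on made-up paths. Note that uniq -c only counts adjacent duplicates, which is why the input must be sorted first:

```shell
# Fabricated paths: uniq -c only counts adjacent duplicates, so sort first,
# then sort -nr ranks the counted lines with the most frequent on top.
printf '%s\n' /a /b /a /c /a /b > hits.txt
ranked=$(sort hits.txt | uniq -c | sort -nr)
top_path=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
top_count=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1}')
```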
The first step I would take for this challenge is to look for the HTTP 404 status code in the log. I will use the grep utility for this.
andrew@kali:~$ grep 404 access.log
52.30.16.188 - - [01/Jan/2021:01:10:02 -0500] "GET /wp-login.php HTTP/1.1" 404 3561 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36"
114.119.142.126 - - [01/Jan/2021:01:24:34 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
114.119.132.61 - - [01/Jan/2021:02:16:17 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
66.249.65.168 - - [01/Jan/2021:09:38:42 -0500] "GET /robots.txt HTTP/1.1" 404 3810 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
107.150.64.219 - - [01/Jan/2021:13:09:49 -0500] "GET /Sample.zip HTTP/1.0" 200 404586 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36"
<-- Output omitted for brevity -->
The output contains extra lines that I do not want. For example, the last line shown has an HTTP 200 status code; grep matched it only because the digits 404 appear in its response size (404586). That means I need to refine my search criteria.
andrew@kali:~$ grep '" 404 ' access.log | head -n5
52.30.16.188 - - [01/Jan/2021:01:10:02 -0500] "GET /wp-login.php HTTP/1.1" 404 3561 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36"
114.119.142.126 - - [01/Jan/2021:01:24:34 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
114.119.132.61 - - [01/Jan/2021:02:16:17 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
66.249.65.168 - - [01/Jan/2021:09:38:42 -0500] "GET /robots.txt HTTP/1.1" 404 3810 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
77.88.5.226 - - [01/Jan/2021:16:31:03 -0500] "GET /robots.txt HTTP/1.1" 404 3810 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
I can verify that my search criteria worked by issuing the command below.
andrew@kali:~$ grep 404 access.log | wc -l; grep '" 404 ' access.log | wc -l
4478
4258
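The difference between the two patterns can be demonstrated on fabricated data. A minimal sketch, assuming the same Apache-style field layout:

```shell
# Two fabricated lines: a genuine 404 response, and a 200 whose byte count merely contains "404".
printf '%s\n' \
  'a.b.c.d - - [x] "GET /f.zip HTTP/1.0" 200 404586 "-"' \
  'e.f.g.h - - [x] "GET /g.php HTTP/1.1" 404 3561 "-"' > status.log
loose=$(grep -c 404 status.log)        # both lines match
strict=$(grep -c '" 404 ' status.log)  # only the genuine 404 matches
```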
The second step is to display only lines with GET requests.
andrew@kali:~$ grep '" 404 ' access.log | grep GET
52.30.16.188 - - [01/Jan/2021:01:10:02 -0500] "GET /wp-login.php HTTP/1.1" 404 3561 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36"
114.119.142.126 - - [01/Jan/2021:01:24:34 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
114.119.132.61 - - [01/Jan/2021:02:16:17 -0500] "GET /robots.txt HTTP/1.1" 404 3578 "-" "(compatible;PetalBot;+https://aspiegel.com/petalbot)"
66.249.65.168 - - [01/Jan/2021:09:38:42 -0500] "GET /robots.txt HTTP/1.1" 404 3810 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
77.88.5.226 - - [01/Jan/2021:16:31:03 -0500] "GET /robots.txt HTTP/1.1" 404 3810 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
<-- Output omitted for brevity -->
The third step is to display only the necessary fields. Since I am looking for filenames, I need to extract just those. I will use the cut utility for this task.
andrew@kali:~$ grep '" 404 ' access.log | grep GET | cut -d '"' -f 2 | cut -d ' ' -f 2
/wp-login.php
/robots.txt
/robots.txt
/robots.txt
/robots.txt
<-- Output omitted for brevity -->
Now, I am ready to sort the lines and count how many times each filename occurred. I am going to use both sort and uniq for this task.
andrew@kali:~$ grep '" 404 ' access.log | grep GET | cut -d '"' -f 2 | cut -d ' ' -f 2 | sort | uniq -c | sort -nr
   2354 /robots.txt
    630 /favicon.ico
    151 /wp-login.php
     73 /.env
     45 /wp/wp-login.php
<-- Output omitted for brevity -->
The first sort arranges the lines in order so that identical lines sit next to each other, which uniq requires. Then, uniq -c collapses the repeated lines and prefixes each with the number of times it showed up. Finally, sort -nr arranges the output in reverse numerical order, putting the highest counts on top. As you can see from the output above, the /.env file appeared 73 times.
Another awk example
Earlier, I showed how to use awk instead of cut. In this section, I will show how to use awk to count the characters in a string. As of this writing, I cannot think of a real-world scenario where I would use this, but it came up in a CTF event I participated in. If you can think of one, please leave a comment below!
The task asked how many characters were in the longest POST request in a log file. First, I need to figure out the log format.
andrew@kali:~$ grep POST access.log
82.165.117.55 - - [07/Jan/2021:08:29:06 -0500] "POST / HTTP/1.1" 301 565 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
52.152.168.103 - - [13/Jan/2021:07:27:38 -0500] "POST / HTTP/1.1" 301 565 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
13.72.84.164 - - [22/Jan/2021:09:51:52 -0500] "POST / HTTP/1.1" 301 565 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
13.72.84.164 - - [22/Jan/2021:09:51:57 -0500] "POST / HTTP/1.1" 200 3558 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
5.62.61.55 - - [15/Feb/2021:16:51:36 -0500] "POST / HTTP/1.1" 301 565 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
5.62.63.47 - - [18/Feb/2021:17:32:59 -0500] "POST / HTTP/1.1" 301 565 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36"
185.204.1.217 - - [12/Mar/2021:15:43:50 -0500] "POST /RPC2 HTTP/1.1" 301 517 "-" "fasthttp"
185.204.1.217 - - [12/Mar/2021:15:43:50 -0500] "POST /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1" 301 609 "-" "fasthttp"
<-- Output omitted for brevity -->
Now that I know the log format, I can display only the field I am interested in: the POST request between the first and second double quotes. In the previous section, I used the cut -d '"' -f 2 syntax, but I want to show you an alternative.
andrew@kali:~$ grep POST access.log | cut -d \" -f2
POST / HTTP/1.1
POST / HTTP/1.1
POST / HTTP/1.1
POST / HTTP/1.1
POST / HTTP/1.1
POST / HTTP/1.1
POST /RPC2 HTTP/1.1
POST /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1
<-- Output omitted for brevity -->
The backslash (\) is an escape character: it lets you pass non-alphanumeric characters, like the double quote here, to cut without the shell interpreting them.
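A quick way to convince yourself the escape works is to run it on a single fabricated log line. A minimal sketch:

```shell
# One fabricated log line; \" hands cut a literal double quote as the delimiter,
# exactly like quoting it as '"'.
line='1.2.3.4 - - [x] "POST /RPC2 HTTP/1.1" 301 517 "-"'
req=$(printf '%s\n' "$line" | cut -d \" -f 2)
```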
Since I am only interested in the path, I can further filter the output by removing the POST and HTTP fields.
andrew@kali:~$ grep POST access.log | cut -d \" -f2 | cut -d ' ' -f2
/
/
/
/
/
/
/RPC2
/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
<-- Output omitted for brevity -->
Now, I can sort the paths and prefix each with its character count, then sort again numerically in reverse to show the longest at the top.
andrew@kali:~$ grep POST access.log | cut -d \" -f2 | cut -d ' ' -f2 | sort | awk '{print length, $0}' | sort -nr | head
99 /wp-content/plugins/dzs-videogallery/class_parts/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
86 /wp-content/plugins/jekyll-exporter/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
81 /wp-content/plugins/cloudflare/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
51 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
48 /lib/phpunit/phpunit/src/Util/PHP/eval-stdin.php
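The length trick in that pipeline can be isolated and checked on made-up paths. A minimal sketch:

```shell
# Fabricated paths: awk's length with no argument is the character count of the
# whole line ($0); printing it first lets sort -nr rank lines by length.
printf '%s\n' /RPC2 /vendor/phpunit/x.php / > posts.txt
longest=$(awk '{print length, $0}' posts.txt | sort -nr | head -n 1)
```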
Final thoughts
I am only scratching the surface with this post. Linux has many more command-line utilities that are useful when working with logs. While the tasks here came from CTFs, these utilities can also be handy in the real world. Security professionals may argue that you do not need these commands because you use Splunk's Search Processing Language or Kibana Query Language. While they have a point, I still believe it is good to have Linux command-line skills.