Amazon S3: Reading Server Log Files
In a previous post, I talked about how to enable logging on Amazon S3 so that you can monitor activity on the files you store there. So what happens once you start collecting logs? How do you read them? They are created as text files, which you could open in Notepad and try to make sense of there. Here’s a screenshot of what they look like in Notepad:
As you can see, there is a problem here. Unless you read files like these for a living, this format is pretty much unreadable. There is one more problem with Amazon S3 logs: there are always too many of them. S3 doesn’t create one log per day per bucket. It creates many logs per day per bucket. For one of my buckets, 3 days’ worth of logging produced 329 log files. Clearly, I would go crazy if I had to open them one by one to see what was in them.
Well, in this post, I am going to outline the solution I came up with.
Before I begin, there are of course ready-made tools out there for interpreting Amazon S3 logs. I would love to use them, as they provide much more information and analysis than the simple reading of logs that I do. However, they don’t always work, or don’t work the way you would like them to.
Here are two other ways to analyze your S3 logs:
- S3Stat – this is a very awesome service, which costs next to nothing (and actually, you get to try it for free for 30 days). However, there are certain limitations that it poses in the way you want to manage your logging. I didn’t want to do it that way, so I didn’t use it. I highly recommend that you check it out, though.
- SiSense Prism Viewer – On the CloudBerry Explorer blog, you can find some information about this neat product (which is again free in a limited way) which allows you to pull the logs on your desktop and run analytics there. This program didn’t work for me.
Having got those out of the way, let me go on to demonstrate how I solved my problem of reading the Amazon logs.
Problem 1: How do I handle so many files?
I needed a way to combine the files into one big file that I could then open in a more readable format. I was even thinking of writing a program to combine them when old-school computer skills came to my rescue. I remembered the copy command; specifically, I remembered that copy can be used to copy multiple files into one file.
So, I fired up the command prompt and issued this awesome command on my log files. Here’s how to do it.
- Before anything, create a folder on your hard drive and copy the logs that you want to process for a given bucket into that folder from Amazon S3. To keep it simple, make sure that you don’t have any other files in that folder.
- Go to your Windows Start Menu, and Choose Run (alternatively, you can press Windows Key + R).
- Type cmd and press Enter. This should open up a command prompt window.
- Now change your directory to the folder where you have stored your log files. You need to use the CD command to change directories in a Windows command shell.
- Once you are in the directory where you had stored the log files, simply issue the following command: copy * big_log (as shown in the picture below).
This should create a file called big_log in the same directory as the other files and it will have all the log files combined into it. Now, we move to the second problem of how to view this combined log in a format which is easier to understand.
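If you would rather script this step than type it at the command prompt, the same idea can be sketched in a few lines of Python. This is not part of the workflow above, just a minimal illustration: the function name combine_logs is made up, and the sketch assumes the folder holds nothing but the downloaded log files.

```python
from pathlib import Path


def combine_logs(log_dir, output_name="big_log"):
    """Concatenate every file in log_dir into one file, much like `copy * big_log`.

    Returns the number of log files that were combined.
    """
    log_dir = Path(log_dir)
    output_path = log_dir / output_name

    # Collect the sources first, in a stable order, and leave out the
    # combined file itself in case it exists from an earlier run.
    sources = sorted(
        p for p in log_dir.iterdir() if p.is_file() and p.name != output_name
    )

    with output_path.open("wb") as combined:
        for source in sources:
            data = source.read_bytes()
            combined.write(data)
            # Make sure each file ends with a newline so that the last record
            # of one log does not run into the first record of the next.
            if data and not data.endswith(b"\n"):
                combined.write(b"\n")

    return len(sources)
```

Calling combine_logs(r"C:\s3logs") on a folder of downloaded logs (the path here is only an example) writes big_log into that same folder, ready for the next step.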
Problem 2: Reading the log
Well, this is even easier. To my rescue comes Microsoft Excel. If you have Excel installed on your computer, then you can simply open the log file and then tell Excel how to interpret it. Here’s what to do:
- Start Excel, go to the File menu, and choose Open.
- In the Open File dialog box, navigate to the folder where the big_log file was created. You may not see it, because by default this dialog only shows files supported by Excel. You need to choose “All Files” in the “Files of type” combo box.
- Locate your file, select it, and click the Open button.
- The moment you do that, you are presented with the Text Import Wizard. The screenshots below show you what you should do.

Choose “Delimited” in the Original data type options, and click Next.
On the next screen, check “Space” in the Delimiters options, and click Finish. Immediately, Excel will divide your data into columns and rows, and you can now slice and dice this data the way you want. To understand which column represents which value, here’s the Server Log Format Documentation.
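For anyone without Excel, here is a rough Python sketch that does a similar split and writes a CSV file. One thing to be aware of: the timestamp in each record is wrapped in square brackets and contains a space, so a plain split on spaces cuts it in two. The sketch keeps bracketed and quoted values together. The column labels in FIELD_NAMES are my own names following the field order in the server log format documentation, and the function names are made up for illustration; newer logs may carry extra trailing fields, which simply end up as unlabeled columns.

```python
import csv
import re

# A token is a [bracketed timestamp], a "quoted string", or a run of non-spaces.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

# Assumed labels, in the order the fields appear in each log record.
FIELD_NAMES = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time", "turnaround_time",
    "referer", "user_agent",
]


def split_log_line(line):
    """Split one log record into fields, dropping the brackets and quotes."""
    fields = []
    for match in TOKEN.finditer(line):
        # Exactly one of the three groups matched; take that one.
        fields.append(next(g for g in match.groups() if g is not None))
    return fields


def logs_to_csv(log_path, csv_path):
    """Convert a combined log file into a CSV file; returns the record count."""
    count = 0
    with open(log_path, encoding="utf-8", errors="replace") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(FIELD_NAMES)
        for line in src:
            if line.strip():  # skip blank lines between records
                writer.writerow(split_log_line(line))
                count += 1
    return count
```

Running logs_to_csv("big_log", "big_log.csv") produces a file that opens directly in any spreadsheet program, with the timestamp in a single column.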
I usually get rid of the columns I don’t need, resize the others, and put filters on top of them. This allows me to do my basic analysis of the data. One of the key things I am usually interested in is whether someone is hot-linking my files. I can easily tell by looking at the Referer column (which is the second to the last column): if that column contains any URL other than my own website, then I know that this URL is hot-linking to my images or files.
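The same hot-linking check can be sketched in Python for logs that are too big to filter by hand. This is only an illustration under a few assumptions: the function name external_referers is made up, the Referer is taken to be the sixteenth field of each record (counting the bracketed timestamp as one field, per the documented field order), and any host that is not your own site or one of its subdomains is treated as external.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# A token is a [bracketed timestamp], a "quoted string", or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Assumption: zero-based position of the Referer field in each record.
REFERER_INDEX = 15


def external_referers(lines, own_host):
    """Count requests per referring host, ignoring own_host and its subdomains."""
    own_host = own_host.lower()
    counts = Counter()
    for line in lines:
        tokens = TOKEN.findall(line)
        if len(tokens) <= REFERER_INDEX:
            continue  # blank or truncated record
        referer = tokens[REFERER_INDEX].strip('"')
        host = urlparse(referer).hostname  # None when the referer is "-"
        if host and host != own_host and not host.endswith("." + own_host):
            counts[host] += 1
    return counts
```

For example, passing the lines of big_log and your own domain returns a Counter whose most_common() list shows which outside sites link to your files most often.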
How do you analyze your S3 files?
August 16th, 2010 at 3:53 pm
I have a problem and really know nothing about the S3 facility. This is the first time I have heard of it.
I want to track bandwidth use for each bucket/user. At the end of the month, we need to generate a report for each bucket and report the bandwidth used.
So I need to know how to start the work, or what I am supposed to start from. I am able to get a list of buckets, but that’s it. Beyond that, I am not able to do anything.
How to download the daily logs for a bucket is where I am lost. Please help.
August 16th, 2010 at 5:20 pm
Well – did you look at the post that is linked in the beginning of this post? It tells you how you can enable logging on S3.
Also, there are many great services out there which are not very expensive at all which would do it for you.
October 1st, 2011 at 10:58 pm
I just wrote a Python script to download and parse S3 bucket logs. You can run it directly to generate a customizable text table (see the source for how to customize it) or you can include it in scripts of your own and access the log data directly.
As a handy extra feature: it caches logs it has already downloaded and doesn’t try to download them again, so you can run it over and over and only fetch the data you haven’t seen yet.
You can get it here:
https://github.com/netguy204/bucketlogs
October 3rd, 2011 at 11:38 am
Thanks Brian for sharing.