Apache comes with built-in mechanisms for logging activity on your server. In this series of articles, I'll talk about the standard way that Apache writes log files, and some of the tricks for getting more useful information and statistics out of your server.
This week we'll talk about the information that appears in your transfer log, and what it all means.
If you have done a default installation of Apache, two log files will be written when you run your server. These files are called access_log (access.log on Windows) and error_log (error.log on Windows). These files can be found (again, if you did a default installation) in /usr/local/apache/logs. On Windows, the logs will be in the logs subdirectory of wherever you installed Apache. Various package managers put the log files in other places, so you may have to poke around to find them, or check the configuration file for the configured location.
The access_log is, as the name suggests, the log of all accesses to your server. A typical entry in this file looks like:
188.8.131.52 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654
This line contains seven pieces of information. Actually, two of them are blank in this example, but there is space for seven.
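If you want to work with these fields in a script, a regular expression can pull all seven of them apart. Here is a minimal Python sketch; the names CLF_PATTERN and parse_clf_line are made up for this example, and it assumes well-formed lines with no double quotes embedded inside the request.

```python
import re

# One pattern for the seven fields of a common-log-format line.
# The timestamp is in square brackets and the request is in double quotes,
# because those two fields can themselves contain spaces.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) '          # 1. remote host
    r'(?P<ident>\S+) '          # 2. identity from identd ("-" if absent)
    r'(?P<user>\S+) '           # 3. authenticated user ("-" if absent)
    r'\[(?P<time>[^\]]+)\] '    # 4. time of the request
    r'"(?P<request>[^"]*)" '    # 5. request line
    r'(?P<status>\d{3}) '       # 6. status code
    r'(?P<bytes>\d+|-)$'        # 7. bytes sent ("-" if none)
)

def parse_clf_line(line):
    """Return a dict of the seven fields (as strings), or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()
```

Blank fields come back as the literal string "-", exactly as they appear in the log.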
The first piece of information is the address of the remote host: that is, who is looking at your web site. In the example above, the host visiting my web site is 188.8.131.52, which is, incidentally, the IP address of the machine called
si3001.inktomi.com. (I figured that out by looking up the address in DNS.) Inktomi (inktomi.com) is a company that makes web search software; I looked at their web site. Since this same IP address requested the file robots.txt just a few seconds earlier, I suspect that this is a search engine spider that was indexing my web site. I'll talk about spiders in another column. So, just based on that first piece of information and a glance back through the log file, I've already found out quite a bit about this visitor.
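Spotting robots.txt requests is easy to automate. The following rough Python sketch (hosts_requesting_robots is a hypothetical helper, not part of any Apache tool) collects every host that asked for /robots.txt. That is a reasonable hint, though not proof, that the host is a spider, since any client can request the file.

```python
import re

# Capture the remote host (first field) and the resource, which is the
# second word inside the quoted request line.
HOST_AND_RESOURCE = re.compile(r'^(\S+) .*?"\S+ (\S+)')

def hosts_requesting_robots(lines):
    """Return the set of hosts whose request was for /robots.txt."""
    hosts = set()
    for line in lines:
        match = HOST_AND_RESOURCE.match(line)
        if match and match.group(2) == "/robots.txt":
            hosts.add(match.group(1))
    return hosts
```

You could pass it an open log file directly, since iterating over a file yields one line at a time.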
By default, this address is just the IP address of the remote host. You can tell Apache to look up all the host names, and put those host names in the log instead of the IP address. This is probably not a good idea, since it greatly slows down the logging process, and so slows down your entire server. And there are various tools that will go through your log after the fact, and resolve all the IP addresses to host names, so there's no real advantage to doing this anyway.
But, if you want to, you can tell Apache to do these lookups with the directive:

HostnameLookups on

Setting it to double, rather than on, will cause Apache to follow up with a forward lookup on the name that it finds, to verify that the name points back to the IP address that you started with. The value is set to off by default.
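Resolving addresses after the fact, as suggested above, takes only a few lines of Python. Apache itself ships a utility called logresolve for this job; the functions below are just an illustration of the idea, with made-up names. The double option mimics the double-check described above by accepting a name only if it maps back to the original address, and the cache keeps each address from being looked up more than once, which matters because DNS lookups are slow.

```python
import socket

def resolve_host(ip, double=False,
                 reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return a host name for ip, or ip itself if it cannot be resolved.

    With double=True, the name is looked up again (a forward lookup) and is
    only accepted if the original address is among the results.
    """
    try:
        name = reverse(ip)[0]              # (hostname, aliases, addresses)
    except OSError:                        # socket.herror is an OSError
        return ip
    if double:
        try:
            addresses = forward(name)[2]   # (hostname, aliases, addresses)
        except OSError:
            return ip
        if ip not in addresses:
            return ip                      # name does not point back: distrust it
    return name

def resolve_log_line(line, cache, double=False):
    """Replace the first field of a log line with a host name, caching lookups."""
    ip, sep, rest = line.partition(" ")
    if ip not in cache:
        cache[ip] = resolve_host(ip, double=double)
    return cache[ip] + sep + rest
```

To resolve a whole log, you would run each line through resolve_log_line with one shared dictionary as the cache. The reverse and forward parameters are there so the lookups can be swapped out, for example when testing without a network.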
The second slot, alas, is blank, and almost always will be. That's what the "-" is: a placeholder for the second piece of information. This is where you're supposed to get the identity of the visitor. That could be their login name, their email address, or some other unique identifier. This information is supposed to be returned by identd, or directly by the browser. And in the old days, back when Netscape 0.9 was the dominant browser, you would usually have email addresses in this spot. However, it did not take long for unsavory marketing types to decide that it would be a good idea to collect those email addresses and send them unsolicited email (also known as spam). So, before very long, this feature was removed from just about every browser on the market. You will almost never find information in this field.
The third piece of information is also blank. The information that would appear there is the username with which the visitor authenticated. This will appear, of course, only when you have required authentication for a particular resource. So for the majority of entries in your log file, for most sites, this will be blank.
Next we have the time when the request was made. This information is enclosed in square brackets, and is in what is called "common log format", or "standard English format." So the request in the above example was made at 14:47:37 on Saturday, August 19, 2000. The -0400 on the end of the field means that the server is in a time zone four hours behind UTC. This tells you two things: one, that I tend to leave my column until the last minute, and two, that I appear to have the wrong time zone set on my server. I'll have to make a note to take care of that...
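To turn that timestamp into something a program can compare or convert, Python's datetime.strptime can parse this layout directly. A minimal sketch, assuming Python 3 and the default English ("C") locale for the month abbreviation; parse_clf_time is a made-up name:

```python
from datetime import datetime, timezone

# Day/Month/Year:Hour:Minute:Second, then the offset from UTC.
# %b expects English month abbreviations such as "Aug".
CLF_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def parse_clf_time(stamp):
    """Parse '[19/Aug/2000:14:47:37 -0400]' (brackets optional) into an aware datetime."""
    return datetime.strptime(stamp.strip("[]"), CLF_TIME_FORMAT)

when = parse_clf_time("[19/Aug/2000:14:47:37 -0400]")
print(when.strftime("%A"))              # Saturday
print(when.astimezone(timezone.utc))    # 2000-08-19 18:47:37+00:00
```

Because the result carries its UTC offset, entries from servers in different time zones can be compared correctly.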
The fifth piece of information, enclosed in double quotes, is probably the most useful one in the record. It tells you what request was actually made of the server, and it is typically in the format METHOD RESOURCE PROTOCOL.
In the example above, the METHOD is GET. The other most common methods are POST and HEAD. There are a number of other valid methods, but those three are what you will see most of the time.
RESOURCE is the actual document, or URL, that was requested from the server. In this example, the client requested "/", which is the root, or front page, of the server. In most configurations, this corresponds to the file index.html in the DocumentRoot directory, but it could be something else, depending on your server configuration (the DirectoryIndex directive controls which file is served when a directory is requested).
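To make the mapping concrete, here is a deliberately simplified Python sketch (resource_to_file is a made-up name) that guesses which file a request such as "/" refers to. It assumes a plain DocumentRoot and a single DirectoryIndex file, index.html, and ignores Alias, ScriptAlias, rewrite rules and everything else that can change the mapping in a real configuration. The /usr/local/apache/htdocs path in the example is an assumption about a default installation.

```python
import posixpath

def resource_to_file(resource, document_root, directory_index="index.html"):
    """Guess the file behind a requested resource (simplified, see above)."""
    path = resource.split("?", 1)[0]       # drop any query string
    if path.endswith("/"):
        path += directory_index            # "/" becomes "/index.html"
    # normpath collapses "." and ".." segments, so the result cannot
    # climb above the document root.
    path = posixpath.normpath("/" + path.lstrip("/"))
    return posixpath.join(document_root, path.lstrip("/"))

print(resource_to_file("/", "/usr/local/apache/htdocs"))
# /usr/local/apache/htdocs/index.html
```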
PROTOCOL is usually going to be HTTP, followed by a version number. The version number will be either 1.0 or 1.1, with most of the records being 1.0. As you probably know from other articles, HTTP is the protocol that makes the web work. HTTP/1.0 was the earlier version of this protocol, and 1.1 is the more recent version. However, most web clients still speak version 1.0.
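Splitting the request line into its three parts is nearly a one-liner, but it's worth handling the odd cases. A small Python sketch, with a hypothetical function name:

```python
def split_request(request_line):
    """Split 'METHOD RESOURCE PROTOCOL' into a 3-tuple.

    Very old HTTP/0.9 clients send only 'METHOD RESOURCE', and garbage
    requests may not split cleanly at all, so missing parts come back as None.
    """
    parts = request_line.split()
    if len(parts) == 3:
        return tuple(parts)
    if len(parts) == 2:
        return parts[0], parts[1], None
    return None, None, None

method, resource, protocol = split_request("GET / HTTP/1.0")
```

From there, protocol.split("/")[1] gives you the version number, for instance if you want to count how many of your visitors speak 1.0 versus 1.1.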
The sixth piece of information is a status code. This tells you whether the request was successful or encountered some problem. Most of the time, this is 200, which means that the transfer was successful and everything went well. Hopefully. I'm not going to give the whole list of status codes and what they mean; you'll need to look in the documentation for that. But, in general, a status code that starts with 2 means the request was successful. Starting with a 3 means that the request was redirected somewhere else for some reason. Starting with a 4 means that the client did something wrong, and starting with a 5 means that the server did something wrong.
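That first-digit rule of thumb translates directly into code. A tiny Python sketch; the function name and labels are my own wording, not official names:

```python
def status_class(status):
    """Describe an HTTP status code by its first digit."""
    return {
        "2": "success",
        "3": "redirect",
        "4": "client error",
        "5": "server error",
    }.get(str(status)[:1], "other")   # works for ints or strings from the log
```

Tallying status_class over a day's log is a quick way to see whether errors are piling up.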
The seventh and final piece of information is the number of bytes in the response body sent to the client (the HTTP headers are not counted). This can tell you whether a transfer was interrupted (if the number is different from the size of the file). Adding these numbers up will tell you roughly how much data your server transferred in a day, or week, or whatever.
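Adding up the byte counts is straightforward, with one wrinkle: in the common log format, Apache writes "-" instead of 0 when no response body was sent, so a script has to treat that as zero. A minimal Python sketch, assuming every line is in the common log format so that the byte count is the last field (total_bytes is a made-up name):

```python
def total_bytes(lines):
    """Sum the last field of each common-log-format line, counting "-" as 0."""
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        size = fields[-1]
        total += 0 if size == "-" else int(size)
    return total
```

For example, opening /usr/local/apache/logs/access_log and passing the file object to total_bytes would give the body bytes served over the life of that log.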
The location of the access_log is actually a configuration option. If you look in your configuration file, httpd.conf, you should see a line that looks like the following:
CustomLog /usr/local/apache/logs/access_log common
Note: If you're running an older version of Apache, this line might look a little different. It might be the TransferLog directive instead of the CustomLog directive. If that is the case, I really recommend that you upgrade if at all possible.
The CustomLog directive specifies where a particular log file should be stored, and what format that log should be in. Next week we'll talk about custom log formats. The log format described above is the common log format, which has been the standard since the earliest web servers. That's why it still contains the ident field, even though almost no clients actually pass that information to the server.
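Going in the other direction can make the format clearer. This Python sketch (format_clf is a made-up name) assembles a line in the common log format from its pieces, writing "-" for anything that is missing, and reproduces the example entry from the start of this article:

```python
from datetime import datetime, timedelta, timezone

def format_clf(host, request, status, size, when, ident=None, user=None):
    """Build one common-log-format line; missing values are written as "-"."""
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return '{} {} {} [{}] "{}" {} {}'.format(
        host,
        ident or "-",            # identd answer, almost always absent
        user or "-",             # authenticated user name, if any
        stamp,
        request,
        status,
        size if size else "-",   # "-" rather than 0 when no body was sent
    )

when = datetime(2000, 8, 19, 14, 47, 37, tzinfo=timezone(timedelta(hours=-4)))
print(format_clf("188.8.131.52", "GET / HTTP/1.0", 200, 654, when))
# 188.8.131.52 - - [19/Aug/2000:14:47:37 -0400] "GET / HTTP/1.0" 200 654
```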
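The path given in that CustomLog line is the location of the log file. Note that this location should be secured against random users writing to it. Apache opens its log files while it is still running as root, before it switches to the user specified with the User directive, so a user who can write to the log directory could, for example, replace the log file with a symbolic link to some other file, which the server would then write to as root. This is potentially a serious security problem.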
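As a quick check that the log directory and file are not writable by other users, here is a minimal Python sketch. The function name is made up, and it is a rough check only: it looks at the permission bits of each path itself and ignores ACLs and the permissions of parent directories. The paths are the default-installation locations mentioned earlier; adjust them to match your CustomLog line.

```python
import os
import stat

def writable_by_others(path):
    """Return True if the group or other users have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))

# Default-installation paths; change these to match your configuration.
for target in ("/usr/local/apache/logs", "/usr/local/apache/logs/access_log"):
    if os.path.exists(target) and writable_by_others(target):
        print("warning: " + target + " is writable by other users")
```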
In my next few articles, I'll be talking about the following subjects: custom log formats; logging to a process, rather than to a file; the error log; getting useful statistics out of your log files; and whatever else you fine readers suggest to me.
Thanks for reading. Please send me a note at if you have any suggestions or comments.
Want to discuss log files with other Apache Today readers? Then check out the PHP discussion at Apache Today Discussions.