Since my last article, the Apache Software Foundation has released the fourth alpha version of Apache 2.0. In this article, I will review some of the features new to the 2.0 series and explain why they were added and how they will help site administrators.
Piped and Reliable Piped Logs
Logs are important to every Apache installation. They tell the administrator who is accessing the site and whether something has gone wrong with the server. One obvious use for logs is to determine whether somebody has tried to break into the server. Logs are not something to be taken lightly; however, they also have drawbacks. The first problem is that they can grow very large. Every time a person accesses a page on a site, a message is written to a log. A basic Apache installation does nothing with its logs other than write to them, so they keep growing unless the administrator does something about them. Piped and reliable piped logs provide a way to handle this problem.
The second issue with logs is that they can be slow. If Apache is configured to log the hostname of every machine that requests a page, logging is likely to be very slow. This is because Apache, like all network programs, uses IP addresses rather than hostnames for network communication, and relies on the local machine's resolver to convert IP addresses into hostnames. That conversion can be slow because it requires a query to the Domain Name System. The whole time that a thread or process is waiting for the conversion, it is not doing its primary job, serving web pages. On a heavily loaded site, this can become a serious performance bottleneck. Piped and reliable piped logs also give the server a way around this problem.
Now that we have identified two real-world issues that piped logs can solve, we can talk about what they are and how they work. Piped logs and reliable piped logs move the responsibility for writing the log file away from the Apache server to an external process. When Apache starts, if the configuration file specifies that a log is to be piped, Apache creates a new process and sets up a pipe between that process and the Apache parent process. When the child processes are created, they inherit that pipe and use it to send log messages to the logging process. This happens for each piped log, so if piped logs are specified for the error, transfer, and access logs, the server will create three separate processes, one for each log. Apache relies on the fact that each log message is small enough to be written to the pipe in one piece (on POSIX systems, pipe writes of at most PIPE_BUF bytes are atomic), so messages from different children do not interleave and no extra synchronization is needed. This allows a logging process to read one line from its standard input (the pipe), perform some operation on that string, and write it out to the log file. The log process then reads the next message from the pipe.
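The read, process, write loop described above can be sketched in a few lines. This is only an illustration under my own assumptions, not code from the Apache distribution: the function names run_log_filter and main are hypothetical, and the log file path is assumed to arrive as the first program argument.

```python
import sys


def run_log_filter(source, sink, transform=lambda line: line):
    """Read one log line at a time from source, transform it, write it to sink.

    Returns the number of lines handled. The loop ends when the pipe is
    closed, which is how the logging process learns the server has gone away.
    """
    count = 0
    for line in source:
        sink.write(transform(line))
        sink.flush()  # keep the file current even if this process is killed
        count += 1
    return count


def main(argv):
    # Hypothetical entry point: argv[1] is the file to append to, and the
    # log messages arrive on standard input (the pipe from the server).
    with open(argv[1], "a") as log_file:
        run_log_filter(sys.stdin, log_file)
```

A real logging program would call main(sys.argv) at startup; the transform argument is where any per-line work, such as filtering or rewriting, would go.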
How does this help with the two problems mentioned above? It allows people to write small programs that solve these problems easily and efficiently. Every Apache distribution includes a small program called rotatelogs. This program reads log messages from the pipe for a specified amount of time, then closes the real log file and renames it. Afterwards, it opens a new log file and begins the process over again. This keeps logs from getting too large, and allows the administrator to easily archive all logs in one convenient place. There is another program, called logresolve, which performs the conversion from IP addresses to hostnames.
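To make the rotation idea concrete, here is a minimal sketch of the close, rename, reopen cycle as described above. It is not the rotatelogs source: the class name RotatingLog, the naming scheme for archived files, and passing the current time in as a parameter (to keep the example easy to follow) are all my assumptions.

```python
import os


class RotatingLog:
    """Append to one live file; rename it aside once its time window ends."""

    def __init__(self, path, interval_seconds):
        self.path = path
        self.interval = interval_seconds
        self.window_start = None
        self.handle = None

    def write(self, line, now):
        # Start of the time window that contains `now`.
        window = int(now) - int(now) % self.interval
        if self.handle is not None and window != self.window_start:
            # The window is over: close the live file and archive it under
            # a name that carries the start time of the window it covered.
            self.handle.close()
            os.rename(self.path, f"{self.path}.{self.window_start}")
            self.handle = None
        if self.handle is None:
            self.handle = open(self.path, "a")
            self.window_start = window
        self.handle.write(line)
        self.handle.flush()

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

In a real logging program the value of now would come from time.time() as each line is read from the pipe.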
Now we know what piped logs, both reliable and not, can do for sites running Apache. But what is the difference between the two? Reliable piped logs try to ensure that the log process is always running. It is unusual for a piped log process to die unexpectedly, because it is usually a very small program that performs only one function. However, if a log process does die, an Apache installation that takes advantage of reliable piped logs will restart it. Unfortunately, reliable piped logs are not available on all platforms supported by Apache. If Apache does support reliable piped logs on a platform, support will be compiled in by default. To determine whether a platform is supported, run Apache with the command line argument -V. This will output all of the options that have been compiled into Apache. Search for the line "-D HAVE_RELIABLE_PIPED_LOG".
The final question is how to configure Apache to use either piped or reliable piped logs. In the configuration file, find the log that should be piped through an external program, then simply replace the name of the log file with a command such as:
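The check for that line is easy to script. The sketch below assumes the server binary is called httpd and is on the PATH; both are assumptions about your installation, and the function names are hypothetical.

```python
import subprocess


def has_reliable_piped_logs(version_output):
    """Return True if `-V` output lists the HAVE_RELIABLE_PIPED_LOG option."""
    for line in version_output.splitlines():
        # Compare tokens so that leading or repeated spaces do not matter.
        if line.split() == ["-D", "HAVE_RELIABLE_PIPED_LOG"]:
            return True
    return False


def query_build_options(binary="httpd"):
    """Run the server with -V and return everything it printed.

    The binary name is an assumption; adjust it for your installation.
    """
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Calling has_reliable_piped_logs(query_build_options()) then answers the question for the local installation.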
"| log_program program_arguments"
The "|" tells Apache this will be a piped log and the commands that follow tell Apache what program to use and how to start it. There is one security problem with piped logs that administrators should be aware of: the log program will be run as the user that started the web server. For most servers this is the root user. For this reason, great care must be taken when writing a logging program to ensure that there are no buffer overflows or other weaknesses that can be exploited.
A New Way to Run CGI Scripts & Programs
On some Unix systems, when a threaded process forks to create a child process, all of the threads are created in the child process and then all but one are killed. This is obviously not very good for performance. When Apache 2.0 is configured to use threaded child processes, this problem is encountered as soon as it runs a CGI. To solve this performance problem, Apache 2.0 provides two CGI modules. The first is mod_cgi, which should be used either on non-Unix platforms or on Unix with the prefork MPM [1]. The second is mod_cgid, which should be used for all threaded MPMs on Unix.
Mod_cgid avoids the performance problem by creating a separate CGI daemon process. Before the parent Apache process starts any of its child processes, the mod_cgid module creates a new process, which becomes the CGI daemon. This process creates a Unix domain socket to communicate with the Apache child processes. When a child process gets a CGI request, it sends the request to the CGI daemon. The CGI daemon then creates a new process to run the CGI program. This process is set up to communicate directly with the child process that originally received the request. As the CGI process outputs the response, it is sent to the child process, which forwards it along to the client. Because the CGI daemon is a single-threaded process, creating the CGI process from it avoids the performance problem that mod_cgi runs into under a threaded MPM.
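The hand-off can be illustrated with a small Unix-only sketch. It is not the mod_cgid implementation: the one-line JSON request format and the function names are my own assumptions, and for brevity it uses an already connected pair of Unix domain sockets rather than a listening socket. What it does show is the key step, where the daemon starts the program with its output wired straight to the socket, so the response goes directly to the requesting side.

```python
import json
import socket
import subprocess
import sys
import threading


def serve_one_request(conn):
    """Daemon side: read one request, then run the program it names.

    The program's standard output is the socket itself, so its response
    travels directly to the server child without passing through the daemon.
    """
    with conn, conn.makefile("r") as reader:
        request = json.loads(reader.readline())
        subprocess.run(request["argv"], stdout=conn.fileno())


def run_cgi_via_daemon(conn, argv):
    """Server-child side: send the request, then read the response to EOF."""
    conn.sendall((json.dumps({"argv": argv}) + "\n").encode())
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # every writer has closed its end: response complete
            break
        chunks.append(data)
    return b"".join(chunks)
```

In Apache the daemon is a separate process; when trying the sketch out, a thread running serve_one_request on one end of socket.socketpair() is enough to stand in for it.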
There are issues with using mod_cgid that are not present with mod_cgi. It is possible for the CGI daemon process to die unexpectedly, although it is unlikely because the daemon is a very small process that does very little. On platforms that support reliable piped logs, Apache uses the same technology to restart the CGI daemon if anything happens to it. However, on other platforms it is not possible to restart the CGI daemon process from within Apache.
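The restart behavior itself is simple to picture. The following generic sketch is not how Apache implements it (the article does not go into that); it only shows the idea of a parent that starts a helper again whenever it exits. The function name keep_alive and the restart cap, which exists only so the example terminates, are assumptions.

```python
import subprocess
import sys


def keep_alive(argv, max_restarts):
    """Start the program and restart it each time it exits.

    Returns how many times the program was started. A real supervisor
    would have no cap and would loop for as long as the server runs.
    """
    starts = 0
    while starts <= max_restarts:
        proc = subprocess.Popen(argv)
        starts += 1
        proc.wait()  # blocks until the helper dies, then we loop again
    return starts
```

For example, keep_alive([sys.executable, "-c", "pass"], 2) starts a helper that exits immediately, and so starts it three times in total: once initially and twice as restarts.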
Better error reporting: In previous alphas, if Apache failed, it often wasn't clear what had caused the problem. This has been solved, and error reporting is in much better shape now. If Apache fails for some reason, the errors reported in the log should be meaningful.
CGI error reporting: If a CGI reports errors to stderr, those errors will now be written to the error_log. This is a necessity for debugging CGI programs.
Portable build environment: One of Apache's best features is that it works on almost every platform. This has not been true for Apache 2.0 until now. The build system was very finicky about which platforms it worked on. This has been fixed with the fourth alpha. (If you have waited to try 2.0 because it didn't support your platform I suggest trying it again.)
Config.nice is created: Apache 1.3 used the APACI configuration scheme to generate the build environment for various platforms. One of the best features of APACI is that it created config.status, which had the exact command used to configure the server. Apache 2.0 has switched to autoconf. As a result, config.status was missing in earlier alphas. With the latest alpha, Apache 2.0 generates config.nice, which replaces config.status.
Apache works on OS/390: While this isn't of interest to most people, it does prove that Apache is an incredibly portable program. Imagine being able to run the same program on a Windows 95 machine and on an OS/390 mainframe!
How To Help
[1] For a discussion of the different MPMs, please see my previous article, "An Introduction to Apache 2.0."