Apache Today: Your Daily Source for Apache News and Information

Writing Input Filters for Apache 2.0
Nov 22, 2000, 14:06 UTC, by Ryan Bloom

In last month's column I wrote an output filter to add a header and footer to every web page. This month I want to investigate writing an input filter. This will be the last column devoted to I/O filtering in Apache 2.0.

Input Compared to Output Filtering

Input filtering and output filtering are basically the same thing, with some very minor differences. Both input and output filtering rely on buckets and bucket brigades to pass data from one filter to the next. Both have filters that are associated with the connection and filters that are associated with the request.

Output filters are relatively straightforward: the filter gets handed data, which it either adds to or modifies, and that data gets passed to the next filter. Input filtering cannot work this way because Apache isn't generating the data; it has to rely on getting the data from the network. Because of this difference, input filters get called with an empty brigade, and they pass this brigade to the next filter. The lowest filter in the chain inserts data into the brigade and returns to the previous filter. That filter can then modify the data and return the brigade to the filter before it, and so on until the brigade is returned to the Apache core.
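The pull model described above can be sketched outside of Apache. The following is a toy illustration only: the toy_filter structure and both functions are hypothetical names invented for this example, and a plain character buffer stands in for a bucket brigade. It shows an upper filter being called with an empty buffer, asking the next filter to fill it, and modifying the data on the way back.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an input filter; not an Apache type. */
struct toy_filter;
typedef size_t (*toy_filter_fn)(struct toy_filter *f, char *buf, size_t size);

struct toy_filter {
    toy_filter_fn fn;          /* function implementing this filter */
    struct toy_filter *next;   /* next (lower) filter in the chain */
    const char *source;        /* only used by the lowest filter */
};

/* Lowest filter: stands in for the filter that reads from the network.
 * It is the only one that puts data into the empty buffer. */
static size_t toy_source_filter(struct toy_filter *f, char *buf, size_t size)
{
    size_t n = strlen(f->source);

    if (n > size) {
        n = size;
    }
    memcpy(buf, f->source, n);
    return n;
}

/* Upper filter: first asks the next filter for data, then modifies
 * what came back before returning to its own caller. */
static size_t toy_upper_filter(struct toy_filter *f, char *buf, size_t size)
{
    size_t i;
    size_t n = f->next->fn(f->next, buf, size);

    for (i = 0; i < n; i++) {
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }
    return n;
}
```

Calling the upper filter with an empty buffer is all the caller has to do; the data flows back up the chain as each filter returns, which mirrors how the brigade eventually arrives at the Apache core.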

Input filters differ from output filters in one other significant manner. Most output filters only deal with actual data; headers are stored in a table in the request_rec, and there is a core filter that converts that table to a stream of data that is sent to the client. The output headers filter sits low enough in the filter stack that only filters that are dealing with formatting the data for transmission to the client (e.g. chunking) are after it. Input filtering and headers have a very different relationship. All data coming from a client must pass through the input filters to get to the Apache core. This means that input filters have an opportunity to change the headers of a request before the core ever sees it.

The module that I am presenting this month will modify the headers for a request while Apache reads it. This module came about at ApacheCon Europe 2000 because of the CD that was distributed with the conference proceedings. This CD was created on a Windows machine, and the proceedings were organized as a web site. The problem is that the HTML used spaces and backslashes (\) in the URLs for each page. Unfortunately, the URL "http://localhost/foo\Test Page.html" is not the same as "http://localhost/foo/Test%20Page.html". The first is not a valid URL, while the second is. This CD was tested with Internet Explorer, which automatically converts these invalid URLs into valid ones.

While working at Covalent's booth, I had a discussion with two of the conference attendees, Save Buchanan and Karl Royer. They had attended my session about writing Apache 2.0 modules, and suggested that a filter could be written to solve this problem on the server's side. Out of such humble beginnings mod_apachecon was born. This module walks the first line in a request and ensures that when the request is given to Apache, all spaces have been converted to "%20" and any backslashes have been converted to forward slashes. This allows Apache 2.0 to successfully serve the ApacheCon CD to any web browser.

The ApacheCon Module

The first part of the apachecon module is just bookkeeping, so let's cover that part as quickly as possible.
static int apcon_pre(conn_rec *c)
{
    ap_add_input_filter("APACHECON_IN", NULL, NULL, c);
    return OK;
}

static void hf_register_hook(void)
{
    ap_hook_pre_connection(apcon_pre, NULL, NULL, AP_HOOK_MIDDLE);
    ap_register_input_filter("APACHECON_IN", apcon_filter_in, 
                             AP_FTYPE_CONNECTION);
}

module MODULE_VAR_EXPORT apachecon_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                       /* create per-directory config structure */
    NULL,                       /* merge per-directory config structures */
    NULL,                       /* create per-server config structure */
    NULL,                       /* merge per-server config structures */
    NULL,                       /* command apr_table_t */
    NULL,                       /* handlers */
    hf_register_hook            /* register hooks */
};

We will take this in the reverse order of how it appears in the module. The last thing is the module structure. There is no configuration for this module because it will modify every request it receives, so this module structure is basically empty. The only field that is filled out is the register hooks field. This function is used for two purposes.

The first thing this function must do is register a function for the pre_connection phase. The pre_connection phase is called after Apache accepts the connection from the client, but before Apache begins to do anything with this connection. The point of this phase is to allow modules to set up connection-based information. In this case mod_apachecon uses this phase to add an input filter to the connection. In reality this module should ensure that the connection is received on a server that is handling HTTP requests, but this is a quick module that should never be enabled in a production server, so cutting a few corners is okay.

The second purpose of the register hooks function is to register the input filter that the pre_connection phase adds to the input filter stream. I have named this filter "APACHECON_IN", which is the name that the pre_connection phase uses to insert the filter. The function that actually implements the filter is apcon_filter_in, so that is specified as the second argument. The final argument is the type of the filter. There are two basic types of input filters: connection-based and request-based. Connection-based filters are inserted before a connection is started and get to act upon all of the data that is sent to the server. Request-based filters are added after the request has been started, and they only get to access the request body. In this case the filter is going to be acting on headers, so this has to be a connection-based filter.

Now we get to the meat of the module. This is the filter that will replace all spaces with "%20" and backslashes with forward slashes:

static apr_status_t apcon_filter_in(ap_filter_t *f, ap_bucket_brigade *b,
                                    ap_input_mode_t mode)
{
    const char *str, *begin;
    int length, i, j;
    ap_bucket *e, *d;
    char data[256];

This portion of the code declares the filter function and sets up the local variables. The filter structure that is passed in is a reference to the current filter. The second argument is the brigade to be filled out, and the final argument is the mode this filter was called in. The mode is unique to input filters. There are three possible modes: AP_MODE_BLOCKING, AP_MODE_NONBLOCKING, and AP_MODE_PEEK. AP_MODE_BLOCKING and AP_MODE_NONBLOCKING are relatively straightforward: when data is read from the client, it is read in either blocking or non-blocking mode. AP_MODE_PEEK requires a bit of thought. The problem is that Apache needs to determine if there is a second request coming over the same connection. AP_MODE_PEEK is a way for Apache to ask the input filters if there is more information on the connection without having any of the data returned to the caller. The body of the filter begins as follows:

    ap_get_brigade(f->next, b, mode);

    e = AP_BRIGADE_FIRST(b);

    if (e->type == NULL) {
        return APR_SUCCESS;
    }

The first line calls the next filter in the chain to get the data from the client. Once the next filter returns, we need to ensure that we actually received data from it. The AP_BRIGADE_FIRST macro gets the first bucket in the brigade. This gives us a starting point. If that bucket has no type, then we didn't actually get any data and we should just return to the previous filter. The next section examines the data we received:

    ap_bucket_read(e, &str, &length, 1);

    if (strncmp("GET ", str, strlen("GET "))) {
        return APR_SUCCESS;
    }
    ap_bucket_split(e, strlen("GET ") + 1);
    e = AP_BUCKET_NEXT(e);
    ap_bucket_read(e, &str, &length, 1);
    /* this should work, because we are just searching for HTTP/1.0 or HTTP/1.1 */
    begin = str + (strlen(str) - 3);
    do {
        begin--;
    } while (strncmp("HTTP", begin, 4) && (begin > str));
    ap_bucket_split(e, begin - str - 1);

Once we are sure we have data, we have to start looking at it to determine if we have to modify anything. The first line of an HTTP GET request is GET URI HTTP/1.x. This section of the code ensures that the data in this bucket matches that syntax, and while doing that, it splits the bucket into three buckets. The first bucket contains "GET", the last one contains "HTTP/1.x", and the middle bucket contains the URI.
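The same three-way split can be illustrated on a plain string, without buckets. This is a minimal sketch under stated assumptions: split_get_line and struct request_line_parts are hypothetical names made up for this example and are not part of mod_apachecon or the Apache API. Like the module, it checks for a leading "GET " and then searches backward for the protocol token, so that spaces inside the URI stay part of the middle piece.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: where the URI sits inside the request line. */
struct request_line_parts {
    size_t uri_start;   /* offset of the first byte of the URI */
    size_t uri_len;     /* length of the URI, excluding the space before HTTP */
};

/* Returns 0 and fills *out if line looks like "GET <uri> HTTP...",
 * or -1 if it does not. The line does not need to be NUL-terminated. */
static int split_get_line(const char *line, size_t len,
                          struct request_line_parts *out)
{
    static const char method[] = "GET ";
    const size_t mlen = sizeof(method) - 1;
    size_t p;

    /* Need room for "GET ", at least one URI byte, and " HTTP". */
    if (len < mlen + 1 + 5 || strncmp(line, method, mlen) != 0) {
        return -1;
    }

    /* Walk backward so the last " HTTP" wins; any earlier spaces
     * are treated as part of the (invalid) URI. */
    for (p = len - 5; p > mlen; p--) {
        if (memcmp(line + p, " HTTP", 5) == 0) {
            out->uri_start = mlen;
            out->uri_len = p - mlen;
            return 0;
        }
    }
    return -1;
}
```

Given the request line "GET /foo\Test Page.html HTTP/1.0", the URI piece identified by this sketch is "/foo\Test Page.html", which is the part that the rest of the filter rewrites.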

    ap_bucket_read(e, &str, &length, 1);
    i = 0;
    j = 0;
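    /* stop early rather than overflow data[]; a space needs three bytes */
    while (i < length && j + 3 <= (int)sizeof(data)) {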
        if (str[i] == ' ') {
            data[j++] = '%';
            data[j++] = '2';
            data[j++] = '0';
            i++;
        }
        else if (str[i] == '\\') {
            data[j++] = '/';
            i++;
        }
        else {
            data[j++] = str[i++];
        }
    }
    d = ap_bucket_create_transient(data, j);
    ap_bucket_setaside(d);
    AP_BUCKET_INSERT_AFTER(e, d);
    AP_BUCKET_REMOVE(e);
    ap_bucket_destroy(e);
    return APR_SUCCESS;
}

This final section gets the data out of the middle bucket, and traverses it copying it to another location. As it copies the data it converts the illegal characters to the characters discussed earlier. After the data has been copied, we create another bucket out of it. This bucket is a transient bucket, which is done just to cut a corner. We know that using a transient bucket is OK to do in input filters, although this really should be a heap bucket. Once the new bucket is created, we need to insert it into the brigade in place of the original bucket. This is done by inserting after the original bucket and then removing the old bucket. To ensure that we do not leak any memory, any bucket that is removed from a brigade must be destroyed by the function that removed it. Finally, we return to the calling filter, so that it can interpret the request.
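For readers who want to try the character rewriting on its own, here is a standalone sketch of the copy loop. The function name fix_uri and its signature are assumptions made for this example; it is not code from the module. Unlike the quick version in the filter, it takes the destination size as a parameter and refuses to write past the end of the buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not part of mod_apachecon: copies src into dst,
 * replacing each space with "%20" and each backslash with '/'.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if dst is too small to hold the rewritten string. */
static long fix_uri(const char *src, size_t src_len,
                    char *dst, size_t dst_size)
{
    size_t i;
    size_t j = 0;

    if (dst_size == 0) {
        return -1;
    }
    for (i = 0; i < src_len; i++) {
        /* A space expands to three bytes; everything else to one. */
        size_t need = (src[i] == ' ') ? 3 : 1;

        /* Always leave room for the terminating NUL. */
        if (j + need >= dst_size) {
            return -1;
        }
        if (src[i] == ' ') {
            dst[j++] = '%';
            dst[j++] = '2';
            dst[j++] = '0';
        }
        else if (src[i] == '\\') {
            dst[j++] = '/';
        }
        else {
            dst[j++] = src[i];
        }
    }
    dst[j] = '\0';
    return (long)j;
}
```

Passing the URI from the CD example, "/foo\Test Page.html", produces "/foo/Test%20Page.html". Returning an error when the buffer is too small is a design choice for this sketch; a real filter would more likely allocate the output from a pool or the heap.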

When this module is inserted into an Apache 2.0 server, that server will accept requests for URIs that contain both spaces and backslashes. While this is not a good idea to add to a production server, this module does solve the problem that people were having with the ApacheCon CDs, and hopefully it shows some of the power of input filters.

Apache 2.0 Update

The Apache Group has already made seven alpha releases, and they have decided to release alpha eight on November 18th or 19th. The big news, however, is that after alpha eight, the Apache Group plans to release the first beta sometime between November 27th and November 30th. The reason for the long window is that we want to be sure that if bugs are reported for alpha eight, we have some time to try to fix them. I want to stress that this is a goal right now, and not an absolute certainty. If there are a lot of bugs reported, or if the Apache developers decide at any time that we aren't really ready for a beta, then the release will be delayed. If the first beta is released at the end of this month, then next month's article will discuss the new features. If for some reason the Apache Group does not release a beta, then next month's article will discuss why.

Related Stories:
Apache 2.0 alpha 8 released! (Nov 20, 2000)
Filtering I/O in Apache 2.0: Part 2 (Oct 23, 2000)
Filtering I/O in Apache 2.0 (Sep 20, 2000)
Apache 2.0 Server Up and Running (Aug 19, 2000)
Looking at Apache 2.0 Alpha 4 (Jun 30, 2000)
An Introduction to Apache 2.0 (May 28, 2000)

Copyright 2002 INT Media Group, Incorporated All Rights Reserved.