Last night I began working on a project that lets you share state across multiple machines within a distributed environment. It allows you to instantiate objects and sessions on one machine and have them automatically cascade out so that they are available on every machine in your network. But instead of keeping a local copy on every machine in your cluster, each machine only holds a handler that points back to the machine the object originated on. This keeps the resource requirements in the cluster to a minimum. To test my application, the easiest thing that came to mind was to fashion a round-robin scenario using traditional load balancing mechanisms. Instead of spending money on one of the big-box load balancers, I chose the easiest and cheapest (free) method I could think of short of writing my own load balancer (which I still did, but I will discuss that in another article). For my load balancing needs, I decided to use a tool that I already had loaded and ready to go. And now I am going to show you how to do the same by teaching you how to set up a simple load balancer using the Apache HTTP server. Because Apache’s HTTP server is extremely powerful and robust, you can use this same setup in a production environment and on the web. Let’s begin!
Before jumping right into setting up Apache as a load balancer, you will first need to download and install Apache if you haven’t already. You can find the version that fits your environment at http://httpd.apache.org/download.cgi/. Since I was testing my application in a Windows environment, I went with version 2.2.22, which I downloaded as an MSI installer from http://apache.mirrors.pair.com//httpd/binaries/win32/. I already had Apache installed. But when you go to install it for yourself, stick with the default configuration unless you specifically need to change something along the way. The only thing you will probably want to change is the default server name. I went with something like “developer.prv” and “local.developer.prv”, but you can use whatever you want, as that’s not the important part of this article.
Once you have Apache installed, configuring it for load balancing is extremely simple. Apache already ships with everything you need for load balancing, so the first thing you will need to do is enable the modules that provide that functionality. Whether or not you have worked with Apache before, everything you need to configure it to run as a load balancer can be done inside the “httpd.conf” file. If you are working with Windows, which I am, you can find the httpd.conf file in “C:\Program Files\Apache Software Foundation\Apache2.2\conf\“. If you’re using a *nix based system, you can typically find it in “/etc/httpd/conf/“. With your httpd.conf file located, open it up in a text editor so that you can configure it for your load balancer.
The first thing you will need to do is to scroll down and uncomment (remove the # sign from the beginning of the line) the following lines.
#LoadModule proxy_module modules/mod_proxy.so
#LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
#LoadModule proxy_http_module modules/mod_proxy_http.so
Make sure you enable the mod_proxy_http shared object. I missed that one the first time through. Apache didn’t report any errors, but the load balancer just didn’t work.
Now that you have the modules enabled for your load balancer, jump to the bottom of the file. There, you will need to define your proxy. You can do that by entering the following lines:
ProxyPass / balancer://mycluster/
ProxyPassReverse / balancer://mycluster/
<Proxy balancer://mycluster>
    BalancerMember http://localhost:9991 route=node1
    BalancerMember http://localhost:9992 route=node2
</Proxy>
As you can see, I have registered a “BalancerMember” for each of the 2 nodes that I want the workload distributed across. For the sake of this article, I have chosen to use the same computer (localhost) with the application server running on 2 separate ports, 9991 & 9992. I have chosen to name my cluster “mycluster”, just like in the mod_proxy_balancer documentation at http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html. You can name yours whatever you want. You can also leave out the “route=node*” for each BalancerMember if you want. I added those in because I have plans to extend my balancer in the near future.
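One extra knob worth knowing about: if your backend machines are not equally powerful, mod_proxy_balancer lets you weight the round-robin distribution with a loadfactor on each member. The fragment below is just an illustration using the same two ports from this article; the weights are made-up values you would tune for your own hardware.

```apache
<Proxy balancer://mycluster>
    # node1 receives roughly three requests for every one sent to node2
    BalancerMember http://localhost:9991 route=node1 loadfactor=3
    BalancerMember http://localhost:9992 route=node2 loadfactor=1
</Proxy>
```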
That’s it. You are now ready to use Apache as a load balancer. Just save the httpd.conf file and start up the Apache service. To test it, open a web browser and point it to “http://localhost/“. By default, Apache listens for requests on port 80. If you decided to change the default port along the way, you will need to add the new port to the URL in your browser. By declaring a single “/” (forward slash) in my ProxyPass, I am telling Apache to proxy all requests that come in at the root level. If you would rather use a specific sub-level address, you can declare that like so:
ProxyPass /sublevel balancer://mycluster/
This will tell Apache to only proxy addresses that look like “http://localhost/sublevel“. Whatever you decide on, make sure you include the trailing “/” (forward slash) at the end of your balancer://mycluster/. Otherwise, the paths forwarded to your cluster members can come out mangled, and you will run into issues further on.
If everything worked accordingly, when you launch “http://localhost/” in your browser, it should display the page that lives at the same location on each BalancerMember. This method works no matter which application server you plan on load balancing. For example, I could replace http://localhost:9991 with http://my_tomcat_server_1 and http://localhost:9992 with http://my_tomcat_server_2, respectively. I can now use Apache to load balance Tomcat, Jetty, Glassfish, JBoss, Websphere, other Apache servers, Python servers, etc…
If you do not have an application server to test with, don’t worry. I have decided to provide you with a simple Python server, the same server I originally tested my load balancer with before moving on to my real application as mentioned above. It’s a really simple HTTP server that I’ve used in other examples on this website. When you access the server, it simply returns “Connected to PORT”, where “PORT” is the port number the server is listening on. By running multiple instances of the same server, each listening on a different port number, I can see exactly which node my load balancer has sent me to in my cluster. Here is what the simple Python HTTP server looks like.
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

PORT = 9991

class ConnectionHandler(BaseHTTPRequestHandler):
    def _writeheaders(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_HEAD(self):
        self._writeheaders()

    def do_GET(self):
        self._writeheaders()
        self.wfile.write("""<HTML><HEAD><TITLE>Simple Server</TITLE></HEAD>
<BODY>Connected to %d</BODY></HTML>""" % PORT)

serveraddr = ('', PORT)
srvr = HTTPServer(serveraddr, ConnectionHandler)
srvr.serve_forever()
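The server above targets Python 2 (BaseHTTPServer). If you are on Python 3, a roughly equivalent sketch looks like this; the module became http.server, the response body must be bytes, and here I read the port off the server object so one class works for every node instead of editing a PORT constant per instance. The run_server helper is my own convenience wrapper, not part of the standard library.

```python
from http.server import HTTPServer, BaseHTTPRequestHandler

class ConnectionHandler(BaseHTTPRequestHandler):
    def _writeheaders(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_HEAD(self):
        self._writeheaders()

    def do_GET(self):
        self._writeheaders()
        # self.wfile expects bytes in Python 3, so the HTML is encoded;
        # the port is read from the bound server so one class serves any node
        body = ("<HTML><HEAD><TITLE>Simple Server</TITLE></HEAD>"
                "<BODY>Connected to %d</BODY></HTML>" % self.server.server_port)
        self.wfile.write(body.encode("utf-8"))

def run_server(port):
    # blocks forever -- launch one instance per BalancerMember port
    HTTPServer(('', port), ConnectionHandler).serve_forever()
```

Call run_server(9991) in one terminal and run_server(9992) in another to mirror the two BalancerMembers registered above.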
As shown in my proxy configuration in Apache above, I launched 2 instances of this Python HTTP server. The first listened for connections on port 9991 and the second on port 9992, and I registered both instances in the proxy configuration as BalancerMembers. With both instances of my Python HTTP server running and my Apache load balancer running, whenever I point my browser to “http://localhost/“, I will see the message “Connected to 9991” or “Connected to 9992” depending on which server I am routed to.
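If you would rather watch the rotation without clicking refresh in a browser, a short Python sketch can pull the balancer several times in a row. The URL below assumes the Apache setup from this article is running on your machine; swap it for your own address and port if you changed the defaults.

```python
import urllib.request

def sample_balancer(url, count=10):
    """Fetch the balancer URL repeatedly and return each response body."""
    bodies = []
    for _ in range(count):
        with urllib.request.urlopen(url) as resp:
            bodies.append(resp.read().decode('utf-8', 'replace'))
    return bodies

# With round-robin balancing, the bodies should alternate between
# "Connected to 9991" and "Connected to 9992":
# for body in sample_balancer('http://localhost/'):
#     print(body)
```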
The method I have demonstrated here uses a standard round-robin approach to load balancing. That means every request that comes in to the load balancer gets routed to the next server. With only 2 nodes in my cluster, each request is routed from one node to the next and back again. If you test this in Chrome, you will probably always see the same message, “Connected to 9992”, which makes it appear that the load balancer is not working. If you followed everything in this article exactly, I assure you that your load balancer is working properly. It’s just that Chrome sends a request for favicon.ico along with each page request, so the favicon request gets routed to the first server and the actual page request gets routed to the second. If you do not want every request to be sent to a different server, you can lock clients to the same server upon each request by using the “sticky” method as shown in the Apache mod_proxy_balancer documentation found at http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html.
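As a rough illustration of the sticky approach, here is what a session-sticky setup could look like for servlet containers such as Tomcat, which issue a JSESSIONID cookie. This fragment is a sketch based on the mod_proxy_balancer documentation, not part of my setup above: for it to work, each route value must match the jvmRoute configured on the corresponding Tomcat instance so that the balancer can map the session-id suffix (for example “.node1”) back to the right member.

```apache
ProxyPass / balancer://mycluster/ stickysession=JSESSIONID|jsessionid
ProxyPassReverse / balancer://mycluster/
<Proxy balancer://mycluster>
    BalancerMember http://localhost:9991 route=node1
    BalancerMember http://localhost:9992 route=node2
</Proxy>
```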
If for some reason you still don’t believe that your load balancer is working, there are a few things you can do to double-check it. The first thing you should do is run the same test in another web browser, such as Firefox. When I test it in Firefox, I get a different message every time I refresh the page, indicating that my load balancer is working. If you are using the Python HTTP server I provided here for testing, you can watch the output window and see which requests are going through which server. If you do this and see that requests are only hitting one of the Python HTTP servers, you can enable the built-in Balancer Manager in Apache, which can be accessed from “http://localhost/balancer-manager“.
To do that, you will first need to add a new ProxyPass that tells Apache not to load balance requests to the “balancer-manager” context.
ProxyPass /balancer-manager !
If you leave out this step, you will always get an Error 500. Upon examination of the Apache log files, you will see the warning message:
[warn] proxy: No protocol handler was valid for the URL /balancer-manager. If you are using a DSO version of mod_proxy, make sure the proxy submodules are included in the configuration using LoadModule.
Next, you will need to register the Balancer Manager location like this:
<Location /balancer-manager>
    SetHandler balancer-manager
    Order Deny,Allow
    Allow from all
</Location>
Once you have configured the Balancer Manager, you can access it by pointing your browser to “http://localhost/balancer-manager“. From there, you can see the status of each node in your cluster. You can also configure and enable/disable each node by clicking the link for each node.
That’s all folks! You should now be able to use the Apache HTTP server as a load balancer. Here are all of the code changes I made to my httpd.conf file to make this happen.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_http_module modules/mod_proxy_http.so

ProxyPass /balancer-manager !
ProxyPass / balancer://mycluster/
ProxyPassReverse / balancer://mycluster/

<Proxy balancer://mycluster>
    BalancerMember http://localhost:9991 route=node1
    BalancerMember http://localhost:9992 route=node2
</Proxy>

<Location /balancer-manager>
    SetHandler balancer-manager
    Order Deny,Allow
    Allow from all
</Location>