Interesting article about a topic I've been discussing quite a bit lately: the notion of "DevOps".
http://somic.org/2010/03/02/the-rise-of-devops/
I'm a firm believer that everyone in IT should be a developer. For example, QA needs to be able to create test scripts, automate processes, and generate coverage reports. Ops needs to be able to develop scripts to automate monitoring, troubleshooting, provisioning, and capacity management. One of the problems in enterprise IT is hiring people who don't have development skills because their job title isn't "developer" or "programmer". The ability to write software and automate processes needs to become the norm in IT if we're going to progress beyond silos of responsibility and truly work together as a team. Being able to write software is a basic skill now and should be part of every job description.
This transition is similar to the changeover from the "spreadsheet specialists" of 15-20 years ago who knew the intricacies of Lotus 1-2-3 or Microsoft Excel to the current day where everyone uses spreadsheets and word processors as part of their daily life. IT always wants to work smarter, cheaper, and more consistently - these are the by-products of good software development efforts. It's time to spread those benefits across the organization.
Analyzing Peak Traffic with Python
We've been using AWstats at work for generating traffic reports from our Apache server logs. I like AWstats and think it does a nice job of capturing trends and helping with capacity planning. However, there's one area where it provides zero information: peak traffic. Because AWstats aggregates web traffic into daily/hourly/monthly blocks, there is no way to know how that traffic was distributed within a specific hour, minute, or second.
This generally isn't a big deal. However, sometimes you need to answer the following question: "What is the maximum number of requests per second we handled in the last (day|month|year)?" With the aggregation provided by log analysis tools, the best you can do is estimate an average from the peak hour, assuming the requests were spread uniformly across it. For example, if you had 600 requests in your peak hour, you would have to assume 600 requests / 60 minutes = 10 requests per minute, or one request every six seconds. In that example, though, it's entirely possible (if unlikely) that those requests all arrived within a three-second window, meaning your servers sit idle most of the time and then get clobbered by intermittent bursts of traffic.
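To make the gap between the averaged figure and the real peak concrete, here is a small sketch. The function names and the burst data are made up for illustration: 600 requests in one hour, all landing in the first three seconds.

```python
from collections import Counter


def average_rate_per_second(total_requests, window_seconds):
    """The best estimate available from hourly aggregates: a flat average."""
    return total_requests / window_seconds


def peak_per_second(second_offsets):
    """The true peak: the busiest single second among the observed requests."""
    counts = Counter(second_offsets)
    return max(counts.values()) if counts else 0


# Hypothetical worst case: 600 requests, 200 in each of the first three seconds.
burst = [0] * 200 + [1] * 200 + [2] * 200

print(average_rate_per_second(len(burst), 3600))  # about 0.17 requests/second
print(peak_per_second(burst))                     # 200 requests in one second
```

The hourly aggregate is identical whether the traffic is evenly spread or arrives in a burst, which is why per-second counts are needed to answer the question.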
We recently needed to answer the peak traffic question, so I set about writing a script to figure it out for myself. The result is a Python script, available here, which does the following:
- Recursively locates all web server access logs within a given directory structure
- Opens each log file and parses it for requests matching a regular expression (see why below)
- Sums the request counts into per-second buckets
- Stores the per-second totals in a SQLite database
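The steps above can be sketched roughly as follows. This is not the original script, just a minimal illustration under some assumptions: the logs use the standard Apache timestamp format (for example `[02/Mar/2010:13:45:01 -0500]`), log file names start with `access`, and the table name `hits` and all function names are hypothetical.

```python
import os
import re
import sqlite3
from collections import Counter

# Assumed Apache common/combined log timestamp, e.g. [02/Mar/2010:13:45:01 -0500].
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def find_logs(root, name_prefix="access"):
    """Recursively yield paths of files whose name starts with name_prefix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(name_prefix):
                yield os.path.join(dirpath, name)


def count_per_second(lines, request_re):
    """Count lines matching request_re, bucketed by their timestamp (1s resolution)."""
    counts = Counter()
    for line in lines:
        if not request_re.search(line):
            continue
        match = TIMESTAMP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def store_counts(conn, counts):
    """Add per-second totals to the (hypothetical) hits table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(second TEXT PRIMARY KEY, requests INTEGER NOT NULL)"
    )
    for second, n in counts.items():
        # Update an existing bucket, or insert it if this second is new.
        cur = conn.execute(
            "UPDATE hits SET requests = requests + ? WHERE second = ?", (n, second)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO hits (second, requests) VALUES (?, ?)", (second, n)
            )
    conn.commit()


def main(root, db_path, request_re):
    """Walk root, parse every plain-text log found, and store the totals."""
    conn = sqlite3.connect(db_path)
    try:
        for path in find_logs(root):
            with open(path, "r", errors="replace") as handle:
                store_counts(conn, count_per_second(handle, request_re))
    finally:
        conn.close()
```

A call such as `main("/var/log/httpd", "traffic.db", re.compile(r"/app/"))` would process a directory tree; both paths and the pattern are placeholders.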
In my case, the web server log files were gzipped, so I had to uncompress them before parsing. Also, I wasn't interested in total hits, only in hits that involved our application servers (the question was about application server sizing, not Apache sizing). That is the reason for the regular expression during parsing: it identifies requests whose URIs had been JkMounted to JBoss.
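As a rough sketch of these two details: Python's `gzip` module can read compressed logs line by line without unpacking them to disk first (an alternative to uncompressing up front, not necessarily what the original script did), and the filter can be built from the URI prefixes that appear in the JkMount directives. The prefixes below are invented placeholders.

```python
import gzip
import re

# Hypothetical URI prefixes; the real ones would come from the JkMount
# directives in the Apache configuration.
JK_MOUNTED_PREFIXES = ["/app/", "/services/"]


def build_request_re(prefixes):
    """Build a regex matching request lines whose URI starts with any prefix."""
    alternatives = "|".join(re.escape(prefix) for prefix in prefixes)
    # In the access log the request appears quoted: "GET /app/home HTTP/1.1"
    return re.compile(r'"[A-Z]+ (?:%s)' % alternatives)


def open_log(path):
    """Open a log file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")
```

With these helpers, `for line in open_log(path)` behaves the same for compressed and uncompressed logs, and `build_request_re(JK_MOUNTED_PREFIXES).search(line)` decides whether a hit reached the application server.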
After some tweaks for performance, the script was able to parse and report on about 1.5 GB of compressed log files in about an hour on my laptop. The generated SQLite database contains the per-second request totals and was easy to use for generating peak traffic reports. I think the next step will be to add some type of visualization to the reporting and graphically show the traffic patterns.
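For illustration, a peak traffic report against such a database can be a single query. The sketch below assumes a hypothetical table `hits(second, requests)` holding one row per second; the actual schema of the generated database may differ.

```python
import sqlite3


def peak_seconds(conn, limit=5):
    """Return the busiest seconds as (second, requests), highest count first."""
    return conn.execute(
        "SELECT second, requests FROM hits "
        "ORDER BY requests DESC, second ASC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    # Tiny in-memory example with made-up numbers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (second TEXT PRIMARY KEY, requests INTEGER)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [
            ("02/Mar/2010:13:45:01", 3),
            ("02/Mar/2010:13:45:02", 7),
            ("02/Mar/2010:13:45:03", 5),
        ],
    )
    for second, requests in peak_seconds(conn, limit=2):
        print(second, requests)
```

The first row returned answers the "maximum requests per second" question directly, and the same table could feed a plotting tool later.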