Explaining Business Activity Monitoring (BAM) – Part I – Aggregation

WTH is BAM?

To start off, let us understand what exactly is meant by a business activity. The term Business Activity Monitoring (BAM) was originally coined for monitoring business activities run by BPM software in an SOA deployment. But it was soon understood that not every business activity goes through BPM software.

A business activity can either be a business process that is orchestrated by business process management (BPM) software, or a business process that is a series of activities spanning multiple systems and applications – Wikipedia

There is no rocket science here. Monitoring a business activity is simply monitoring the execution path of the business process.

Business Activity Monitoring is a term coined by Gartner, Inc. Its definition is worth looking into when you are trying to understand BAM.

Business Activity Monitoring refers to the aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners – Wikipedia

I really like this definition. It tells you exactly the sequence any information should go through to successfully BAM it.

Aggregation → Analysis → Presentation

So why did I write an entire blog post just to explain the first step? It seems simple enough. But once you try to run a production-grade Business Activity Monitoring system, many implications and limitations apply to each of these steps. Let me run you through a simple aggregation example.

A (not so) simple sample

Let us take a sample business activity.

It has three steps, and the products for the sample setup are drawn from the WSO2 product stack. A client request comes to an Enterprise Service Bus (ESB), which forwards it to a service hosted on an Application Server (AS). The AS processes the request and delivers an appropriate response to the ESB. The ESB now formats the response and forwards it as a request to the Data Services Server (DSS). The DSS processes the request and sends a response back to the ESB, which forwards it back to the client. This is a service chaining pattern found commonly across many SOA deployments.

To recap, the message path is: client → ESB → AS → ESB → DSS → ESB → client

Obviously, there is no black magic in BAM. Some sort of information (referred to as an event) needs to be sent to a BAM server (Business Activity Monitor, or BAM for brevity) as messages pass through the setup. To successfully aggregate events from this setup, we would install some sort of event publisher on each server and pump events into the BAM server. This would give us enough information to analyze and present, even if the messages are not routed to the DSS and AS through an ESB.
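
As a rough sketch of what such events could look like, the Python snippet below models one event per server hop and groups the events back into the path each request took. The `Event` fields, the `activity_id` correlation key, and the `group_by_activity` helper are assumptions made for illustration; they are not the event format of any WSO2 product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    activity_id: str  # hypothetical ID shared by every hop of one client request
    server: str       # server whose publisher emitted the event ("ESB", "AS", "DSS")
    timestamp: float  # when the hop happened, in seconds


def group_by_activity(events):
    """Return {activity_id: [server, ...]} with hops ordered by timestamp."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.activity_id].append(event)
    return {
        activity_id: [e.server for e in sorted(hops, key=lambda e: e.timestamp)]
        for activity_id, hops in grouped.items()
    }


# Events reach the BAM server in no particular order.
events = [
    Event("req-1", "AS", 2.0),
    Event("req-1", "ESB", 1.0),
    Event("req-1", "DSS", 4.0),
    Event("req-1", "ESB", 3.0),
    Event("req-1", "ESB", 5.0),
]
paths = group_by_activity(events)
print(" -> ".join(paths["req-1"]))  # ESB -> AS -> ESB -> DSS -> ESB
```

Without some shared key like `activity_id`, the events from the three servers could not be tied back to a single business activity.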

The woes of an aggregator

The life of an aggregator is fairly simple. Collect the events for analysis. What about the event publisher? In a practical BAM setup, the publisher plays a major role in sending over the events that will be aggregated. So for completeness and the sake of practical reality, we will consider problems that affect both the publisher and aggregator.

  1. Event persistence

We cannot keep the events that we receive in memory. If we receive events at any decent rate, we will burn BAM by running out of memory (OOM). Even if we can handle the rate, a practical system will want to run different analyses on the events, so we would end up holding events in memory forever and eventually go OOM. So an aggregator can cleanly end its job by dumping the event into a data store. This means that we have to be ready for an additional data store in the system just for BAM.
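
A minimal sketch of that hand-off, using Python's built-in `sqlite3` module as a stand-in for whatever data store a real BAM server would use. The table layout and the `open_store`, `persist`, and `stored_count` helpers are hypothetical, and the `:memory:` path is only there to keep the demo self-contained; a real setup would point at a durable store.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the event store and create the events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def persist(conn, batch):
    """Write a batch of events in one transaction, then let go of them."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [(json.dumps(event),) for event in batch],
        )


def stored_count(conn):
    """Number of events currently in the store."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


store = open_store()
persist(store, [{"server": "ESB", "status": "in"}, {"server": "AS", "status": "in"}])
print(stored_count(store))  # 2
```

Once `persist` returns, the aggregator no longer needs to hold the batch, and any later analysis reads from the store instead.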

  2. Information Loss

The problem with data is that if you do not capture it at the exact moment it occurs, you forever lose the chance of turning that data into useful information. So, even though it is possible to get a readily available BAM server from a vendor, the decision of what data to capture will always rest with the system architect. It is vital to get this right. If you capture too little information, you will not have enough to analyze. If you capture too much, you will take a big hit on performance. For example, if you have 10 MB messages going through, capturing and publishing everything to BAM means memory and performance hits on the publisher, and additional storage on the BAM server. So you need to strike the right balance and decide what data to capture as an event for BAM.
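
To make the trade-off concrete, here is a small sketch that copies only a chosen set of fields out of a large message. The message layout, the field names, and the `to_event` helper are all made up for illustration.

```python
import json


def to_event(message, fields):
    """Copy only the named fields from a message into a small event."""
    return {name: message[name] for name in fields if name in message}


# A hypothetical message with a large body that we choose not to publish.
message = {
    "message_id": "m-42",
    "operation": "placeOrder",
    "customer": "c-7",
    "body": "x" * (10 * 1024 * 1024),  # roughly 10 MB of payload
}
event = to_event(message, ["message_id", "operation", "customer"])

full_size = len(json.dumps(message))
event_size = len(json.dumps(event))
print(event_size, "bytes published instead of", full_size)
```

The flip side is exactly the information loss described above: anything left out of `fields` is not available for analysis later.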

  3. Storage growth

It is typical to have systems with tens of millions of messages passing through every day. When we are monitoring such a system, our data will grow. Not by small amounts: terabytes upon terabytes of data will accumulate. The BAM server needs to have a data architecture that can combat this massive data growth. Even in small-scale systems, this data growth can be significant. The system architect should also be aware of this data growth and publish only what is necessary. Otherwise, storage is wasted, and with data at this scale, the waste would be enormous.
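
A back-of-the-envelope calculation shows how quickly this adds up. The figures below are assumptions picked for illustration (20 million messages a day, five events per message, 1 KiB per event), not measurements from a real system.

```python
def daily_bytes(messages_per_day, events_per_message, bytes_per_event):
    """Raw event volume per day, before indexes, replication, or compression."""
    return messages_per_day * events_per_message * bytes_per_event


# Assumed figures: 20 million messages a day, 5 events per message
# (one per server hop in the sample activity), 1 KiB per event.
per_day = daily_bytes(20_000_000, 5, 1024)
per_year = per_day * 365

print(f"{per_day / 1e9:.1f} GB per day")     # 102.4 GB per day
print(f"{per_year / 1e12:.1f} TB per year")  # 37.4 TB per year
```

Halving the event size halves both numbers, which is why publishing only what is necessary matters so much.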

  4. Publisher Performance

Publishers have to be really, really efficient when it comes to publishing events. Ideally, they would publish events extremely fast without disturbing the actual flow of data. Of course, doing so with zero impact is impossible: you cannot monitor a message flow without affecting it at all. But practical publishers have to come very close to that. If the publishers are not efficient, they can take the whole server down during high production loads. The requirements for publishers are:

  • Publishers have to be intelligent enough to handle failures – Drop events or stop accepting events, increase/decrease publishing rates
  • Publishers have to be very, very fast – Fast protocols, optimized implementation
  • Publishers have to be unobtrusive – System designers have to publish the data they want to capture with minimal effect on the actual data flow
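
One way to approach these requirements is a publisher that only enqueues on the message path and sends from a background thread, dropping events when its buffer is full. This is a sketch under assumptions, not how any particular product does it: the `AsyncPublisher` class, its `send` callback, and the `capacity` default are hypothetical.

```python
import queue
import threading


class AsyncPublisher:
    """Buffers events in a bounded queue and sends them from a worker thread."""

    def __init__(self, send, capacity=1000):
        self._send = send  # callable that delivers one event to the BAM server
        self._queue = queue.Queue(maxsize=capacity)
        self._lock = threading.Lock()
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event):
        """Called on the message path; returns at once, dropping if full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self._count_drop()
            return False

    def _count_drop(self):
        with self._lock:
            self.dropped += 1

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel placed by close()
                break
            try:
                self._send(event)
            except Exception:
                # A failed send is counted, never raised into the message flow.
                self._count_drop()

    def close(self):
        """Send whatever is still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


received = []
publisher = AsyncPublisher(received.append, capacity=100)
for i in range(3):
    publisher.publish({"seq": i})
publisher.close()
print(len(received), publisher.dropped)  # 3 0
```

Dropping on a full queue favors the business message flow over complete monitoring data; a publisher could instead slow down or stop accepting events, as the first bullet notes.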

  5. Aggregator Performance

Aggregators have to be as fast as the publishers. They have to be designed to handle enormous event rates. If multiple publishers are publishing events, a single aggregator will eventually be unable to cope with the load. Therefore, aggregators have to scale. It should be possible to cluster and load balance them to handle extremely large event publishing rates. These kinds of rates are completely normal when you are monitoring multiple server nodes in a large production system. So the requirements for aggregators are:

  • Aggregators have to be very, very fast – Dump the event to storage as fast as possible
  • Aggregators have to scale – A single server will always have a limit, even an efficient one.
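
As one possible illustration of spreading load across a cluster, the sketch below picks an aggregator node by hashing a correlation ID, so that all events of one activity land on the same node. The `pick_aggregator` function, the node names, and the use of an activity ID as the key are assumptions; a front-end load balancer or another partitioning scheme would serve equally well.

```python
import hashlib


def pick_aggregator(activity_id, aggregators):
    """Map an activity ID to one aggregator node, the same one every time."""
    digest = hashlib.sha256(activity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(aggregators)
    return aggregators[index]


nodes = ["aggregator-1", "aggregator-2", "aggregator-3"]  # hypothetical node names
load = {node: 0 for node in nodes}
for n in range(9000):
    load[pick_aggregator(f"req-{n}", nodes)] += 1
print(load)
```

A plain modulo like this remaps most IDs whenever the node list changes, so it is only a starting point for thinking about how to scale out.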

In addition to these five concerns, there are other minor implications such as interoperability between publishers and aggregators, server-to-server security, etc. I plan to follow up on these issues in a later blog post.

Conclusion

This blog post set out to give comprehensive coverage of the aggregation part of BAM. Though long, I hope it was informative and useful. I plan to follow up on the analysis and presentation steps in later posts.
