CSV Connector for WSO2 BAM

I’ve worked on a small tool to publish your spreadsheets (after converting to CSVs of course!) to WSO2 BAM.

The cool thing is you can publish one (or thousands of) spreadsheets to WSO2 BAM, and use the HiveAnalytics UI to slice and dice them to produce neat results.

So, you need Maven to build and run this (and of course, WSO2 BAM up and running). Here are the steps:

1. Download and unzip the source from this link.

2. Run ‘mvn clean install’ at the unzipped location.

3. Now run the exec command in maven as per the following example: ‘mvn exec:java -Dexec.mainClass=org.wso2.carbon.bam.CSVAgent -DcsvFile=../ExportCustomerAccounts.csv -DstreamName=CustomerAccounts -DstreamVersion=1.2.0’

Here is what happens:

“CustomerAccounts” is the stream that will get created out of the CSV file, “ExportCustomerAccounts.csv”. All streams are versioned in BAM, so this stream will have the version “1.2.0”. Versioning means that when the layout of the CSV changes (columns deleted or added), you can publish the new layout under a different stream version.
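
As a rough illustration of the idea (not the actual CSVAgent code), the sketch below derives a stream definition from the header row of a CSV. The dictionary layout, the `payloadData` key, the `STRING` type and the second version number "1.3.0" are all assumptions made for the example.

```python
import csv
import io
import json


def stream_definition_from_csv(csv_text, stream_name, stream_version):
    """Build a stream definition from the header row of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    return {
        "name": stream_name,
        "version": stream_version,
        # Every column is treated as a string; type inference is out of scope here.
        "payloadData": [{"name": col.strip(), "type": "STRING"} for col in header],
    }


# Two layouts of the same spreadsheet: the second one adds a column.
sample_v1 = "accountId,customerName,balance\n1001,Alice,250.00\n"
sample_v2 = "accountId,customerName,balance,region\n1001,Alice,250.00,EU\n"

v1 = stream_definition_from_csv(sample_v1, "CustomerAccounts", "1.2.0")
v2 = stream_definition_from_csv(sample_v2, "CustomerAccounts", "1.3.0")
print(json.dumps(v1, indent=2))
```

Both definitions share the stream name but differ in version and column list, which is what lets the two CSV layouts coexist.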

The potential here is that you can publish any number of CSVs to BAM and use the SQL-like Hive query language to do joins and group-bys to get valuable information out of your spreadsheets.
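
To make "joins and group-bys" concrete, here is a small sketch of what such a query computes, written in plain Python rather than Hive. The two CSVs, their column names and the `total_by_region` helper are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data standing in for two published spreadsheets.
accounts_csv = "accountId,region\n1,EU\n2,US\n3,EU\n"
orders_csv = "accountId,amount\n1,10.0\n1,5.5\n2,20.0\n3,4.5\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by_region(accounts, orders):
    """Join orders to accounts on accountId, then sum amounts per region."""
    region_of = {a["accountId"]: a["region"] for a in accounts}
    totals = defaultdict(float)
    for order in orders:
        region = region_of.get(order["accountId"])
        if region is not None:  # inner join: skip orders without an account
            totals[region] += float(order["amount"])
    return dict(totals)


totals = total_by_region(read_rows(accounts_csv), read_rows(orders_csv))
print(totals)
```

In a query language this would be a single statement joining the two tables on `accountId` and grouping by `region`; the point of running it on Hadoop is that the same logic scales past what fits in one process.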

BAM, SOA & Big Data

Leveraging Big Data has become a commodity for most IT departments. It’s like the mobile phone. You can’t remember the times when you couldn’t just call someone from your mobile, no matter where you are in the world, can you? Similarly, IT folks can’t remember the days when files were too big to summarize, or grep, or even just store. Set up a Hadoop cluster and everything can be stored, analyzed and made sense of. But then I asked myself: what if the data is not stored in a file? What if it is all flying around in my system?

Deployment

Shown above is a fairly common production SOA deployment. Let’s summarize briefly what each server does:

  • An ESB cluster fronts all the traffic and does some content based routing (CBR).
  • Internal and external app server clusters host apps that serve different audiences.
  • A Data Services Server cluster exposes Database operations as a service.
  • A BPS cluster coordinates a bunch of processes between the ESB, one App server cluster and the DSS cluster.

Hard to digest? Fear not. It’s a complicated system that would serve a lot of complex requirements while enhancing re-use, interoperability and all other good things SOA brings.

Now, in this kind of system, whether it’s SOA enabled or not, there lies a tremendous amount of data. And no, it is not stored as files. It is transferred between your servers and systems. Tons and tons of valuable data are going through your system every day. What if you could excavate this treasure of data and make use of all the hidden gems to derive business intelligence?

The answer lies in Business Activity Monitoring (BAM-ing), which involves aggregating, analyzing and presenting data. SOA and BAM were always a love story. As system functions were exposed as services, monitoring these services meant you were able to monitor the whole system. Most of the time, if the system architects were smart, they used open standards, which made plugging in and monitoring systems even easier.

But even with BAM, it was impossible to capture every message and every request that passed through the server. The data growth alone would be tremendous for a fairly active deployment. So here we have a Big Data problem, but not a typical one: it concerns live data. To make full sense of the data that passes through modern systems and derive intelligence from it, we need a Business Activity Monitor that is Big Data ready.

Now, a system architect has to worry about BAM, SOA and Big Data together, as they are essentially intertwined. A solution that delivers anything less falls well short of visionary.

WSO2 BAM 2.0.0 released!

 

The screenshots above show the final result of a Service statistics monitoring use case. Data across many servers got published to BAM, had to be analyzed and then presented on the dashboard you see above. Nothing better than a cool dashboard to make sense of all that data 😉

It has been a long journey with an abundance of learning curves, one that allowed the BAM team to make some great technologies work together seamlessly. After spending almost a year on it, we were able to put out the 2.0.0 release of the WSO2 Business Activity Monitor, which is a complete re-write of the 1.x product. It has been a marathon effort for the last few months, and having a great team made all the work feel like a refreshing summer breeze.

The release note I concocted should say all you need to know about the product. A major thanks to everyone who helped inside and outside WSO2 to make the final release a reality.

 

WSO2 Business Activity Monitor 2.0.0 released!

The WSO2 Business Activity Monitor (WSO2 BAM) is an enterprise-ready, fully open source, complete solution for aggregating, analyzing and presenting information about business activities. The aggregation refers to collection of data, analysis refers to manipulation of data in order to extract information, and presentation refers to representing this data visually or in other ways such as alerts. The WSO2 BAM architecture reflects this natural flow in its design.

Since all WSO2 products are based on the component-based WSO2 Carbon platform, WSO2 BAM is lean, lightweight and consists of only the components required for efficient functioning. It does not contain unnecessary bulk, unlike many over-bloated, proprietary solutions. WSO2 BAM comprises only the modules required to give the best of performance, scalability and customizability, allowing businesses to achieve time-effective results for their solutions without sacrificing performance or the ability to scale.

The product is available for download at: http://wso2.com/products/business-activity-monitor

The documentation is available at: http://docs.wso2.org/wiki/display/BAM200/WSO2+Business+Activity+Monitor+Documentation

Key Features

  • Collect & Store any Type of Business Events

    • Events are named, versioned and typed by event source
    • Event structure consists of (name, value) tuples of business data, metadata and correlation data
  • High Performance Data Capture Framework

    • High performance, low latency API for receiving large volumes of business events over various transports including Apache Thrift, REST, HTTP and Web services
    • Scalable event storage into Apache Cassandra using column families per event type
    • Non-blocking, multi-threaded, low impact Java Agent SDK for publishing events from any Java based system
    • Use of Thrift, HTTP and Web services allows event publishing from any language or platform
    • Horizontally scalable with load balancing and highly available deployment
  • Pre-Built Data Agents for all WSO2 Products

  • Scalable Data Analysis Powered by Apache Hadoop

    • SQL-like flexibility for writing analysis algorithms via Apache Hive
    • Extensibility via analysis algorithms implemented in Java
    • Schedulable analysis tasks
    • Results from analysis can be stored flexibly, including in Apache Cassandra, a relational database or a file system
  • Powerful Dashboards and Reports

    • Tools for creating customized dashboards with zero code
    • Ability to write arbitrary dashboards powered by Google Gadgets and {JaggeryJS}
  • Installable Toolboxes

    • Installable artifacts to cover complete use cases
    • One click install to deploy all artifacts for a use case

Issues Fixed in This Release

All fixed issues have been recorded at – http://bit.ly/Tzb1VP

Known Issues in This Release

All known issues have been recorded at – http://bit.ly/TzberZ

Engaging with Community

Mailing Lists

Join our mailing list and correspond with the developers directly.

Reporting Issues

WSO2 encourages you to report issues, enhancements and feature requests for WSO2 BAM. Use the issue tracker for reporting issues.

Discussion Forums

We encourage you to use Stack Overflow (with the wso2 tag) to engage with developers as well as other users.

Training

WSO2 Inc. offers a variety of professional Training Programs, including training on general Web services as well as WSO2 Business Activity Monitor and a number of other products. For additional training information please refer to http://wso2.com/training/

Support

We are committed to ensuring that your enterprise middleware deployment is completely supported from evaluation to production. Our unique approach ensures that all support leverages our open development methodology and is provided by the very same engineers who build the technology.

For additional support information please refer to http://wso2.com/support/

For more information on WSO2 BAM, and other products from WSO2, visit the WSO2 website.


We welcome your feedback and would love to hear your thoughts on this release of WSO2 BAM.

The WSO2 BAM Development Team

 

WSO2 BAM 2.0.0-Alpha 2 released!

My team at WSO2 was able to release a 2nd alpha of our upcoming BAM 2.0. Do give it a spin.

The release note is below:

The WSO2 team is pleased to announce the release of version 2.0.0 – ALPHA 2 of WSO2 Business Activity Monitor.

WSO2 Business Activity Monitor (WSO2 BAM) is a comprehensive framework designed to solve problems in the wide area of business activity monitoring. WSO2 BAM comprises many modules to give the best of performance, scalability and customizability. These allow it to meet the requirements of business users, DevOps and CxOs without spending countless months on customizing the solution, and without sacrificing performance or the ability to scale.

WSO2 BAM is powered by WSO2 Carbon, the SOA middleware component platform.

Downloads

The binary distribution can be downloaded at http://dist.wso2.org/products/bam/2.0.0-alpha2/wso2bam-2.0.0-ALPHA2.zip.

The documentation pack is available at http://dist.wso2.org/products/bam/2.0.0-alpha2/wso2bam-2.0.0-ALPHA2-docs.zip.

Samples
  1. Service Data Agent – Sample to install Service data agent, publish statistics and intercepted message activity from Service Hosting WSO2 Servers such as WSO2 AS, DSS, BPS, CEP, BRS and any other WSO2 Carbon server with the service hosting feature
  2. Mediation Data Agent – Sample to install Mediation data agent, publish mediation statistics and intercepted message activity using Message Activity Mediators from the WSO2 ESB
  3. Data center wide cluster monitoring – Sample to simulate two data centers each having two clusters sending statistics events, perform summarizations and visualize them in a dashboard
  4. End-to-End Message Tracing – Sample to simulate messages fired from a set of servers to WSO2 BAM and set up message tracing analytics and visualizations of respective messages
  5. KPI Definition – Sample to simulate receiving events from a server (ex: WSO2 AS), perform summarizations and visualize product and consumer data in a retail store
  6. Fault Detection & Alerting – Sample to simulate receiving events from a server (ex: WSO2 ESB), detect faults and fire email alerts

Features

  • Data Agents
    1. Pre built data agents – Service Data Agent for the WSO2 AS, DSS, BPS, CEP, BRS and any other WSO2 Carbon server with the service hosting feature and Mediation Data Agent for the WSO2 ESB
    2. A re-usable Agent API to publish events to the BAM server from any application (samples included)
    3. Apache Thrift based Agents to publish data at extremely high throughput rates
    4. Option to use Binary or HTTP protocols
  • Event Storage
    1. Apache Cassandra based scalable data architecture for high throughput of writes and reads
    2. Carbon based security mechanism on top of Cassandra
  • Analytics
    1. An Analyzer Framework with the capability of writing and plugging in any custom analysis tasks
    2. Built in Analyzers for common operations such as get, put, aggregate, alert, fault detection, etc.
    3. Scheduling capability of analysis tasks
  • Visualization
    1. Drag and drop gadget IDE to visualize analyzed data with zero code
    2. Capability to plug in additional UI elements and Data sources to Gadget IDE
    3. Google gadgets based dashboard

Reporting Issues

WSO2 encourages you to report issues, enhancements and feature requests for WSO2 BAM. Use the issue tracker for reporting any of these.

A revolution with Business Activity Monitor (BAM) 2.0

Producing middleware that is both lean and enterprise worthy is a difficult job. It’s either non-existent or requires innovative thinking (a lot of it) and a lot of going back and forth with your implementations. Very risky business, but if you get it right, it puts you far ahead of anyone else. It’s why we thought of re-writing WSO2 BAM from scratch and taking a leap rather than chugging away slowly by iterative fixing. If you prefer to hear me rather than reading this, please catch a webinar on this at http://bit.ly/xKxm8R.

Diagram courtesy of http://softwarecreation.org/2008/ideas-in-software-development-revolution-vs-evolution-part-1/

When you try to monitor your business activities, you need to plug in to your servers and capture events. It sounds easy enough, so what’s the big deal, you may ask. Here are a few roadblocks we hit with our initial BAM 1.x version:

  • Performance – We plugged in to our ESBs and App Servers and all metrics were perfect. It nicely showed request counts, response times, etc. It was perfect as long as the load was low. If one server started sending 1000 events/sec, things started getting ugly. Even worse, if we had plugged in to a few servers and started getting 1 billion events / day, well, that would have been a nightmare from the word go. We couldn’t even fathom what would happen at that type of scale.
  • Scalability – We need to store events and process them. Sadly, we discovered the hard way that this means we need to scale in many different ways.
    • Event load – We need to scale in terms of handling large amounts of events. We didn’t have a high performance server, but no matter how good our performance would be, there is still a breaking point. Beyond that point, you need to scale.
    • Storage – If you store 1000 events a day, your data will grow. And all of us hate to delete old email to get more inbox space. So naturally, everyone wants to keep their events.
    • Processing power – When you want to analyze the events that you collect, a single server can only give you so much processing power. You need to scale out your analytics. Another ‘oh, so obvious’ thing that we learnt eventually.
  • Customizability – We provided a lovely set of dashboards that showed all you wanted to know about your server and API metrics. But no one is ever satisfied with what they have. They want more. They want to monitor their metrics and analyze their data and put up their own graphs. And, of course, they want to do it now, not in 2 months.

 

In May 2011, we decided to start a whole new initiative to re-write WSO2 BAM from scratch. We analyzed the problem and made a few decisions. Here are some of them.

  • Divide and conquer – We divided the problem. We have to aggregate, analyze and present data. So we built separate components for each, keeping in mind that we need to scale each individually. We mapped these into the event receiver, analyzer framework and a presentation layer. Data agents are the link between anyone who wants to send events and the BAM server. The WSO2 Carbon platform allows us to easily uninstall a component from any server. This means we can take the BAM distro and uninstall the other components to make a BAM server that is just an Event Receiver, or just an Analyzer. It’s just a click of a button.
The 3 main components of BAM 2.0
  • Scalable and fast storage – We chose to use Apache Cassandra as our storage solution. I do not want to argue that it’s the best data store ever. But it works well for us. It allows us to do fast writes to store a large amount of data, quickly. Also, it’s built to scale. Scaling up Cassandra takes minutes, not weeks. And scaling up doesn’t mean it’s going to cost you. Also, it’s written in Java, and being a Java house, that allows us to hack around the code.
  • Fast protocol – We chose to use Apache Thrift as our default protocol. There are many arguments against it, but it holds up well for us. It’s fast and it does its job. It allows us to maintain sessions and supports a bunch of languages. One key thing was that Cassandra uses it as well, allowing us to gain more performance in streaming data into Cassandra without deserializing.
  • Scalable analytics – We chose to write our own analytics language. But if it doesn’t suit you, you can plug in your own Java code. Hadoop is unavoidable when it comes to scaling analytics. So, we decided to have a Hadoop mode for large amounts of data and a non-Hadoop mode, so that anyone can just use BAM without worrying about any Hadoop cluster.

  • Gadget based dashboards/reports – Drag and drop visualizations are very attractive when you don’t want to spend weeks writing code to visualize. We developed a gadget generator so you can quickly visualize your analyzed data easily.

After a couple of milestones, we were able to spin off an alpha. It’s available here: http://dist.wso2.org/products/bam/2.0.0-Alpha/wso2bam-2.0.0-ALPHA.zip. It is not the silver bullet and documentation is still WIP. But, if we haven’t already reached our destination, it’s within our reach now.

 

WSO2 Business Activity Monitor 2.0.0 Alpha released!

After a lot of re-designing, re-architecting, re-writing and re-re-writing, we have come up with an alpha of the all new BAM 2. Although this is still an alpha, it will provide a good taste of things to come in the major BAM release.

Here’s the (not so) official release note:

WSO2 Business Activity Monitor (BAM) 2.0.0-Alpha is now available for download at [1].
The 2.0.0 alpha version is a complete re-write of BAM concentrating on scalability, performance and customizability.
Samples

This release contains samples that can be run without setting up another server to send events to the BAM server.
  1. KPI Definition – Sample to simulate receiving events from a server (ex: WSO2 AS), perform summarizations and visualize product and consumer data in a retail store
  2. Fault Detection & Alerting – Sample to simulate receiving events from a server (ex: WSO2 ESB), detect faults and fire email alerts
Features

Data Agents
  1. A re-usable Agent API to publish events to the BAM server from any application (samples included)
  2. Apache Thrift based Agents to publish data at extremely high throughput rates
  3. Option to use Binary or HTTP protocols
Event Storage
  1. Apache Cassandra based scalable data architecture for high throughput of writes and reads
  2. Carbon based security mechanism on top of Cassandra
Analytics
  1. An Analyzer Framework with the capability of writing and plugging in any custom analysis tasks
  2. Built in Analyzers for common operations such as get, put, aggregate, alert, fault detection, etc.
  3. Scheduling capability of analysis tasks
Visualization
  1. Drag and drop gadget IDE to visualize analyzed data with zero code
  2. Capability to plug in additional UI elements and Data sources to Gadget IDE
  3. Google gadgets based dashboard

We welcome you to try this release and provide feedback ahead of the major release in Q1/Q2 2012.
Keys available at [2], [3].

Explaining Business Activity Monitoring (BAM) – Part I – Aggregation

WTH is BAM?

To start off, let us understand what exactly is meant by a business activity. It’s important to understand that Business Activity Monitoring (BAM) was coined with respect to monitoring business activities run in BPM in a SOA deployment. But it was soon understood that not every business activity went through BPM software.

A business activity can either be a business process that is orchestrated by business process management (BPM) software, or a business process that is a series of activities spanning multiple systems and applications – Wikipedia

There is no rocket science here. Monitoring a business activity is simply monitoring the execution path of the business process.

Business Activity Monitoring is a term coined by Gartner, Inc. It’s a worthwhile definition to look into when you are trying to understand BAM.

Business Activity Monitoring refers to the aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners – Wikipedia

I really like this definition. It tells you exactly the sequence any information should go through to successfully BAM it.

Aggregation → Analysis → Presentation

So why did I write an entire blog post just to explain the first step? It seems simple enough. Although it seems trivial, as you try to run a production grade Business Activity Monitoring system, there are many implications and limitations that apply to each of these steps. Let me run you through a simple aggregation example.

A (not so) simple sample

Let us take a sample business activity with 3 steps. Products for the sample setup are drawn from the WSO2 product stack. A client request comes to an Enterprise Service Bus (ESB), which forwards it to a service hosted on an Application Server (AS). The AS processes the request and delivers an appropriate response to the ESB. The ESB now formats the response and forwards it as a request to the Data Services Server (DSS). The DSS processes the request and sends a response back to the ESB, which forwards it back to the client. This is a service chaining pattern found commonly across many SOA deployments.

To recap, the message path is: client → ESB → AS → ESB → DSS → ESB → client

Obviously, there is no black magic in BAM. Some sort of information (referred to as an event) needs to be sent to a BAM server (Business Activity Monitor, or BAM for brevity) as messages pass through the setup. To successfully aggregate events from this setup, we would install some sort of event publisher on each server and pump events into a BAM. This would give us enough information to analyze and present, even if the messages are not routed to DSS and AS through an ESB.
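
A minimal sketch of that idea follows, assuming each server publishes one event per message it handles and tags it with a shared correlation id. The `Event` fields, the `InMemoryBam` class and `simulate_request` are hypothetical names used only for illustration; a real BAM server would persist the events rather than hold them in a list.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Event:
    correlation_id: str  # ties together all events of one business activity
    server: str          # which server published the event
    timestamp: float = field(default_factory=time.time)


class InMemoryBam:
    """Stand-in for a BAM server: it only collects events in arrival order."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)

    def path_of(self, correlation_id):
        """Reconstruct the execution path of one activity from its events."""
        return [e.server for e in self.events if e.correlation_id == correlation_id]


def simulate_request(bam):
    """Publish one event per hop of the ESB -> AS -> ESB -> DSS -> ESB chain."""
    correlation_id = str(uuid.uuid4())
    for server in ["ESB", "AS", "ESB", "DSS", "ESB"]:
        bam.publish(Event(correlation_id, server))
    return correlation_id


bam = InMemoryBam()
cid = simulate_request(bam)
print(" -> ".join(bam.path_of(cid)))
```

Because the path is rebuilt from the correlation id rather than from the ESB's routing, the same reconstruction would work if the AS and DSS were called directly.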

The woes of an aggregator

The life of an aggregator is fairly simple. Collect the events for analysis. What about the event publisher? In a practical BAM setup, the publisher plays a major role in sending over the events that will be aggregated. So for completeness and the sake of practical reality, we will consider problems that affect both the publisher and aggregator.

  1. Event persistence

We cannot keep the events that we receive in memory. If we receive events at any decent rate, we will burn BAM by going out of memory (OOM). Even if we can handle the rate, a practical system will want to run different analyses on events and we will end up holding events in memory forever, eventually going OOM. So, an aggregator can cleanly end its job by dumping the event into a data store. This means that we have to be ready for an additional data store in the system just for BAM.

  2. Information Loss

The problem with data is that if you do not capture it at the exact moment it occurs, you forever lose the chance of turning that data into useful information. So, even though it is possible to get an off-the-shelf BAM server from a vendor, the decision of what data to capture will always be with the system architect. It is vital to get this right. If you capture too little information, you will not have enough information to analyze. If you capture too much information, you will take a big hit on performance. For example, if 10 MB messages are going through and you capture and publish everything to BAM, the publisher takes memory and performance hits and the BAM server needs additional storage. So you need to strike the right balance and decide on what data to capture as an event for BAM.
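
One way to picture that balance is a publisher that copies only a few chosen fields out of each message. The sketch below assumes JSON messages; the field names, `CAPTURED_FIELDS` and `to_event` are made up for the example and do not come from any BAM API.

```python
import json

# Hypothetical choice made by the system architect: which fields are worth keeping.
CAPTURED_FIELDS = ("orderId", "customerId", "total")


def to_event(message_bytes, fields=CAPTURED_FIELDS):
    """Extract a small event from a large message instead of forwarding all of it."""
    message = json.loads(message_bytes)
    event = {name: message[name] for name in fields if name in message}
    # Keep the original size as metadata; it is cheap and often useful later.
    event["messageSizeBytes"] = len(message_bytes)
    return event


big_message = json.dumps(
    {"orderId": 42, "customerId": "c-7", "total": 99.5, "attachment": "x" * 1_000_000}
).encode("utf-8")

event = to_event(big_message)
event_bytes = json.dumps(event).encode("utf-8")
print(len(big_message), "bytes in the message,", len(event_bytes), "bytes in the event")
```

The trade-off is visible in the code: anything not listed in `CAPTURED_FIELDS` (here the attachment) is gone for good once the message has passed.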

  3. Storage growth

It is typical to have systems with tens of millions of messages passing through every day. When we are monitoring such a system, our data will grow. Not by simple amounts: terabytes and terabytes of data will accumulate. The BAM server needs to have a data architecture that can combat this massive data growth. Even in small scale systems, this data growth can be significant. The system architect should also be aware of this data growth and publish only what is necessary. Otherwise, storage would be wasted unnecessarily. And with big data such as this, the wasted storage would be enormous.
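
A back-of-the-envelope calculation shows how quickly this adds up. Every number below (50 million events a day, 1 KB per event, three stored copies) is an assumption chosen for illustration, not a measurement from any deployment.

```python
def daily_storage_bytes(events_per_day, bytes_per_event, replication_factor=1):
    """Raw bytes written per day, counting every stored copy of each event."""
    return events_per_day * bytes_per_event * replication_factor


TB = 10**12  # decimal terabyte

events_per_day = 50_000_000  # assumed: "tens of millions" of messages a day
bytes_per_event = 1_000      # assumed: roughly 1 KB captured per event

per_day = daily_storage_bytes(events_per_day, bytes_per_event, replication_factor=3)
per_year = per_day * 365

print(f"{per_day / TB:.2f} TB per day, {per_year / TB:.2f} TB per year")
```

Halving the captured bytes per event halves the yearly figure, which is why the choice of what to publish matters as much as the storage architecture.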

  4. Publisher Performance

Publishers have to be really, really efficient when it comes to publishing events. They need to be capable of publishing events extremely fast, without disturbing the actual flow of data. Of course, doing this with no cost at all is impossible. You cannot have zero impact on the actual message flow when you are monitoring it. But practical publishers have to come very close to that. If the publishers are not efficient, they can take the whole server down during high production loads. The requirements for publishers are:

  • Publishers have to be intelligent to handle failures – Drop events or stop accepting events, increase/decrease publishing rates
  • Publishers have to be very, very fast – Fast protocols, optimized implementation
  • Publishers have to be unobtrusive – System designers have to publish data they want to capture with minimum effect to the actual data flow
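
The first and third requirements can be sketched with a bounded in-memory queue: the message flow only ever does a non-blocking put, and events are dropped when the queue is full. This is an illustrative pattern, not the WSO2 Agent SDK; `NonBlockingPublisher`, its `send` callback and the capacity values are assumptions.

```python
import queue
import threading


class NonBlockingPublisher:
    """Hands events to a background thread; never blocks the caller."""

    def __init__(self, send, capacity=1000):
        self._send = send  # callable that actually ships one event
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0   # events rejected because the queue was full
        self.failed = 0    # events whose send raised an error
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event):
        """Called from the message flow: returns immediately, True if accepted."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # shed load instead of slowing the real traffic
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop the worker
                break
            try:
                self._send(event)
            except Exception:
                self.failed += 1  # a monitoring failure must not propagate

    def close(self):
        """Flush what is queued, then stop the background thread."""
        self._queue.put(None)
        self._worker.join()


received = []
publisher = NonBlockingPublisher(received.append, capacity=100)
accepted = sum(publisher.publish({"n": i}) for i in range(50))
publisher.close()
print(accepted, "accepted,", len(received), "sent,", publisher.dropped, "dropped")
```

Dropping on overflow is a deliberate choice here: losing some monitoring data is usually preferable to stalling the server being monitored. Adjusting the publishing rate, the other option named above, would need feedback from the receiver and is left out of this sketch.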

  5. Aggregator Performance

Aggregators have to be as fast as the publishers. They have to be designed to handle enormous event rates. If multiple publishers are publishing events, a single aggregator will not be able to cope with the load. Therefore, aggregators have to scale. It should be possible to cluster and load balance them to handle extremely large event publishing rates. These kinds of rates are completely normal when you are monitoring multiple server nodes in a large production system. So the requirements for aggregators are:

  • Aggregators have to be very, very fast – Dump the event to storage as fast as possible
  • Aggregators have to scale – A single server will always have a limit, even an efficient one.
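
As a sketch of what clustering and load balancing can look like from the publisher's side, the example below spreads events over several receivers in round-robin order and skips a node that is down. `Receiver`, `RoundRobinBalancer` and the node names are hypothetical; a production setup would more likely put a dedicated load balancer in front of the receivers.

```python
import itertools


class Receiver:
    """Stand-in for one aggregator node; stores events in a list."""

    def __init__(self, name):
        self.name = name
        self.stored = []
        self.up = True

    def receive(self, event):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.stored.append(event)


class RoundRobinBalancer:
    """Sends each event to the next receiver in turn, skipping failed ones."""

    def __init__(self, receivers):
        self._receivers = list(receivers)
        self._cycle = itertools.cycle(self._receivers)

    def send(self, event):
        # Try each receiver at most once per event before giving up.
        for _ in range(len(self._receivers)):
            receiver = next(self._cycle)
            try:
                receiver.receive(event)
                return receiver.name
            except ConnectionError:
                continue
        raise ConnectionError("no receiver available")


nodes = [Receiver("receiver-1"), Receiver("receiver-2"), Receiver("receiver-3")]
balancer = RoundRobinBalancer(nodes)

for i in range(6):
    balancer.send({"n": i})

nodes[1].up = False  # simulate one node failing
for i in range(6, 10):
    balancer.send({"n": i})

print({node.name: len(node.stored) for node in nodes})
```

Adding capacity then means adding a node to the list, which only works because each receiver writes to shared, scalable storage rather than keeping events to itself.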

In addition to these five concerns, there are other minor implications such as interoperability between publishers and aggregators, server-server security, etc. I plan to follow up on these issues in a later blog post.

Conclusion

This blog post set out to give comprehensive coverage of the aggregation part of BAM. Even though it ran long, hopefully it was informative and useful. I plan to follow up on the analysis and presentation sections as well in later posts.

Presenting BAM at WSO2Con

I’m extremely proud to be presenting at WSO2Con. I’m proud, not only because of the opportunity I have to present, but more so, because I’m part of something much bigger. I’m part of something that represents the rising of a tiny island nation through open source tech. To date, Sri Lanka has made its mark in the Apache world, GSoC, Sahana and many other open source communities. And now, an open source company from Sri Lanka hosts an event that brings in people from over 20 countries, among them tech leaders from corporate giants such as eBay. I’m sure many people would share my sentiment.

I’m presenting on BAM (Business Activity Monitoring) for your SOA deployment at WSO2Con on the 14th of Sept at 2.15 pm. The conference will be held from 12 – 16 Sept. 2011, at the place I call home. The island paradise of Sri Lanka.