Tuesday, March 17, 2015

Performance Testing/Engineering Message Oriented Middleware (MOM) Architecture (Approach)

                                   By Khounish Dasgupta  

 

Introduction: Message oriented middleware (MOM) is software or hardware infrastructure that supports sending and receiving messages between homogeneous or heterogeneous distributed systems.
Typically, in an N-tier architecture, the presentation tier (customer facing applications) is kept lightweight so that rendering is fast and seamless. These customer facing thick or thin client applications exchange information with the middleware tier by publishing messages in common formats to connected queues, which may be homogeneous (Vendor1 EMS queue -> Vendor1 EMS queue) or heterogeneous (Vendor1 EMS queue -> Vendor2 MQ queue -> Vendor1 EMS queue). The middleware business logic layer subscribes to these queues and processes the messages. The middleware messaging system in turn accepts input from publishers and implements heavily reused, orchestrated business logic to serve the applications.
Message Formats: In a message oriented, orchestrated architecture, messages can flow in and out in different formats. A publisher may publish in one format; a subscriber may consume that format as is, or convert the message into a compatible format and process it further. A subscriber can in turn become a publisher and post the message in a particular format, another subscriber can accept the same or a converted format, and so on.
                      
                                       HygieneSHOP: a Case Study
Let's take a basic shopping system called "HygieneSHOP". It is designed with a customer facing thin client (web based) application and a thick client (standalone) application that covers back office work. The thin web client is deployed on a capable multi-threaded vendor application server (WebLogic/WebSphere/JBoss). The thick clients are standalone (Java/.NET) applications deployed on individual machines. These systems interact and exchange information through the MOM (message oriented middleware) tier. Say HygieneSHOP is designed with the requirements below.
Customer Login functionality: As per the basic design, a customer facing thin client application provides the login functionality. As soon as the customer logs in to or out of the application, the application emits a pipe-delimited text message stream (Format: 2015||03||17||xxx||yyy||..||nnn||...). Note that here the application posts messages in pipe-delimited text format.
Customer Order functionality: As part of the further product design, customers are supposed to navigate, choose a product and place an order. This functionality carries very sensitive information such as credit card details, SSN etc., hence the messages must be secure. As per the product specification, as soon as an order is placed, all the required information is encapsulated and emitted as an encrypted, Base64 encoded message stream ("TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGl"). Note that for a secure functionality the same application posts encrypted messages that are Base64 encoded. Base64 on its own is only an encoding, so the payload has to be encrypted before it is encoded.
Agent Reporting Functionality: As part of the project requirement, an agent facing (internal) thick client application is designed. Using it, an agent logs in to the system and generates a purchase report. The agent chooses the customer and clicks a button, say "customer report". As a result the application emits an XML formatted message stream like
    <?xml version="1.0"?>
    <note>
    <to>Customer Admin</to>
    <from>Khounish</from>
    <heading>order report</heading>
    <body> <orderid>10012</orderid>
    <billedAmount>45 USD</billedAmount>
    <productid>p_1101</productid> </body>
    </note>
Customer Navigation history: Using the same thick client, an agent can click a button called "Customer History". Once the agent does so, the standalone thick client emits a JSON message stream like
{"firstName": "Khounish", "lastName": "dasgupta", "isAlive": true, "age": 25, "height_cm": 167.6, "address": { "streetAddress": "21999 xxx Street", "city": "timesquare", "state": "NY", "postalCode": "10021-3100" }, "phoneNumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "office", "number": "646 555-4567" } ], "children": [], "spouse": null }  

Emphasize Note: The paragraphs above highlight the different types of message formats, i.e. text stream, Base64 stream, XML stream and JSON stream.
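To make the four formats concrete, here is a minimal Python sketch of how a subscriber might decode each of them with the standard library only. The function names are hypothetical and not part of HygieneSHOP; the sketch also assumes that any encryption or decryption happens outside the Base64 step, since Base64 is only an encoding.

```python
import base64
import json
import xml.etree.ElementTree as ET


def parse_pipe_delimited(message):
    """Split a '||'-delimited text event into its fields."""
    return message.split("||")


def encode_payload(payload):
    """Base64-encode bytes for transport (payload is assumed to be encrypted already)."""
    return base64.b64encode(payload).decode("ascii")


def decode_payload(text):
    """Reverse of encode_payload: Base64 text back to raw bytes."""
    return base64.b64decode(text)


def parse_order_report(xml_text):
    """Return the child elements of <body> as a tag -> text dictionary."""
    body = ET.fromstring(xml_text).find("body")
    return {child.tag: child.text for child in body}


def parse_history(json_text):
    """Parse a JSON customer-history event into a dictionary."""
    return json.loads(json_text)
```

A subscriber that converts formats would typically call one of these parsers and then re-publish the resulting dictionary in whatever format the next queue expects.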
Messaging models: Before discussing how to do performance testing in a messaging environment, let's discuss messaging models. An n-tier messaging architecture can be built on two basic models: Publish-and-Subscribe and Point-to-Point.
Publish-and-Subscribe: Publish-and-subscribe, sometimes called the pub/sub model, is intended for a one-to-many broadcast of messages. At one end, publishing applications publish messages to a message store called a topic. At the other end, each subscriber/consumer application subscribes to the topic, consumes its own copy of each message and processes it as needed. The pub/sub messaging model is a push-based model, where messages are automatically broadcast to consumers without them having to request or poll the topic for new messages. With a durable subscription, even if a subscriber goes down, as soon as it comes back it starts receiving messages from the offset at which it stopped.
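The pub/sub behaviour described above can be illustrated with a small in-memory Python sketch. It is not a real messaging product: the `Topic` class and its method names are made up for illustration, and it assumes every subscription is durable, so a subscriber that reconnects resumes from its stored offset.

```python
class Topic:
    """Toy in-memory topic that pushes every message to every subscriber."""

    def __init__(self):
        self._messages = []   # retained messages, so offline subscribers can catch up
        self._offsets = {}    # subscriber name -> index of the next message to deliver
        self._callbacks = {}  # currently connected subscribers only

    def subscribe(self, name, callback):
        # A new subscriber starts at the current end; a returning one keeps its offset.
        self._offsets.setdefault(name, len(self._messages))
        self._callbacks[name] = callback
        self._drain(name)

    def disconnect(self, name):
        # The offset is kept, which is what makes the subscription durable.
        self._callbacks.pop(name, None)

    def publish(self, message):
        self._messages.append(message)
        for name in list(self._callbacks):
            self._drain(name)  # push: subscribers never poll

    def _drain(self, name):
        callback = self._callbacks[name]
        while self._offsets[name] < len(self._messages):
            callback(self._messages[self._offsets[name]])
            self._offsets[name] += 1
```

Each subscriber receives its own copy of every message, which is the one-to-many property that separates a topic from a queue.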
Point-to-Point: Point-to-point is implemented for one-to-one delivery of messages. At one end, publisher applications emit messages and post them onto a queue; at the other end, a subscriber process (sometimes called a listener or consumer) stays connected to the queue and polls it continuously. There can be multiple consumers/subscribers on a queue, but each message is received by only one of them. Multiple consumers can also be used to poll for messages filtered on the message payload.
Example: Let's say a publisher application publishes messages that include an attribute in the ranges 1-100, 101-200, 201-300 and so on. Consumer-1/subscriber-1 processes the messages whose attribute falls in the range 1-100, and consumer-2 processes the messages whose attribute falls in the range 101-200. Consumer-1 would not process 101-200, or vice versa.
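The filtering example above can be sketched in a few lines of Python. This is only an in-memory illustration: `SelectorQueue` and the `load` attribute name are hypothetical, and a real broker would normally apply such a filter on the server side through a message selector.

```python
class SelectorQueue:
    """Toy point-to-point queue: each message goes to exactly one consumer."""

    def __init__(self):
        self._messages = []

    def send(self, message):
        self._messages.append(message)

    def receive(self, selector):
        """Remove and return the oldest message that matches selector, or None."""
        for index, message in enumerate(self._messages):
            if selector(message):
                return self._messages.pop(index)
        return None


def consumer_1(message):
    """Selector for consumer-1: attribute in the range 1-100."""
    return 1 <= message["load"] <= 100


def consumer_2(message):
    """Selector for consumer-2: attribute in the range 101-200."""
    return 101 <= message["load"] <= 200
```

Because `receive` removes the message it returns, no second consumer can ever see it, which is the one-to-one guarantee of the model.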

       
                    Synchronous vs Asynchronous Messages:


Message processing happens through certain steps, i.e. subscriber process -> queue connection establishment -> poll message -> receive message -> process -> re-poll. One way to do this is to poll for a message, complete its processing and only then re-poll for a new message, i.e. re-polling is blocked until the earlier message operation completes. This is called synchronous processing. The technique resembles TCP packet communication in network terminology. Synchronous processing is reliable but slow. To resolve the slowness, developers are often guided to make certain parts of the consumer non-blocking, i.e. the subscriber process pulls a message from the queue, assigns it to some other module for processing, re-polls the next message from the queue, and so on. This is called asynchronous processing. So synchronous is blocking and asynchronous is non-blocking.
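The difference between the two styles can be sketched with Python's standard `queue` and `concurrent.futures` modules. The function names are illustrative, and a thread pool is only one of several ways to hand work off; a real subscriber would poll a broker rather than an in-process queue.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def consume_sync(q, process):
    """Blocking style: the next poll happens only after processing finishes."""
    results = []
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return results
        results.append(process(message))


def consume_async(q, process, workers=4):
    """Non-blocking style: hand each message to a worker and re-poll at once."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            try:
                message = q.get_nowait()
            except queue.Empty:
                break
            futures.append(pool.submit(process, message))
    # Leaving the 'with' block waits for all workers to finish.
    return [future.result() for future in futures]
```

With a slow `process` function the asynchronous version drains the queue almost immediately, while the synchronous one drains it at the pace of processing. The trade-off is that ordering and failure handling become the consumer's responsibility.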
Now that the basics of the very vast field of message oriented middleware are understood, we need to see why the various attached stacks have to be tested for nonfunctional aspects.
Any business owner, developer or decision maker would like to know the impact of message traffic on the infrastructure where the message oriented middleware is implemented. There can be a regular load of messages fed in by the publishing source, and an irregular situation can also occur when sudden, massive traffic arises. This situation is very much possible in the travel, banking, financial and gaming domains.
                Questions that make the business nervous:
  • Would the consuming system scale to the business need?
  • What is the message processing capacity?
  • What is the impact on the application container?
  • What is the impact on the operating systems on which the overall architecture is deployed?
  • If the consumer process also acts as a publisher to another queue, would it scale the same way as a standalone consumer?
  • What should the baseline load be?
  • Is there any memory leak in the "vendor messaging" layer?
  • Is there any memory leak in the messaging implementation?
Emphasize Note: The paragraphs above cover the fundamentals of the messaging models, synchronous vs asynchronous message handling by the subscriber/consumer, and the basic questions stakeholders can think about before deciding on a move to production.
Performance testing requirement meet-up: When a performance tester meets the business to discuss their testing requirement, it is like a meeting of two vagabonds, where each one knows what is supposed to be done but is not aware of what the other is doing. In the initial meeting the business talks mostly about functionality and asks the performance tester what the approach to testing their stack is. The performance tester in turn asks the stakeholders, "Can you share the logical and physical diagrams?", which means the performance tester is asking for the architecture diagram and also the sequence flow diagram. The performance tester then analyzes the architecture and provides the approach for testing the end-to-end architecture, or each attached component, against various performance testing criteria.
In this article the assumption is that we need to test a messaging stack which can be multi-tier (homogeneous or heterogeneous combinations).
Needs of a messaging/event driven setup: Let's take a fake (dummy) architecture and move towards understanding the need for, and the approach to, performance testing a messaging stack. Take an airline domain in which the core repository system is called the flight information system. The core repository gets updated with various flight level information. A flight can typically have the information below.

  • When is the flight scheduled to arrive or depart?
  • What is the estimated arrival time?
  • What is the actual arrival time?
  • What is the estimated departure time?
  • What is the actual departure time?
  • Which gate was it supposed to arrive at?
  • Which gate did it actually arrive at?
  • Was there any deviation or diversion from the designated path?
  • Was there any equipment change, and what are the PNR and baggage details?
  • What is the baggage status and count/deviation, and was the flight cancelled?


As a result of all this information, millions of messages get generated by applications running at the airport and on the flight. This information needs to be available in real time to any number of consuming applications for their own purposes. A few of them are:
  • The customer facing applications should have real time data.
  • Agent side applications should also have real time data.
  • The orchestrated SOA layer should also receive the same real time data without delay.
  • The stored procedures and triggers running at the database layer also need real time data to respond accurately.
  • The LDAP based single sign-on system that relates to the flight information also needs to be in sync.
  • Other partner airlines' systems should also receive the real time feed at the same time.
The facts above create the need for a system through which any legacy or new format of information can pass, be understood by the end consuming system and serve its purpose. Messaging through a queue/pipe mechanism has proved to be one of the best and fastest methods of message transfer.
Message testing kick off: Let's discuss a generic process and the entry point of non functional (performance) testing. Typically, at a high level, below are the life-cycle steps:
  • Business gives the message processing requirement to the developer.
  • The developer writes the code and does unit level testing.
  • Business calls in QA (functional QA and nonfunctional QA) for requirement sharing/gathering.
  • The functional testing team gathers detailed information so that test cases can be written at the most granular level possible and give the best possible coverage.
  • The performance testing team gathers the most critical business cases so that they can visualize the scripting needs and the need for end-to-end or component level tests.
  • The functional testing team starts executing, while the non functional team starts preparing data and scripts for the non functional tests and sets up monitors to gather metrics.
  • Once the functional testing team certifies an 80-85% success rate, it is assumed that all the business critical test cases have passed.
  • This is the entry point: once the functional testing team signs off at 80-85%, the performance testing team picks up the various performance tests.
  • For a new stack, if an SLA is available, the performance testing team performs smoke, load, stress, spike, scalability, failover and fallback, and long term endurance (sustenance) tests.
Emphasize Note: The paragraphs above cover the types of nonfunctional (performance) tests that should be introduced when a new stack is under test. The entry point of performance testing has also been discussed: performance testing starts after 80-85% successful completion/sign off of the business scenarios, which include all the business critical scenarios that would be part of performance testing.
In this article we discuss the performance testing approach/strategies for a newly built/deployed messaging stack. Let's discuss it through a case study of a virtual airline system.
                                Hygienic Airlines (A case study)
On one good Vitamin-D day (sunny day), the business and development teams of HygienicAirlines call a meeting between 10 AM and 3 PM. They want to meet the performance testing engineer, Khounish. Reason: a nervous thought, i.e. "A messaging system has been implemented which needs a subscriber to process a large volume of message traffic coming from various upstream publishing production systems." Question: "Can it sustain for shorter and longer periods?" Khounish meets the business and development teams, and here is what he discusses in detail.
Business Development Manager (BDM): Hi Khounish!!! We are from the XXX department. We need you to certify a messaging layer stack that includes traffic coming from a production system. We need to know the breakpoint of the subscriber process. Please suggest a non functional strategy around performance testing for the same.
Khounish: Hi BDM!! Thanks for letting me know the requirement at a high level. However, before providing the strategy/approach, I would need some elicitation around the requirement.
BDM and Development: Sure!!! Why not!!! We are happy to help you to help us!!!
Khounish: Great!! Below are my queries to the Dev and Business teams.
To Business Team:
  • Please provide us a demo of the publishing system/application and the areas of the application that generate the message traffic.
  • Please provide us an architecture diagram of the pub/sub system.
  • Please provide us the logical flow and the business critical transactions.
  • Please provide us the expected traffic volume.
  • Please provide us the anticipated growth factor of the message traffic in the future (say in 6 months).
  • What is the expected traffic volume when an irregular pattern/situation occurs (bad weather)?
  • What is the long term landscape of traffic growth (say 2 years)?
  • What is the production vs staging environment ratio?
  • Is it a single layer or multilevel setup, i.e. is it one publisher->queue->one subscriber, or one publisher->queue->(subscriber->publisher)->queue->subscriber... and so on?
  • Does the queue setup involve a single vendor or multiple vendors?

To Development team:
  • Can the performance testing team play around with production traffic, or do we need to simulate the traffic and act as a dummy production system that can throttle it?
  • If simulation needs to be done, can the same message be posted in replay mode? If not, what are the constraints?
  • What is the application server used?
  • What is the operating system used?
  • Can you please provide us a sample message format?
  • Can you please provide us the queue connection details such as connection factory details, port, queue name, host name, user name and password?
  • Can you provide us the log path of a well instrumented log?
  • Can you provide us the starting statement and the ending statement of a message, so that we can know when a particular message starts and when it ends?
  • Is it a single layer or multilevel setup, i.e. is it one publisher->queue->one subscriber, or one publisher->queue->(subscriber->publisher)->queue->subscriber... and so on?
  • Does the queue setup involve a single vendor or multiple vendors?
BDM: "My comments are mentioned below"
  • Somebody from our team would give the performance testing team a demo of the publishing system/application and point out the areas of the application that generate the message traffic.
  • We would provide the architecture diagram of the pub/sub system.
  • We would provide the logical flow and the business critical transactions.
  • Expected traffic volume: 10 million/day
  • Anticipated growth factor of the message traffic in the future (say in 6 months): 13 million/day
  • Expected traffic volume when an irregular pattern/situation occurs (bad weather): 20 million/day, i.e. roughly 230-350 messages/second
  • Long term landscape of traffic growth (say 2 years): 40 million/day
  • Production vs staging environment ratio: Production has 4 class 8 servers (hyper-threaded, 48 CPU cores, 128 GB RAM, 1 DB shared with other apps). The staging/performance environment has 2 class 8 servers (hyper-threaded, 48 CPU cores, 128 GB RAM, 1 DB shared with other apps).
  • Single layer or multilevel setup: The architecture includes the HygienicAirlinesReservation system, which publishes messages to a queue that is subscribed to by a process which populates an Oracle database.
  • Does the queue setup involve a single vendor or multiple vendors: single vendor
Development Manager: "My comments are inline"
  • Can the performance testing team play around with production traffic, or do we need to simulate the traffic and act as a dummy production system that can throttle it: Yes, the performance testing team can play around with production traffic.
  • If simulation needs to be done, can the same message be posted in replay mode, and if not what are the constraints: Simulation does not need to be done, but it is better to use a different data set at regular intervals, i.e. throttling production traffic would be sufficient.
  • What is the application server used: WebLogic with JRockit, 64-bit
  • What is the operating system used: Red Hat Linux 5.2, 64-bit
  • Can you please provide us a sample message format: would provide
  • Can you please provide us the queue connection details such as connection factory details, port, queue name, host name, user name and password: would provide
  • Can you provide us the log path of a well instrumented log: would provide
  • Can you provide us the entry statement and the exit statement of a message block, so that we can know when a particular message starts and when it ends: would provide

Khounish: Thanks for the information. Based on the elicitation questionnaire, please find the strategy/approach below. We would define the approach through a test plan which would include the following.
Objective of the test: The objective of the testing is to certify the HygienicAirlines FlightInformation messaging stack through all non functional (performance) test types. The objective is to observe the impact of throttled load on the operating system, application server, network and database. It is to be tested that the subscriber that processes the published messages sustains every load criterion without deviating from the SLA.
Load modelling: A load model is the reference from which the various volumetric load patterns are defined, i.e. the patterns that the subscribing application has to sustain. For the HygienicAirlines FlightInformation messaging stack, below are the load models for the various tests.
  • Load volume: SLA: 10 million/day -> 10,000,000/24 = 416,667/hour -> 416,667/60 = 6,944/minute -> 6,944/60 = about 115.7/second, taken as 115/second (this assumes the volume is evenly distributed over 24 hours)
  • Stress volume: SLA: 13 million/day -> 13,000,000/24 = 541,667/hour -> 541,667/60 = 9,028/minute -> 9,028/60 = about 150.5/second, taken as 150/second (this assumes the volume is evenly distributed over 24 hours)
  • Spike/Irregular volume: SLA: 20 million/day -> 20,000,000/24 = 833,333/hour -> 833,333/60 = 13,889/minute -> 13,889/60 = about 231.5/second (this assumes the volume is evenly distributed over 24 hours)
  • Scalability volume: SLA: 40 million/day -> 40,000,000/24 = 1,666,667/hour -> 1,666,667/60 = 27,778/minute -> 27,778/60 = about 463/second (this assumes the volume is evenly distributed over 24 hours)
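The arithmetic behind the load model can be reproduced with a few lines of Python. This is just a sketch of the same even-distribution assumption stated above; the dictionary keys and function name are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes from the business answers, in messages per day.
LOAD_MODEL = {
    "load": 10_000_000,
    "stress": 13_000_000,
    "spike": 20_000_000,
    "scalability": 40_000_000,
}


def per_second(messages_per_day):
    """Average rate if the daily volume is spread evenly over 24 hours."""
    return messages_per_day / SECONDS_PER_DAY


def rates(model=LOAD_MODEL):
    """Per-second rate for every test type in the model, rounded to 2 decimals."""
    return {name: round(per_second(volume), 2) for name, volume in model.items()}
```

Real traffic is rarely uniform over a day, so a peak-hour rate would usually be higher than these averages. That is one reason the spike test uses a range rather than the single averaged figure.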

Test Scenarios vs Approach (business critical path): After discussing with dev and business, it was decided that the payload/test scenarios would use original production traffic fed through the staging environment. As far as basic load testing is concerned, this can be achieved by letting production traffic come in and observing the impact on the subscribing system. But production would not send constant traffic, and the need is to throttle the traffic. Our performance test would approach the throttling with the DST (Divert->Store->Throttle) technique (described in the final section of this article).
Architecture vs Test Type merging and Scoping: As it is a new stack, the performance testing team would like to perform all types of performance testing on the current stack.
Smoke test: As part of this test, the performance tester would run the throttle framework with a sample payload, i.e. 10 messages/second for 15 minutes, and would observe the queue.
Load test: A load test is performed to verify the expected production load volume. It would be based on the basic load model. As part of this test, the performance tester would run the throttle framework with a payload of 115 messages/second for 120 minutes and would observe the queue, operating system and server side statistics.
Stress test: A stress test is performed to verify the future growth factor of the traffic, which could be double, triple or a partial increment (if load = 1x, a partially increased load could be 1.2x/1.5x/2.2x/5.2x). It would be based on the stress load model. As part of this test, the performance tester would run the throttle framework with a payload of 150 messages/second for 120 minutes and would observe the queue, operating system and server side statistics.
Spike test: A spike test models a sudden burst of heavy traffic. Flight cancellations, diversions, delays or weather effects might generate such traffic. It would be based on the spike/irregular payload model and would be performed in combination with a load pattern. As part of this test, the performance tester would run the throttle framework with a payload of 115 messages/second for 60 minutes; from the 61st minute the volume would be spiked to 230-350 messages/second, and the tester would observe the queue, operating system and server side statistics.
Scalability test: A scalability test is sometimes also called a breakpoint test. In a messaging layer it is done by piling up an enormous number of messages on the queue and then releasing them together. In this case we would take a similar path and test whether the system scales to 463 messages/second (as per the SLA).
Endurance test: This test is the same as the load test, with the difference that it runs for a longer period. An endurance test helps in detecting memory leaks and similar issues. It would be based on the basic load model. As part of this test, the performance tester would run the throttle framework with a payload of 115 messages/second for 72 hours and would observe the queue, operating system and server side statistics.
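One way to drive the throttle framework from these test definitions is to describe each test as a list of phases. The sketch below is an assumption about how that could look, not the framework used in the case study. In particular, the spike duration (`SPIKE_MINUTES`) is not stated in the plan and is invented here, and `SPIKE_RATE` simply takes the upper end of the 230-350 range.

```python
from dataclasses import dataclass

SPIKE_MINUTES = 10  # assumption: the plan does not say how long the spike lasts
SPIKE_RATE = 350    # assumption: upper end of the 230-350 messages/second range


@dataclass(frozen=True)
class Phase:
    minutes: int
    rate: int  # messages per second


PROFILES = {
    "smoke": [Phase(15, 10)],
    "load": [Phase(120, 115)],
    "stress": [Phase(120, 150)],
    "spike": [Phase(60, 115), Phase(SPIKE_MINUTES, SPIKE_RATE)],
    "endurance": [Phase(72 * 60, 115)],
}


def target_rate(test, minute):
    """Rate for a 1-based minute of the test; 0 once the profile has ended."""
    elapsed = 0
    for phase in PROFILES[test]:
        elapsed += phase.minutes
        if minute <= elapsed:
            return phase.rate
    return 0


def total_messages(test):
    """Number of stored messages the test needs before it can start."""
    return sum(phase.minutes * 60 * phase.rate for phase in PROFILES[test])
```

`total_messages` is useful for sizing the stored payload: the endurance profile, for example, needs close to 30 million messages on disk or a replay policy agreed with development.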
Recapitulation: In the previous article, Khounish and the performance testing team decided to test the new HygienicAirlines messaging tier stack with all possible test types, through load modelling against the expected SLA defined by the business.
The business team is still nervous about what is going on. Business is still not clear on:
  • How are Khounish and team going to perform the test?
  • What other help would Khounish need from dev and business?
  • What metrics are Khounish and team going to provide?
  • Is the monitoring setup in place?
Business has provided the architecture as mentioned below.

As per the above figure, the FlightOperations production applications publish messages to the staging queues, which in turn are subscribed to by a Java based subscribing process that continuously polls the queue, processes the messages and updates the flight information DB behind the scenes.
As discussed in the previous article, the payload from production is to be used for the testing. The question is how to handle the throttle conditions needed to satisfy the stress, spike and scalability factors. For this, Khounish came up with the approach below with the business. Here is the story.
Khounish calls a meeting with the business and development teams after deciding on the NF-test approach.
Khounish: Good morning, Business and Development teams. Our NFQA team has gone through all the requirements and has come up with a resolution on how to handle the throttle conditions of message posting.
Below is the approach diagram attached with the architecture diagram.
           
Approach for Testing Throttle Conditions: The NFQA team would play the Snatch-and-Blow technique, i.e. snatch the messages from the original queue and then blow the accumulated message volume back through the throttle conditions. Below are the detailed steps.
  • NFQA would place a request with the middleware team to create a temp queue.
  • NFQA would request the middleware team to divert the traffic of the publishing system from the original queue to the temp queue.
  • NFQA would write custom test framework code to pull the messages from the temp queue and store them on the local file system.
  • The NFQA test framework would use configurable parameters, through which it would pull a number of messages from the local file system based on the load/stress/spike/scalability criteria.
  • The NFQA test framework would post the messages to the original queue and observe the impact.
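The store and throttle steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the NFQA framework itself: the queues are plain in-memory `deque` objects standing in for the temp and original queues (a real setup would use the vendor's messaging client), and the function names, file naming scheme and parameters are all hypothetical.

```python
import os
import time
from collections import deque


def store(temp_queue, directory):
    """Drain the diverted (temp) queue and write one file per message."""
    count = 0
    while temp_queue:
        path = os.path.join(directory, f"msg_{count:08d}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(temp_queue.popleft())
        count += 1
    return count


def throttle(directory, target_queue, rate_per_second, limit=None, sleep=time.sleep):
    """Replay stored messages in order at roughly rate_per_second."""
    interval = 1.0 / rate_per_second
    posted = 0
    for name in sorted(os.listdir(directory)):
        if limit is not None and posted >= limit:
            break
        with open(os.path.join(directory, name), encoding="utf-8") as handle:
            target_queue.append(handle.read())
        posted += 1
        sleep(interval)  # naive pacing; drifts because posting time is not subtracted
    return posted
```

The `sleep` parameter is injectable so the pacing can be checked without waiting. For a long endurance run a real framework would more likely pace against a wall-clock schedule, because a fixed sleep per message accumulates drift.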
Business and Development team: Hi Khounish!!! Thanks for sharing the approach. However, we have some questions, such as: what about monitors and metrics? Which systems would you be monitoring, and what are the expected metrics?
Khounish: Very good questions indeed!!! Well, the NFQA team tries to go into as much detail as possible.
Monitoring and metrics would be categorized along the dimensions below.
Operating System Metrics: Heavy volume can have a deep and wide-ranging impact on the operating system, hence some important metrics would be monitored: processor (CPU), memory (paging/swap), network, disk I/O and per-process utilization. The tools used to capture this information are SiteScope/Nagios/nmon and NFQA written shell scripts.
Application Metrics: Heavy traffic can also have a deep impact on the operating system because of un-tuned application container parameters, while a well tuned subscribing application can scale to a great level. Hence some application server parameters need to be monitored: thread pools, connection pools, transactions or messages processed per second, successful transactions, failed transactions and transaction response times. These metrics can be pulled from a well instrumented server side log and from the admin console of the application server.
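As an illustration of pulling such metrics from an instrumented log, the Python sketch below pairs the start and end statements of each message. The log line layout (`date time MSG_START id=N` and `date time MSG_END id=N status=OK`) is entirely hypothetical, since the real format would come from the development team; only the pairing idea is the point.

```python
from datetime import datetime


def parse_line(line):
    """Parse 'YYYY-MM-DD HH:MM:SS.mmm EVENT id=N [status=S]' (assumed layout)."""
    date, clock, event, msg_id, *rest = line.split()
    stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    status = rest[0].split("=", 1)[1] if rest else None
    return stamp, event, msg_id, status


def summarize(lines):
    """Pair start/end statements and derive throughput and response times."""
    starts, durations = {}, []
    ok = failed = 0
    first = last = None
    for line in lines:
        stamp, event, msg_id, status = parse_line(line)
        first = first or stamp
        last = stamp
        if event == "MSG_START":
            starts[msg_id] = stamp
        elif event == "MSG_END" and msg_id in starts:
            durations.append((stamp - starts.pop(msg_id)).total_seconds())
            if status == "OK":
                ok += 1
            else:
                failed += 1
    window = (last - first).total_seconds() if first else 0.0
    return {
        "processed": len(durations),
        "ok": ok,
        "failed": failed,
        "avg_response_s": sum(durations) / len(durations) if durations else 0.0,
        "max_response_s": max(durations, default=0.0),
        "throughput_per_s": len(durations) / window if window else 0.0,
    }
```

This is why the entry and exit statements were requested from development: without a reliable pair per message, response time can only be estimated from queue depth.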
Virtual Machine Metrics: Messaging application servers run on top of virtual machines, which play a vital role in memory management through a technique called garbage collection. An un-tuned virtual machine or application can lead to memory leaks, so that a large segment of running memory gets clogged and out-of-memory errors result. A well tuned virtual machine can scale to a great level, hence some of its parameters need to be monitored. Below are the metrics taken for messaging application servers: class loaders, garbage collections, heap size/used heap size, young and tenured generation collections, GC log analysis, heap dump analysis, and native vs muxer thread analysis. The tools used are JConsole, JVisualVM, jhat, JRMC and manual analysis of a dump file.
Database Metrics: The first observation should be on the capability of the end DB system, which the subscribing system updates at the pace at which it receives messages from the publishing system. Correct information about the database leads to better scaling and to better prediction of load handling capacity. Some of the metrics are I/O utilization, data cache utilization, latch contention, memory pool usage, read/write ratio on the database and transaction completion time. As the database here is Oracle, AWR reports will be used.
Business and Development team: "Great!!!! Go ahead!!! Happy testing!!!!"