Loadtesting using Markov chains to simulate user behavior

One of my clients has a rather large SharePoint farm and uses the add-in / provider-hosted-app model a lot. Everyone who looks at the topology and machine specs expects the farm to deliver a ‘huge’ amount of performance. In practice, the users report that it’s rather slow.

The guys in maintenance had done some rudimentary measurements and didn’t really see any reason to doubt the machine specs. So we were wondering…

Why do users perceive a lack of performance? Are we measuring correctly? How do we compare the farm’s performance before and after major changes? If we fix bottlenecks, how far can we push the farm?

These are different questions that all need the results of a ‘proper’ loadtest. So what is a ‘proper’ loadtest, and how do we build one?

Besides the criteria described in My Definition of Done for performance and loadtest cases, a proper loadtest needs to simulate realistic behavior patterns. In our case that means:

  • Users will perform many different actions in sequence within a short time period.
  • Some actions are usually only done by some types of users. Some actions are usually done by all types of users.
  • Actions sometimes depend on data that was generated during an earlier action.
  • Some actions are always followed by another type of action.
  • The order and timing of the actions are not fixed. They depend on probabilities.
  • We have a rough idea what the probabilities are, but we expect them to change as time progresses and our knowledge increases. We don’t want to invest significant time in changing the tests.

In order to satisfy the above, I decided that a plain-old operational profile wasn’t suitable. We needed to combine it with a Markov chain (see Kozialek’s paper on Operational Profiles). I also wanted the loadtest to decide at run time which path through the chain to take. I really didn’t want to generate the chains at design time and have to redo them once the probabilities had been measured more accurately. Visual Studio doesn’t support this out of the box, so I had to create some plug-ins to achieve it:

Plugin | Type | Purpose
GenerateRandomNumber | WebTestRequestPlugin | Generates a random number between a user-defined lower and upper bound. Visual Studio only had a plugin that would generate the number once during a test. I needed to generate it on every iteration within a single test.
NumberInRange | ConditionalRule | Executes a block of requests only when the input number falls within a user-defined range. This lets me dynamically select the transition/action in the Markov chain, as in the following pseudo-code:

    probability = GenerateRandomNumber(1, 100)
    if probability between 1 and 5: do action1; stay in current state;
    if probability between 6 and 90: do action2; go to state xxx;
    if probability between 91 and 98: do action3; go to state yyy;
    if probability between 99 and 100: go to exit state;

ChooseDocumentSet | WebTestRequestPlugin | About 70% of the users will select a case-file and start working with it. This plugin queries SharePoint to find a suitable document set and ensures that this document set is never shared by multiple users.
ForEachCharacter | ConditionalRule (loop) | Simulates a user typing into a people-picker by executing a block of requests for each character in a user-defined context parameter.
GenerateGuid | WebTestRequestPlugin | Generates a new GUID every time a certain request is executed, instead of only once per test.
ExtractQuerystringParameter | ExtractionRule | Retrieves the value of a specific querystring parameter. My system dynamically generates information and returns a URI containing this information in various querystring parameters. My test cases needed to remember these values for later use.
ClearContextParameter | WebTestPlugin | Clears the value of an existing context parameter each time a .webtest is called by another webtest.
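
To make the interplay of GenerateRandomNumber and NumberInRange a bit more concrete, here is a minimal sketch in Python rather than as Visual Studio plug-ins. It is not the actual plug-in code. The state names, action names and probabilities are made up for illustration, and the helpers `number_in_range` and `walk` are hypothetical. The point it tries to show is that the chain lives in a data table, so changing a probability means editing a number and not redesigning the test.

```python
import random

# Hypothetical states and actions, for illustration only.
# Each state maps to rows of (low, high, action, next_state). The ranges
# cover 1..100, so the width of a range is its probability in percent.
CHAIN = {
    "browsing": [
        (1, 5, "refresh_page", "browsing"),
        (6, 90, "open_case_file", "working"),
        (91, 98, "search", "browsing"),
        (99, 100, None, "exit"),
    ],
    "working": [
        (1, 60, "edit_document", "working"),
        (61, 90, "close_case_file", "browsing"),
        (91, 100, None, "exit"),
    ],
}


def number_in_range(value, low, high):
    """Mimics the idea of NumberInRange: true when low <= value <= high."""
    return low <= value <= high


def walk(chain, start, rng, max_steps=50):
    """Walk the chain at run time, drawing a fresh number for every step."""
    state, actions = start, []
    for _ in range(max_steps):
        if state == "exit":
            break
        roll = rng.randint(1, 100)  # new draw on every iteration
        for low, high, action, next_state in chain[state]:
            if number_in_range(roll, low, high):
                if action is not None:
                    actions.append(action)
                state = next_state
                break
    return actions


if __name__ == "__main__":
    # Each run with a different seed yields a different path through the chain.
    print(walk(CHAIN, "browsing", random.Random()))
```

The `max_steps` cap is an assumption of this sketch. It only guards against a virtual user that never happens to roll the exit transition.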
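
The idea behind ExtractQuerystringParameter can be sketched in the same way. This is a small Python illustration and not the Visual Studio extraction rule itself. The URI, the parameter names and the `context` dictionary (standing in for the web test's context parameters) are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_querystring_parameter(uri, name):
    """Return the first value of the named querystring parameter, or None."""
    values = parse_qs(urlsplit(uri).query).get(name)
    return values[0] if values else None


# Hypothetical URI returned by the system under test.
uri = "https://example.invalid/app/start?caseId=12345&token=abc"

# Remember the value so that a later request can reuse it.
context = {}
context["caseId"] = extract_querystring_parameter(uri, "caseId")
```

Returning `None` for a missing parameter is a choice made for this sketch. A real extraction rule would more likely mark the request as failed, so that a broken response shows up in the test results.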