Performance Testing Java Rest Clients

Being a member of the technical track of a large integrated development team occasionally has its benefits. In the smaller teams I am usually involved with, developers are often forced into a continuous cycle of rush, rush, rush, deliver, move on to the next project, and the original developers responsible for a deliverable are often gone by the time the product is pushed out to live. Thankfully that is not the case with my current employer. Recently one of the teams in our department asked for my help tracking down an issue that was causing their web service to perform very poorly in the QA environment. Initially they thought it was a problem with the network connection between the client (attached to our main web application) and the new machine hosting their service. A couple of wget calls from the client server, however, showed that the problem was more likely in their client implementation.

We quickly bashed out a performance testing application by raiding and simplifying our production code for the most commonly used HTTP clients. Since we have about 10 web services associated with our mainline web application, we were able to find a decent spread of implementations. We yanked three handlers of particular interest:

  • An Apache Commons client handler (commons-httpclient 3.1)
  • A Jersey Client handler (jersey-core 1.11)
  • A client making use of the Spring RestTemplate (springframework 3.0.5.RELEASE)
  • We also created a fourth client using the java.net package from the JDK, which we used as a control.
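The java.net control client needs nothing beyond the JDK. The following is a minimal sketch of such a client, not the code from the author's repository: the class name JavaNetGetClient and the 5-second timeouts are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical control client built only on java.net from the JDK.
class JavaNetGetClient {

    static String get(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(5_000); // assumption: 5 s connect timeout
        connection.setReadTimeout(5_000);    // assumption: 5 s read timeout

        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + status);
        }

        // Read the whole body, then close the stream via try-with-resources.
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Reading the body to the end and closing the stream, rather than calling disconnect(), lets HttpURLConnection keep the socket for reuse, so sequential calls to the same host can share a connection.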

The objective of the experiment was to see how their team's handler stacked up against our existing ones. Since the task had a fairly high priority at this stage, we were assigned a veteran system administrator to deploy and test it ASAP. He pushed it out to machines with the following specifications:

  • A: 1.4GHz processors, Sun SPARC, Solaris
  • B: 2.85GHz processors, Sun SPARC, Solaris
  • C: 2.4GHz processors, x86, Solaris
  • D: 3.47GHz, Linux
  • E: 3.47GHz, Linux (this machine also hosted the web service we were sending GET requests to)

The experiment immediately revealed that their handler (which used the Jersey Client) performed particularly badly when 100 sequential calls were made to the server. Here are the results:

Environment: Native (java.net), Jersey, Apache, Spring
A: 783, 29385, 3761, 1285
B: 505, 7625, 2130, 890
C: 506, 6563, 2000, 1153
D: 451, 3765, 870, 594
E: 469, 3812, 860, 520
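The figures above compare 100 sequential calls per client on each machine. A harness for that kind of measurement can be very small; the sketch below is a hypothetical illustration rather than the code in the repository. The original does not state the unit of the results, so the choice of milliseconds here is an assumption, as is the optional warm-up phase.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical harness: runs some untimed warm-up calls, then times
// a fixed number of sequential calls and returns the total in milliseconds.
class SequentialTimer {

    /** One request; in a real test this would wrap a single HTTP GET. */
    interface Request {
        void execute() throws Exception;
    }

    static long timeSequentialMillis(int warmUpCalls, int measuredCalls, Request request)
            throws Exception {
        // Warm-up calls absorb class loading and early JIT work; they are not timed.
        for (int i = 0; i < warmUpCalls; i++) {
            request.execute();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            request.execute(); // strictly sequential: each call finishes before the next
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

System.nanoTime() is used because it is intended for measuring elapsed time, unlike System.currentTimeMillis(), which can jump if the wall clock is adjusted.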

Reviewing the team's Jersey calling code showed that they were misusing the Jersey client. Although the client looks like a lightweight object from its creation code (Client.create()), it is actually expensive to create (something noted in the Jersey Javadocs). We converted it to a singleton and retested. It performed significantly better in all the environments:

Environment: Jersey Client (Singleton)
A: 2033
B: 1014
C: 1074
D: 735
E: 676
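The fix described above amounts to creating the client once and reusing it. The sketch below shows the before-and-after shape with a hypothetical ExpensiveClient standing in for Jersey 1.x's com.sun.jersey.api.client.Client, so it runs without Jersey on the classpath. The shared instance uses the initialization-on-demand holder idiom, which is one of several ways to get a lazy, thread-safe singleton.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive client such as Jersey's Client.
class ExpensiveClient {
    static final AtomicInteger INSTANCES_CREATED = new AtomicInteger();

    ExpensiveClient() {
        INSTANCES_CREATED.incrementAndGet(); // stands in for costly setup work
    }

    String get(String url) {
        return "response from " + url; // placeholder for a real HTTP GET
    }
}

// Before: a new client per call, so the setup cost is paid on every request.
class PerCallHandler {
    String fetch(String url) {
        return new ExpensiveClient().get(url);
    }
}

// After: one shared instance. The JVM initialises Holder (and so creates
// CLIENT) only on first use, and class initialisation is thread-safe.
class SharedClientHandler {
    private static final class Holder {
        static final ExpensiveClient CLIENT = new ExpensiveClient();
    }

    String fetch(String url) {
        return Holder.CLIENT.get(url);
    }
}
```

Sharing one instance across threads relies on the real client being thread-safe, which is worth confirming in its Javadocs before copying the pattern.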

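So another lesson learned the hard way, but at least we caught it in our testing phase. Looking at the final results, the plain java.net control was the fastest in every environment. If you are in search of a speedy third-party implementation (which should give you some additional bells and whistles), the Spring RestTemplate appears to have the fastest underlying implementation: it beat Apache and the singleton Jersey client in four of the five environments, with the singleton Jersey client slightly ahead only on machine C.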

If you are interested in running the tests yourself, I have pushed the test source to GitHub: java-http-get-clients. Let me know if you get different results.

Restructuring a Large Codebase: Lessons - Part 3 (Coding Style & Anti-patterns)

This is a multipart post. It will probably make more sense to you if you read the introduction before reading this.

The Law of Demeter

A.getB().getC().mutateCRightHere(). Everyone likes short three- or four-line methods that get the job done, but this is not the way to do it. There are at least four really important problems with the above code. One: it is incredibly prone to NPEs. Two: it clearly violates the Law of Demeter (objects should only talk to their immediate friends, to ensure loose coupling). Three: it violates the “Tell, Don’t Ask” style of programming, which leaves the code holding A with intimate knowledge of C (because it can mutate it). C should not expose its inner workings to anyone, let alone to objects two edges away from it in the dependency graph. Four, and probably most important from a support perspective: by practising this style you are asking your fellow programmers to load up the same context you had in your mind when you were writing it. When you or your peers come back to maintain the code (remembering you are likely to read many hundreds of files that day), you will be forced to load context after context into your short-term memory. This can be incredibly taxing over the long term. Be a terse, punchy programmer: one line does one thing. It’s good for the code and it’s good for you.
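To make the contrast concrete, here is a small sketch using a made-up Customer (A), Wallet (B) and Balance (C). None of these types come from the original codebase; they exist only to set the train-wreck call beside a Tell, Don’t Ask version.

```java
// Hypothetical domain: Customer (A) has a Wallet (B), which holds a Balance (C).
class Balance {
    private long pennies;

    Balance(long pennies) { this.pennies = pennies; }

    long pennies() { return pennies; }

    void subtract(long amount) { pennies -= amount; } // no rules of its own
}

class Wallet {
    private final Balance balance;

    Wallet(Balance balance) { this.balance = balance; }

    // Exposing the inner object is exactly what makes train-wreck calls possible.
    Balance getBalance() { return balance; }

    // Tell, Don't Ask: the wallet decides how its own state may change.
    boolean pay(long amount) {
        if (amount <= 0 || balance.pennies() < amount) {
            return false;
        }
        balance.subtract(amount);
        return true;
    }
}

class Customer {
    private final Wallet wallet;

    Customer(Wallet wallet) { this.wallet = wallet; }

    // Kept so the train-wreck example below compiles.
    Wallet getWallet() { return wallet; }

    // Talks only to its immediate friend, the Wallet.
    boolean pay(long amount) { return wallet.pay(amount); }
}

class PaymentExamples {
    // Violates the Law of Demeter: reaches two edges deep, can throw an NPE
    // at either getter, and bypasses the wallet's checks entirely.
    static void chargeTrainWreck(Customer customer, long amount) {
        customer.getWallet().getBalance().subtract(amount);
    }

    // Tells the customer what to do; the rules stay next to the data.
    static boolean chargeTellDontAsk(Customer customer, long amount) {
        return customer.pay(amount);
    }
}
```

Note that the train-wreck version happily drives the balance negative, because it skips the one place where the rules live.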

Defend against anti-patterns

It is astonishing how many anti-patterns turn up in a large codebase. Things that everyone knows are bad for the long-term health of an application turn up by the barrelful. I sometimes sit in disbelief at the number of public static Java methods we have. Insanely hard to test and against the basic principles of OO, they are usually found in badly named Util classes, which seem to attract all manner of junk behaviour. Helper objects are just as bad: developers often reach for the ‘Helper’ suffix when coming up with a real name requires too much thinking. Instead of creating yet another ambiguous behaviour bucket, I would advocate giving the object a descriptive name based on the behaviour you are about to put in it. A SecurityHelper object with .isUserLoggedIn() becomes UserLoggedInChecker. This style of programming makes it harder for you and your team members to create large, incohesive God objects.
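A minimal sketch of that rename, assuming a hypothetical Session type that returns the current user id or null. The one-line check matters less than where it lives: an instance method on a narrowly named object can be injected and stubbed in a unit test, while a static call on a helper cannot be swapped out without extra tooling.

```java
// Hypothetical session abstraction; a real application would use its own type.
interface Session {
    String currentUserId(); // null when nobody is logged in
}

// Before: a static method in a catch-all helper. Callers are hard-wired to it,
// and the name gives no hint about what does or does not belong here.
final class SecurityHelper {
    private SecurityHelper() {}

    static boolean isUserLoggedIn(Session session) {
        return session != null && session.currentUserId() != null;
    }
}

// After: a small object named for its one behaviour.
class UserLoggedInChecker {
    boolean isUserLoggedIn(Session session) {
        return session != null && session.currentUserId() != null;
    }
}

// A hypothetical caller that receives the checker instead of making a static call.
class CheckoutPage {
    private final UserLoggedInChecker loggedInChecker;

    CheckoutPage(UserLoggedInChecker loggedInChecker) {
        this.loggedInChecker = loggedInChecker;
    }

    String render(Session session) {
        return loggedInChecker.isUserLoggedIn(session) ? "checkout" : "login";
    }
}
```

In a test, CheckoutPage can be handed a stub UserLoggedInChecker that always answers true or false, with no session set up at all.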

Restructuring a Large Codebase: Lessons - Part 2 (Modules)

This is a multipart post. It will probably make more sense to you if you read the introduction before reading this.

Review and Refactor

We have found time and again that a large proportion of our code tends to be thrown roughly in the place it needs to be, but usually not exactly where it ought to reside. This can manifest itself in methods/behaviours that are in the wrong classes as well as fields/attributes attached to the wrong object. It may be hypercritical as we do have the benefit of hindsight, but when you take the time to look at two or three related packages in a module these mistakes often pop out and glare at you. Not taking the time to refactor them as soon as you spot them tends to cause developers down the lifecycle chain grief when both maintaining and extending your code.
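A common example of behaviour sitting in the wrong class is a method that does nothing but read another object's fields (often called feature envy); the usual fix is the Move Method refactoring. The OrderLine and InvoicePrinter classes below are hypothetical and only illustrate the shape of the change.

```java
import java.util.List;

class OrderLine {
    final long unitPricePennies;
    final int quantity;

    OrderLine(long unitPricePennies, int quantity) {
        this.unitPricePennies = unitPricePennies;
        this.quantity = quantity;
    }

    // After the refactor: the calculation lives with the data it uses.
    long totalPennies() {
        return unitPricePennies * quantity;
    }
}

class InvoicePrinter {
    // Before: the printer reaches into OrderLine's fields to do OrderLine's job.
    static long lineTotalBefore(OrderLine line) {
        return line.unitPricePennies * line.quantity;
    }

    // After: the printer simply asks each line for its total.
    static long invoiceTotal(List<OrderLine> lines) {
        long total = 0;
        for (OrderLine line : lines) {
            total += line.totalPennies();
        }
        return total;
    }
}
```

Once the method has moved, the fields can usually be made private, which is a good sign the behaviour has found its proper home.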

Another observation we have made is that developers are averse to creating new projects or modules (something you might find funny to hear from a 170+ module codebase). Devs will often piggy-back their objects or packages on modules that are related to, or have similar needs to, the code they are writing, even when this makes no logical sense. The most extreme and horrific manifestation of this is the all too often used ‘common’ module. Once a codebase reaches a reasonable size, it is very unlikely that all your applications will share much, if anything, in common. Your applications will probably have different purposes and therefore require different perspectives on your data, meaning your objects will have similar but not identical concerns. Creating a common project only invites pain down the road. Lazy developers use it as a place to dump anything they create that is remotely generic, and inexperienced ones will often optimise their code for reuse too early, thinking it might be useful in the future. While this might not seem bad on the surface, the repercussions can be astounding. Over half of the ‘generic’ objects in our root module were only ever used in one other place. This meant that a single change to an object used in only one place could trigger a reactor build that bubbled through every dependent module and required the entire codebase to be retested.
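The cost of touching ‘common’ comes down to the dependency graph: everything downstream of the changed module has to be rebuilt and retested. The hypothetical ImpactAnalysis sketch below computes that downstream set with a breadth-first walk over reversed dependency edges, similar in spirit to asking Maven to build a module together with its dependents (`-amd`). The module names used in the example are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a multi-module build: "module -> modules it depends on".
class ImpactAnalysis {

    // Returns the changed module plus every module that transitively depends on it.
    static Set<String> affectedBy(String changed, Map<String, List<String>> dependsOn) {
        // Invert the edges so we can walk from a module to the modules that use it.
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            for (String dependency : entry.getValue()) {
                dependents.computeIfAbsent(dependency, key -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Breadth-first walk; the set doubles as the "visited" marker.
        Set<String> affected = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        affected.add(changed);
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dependent : dependents.getOrDefault(current, Collections.emptyList())) {
                if (affected.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        return affected;
    }
}
```

For example, if billing depends on common and invoice-format, search and reports depend on common, and web depends on billing and search, then a change to common affects every module except invoice-format, while a change to the small invoice-format module affects only invoice-format, billing and web.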

I guess if you want to take only one thing away from this multipart post, it should be this: build your modules like you do your objects, that is, highly cohesive. Small, focused and well-tested modules are the holy grail of software development; they can be reused easily and counted on not to create bloat through transitive dependencies. Additionally, small modules are both individually robust and less prone to architectural degradation over time. All application modules start small but tend to grow, so keeping them small by refactoring is important. Applications built on the foundation of many tiny pillars are more likely to provide lasting business value over the long term (because it is easier to extend or swap out pieces of them). Don’t keep your eggs in one big basket, or even in ten or twenty; use five hundred.