In Search Of Performance (2)

❠Just remember, once you’re over the hill you begin to pick up speed.❞

Charles M. Schulz

This article on performance was never intended to become a multi-part series. But once you look at your own code through performance glasses, you quickly discover further bottlenecks. This time it is a modeling flaw that FreshMarker has been carrying around since its early days. This article shows how that flaw affects the performance of the template engine.

A FreshMarker template can contain any number of complex expressions. Internally, these expressions are represented as trees of TemplateObject implementations. Some implementations, such as TemplateNumber and TemplateBoolean, represent data types, while others, such as TemplateJunction and TemplateRelational, represent operations within the expression.

When a template is processed, the expressions are evaluated and the result is printed or used for decisions. The modeling flaw stems from the fact that all TemplateObject implementations share, among others, the following method.

TemplateObject evaluateToObject(ProcessContext context);

When an expression is evaluated, this method is called on the root node. Depending on the implementation, further evaluateToObject calls are then made on the nodes further down the tree.

public record TemplateSign(TemplateObject expression) implements TemplateExpression {

  @Override
  public TemplateObject evaluateToObject(ProcessContext context) {
    return expression.evaluateToObject(context).negate();
  }
}

In this example, the TemplateSign class calls evaluateToObject on its child node expression and then negates the result with the negate method.
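The recursive evaluation can be tried out with a small self-contained sketch. The names SketchObject, SketchNumber, SketchVariable, SketchSign and SketchContext are hypothetical stand-ins and not FreshMarker classes; only the shape of evaluateToObject and negate follows the excerpt above.

```java
import java.util.Map;

// Hypothetical stand-in for ProcessContext: carries the data of one call.
record SketchContext(Map<String, Integer> model) {}

// As in the article, tree nodes and results share one type.
interface SketchObject {
    SketchObject evaluateToObject(SketchContext context);

    // Only numeric results can be negated in this sketch.
    default SketchObject negate() {
        throw new UnsupportedOperationException("negate is not supported by " + this);
    }
}

// A data type: it evaluates to itself.
record SketchNumber(int value) implements SketchObject {
    @Override
    public SketchObject evaluateToObject(SketchContext context) {
        return this;
    }

    @Override
    public SketchObject negate() {
        return new SketchNumber(-value);
    }
}

// A leaf node that reads its value from the call-specific context.
record SketchVariable(String name) implements SketchObject {
    @Override
    public SketchObject evaluateToObject(SketchContext context) {
        return new SketchNumber(context.model().get(name));
    }
}

// An operation node: it evaluates its child, then negates the result.
record SketchSign(SketchObject expression) implements SketchObject {
    @Override
    public SketchObject evaluateToObject(SketchContext context) {
        return expression.evaluateToObject(context).negate();
    }
}

class SketchEvaluationDemo {
    public static void main(String[] args) {
        SketchObject root = new SketchSign(new SketchVariable("x"));
        SketchObject result = root.evaluateToObject(new SketchContext(Map.of("x", 5)));
        System.out.println(result); // prints SketchNumber[value=-5]
    }
}
```

The call on the root node travels down to the variable, and the negated number travels back up. Note that SketchNumber appears both as a possible tree node and as the result, which is exactly the overlap the article is about.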

The problem is that the nodes in the expression tree have the same type as the results they produce, although static nodes and dynamic results differ in one important respect: the nodes in the expression tree must not hold state. If they did, the same template instance could no longer be used for parallel calls, because information from one call could leak into another. For this reason, call-specific information is passed into the methods via the ProcessContext.
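Why stateless nodes make parallel calls safe can be shown with a minimal sketch. All names here (CallContext, StatelessNode, VariableNode, NegateNode, SharedTreeDemo) are hypothetical and only illustrate the principle; CallContext plays the role that ProcessContext plays in FreshMarker.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

// Hypothetical stand-in for ProcessContext: everything that belongs to one call.
record CallContext(Map<String, Integer> model) {}

// Nodes are records without mutable fields, so they cannot remember anything between calls.
interface StatelessNode {
    int evaluate(CallContext context);
}

record VariableNode(String name) implements StatelessNode {
    @Override
    public int evaluate(CallContext context) {
        return context.model().get(name);
    }
}

record NegateNode(StatelessNode expression) implements StatelessNode {
    @Override
    public int evaluate(CallContext context) {
        return -expression.evaluate(context);
    }
}

class SharedTreeDemo {
    // One tree instance, shared by all calls.
    static final StatelessNode TREE = new NegateNode(new VariableNode("x"));

    // Each asynchronous call gets its own context; the tree itself is shared.
    static CompletableFuture<Integer> evaluateAsync(int x) {
        return CompletableFuture.supplyAsync(() -> TREE.evaluate(new CallContext(Map.of("x", x))));
    }

    static List<Integer> evaluateInParallel(int calls) {
        List<CompletableFuture<Integer>> futures = IntStream.range(0, calls)
                .mapToObj(SharedTreeDemo::evaluateAsync)
                .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        System.out.println(evaluateInParallel(4)); // prints [0, -1, -2, -3]
    }
}
```

Because every call brings its own context, the results do not depend on how the calls are scheduled. A mutable field in one of the nodes would break this guarantee.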

Some TemplateObject implementations, such as TemplateJunction, TemplateExists and TemplateSign, exist only in expression trees and therefore must not hold state. Others, such as TemplateNumber, TemplateString and TemplateNull, can appear both as nodes in the expression tree and as results, so whether they may hold state depends on where they are used. Finally, some implementations exist only as results, such as TemplateSequenceLooper and TemplateHashLooper.

TemplateSequenceLooper and TemplateHashLooper are two auxiliary objects that provide the current value of the loop variable and the loop metadata. Because both exist only as results, they may hold state: they contain the list used in the loop and the index of the current element in that list.

@Override
public TemplateObject evaluateToObject(ProcessContext context) {
  Object object = sequence.get(index);
  if (object instanceof TemplateObject templateObject) {
    return templateObject;
  }
  return context.mapObject(object);
}

public void increment() {
  index++;
}

The evaluateToObject method takes the element at the current position from the list and, if required, maps it to its internal representation as a TemplateObject. If the loop variable is used several times within the loop body, this lookup and mapping is repeated on every use, although the object cannot have changed in the meantime.

As both implementations may have a state and actually already use it, performance can be improved with a little trick.

@Override
public TemplateObject evaluateToObject(ProcessContext context) {
  if (current != null) {
    return current;
  }
  Object object = sequence.get(index);
  if (object instanceof TemplateObject templateObject) {
    current = templateObject;
    return templateObject;
  }
  current = context.mapObject(object);
  return current;
}

public void increment() {
  index++;
  current = null;
}

In addition to the list and the index, the current element is now also stored as a TemplateObject. If evaluateToObject is called and the element has already been determined, it is returned directly. Otherwise it is computed as before. When the increment method advances the index, the stored value current is cleared so that it is recomputed for the next element when needed.
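The effect of the cache can be made visible by counting the mapping calls. The following is a simplified sketch, not the FreshMarker implementation: CountingMapper is a hypothetical stand-in for context.mapObject, and CachingLooperSketch reduces the looper to the list, the index and the cached value.

```java
import java.util.List;

// Hypothetical stand-in for context.mapObject that counts how often it is invoked.
final class CountingMapper {
    private int calls;

    String map(Object value) {
        calls++;
        return "mapped:" + value;
    }

    int calls() {
        return calls;
    }
}

// Simplified looper: list, index and the cached value for the current index.
final class CachingLooperSketch {
    private final List<?> sequence;
    private final CountingMapper mapper;
    private int index;
    private String current;

    CachingLooperSketch(List<?> sequence, CountingMapper mapper) {
        this.sequence = sequence;
        this.mapper = mapper;
    }

    String evaluate() {
        // Map the element only on the first use within an iteration.
        // A mapper that returned null would be called again on every use.
        if (current == null) {
            current = mapper.map(sequence.get(index));
        }
        return current;
    }

    void increment() {
        index++;
        current = null; // the next element has not been mapped yet
    }
}

class CachingLooperDemo {
    // Simulates a loop body that uses the loop variable several times per element.
    static int mappingsFor(List<?> items, int usesPerItem) {
        CountingMapper mapper = new CountingMapper();
        CachingLooperSketch looper = new CachingLooperSketch(items, mapper);
        for (int i = 0; i < items.size(); i++) {
            for (int use = 0; use < usesPerItem; use++) {
                looper.evaluate();
            }
            looper.increment();
        }
        return mapper.calls();
    }

    public static void main(String[] args) {
        System.out.println(mappingsFor(List.of("a", "b", "c"), 4)); // prints 3
    }
}
```

With three elements and four uses per element, the mapper runs three times instead of twelve. The saving grows with the number of times the loop variable appears in the loop body.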

This rather inconspicuous change has a major impact on the performance of FreshMarker. While it previously ranked just ahead of Mustache and FreeMarker, it now moves ahead of Velocity and Trimou in the benchmark used.

Benchmark               Mode   Cnt      Score     Error  Units
Freemarker.benchmark    thrpt   50  31451,022 ± 318,743  ops/s
FreshMarker.benchmark   thrpt   50  51253,174 ± 610,095  ops/s
Handlebars.benchmark    thrpt   50  32204,790 ± 432,129  ops/s
Mustache.benchmark      thrpt   50  32204,285 ± 223,241  ops/s
Thymeleaf.benchmark     thrpt   50   3078,860 ±  35,203  ops/s
Velocity.benchmark      thrpt   50  36140,685 ± 449,326  ops/s
Pebble.benchmark        thrpt   50  54861,944 ± 544,948  ops/s
Trimou.benchmark        thrpt   50  48192,131 ± 787,200  ops/s

On the one hand, this is a great success in improving performance. On the other hand, it also shows that a stricter separation between results and expression tree would have brought a significant performance gain much earlier.
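What such a separation could look like is sketched below, purely as an illustration and not as FreshMarker's actual or planned design. All names (ResultValue, NumberValue, ExpressionNode, VariableExpression, SignExpression) are hypothetical, and a plain Map stands in for the ProcessContext.

```java
import java.util.Map;

// Hypothetical result type: a value produced by an evaluation. It has no evaluate method.
interface ResultValue {}

record NumberValue(int value) implements ResultValue {
    NumberValue negate() {
        return new NumberValue(-value);
    }
}

// Hypothetical node type: an immutable part of the parsed expression.
interface ExpressionNode {
    ResultValue evaluate(Map<String, Integer> model);
}

record VariableExpression(String name) implements ExpressionNode {
    @Override
    public ResultValue evaluate(Map<String, Integer> model) {
        return new NumberValue(model.get(name));
    }
}

record SignExpression(ExpressionNode expression) implements ExpressionNode {
    @Override
    public ResultValue evaluate(Map<String, Integer> model) {
        // Only numbers can be negated in this sketch.
        if (expression.evaluate(model) instanceof NumberValue number) {
            return number.negate();
        }
        throw new IllegalArgumentException("operand is not a number");
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        ExpressionNode root = new SignExpression(new VariableExpression("x"));
        System.out.println(root.evaluate(Map.of("x", 7))); // prints NumberValue[value=-7]
    }
}
```

In this sketch a result can never be evaluated again, so the compiler enforces that only nodes are stateless and only results may carry call-specific data.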
