FreshMarker Performance (1)

“If you optimize everything, you will always be unhappy.”

Donald Knuth

FreshMarker started as an academic project. The main aim was to show how a template engine can be created. At some point, however, the question arose as to how fast the template engine actually is. To be honest, it has been slow so far. But first things first.

FreshMarker is normally used to generate source code or e-mails, areas where high throughput is often not required. After stumbling across a template engine benchmark again, I adapted it for FreshMarker out of pure curiosity.

The benchmark checks how often an HTML page can be generated from a template. The time for creating the template is not measured. Most of the template is static text. The dynamic part of the page consists of a list of elements. The existing FreeMarker template could be converted to FreshMarker very quickly.

<#list items as item with looper>
<tr class="${looper?item_parity}">
  <td>${looper?counter}</td>
  <td><a href="/stocks/${item.symbol}">${item.symbol}</a></td>
  <td><a href="${item.url}">${item.name}</a></td>
  <td><strong>${item.price}</strong></td><#if (item.change < 0.0)>
  <td class="minus">${item.change}</td>
  <td class="minus">${item.ratio}</td><#else>
  <td>${item.change}</td>
  <td>${item.ratio}</td></#if>
</tr>
</#list>

The only difference from FreeMarker is the loop variable looper with its Built-Ins item_parity and counter.
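To make the two Built-Ins concrete, here is a minimal sketch in plain Java. It assumes that counter and item_parity behave like their FreeMarker namesakes (a one-based position, and "odd" or "even" derived from it). The class LooperSketch and its methods are hypothetical and not part of FreshMarker.

```java
import java.util.List;

/** Hypothetical stand-in for a loop variable; not FreshMarker's actual class. */
final class LooperSketch {
  private final int index; // zero-based position in the list

  LooperSketch(int index) {
    this.index = index;
  }

  /** One-based position, as the counter Built-In is assumed to report it. */
  int counter() {
    return index + 1;
  }

  /** "odd" or "even", derived from the one-based counter. */
  String itemParity() {
    return counter() % 2 == 1 ? "odd" : "even";
  }

  /** Renders one table row per item, mirroring the first lines of the template. */
  static String render(List<String> items) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < items.size(); i++) {
      LooperSketch looper = new LooperSketch(i);
      out.append("<tr class=\"").append(looper.itemParity()).append("\">")
         .append("<td>").append(looper.counter()).append("</td>")
         .append("<td>").append(items.get(i)).append("</td></tr>\n");
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.print(render(List.of("ADBE", "AMD")));
  }
}
```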

After the benchmark is started, we get a result of approximately 4500 pages per second, a figure that looks good given the application scenario and the lack of attention paid to performance so far.

Benchmark               Mode  Cnt     Score     Error  Units
FreshMarker.benchmark  thrpt   50  4468,465 ± 265,536  ops/s

If the benchmark is run for FreeMarker, that template engine shines with around 15600 pages per second.

Benchmark              Mode  Cnt      Score     Error  Units
Freemarker.benchmark  thrpt   50  15598,189 ± 668,014  ops/s

With such a big difference, it is not difficult to decide to look into performance after all.

However, optimizations should never be based on assumptions, which is why an analysis of the runtime behaviour is necessary. A useful tool here is the VisualVM Profiler. It shows how much time was spent in which methods and how often methods were called.

The results for the benchmark provided some interesting insights that were used to improve the performance of FreshMarker.

A first interesting point is the high cost of the NumberFormatter used to display numbers in FreshMarker. Since there is an easy way to avoid the NumberFormatter in the template, this route was chosen for the time being.

<#list items as item with looper>
<tr class="${looper?item_parity}">
  <td>${looper?counter?c}</td>
  <td><a href="/stocks/${item.symbol}">${item.symbol}</a></td>
  <td><a href="${item.url}">${item.name}</a></td>
  <td><strong>${item.price?c}</strong></td><#if (item.change < 0.0)>
  <td class="minus">${item.change?c}</td>
  <td class="minus">${item.ratio?c}</td><#else>
  <td>${item.change?c}</td>
  <td>${item.ratio?c}</td></#if>
</tr>
</#list>

The Built-In c (computer language) generates a simple representation for various data types. The corresponding toString method is used for numeric types.
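The difference between the two output paths can be illustrated with the JDK alone. The following sketch uses java.text.NumberFormat as a stand-in for a locale-aware formatter. This is an assumption, since the article does not show how FreshMarker's NumberFormatter works internally. The class NumberOutputSketch is hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberOutputSketch {
  /** Locale-aware output: grouping and decimal separators depend on the locale. */
  static String formatted(double value, Locale locale) {
    return NumberFormat.getInstance(locale).format(value);
  }

  /** Plain output via toString, comparable to what the c Built-In is described as doing. */
  static String computer(double value) {
    return Double.toString(value);
  }

  public static void main(String[] args) {
    System.out.println(formatted(1234.5, Locale.US));      // 1,234.5
    System.out.println(formatted(1234.5, Locale.GERMANY)); // 1.234,5
    System.out.println(computer(1234.5));                  // 1234.5
  }
}
```

The plain variant does less work per number, but it also gives up locale-specific output, which is why it is only a shortcut for the benchmark.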

This change to the benchmark template is directly noticeable: about 5100 pages can now be generated per second. However, this only simplifies the benchmark. As soon as numbers need special formatting, the poor performance of the NumberFormatter takes effect again.

Benchmark               Mode  Cnt     Score     Error  Units
FreshMarker.benchmark  thrpt   50  5148,130 ± 302,675  ops/s

The profiler also provides further figures that show potential for improvement. The template in the benchmark contains an extremely large number of ConstantFragment instances because texts and spaces are pulled apart during parsing. In addition, the whitespace handling described in the article Whitespace Handling in FreshMarker ensures that runs of whitespace are also split up if necessary.

Instead of processing many consecutive ConstantFragment instances, you could replace them with a single one.

@Override
public List<Fragment> visit(Token ftl, List<Fragment> input) {
  if (!input.isEmpty()) {
    Fragment fragment = input.getLast();
    if (fragment instanceof ConstantFragment constantFragment) {
      constantFragment.add(ftl.toString());
      return input;
    }
  }
  String image = ftl.toString();
  if (ftl.getType() == TokenType.PRINTABLE_CHARS) {
    input.add(new ConstantFragment(image));
  } else if (ftl.getType() == TokenType.WHITESPACE) {
    if (" ".equals(image)) {
      input.add(ONE_WHITESPACE);
    } else {
      input.add(new ConstantFragment(image));
    }
  }
  return input;
}

@Override
public List<Fragment> visit(Text ftl, List<Fragment> input) {
  String content = ftl.getAllTokens(false).stream().map(TerminalNode::toString).collect(Collectors.joining());
  if (!input.isEmpty()) {
    Fragment fragment = input.getLast();
    if (fragment instanceof ConstantFragment constantFragment) {
      constantFragment.add(content);
      return input;
    }
  }
  input.add(new ConstantFragment(content));
  return input;
}

The two visit methods have been extended to concatenate constant texts. This reduces the number of ConstantFragment instances and raises throughput to about 6500 pages per second.

Benchmark               Mode  Cnt     Score     Error  Units
FreshMarker.benchmark  thrpt   50  6519,047 ± 154,409  ops/s
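The effect of the merge can be reproduced in isolation. The following is a minimal sketch with hypothetical types. Only the name ConstantFragment echoes the article, and the Interpolation record and the addConstant helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FragmentMergeSketch {
  /** Hypothetical fragment types, not FreshMarker's real classes. */
  interface Fragment {}

  static final class ConstantFragment implements Fragment {
    private final StringBuilder text;

    ConstantFragment(String text) {
      this.text = new StringBuilder(text);
    }

    void add(String more) {
      text.append(more);
    }

    String text() {
      return text.toString();
    }
  }

  record Interpolation(String expression) implements Fragment {}

  /** Appends constant text, extending the previous fragment if it is constant too. */
  static void addConstant(List<Fragment> fragments, String text) {
    if (!fragments.isEmpty()
        && fragments.get(fragments.size() - 1) instanceof ConstantFragment last) {
      last.add(text);
      return;
    }
    fragments.add(new ConstantFragment(text));
  }

  /** Builds the fragments for one table cell from six parsed pieces. */
  static List<Fragment> build() {
    List<Fragment> fragments = new ArrayList<>();
    addConstant(fragments, "<td>");
    addConstant(fragments, " ");
    addConstant(fragments, "<strong>");
    fragments.add(new Interpolation("item.price"));
    addConstant(fragments, "</strong>");
    addConstant(fragments, "</td>");
    return fragments;
  }

  public static void main(String[] args) {
    System.out.println(build().size()); // 3 fragments instead of 6
  }
}
```

Fewer fragments mean fewer virtual calls and fewer small writes per rendered page, which is where the gain in the benchmark presumably comes from.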

The profiler also shows that processing lists with streams is not a good idea for performance. After the streams were replaced by enhanced for loops, throughput jumped to around 10800 pages per second.

Benchmark               Mode  Cnt      Score     Error  Units
FreshMarker.benchmark  thrpt   50  10797,210 ± 146,141  ops/s
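The kind of rewrite meant here can be sketched as follows. The class JoinSketch and both methods are hypothetical, and the article does not show the actual FreshMarker code that was changed. Both variants return the same result. Whether the loop is faster in a given situation depends on the workload and the JIT, so it should be measured rather than assumed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class JoinSketch {
  /** Stream variant: concise, but builds a pipeline and a collector on every call. */
  static String joinWithStream(List<String> parts) {
    return parts.stream().map(String::trim).collect(Collectors.joining());
  }

  /** Loop variant: same result with a single StringBuilder. */
  static String joinWithLoop(List<String> parts) {
    StringBuilder out = new StringBuilder();
    for (String part : parts) {
      out.append(part.trim());
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<String> parts = List.of(" a", "b ", " c ");
    System.out.println(joinWithStream(parts)); // abc
    System.out.println(joinWithLoop(parts));   // abc
  }
}
```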

Security always takes time, and FreshMarker is no exception. When accessing classes that are to be interpreted as TemplateBean, it is checked whether they are in lists for permitted or prohibited packages or classes. This check is time-consuming and uses the startsWith method of the String class.

@Override
public TemplateObject provide(BaseEnvironment environment, Object o) {
  Class<?> type = o.getClass();
  // Run the expensive package and class checks only once per type.
  if (!environment.getChecks().contains(type)) {
    modelSecurityGateway.check(type);
    // Remember the type so that later calls can skip the check.
    environment.getChecks().add(type);
  }
  return new TemplateBean(beanProvider.provide(o, environment), type);
}

The improved version of the check now first tests whether the class is already contained in a set of classes that have passed the check before. In this case, all further checks can be omitted.

Another improvement here is that the BaseEnvironment is accessed directly. In previous versions, many environment accesses were passed through the entire stack to Environment instances.

This change is also directly noticeable: about 12600 pages per second can now be produced.

Benchmark               Mode  Cnt      Score    Error  Units
FreshMarker.benchmark  thrpt   50  12573,562 ± 49,704  ops/s
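The caching of the security check can be sketched independently of FreshMarker. The class ClassGateSketch, its prefix list and the prefixScans counter are hypothetical. The real check also handles permitted packages and classes, which this sketch leaves out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ClassGateSketch {
  private final List<String> forbiddenPrefixes;
  private final Set<Class<?>> checked = new HashSet<>(); // not thread-safe
  int prefixScans = 0; // counts slow-path runs, for demonstration only

  ClassGateSketch(List<String> forbiddenPrefixes) {
    this.forbiddenPrefixes = forbiddenPrefixes;
  }

  /** Throws if the class is forbidden; classes that passed once skip the scan. */
  void check(Class<?> type) {
    if (checked.contains(type)) {
      return; // fast path: a single hash lookup
    }
    prefixScans++;
    String name = type.getName();
    for (String prefix : forbiddenPrefixes) {
      if (name.startsWith(prefix)) {
        throw new SecurityException("forbidden type: " + name);
      }
    }
    checked.add(type); // only classes that passed are remembered
  }

  public static void main(String[] args) {
    ClassGateSketch gate = new ClassGateSketch(List.of("java.lang.reflect."));
    gate.check(String.class);
    gate.check(String.class);
    System.out.println(gate.prefixScans); // 1
  }
}
```

Because a rejected class is never added to the set, it is scanned and rejected again on every access, so the cache does not weaken the check.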

As a final change, the evaluated attributes of the TemplateBean instances are saved internally.

public TemplateObject get(ProcessContext context, String name) {
    TemplateObject templateObject = mapped.get(name);
    if (templateObject != null) {
        return templateObject;
    }
    Object object = map.get(name);
    if (object == null) {
        mapped.put(name, TemplateNull.NULL);
        return TemplateNull.NULL;
    }
    if (object instanceof TemplateObject t) {
        mapped.put(name, t);
        return t;
    }
    TemplateObject result = context.getBaseEnvironment().mapObject(object);
    mapped.put(name, result);
    return result;
}

An environment exists for all top-level variables to store the encapsulated model variables. However, attributes of a bean are evaluated and encapsulated again each time they are accessed. An additional Map in the TemplateBean now provides a remedy.
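The pattern behind this method is a per-bean cache with a sentinel for missing values, so that even the answer "this attribute is null" is only computed once. The following is a minimal sketch with hypothetical names. AttributeCacheSketch, its NULL sentinel and the wrapper function stand in for TemplateBean, TemplateNull.NULL and mapObject.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class AttributeCacheSketch {
  /** Sentinel so that an absent attribute can be cached like any other result. */
  static final Object NULL = new Object();

  private final Map<String, ?> raw;
  private final Map<String, Object> wrapped = new HashMap<>();
  private final Function<Object, Object> wrapper;
  int wrapCalls = 0; // counts conversions, for demonstration only

  AttributeCacheSketch(Map<String, ?> raw, Function<Object, Object> wrapper) {
    this.raw = raw;
    this.wrapper = wrapper;
  }

  /** Returns the wrapped attribute and converts it on the first access only. */
  Object get(String name) {
    Object cached = wrapped.get(name);
    if (cached != null) {
      return cached;
    }
    Object value = raw.get(name);
    Object result;
    if (value == null) {
      result = NULL;
    } else {
      wrapCalls++;
      result = wrapper.apply(value);
    }
    wrapped.put(name, result);
    return result;
  }

  public static void main(String[] args) {
    AttributeCacheSketch bean =
        new AttributeCacheSketch(Map.of("price", 12.5), v -> "wrapped:" + v);
    bean.get("price");
    bean.get("price");
    System.out.println(bean.wrapCalls); // 1
  }
}
```

Such a cache assumes that the underlying bean does not change while the template is processed. Otherwise stale values would be returned.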

This optimization and a few other small adjustments now ensure that around 14700 pages per second can be generated. That is over 10000 pages per second more than at the beginning of this article.

Benchmark               Mode  Cnt      Score    Error  Units
FreshMarker.benchmark  thrpt   50  14722,349 ± 59,797  ops/s

In the next article on FreshMarker performance, we will see how Built-Ins and the TemplateBean become faster without reflection and how this can work at all.
