In Search Of Performance

❝Caches aren’t architecture, they’re just optimization!❞

Rob Pike

In addition to the expressiveness of the template language, the number of supported data types and functions, and its extensibility, performance is a decisive criterion for a template engine. FreshMarker evaluates templates only at runtime; this makes the engine very flexible, but it cannot compete with engines that evaluate templates at compile time. For historical reasons, FreshMarker also has no upfront type information about the data it processes, so some optimization options are unavailable.

Software achieves good performance through two strategies. The first is a lean algorithm that optimally describes the software’s use case; the wrong algorithm for a task usually results in extremely poor performance. The second is to optimize calculations that are either very slow or repeated extremely often. Unfortunately, these two strategies rarely complement each other, because optimizations often force you to abandon the path of a clean algorithm. I already learned this lesson in the article FreshMarker Performance: many things that simplify the algorithm drag down performance.

At irregular intervals, I run a benchmark that compares FreshMarker with some other template engines. The test uses the Java Microbenchmark Harness (JMH) to measure the throughput of a specific template for a number of template engines.

Of course, the result of this test is only a rough approximation of the actual performance of a template engine. The actual performance depends heavily on the use case, the available memory, the additional frameworks used, the size of the templates, and various other factors. Use cases can also be constructed in such a way that some template engines cannot serve them at all.

Some results of an example test run with JMH are shown below.

Benchmark               Mode    Cnt Score     Error       Units
Freemarker.benchmark    thrpt   50  29473,704 ±  233,323  ops/s
FreshMarker.benchmark   thrpt   50  31611,732 ±  631,232  ops/s
FreshMarker2.benchmark  thrpt   50  86514,958 ± 1064,813  ops/s
FreshMarker3.benchmark  thrpt   50  34939,549 ±  449,473  ops/s

The figures should only be considered in relation to each other, because different hardware naturally delivers completely different absolute figures.

The Score values for Freemarker are slightly below those for FreshMarker, which means that the main goal of the performance optimizations for FreshMarker has been achieved for the moment. However, this is only a snapshot, as new functionality in FreshMarker always has an impact (good or bad) on performance.

The other two lines, labeled FreshMarker2 and FreshMarker3, contain two more interesting figures. FreshMarker2 is a compiled version of the test case, and FreshMarker3 is a new experimental optimization.

A compiled version of the template has a much higher throughput than an interpreted version. Unfortunately, it will be quite some time before the FreshMarker compiler can compile all templates.

The FreshMarker and FreshMarker3 tests both contain optimizations for the next FreshMarker version 2.2.0. For these optimizations, the question was: how can I give FreshMarker more information about the data model? There were already some ideas, for example providing the interpolations with type information, or adding information to the data model in the form of a schema. Both variants have the disadvantage that they interfere deeply with the way the template engine works and require major modifications.

During these explorations, there were a few minor surprising discoveries that led to small improvements. The first discovery was that little information about the variables had been used up to that point. Previously, when a variable name appeared in an expression, an instance of the following class was inserted into the expression tree.

public record TemplateVariable(String name) implements TemplateExpression {

  @Override
  public TemplateObject evaluateToObject(ProcessContext context) {
    TemplateObject value = context.getEnvironment().getValue(name);
    if (value instanceof TemplateLoopVariable loopVariable) {
      return loopVariable.evaluateToObject(context);
    }
    return value;
  }
}

The TemplateVariable retrieves the value for the variable name from the environment and returns it. Unfortunately, on every evaluation it must check whether the value is a TemplateLoopVariable, which then has to be evaluated itself. A TemplateLoopVariable holds the current values of a loop pass.

<#list bean as key, value>
${key} ↣ ${value}
</#list>

In this example, TemplateLoopVariable instances hold the values for key and value. The special treatment for the TemplateLoopVariable can therefore be omitted from the evaluation if we first check whether a variable is a list variable or another type of variable. The type of a variable can easily be determined by maintaining a dictionary that maps the name of a variable to its type. A variable is entered into the dictionary when the parse-tree evaluation passes the location of its definition, and its type is retrieved at the location of its use. For our loop variables, the location of their definition is visited in the following visit method.

public List<Fragment> visit(ListInstruction ftl, List<Fragment> input) {
  dictionary.push();
  TemplateObject list = ftl.get(3).accept(interpolationBuilder, null);
  String identifier = ftl.get(5).toString();
  dictionary.putVariable(identifier, VariableType.KEY);
  int index = 6;
  Comparator<String> comparator = null;
  if (ftl.get(index).getType() == TokenType.SORTED) {
    comparator = COMPARATORS.get(ftl.get(index + 1).getType());
    index += 2;
  }
  String valueIdentifier = null;
  if (ftl.get(index).getType() == TokenType.COMMA) {
    valueIdentifier = ftl.get(index + 1).toString();
    dictionary.putVariable(valueIdentifier, VariableType.VALUE);
    index += 2;
  }
  // ...

The List Directive is analyzed here and various pieces of data are extracted, in order of their occurrence. First comes the TemplateObject list, which holds the data of the list; this can be a variable, a constant, or an expression of arbitrary complexity. It is followed by the String identifier, which contains the name of the loop variable for lists or of the key for hashes. After an optional Comparator for the hash key, the String valueIdentifier for the hash value can optionally follow after a comma.

In the dictionary, identifier is entered with VariableType.KEY and valueIdentifier with VariableType.VALUE. Similar things happen in other places for other variables.

At the beginning of the method, you probably noticed the call dictionary.push(). The dictionary must, of course, respect the scopes of the variables, so a new definition scope is opened in the dictionary at the start of processing the List Directive. Otherwise, a loop variable would overwrite an outer variable with the same name.
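The scoping behavior can be sketched with a simple stack of maps. This is only a minimal sketch; the names Dictionary and VariableType follow the article, but the implementation details are my assumption:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Possible variable categories, as used in the article.
enum VariableType { MODEL, LOOPER, KEY, VALUE, DEFAULT }

// A minimal scoped dictionary: each push() opens a new scope, pop()
// discards it, and lookups search from the innermost scope outward.
class Dictionary {
  private final Deque<Map<String, VariableType>> scopes = new ArrayDeque<>();

  void push() { scopes.push(new HashMap<>()); }

  void pop() { scopes.pop(); }

  void putVariable(String name, VariableType type) {
    scopes.peek().put(name, type);
  }

  VariableType getVariable(String name) {
    // ArrayDeque iterates head-first, i.e. innermost scope first.
    for (Map<String, VariableType> scope : scopes) {
      VariableType type = scope.get(name);
      if (type != null) {
        return type;
      }
    }
    return VariableType.DEFAULT; // unknown names fall back to the default
  }
}
```

With this structure, an inner loop variable shadows an outer variable of the same name only for the lifetime of its scope.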

The dictionary is evaluated in the InterpolationBuilder. The InterpolationBuilder is a visitor that generates the necessary TemplateObject instances in the expression trees from the parse tree of the expressions.

case IDENTIFIER -> {
  VariableType type = dictionary.getVariable(image);
  yield switch (type) {
    case MODEL -> new ModelVariable(image);
    case LOOPER -> new LooperVariable(image);
    case KEY, VALUE -> new KeyValueLoopVariable(image);
    default -> new DefaultTemplateVariable(image);
  };
}

When evaluating tokens of type IDENTIFIER, an instance of TemplateVariable was previously created. Now a specialized variant is created based on the type stored in the dictionary: a KeyValueLoopVariable instance for KEY and VALUE, and a ModelVariable instance for MODEL. Neither requires an instanceof check anymore.
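A specialized variant can then return the environment value directly. The following sketch uses heavily simplified stand-ins for the real FreshMarker interfaces, which are richer than shown here:

```java
// Simplified stand-ins for the FreshMarker types (assumptions for
// illustration only; the real interfaces carry more methods).
interface TemplateObject { }

interface Environment {
  TemplateObject getValue(String name);
}

// A model variable is guaranteed by the dictionary never to be a loop
// variable, so it can skip the instanceof check entirely.
record ModelVariable(String name) {
  TemplateObject evaluateToObject(Environment environment) {
    return environment.getValue(name);
  }
}
```

The dictionary lookup happens once, at tree-building time, instead of a type check on every evaluation.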

Another rather inconspicuous optimization resulted from the fact that TemplateBuiltIn instances have to search for their implementation on every call. This is because the type of a variable only becomes known during evaluation, and since many built-ins have namesakes on different types, it is necessary to check which type is present before calling the built-in. This problem cannot be solved in general at the moment, but it can be avoided for some built-ins. If there is only one built-in with a particular name, e.g. snake_case, then the built-in we are looking for is already known. So why use a TemplateBuiltIn instance that has to search for its built-in, when a specialized HookedBuiltIn that already knows its built-in will do?
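The idea can be sketched as follows. The registry, the lookup method, and the HookedBuiltIn record are modeled freely here as assumptions; the real FreshMarker types differ:

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of the idea: a generic built-in node must look up its
// implementation by name on every call, while a hooked built-in binds
// the single matching implementation once, at tree-building time.
class BuiltIns {
  // Hypothetical registry: built-in name -> implementation.
  static final Map<String, Function<String, String>> REGISTRY = Map.of(
      "snake_case",
      s -> s.replaceAll("([a-z])([A-Z])", "$1_$2").toLowerCase());

  // Generic variant: pays for the map lookup on every call.
  static String applyByLookup(String name, String input) {
    return REGISTRY.get(name).apply(input);
  }
}

// Hooked variant: the implementation was resolved once and is stored.
record HookedBuiltIn(Function<String, String> implementation) {
  String apply(String input) {
    return implementation.apply(input);
  }
}
```

If snake_case is the only built-in with that name, the parser can emit a HookedBuiltIn immediately and the per-call search disappears.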

The optimization only provides small improvements, but it has opened an interesting door. When I have run through the template for a data model, I know the types of model variables that have been used. If I could remember some of this information, it would be beneficial for the performance of subsequent calls.

To test this, a new method hook was added to the Template class. Similar to the reduce method, it returns a new Template instance.

public Template hook(Map<String, Object> dataModel) {
  HashMap<Class<?>, TemplateObjectProvider> currentMap = new HashMap<>(templateObjectProviderMap);
  ProcessContext processContext = createContext(context, dataModel, new StringWriter(), userDirectives, featureSet, currentMap, localContext);
  processContext.setResourceBundle(resourceBundleName);
  try {
    rootFragment.process(processContext);
  } catch (TemplateReturnException e) {
    log.debug("return exception: {}", e.getMessage());
  }
  return new Template(context, path, rootFragment, featureSet, localContext, currentMap);
}

This call adds a map, currentMap, to the ProcessContext, which stores for each data type used in the model the TemplateObjectProvider that generates its internal representation. This mechanism used to be quite expensive, because a series of TemplateObjectProviders had to be checked for each object until the BeanTemplateObjectProvider finally took over the work or an error was reported. It also cannot simply be optimized away, because FreshMarker provides an extension point there.

private TemplateObject wrapByProviders(Object o, Object current) {
  for (TemplateObjectProvider provider : providers) {
    TemplateObject object = provider.provide(this, current);
    if (object != null) {
      return object;
    }
  }
  throw new UnsupportedDataTypeException("unsupported data type: " + o.getClass());
}

At this point, the list of all TemplateObjectProviders is traversed; as soon as one of them can process the passed value, the loop terminates. This mechanism follows a simple contract: if a provider processes one instance of a type, it processes all instances of that type. So we can cache the TemplateObjectProvider for a type once it has processed an object of that type. Instead of walking the list of TemplateObjectProviders for this type again and again, we take the TemplateObjectProvider from the map.

private TemplateObject wrapByProviders(Object o, Object current) {
  TemplateObjectProvider cached = templateObjectProviderMap.get(o.getClass());
  if (cached != null) {
    return cached.provide(this, current);
  }
  for (TemplateObjectProvider provider : providers) {
    TemplateObject object = provider.provide(this, current);
    if (object != null) {
      templateObjectProviderMap.put(o.getClass(), provider);
      return object;
    }
  }
  throw new UnsupportedDataTypeException("unsupported data type: " + o.getClass());
}

The map we use here to cache the TemplateObjectProviders is filled in the hook method and passed to the new Template. In the best case, this Template never has to search for a TemplateObjectProvider again, because all of them are already in its cache. One disadvantage, however, is that this Template is now restricted to the stored TemplateObjectProviders. A variable can no longer refer sometimes to a bean and then to a map or a record; such cases now all end with an error.

The hook method is an experimental feature that offers some more interesting possibilities. If the types of the variables are now known, even more simplifications can be made in the expressions.
