“Caches aren’t architecture, they’re just optimization!”
Rob Pike
In addition to the expressiveness of the template language, the number of supported data types and functions, and its extensibility, performance is a decisive criterion for a template engine. FreshMarker only evaluates the template output at runtime, so although the template engine is very flexible, it cannot compete with engines that evaluate templates at compile time. For historical reasons, FreshMarker also has no upfront type information about the data used, so some optimization options are not available.
Software achieves good performance through two strategies. The first is a lean algorithm that optimally describes the software’s use case; the wrong algorithm for a task usually results in extremely poor performance. The second is to optimize calculations that are either very slow or are repeated extremely often. Unfortunately, these two strategies rarely complement each other, because the path of a clean algorithm has to be abandoned for optimizations. I already learned this lesson in the article FreshMarker Performance, because many things that simplify the algorithm drag down performance.
At irregular intervals, I run a benchmark that compares FreshMarker with some other template engines. The benchmark uses the Java Microbenchmark Harness (JMH) to measure the throughput of a specific template for a number of template engines.
Of course, the result of this test is only a rough approximation of the actual performance of a template engine. The actual performance depends heavily on the use case, the available memory, the additional frameworks used, the size of the templates and various other factors. Use cases can also be constructed in such a way that some template engines cannot serve them at all.
Some results of an example test run with JMH are shown below.
Benchmark                 Mode  Cnt      Score      Error  Units
Freemarker.benchmark     thrpt   50  29473,704 ±  233,323  ops/s
FreshMarker.benchmark    thrpt   50  31611,732 ±  631,232  ops/s
FreshMarker2.benchmark   thrpt   50  86514,958 ± 1064,813  ops/s
FreshMarker3.benchmark   thrpt   50  34939,549 ±  449,473  ops/s
The figures should only be considered in relation to each other, because different hardware naturally delivers completely different absolute figures.
The Score values for Freemarker are slightly below those for FreshMarker, which means that the main goal of the performance optimizations for FreshMarker has been achieved for the moment. It is only a snapshot, however, as new functionality in FreshMarker always has an impact (good or bad) on performance.
The other two lines, labeled FreshMarker2 and FreshMarker3, provide two more interesting figures. FreshMarker2 is a compiled version of the test case and FreshMarker3 is a new experimental optimization.
A compiled version of the template has a much higher throughput than an interpreted version. Unfortunately, it will be quite some time before the FreshMarker compiler can compile all templates.
The FreshMarker and FreshMarker3 tests both contain optimizations for the next FreshMarker version 2.2.0. For these optimizations, the question was: how can I give FreshMarker more information about the data model? There were already some ideas for this. For example, the interpolations could be annotated with type information, or information could be added to the data model in the form of a schema. Both variants have the disadvantage that they interfere deeply with the way the template engine works and require major modifications.
During these explorations, there were a few small but surprising discoveries that led to minor improvements. The first discovery was that very little information about the variables had been used up to that point. Previously, when a variable name appeared in an expression, an instance of the following class was inserted into the expression tree.
public record TemplateVariable(String name) implements TemplateExpression {

    @Override
    public TemplateObject evaluateToObject(ProcessContext context) {
        TemplateObject value = context.getEnvironment().getValue(name);
        if (value instanceof TemplateLoopVariable loopVariable) {
            return loopVariable.evaluateToObject(context);
        }
        return value;
    }
}
The TemplateVariable retrieves the value of the variable name from the environment and returns it. Unfortunately, each time it checks whether value is a TemplateLoopVariable in order to evaluate it if necessary. The TemplateLoopVariable holds the current values of a loop pass.
<#list bean as key, value> ${key} ↣ ${value} </#list>
In this example, TemplateLoopVariable instances contain the values for key and value. The special treatment of the TemplateLoopVariable can therefore be omitted from the evaluation if we first check whether a variable is a list variable or another type of variable. The type of a variable can easily be determined by creating a dictionary that stores the name of the variable and its type. A variable is entered into the dictionary when the parse tree evaluation passes the location of its definition, and its type is retrieved at the location of its use. For our loop variables, the location of their definition is visited in the following visit method.
public List<Fragment> visit(ListInstruction ftl, List<Fragment> input) {
    dictionary.push();
    TemplateObject list = ftl.get(3).accept(interpolationBuilder, null);
    String identifier = ftl.get(5).toString();
    dictionary.putVariable(identifier, VariableType.KEY);
    int index = 6;
    Comparator<String> comparator = null;
    if (ftl.get(index).getType() == TokenType.SORTED) {
        comparator = COMPARATORS.get(ftl.get(index + 1).getType());
        index += 2;
    }
    String valueIdentifier = null;
    if (ftl.get(index).getType() == TokenType.COMMA) {
        valueIdentifier = ftl.get(index + 1).toString();
        dictionary.putVariable(valueIdentifier, VariableType.VALUE);
        index += 2;
    }
    // ...
The List Directive is analyzed here and various pieces of data are extracted. In the sequence of their occurrence, these are: the TemplateObject list, which contains the data of the list and can be a variable, a constant or an expression of arbitrary complexity; the String identifier, which contains the name of the loop variable for lists or the key for hashes; an optional Comparator for the hash key; and, optionally after a comma, the String valueIdentifier for the hash value. In the dictionary, identifier is entered with VariableType.KEY and valueIdentifier with VariableType.VALUE. Similar things happen in other places for other variables.
At the beginning of the method, you probably noticed the call dictionary.push(). The dictionary must of course respect the scopes of the variables, so a new definition area is opened in the dictionary at the start of processing the List Directive. Otherwise, a loop variable would overwrite an outer variable with the same name.
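The scope handling can be sketched as a small stack of maps. This is only an illustration of the idea, not FreshMarker's actual implementation; the class name and the String stand-in for VariableType are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a scoped dictionary: push() opens a new scope,
// pop() discards it, and lookups walk from the innermost scope outwards.
class VariableDictionary {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    VariableDictionary() {
        scopes.push(new HashMap<>()); // global scope
    }

    void push() { scopes.push(new HashMap<>()); }

    void pop() { scopes.pop(); }

    void putVariable(String name, String type) {
        scopes.peek().put(name, type);
    }

    String getVariable(String name) {
        for (Map<String, String> scope : scopes) { // innermost first
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return "DEFAULT"; // unknown variables fall back to a default type
    }
}
```

With this structure, a loop variable defined after push() shadows an outer variable of the same name only until the matching pop().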
The dictionary is evaluated in the InterpolationBuilder, a visitor that generates the necessary TemplateObject instances in the expression trees from the parse tree of the expressions.
case IDENTIFIER -> {
    VariableType type = dictionary.getVariable(image);
    yield switch (type) {
        case MODEL -> new ModelVariable(image);
        case LOOPER -> new LooperVariable(image);
        case KEY, VALUE -> new KeyValueLoopVariable(image);
        default -> new DefaultTemplateVariable(image);
    };
}
When evaluating tokens of type IDENTIFIER, an instance of TemplateVariable was previously created. Now a specialized variant is created based on the type in the dictionary: a KeyValueLoopVariable instance for KEY and VALUE, and a ModelVariable instance for MODEL. Neither requires an instanceof check any longer.
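The effect of the specialization can be illustrated with a minimal sketch. The environment is simplified to a plain Map and the class bodies are assumptions; only the names mirror the article.

```java
import java.util.Map;

// Simplified expression interface for the sketch.
interface TemplateExpression {
    Object evaluateToObject(Map<String, Object> env);
}

// Stand-in for the article's TemplateLoopVariable: holds the current loop value.
interface LoopValue {
    Object current();
}

// A model variable is known never to be a loop variable, so the lookup
// result is returned directly, without an instanceof check.
record ModelVariable(String name) implements TemplateExpression {
    @Override
    public Object evaluateToObject(Map<String, Object> env) {
        return env.get(name);
    }
}

// A key/value loop variable is known to always be a loop variable, so it
// is unwrapped unconditionally, again without an instanceof check.
record KeyValueLoopVariable(String name) implements TemplateExpression {
    @Override
    public Object evaluateToObject(Map<String, Object> env) {
        return ((LoopValue) env.get(name)).current();
    }
}
```

The per-evaluation type test from the original TemplateVariable has moved to parse time, where it runs only once per occurrence of the variable.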
Another rather inconspicuous optimization resulted from the fact that the TemplateBuiltIn instances have to search for their implementation each time they are called. This is because the type of a variable only becomes known during evaluation. Since many Built-Ins have namesakes on different types, it is necessary to check which type is present before calling the Built-In. This problem cannot be solved in general at the moment, but it can be avoided for some Built-Ins. If there is only one Built-In with a particular name, e.g. snake_case, then the Built-In in question is already known. So why use a TemplateBuiltIn instance that searches for its Built-In, when a specialized HookedBuiltIn that already knows its Built-In will do?
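The idea can be sketched as follows. The registry, the Function-based signature and the snake_case implementation are all assumptions made for the illustration; only the HookedBuiltIn name comes from the article.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of built-ins whose name is unique across all types.
class BuiltIns {
    static final Map<String, Function<String, String>> UNIQUE = Map.of(
        "snake_case", s -> s.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase()
    );
}

record HookedBuiltIn(Function<String, String> impl) {

    // The implementation is bound once, when the template is parsed ...
    static HookedBuiltIn forName(String name) {
        return new HookedBuiltIn(BuiltIns.UNIQUE.get(name));
    }

    // ... so applying it involves no search at evaluation time.
    String apply(String value) {
        return impl.apply(value);
    }
}
```

For ambiguous names the engine still has to dispatch on the runtime type, but every unambiguous Built-In skips the search entirely.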
The optimization only provides small improvements, but it has opened an interesting door. When I have run a template against a data model, I know the types of the model variables that were used. If I could remember some of this information, subsequent calls would benefit.
To test this, a new method hook was added to the Template class. Similar to the reduce method, it returns a new Template instance.
public Template hook(Map<String, Object> dataModel) {
    HashMap<Class<?>, TemplateObjectProvider> currentMap = new HashMap<>(templateObjectProviderMap);
    ProcessContext processContext = createContext(context, dataModel, new StringWriter(),
        userDirectives, featureSet, currentMap, localContext);
    processContext.setResourceBundle(resourceBundleName);
    try {
        rootFragment.process(processContext);
    } catch (TemplateReturnException e) {
        log.debug("return exception: {}", e.getMessage());
    }
    return new Template(context, path, rootFragment, featureSet, localContext, currentMap);
}
This call adds a map currentMap to the ProcessContext, which stores, for each data type used in the model, which TemplateObjectProvider is used to generate an internal representation for it. This mechanism used to be quite expensive, because a series of TemplateObjectProvider instances had to be checked for each object until the BeanTemplateObjectProvider finally took over the work or reported an error. In addition, it cannot simply be optimized away, because FreshMarker also has an extension point there.
private TemplateObject wrapByProviders(Object o, Object current) {
    for (TemplateObjectProvider provider : providers) {
        TemplateObject object = provider.provide(this, current);
        if (object != null) {
            return object;
        }
    }
    throw new UnsupportedDataTypeException("unsupported data type: " + o.getClass());
}
At this point, the list of all TemplateObjectProviders is run through; as soon as one of them can process the given value, the loop terminates. This mechanism follows a simple contract: a provider that processes one instance of a type processes all instances of that type. So we can cache the TemplateObjectProvider for a type once it has processed an object of that type. Instead of going through the list of TemplateObjectProviders for this type again and again, we can take the TemplateObjectProvider from the map.
private TemplateObject wrapByProviders(Object o, Object current) {
    TemplateObjectProvider cached = templateObjectProviderMap.get(o.getClass());
    if (cached != null) {
        return cached.provide(this, current);
    }
    for (TemplateObjectProvider provider : providers) {
        TemplateObject object = provider.provide(this, current);
        if (object != null) {
            templateObjectProviderMap.put(o.getClass(), provider);
            return object;
        }
    }
    throw new UnsupportedDataTypeException("unsupported data type: " + o.getClass());
}
We fill the map that is used here to cache the TemplateObjectProviders in the hook method and pass it to the new Template. In the best case, this Template never has to search for a TemplateObjectProvider again, but has all of them in its cache. One disadvantage, however, is that this Template is now bound to the stored TemplateObjectProviders. A variable can no longer refer to a bean on one call and to a map or a record on another; such cases now all end with an error.
The hook method is an experimental feature that opens up some more interesting possibilities. If the types of the variables are known, even more simplifications can be made in the expressions.