❠Just remember, once you’re over the hill you begin to pick up speed.❞
Charles M. Schulz
This article on performance was never meant to become a multi-part series. But once you look at your own code through performance glasses, you quickly discover further bottlenecks. In this case, the culprit is an unclean piece of modeling that FreshMarker has been carrying around since its early days. This part shows how it affects the performance of the template engine.
Any number of complex expressions can be used in a FreshMarker template. These expressions are realized internally as trees of TemplateObject implementations. Some implementations, such as TemplateNumber and TemplateBoolean, implement data types, while others, such as TemplateJunction and TemplateRelational, realize operations within the expression.
When a Template is used, the expressions are evaluated and the result is either printed or used for decisions. The imprecision in the modeling stems from the fact that all TemplateObject implementations have the following method, among others.
TemplateObject evaluateToObject(ProcessContext context);
When evaluating an expression, this method is called on the root node. Depending on the type of implementation, further evaluateToObject calls are then executed in the tree.
public record TemplateSign(TemplateObject expression) implements TemplateExpression {

    @Override
    public TemplateObject evaluateToObject(ProcessContext context) {
        return expression.evaluateToObject(context).negate();
    }
}
In this example, the TemplateSign class calls the evaluateToObject method of its subnode expression and negates the result with the negate method.
The problem here is that the nodes in the expression tree are of the same type as the results they produce. Yet the static nodes and the dynamic results differ in one important respect: the nodes in the expression tree must not have state. With state, it would no longer be possible to use the same template instance for parallel calls, because information from one call could leak into another. For this reason, call-specific information is passed into the methods via the ProcessContext.
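The stateless-node rule can be illustrated with a minimal sketch. The types Context, Node, Variable and Sign below are hypothetical stand-ins, reduced to integer results, and are not FreshMarker's actual classes. The point is only that a single immutable tree can be evaluated by several calls at once, because everything call-specific lives in the context.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for ProcessContext: holds all call-specific data.
record Context(Map<String, Integer> variables) {}

// Hypothetical stand-in for an expression node, reduced to integer results.
interface Node {
    int evaluate(Context context);
}

// Reads a variable from the context; the node stores only the immutable name.
record Variable(String name) implements Node {
    @Override
    public int evaluate(Context context) {
        return context.variables().get(name);
    }
}

// Negates the result of its subnode, similar in spirit to TemplateSign.
record Sign(Node expression) implements Node {
    @Override
    public int evaluate(Context context) {
        return -expression.evaluate(context);
    }
}

class SharedTreeDemo {
    // One immutable tree, shared by every call.
    static final Node TREE = new Sign(new Variable("x"));

    // Evaluates the shared tree once per input, each call with its own context.
    static List<Integer> evaluateInParallel(List<Integer> inputs) {
        List<CompletableFuture<Integer>> futures = inputs.stream()
            .map(x -> CompletableFuture.supplyAsync(
                () -> TREE.evaluate(new Context(Map.of("x", x)))))
            .toList();
        return futures.stream().map(CompletableFuture::join).toList();
    }

    public static void main(String[] args) {
        System.out.println(evaluateInParallel(List.of(1, 2, 3))); // prints [-1, -2, -3]
    }
}
```

If Variable cached its last value in a mutable field, two concurrent calls could observe each other's data, which is exactly the risk described above.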
Some TemplateObject implementations, such as TemplateJunction, TemplateExists and TemplateSign, exist only in expression trees and therefore must not contain state. Other implementations, such as TemplateNumber, TemplateString and TemplateNull, can appear both in the expression tree and as a result, so whether they may have state depends on how they are used. Last but not least, some implementations exist only as results, such as the TemplateSequenceLooper and the TemplateHashLooper.
The TemplateSequenceLooper and the TemplateHashLooper are two auxiliary objects that can be used to obtain the current value of the loop variable and the loop metadata. Because both implementations may have state, they hold the list used in the loop and the index of the current element in that list.
@Override
public TemplateObject evaluateToObject(ProcessContext context) {
    Object object = sequence.get(index);
    if (object instanceof TemplateObject templateObject) {
        return templateObject;
    }
    return context.mapObject(object);
}

public void increment() {
    index++;
}
The evaluateToObject method takes the element at the current position from the list and, if required, maps it to the internal representation as a TemplateObject. If the loop variable is used several times in the loop body, this lookup and mapping is repeated each time, even though the object cannot have changed.
As both implementations may have a state and actually already use it, performance can be improved with a little trick.
@Override
public TemplateObject evaluateToObject(ProcessContext context) {
    if (current != null) {
        return current;
    }
    Object object = sequence.get(index);
    if (object instanceof TemplateObject templateObject) {
        current = templateObject;
        return templateObject;
    }
    current = context.mapObject(object);
    return current;
}

public void increment() {
    index++;
    current = null;
}
In addition to the list and the index, the current element in the list is now also stored as a TemplateObject. If evaluateToObject is called and the element has already been determined, it is returned directly. Otherwise it is computed as before. When the index is incremented in the increment method, the stored value current is cleared so that it is recomputed on the next loop iteration if necessary.
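The effect of the cache can be made visible with a small self-contained sketch. CachingLooper is a hypothetical, simplified stand-in: it produces strings instead of TemplateObject instances, and the mapper passed to the constructor takes the place of context.mapObject. The mapCalls counter exists only for this demonstration.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified looper; not the FreshMarker implementation.
class CachingLooper {
    private final List<?> sequence;
    private final Function<Object, String> mapper; // stands in for context.mapObject
    private int index;
    private String current; // cached result for the element at index, null if not mapped yet
    int mapCalls;           // counts how often the mapper actually runs

    CachingLooper(List<?> sequence, Function<Object, String> mapper) {
        this.sequence = sequence;
        this.mapper = mapper;
    }

    String evaluate() {
        if (current == null) {
            // First use in this iteration: map the element and remember the result.
            mapCalls++;
            current = mapper.apply(sequence.get(index));
        }
        return current;
    }

    void increment() {
        index++;
        current = null; // invalidate the cache for the next element
    }
}

class CachingLooperDemo {
    // Loops over three items and reads the loop variable several times per iteration.
    static int countMapCalls(int usesPerIteration) {
        List<Object> items = List.of("a", "b", "c");
        CachingLooper looper = new CachingLooper(items, value -> "item " + value);
        for (int i = 0; i < items.size(); i++) {
            for (int use = 0; use < usesPerIteration; use++) {
                looper.evaluate();
            }
            looper.increment();
        }
        return looper.mapCalls;
    }

    public static void main(String[] args) {
        System.out.println(countMapCalls(5)); // prints 3: one mapping per element, not 15
    }
}
```

Without the current field, the same loop would run the mapper once per use, so fifteen times instead of three in this example.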
This rather inconspicuous change has a major impact on the performance of FreshMarker. While it previously ranked just ahead of Mustache and FreeMarker, it now moves ahead of Velocity and Trimou in the benchmark used.
Freemarker.benchmark   thrpt  50  31451,022 ± 318,743  ops/s
FreshMarker.benchmark  thrpt  50  51253,174 ± 610,095  ops/s
Handlebars.benchmark   thrpt  50  32204,790 ± 432,129  ops/s
Mustache.benchmark     thrpt  50  32204,285 ± 223,241  ops/s
Thymeleaf.benchmark    thrpt  50   3078,860 ±  35,203  ops/s
Velocity.benchmark     thrpt  50  36140,685 ± 449,326  ops/s
Pebble.benchmark       thrpt  50  54861,944 ± 544,948  ops/s
Trimou.benchmark       thrpt  50  48192,131 ± 787,200  ops/s
On the one hand, this is a great success in improving performance. On the other hand, it also shows that a stricter separation between results and the expression tree would have significantly improved performance much earlier.
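What such a stricter separation could look like is sketched below. This is purely illustrative and not FreshMarker's actual or planned design; all names (Value, NumberValue, TextValue, EvalContext, Expression, Literal, VariableRef, Negate) are hypothetical. Expression nodes and results get separate types, so the compiler rules out calling evaluate on a result, and result objects are free to hold or cache state.

```java
import java.util.Map;

// Results: plain values that never need to be evaluated again (hypothetical names).
interface Value {}
record NumberValue(int value) implements Value {}
record TextValue(String value) implements Value {}

// Call-specific data, playing the role of ProcessContext.
record EvalContext(Map<String, Value> variables) {}

// Expression nodes: immutable, and the only types that offer evaluate.
interface Expression {
    Value evaluate(EvalContext context);
}

record Literal(Value value) implements Expression {
    @Override
    public Value evaluate(EvalContext context) {
        return value;
    }
}

record VariableRef(String name) implements Expression {
    @Override
    public Value evaluate(EvalContext context) {
        return context.variables().get(name);
    }
}

record Negate(Expression operand) implements Expression {
    @Override
    public Value evaluate(EvalContext context) {
        Value result = operand.evaluate(context);
        if (result instanceof NumberValue number) {
            return new NumberValue(-number.value());
        }
        throw new IllegalArgumentException("cannot negate " + result);
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        Expression tree = new Negate(new VariableRef("x"));
        EvalContext context = new EvalContext(Map.of("x", new NumberValue(7)));
        System.out.println(tree.evaluate(context)); // prints NumberValue[value=-7]
    }
}
```

With this split, a loop variable would simply be a Value computed once per iteration, and no repeated evaluation step would be needed to read it.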