“If you optimize everything, you will always be unhappy.”
Donald Knuth
FreshMarker started as an academic project. The main aim was to show how a template engine can be created. At some point, however, the question arose as to how fast the template engine actually is. To be honest, it has been slow so far. But first things first.
FreshMarker is normally used to generate source code or e-mails, areas where high throughput is often not required. After stumbling across a template engine benchmark again, I adapted it for FreshMarker out of pure curiosity.
The benchmark checks how often an HTML page can be generated from a template. The time for creating the template is not measured. Most of the template is static text. The dynamic part of the page consists of a list of elements. The existing FreeMarker template could be converted to FreshMarker very quickly.
<#list items as item with looper>
<tr class="${looper?item_parity}">
  <td>${looper?counter}</td>
  <td><a href="/stocks/${item.symbol}">${item.symbol}</a></td>
  <td><a href="${item.url}">${item.name}</a></td>
  <td><strong>${item.price}</strong></td><#if (item.change < 0.0)>
  <td class="minus">${item.change}</td>
  <td class="minus">${item.ratio}</td><#else>
  <td>${item.change}</td>
  <td>${item.ratio}</td></#if>
</tr>
The only difference from FreeMarker is the loop variable looper with its built-ins item_parity and counter.
Running the benchmark gives a result of approximately 4500 pages per second, a figure that looks good for the application scenario and given the lack of interest in performance so far.
Benchmark              Mode   Cnt  Score       Error    Units
FreshMarker.benchmark  thrpt  50   4468,465  ± 265,536  ops/s
If the benchmark for FreeMarker is run, this template engine shines with around 15500 pages per second.
Benchmark             Mode   Cnt  Score        Error    Units
Freemarker.benchmark  thrpt  50   15598,189  ± 668,014  ops/s
With such a big difference, it is not difficult to decide to look into performance after all.
However, optimizations should never be based on assumptions, which is why an analysis of the runtime behaviour is necessary. A useful tool here is the VisualVM Profiler. It shows how much time was spent in which methods and how often methods were called.
The results for the benchmark provided some interesting insights that were used to improve the performance of FreshMarker.
A first interesting point is the high cost of the NumberFormatter used to display numbers in FreshMarker. Since there is an easy way to prevent the use of the NumberFormatter, this was chosen for the time being.
<#list items as item with looper>
<tr class="${looper?item_parity}">
  <td>${looper?counter?c}</td>
  <td><a href="/stocks/${item.symbol}">${item.symbol}</a></td>
  <td><a href="${item.url}">${item.name}</a></td>
  <td><strong>${item.price?c}</strong></td><#if (item.change < 0.0)>
  <td class="minus">${item.change?c}</td>
  <td class="minus">${item.ratio?c}</td><#else>
  <td>${item.change?c}</td>
  <td>${item.ratio?c}</td></#if>
</tr>
The built-in c (computer language) generates a simple representation for various data types. The corresponding toString method is used for numeric types.
This change in the test is directly noticeable: about 5200 pages can now be generated per second. However, this is only a simplification for the benchmark. As soon as numbers need special formatting, the poor performance of the NumberFormatter takes effect again.
Benchmark              Mode   Cnt  Score       Error    Units
FreshMarker.benchmark  thrpt  50   5148,130  ± 302,675  ops/s
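The difference between formatted and plain number output can be illustrated with a minimal sketch. It does not use FreshMarker's NumberFormatter; as an assumption, java.text.NumberFormat stands in for locale-aware formatting, and the class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberRenderingSketch {

    // Locale-aware rendering: stand-in for a formatter-based output path (assumption).
    // A NumberFormat instance is looked up and configured on every call.
    static String formatted(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        return format.format(value);
    }

    // Plain rendering via toString, comparable to what the text describes for ?c.
    static String plain(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(formatted(1234.5, Locale.US));
        System.out.println(plain(1234.5));
    }
}
```

The plain variant skips locale lookup, grouping and rounding rules, which is presumably where the time goes; how large the gap is in a real template should be measured rather than assumed.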
However, the profiler also provides further figures that show potential for improvement. The template in the benchmark contains an extremely large number of ConstantFragment instances because texts and spaces are pulled apart during parsing. In addition, the whitespace handling from the article Whitespace Handling in FreshMarker ensures that concatenations of whitespaces are also split up if necessary.
Instead of processing many consecutive ConstantFragment instances, you could replace them with a single one.
@Override
public List<Fragment> visit(Token ftl, List<Fragment> input) {
    if (!input.isEmpty()) {
        Fragment fragment = input.getLast();
        if (fragment instanceof ConstantFragment constantFragment) {
            constantFragment.add(ftl.toString());
            return input;
        }
    }
    String image = ftl.toString();
    if (ftl.getType() == TokenType.PRINTABLE_CHARS) {
        input.add(new ConstantFragment(image));
    } else if (ftl.getType() == TokenType.WHITESPACE) {
        if (" ".equals(image)) {
            input.add(ONE_WHITESPACE);
        } else {
            input.add(new ConstantFragment(image));
        }
    }
    return input;
}

@Override
public List<Fragment> visit(Text ftl, List<Fragment> input) {
    String content = ftl.getAllTokens(false).stream()
            .map(TerminalNode::toString)
            .collect(Collectors.joining());
    if (!input.isEmpty()) {
        Fragment fragment = input.getLast();
        if (fragment instanceof ConstantFragment constantFragment) {
            constantFragment.add(content);
            return input;
        }
    }
    input.add(new ConstantFragment(content));
    return input;
}
The two visit methods have been extended to concatenate constant texts. This reduces the number of ConstantFragment instances and increases throughput to 6500 pages per second.
Benchmark              Mode   Cnt  Score       Error    Units
FreshMarker.benchmark  thrpt  50   6519,047  ± 154,409  ops/s
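The merging idea can be tried in isolation with a small sketch. The types below are hypothetical stand-ins that only borrow the names Fragment and ConstantFragment from the article; DynamicFragment and appendConstant are invented for illustration and are not part of FreshMarker.

```java
import java.util.ArrayList;
import java.util.List;

final class FragmentMergeSketch {

    // Hypothetical stand-ins; the real FreshMarker fragment types are richer.
    interface Fragment {}

    static final class ConstantFragment implements Fragment {
        private final StringBuilder text;

        ConstantFragment(String text) {
            this.text = new StringBuilder(text);
        }

        void add(String more) {
            text.append(more);
        }

        String text() {
            return text.toString();
        }
    }

    static final class DynamicFragment implements Fragment {
        final String expression;

        DynamicFragment(String expression) {
            this.expression = expression;
        }
    }

    // Appends constant text, extending the previous fragment when it is constant too.
    static void appendConstant(List<Fragment> fragments, String text) {
        if (!fragments.isEmpty()) {
            Fragment last = fragments.get(fragments.size() - 1);
            if (last instanceof ConstantFragment) {
                ((ConstantFragment) last).add(text);
                return;
            }
        }
        fragments.add(new ConstantFragment(text));
    }

    public static void main(String[] args) {
        List<Fragment> fragments = new ArrayList<>();
        appendConstant(fragments, "<td>");
        appendConstant(fragments, " ");
        fragments.add(new DynamicFragment("item.price"));
        appendConstant(fragments, "</td>");
        System.out.println(fragments.size()); // prints 3
    }
}
```

Only a dynamic fragment interrupts a run of constant text, so a mostly static page like the benchmark template should collapse into far fewer fragments to walk at render time.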
The profiler also shows that processing lists with streams is not a good idea for performance. After the streams were replaced by enhanced for-loops, there was a shocking increase to around 10800 pages per second.
Benchmark              Mode   Cnt  Score        Error    Units
FreshMarker.benchmark  thrpt  50   10797,210  ± 146,141  ops/s
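The kind of rewrite meant here can be sketched as follows. This is not FreshMarker code; the class and method names are hypothetical, and both methods simply produce the same string once with a stream pipeline and once with an enhanced for-loop.

```java
import java.util.List;
import java.util.stream.Collectors;

final class LoopVersusStreamSketch {

    // Stream variant: concise, but builds a pipeline and a collector on every call.
    static String joinWithStream(List<String> parts) {
        return parts.stream()
                .map(String::trim)
                .collect(Collectors.joining());
    }

    // Loop variant: same result with an enhanced for-loop and one StringBuilder.
    static String joinWithLoop(List<String> parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part.trim());
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of(" <tr>", "<td> ", "</td>", "</tr> ");
        System.out.println(joinWithStream(parts).equals(joinWithLoop(parts))); // prints true
    }
}
```

For short lists that are processed very often, the fixed setup cost of a stream can outweigh the work per element. Whether that holds for a given code path is something a profiler or benchmark has to confirm.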
Security always takes time, and FreshMarker is no exception. When accessing classes that are to be interpreted as TemplateBean, FreshMarker checks whether they appear in the lists of permitted or prohibited packages and classes. This check is time-consuming and uses the startsWith method of the String class.
@Override
public TemplateObject provide(BaseEnvironment environment, Object o) {
    Class<?> type = o.getClass();
    if (!environment.getChecks().contains(type)) {
        modelSecurityGateway.check(type);
    }
    environment.getChecks().add(type);
    return new TemplateBean(beanProvider.provide(o, environment), type);
}
The improved version of the check now first looks up whether the corresponding class is already contained in a set of permitted classes. In this case, all further checks can be omitted.
Another improvement here is that the BaseEnvironment is accessed directly. In previous versions, many environment accesses were passed through the entire stack of Environment instances.
This change is also directly noticeable: 12500 pages per second can now be produced.
Benchmark              Mode   Cnt  Score        Error   Units
FreshMarker.benchmark  thrpt  50   12573,562  ± 49,704  ops/s
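The effect of remembering already checked classes can be sketched independently of FreshMarker. Everything in the following example is hypothetical: the class name, the prefix list and the counter exist only to make the saved work visible, and the sketch is not thread-safe.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CheckedTypeCacheSketch {

    private final List<String> forbiddenPrefixes;
    private final Set<Class<?>> checked = new HashSet<>();

    // Counts how often the expensive prefix scan actually ran (for illustration only).
    int expensiveChecks = 0;

    CheckedTypeCacheSketch(List<String> forbiddenPrefixes) {
        this.forbiddenPrefixes = forbiddenPrefixes;
    }

    // Throws SecurityException for forbidden types; permitted types are remembered.
    void check(Class<?> type) {
        if (checked.contains(type)) {
            return; // already permitted, skip all startsWith comparisons
        }
        expensiveChecks++;
        String name = type.getName();
        for (String prefix : forbiddenPrefixes) {
            if (name.startsWith(prefix)) {
                throw new SecurityException("type not allowed: " + name);
            }
        }
        checked.add(type);
    }

    public static void main(String[] args) {
        CheckedTypeCacheSketch cache = new CheckedTypeCacheSketch(List.of("java.lang.reflect."));
        cache.check(String.class);
        cache.check(String.class);
        System.out.println(cache.expensiveChecks); // prints 1
    }
}
```

Only permitted classes are added to the set in this sketch, so a forbidden class is rejected on every access instead of being cached by accident.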
As a final change, the evaluated attributes of the TemplateBean instances are saved internally.
public TemplateObject get(ProcessContext context, String name) {
    TemplateObject templateObject = mapped.get(name);
    if (templateObject != null) {
        return templateObject;
    }
    Object object = map.get(name);
    if (object == null) {
        mapped.put(name, TemplateNull.NULL);
        return TemplateNull.NULL;
    }
    if (object instanceof TemplateObject t) {
        mapped.put(name, t);
        return t;
    }
    TemplateObject result = context.getBaseEnvironment().mapObject(object);
    mapped.put(name, result);
    return result;
}
For all top-level variables, an environment exists that stores the encapsulated model variables. The attributes of a bean, however, were evaluated and encapsulated again each time they were accessed. An additional Map in the TemplateBean now provides a remedy.
This optimization and a few other small adjustments now ensure that 14500 pages per second can be generated. That is 10000 pages per second more than at the beginning of this article.
Benchmark              Mode   Cnt  Score        Error   Units
FreshMarker.benchmark  thrpt  50   14722,349  ± 59,797  ops/s
In the next article on FreshMarker performance, we will see how built-ins and the TemplateBean become faster without reflection and how this can work at all.