Large table generation performance issue #232

Open
cj-clx opened this issue Jul 10, 2020 · 5 comments

cj-clx commented Jul 10, 2020

Similar to #222, and as I described on Gitter yesterday, I am running into CPU time limits when generating tables with a large number of rows. I've tweaked my XML and TSS a few times to try to optimize them, even removing conditional operations for the most part (e.g. :iteration(var="value")).

Attached is a ZIP file with example XML, TSS and PHP code for reproducing the problem. The PHP code can be run from the command line or from a web server (e.g. php -S localhost:8000 -t .). From the command line, the first argument is the number of rows to generate, e.g. php index.php 42. From a web server, the URL takes a limit parameter, e.g. http://localhost:8000/index.php?limit=42.
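For anyone skimming without downloading the ZIP, here is a rough sketch of the shape of such an entry script (illustrative only; the real code and data shape are in files.zip):

<?php
// Illustrative sketch: read the row count from the CLI argument or the
// ?limit= query parameter, build some dummy row data, and render it.
$limit = PHP_SAPI === 'cli'
    ? (int) ($argv[1] ?? 100)
    : (int) ($_GET['limit'] ?? 100);

$data = [];
for ($i = 0; $i < $limit; $i++) {
    $data[] = ['id' => $i, 'value' => "row $i"];
}

$template = new \Transphporm\Builder('default_base_layout.xml', 'list.tss');
echo $template->output($data)->body;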

On my laptop, the error occurs when generating roughly 1500 rows.

files.zip

cj-clx (Author) commented Jul 10, 2020

I tried about a dozen different ways of specifying the path to each <td> element that needs to get data from iteration() and managed to find a small speed improvement (less than 10%). As with #222, performance seems very sensitive, but erratically so, to the path specification. Here are my revised files (I also cleaned a few other things up a bit).
files.zip

cj-clx (Author) commented Jul 10, 2020

Enabling simple caching as follows only seems to improve CPU time by less than 4% and wall-clock time by less than 1%.

$cache = new \SimpleCache\SimpleCache('./cache');
$template = new Builder(
    "default_base_layout.xml",
    "list.tss"
);
$template->setCache($cache);

cxj (Contributor) commented Aug 5, 2020

After much research, we think we have a theory about one of the main causes of Transphporm's poor performance on large tables with complex cell data rules (e.g. conditional application of per-datum links, modals or formatting). We noticed that performance did not degrade in the linear fashion one might expect.

For example, if 500 rows took 12 seconds to render, one might reasonably expect 1000 rows to take around 24 seconds. Instead, the time grew much faster than linearly with each additional row.

In addition to locating the tokens, Transphporm must also replace or append to the contents of the output page being generated. This appears to be done on what is effectively one large string in memory. This kind of superlinear (roughly quadratic) behavior is the classic result of having to reallocate memory as a string keeps growing.

Presumably, Transphporm is manipulating a large string representing the page being generated and inserting or replacing substrings within it. That would cause the PHP interpreter to perform many C-library malloc() or realloc() calls, producing the superlinear increase in processing time.
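If that theory is right, the pattern is easy to reproduce in isolation. A minimal sketch (nothing to do with Transphporm's actual internals, just the suspected pattern) of inserting substrings into one ever-growing page string:

<?php
// Illustrative sketch of the suspected pattern: each insertion into the big
// page string copies the whole (ever-growing) buffer, so total work grows
// roughly quadratically with the number of rows.
$page   = '<table>{rows}</table>';
$rowTpl = '<tr><td>%d</td></tr>';
$n      = 5000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $pos  = strpos($page, '{rows}');
    // substr_replace builds a brand-new copy of $page on every call.
    $page = substr_replace($page, sprintf($rowTpl, $i) . '{rows}', $pos, 0);
}
$page = str_replace('{rows}', '', $page);
printf("%d rows, %.2fs, %d bytes\n", $n, microtime(true) - $start, strlen($page));

Doubling $n here should roughly quadruple the run time, which matches the kind of growth described above.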

TRPB (Member) commented Aug 5, 2020

All the replacement is done via PHP's DomDocument. It doesn't use string manipulation; it treats the document as a data structure and manipulates that structure before serializing it to HTML/XML.

We could try re-implementing DomDocument in native PHP (which isn't hard, although reimplementing DomXpath is), but I doubt a PHP implementation of DomDocument would be particularly faster.

The problem is that the rule

tr td:nth-child(1) { content: iteration(foo); }

will match 500 elements in a 500-row table. Each element has to have its content replaced and iteration(foo) looked up, which requires iteration's context to be determined first. It's not clear exactly which part is causing the slowdown.

If it is a memory allocation issue, I wonder if we could artificially inflate the document size with whitespace or other garbage at the point it is created.
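For what it's worth, here is a rough sketch, using plain DomDocument/DomXPath rather than Transphporm itself, of the per-match work such a rule implies: clone a row per data item, run one query, then look up data and replace content for every matched node. The data and selector-to-XPath mapping are illustrative, not Transphporm's actual code.

<?php
// Rough sketch of what a rule like
//   tr td:nth-child(1) { content: iteration(foo); }
// implies: one query, then a per-node data lookup and content replacement.
// $rows is made-up example data.
$rows = array_map(fn ($i) => ['foo' => "value $i"], range(0, 499));

$doc = new DOMDocument();
$doc->loadXML('<table><tr><td></td><td></td></tr></table>');

// Duplicate the template row once per data row (roughly what iteration does).
$table       = $doc->documentElement;
$templateRow = $table->getElementsByTagName('tr')->item(0);
for ($i = 1; $i < count($rows); $i++) {
    $table->appendChild($templateRow->cloneNode(true));
}

// The CSS selector "tr td:nth-child(1)" corresponds roughly to the XPath "//tr/td[1]".
$xpath = new DOMXPath($doc);
$start = microtime(true);
$i = 0;
foreach ($xpath->query('//tr/td[1]') as $td) {
    // Per match: resolve this row's iteration value, then set the content.
    $td->nodeValue = $rows[$i++]['foo'];
}
printf("replaced %d cells in %.3fs\n", $i, microtime(true) - $start);

Timing the query, the per-node replacement loop, and the final serialization separately at a few row counts might narrow down which part stops scaling.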

cj-clx (Author) commented Aug 5, 2020

One hopes DomDocument is that smart; have you actually checked?

I'd have thought a rule like

tr td:nth-child(1) { content: iteration(foo); }

would have resulted in a linear increase in time consumption, rather than the superlinear growth we're seeing. Hmm.
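One way to settle linear versus worse-than-linear would be to time the same render at a few row counts and compare the ratios. A hypothetical harness, assuming an index.php that takes the row count as its first CLI argument as in the attached files:

<?php
// Hypothetical measurement harness: render at several row counts and compare.
// If the time roughly doubles when the row count doubles, growth is linear;
// if it roughly quadruples, it is closer to quadratic.
foreach ([250, 500, 1000, 2000] as $rows) {
    $start = microtime(true);
    shell_exec('php index.php ' . $rows . ' > /dev/null');
    printf("%5d rows: %6.2fs\n", $rows, microtime(true) - $start);
}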
