Sync generated test suite with Yoga #285

nicoburns · 2022-12-20T15:51:37Z

What problem does this solve or what need does it fill?

The more tests we have the better! Our generated test system is derived from Yoga's and the format is mostly compatible. And Yoga has a bunch of tests in their system that we don't have (some of these may not be applicable, but many of them will be). We have also added tests that Yoga doesn't have, and we should be good OSS citizens and make them available for Yoga.

What solution would you like?

Import tests from Yoga that we are missing, and fix any relevant bugs.
Export any tests that Yoga is missing back to the Yoga project.

Tests Taffy is missing

As of #320, Taffy has now imported all of Yoga's tests, however the following tests are disabled (by prefixing the filename with an "x") because they are failing:

-absolute_child_with_cross_margin
-absolute_child_with_main_margin
-absolute_layout_percentage_height
-absolute_layout_row_width_height_end_bottom
-align_baseline_child_margin
-align_baseline_child_margin_percent
-align_baseline_child_multiline_no_override_on_secondline
-align_baseline_child_multiline_override
-align_baseline_child_padding
-align_baseline_multiline
-align_baseline_multiline_row_and_column
-align_content_flex_end
-align_content_not_stretch_with_align_items_stretch
-display_none_absolute_child
-display_none_with_position_absolute
-do_not_clamp_height_of_absolute_node_to_height_of_its_overflow_hidden_parent
-justify_content_column_min_height_and_margin
-justify_content_colunn_max_height_and_margin
-margin_auto_start_and_end
-margin_auto_start_and_end_column
-margin_end
-margin_start
-position_root_with_rtl_should_position_withoutdirection
-rounding_inner_node_controversy_combined
-rounding_inner_node_controversy_horizontal
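
As a rough illustration of the "x" prefix convention, a test runner or generator could split fixture file names into enabled and disabled sets along these lines. This is only a sketch: `partition_fixtures` is a hypothetical helper, not Taffy's actual code, and it assumes no enabled fixture has a name that itself begins with "x".

```rust
/// Splits fixture file names into (enabled, disabled) lists.
/// A leading "x" marks a disabled fixture; the prefix is stripped from the
/// names returned in the disabled list.
/// Assumption: no enabled fixture name legitimately starts with "x".
fn partition_fixtures<'a>(file_names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut enabled = Vec::new();
    let mut disabled = Vec::new();
    for &name in file_names {
        match name.strip_prefix('x') {
            Some(stripped) => disabled.push(stripped),
            None => enabled.push(name),
        }
    }
    (enabled, disabled)
}
```

Calling it with `["margin_left.html", "xmargin_end.html"]` (illustrative file names) would report `margin_left.html` as enabled and `margin_end.html` as disabled.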

Tests Yoga is missing

This list excludes CSS Grid tests as out of scope:

+absolute_layout_child_order
+absolute_layout_no_size
+align_baseline_child_top2
+align_baseline_multiline_column2
+align_baseline_nested_child
+align_content_space_around_single_line
+align_content_space_around_wrapped
+align_content_space_between_single_line
+align_content_space_between_wrapped
+align_content_space_evenly_single_line
+align_content_space_evenly_wrapped
+align_items_center_with_child_margin
+align_items_center_with_child_top
+border_no_child
+container_with_unsized_child
+flex_basis_and_main_dimen_set_when_flexing
+flex_basis_larger_than_content_column
+flex_basis_larger_than_content_row
+flex_basis_slightly_smaller_then_content_with_flex_grow_large_size
+flex_basis_smaller_than_content_column
+flex_basis_smaller_than_content_row
+flex_basis_smaller_than_main_dimen_column
+flex_basis_smaller_than_main_dimen_row
+flex_basis_smaller_then_content_with_flex_grow_large_size
+flex_basis_smaller_then_content_with_flex_grow_small_size
+flex_basis_smaller_then_content_with_flex_grow_unconstraint_size
+flex_basis_smaller_then_content_with_flex_grow_very_large_size
+flex_basis_unconstraint_column
+flex_basis_unconstraint_row
+flex_grow_flex_basis_percent_min_max
+flex_shrink_by_outer_margin_with_max_size
+gap_column_gap_flexible_undefined_parent
+gap_column_gap_inflexible_undefined_parent
+gap_column_gap_percentage_cyclic_partially_shrinkable
+gap_column_gap_percentage_cyclic_shrinkable
+gap_column_gap_percentage_cyclic_unshrinkable
+gap_column_gap_percentage_flexible
+gap_column_gap_percentage_flexible_with_padding
+gap_column_gap_percentage_inflexible
+gap_column_gap_row_gap_wrapping
+gap_percentage_row_gap_wrapping
+justify_content_column_min_height_and_margin_bottom
+justify_content_column_min_height_and_margin_top
+margin_left
+margin_right
+max_height_overrides_height_on_root
+max_width_overrides_width_on_root
+measure_child
+measure_child_absolute
+measure_child_constraint
+measure_child_constraint_padding_parent
+measure_child_with_flex_grow
+measure_child_with_flex_shrink
+measure_flex_basis_overrides_measure
+measure_height_overrides_measure
+measure_remeasure_child_after_growing
+measure_remeasure_child_after_shrinking
+measure_remeasure_child_after_stretching
+measure_root
+measure_stretch_overrides_measure
+measure_width_overrides_measure
+min_height_overrides_height_on_root
+min_height_overrides_max_height
+min_width_overrides_max_width
+min_width_overrides_width_on_root
+overflow_cross_axis
+overflow_main_axis
+padding_align_end_child
+padding_no_child
+percentage_moderate_complexity
+percentage_size_based_on_parent_inner_size
+percentage_size_of_flex_basis
+relative_position_should_not_nudge_siblings
+rounding_flex_basis_flex_grow_row_width_of_100
+rounding_fractial_input_1
+rounding_fractial_input_2
+rounding_fractial_input_3
+rounding_fractial_input_4
+size_defined_by_child
+size_defined_by_child_with_border
+size_defined_by_child_with_padding
+size_defined_by_grand_child
+width_smaller_then_content_with_flex_grow_large_size
+width_smaller_then_content_with_flex_grow_small_size
+width_smaller_then_content_with_flex_grow_unconstraint_size
+width_smaller_then_content_with_flex_grow_very_large_size
+wrap_reverse_column
+wrap_reverse_row

alice-i-cecile · 2022-12-20T21:17:09Z

Yep: yoga team, I'd love to help you out like this if you wanna steal our tests (and approaches to implementing algorithms) :D

NickGerleman · 2022-12-20T22:46:49Z

I really appreciate this 🙂. It also helps to catalog some of the things Yoga doesn't currently support.

I'm not sure what the right mechanism is for pushing/pulling, but we could maybe start by keeping a flat set of synced fixtures. Then specific engines would maintain their own list of known failures as part of the test generation or test runner. Apart from Yoga missing Taffy capabilities, I think we need that anyway:

We want to find a way to utilize some of the WPT test cases for CSS layout, potentially by translation to fixtures. Apart from regressions, we could not pass the full set of layout-specific WPT tests due to bits which are intentionally not implemented, ideally implemented but not, or functionality already divergent from browsers that need to be reconciled. So we could have our own list to project the full set into what buckets they should act as.
There have been cases where Chrome behavior changed, modifying fixture output. We haven't changed Yoga's behavior to match the new Chrome behavior. This shouldn't cause test failures, but we might want to keep that documented as divergent.
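
To make the "flat set of synced fixtures plus a per-engine list of known failures" idea concrete, here is a minimal sketch. The names `Expectation`, `plan_for_engine` and `stale_entries` are made up for illustration and do not exist in either project.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum Expectation {
    MustPass,
    KnownFailure,
}

/// Pairs every shared fixture with what one engine expects of it.
fn plan_for_engine<'a>(
    shared: &[&'a str],
    known_failures: &BTreeSet<&str>,
) -> Vec<(&'a str, Expectation)> {
    shared
        .iter()
        .map(|&name| {
            let expectation = if known_failures.contains(name) {
                Expectation::KnownFailure
            } else {
                Expectation::MustPass
            };
            (name, expectation)
        })
        .collect()
}

/// Known-failure entries that no longer correspond to any shared fixture,
/// e.g. after a fixture was renamed or removed upstream.
fn stale_entries<'a>(shared: &[&str], known_failures: &BTreeSet<&'a str>) -> Vec<&'a str> {
    known_failures
        .iter()
        .copied()
        .filter(|k| !shared.iter().any(|s| *s == *k))
        .collect()
}
```

Reporting stale entries is one possible way to keep each engine's list from drifting as the shared set changes.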

alice-i-cecile · 2022-12-21T01:53:12Z

Yeah, I'm wondering if at least for taffy we should have a Chrome compatibility mode controlled by a feature. And then attempt to use the "most natural" behavior in cases where we think Chrome has bugs.

nicoburns · 2022-12-21T14:51:43Z

Regarding syncing Taffy/Yoga's existing test suites:

I'm not sure what the right mechanism is for pushing/pulling, but we could maybe start by keeping a flat set of synced fixtures.

I agree. This definitely seems like the most sensible way to start.

We'll probably want to harmonise the format used to store the fixtures:

@alice-i-cecile I personally feel like Yoga's format, which stores multiple related fixtures in a single file separated by two newlines, is nicer / more manageable than Taffy's format, where each test is in its own file with its own copy of the test template / boilerplate.

Although the one thing I do really like about Taffy's approach is that it means the HTML file can be opened and played with directly in a web browser without any setup. Perhaps we could create a small CLI tool which:

takes a test name (e.g. cargo debugtest column_gap_child_margins)
interpolates it into the standard template and exposes it over a local web server
(Attempts to) open the test webpage in the default browser (like cargo doc --open)

to retain this capability while switching to Yoga's on-disk storage format.
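
The interpolation step of such a tool might look roughly like this. It is a sketch only: the template text and the `{{name}}` / `{{fixture}}` placeholders are invented here and are not the real Taffy or Yoga template, and serving the page or opening a browser is left out.

```rust
/// Hypothetical page template. The `{{name}}` and `{{fixture}}` markers are
/// invented for this sketch; the real test template differs.
const TEMPLATE: &str = "<!DOCTYPE html>\n<html>\n<head><title>{{name}}</title></head>\n<body>\n{{fixture}}\n</body>\n</html>\n";

/// Interpolates one fixture's markup into the standard template so that the
/// result can be written to disk or served and opened in a browser.
fn render_test_page(name: &str, fixture_html: &str) -> String {
    TEMPLATE
        .replace("{{name}}", name)
        .replace("{{fixture}}", fixture_html)
}
```

The returned string could then be handed to whatever serves the page locally.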

Taffy's approach does also make it easier to enumerate tests as one can do so just by listing files without parsing them. But parsing the Yoga format isn't all that difficult either. It may also be possible to go for an in-between approach of "single test per file, but no surrounding boilerplate". If we do go for "single test per file" I'd definitely like to introduce sub-directories though. The current flat list of fixtures is quite unwieldy!
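
For a sense of how little parsing the multi-fixture format needs, here is a minimal sketch. It assumes fixtures are separated by a blank line and that the first `id="..."` attribute in each fixture holds the test name; `split_fixtures` is a hypothetical helper and a real importer would want a proper HTML parser.

```rust
/// Splits a multi-fixture file into (test name, markup) pairs.
/// Assumptions: fixtures are separated by a blank line (two newlines), and
/// the first `id="..."` attribute in each fixture is the test name.
/// Chunks without an `id` attribute are skipped.
fn split_fixtures(source: &str) -> Vec<(String, String)> {
    source
        .split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .filter_map(|chunk| {
            let start = chunk.find("id=\"")? + 4;
            let len = chunk[start..].find('"')?;
            Some((chunk[start..start + len].to_string(), chunk.to_string()))
        })
        .collect()
}
```

Enumerating tests then amounts to collecting the first element of each pair.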

We may also want to harmonise the test generation scripts:

This isn't really necessary – we could also keep these separate – but perhaps we could share maintenance by merging implementations. My guess is this would work out as less work overall if communication between projects is good, and would become problematic if it's not.

Taffy currently implements test generation mostly in Rust, with only a small JavaScript script that runs inside Chrome to measure nodes and return computed styles; it doesn't do any test generation itself. Yoga currently has a thin Ruby wrapper (64 LOC) which runs chromedriver, and the bulk of the test generation is done in JavaScript which runs within Chrome. This JS script has "plugins" for each of Yoga's language bindings so that the tests can be generated for them too (yoga-rs also uses Yoga's script and has a compatible "plugin" for its Rust bindings).

If we were to harmonise this, my suggestion would be to replace Yoga's Ruby wrapper with a Rust wrapper (on the basis that setting up a Rust toolchain and installing dependencies is much simpler than managing Ruby versions and its tendency to install gems globally), but to port Taffy's actual code generation to Yoga's JS approach (we'd need to extend this to deal with Grid and text-layout).

Then specific engines would maintain their own list of known failures as part of the test generation or test runner.

Yes, I think this makes a lot of sense (my instinct is to make it part of test generation, but I guess it probably doesn't matter too much).

Regarding intentional divergence from Chrome:

There have been cases where Chrome behavior changed, modifying fixture output. We haven't changed Yoga's behavior to match the new Chrome behavior. This shouldn't cause test failures, but we might want to keep that documented as divergent.

I wonder if it would be worth defining some kind of intermediate representation for "expected measurements" (perhaps a simple JSON tree with x, y, width, height and children properties). That way we could have different gentests getting their measurements from different sources of truth. That could include Chrome, WPT (some tests include expected measurements directly), or defined manually by us in cases of divergence.
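
As a sketch of what that intermediate representation could look like on the Rust side (the type name `ExpectedNode` and the exact JSON shape are assumptions, nothing here is an agreed format):

```rust
/// One node of the hypothetical "expected measurements" tree.
#[derive(Debug, Clone, PartialEq)]
struct ExpectedNode {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
    children: Vec<ExpectedNode>,
}

impl ExpectedNode {
    /// Serialises the tree to JSON by hand to keep the sketch std-only.
    /// Assumes all values are finite (NaN or infinity would not be valid JSON).
    fn to_json(&self) -> String {
        let children: Vec<String> = self.children.iter().map(|c| c.to_json()).collect();
        format!(
            "{{\"x\":{},\"y\":{},\"width\":{},\"height\":{},\"children\":[{}]}}",
            self.x,
            self.y,
            self.width,
            self.height,
            children.join(",")
        )
    }

    /// Appends the path of every node whose box or child count differs from
    /// `actual`. Paths look like "root/0/2".
    fn mismatches(&self, actual: &ExpectedNode, path: &str, out: &mut Vec<String>) {
        let same_box = (self.x, self.y, self.width, self.height)
            == (actual.x, actual.y, actual.width, actual.height);
        if !same_box || self.children.len() != actual.children.len() {
            out.push(path.to_string());
        }
        for (i, (e, a)) in self.children.iter().zip(&actual.children).enumerate() {
            e.mismatches(a, &format!("{}/{}", path, i), out);
        }
    }
}
```

With something like this, the source of truth (Chrome, WPT, or a hand-written file) only needs to produce the tree, and each engine's generated tests compare against it.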

Regarding importing tests from WPT:

I definitely think this makes sense, but I think this is going to be fairly large task. Some notes from a quick look at the available tests:

There are a LOT of tests: on the order of ~1500 each for Flexbox and CSS Grid, although there are whole categories of them that won't be relevant as they test things that we don't implement (table layout, float, subgrids, etc).
The format of the tests is not consistent. So far I've found 4 categories:
1. Some have data attributes like data-expected-width which specify the expected result
2. Some have an external reference file (a second HTML file) which specifies the result
3. Some seem to be visual "test passes if no red is showing"
4. Some require execution of JavaScript which runs its own tests and integrates with a test harness.
Categories 1 and (perhaps) 2 seem like they could potentially be translated into tests without bringing Chrome into the loop. All of the first 3 categories seem like they could be automatically translated using our existing gentest approach of comparing with Chrome. Category 4 seems like it would need to be manually translated.
The tests use non-inline CSS, either in <style> tags or in externally referenced style sheets. We'd probably want to do CSS inlining to make them usable in our tests.
Some tests depend on content sizing / text layout using the Ahem font, which has known-size glyphs to enable tests to compute content sizes without implementing a full-blown layout engine. Taffy currently has support for a very similar (but incompatible) scheme using a different font (the text-layout implementation is fairly simple: ~40 LOC). It would make sense to port Taffy's support to use the Ahem font, and then to extend Yoga's test generation to support this functionality too.
The test suite is regularly updated (although most of the updates seem to be for new layout modes rather than updates to the tests for existing ones), and we would probably want some way to sync changes rather than treating it as a one-time import.
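
A first pass at sorting WPT files into the four categories above could be a plain substring heuristic like the sketch below. The marker strings are assumptions about how the tests are usually written (a `data-expected-` attribute prefix, a `rel="match"` link to the reference file, a `testharness.js` script include), and the visual category is simply the fallback; a real tool would need something more robust.

```rust
#[derive(Debug, PartialEq)]
enum WptTestKind {
    /// Category 1: expected sizes in data attributes.
    DataAttributes,
    /// Category 2: a second HTML file describes the expected result.
    ReferenceFile,
    /// Category 4: JavaScript-driven test using a test harness.
    JsHarness,
    /// Category 3: visual "passes if no red is showing" tests (fallback).
    Visual,
}

/// Heuristic classification based on substrings of the test's HTML source.
/// Data attributes are checked first because such tests may also load a
/// harness script to perform the checks.
fn classify_wpt_test(html: &str) -> WptTestKind {
    if html.contains("data-expected-") {
        WptTestKind::DataAttributes
    } else if html.contains("rel=\"match\"") || html.contains("rel=match") {
        WptTestKind::ReferenceFile
    } else if html.contains("testharness.js") {
        WptTestKind::JsHarness
    } else {
        WptTestKind::Visual
    }
}
```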

Given this, I would propose the following:

Creating an automated tool/script, that translates WPT tests into Yoga/Taffy-style fixtures (@NickGerleman would you be open to a rust-based tool/script for doing this?)
Keeping WPT-sourced tests and manually created gentests separate.
The tool should implement some kind of allowlist/blocklist approach to limit the scope of the tests we are pulling in. The ignore crate implements gitignore syntax which seems like it might be a good fit (I think this would give us both allow and block capabilities). Yoga and Taffy could maintain independent allow/block lists.
The tool should parse the HTML and CSS of the file, and inline any CSS styles, including those defined in external files (possibly using the inline_css crate). We may need to implement some heuristics (or per-test metadata) to determine which DOM nodes constitute the actual test for each test file. I have some code which implements test generation (except measurement) by directly parsing fixtures rather than using Chrome, which could perhaps be adapted.
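
To illustrate the allow/block idea without pulling in a crate, here is a deliberately simplified sketch that uses plain path prefixes as a stand-in for gitignore syntax. `ScopeRules` is a hypothetical type, and the paths in the usage note are only illustrative.

```rust
/// Per-project scope rules. Plain path prefixes stand in for the gitignore
/// patterns that a real tool would more likely use.
struct ScopeRules<'a> {
    allow: &'a [&'a str],
    block: &'a [&'a str],
}

impl<'a> ScopeRules<'a> {
    /// A test is in scope if its path matches at least one allow prefix and
    /// no block prefix, so block entries always win over allow entries.
    fn includes(&self, path: &str) -> bool {
        let allowed = self.allow.iter().any(|p| path.starts_with(*p));
        let blocked = self.block.iter().any(|p| path.starts_with(*p));
        allowed && !blocked
    }
}
```

For example, allowing `css/css-flexbox/` and `css/css-grid/` while blocking `css/css-grid/subgrid/` would keep flexbox and grid tests but drop the subgrid ones. Yoga and Taffy would each construct their own `ScopeRules`.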

I'm probably going to focus on getting an MVP CSS Grid implementation merged into Taffy in the immediate future, and circle back around to this later.

alice-i-cecile · 2022-12-22T19:48:17Z

If we do go for "single test per file" I'd definitely like to introduce sub-directories though. The current flat list of fixtures is quite unwieldy!

Mild preference for this format :) I think it's clearer and easy to parse and diff.

If we were to harmonise this, my suggestion would be to replace Yoga's Ruby wrapper with a Rust wrapper (on the basis that setting up a Rust toolchain and installing dependencies is much simpler than managing Ruby versions and its tendency to install gems globally), but to port Taffy's actual code generation to Yoga's JS approach (we'd need to extend this to deal with Grid and text-layout).

Agreed, this seems like an excellent compromise.

Totally fine to change the test format however you see fit.

I wonder if it would be worth defining some kind of intermediate representation for "expected measurements" (perhaps a simple JSON tree with x, y, width, height and children properties). That way we could have different gentests getting their measurements from different sources of truth. That could include Chrome, WPT (some tests include expected measurements directly), or defined manually by us in cases of divergence.

Sensible, but can be deferred.

Creating an automated tool/script, that translates WPT tests into Yoga/Taffy-style fixtures (@NickGerleman would you be open to a rust-based tool/script for doing this?)

Keeping WPT-sourced tests and manually created gentests separate.

The tool should implement some kind of allowlist/blocklist approach to limit the scope of the tests we are pulling in. The ignore crate implements gitignore syntax which seems like it might be a good fit (I think this would give us both allow and block capabilities). Yoga and Taffy could maintain independent allow/block lists.

The tool should parse the HTML and CSS of the file, and inline any CSS styles, including those defined in external files (possibly using the inline_css crate). We may need to implement some heuristics (or per-test metadata) to determine which DOM nodes constitute the actual test for each test file. I have some code which implements test generation (except measurement) by directly parsing fixtures rather than using Chrome, which could perhaps be adapted.

On board with this! @mockersf, do you have strong feelings or good ideas here? You've helped us with the automated tests before and I trust your instincts on this.

nicoburns · 2023-01-05T00:51:37Z

I've created #320 for importing tests. It includes a small script for diffing the names which is much better than the manual process I used first time around as it's automated and patches up the names of some tests that are in fact the same. I've updated the diff output in the top post in the following way:

It's based on this script (so gap tests now match up)
It's based on facebook/yoga#1194 ("Add newer fixtures"), in which Yoga pulls in tests from FlexLayout, Meta's proprietary internal flexbox implementation.
I've excluded the grid tests, seeing as there's no point tracking those until Yoga has a grid implementation.
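
The core of a name-diffing script like this can be sketched as two set differences taken after mapping known renames onto a common name. This is an illustration rather than the script from #320: the `aliases` table and both function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Maps a test name to its canonical form using a table of known renames
/// (tests that are in fact the same but are named differently).
fn canonical(name: &str, aliases: &BTreeMap<&str, &str>) -> String {
    match aliases.get(name) {
        Some(renamed) => renamed.to_string(),
        None => name.to_string(),
    }
}

/// Returns (tests Taffy is missing, tests Yoga is missing), both sorted.
fn diff_names(
    yoga: &[&str],
    taffy: &[&str],
    aliases: &BTreeMap<&str, &str>,
) -> (Vec<String>, Vec<String>) {
    let yoga: BTreeSet<String> = yoga.iter().map(|n| canonical(n, aliases)).collect();
    let taffy: BTreeSet<String> = taffy.iter().map(|n| canonical(n, aliases)).collect();
    let missing_in_taffy = yoga.difference(&taffy).cloned().collect();
    let missing_in_yoga = taffy.difference(&yoga).cloned().collect();
    (missing_in_taffy, missing_in_yoga)
}
```

Grid tests could be excluded by filtering the Taffy list before calling `diff_names`.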

I plan to proceed by:

Writing a small Rust script that will automatically import a Yoga test given its name (manually porting them over was taking far too long)
Importing all tests that we're missing, but disabling any tests that fail

We can then go through and fix the failing tests at our leisure.

alice-i-cecile · 2023-01-05T11:55:22Z

Excellent work: I'm excited to share these.

nicoburns · 2023-01-05T14:16:40Z

Ok, #320 now includes all of Yoga's test fixtures (including those from facebook/yoga#1194) and is ready for review/merging. I've replaced the raw diff in the description with two separate lists:

Yoga tests that Taffy has disabled
Taffy tests that Yoga is missing.

nicoburns added the enhancement New feature or request label Dec 20, 2022

nicoburns mentioned this issue Jan 5, 2023

Import yoga tests #320

Merged

alice-i-cecile closed this as completed in #320 Jan 12, 2023

nicoburns mentioned this issue Apr 8, 2024

Run WPT test suite against Taffy #639

Open
