Comparing ASAP7 and sky130hd with respect to ALU operations #3881

oharboe · 2023-08-20T09:26:49Z

oharboe
Aug 20, 2023
Collaborator

The mock-alu allows studying the speed and area of various typical ALU operations.

The mock-alu has two 64-bit inputs and one 64-bit output. The inputs and outputs are registered within the mock-alu, which gives clear boundary conditions when studying the guts of the mock-alu.

The mock-alu implements a number of operations and variants of these operations to study the area and minimum clock period for these operations.

The operations fall into a couple of categories:

add, subtract and compare: ADD, SUB, SETCC_EQ (equal), SETCC_NE (not equal), SETCC_LT (less than), SETCC_LE (less than or equal), SETCC_ULT (unsigned less than), SETCC_ULE (unsigned less than or equal)
barrel shifter: SHR (logical shift right), SRA (arithmetic shift right), SHL (shift left)
bitwise logic: OR (bitwise or), AND (bitwise and), XOR (bitwise xor)
multiplication: 64-bit multiply. Various algorithms are available; the default is Han Carlson. The implementation is PDK-specific and comes from https://github.com/antonblanchard/vlsiffra/
multiplexor: MUX1..8. This is not really an ALU operation. All that happens here is that bits from the input are selected using a mux and put into the output. This allows studying the performance of the mux that sits before the output of an ALU and selects between the various supported operations.
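As a reading aid, the operation categories can be written down as a small behavioural model. This is only a sketch and not the mock-alu RTL: the names `MASK64`, `to_signed`, `OPS` and `alu` are made up for illustration, the less-than-or-equal compare is spelled SETCC_LE by analogy with the other SETCC_ names, and two details are assumptions (the compare operations return 0 or 1, and the shift amount is taken from the low 6 bits of the second input). Multiplication and the MUX pseudo-operations are left out.

```python
MASK64 = (1 << 64) - 1  # keep every result within 64 bits


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


# Operation name -> function of the two 64-bit inputs.
# Assumptions: compares yield 0/1, shift amount is the low 6 bits of b.
OPS = {
    # add, subtract and compare
    "ADD": lambda a, b: (a + b) & MASK64,
    "SUB": lambda a, b: (a - b) & MASK64,
    "SETCC_EQ": lambda a, b: int(a == b),
    "SETCC_NE": lambda a, b: int(a != b),
    "SETCC_LT": lambda a, b: int(to_signed(a) < to_signed(b)),
    "SETCC_LE": lambda a, b: int(to_signed(a) <= to_signed(b)),
    "SETCC_ULT": lambda a, b: int(a < b),
    "SETCC_ULE": lambda a, b: int(a <= b),
    # barrel shifter
    "SHR": lambda a, b: a >> (b & 63),
    "SRA": lambda a, b: (to_signed(a) >> (b & 63)) & MASK64,
    "SHL": lambda a, b: (a << (b & 63)) & MASK64,
    # bitwise logic
    "OR": lambda a, b: a | b,
    "AND": lambda a, b: a & b,
    "XOR": lambda a, b: a ^ b,
}


def alu(op, a, b):
    """Apply one operation to two inputs truncated to 64 bits."""
    return OPS[op](a & MASK64, b & MASK64)
```

For example, `alu("SETCC_LT", MASK64, 0)` treats the all-ones pattern as -1, while `alu("SETCC_ULT", MASK64, 0)` treats it as the largest unsigned value.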

Next, the mock-alu allows implementing any combination of these operations. This allows implementing an ALU that only supports the shift operations, which can be labelled "SHR,SHL,SRA". This shift-only mock-alu has a single shared barrel shifter. Similarly, a bitwise-logic-only mock-alu can be labelled "OR,AND,XOR".
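The labelling scheme can be sketched as a lookup from a comma-separated label to the operation categories it touches. The names `CATEGORY` and `categories_used` are hypothetical and do not come from the mock-alu sources. The post only states that the shift-only variant shares one barrel shifter; reading "one category, one shared unit" into the other categories would be an assumption.

```python
# Category of each operation, following the grouping used in the post.
CATEGORY = {
    "ADD": "add/subtract/compare",
    "SUB": "add/subtract/compare",
    "SETCC_EQ": "add/subtract/compare",
    "SETCC_NE": "add/subtract/compare",
    "SETCC_LT": "add/subtract/compare",
    "SETCC_LE": "add/subtract/compare",
    "SETCC_ULT": "add/subtract/compare",
    "SETCC_ULE": "add/subtract/compare",
    "SHR": "barrel shifter",
    "SRA": "barrel shifter",
    "SHL": "barrel shifter",
    "OR": "bitwise logic",
    "AND": "bitwise logic",
    "XOR": "bitwise logic",
}


def categories_used(label):
    """Return the set of categories touched by a label such as "SHR,SHL,SRA"."""
    ops = [name.strip() for name in label.split(",") if name.strip()]
    unknown = [op for op in ops if op not in CATEGORY]
    if unknown:
        raise ValueError("unknown operation(s): " + ", ".join(unknown))
    return {CATEGORY[op] for op in ops}
```

With this, `categories_used("SHR,SHL,SRA")` contains only the barrel shifter category, matching the shift-only variant described above.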

At this point, we can plot various mock-alu implementations for ASAP7 and sky130hd:

Here 8, 16, 32 and 64 bit wide ADD operations are plotted:

Various multiplication algorithms for 64 bit multiplication, 4 pipeline stages:

Plotting the Han Carlson multiply algorithm at 8, 16, 32 and 64 bit widths:

Thoughts?

maliberty · 2023-08-20T14:12:30Z

maliberty
Aug 20, 2023
Maintainer

It's interesting that the various multiply algorithms have the same delay (excluding ripple).

I don't see a MULT for sky130.

10 replies

maliberty Aug 21, 2023
Maintainer

The same as my comment "It's interesting that the various multiply algorithms have the same delay (excluding ripple)."

QuantamHD Aug 21, 2023
Collaborator

If you're interested you might want to try a booth multiplier from https://github.com/antonblanchard/vlsiffra

oharboe Aug 21, 2023
Collaborator Author

Booth isn't mentioned in the docs at that link, nor can I see it in the command line parser.

I missed it when I first looked, but I am going to try out vlsi-adder.

QuantamHD Aug 21, 2023
Collaborator

vlsiffra achieves this by using many well established techniques including Booth encoding, Dadda reduction and a choice of fast adders like Kogge-Stone.
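For readers unfamiliar with Booth encoding, here is a minimal sketch of textbook radix-4 Booth recoding. It is not vlsiffra's implementation, and whether vlsiffra uses radix 4 specifically is not stated in this thread. The function names are made up for illustration, and the partial products are simply summed in Python rather than reduced with a Dadda tree and a fast adder.

```python
def booth_radix4_digits(y, bits=64):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2}; there are bits/2 of them, so the
    multiplier produces half as many partial products as a plain
    bit-by-bit scheme.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    y &= (1 << bits) - 1
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=64):
    """Multiply x by signed y (which must fit in `bits`) via Booth digits."""
    digits = booth_radix4_digits(y, bits)
    # Each partial product is 0, +-x or +-2x, weighted by 4**i.
    return sum(d * x * (4 ** i) for i, d in enumerate(digits))
```

A 64-bit multiplier yields 32 digits here, and `booth_multiply(x, y)` agrees with `x * y` for any signed `y` that fits in 64 bits.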

oharboe Aug 21, 2023
Collaborator Author

I see. As I understand it, Booth encoding is used within the algorithms listed as vlsi-multiplier algorithms, and I have plotted the results for those multiplication algorithms above.

oharboe · 2023-08-21T17:31:48Z

oharboe
Aug 21, 2023
Collaborator Author

Here are the vlsi-adder algorithms, excluding ripple, which was ca. 2000+ ps and would have made the graph harder to read:

3 replies

oharboe Aug 21, 2023
Collaborator Author

With negative slack and TNS_END_PERCENT=100:

maliberty Aug 21, 2023
Maintainer

A bit mixed. Some are improved and others are just bigger.

oharboe Aug 21, 2023
Collaborator Author

Negative slack for multipliers, 64 bit, 4 pipeline stages.

This is a great example of how one should not be fooled by the scales on the axes...

There's essentially no difference between these implementations.

Leaving out MULT_RIPPLE as it is far, far off to the right and the bottom. As expected.

maliberty · 2023-08-21T17:34:41Z

maliberty
Aug 21, 2023
Maintainer

Do all your runs end in negative slack at the end of CTS and at the end of the complete flow? Once we reach zero slack we stop optimizing.

1 reply

oharboe Aug 21, 2023
Collaborator Author

Ah, I see. No, they have positive slack.

maliberty · 2023-08-21T20:46:50Z

maliberty
Aug 21, 2023
Maintainer

One thing worth noting is that asap7 doesn't have full or half adder cells in the library (sky130 does).

2 replies

oharboe Aug 27, 2023
Collaborator Author

Can you explain a bit more what that means?

I saw full adders and half adders here...

https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/7db5d2c1fc7a2d99711ea3af19c9567dd45e371e/flow/platforms/asap7/yoSys/cells_adders_L.v#L28

maliberty Aug 28, 2023
Maintainer

I take it back - it does seem to have them. The $fa is the full adder in yosys and this code maps it to the tech dependent cells (if you have a constant on a full adder then you can use a half adder instead).
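The remark about constants can be checked exhaustively with a few lines of Python. This is a bit-level sketch of the standard full and half adder equations, not the yosys techmap rules from the linked file; the function names are for illustration only.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def half_adder(a, b):
    """One-bit half adder: returns (sum, carry_out)."""
    return a ^ b, a & b


# With the carry input tied to constant 0, the full adder
# computes exactly what a half adder computes.
for a in (0, 1):
    for b in (0, 1):
        assert full_adder(a, b, 0) == half_adder(a, b)
```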

mithro · 2023-08-21T23:31:43Z

mithro
Aug 21, 2023

BTW Have you seen Teo's spreadsheet @ https://docs.google.com/spreadsheets/d/1pTGzZo5XYU7iuUryxorfzJwNuE9rM3le5t44wmLohy4/edit#gid=126548956 ?

2 replies

oharboe Aug 22, 2023
Collaborator Author

Fascinating. So much to explore on simple addition :-)

mithro Aug 22, 2023

FYI, Here is a link dump of the various stuff around adders that I have seen (in case you haven't seen these before);

Teo's work

Matt Venn testing of adders

https://github.com/mattvenn/instrumented_adder
https://github.com/mattvenn/wrapped_instrumented_adder
https://www.youtube.com/watch?v=O38i5Y98m44
https://docs.google.com/presentation/d/12Errw5M7lFYbBfi9yHc9-JNaikUcMz3qiLRNhj5hiUc/edit#slide=id.p
https://www.youtube.com/watch?v=Gg6mxvhiEUs <-- Matt Venn mentions having gotten silicon back

Other stuff

It looks like you have already seen Anton's work @ https://github.com/antonblanchard/vlsiffra

Also check out Alan Mishchenko's (the author of ABC) recent pre-prints for yet another direction in this space.

I'm sure I'm missing a bunch of stuff too!

mithro · 2023-08-22T21:47:32Z

mithro
Aug 22, 2023

BTW I would love to get a similar spreadsheet to Teo's for GF180MCU and ASAP7 too.

Sadly, Teo got distracted by the mathematical theory and then was stolen by NVIDIA before he could get to that.

0 replies

tspyrou · 2023-08-23T15:25:48Z

tspyrou
Aug 23, 2023
Maintainer

@oharboe did you try using set_clock_uncertainty to force the slack to be negative and make the tool work harder?
As @maliberty mentioned, optimization will stop once timing passes.

1 reply

oharboe Aug 23, 2023
Collaborator Author

I simply reduced the clock period to get negative slack and set TNS_END_PERCENT to 100. It changed the results a bit for the better, but I didn't see any dramatic change in the relationships between algorithms.
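Both approaches mentioned here, shortening the clock period and adding clock uncertainty, push setup slack negative in the same way. The sketch below uses a deliberately simplified slack formula (it ignores setup time, clock latency and skew) and hypothetical numbers that are not measurements from this thread; `setup_slack_ps` is an illustrative name.

```python
def setup_slack_ps(period_ps, arrival_ps, uncertainty_ps=0.0):
    """Simplified setup slack: time available minus data arrival time."""
    return period_ps - uncertainty_ps - arrival_ps


ARRIVAL_PS = 900.0  # hypothetical worst-path arrival time

# Timing passes, so the optimizer has no reason to keep working.
relaxed = setup_slack_ps(1000.0, ARRIVAL_PS)

# Option 1: reduce the clock period.
tight_period = setup_slack_ps(800.0, ARRIVAL_PS)

# Option 2: keep the period and add clock uncertainty instead.
with_uncertainty = setup_slack_ps(1000.0, ARRIVAL_PS, uncertainty_ps=200.0)

print(relaxed, tight_period, with_uncertainty)
```

In this simplified model the two options give the same negative slack, which is why either one can be used to keep the tool optimizing.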

mithro · 2023-09-01T22:40:33Z

mithro
Sep 1, 2023

@oharboe - Any chance you could do a write up of what you discovered?

2 replies

oharboe Sep 2, 2023
Collaborator Author

It is in this post... Any particular questions that come to mind that I could clarify?

mithro Sep 2, 2023

There is a lot of back and forth and I'm unclear what the final results are (and exactly how you produced them).

Various questions include;

How do I reproduce your results and graphs?
What settings did you end up using and why did you end up using those settings? (Particularly around making the tool work harder?)
How do the various implementations compare? Do you understand why they compare this way?
How do SKY130 and ASAP7 compare in the end here?
Do the relative "positions" between the implementations hold across SKY130 and ASAP7?
What was the most interesting / unexpected thing you discovered?

Writing it up as a nice coherent blog post would be pretty awesome but totally understand if you do not have the time to do so.

oharboe · 2023-09-03T08:22:17Z

oharboe
Sep 3, 2023
Collaborator Author

There is a lot of back and forth and I'm unclear what the final results are (and exactly how you produced them).

Various questions include;

How do I reproduce your results and graphs?

Run this script: https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/master/flow/designs/src/mock-alu/plot-area-min-clock-period.py

Some tinkering required.

What settings did you end up using and why did you end up using those settings? (Particularly around making the tool work harder?)

I didn't study how to make the best possible ALU; I was only interested in the relationship between the ALU operations.

How do the various implementations compare? Do you understand why they compare this way?

The various implementations of additions and multiplications?

It is a mystery why there is essentially no difference between multiplication implementation clock periods...

How do SKY130 and ASAP7 compare in the end here?

The lessons learned on the relative size and speed of ALU operations appear to be much the same with SKY130 and ASAP7. Which is surprising. Ca. 15 years separate them...

Do the relative "positions" between the implementations hold across SKY130 and ASAP7?

Pretty much.

What was the most interesting / unexpected thing you discovered?

That relative size and speed of simple ALU operations are essentially unchanged across process nodes.

Also, it would appear that if an ALU operation takes 200 ps on x86 7nm, yielding 5 GHz, then one could choose to divide the ASAP7+OpenROAD clock period by 4 for simple ALU operations when modelling, and apply the lessons learned to architectural exploration. One could further choose to treat the absolute speed of ASAP7+OpenROAD as not terribly important for architectural choices, as the choices will be the same if everything is optimized.

By this I mean that, to drive your architectural exploration as a first-order approximation, write the RTL in an idiomatic way, run it through ASAP7+OpenROAD, and if your design comes out at 4x the desired clock period, your design isn't completely off.
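The rule of thumb is simple arithmetic, sketched here with the 200 ps and 4x figures from the comment above. The function names are illustrative, and the factor of 4 is the poster's first-order estimate, not a measured constant.

```python
def frequency_ghz(period_ps):
    """Clock frequency in GHz for a clock period given in picoseconds."""
    return 1000.0 / period_ps


def asap7_period_budget_ps(target_period_ps, scale=4.0):
    """Clock period to aim for in ASAP7+OpenROAD under the 4x rule of thumb."""
    return target_period_ps * scale


target_ps = 200.0                               # desired period on the real process
budget_ps = asap7_period_budget_ps(target_ps)   # period to aim for in ASAP7+OpenROAD

print(frequency_ghz(target_ps), "GHz target")
print(budget_ps, "ps budget,", frequency_ghz(budget_ps), "GHz in ASAP7+OpenROAD")
```

So a block that closes timing at about 800 ps in ASAP7+OpenROAD would, under this estimate, be in the right range for a 200 ps target.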

Nobody who has information on commercial tools and PDKs can challenge me here. :-) Not because I'm right, but because PDKs and commercial tools are under strict NDAs... This also explains why a lot is left unsaid in this thread...

Writing it up as a nice coherent blog post would be pretty awesome but totally understand if you do not have the time to do so.

Agreed. At least I summarize a bit here. I'm happy to hear that there are some that are interested in this.

Perhaps you would like to write a blog-post using the script above?

0 replies
