It's interesting that the various multiply algorithms have the same delay (excluding ripple). I don't see a MULT for sky130.
-
Here are the
-
Do all your runs end with negative slack after CTS and at the end of the complete flow? Once we reach zero slack, we stop optimizing.
-
One thing worth noting is that asap7 doesn't have full or half adder cells in its library (sky130 does).
-
BTW, have you seen Teo's spreadsheet at https://docs.google.com/spreadsheets/d/1pTGzZo5XYU7iuUryxorfzJwNuE9rM3le5t44wmLohy4/edit#gid=126548956 ?
-
BTW, I would love to get a spreadsheet similar to Teo's for GF180MCU and ASAP7 too. Sadly, Teo got distracted by the mathematical theory and was then stolen by NVIDIA before he could get to that.
-
@oharboe, did you try using set_clock_uncertainty to force the slack negative and make the tool work harder?
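The idea behind this suggestion can be illustrated with ordinary setup-slack arithmetic: clock uncertainty shrinks the time available to a path, so a path that sits at exactly zero slack turns negative and the optimizer has a reason to keep going. The Python sketch below is a simplified single-path model (no clock latency, made-up numbers); the function name is hypothetical and nothing here reflects OpenROAD internals.

```python
# Toy setup-slack model (all numbers hypothetical, in picoseconds).
# Setup slack = required time - arrival time, where clock uncertainty
# shrinks the required time. This simplified model ignores clock latency.


def setup_slack_ps(clock_period_ps: float, arrival_ps: float,
                   setup_ps: float, uncertainty_ps: float = 0.0) -> float:
    """Return setup slack for a single register-to-register path."""
    required_ps = clock_period_ps - uncertainty_ps - setup_ps
    return required_ps - arrival_ps


if __name__ == "__main__":
    period = 1000.0   # hypothetical 1 GHz clock
    arrival = 950.0   # hypothetical data arrival at the capture flop
    setup = 50.0      # hypothetical setup time

    # Without uncertainty the path just meets timing (slack == 0), so an
    # optimizer that stops at zero slack has no reason to continue.
    print(setup_slack_ps(period, arrival, setup))          # 0.0
    # With 100 ps of uncertainty the same path shows -100 ps of slack.
    print(setup_slack_ps(period, arrival, setup, 100.0))   # -100.0
```

In a real flow the uncertainty would be set in the SDC file rather than computed by hand; the sketch only shows why the sign of the slack changes.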
-
@oharboe, any chance you could do a write-up of what you discovered?
-
Run this script (some tinkering required): https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/master/flow/designs/src/mock-alu/plot-area-min-clock-period.py
I didn't study how to make the best possible ALU; I was only interested in the relationship between the ALU operations.
The various implementations of addition and multiplication? It is a mystery why there is essentially no difference in clock period between the multiplication implementations...
The lessons learned about the relative size and speed of ALU operations appear to be much the same for SKY130 and ASAP7, which is surprising: ca. 15 years separate them...
Pretty much.
That the relative size and speed of simple ALU operations are essentially unchanged across process nodes. Also, it would appear that if an ALU operation takes 200 ps on a 7 nm x86, yielding 5 GHz, then, when modeling simple ALU operations, one could choose to divide the ASAP7+OpenROAD clock period by 4 and apply the lessons learned to architectural exploration. Further, one could choose to treat the absolute speed of ASAP7+OpenROAD as not terribly important for architectural choices, since the choices would be the same if everything were optimized. By this I mean that, to drive your architectural exploration, as a first-order approximation, write the RTL in an idiomatic way and run it through ASAP7+OpenROAD; if your design's clock period is about 4x the desired one, your design isn't completely off. Nobody who has information on commercial tools and PDKs can challenge me here. :-) Not because I'm right, but because PDKs and commercial tools are under strict NDAs... This also explains why a lot goes unsaid in this thread...
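The divide-by-4 rule of thumb above is simple arithmetic, sketched here in Python. The factor of 4 and the 200 ps / 5 GHz figures come from the comment; the constant and function names are hypothetical, and the factor is a rough first-order heuristic rather than a calibrated number.

```python
# First-order heuristic from the discussion: treat an ASAP7+OpenROAD clock
# period for simple ALU operations as roughly 4x what an optimized commercial
# 7 nm flow would achieve. The factor is a rule of thumb, not a measurement.

ASAP7_OPENROAD_FACTOR = 4.0  # hypothetical name for the rule-of-thumb factor


def estimated_commercial_period_ps(asap7_period_ps: float,
                                   factor: float = ASAP7_OPENROAD_FACTOR) -> float:
    """Scale an ASAP7+OpenROAD clock period down by the rule-of-thumb factor."""
    return asap7_period_ps / factor


def frequency_ghz(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


if __name__ == "__main__":
    # A hypothetical 800 ps ASAP7+OpenROAD result maps to 200 ps, i.e. 5 GHz.
    period = estimated_commercial_period_ps(800.0)
    print(period, frequency_ghz(period))  # 200.0 5.0
```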
Agreed. At least I've summarized a bit here. I'm happy to hear that some people are interested in this. Perhaps you would like to write a blog post using the script above?
-
The mock-alu allows studying the speed and area of various typical ALU operations.
The mock-alu has two 64-bit inputs and one 64-bit output. The inputs and outputs are registered within the mock-alu, which gives clear boundary conditions when studying its guts.
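Because both the inputs and the outputs are registered, every timing path through the operation logic is register-to-register, and a result becomes visible one clock edge after its operands are captured. A minimal cycle-level sketch in Python, assuming an ADD as the operation and using a hypothetical class name (it is not the mock-alu source):

```python
# Cycle-level sketch of a block whose inputs and outputs are both registered.
# The operation (a 64-bit ADD) and the class name are hypothetical; only the
# register boundaries matter here.

MASK64 = (1 << 64) - 1


class RegisteredAlu:
    def __init__(self) -> None:
        self.a_reg = 0    # input register for operand a
        self.b_reg = 0    # input register for operand b
        self.out_reg = 0  # output register

    def clock(self, a: int, b: int) -> int:
        """Advance one clock edge and return the registered output."""
        # On the edge, the output register captures the combinational result
        # computed from the current input-register contents...
        next_out = (self.a_reg + self.b_reg) & MASK64
        # ...while the input registers capture the new operands.
        self.a_reg, self.b_reg = a & MASK64, b & MASK64
        self.out_reg = next_out
        return self.out_reg


if __name__ == "__main__":
    alu = RegisteredAlu()
    print(alu.clock(1, 2))  # 0: operands have only just reached the input registers
    print(alu.clock(0, 0))  # 3: the result of 1 + 2 appears one edge later
```

The minimum clock period of such a block is set by the single combinational stage between the two register banks, which is what makes per-operation comparisons clean.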
The mock-alu implements a number of operations and variants of these operations to study the area and minimum clock period for these operations.
The operations fall into a couple of categories:
Next, the mock-alu allows implementing any combination of these operations. For example, an ALU that only supports the shift operations can be labelled "SHR,SHL,SRA"; this shift-only mock-alu has a single shared barrel shifter. Similarly, a bitwise-logic-only mock-alu can be labelled "OR,AND,XOR".
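A single shared shifter can serve all three shift operations. The Python sketch below shows one common technique (an assumption here, not necessarily what the mock-alu does): a logarithmic right barrel shifter with a configurable fill bit, where SRA fills with the sign bit and SHL is obtained by reversing the bits before and after a logical right shift. The function names are hypothetical.

```python
# One way to share a single 64-bit right barrel shifter between SHR, SHL, SRA:
#   SHR: shift right, fill with zeros
#   SRA: shift right, fill with copies of the sign bit
#   SHL: reverse the bits, shift right with zero fill, reverse back

WIDTH = 64
MASK = (1 << WIDTH) - 1


def reverse_bits(x: int, width: int = WIDTH) -> int:
    """Return x with its lowest `width` bits in reverse order."""
    out = 0
    for i in range(width):
        if (x >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def barrel_shift_right(x: int, amount: int, fill: int) -> int:
    """Logarithmic right shifter: one mux stage per bit of `amount`."""
    for stage in range(WIDTH.bit_length() - 1):  # 6 stages for 64 bits
        step = 1 << stage
        if (amount >> stage) & 1:
            # Top `step` bits are set when filling with ones.
            fill_bits = (MASK << (WIDTH - step)) & MASK if fill else 0
            x = (x >> step) | fill_bits
    return x & MASK


def shared_shift(op: str, x: int, amount: int) -> int:
    """Implement SHR, SHL and SRA on top of the single right shifter."""
    x &= MASK
    amount &= WIDTH - 1  # only the low 6 bits of the shift amount are used
    if op == "SHR":
        return barrel_shift_right(x, amount, fill=0)
    if op == "SRA":
        return barrel_shift_right(x, amount, fill=(x >> (WIDTH - 1)) & 1)
    if op == "SHL":
        return reverse_bits(barrel_shift_right(reverse_bits(x), amount, fill=0))
    raise ValueError(f"unsupported op: {op}")


if __name__ == "__main__":
    print(hex(shared_shift("SHL", 0x1, 4)))      # 0x10
    print(hex(shared_shift("SRA", 1 << 63, 4)))  # 0xf800000000000000
```

In hardware the bit reversal is just wiring, so the extra cost of sharing is a pair of muxes per bit rather than a second shifter.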
At this point, we can plot various mock-alu implementations for ASAP7 and sky130hd:
Here, 8-, 16-, 32- and 64-bit-wide ADD operations are plotted:
Various multiplication algorithms for 64-bit multiplication with 4 pipeline stages:
The Han-Carlson multiply algorithm plotted for 8-, 16-, 32- and 64-bit widths:
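For readers unfamiliar with the name: Han-Carlson is a parallel-prefix adder topology, a hybrid of Kogge-Stone and Brent-Kung that runs Kogge-Stone on the odd bit positions only and fixes up the even positions with one extra level. In a multiplier it would normally appear in the final carry-propagate addition. The Python sketch below models such an adder at the bit level for a given width; it is an illustration under those assumptions, not the mock-alu's implementation, and the function names are hypothetical.

```python
# Bit-level sketch of a Han-Carlson parallel-prefix adder (illustrative only).


def prefix_op(hi, lo):
    """Combine (generate, propagate) of a higher group with a lower group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def han_carlson_add(a: int, b: int, width: int) -> tuple[int, int, int]:
    """Return (sum, carry_out, prefix_levels) for a width-bit addition."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(width)]
    p_bits = [p for _, p in gp]
    levels = 0

    # Level 1: each odd bit absorbs the even bit just below it.
    cur = list(gp)
    for i in range(1, width, 2):
        cur[i] = prefix_op(gp[i], gp[i - 1])
    levels += 1

    # Kogge-Stone on the odd bits only, with distances 2, 4, 8, ...
    dist = 2
    while dist < width:
        nxt = list(cur)
        for i in range(1, width, 2):
            if i - dist >= 0:
                nxt[i] = prefix_op(cur[i], cur[i - dist])
        cur = nxt
        levels += 1
        dist *= 2

    # Final level: each even bit (except bit 0) takes the odd bit below it.
    nxt = list(cur)
    for i in range(2, width, 2):
        nxt[i] = prefix_op(cur[i], cur[i - 1])
    cur = nxt
    levels += 1

    # cur[i][0] is now the group generate G[i:0], i.e. the carry into bit i+1.
    total = 0
    for i in range(width):
        carry_in = cur[i - 1][0] if i > 0 else 0
        total |= (p_bits[i] ^ carry_in) << i
    return total, cur[width - 1][0], levels


if __name__ == "__main__":
    # Prefix depth grows as log2(width) + 1: 4, 5, 6, 7 levels.
    for w in (8, 16, 32, 64):
        print(w, han_carlson_add(2**w - 1, 1, w))
```

The logarithmic growth in prefix levels is one reason adder delay rises only slowly with width.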
Thoughts?