[Question] How are the error responses treated #31

Open
khandelwaltanuj opened this issue Oct 17, 2024 · 16 comments · Fixed by #38
Labels: bug (Something isn't working), fixed (Waiting for validation)

Comments

@khandelwaltanuj

Hello,

I would like to understand how error responses are treated. This is what I am observing:

  1. In the case where memory responds to a READ with an error, all subsequent loads that hit the same address also receive an error response.
  2. In the case where memory responds to a READ with an error, the store that follows to the same address receives a response without an error.

Is this correct?

Thanks and Regards
Tanuj Khandelwal

@cfuguet
Contributor

cfuguet commented Oct 17, 2024

Hello @khandelwaltanuj,

The second case,

In the case where memory responds to a READ with an error, the store that follows to the same address receives a response without an error.

The behaviour of the cache depends on the write policy used for the store. Let me explain: when there is a read miss and the response has the error flag set, the cache does not refill the cacheline and the response is dropped. Then, if there is a subsequent store to the same cacheline, there will be a write miss.

The HPDcache implements a write-non-allocate policy for write-through stores and a write-allocate policy for write-back stores. In the first case, after a write miss, the cache writes the store data into the write buffer and responds right away to the core (with no error). In the second case, the cache first reads the cacheline from memory, then:

  • if the read response does not have the error flag set, the cache refills the cacheline, writes the store data locally, and responds to the core with no error.
  • if the read response has the error flag set, the cache does not refill the cacheline and responds to the core with an error.

In other words, write-through stores are acknowledged to the core asynchronously, so it is not possible to know at response time whether the store will result in an error. Write-back stores, on the other hand, are synchronous, so the cache can forward a possible error to the core.
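As an illustration only, the store-miss behaviour described above can be sketched as a small Python model. The names (`WritePolicy`, `StoreResponse`, `store_miss_response`) are hypothetical and do not come from the HPDcache RTL or its testbench; the model captures only the error flag seen by the core and whether the line gets allocated, not timing or the write buffer itself.

```python
from dataclasses import dataclass
from enum import Enum


class WritePolicy(Enum):
    WRITE_THROUGH = "wt"  # write-non-allocate
    WRITE_BACK = "wb"     # write-allocate


@dataclass(frozen=True)
class StoreResponse:
    error: bool     # error flag returned to the core
    refilled: bool  # True if the cacheline was allocated in the cache


def store_miss_response(policy: WritePolicy, refill_read_error: bool = False) -> StoreResponse:
    """Hypothetical model of the response to a store that misses in the cache.

    refill_read_error is the error flag of the cacheline read issued by a
    write-back miss. Write-through misses never issue that read, so the
    argument is ignored for them.
    """
    if policy is WritePolicy.WRITE_THROUGH:
        # Store data goes to the write buffer and the core is acknowledged
        # immediately, before memory could report any error.
        return StoreResponse(error=False, refilled=False)
    if refill_read_error:
        # No refill: the read error is forwarded to the core.
        return StoreResponse(error=True, refilled=False)
    # Refill the line, merge the store data locally, respond without error.
    return StoreResponse(error=False, refilled=True)


# A write-through store is acknowledged without error.
print(store_miss_response(WritePolicy.WRITE_THROUGH, refill_read_error=True))
# StoreResponse(error=False, refilled=False)

# A write-back store reports the error of its own cacheline read.
print(store_miss_response(WritePolicy.WRITE_BACK, refill_read_error=True))
# StoreResponse(error=True, refilled=False)
```

The write-through branch ignores `refill_read_error` on purpose: the acknowledgement leaves before any memory response exists, which is why such a store cannot carry an error.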

I hope it is clear.

Cheers,

César

@cfuguet cfuguet self-assigned this Oct 17, 2024
@khandelwaltanuj
Author

Hi César,

Thanks for the response.

I assume that, in the case where a load follows a load, if the first load gets an error, the cache will not make a new memory access for the second load. Will it just send another error?

Regards
Tanuj

@cfuguet
Contributor

cfuguet commented Oct 17, 2024

No, it will try the memory access again. There are two reasons for this:

  1. It would be too expensive to save the state of every accessed cacheline.
  2. Some errors are transient (e.g. errors detected on data transmission from the DRAM controller). A given read could return an error at one point in time but succeed on a subsequent attempt. In such cases, it would be wrong to tag the address as bad indefinitely. Moreover, some of these transient errors are not even related to a given address.
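A toy Python model of the behaviour described above; `ErrorStatelessCache` and its members are hypothetical names for illustration, not HPDcache code. Because an erroneous refill leaves no trace in the model, a second load to the same line simply issues a new memory read, which is what lets a transient error clear on retry.

```python
from typing import Callable, List, Set


class ErrorStatelessCache:
    """Toy model: a refill that returns an error is dropped and no per-address
    error state is kept, so the next load to that line misses again and
    re-issues a memory read."""

    def __init__(self, memory_read: Callable[[int], bool]) -> None:
        # memory_read(line_addr) returns the error flag of the read response.
        self._memory_read = memory_read
        self._valid_lines: Set[int] = set()
        self.mem_reads: List[int] = []  # log of issued memory reads

    def load(self, line_addr: int) -> bool:
        """Return the error flag seen by the core for a load to line_addr."""
        if line_addr in self._valid_lines:
            return False  # hit: served from the cache, no memory read
        self.mem_reads.append(line_addr)
        error = self._memory_read(line_addr)
        if not error:
            self._valid_lines.add(line_addr)  # refill only on success
        return error


# Transient error: the first read fails, the retry succeeds.
responses = iter([True, False])
cache = ErrorStatelessCache(lambda addr: next(responses))
print(cache.load(0x40), cache.load(0x40), cache.load(0x40))  # True False False
print([hex(a) for a in cache.mem_reads])  # ['0x40', '0x40']
```

The third load hits, so only two memory reads are issued; a model that cached the first error would have answered the second load with an error and never observed the recovery.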

@khandelwaltanuj
Author

Hello

Thanks for your response.

Regards
Tanuj

@khandelwaltanuj
Author

khandelwaltanuj commented Oct 23, 2024

Hello @cfuguet

Can you please look into the following scenario:

I have a LOAD (NEED_RSP = 0, TID=25), followed by a STORE (NEED_RSP=1, TID=26). Both are write through.
The memory replies to the load with an error, and we observe that the cache responds to the STORE (TID=26) with an error.
I believe it is not correct for the STORE to respond with an error. Can you please take a look at the following part of the log?
If you think there is an issue here, I can share the test.

UVM_INFO @ 358311500 ps [SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_LOAD SID=4(x), TID=25(x), ADDR=cc8eef1b801e(x) SET=0(d), TAG=3323bbc6e(x), WORD=3(x) DATA=2f975367c81873b607a455d88a559909(x) BE=8000(x) SIZE=1(x) NEED_RSP=0(x) PHYS_IDX=0(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT

UVM_INFO @ 358313000 ps [SB MEM REQ] ID=0(x), ADDR=cc8eef1b8000(x) SET=0(d), TAG=3323bbc6e(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)

UVM_INFO @ 358313500 ps [SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_STORE SID=4(x), TID=26(x), ADDR=cc8eef1b8018(x) SET=0(d), TAG=3323bbc6e(x), WORD=3(x) DATA=b05fd77fa3cea49fdb552465dfbda0c4(x) BE=d700(x) SIZE=3(x) NEED_RSP=1(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT

UVM_INFO @ 358321000 ps [SB MEM READ RSP] ID=0(x), SET=0(d), TAG=3323bbc6e(x), WORD=0(x) ERROR=1(x), LAST=1(x) DATA=9359450d0c53ec359aabc8853128668f92cc8abc2e460c833b516b320623d9aec73eeb02fd3d2c4db9c1dc886deab60bb8948865d41fd2658c37f79cfbbf1800(x)

UVM_INFO @ 358326500 ps [SB HPDCACHE RSP 4] RSP SID=4(x), TID=26(x), ADDR=cc8eef1b8018(x) SET=0(d), TAG=3323bbc6e(x), DATA=0(x) ERROR=1(x)

UVM_ERROR @ 358326500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE ERROR ERROR] SET=0(d), TAG=3323bbc6e(x), Expected : 0(b), RECIEVED : 1(b)

Thanks and Regards
Tanuj Khandelwal

@cfuguet
Contributor

cfuguet commented Oct 24, 2024

Hello @khandelwaltanuj,

Yes, thank you. I will take a look into it.

I think I know where the problem comes from. It is a side effect of the modifications made to respond with an error in the case of a write miss that receives an error response from memory.

When a read misses and, while it is waiting for its response, there is a write to the same cacheline, the write is put on hold in the Replay Table (RTAB). When the read error response arrives at the cache, the miss handler tags the write with an error, so when this write is replayed, the cache immediately responds to the core with an error.

I need to change the condition for tagging the pending write with an error: this should only be done when it is a write-back write miss. Otherwise, the write can be replayed normally.
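The change of condition described above can be sketched in Python. `PendingWrite`, `on_refill_error` and `replay` are hypothetical names; this is not the actual RTAB or miss-handler code, only the tagging rule as stated in this comment, applied to the TID=25/26 scenario from the log.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingWrite:
    tid: int
    write_back: bool
    error: bool = False  # tag set by the miss handler before replay


def on_refill_error(pending: List[PendingWrite], fixed: bool) -> None:
    """Tag writes held in the (modelled) replay table when the read of
    their cacheline returns an error.

    fixed=False tags every pending write (the behaviour reported here);
    fixed=True tags only write-back writes, as proposed above.
    """
    for write in pending:
        if not fixed or write.write_back:
            write.error = True


def replay(pending: List[PendingWrite]) -> List[Tuple[int, bool]]:
    """Responses (tid, error) produced when the pending writes are replayed."""
    return [(write.tid, write.error) for write in pending]


# Write-through STORE TID=26 on hold behind the read for LOAD TID=25.
before = [PendingWrite(tid=26, write_back=False)]
on_refill_error(before, fixed=False)
print(replay(before))  # [(26, True)]

after = [PendingWrite(tid=26, write_back=False)]
on_refill_error(after, fixed=True)
print(replay(after))  # [(26, False)]
```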

I will make the modification and let you know.

Thanks,

César

@cfuguet cfuguet added the bug Something isn't working label Oct 24, 2024
@khandelwaltanuj
Author

Thanks @cfuguet

Do you mean that, if the STORE request (TID=26) had been write-back, the cache would reply with ERROR=1 because the previous load on the same entry got ERROR=1?

Regards
Tanuj

@cfuguet
Contributor

cfuguet commented Oct 25, 2024

No. If the write TID=26 is write-back, it should trigger a read to memory because it will miss in the cache (the previous load TID=25 got an error, so the cacheline was not refilled). The cache will then respond to the write with an error only if that read returns an error. This is unrelated to the previous load TID=25.

@cfuguet
Contributor

cfuguet commented Oct 29, 2024

@khandelwaltanuj, this issue is now fixed. Let me know if you are able to validate it on your side.

Thanks

@cfuguet cfuguet added the fixed Waiting for validation label Oct 29, 2024
@khandelwaltanuj
Author

Hi César,

As this issue was opened by me, I would prefer to verify the fix and close it myself.

Thanks a lot
Regards
Tanuj

@cfuguet
Contributor

cfuguet commented Oct 29, 2024

Ok, that's fair.

I used GitHub's pull request mechanism to automatically close related issues...

But let's keep it open until you validate the fix on your side.

@khandelwaltanuj
Author

khandelwaltanuj commented Oct 30, 2024

Hello @cfuguet

I have the following scenario in one of my tests:
There are multiple stores with write_policy_auto, followed by a store with write_policy_wb. I have cfg_default_wb_i == 0 and the following parameters set:
wtEn : 1,
wbEn : 1

In the following scenario, I see a read with ID=27 (the first one) that gets an error response (ERROR=1). I am not able to understand which request is causing this read request: is it the write with the write-back policy or the write with the auto policy?

If it is the write-back write that is causing this read, then the UVM_ERROR is probably due to an issue in the scoreboard; otherwise, it may come from an issue in the design. Can you please take a look at it?

UVM_INFO @ 387331500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_LOAD SID=0(x), TID=66(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=f5493511c5c6d6df15a130f639126aa0(x) BE=8(x) SIZE=1(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_AUTO
UVM_INFO @ 387486000 ps:[SB MEM WRITE RSP(SOME OLD REQUEST)] ID=0(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), ATOMIC=0(x)
UVM_INFO @ 387574500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_STORE SID=0(x), TID=2c(x), ADDR=2bd96baed9d0(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=eeea323220bf50ef0732a61af82b429e(x) BE=40(x) SIZE=3(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_AUTO
UVM_INFO @ 387577000 ps:[SB MEM REQ] ID=3(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_WRITE ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)
UVM_INFO @ 387579000 ps:[SB MEM EXT REQ] ID=3(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) Data=3200000000000000000000000000000000000000000000(x) BE=400000(x) SIZE=x(d) LEN=x(d), CMD=HPDCACHE_MEM_WRITE ATOMIC=HPDCACHE_MEM_ATOMIC_SMAX CACHEABLE=1(x)
UVM_INFO @ 387600500 ps:[SB HPDCACHE REQ 0] OP=HPDCACHE_REQ_STORE SID=0(x), TID=62(x), ADDR=2bd96baed9d0(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=6ca7487bed7c02ebbff6f68a2c07df92(x) BE=b4(x) SIZE=3(x) NEED_RSP=0(x) PHYS_IDX=1(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WB
UVM_INFO @ 387624000 ps:[SB MEM WRITE RSP] ID=3(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), ATOMIC=0(x)
UVM_INFO @ 387629000 ps:[SB MEM REQ] ID=27(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)
UVM_INFO @ 387682000 ps:[SB MEM READ RSP] ID=27(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=1(x), LAST=1(x) DATA=509582ff84d2cb7e67bee1423e7843d2c9b7ebdf4af2406cf068ea7cac233d8c9c08dfddb2724fc069324eec5096056d0c699f768ea18e852a546a5f73b991e5(x)
UVM_INFO @ 387821500 ps:[SB HPDCACHE REQ 4] OP=HPDCACHE_REQ_LOAD SID=4(x), TID=c(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), WORD=2(x) DATA=95f064238655a1d13594099c21c698be(x) BE=4(x) SIZE=0(x) NEED_RSP=1(x) PHYS_IDX=0(x) UNCACHEABLE=0(x) WRITE POLICY=HPDCACHE_WR_POLICY_WT
UVM_INFO @ 387823000 ps:[SB MEM REQ] ID=27(x), ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x), WORD=0(x) SIZE=6(d) LEN=0(d), CMD=HPDCACHE_MEM_READ ATOMIC=HPDCACHE_MEM_ATOMIC_ADD CACHEABLE=1(x)
UVM_INFO @ 387855000 ps:[SB MEM READ RSP] ID=27(x), SET=103(d), TAG=af65aebb(x), WORD=0(x) ERROR=0(x), LAST=1(x) DATA=509582ff84d2cb7e67bee1423e7843d2c9b7ebdf4af2406cf068ea7cac233d8c9c08dfddb2724fc069324eec5096056d0c699f768ea18e852a546a5f73b991e5(x)
UVM_INFO @ 387857500 ps:[SB HPDCACHE RSP 4] RSP SID=4(x), TID=c(x), ADDR=2bd96baed9d2(x) SET=103(d), TAG=af65aebb(x), DATA=9c08dfddb2724fc069324eec5096056d(x) ERROR=0(x)
UVM_INFO @ 387857500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE LOAD/AMO RSP] OP=HPDCACHE_REQ_LOAD ADDR=2bd96baed9c0(x) SET=103(d), TAG=af65aebb(x) Offset=18(d) WORD=1(d) DATA=9c08dfddb2724fc0bf32f68a5007056d(x) ERROR=0(x) ERROR=0(x)
UVM_ERROR @ 387857500 ps: uvm_test_top.env.m_hpdcache_sb [SB HPDCACHE DATA ERROR] ADDR=2bd96baed9d2(x), SET=103(d), TAG=af65aebb(x) BYTE=2(d) ACC DATA=96(x) EXP DATA=7(x)

Regards
Tanuj Khandelwal

@khandelwaltanuj
Author

Hi @cfuguet

Any update on this one please ?

Thanks and Regards
Tanuj Khandelwal

@cfuguet
Contributor

cfuguet commented Nov 19, 2024

Hello @khandelwaltanuj,

I do not yet have access to QuestaSim on my side, so I cannot replay the test. Nevertheless, the STORE with TID=62 indicates the WB mode, so it can trigger a MEM_READ in case of a miss. This is probably what is happening here.
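For the scoreboard question, a hedged Python sketch of how a checker might attribute a cacheline MEM_READ to a store, following the point above that only a write-back store issues a read on a miss. `Policy`, `resolve_policy` and `read_candidates` are hypothetical helpers, and resolving AUTO by falling back to `cfg_default_wb_i` is an assumption to check against the HPDcache documentation, not something stated in this thread. It is consistent with the log, where the AUTO store TID=2c produced a MEM_WRITE rather than a MEM_READ.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Policy(Enum):
    AUTO = "auto"
    WT = "wt"
    WB = "wb"


def resolve_policy(hint: Policy, cfg_default_wb: bool) -> Policy:
    # ASSUMPTION: an AUTO hint on a miss falls back to the default policy
    # selected by cfg_default_wb_i. Verify against the HPDcache docs.
    if hint is Policy.AUTO:
        return Policy.WB if cfg_default_wb else Policy.WT
    return hint


def read_candidates(missing_stores: Iterable[Tuple[int, Policy]], cfg_default_wb: bool) -> List[int]:
    """TIDs of missing stores that may trigger a cacheline read (write-allocate)."""
    return [tid for tid, hint in missing_stores
            if resolve_policy(hint, cfg_default_wb) is Policy.WB]


# Stores to SET=103 from the log posted earlier, with cfg_default_wb_i == 0.
stores = [(0x2C, Policy.AUTO), (0x62, Policy.WB)]
print([hex(tid) for tid in read_candidates(stores, cfg_default_wb=False)])  # ['0x62']
```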

César

@khandelwaltanuj
Author

Hi @cfuguet

Do you have access to any commercial simulator, such as VCS or something else? I can try to switch to that simulator if you have access to it.

Regards
Tanuj Khandelwal

@cfuguet
Contributor

cfuguet commented Nov 19, 2024

Unfortunately, for the moment I'm only able to use Verilator... but I will soon have access to commercial tools again.

2 participants