<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h, branch v6.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-12-10T15:26:48Z</updated>
<entry>
<title>drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes</title>
<updated>2024-12-10T15:26:48Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2024-10-31T07:48:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a8d133e625ceb147a173b6cafc862a9bd4312894'/>
<id>urn:sha1:a8d133e625ceb147a173b6cafc862a9bd4312894</id>
<content type='text'>
All legacy RAS bad pages are generated in NPS1 mode, but new bad page
can be generated in any NPS mode, so we can't use retired_page stored
on eeprom directly in non-nps1 mode even for legacy data. We need to
take different actions for different data, new data can be identified
from old data by UMC_CHANNEL_IDX_V2 flag.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: save UMC global channel index to eeprom</title>
<updated>2024-12-10T15:26:46Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2024-10-29T11:46:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=71a0e9630027f77d7646c5b750593c9ecfaa27d3'/>
<id>urn:sha1:71a0e9630027f77d7646c5b750593c9ecfaa27d3</id>
<content type='text'>
Save the global channel index returned by RAS TA to eeprom.
We can get memory physical address by MCA address and channel index.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Prefer RAS recovery for scheduler hang</title>
<updated>2024-12-10T15:26:46Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-10-24T05:31:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e1ee2111ca48169a9fdc5075f7863f5d4d591e2f'/>
<id>urn:sha1:e1ee2111ca48169a9fdc5075f7863f5d4d591e2f</id>
<content type='text'>
Before scheduling a recovery due to scheduler/job hang, check if a RAS
error is detected. If so, choose RAS recovery to handle the situation. A
scheduler/job hang could be the side effect of a RAS error. In such
cases, it is required to go through the RAS error recovery process. A
RAS error recovery process in certains cases also could avoid a full
device device reset.

An error state is maintained in RAS context to detect the block
affected. Fatal Error state uses unused block id. Set the block id when
error is detected. If the interrupt handler detected a poison error,
it's not required to look for a fatal error. Skip fatal error checking
in such cases.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement virt req_ras_err_count</title>
<updated>2024-11-11T16:55:42Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T14:18:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=84a2947ecc85c67f433f2cc2186e54cdb9047b61'/>
<id>urn:sha1:84a2947ecc85c67f433f2cc2186e54cdb9047b61</id>
<content type='text'>
Enable RAS late init  if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report &amp; update statistics order for VF

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Acked-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add helper to initialize badpage info</title>
<updated>2024-09-26T21:06:38Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-08-30T05:51:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b17f87329d49860130a524ab424ecefd3332600f'/>
<id>urn:sha1:b17f87329d49860130a524ab424ecefd3332600f</id>
<content type='text'>
Add a separate function to read badpage data during initialization.
Reading bad pages will need hardware access and cannot be done during
reset. Hence in cases where device needs a full reset during
init itself, attempting to read will cause a deadlock.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove RAS unused paramter 'err_addr'</title>
<updated>2024-08-06T15:11:01Z</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-08-02T02:11:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=671af06690e7f79db51b475a35c3b2619f345abc'/>
<id>urn:sha1:671af06690e7f79db51b475a35c3b2619f345abc</id>
<content type='text'>
- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()

The parameter 'err_addr' is no longer used since following patch.

Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code")
Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: create function to check RAS RMA status</title>
<updated>2024-08-06T15:11:01Z</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2024-08-01T06:26:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=792be2e23ac69821db7860ba4ba94592101f0b07'/>
<id>urn:sha1:792be2e23ac69821db7860ba4ba94592101f0b07</id>
<content type='text'>
In the convenience of calling it globally.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add more types for boot time error reporting</title>
<updated>2024-08-06T15:10:17Z</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2024-08-01T05:45:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dfe9d047b162f3a79ab63046608c693ee14c5b7a'/>
<id>urn:sha1:dfe9d047b162f3a79ab63046608c693ee14c5b7a</id>
<content type='text'>
Data abort exception and unknown errors are supported.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove unused code</title>
<updated>2024-07-23T21:32:20Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2024-07-11T08:27:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a7e8467fbeee654e390aad1736291d273b407a2c'/>
<id>urn:sha1:a7e8467fbeee654e390aad1736291d273b407a2c</id>
<content type='text'>
Remove unused code.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: optimize logging deferred error info</title>
<updated>2024-07-23T21:32:14Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2024-07-11T08:14:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=56631dee2932dbc203f0abd1011aa9d3d621e206'/>
<id>urn:sha1:56631dee2932dbc203f0abd1011aa9d3d621e206</id>
<content type='text'>
1. Use pa_pfn as the radix-tree key index to log
   deferred error info.
2. Use local array to store a row of bad pages.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
