<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h, branch v5.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-07-13T15:48:10Z</updated>
<entry>
<title>drm/amdgpu: Return error if no RAS</title>
<updated>2021-07-13T15:48:10Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-07-02T22:35:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43a44c5322d1030d8f36ad679307c61f5b4e3716'/>
<id>urn:sha1:43a44c5322d1030d8f36ad679307c61f5b4e3716</id>
<content type='text'>
In amdgpu_ras_query_error_count() return an error
if the device doesn't support RAS. This prevents
that function from having to always set the values
of the integer pointers (if set), and thus
prevents function side effects--always to have to
set values of integers if integer pointers set,
regardless of whether RAS is supported or
not--with this change this side effect is
mitigated.

Also, if no pointers are set, don't count, since
we've no way of reporting the counts.

Also, give this function a kernel-doc.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reported-by: Tom Rix &lt;trix@redhat.com&gt;
Fixes: a46751fbcde505 ("drm/amdgpu: Fix RAS function interface")
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Use delayed work to collect RAS error counters</title>
<updated>2021-05-27T16:23:06Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-05-21T15:53:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=05adfd80cc52e0b4581e65bb5418de5dfd24d105'/>
<id>urn:sha1:05adfd80cc52e0b4581e65bb5418de5dfd24d105</id>
<content type='text'>
On Context Query2 IOCTL return the correctable and
uncorrectable errors in O(1) fashion, from cached
values, and schedule a delayed work function to
calculate and cache them for the next such IOCTL.

v2: Cancel pending delayed work at ras_fini().
v3: Remove conditionals when dealing with delayed
    work manipulation as they're inherently racy.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix RAS function interface</title>
<updated>2021-05-27T16:22:54Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-05-19T01:07:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a46751fbcde505e6aff8622e17995092c8d86ae4'/>
<id>urn:sha1:a46751fbcde505e6aff8622e17995092c8d86ae4</id>
<content type='text'>
The correctable and uncorrectable errors
are calculated at each invocation of this
function. Therefore, it is highly inefficient to
return just one of them based on a Boolean
input. If the caller wants both, twice the work
would be done. (And this work is O(n^3) on
Vega20.)

Fix this "interface" to simply return what it had
calculated--both values. Let the caller choose
what it wants to record, inspect, use.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Reviewed-by: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Conditionally reset RAS counters on boot</title>
<updated>2021-05-20T02:38:11Z</updated>
<author>
<name>John Clements</name>
<email>john.clements@amd.com</email>
</author>
<published>2021-05-17T08:36:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f6368a9c92645e72fdd55862aa09821cdb81b43'/>
<id>urn:sha1:8f6368a9c92645e72fdd55862aa09821cdb81b43</id>
<content type='text'>
Only clear RAS error counters if perestent EDC harvesting is not supported

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: John Clements &lt;john.clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Rename to ras_*_enabled</title>
<updated>2021-05-10T22:08:12Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-05-04T06:25:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ab0d6f030bab4d8c7310e9894f6fdb59817d241'/>
<id>urn:sha1:8ab0d6f030bab4d8c7310e9894f6fdb59817d241</id>
<content type='text'>
Rename,
  ras_hw_supported --&gt; ras_hw_enabled, and
  ras_features     --&gt; ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Move up ras_hw_supported</title>
<updated>2021-05-10T22:08:06Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-05-04T00:43:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e509965e58ab2c96418e1a5e756e25d9a54e0765'/>
<id>urn:sha1:e509965e58ab2c96418e1a5e756e25d9a54e0765</id>
<content type='text'>
Move ras_hw_supported into struct amdgpu_dev.
The dependency is:
struct amdgpu_ras &lt;== struct amdgpu_dev &lt;== ASIC,
read as "struct amdgpu_ras depends on struct
amdgpu_dev, which depends on the hardware."

This can be loosely understood as, "if RAS is
supported, which is property of the ASIC (struct
amdgpu_dev), then we can access struct
amdgpu_ras."

v2: Fix a typo: must binary AND in ternary cond
    in amdgpu_ras.c

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Remove redundant ras-&gt;supported</title>
<updated>2021-05-10T22:07:56Z</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-05-04T00:02:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=acdae2169bae6993c61ded36ae0b2301c07c0a9e'/>
<id>urn:sha1:acdae2169bae6993c61ded36ae0b2301c07c0a9e</id>
<content type='text'>
Remove redundant ras-&gt;supported, as this value
is also stored in adev-&gt;ras_features.

Use adev-&gt;ras_features, as that supercedes "ras",
since the latter is its member.

The dependency goes like this:
ras &lt;== adev-&gt;ras_features &lt;== hw_supported,
and is read as "ras depends on ras_features, which
depends on hw_supported." The arrows show the flow
of information, i.e. the dependency update.

"hw_supported" should also live in "adev".

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: John Clements &lt;john.clements@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix send ras disable cmd when asic not support ras</title>
<updated>2021-03-24T03:30:12Z</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2021-03-10T11:10:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=970fd19764349081d8fcb1ce816f7c75907b9d54'/>
<id>urn:sha1:970fd19764349081d8fcb1ce816f7c75907b9d54</id>
<content type='text'>
    cause:
	It is necessary to send ras disable command to ras-ta during gfx
	block ras later init, because the ras capability is disable read
	from vbios for vega20 gaming, but the ras context is released
	during ras init process, this will cause send ras disable command
	to ras-to failed.
    how:
	Delay releasing ras context, the ras context
	will be released after gfx block later init done.

Changed from V1:
    move release_ras_context into ras_resume

Changed from V2:
    check BIT(UMC) is more reasonable before access eeprom table

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: harvest edc status when connected to host via xGMI</title>
<updated>2021-03-24T03:00:41Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2021-02-04T05:32:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=761d86d37f86ebba77e59fa59ccef4dc2f38674f'/>
<id>urn:sha1:761d86d37f86ebba77e59fa59ccef4dc2f38674f</id>
<content type='text'>
When connected to a host via xGMI, system fatal errors may trigger
warm reset, driver has no change to query edc status before reset.
Therefore in this case, driver should harvest previous error loging
registers during boot, instead of only resetting them.

v2:
1. IP's ras_manager object is created when its ras feature is enabled,
so change to query edc status after amdgpu_ras_late_init called

2. change to enable watchdog timer after finishing gfx edc init

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reivewed-by: Hawking Zhang &lt;hawking.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: remove unnecessary reading for epprom header</title>
<updated>2021-02-26T22:23:49Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2021-02-26T01:17:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11003c68b158c07b95e4ba3630a86aff9c442ee7'/>
<id>urn:sha1:11003c68b158c07b95e4ba3630a86aff9c442ee7</id>
<content type='text'>
If the number of badpage records exceed the threshold, driver has
updated both epprom header and control-&gt;tbl_hdr.header before gpu reset,
therefore GPU recovery thread no need to read epprom header directly.

v2: merge amdgpu_ras_check_err_threshold into amdgpu_ras_eeprom_check_err_threshold

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
