<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h, branch v6.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-01-18T21:43:42Z</updated>
<entry>
<title>drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"</title>
<updated>2024-01-18T21:43:42Z</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-01-10T14:19:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb1c93c2e9604a884467a773790016199f78ca08'/>
<id>urn:sha1:fb1c93c2e9604a884467a773790016199f78ca08</id>
<content type='text'>
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both are a complete no-go for driver unload, so completely revert the
workaround for now.

This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdgpu: Create version number for coredumps</title>
<updated>2023-10-20T19:11:29Z</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2023-09-15T16:44:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de009982c6aa8363b2bc8800fb0a13896d264853'/>
<id>urn:sha1:de009982c6aa8363b2bc8800fb0a13896d264853</id>
<content type='text'>
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed at the top of the coredump file, to
be incremented whenever the file format or content changes.
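
As a rough illustration only (not the driver's code), a hypothetical reader
could check such a version line before parsing the rest of a dump. The
"version: " prefix, the set of supported versions and both function names
are assumptions made for this sketch:

```python
# Hypothetical sketch: a versioned text dump and a reader that checks it.
# The header layout and all names are assumptions, not amdgpu's actual format.
SUPPORTED_VERSIONS = {"1"}


def write_dump(version, body_lines):
    """Return dump text with the version line placed at the top."""
    return "\n".join(["version: " + version] + list(body_lines)) + "\n"


def read_dump(text):
    """Parse a dump, refusing versions this reader does not know."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("version: "):
        raise ValueError("missing version line")
    version = lines[0][len("version: "):]
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported dump version: " + version)
    return version, lines[1:]
```

A reader written this way fails loudly on a format it does not understand
instead of misparsing it.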

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Reviewed-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Move coredump code to amdgpu_reset file</title>
<updated>2023-10-20T19:11:29Z</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2023-09-15T14:44:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69619868d39bf364721db8d9d2429420704417a3'/>
<id>urn:sha1:69619868d39bf364721db8d9d2429420704417a3</id>
<content type='text'>
Given that we use coredump just for device resets, move its functions
and structs to a more fitting file, amdgpu_reset.{c,h}.

Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Signed-off-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Reviewed-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Keep reset handlers shared</title>
<updated>2023-08-30T18:57:54Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2023-08-05T09:57:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f8a499aed290667bd37011ad534c66320dc48257'/>
<id>urn:sha1:f8a499aed290667bd37011ad534c66320dc48257</id>
<content type='text'>
Instead of maintaining a list per device, keep the reset handlers common
per ASIC family. A pointer to the list of handlers is maintained in
reset control.
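
The idea can be modelled roughly as follows. This is a simplified sketch,
not the driver's data structures: ResetControl, FAMILY_HANDLERS and
mode2_reset are hypothetical names, and the handler list is a plain Python
list standing in for the per-family table:

```python
# Hypothetical model: one handler list per ASIC family, shared by reference.
def mode2_reset(device):
    """Stand-in for a family-specific reset handler."""
    return device + ": mode2 reset"


# Defined once for the whole family rather than once per device.
FAMILY_HANDLERS = [("mode2", mode2_reset)]


class ResetControl:
    def __init__(self, handlers):
        # Keep a reference to the shared list; do not copy it per device.
        self.handlers = handlers

    def find_handler(self, method):
        """Return the handler registered for `method`, or None."""
        for name, func in self.handlers:
            if name == method:
                return func
        return None


ctl_a = ResetControl(FAMILY_HANDLERS)
ctl_b = ResetControl(FAMILY_HANDLERS)
```

Both controls point at the same list object, so no per-device list has to
be built or torn down.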

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Le Ma &lt;le.ma@amd.com&gt;
Reviewed-by: Asad Kamal &lt;asad.kamal@amd.com&gt;
Tested-by: Asad Kamal &lt;asad.kamal@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: let mode2 reset fallback to default when failure"</title>
<updated>2022-10-19T02:08:33Z</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2022-10-13T03:06:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a340847b0214aa9b8fd9839f7b2822ccc607edab'/>
<id>urn:sha1:a340847b0214aa9b8fd9839f7b2822ccc607edab</id>
<content type='text'>
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957.

Revert the AMDGPU_SKIP_MODE2_RESET flag because it conflicts with the
original design of the reset handler. The fallback will be redesigned.

Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Skip put_reset_domain if it doesn't exist</title>
<updated>2022-09-29T13:43:52Z</updated>
<author>
<name>Vignesh Chander</name>
<email>Vignesh.Chander@amd.com</email>
</author>
<published>2022-09-28T18:59:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f61a825aa86115dbdcaba25bba78e007b5e8e1b1'/>
<id>urn:sha1:f61a825aa86115dbdcaba25bba78e007b5e8e1b1</id>
<content type='text'>
For XGMI SR-IOV, the reset is handled by the host driver and
hive-&gt;reset_domain is not initialized, so check whether it exists before
doing a put.

Signed-off-by: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Reviewed-by: Shaoyun Liu &lt;Shaoyun.Liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Adjust removal control flow for smu v13_0_2</title>
<updated>2022-09-19T19:17:20Z</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2022-09-07T08:07:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f5c7e7797060255dbc8160734ccc5ad6183c5e04'/>
<id>urn:sha1:f5c7e7797060255dbc8160734ccc5ad6183c5e04</id>
<content type='text'>
Adjust removal control flow for smu v13_0_2:
During amdgpu driver unload, when removing the first device, the
kernel must first send a mode1reset message to all GPU devices.
Otherwise, smu initialization will fail the next time amdgpu is
loaded.

V2:
1. Update commit comments.
2. Remove the global variable amdgpu_device_remove_cnt
   and add a variable to the structure amdgpu_hive_info.
3. Use hive to detect the first removed device instead of
   a global variable.

V3:
 1. Update commit comments.
 2. Split a patch into multiple patches.
 3. The current patch does:
    a. Add a work mode, AMDGPU_RESET_FOR_DEVICE_REMOVE, to the
       existing gpu recover path, which makes all devices in the
       hive list only do a HW reset with no resume (except for
       the base IP).
    b. Call amdgpu_device_gpu_recover in
       AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET
       mode from amdgpu_pci_remove when removing the first device
       in the hive list.
    c. When removing the first device, the key function call
       sequence of the IP blocks is as follows:
.suspend-&gt;mode1reset-&gt;.resume(basic ip)-&gt;.hw_fini-&gt;.early_fini-&gt;.sw_fini.
   ^                           |
   |-&lt;----------&lt;---------&lt;----|
       The first three steps come from the call to
       amdgpu_device_gpu_recover. They are executed in a loop
       until all devices in the hive list have been iterated.
       The steps starting from .hw_fini apply only to the first
       device. Since .suspend has already been called, .hw_fini
       does nothing for any IP block of the current device other
       than the basic IP blocks resumed in phase 1.
    d. When removing the other devices, the calling sequence is
       the same as the legacy one:
           .hw_fini -&gt; .early_fini -&gt; .sw_fini.
       Since .suspend was already called when removing the first
       device, .hw_fini does nothing for any IP block of the
       current device other than the basic IP blocks resumed in
       phase 1.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: let mode2 reset fallback to default when failure</title>
<updated>2022-08-16T22:14:31Z</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2022-07-28T02:39:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dac6b80818ac2353631c5a33d140d8d5508e2957'/>
<id>urn:sha1:dac6b80818ac2353631c5a33d140d8d5508e2957</id>
<content type='text'>
- introduce AMDGPU_SKIP_MODE2_RESET flag
- let mode2 reset fall back to the default reset method if it fails

v2: move this part out from the asic specific part

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Acked-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Avoid another list of reset devices</title>
<updated>2022-08-10T19:07:14Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2022-08-03T11:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0a83bb35d8a6ff3d18c2772afe616780c23293a6'/>
<id>urn:sha1:0a83bb35d8a6ff3d18c2772afe616780c23293a6</id>
<content type='text'>
A list of devices to be reset is already created in the
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not supported by list_head. Instead, pass
the device list as part of the reset context.
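
Why one node cannot sit on two lists can be seen in a small model. This is
a simplified Python sketch of an intrusive circular list, not kernel code;
Node, list_add_tail as written here, walk_returns_to_head and demo are all
illustrative names:

```python
# Simplified model of an intrusive circular list; not kernel code.
class Node:
    def __init__(self, name):
        self.name = name
        # An unlinked node (or an empty list head) points at itself.
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Link `new` just before `head`, overwriting new's own next/prev."""
    last = head.prev
    new.prev = last
    new.next = head
    last.next = new
    head.prev = new


def walk_returns_to_head(head, limit=16):
    """Follow next pointers; True if `head` is reached within `limit` steps."""
    node = head.next
    for _ in range(limit):
        if node is head:
            return True
        node = node.next
    return False


def demo():
    list_a, list_b = Node("list_a"), Node("list_b")
    gpu0, gpu1 = Node("gpu0"), Node("gpu1")
    list_add_tail(gpu0, list_a)
    list_add_tail(gpu1, list_a)
    before = walk_returns_to_head(list_a)
    # Adding gpu0 to a second list rewrites its only pair of link pointers.
    list_add_tail(gpu0, list_b)
    after = walk_returns_to_head(list_a)
    return before, after
```

In this model the first list can no longer be walked back to its head once
gpu0 has been added to the second list, because gpu0 has only one next
pointer and it now leads into the other list.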

Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2)
Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Cache result of last reset at reset domain level.</title>
<updated>2022-06-10T19:25:34Z</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2022-05-17T15:17:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ab9a0b1f3661157d144fb744f3a197563e8e0ff4'/>
<id>urn:sha1:ab9a0b1f3661157d144fb744f3a197563e8e0ff4</id>
<content type='text'>
The cached result will be read by executors of an async reset, such as debugfs.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
