<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h, branch v5.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-09-25T21:03:22Z</updated>
<entry>
<title>drm/amdgpu: Implement new guest side VF2PF message transaction (v2)</title>
<updated>2020-09-25T21:03:22Z</updated>
<author>
<name>Bokun Zhang</name>
<email>Bokun.Zhang@amd.com</email>
</author>
<published>2020-07-28T19:29:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=519b8b76f0b62a0be0d9fcee39819d2461fc3cb0'/>
<id>urn:sha1:519b8b76f0b62a0be0d9fcee39819d2461fc3cb0</id>
<content type='text'>
- Refactor the driver code to use amdgpu_virt_read_pf2vf_data
  and amdgpu_virt_write_vf2pf_data instead of writing all code in
  one function (which is the old amdgpu_virt_init_data_exchange)

- Adding a new transaction method for VF2PF message between host
  and guest driver. Guest side will periodically update VF2PF
  message in the framebuffer.

  In the new header, we include guest ucode information, guest
  framebuffer usage, and engine usage

- Clean up the old macros since they will cause compile error if
  the new transaction method is used

v2: squash in build fix

Signed-off-by: Bokun Zhang &lt;Bokun.Zhang@amd.com&gt;
Reviewed-by: Monk Liu &lt;monk.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update VF2PF interface</title>
<updated>2020-09-25T20:55:44Z</updated>
<author>
<name>Bokun Zhang</name>
<email>Bokun.Zhang@amd.com</email>
</author>
<published>2020-07-15T23:46:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1721bc1b2afaefbd880b4606f5147b4ff487904c'/>
<id>urn:sha1:1721bc1b2afaefbd880b4606f5147b4ff487904c</id>
<content type='text'>
- Update guest side VF2PF interface header file

Signed-off-by: Bokun Zhang &lt;Bokun.Zhang@amd.com&gt;
Reviewed-by: Monk Liu &lt;monk.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: refine codes to avoid reentering GPU recovery</title>
<updated>2020-08-24T16:22:56Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-08-19T09:23:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53b3f8f40e6cff36ae12b11e6d6b308af3c7e53f'/>
<id>urn:sha1:53b3f8f40e6cff36ae12b11e6d6b308af3c7e53f</id>
<content type='text'>
if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive-&gt;in_reset
and adev-&gt;in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: revert "fix system hang issue during GPU reset"</title>
<updated>2020-08-14T20:22:40Z</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2020-08-12T15:48:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f1403342ebdfcff3c3cf57ae476f19d3078f2767'/>
<id>urn:sha1:f1403342ebdfcff3c3cf57ae476f19d3078f2767</id>
<content type='text'>
The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix system hang issue during GPU reset</title>
<updated>2020-07-27T20:21:37Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-07-08T07:07:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df9c8d1aa278c435c30a69b8f2418b4a52fcb929'/>
<id>urn:sha1:df9c8d1aa278c435c30a69b8f2418b4a52fcb929</id>
<content type='text'>
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev-&gt;in_gpu_reset and hive-&gt;in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev-&gt;reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm-&gt;is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev-&gt;in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev-&gt;reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive-&gt;in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.

Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Suggested-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Suggested-by: Lijo Lazar &lt;Lijo.Lazar@amd.com&gt;
Suggested-by: Luben Tukov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: support reserve bad page for virt (v3)</title>
<updated>2020-07-01T05:59:17Z</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2020-05-14T08:44:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5278a159cf350142e91788f12659489e33a71a91'/>
<id>urn:sha1:5278a159cf350142e91788f12659489e33a71a91</id>
<content type='text'>
v1: rename some functions name, only init ras error handler data for
    supported asic.

v2: fix potential memory leak.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu_virt_get_vf_mode helper function</title>
<updated>2020-05-18T15:23:52Z</updated>
<author>
<name>Kevin Wang</name>
<email>kevin1.wang@amd.com</email>
</author>
<published>2020-04-29T10:49:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a7f28103374787ae43b936cd2ec2f8388958668e'/>
<id>urn:sha1:a7f28103374787ae43b936cd2ec2f8388958668e</id>
<content type='text'>
the swsmu or powerplay(hwmgr) need to handle task according to different VF mode,
this function to help query vf mode.

vf mode:
1. SRIOV_VF_MODE_BARE_METAL: the driver work on host  OS (PF)
2. SRIOV_VF_MODE_ONE_VF    : the driver work on guest OS with one VF
3. SRIOV_VF_MODE_MULTI_VF  : the driver work on guest OS with multi VF

Signed-off-by: Kevin Wang &lt;kevin1.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: resume kiq access debugfs</title>
<updated>2020-04-13T16:01:56Z</updated>
<author>
<name>Yintian Tao</name>
<email>yttao@amd.com</email>
</author>
<published>2020-04-13T06:31:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d32709dac627e5602fb3e66bcf4316906356120d'/>
<id>urn:sha1:d32709dac627e5602fb3e66bcf4316906356120d</id>
<content type='text'>
If there is no GPU hang, user still can access
debugfs through kiq.

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Reviewed-by: Monk Liu &lt;Monk.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: restrict debugfs register access under SR-IOV</title>
<updated>2020-04-13T16:01:04Z</updated>
<author>
<name>Yintian Tao</name>
<email>yttao@amd.com</email>
</author>
<published>2020-04-07T10:08:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=95a2f917387a23c8b09f4ab95e1560a69db5b1f1'/>
<id>urn:sha1:95a2f917387a23c8b09f4ab95e1560a69db5b1f1</id>
<content type='text'>
Under bare metal, there is no more else to take
care of the GPU register access through MMIO.
Under Virtualization, to access GPU register is
implemented through KIQ during run-time due to
world-switch.

Therefore, under SR-IOV user can only access
debugfs to r/w GPU registers when meets all
three conditions below.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: introduce new request and its function</title>
<updated>2020-04-01T18:44:43Z</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2020-03-04T03:38:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa53bc2edb66624ac05902910c41d8b4f685b8bc'/>
<id>urn:sha1:aa53bc2edb66624ac05902910c41d8b4f685b8bc</id>
<content type='text'>
1) modify xgpu_nv_send_access_requests to support
new idh request

2) introduce new function: req_gpu_init_data() which
is used to notify host to prepare vbios/ip-discovery/pfvf exchange

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
