前置条件
- HP ProLiant MicroServer Gen8
- Intel® Xeon® E3-1265L V2 × 8
- Ubuntu 24.04.4 LTS / Linux Kernel 6.8.0-40-generic
问题详情
系统稳定性差,长时间运行之后出现不稳定现象。执行 sudo dmesg 函数之后,可以观察到如下报错信息:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
[ 97.194200] DMAR: ERROR: DMA PTE for vPFN 0xf1f7f already set (to f1f7f003 not 17039b003) [ 97.194215] ------------[ cut here ]------------ [ 97.194217] WARNING: CPU: 6 PID: 3630 at drivers/iommu/intel/iommu.c:2210 __domain_mapping+0x2c5/0x320 [ 97.194222] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore qrtr overlay zram binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel gpio_ich sha256_ssse3 sha1_ssse3 aesni_intel input_leds crypto_simd joydev cryptd rapl intel_cstate acpi_power_meter serio_raw mgag200 lpc_ich hpilo acpi_ipmi ipmi_si ipmi_devintf i2c_algo_bit ipmi_msghandler ie31200_edac mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 hid_generic usbhid hid raid1 crc32_pclmul psmouse tg3 xhci_pci xhci_pci_renesas pata_acpi uas usb_storage [ 97.194309] CPU: 6 PID: 3630 Comm: upowerd Tainted: G W I 6.8.0-40-generic #40-Ubuntu [ 97.194313] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019 [ 97.194315] RIP: 0010:__domain_mapping+0x2c5/0x320 [ 97.194319] Code: b0 48 c7 c7 c0 ec cd 95 e8 58 5c 64 ff 8b 05 26 cb bb 01 4c 8b 4d b0 41 bb 00 00 00 00 85 c0 74 09 83 e8 01 89 05 0f cb bb 01 <0f> 0b e9 f9 fe ff ff 41 80 e5 7f e9 da fe ff ff ba 01 00 00 00 e9 [ 97.194322] RSP: 0018:ffffa0d18b1b6dc0 EFLAGS: 00010046 [ 97.194325] RAX: 0000000000000000 RBX: ffff9498c473ebf8 RCX: 0000000000000000 [ 97.194328] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 97.194330] RBP: ffffa0d18b1b6e20 R08: 0000000000000000 R09: ffff9498c473ebf8 [ 97.194332] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 97.194334] R13: 000000017039b003 R14: ffff9498c1949100 R15: 0000000000000001 [ 97.194336] FS: 0000000000000000(0000) GS:ffff949bb9d00000(0000) knlGS:0000000000000000 [ 97.194339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 97.194342] CR2: 000077c061890008 CR3: 000000016333e006 CR4: 00000000001706f0 [ 97.194344] Call Trace: [ 97.194346] <TASK> [ 97.194348] ? show_regs+0x6d/0x80 [ 97.194351] ? __warn+0x89/0x160 [ 97.194356] ? __domain_mapping+0x2c5/0x320 [ 97.194360] ? report_bug+0x17e/0x1b0 [ 97.194364] ? handle_bug+0x51/0xa0 [ 97.194368] ? exc_invalid_op+0x18/0x80 [ 97.194372] ? asm_exc_invalid_op+0x1b/0x20 [ 97.194378] ? __domain_mapping+0x2c5/0x320 [ 97.194383] intel_iommu_map_pages+0xe1/0x140 [ 97.194388] __iommu_map+0x121/0x280 [ 97.194392] iommu_map_sg+0xbf/0x1f0 [ 97.194397] iommu_dma_map_sg+0x463/0x4f0 [ 97.194403] ? __pfx_ata_scsi_rw_xlat+0x10/0x10 [ 97.194408] __dma_map_sg_attrs+0x35/0xd0 [ 97.194411] dma_map_sg_attrs+0xe/0x30 [ 97.194415] ata_qc_issue+0xfc/0x2d0 [ 97.194419] ? __pfx_ata_scsi_rw_xlat+0x10/0x10 [ 97.194423] ? __pfx_ata_scsi_rw_xlat+0x10/0x10 [ 97.194427] __ata_scsi_queuecmd+0xf2/0x3a0 [ 97.194430] ata_scsi_queuecmd+0x44/0x80 [ 97.194434] scsi_dispatch_cmd+0x91/0x240 [ 97.194437] scsi_queue_rq+0x2c4/0x670 [ 97.194441] blk_mq_dispatch_rq_list+0x137/0x520 [ 97.194445] ? sbitmap_get+0x73/0x180 [ 97.194451] __blk_mq_do_dispatch_sched+0xbb/0x300 [ 97.194455] ? finish_task_switch.isra.0+0x93/0x300 [ 97.194460] __blk_mq_sched_dispatch_requests+0x151/0x190 [ 97.194464] blk_mq_sched_dispatch_requests+0x2c/0x70 [ 97.194467] blk_mq_run_hw_queue+0x1bf/0x210 [ 97.194472] blk_mq_get_tag+0x1ef/0x2f0 [ 97.194476] ? __pfx_autoremove_wake_function+0x10/0x10 [ 97.194481] __blk_mq_alloc_requests+0xd6/0x290 [ 97.194485] blk_mq_submit_bio+0x190/0x6b0 [ 97.194489] __submit_bio+0xb3/0x1c0 [ 97.194492] submit_bio_noacct_nocheck+0x13c/0x1f0 [ 97.194495] submit_bio_noacct+0x162/0x5b0 [ 97.194499] submit_bio+0xb2/0x110 [ 97.194502] ext4_mpage_readpages+0x37e/0xaa0 [ 97.194505] ? __mod_memcg_lruvec_state+0xd6/0x1a0 [ 97.194512] ext4_readahead+0x3f/0x50 [ 97.194516] read_pages+0x95/0x290 [ 97.194522] page_cache_ra_unbounded+0x167/0x1c0 [ 97.194528] page_cache_ra_order+0x2a9/0x350 [ 97.194531] ? xas_load+0xf/0x60 [ 97.194535] ondemand_readahead+0x21c/0x4d0 [ 97.194539] page_cache_sync_ra+0x8a/0xa0 [ 97.194541] filemap_get_pages+0x109/0x3b0 [ 97.194547] filemap_read+0xf7/0x470 [ 97.194551] ? __ext4_ext_check+0x1ff/0x500 [ 97.194558] generic_file_read_iter+0xbb/0x110 [ 97.194562] ext4_file_read_iter+0x63/0x210 [ 97.194566] vfs_read+0x258/0x390 [ 97.194570] ksys_read+0x73/0x100 [ 97.194574] __x64_sys_read+0x19/0x30 [ 97.194577] x64_sys_call+0x1ada/0x25c0 [ 97.194580] do_syscall_64+0x7f/0x180 [ 97.194583] ? terminate_walk+0xf0/0x100 [ 97.194587] ? path_openat+0x140/0x2d0 [ 97.194591] ? do_filp_open+0xaf/0x170 [ 97.194598] ? putname+0x5b/0x80 [ 97.194602] ? do_sys_openat2+0x9f/0xe0 [ 97.194607] ? __x64_sys_openat+0x55/0xa0 [ 97.194610] ? syscall_exit_to_user_mode+0x89/0x260 [ 97.194615] ? do_syscall_64+0x8c/0x180 [ 97.194619] ? handle_pte_fault+0x114/0x1d0 [ 97.194623] ? __handle_mm_fault+0x653/0x790 [ 97.194627] ? __count_memcg_events+0x6b/0x120 [ 97.194631] ? count_memcg_events.constprop.0+0x2a/0x50 [ 97.194635] ? restore_fpregs_from_fpstate+0x47/0xf0 [ 97.194640] ? switch_fpu_return+0x55/0xf0 [ 97.194645] ? irqentry_exit_to_user_mode+0x7e/0x260 [ 97.194649] ? irqentry_exit+0x43/0x50 [ 97.194653] ? exc_page_fault+0x94/0x1b0 [ 97.194657] entry_SYSCALL_64_after_hwframe+0x78/0x80 [ 97.194661] RIP: 0033:0x7e2a20834be8 [ 97.194671] Code: 48 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 f7 d8 89 05 c8 36 01 00 48 c7 c0 ff ff ff ff c3 f3 0f 1e fa 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 08 c3 0f 1f 80 00 00 00 00 f7 d8 89 05 a0 36 [ 97.194674] RSP: 002b:00007fff7a1c5b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 97.194678] RAX: ffffffffffffffda RBX: 00007e2a2044b530 RCX: 00007e2a20834be8 [ 97.194680] RDX: 0000000000000340 RSI: 00007fff7a1c5bf8 RDI: 0000000000000003 [ 97.194682] RBP: 00007fff7a1c5b70 R08: 00007fff7a1c5bd7 R09: 0000000000000000 [ 97.194685] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [ 97.194687] R13: 00007fff7a1c5bf8 R14: 0000000000000340 R15: 00007fff7a1c5bf0 [ 97.194692] </TASK> [ 97.194693] ---[ end trace 0000000000000000 ]--- |
问题原因
Linux 内核的 intel_iommu 与 HP ProLiant MicroServer Gen8 的 BIOS 冲突导致的。
解决方案
修改引导的内核参数:
|
1 |
$ sudo vim /etc/default/grub |
修改如下配置:
|
1 |
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=off" |
配置修改后,使用如下命令生效内核配置:
|
1 |
$ sudo update-grub |
重启系统
|
1 |
$ sudo reboot |
参考链接
- 如何修复DMAR故障?
- Updated to 8.2 - DMA error
- DMAR: ERROR: DMA PTE for vPFN 0x... Errors
- Microserver Gen8 with P222 + debian testing. (5.15 kernel)
- Re: [PATCH] iommu/dma: Reserve iova ranges for reserved regions of all devices
- Advisory: (Revision) Red Hat Enterprise Linux 5.9 - Linux Will Panic During Boot When "intel_iommu=on" Is Set as a Kernel Parameter if the HP ProLiant Server Is Configured With an Intel Xeon E5-14xx v2 Processor