
kvm-vt-d


 

How to assign devices with VT-d in KVM

VT-d support

  • In order to assign devices in KVM, you'll need a system which supports VT-d. This has nothing to do with the VT-x support of your CPU; VT-d needs to be supported both by the chipset on your motherboard and by your CPU.
  • If you are in doubt whether your motherboard or CPU supports VT-d or not, the Xen VT-d wiki page has some pointers to VT-d-enabled chipsets, motherboards, and CPUs:
       http://wiki.xensource.com/xenwiki/VTdHowTo
  • If your hardware doesn't have an IOMMU ("Intel VT-d" support in the case of Intel - "AMD I/O Virtualization Technology" support in the case of AMD), you'll not be able to assign devices in KVM. Some work towards allowing this was done, but the code never made
    it into KVM due to various issues. At the moment it doesn't seem likely that device assignment without hardware support will ever be integrated into KVM.
  • Assignment of graphics cards is not officially supported at the moment, but there has been success passing through a secondary Radeon HD 5850 as a VM's secondary display. It also seems that one person is currently working on writing patches for primary
    display support in his spare time (February 2010):
       http://www.spinics.net/lists/kvm/msg25977.html

Assigning device to guest

1. Modifying kernel config:

  • make menuconfig
  • set "Bus options (PCI etc.)" -> "Support for DMA Remapping Devices" to "*"
  • set "Bus options (PCI etc.)" -> "Enable DMA Remapping Devices" to "*"
  • set "Bus options (PCI etc.)" -> "PCI Stub driver" to "*"
  • optional setting:
       set "Bus options (PCI etc.)" -> "Support for Interrupt Remapping" to "*"
  • exit/save
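
If you are building on top of an existing kernel, a quick way to check whether these options are already enabled is to grep the running kernel's config. The symbol names below (CONFIG_DMAR, CONFIG_DMAR_DEFAULT_ON, CONFIG_INTR_REMAP, CONFIG_PCI_STUB) are assumed to be the 2.6.3x-era names behind the menu entries above; newer kernels have renamed some of them (for example to CONFIG_INTEL_IOMMU):

       # Check the config of the currently running kernel, if /boot/config-* exists
       grep -E 'CONFIG_DMAR|CONFIG_INTR_REMAP|CONFIG_PCI_STUB' /boot/config-$(uname -r)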

2. build kernel:

  • make
  • make modules_install
  • make install

3. reboot and verify that your system has IOMMU support

  • AMD Machine

    • dmesg | grep AMD-Vi
     ...
     AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
     AMD-Vi: Lazy IO/TLB flushing enabled
     AMD-Vi: Initialized for Passthrough Mode
     ...
  • Intel Machine

    • dmesg | grep -e DMAR -e IOMMU
     ...
     DMAR:DRHD base: 0x000000feb03000 flags: 0x0
     IOMMU feb03000: ver 1:0 cap c9008020e30260 ecap 1000
     ...
  • If you get no output, you'll need to fix this before moving on. Check whether your hardware supports VT-d and whether it has been enabled in the BIOS.

NOTE: If you still get a "No IOMMU found." error, check dmesg for messages suggesting that your BIOS is broken. Another possible reason is that CONFIG_DMAR_DEFAULT_ON is not set; in that case, pass "intel_iommu=on" as a kernel parameter to enable it. AMD
uses different kernel parameters than Intel: on AMD you need to pass "iommu=pt iommu=1".
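
After rebooting with the parameter added, you can confirm that it actually reached the kernel. A minimal check:

       # The running kernel's command line should contain the parameter you added,
       # e.g. intel_iommu=on on Intel or "iommu=pt iommu=1" on AMD
       cat /proc/cmdline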

4. unbind device from host kernel driver (example PCI device 01:00.0)

  • Load the PCI Stub Driver if it is compiled as a module
       modprobe pci_stub
  • lspci -n
  • locate the entry for device 01:00.0 and note down the vendor & device ID 8086:10b9
       ...
       01:00.0 0200: 8086:10b9 (rev 06)
       ...
  • echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
  • echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
  • echo 0000:01:00.0 > /sys/bus/pci/drivers/pci-stub/bind
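
The three echo commands above can be wrapped into a small helper script. The sketch below is one way to do it; it assumes pci-stub is already available (built in or loaded via modprobe) and reads the vendor/device IDs from sysfs instead of lspci. Adjust the default BDF to your device:

       #!/bin/sh
       # Bind one PCI device (full BDF as $1, e.g. 0000:01:00.0) to pci-stub
       BDF=${1:-0000:01:00.0}
       VENDOR=$(cat /sys/bus/pci/devices/$BDF/vendor)    # e.g. 0x8086
       DEVICE=$(cat /sys/bus/pci/devices/$BDF/device)    # e.g. 0x10b9
       echo "${VENDOR#0x} ${DEVICE#0x}" > /sys/bus/pci/drivers/pci-stub/new_id
       # Detach the device from its current host driver, if it has one
       if [ -e /sys/bus/pci/devices/$BDF/driver ]; then
           echo $BDF > /sys/bus/pci/devices/$BDF/driver/unbind
       fi
       # Attach the device to pci-stub so it can be assigned to a guest
       echo $BDF > /sys/bus/pci/drivers/pci-stub/bind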

5. load KVM modules:

  • modprobe kvm
  • modprobe kvm-intel (on an AMD system, modprobe kvm-amd instead)

6. assign device:

  • /usr/local/bin/qemu-system-x86_64 -m 512 -boot c -net none -hda /root/ia32e_rhel5u1.img -device pci-assign,host=01:00.0
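
Once the guest has booted, the assigned device should appear inside it as an ordinary PCI device. A quick check from within the guest, using the 8086:10b9 ID noted in step 4 as an example:

       # Run inside the guest
       lspci -nn | grep -i 8086:10b9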

VT-d device hotplug

KVM also supports hotplugging VT-d devices into a guest. In the QEMU monitor (press Ctrl+Alt+2 in the guest window to enter it), you can use the following commands to hot add/remove devices to/from the guest:

  • hot add:
       device_add pci-assign,host=01:00.0,id=mydevice
  • hot remove:
       device_del mydevice
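
If you prefer an explicit monitor over the Ctrl+Alt+2 console, one option (a sketch; any way of reaching the monitor works) is to start QEMU with -monitor stdio and issue the same commands at the (qemu) prompt:

       qemu-system-x86_64 ... -monitor stdio
       (qemu) device_add pci-assign,host=01:00.0,id=mydevice
       (qemu) device_del mydevice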

Notes

  • The VT-d spec specifies that all conventional PCI devices behind a PCIe-to-PCI/PCI-X bridge or a conventional PCI bridge can only be assigned collectively to the same guest. PCIe devices do not have this restriction.
  • If the device doesn't support MSI and it shares an IRQ with other devices, then it cannot be assigned, because host IRQ sharing for assigned devices is not supported. You will get a warning message when you try to assign it. Note that this also applies to devices
    which only support MSI-X.
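
You can check ahead of time whether a device advertises MSI and whether its legacy IRQ is shared with other devices. A minimal check on the host, using the 01:00.0 example device:

       # Look for an MSI (or MSI-X) capability on the device
       lspci -v -s 01:00.0 | grep -i msi
       # See which devices currently share each interrupt line
       cat /proc/interrupts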

 

 

Platform virtualization is about sharing a platform among two or more operating systems for more efficient use of resources. But platform implies more than just a processor: it also includes the other important elements that make up a platform, including
storage, networking, and other hardware resources. Some hardware resources can easily be virtualized, such as the processor or storage, but other hardware resources cannot, such as a video adapter or a serial port. Peripheral Component Interconnect (PCI) passthrough
provides the means to use those resources efficiently when sharing is not possible or useful. This article explores the concept of passthrough, discusses its implementation in hypervisors, and details the hypervisors that support this recent innovation.

Platform device emulation

Before we jump into passthrough, let's explore how device emulation works today in two hypervisor architectures. The first architecture incorporates device emulation within the hypervisor, while the second pushes device emulation to a hypervisor-external
application.

Device emulation within the hypervisor is a common method implemented within the VMware workstation product (an operating system-based hypervisor). In this model, the hypervisor includes emulations of common devices that the various guest operating
systems can share, including virtual disks, virtual network adapters, and other necessary platform elements. This particular model is shown in Figure 1.

Figure 1. Hypervisor-based device emulation

The second architecture is called user space device emulation (see Figure 2). As the name implies, rather than the device emulation being embedded within the hypervisor, it is instead implemented in user space. QEMU (which provides not only device
emulation but a hypervisor as well) supplies this emulation and is used by a large number of independent hypervisors (Kernel-based Virtual Machine [KVM] and VirtualBox being just two). This model is advantageous because the device emulation is independent
of the hypervisor and can therefore be shared between hypervisors. It also permits arbitrary device emulation without having to burden the hypervisor (which operates in a privileged state) with this functionality.

Figure 2. User space device emulation

Pushing the device emulation from the hypervisor to user space has some distinct advantages. The most important advantage relates to what's called the trusted computing base (TCB). The TCB of a system is the set of all components that are critical
to its security. It stands to reason, then, that if the system is minimized, there exists a smaller probability of bugs and, therefore, a more secure system. The same idea applies to the hypervisor. The security of the hypervisor is crucial, as it isolates
multiple independent guest operating systems. The less code there is in the hypervisor (by pushing the device emulation into the less privileged user space), the less chance there is of leaking privileges to untrusted users.

Another variation on hypervisor-based device emulation is paravirtualized drivers. In this model, the hypervisor includes the physical drivers, and each guest operating system includes a hypervisor-aware driver that works in concert with the hypervisor drivers
(called paravirtualized, or PV, drivers).

Regardless of whether the device emulation occurs in the hypervisor or on top in a guest virtual machine (VM), the emulation methods are similar. Device emulation can mimic a specific device (such as a Novell NE1000 network adapter) or a specific type of
disk (such as an Integrated Device Electronics [IDE] drive). The physical hardware can differ greatly—for example, while an IDE drive is emulated to the guest operating systems, the physical hardware platform can use a serial ATA (SATA) drive. This is useful,
because IDE support is common among many operating systems and can be used as a common denominator instead of all guest operating systems supporting more advanced drive types.


Device passthrough

As you can see in the two device emulation models discussed above, there's a price to pay for sharing devices. Whether device emulation is performed in the hypervisor or in user space within an independent VM, overhead exists. This overhead is worthwhile
as long as the devices need to be shared by multiple guest operating systems. If sharing is not necessary, then there are more efficient ways to provide those devices to guests.

So, at the highest level, device passthrough is about providing an isolation of devices to a given guest operating system so that the device can be used exclusively by that guest (see Figure 3). But why is this useful? Not surprisingly, there are a number
of reasons why device passthrough is worthwhile. Two of the most important reasons are performance and providing exclusive use of a device that is not inherently shareable.

Figure 3. Passthrough within the hypervisor

For performance, near-native performance can be achieved using device passthrough. This is perfect for networking applications (or those that have high disk I/O) that have not adopted virtualization because of contention and performance degradation through
the hypervisor (to a driver in the hypervisor or through the hypervisor to a user space emulation). But assigning devices to specific guests is also useful when those devices cannot be shared. For example, if a system included multiple video adapters, those
adapters could be passed through to unique guest domains.

Finally, there may be specialized PCI devices that only one guest domain uses or devices that the hypervisor does not support and therefore should be passed through to the guest. Individual USB ports could be isolated to a given domain, or a serial port
(which is itself not shareable) could be isolated to a particular guest.


Underneath the covers of device emulation

Early forms of device emulation implemented shadow forms of device interfaces in the hypervisor to provide the guest operating system with a virtual interface to the hardware. This virtual interface would consist of the expected interface, including a virtual
address space representing the device (such as shadow PCI) and virtual interrupt. But with a device driver talking to a virtual interface and a hypervisor translating this communication to actual hardware, there's a considerable amount of overhead—particularly
in high-bandwidth devices like network adapters.

Xen popularized the PV approach (discussed in the previous section), which reduced the degradation of performance by making the guest operating system driver aware that it was being virtualized. In this case, the guest operating system would not see a PCI
space for a device (such as a network adapter) but instead a network adapter application programming interface (API) that provided a higher-level abstraction (such as a packet interface). The downside to this approach was that the guest operating system had
to be modified for PV. The upside was that you could achieve near-native performance in some cases.

Early attempts at device passthrough used a thin emulation model, in which the hypervisor provided software-based memory management (translating guest operating system address space to trusted host address space). And while early attempts provided the means
to isolate a device to a particular guest operating system, the approach lacked the performance and scalability required for large virtualization environments. Luckily, processor vendors have equipped next-generation processors with instructions to support
hypervisors as well as logic for device passthrough, including interrupt virtualization and direct memory access (DMA) support. So, instead of catching and emulating access to physical devices below the hypervisor, new processors provide DMA address translation
and permissions checking for efficient device passthrough.

Hardware support for device passthrough

Both Intel and AMD provide support for device passthrough in their newer processor architectures (in addition to new instructions that assist the hypervisor). Intel calls its option Virtualization Technology for Directed I/O (VT-d), while AMD refers
to its equivalent as the I/O Memory Management Unit (IOMMU). In each case, the new CPUs provide the means to map PCI physical addresses to guest virtual addresses. When this mapping occurs, the hardware takes care of access (and protection), and the guest operating system
can use the device as if it were a non-virtualized system. In addition to mapping guest to physical memory, isolation is provided such that other guests (or the hypervisor) are precluded from accessing it. The Intel and AMD CPUs provide much more virtualization
functionality. You can learn more in the Resources section.

Another innovation that helps interrupts scale to large numbers of VMs is called Message Signaled Interrupts (MSI). Rather than relying on physical interrupt pins to be associated with a guest, MSI transforms interrupts into messages that are more
easily virtualized (scaling to thousands of individual interrupts). MSI has been available since PCI version 2.2 but is also available in PCI Express (PCIe), where it allows fabrics to scale to many devices. MSI is ideal for I/O virtualization, as it allows
isolation of interrupt sources (as opposed to physical pins that must be multiplexed or routed through software).


Hypervisor support for device passthrough

Using the latest virtualization-enhanced processor architectures, a number of hypervisors and virtualization solutions support device passthrough. You'll find support for device passthrough (using VT-d or IOMMU) in Xen and KVM as well as other hypervisors.
In most cases, the guest operating system (domain 0) must be compiled to support passthrough, which is available as a kernel build-time option. Hiding the devices from the host VM may also be required (as is done with Xen using pciback). Some restrictions
apply in PCI (for example, PCI devices behind a PCIe-to-PCI bridge must be assigned to the same domain), but PCIe does not have this restriction.

Additionally, you'll find configuration support for device passthrough in libvirt (along with virsh), which provides an abstraction to the configuration schemes used by the underlying hypervisors.
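
As one illustration of the libvirt route (a sketch only; the device name shown is the example device 0000:01:00.0 used earlier, and command availability depends on your libvirt version), virsh can detach a PCI device from its host driver before it is assigned to a guest, and reattach it afterwards:

       # List host PCI devices as libvirt sees them
       virsh nodedev-list | grep pci
       # Detach the device from its host driver (libvirt spells this with a double "t")
       virsh nodedev-dettach pci_0000_01_00_0
       # ...assign it to a guest through the domain XML or virt-manager, and later:
       virsh nodedev-reattach pci_0000_01_00_0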


Problems with device passthrough

One of the problems introduced by device passthrough arises when live migration is required. Live migration is the suspension and subsequent migration of a VM to a new physical host, at which point the VM is restarted. This is a great feature to support
load balancing of VMs over a network of physical hosts, but it presents a problem when passthrough devices are used. PCI hotplug (of which there are several specifications) is one aspect that needs to be addressed. PCI hotplug permits PCI devices to come and
go from a given kernel, which is ideal—particularly when considering migration of a VM to a hypervisor on a new host machine (devices need to be unplugged, and then subsequently plugged in at the new hypervisor). When devices are emulated, such as virtual
network adapters, the emulation provides a layer to abstract away the physical hardware. In this way, a virtual network adapter migrates easily within the VM (also supported by the Linux® bonding driver, which allows multiple logical network adapters to be
bonded to the same interface).


Next steps in I/O virtualization

The next steps in I/O virtualization are actually happening today. For example, PCIe includes support for virtualization. One virtualization concept that's ideal for server virtualization is called Single-Root I/O Virtualization (SR-IOV). This virtualization
technology (created through the PCI Special Interest Group, or PCI-SIG) provides device virtualization in single-root complex instances (in this case, a single server with multiple VMs sharing a device). Another variation, called Multi-Root IOV, supports
larger topologies (such as blade servers, where multiple servers can access one or more PCIe devices). In a sense, this permits arbitrarily large networks of devices, including servers, end devices, and switches (complete with device discovery and packet routing).

With SR-IOV, a PCIe device can export not just a number of PCI physical functions but also a set of virtual functions that share resources on the I/O device. The simplified architecture for server virtualization is shown in Figure 4. In this model, no passthrough
is necessary, because virtualization occurs at the end device, allowing the hypervisor to simply map virtual functions to VMs to achieve native device performance with the security of isolation.

Figure 4. Passthrough with SR-IOV
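
As a concrete sketch of what SR-IOV looks like from the host side (assuming an SR-IOV-capable Intel NIC driven by the igb driver; the max_vfs module parameter is specific to that driver), the physical function can be asked to create virtual functions, which then show up as additional PCI devices that could be assigned to guests:

       # Load the physical-function driver and request two virtual functions
       modprobe igb max_vfs=2
       # The virtual functions appear as extra PCI devices
       lspci | grep -i "virtual function"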


Going further

Virtualization has been under development for about 50 years, but only now is there widespread attention on I/O virtualization. Commercial processor support for virtualization has been around for only five years. So, in essence, we're on the cusp of what's
to come for platform and I/O virtualization. And as a key element of future architectures like cloud computing, virtualization will certainly be an interesting technology to watch as it evolves. As usual, Linux is on the forefront for support of these new
architectures, and recent kernels (2.6.27 and beyond) are beginning to include support for these new virtualization technologies.

 

 

 

 

 

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtua...

The original document describes this in more detail. In short, adding intel_iommu=on (Intel platform) or amd_iommu=on (AMD platform) to the kernel boot line is enough to enable Intel VT-d or AMD IOMMU support.

  • Intel Platform

    1. Enable the Intel VT-d extensions

      The Intel VT-d extensions provide hardware support for directly assigning physical devices to a guest. The VT-d extensions are required for PCI device assignment with Red Hat Enterprise Linux, and they must be enabled in the BIOS; some system manufacturers
      disable these extensions by default. The extensions go by various names in the BIOS, which differ from manufacturer to manufacturer. Consult your system manufacturer's documentation.

    2. Activate Intel VT-d in the kernel

      Activate Intel VT-d in the kernel by appending the intel_iommu=on parameter to the kernel line in the /boot/grub/grub.conf file. The example below is a modified grub.conf file with Intel
      VT-d activated.

      [root@benjr Desktop]# vi /boot/grub/grub.conf
      default=0
      timeout=5
      splashimage=(hd0,0)/grub/splash.xpm.gz
      hiddenmenu
      title Red Hat Enterprise Linux Server (2.6.32-36.x86-64)
      root (hd0,0)
      kernel /vmlinuz-2.6.32-36.x86-64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet intel_iommu=on
      initrd /initrd-2.6.32-36.x86-64.img

    3. Ready to use

      Reboot the system to enable the changes. Your system is now PCI device assignment capable.

  • AMD Platform
    1. Enable AMD IOMMU extensions

      The AMD IOMMU extensions are required for PCI device assignment with Red Hat Enterprise Linux. The extensions must be enabled in the BIOS. Some system manufacturers disable these extensions by default.

    2. Enable IOMMU kernel support

      Append amd_iommu=on to the kernel line so that the AMD IOMMU extensions are enabled at boot.
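
      For example (a sketch analogous to the Intel grub.conf above; the kernel version and root device must match your system), the kernel line would become:

        kernel /vmlinuz-2.6.32-36.x86-64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet amd_iommu=on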

  ./qemu-system-x86_64 -m 1024 -smp 4,sockets=4,cores=1,threads=1 -hda /var/lib/kvm/images/sles10/disk0.raw -vnc *:0 -k en-us  -device pci-assign,host=02:00.0,iommu=on,id=hostdev0,multifunction=on,addr=0x6  -device pci-assign,host=02:00.1,iommu=on,id=hostdev1,addr=0x6.0x1

 
