
Timeout Detection and Recovery

Posted: May 23, 2013

Timeout Detection and Recovery of GPUs through WDDM

Updated: April 27, 2009
On This Page
Introduction
Timeout Detection and Recovery
Windows Vista SP1 Update
Error Messaging
Registry Keys
Next Steps
Resources

Introduction

One of the most common stability problems in graphics occurs when the system appears completely "frozen" or "hung" while processing an end-user command or operation. Users generally wait a few seconds and then reboot the system by pressing the Power button. Usually the graphics processing unit (GPU) is "busy" processing intensive graphical operations, typically during gameplay. Nothing is updated on the screen during this time, so the system appears frozen to the user.

This paper briefly describes the timeout detection and recovery (TDR) process in Windows Vista. It also documents the registry controls so developers can easily debug problems.

What's New for Windows Vista SP1
Windows Vista SP1 includes changes that improve the user experience in cases of frequent and rapidly occurring GPU hangs, along with new registry keys that support these changes.

Timeout Detection and Recovery

Windows Vista attempts to detect these problematic hang situations and recover a responsive desktop dynamically. In this process, the Windows Display Driver Model (WDDM) driver is reinitialized and the GPU is reset. No reboot is necessary, which greatly enhances the user experience. The only visible artifact from the hang detection to the recovery is a screen flicker, which results from resetting some portions of the graphics stack, causing a screen redraw. Some older Microsoft DirectX applications may render to a black screen at the end of this recovery. The end user would have to restart these applications.

The following is a brief overview of the TDR process:

1. Timeout detection: The Video Scheduler component of the Windows Vista graphics stack detects that the GPU is taking more than the permitted quantum time to execute the particular task and tries to preempt this particular task. The preempt operation has a "wait" timeout, which is the actual "TDR timeout." This step is thus the "timeout detection" phase of the process. The default timeout period in Windows Vista is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout, then the GPU is diagnosed as hung.

2. Preparation for recovery: The operating system informs the WDDM driver that a timeout has been detected and it must reset the GPU. The driver is told to stop accessing memory and should not access hardware after this time. The operating system and the WDDM driver collect hardware and other state information that could be useful for post-mortem diagnosis.

3. Desktop recovery: The operating system resets the appropriate state of the graphics stack. The Video Memory Manager component of the graphics stack purges all allocations from video memory. The WDDM driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As mentioned earlier, some older DirectX applications may now render just black, and the user may be required to restart these applications. Well-written DirectX 9Ex and DirectX 10 applications that handle "Device Remove" continue to work correctly. The application must release and then recreate its Microsoft Direct3D device and all of its objects. DirectX application programmers can find more information in the Windows SDK.
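The decision made in the timeout detection step can be illustrated with a small model. This is only a sketch of the rule as described above, not the Video Scheduler's actual logic: the function name, the outcome names, and the quantum and preempt-latency parameters are hypothetical, and only the 2-second default comes from this paper.

```python
from enum import Enum


class TaskOutcome(Enum):
    COMPLETED = "completed"
    PREEMPTED = "preempted"
    HUNG = "hung"


def classify_task(task_seconds, quantum_seconds, preempt_latency_seconds,
                  tdr_delay_seconds=2.0):
    """Classify one GPU task using the rule from step 1 (simplified model).

    task_seconds:            run time of the task if it is never interrupted
    quantum_seconds:         permitted quantum (hypothetical parameter)
    preempt_latency_seconds: time the GPU needs to honor a preempt request
    tdr_delay_seconds:       the TDR timeout; 2 seconds is the documented default
    """
    if task_seconds <= quantum_seconds:
        # Finished inside its quantum; the scheduler never intervenes.
        return TaskOutcome.COMPLETED

    # The quantum was exceeded, so a preempt request is issued and the
    # TDR timeout starts counting from that moment.
    remaining = task_seconds - quantum_seconds

    if remaining <= tdr_delay_seconds and remaining <= preempt_latency_seconds:
        # The task finished on its own before the preempt took effect.
        return TaskOutcome.COMPLETED
    if preempt_latency_seconds <= tdr_delay_seconds:
        return TaskOutcome.PREEMPTED
    # Neither completion nor preemption happened within the TDR timeout.
    return TaskOutcome.HUNG
```

In this model, a 10-second task on a GPU that needs 5 seconds to preempt is diagnosed as hung with the default timeout, while the same task with an 8-second timeout is merely preempted.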

Windows Vista SP1 Update

Minor changes were made in Windows Vista SP1 to improve the user experience in cases of frequent and rapidly occurring GPU hangs. Repetitive GPU hangs indicate that the graphics hardware has not recovered successfully. In these instances, the system must be shut down and restarted to fully reset the graphics hardware. If the operating system detects that six or more GPU hangs and subsequent recoveries occur within 1 minute, it bug checks the system on the next GPU hang.
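The SP1 rule amounts to counting recoveries inside a sliding time window. The sketch below models it under that reading. The class, method, and parameter names are hypothetical, and the threshold of six recoveries in 60 seconds is taken from the paragraph above rather than from any operating system source.

```python
from collections import deque


class HangLimiter:
    """Sliding-window model of the Windows Vista SP1 repeated-hang rule."""

    def __init__(self, hang_threshold=6, window_seconds=60.0):
        self.hang_threshold = hang_threshold
        self.window_seconds = window_seconds
        self._recoveries = deque()  # timestamps of recent successful recoveries

    def on_hang(self, now):
        """Return "recover" or "bugcheck" for a hang detected at time `now` (seconds)."""
        # Forget recoveries that have fallen out of the window.
        while self._recoveries and now - self._recoveries[0] >= self.window_seconds:
            self._recoveries.popleft()

        if len(self._recoveries) >= self.hang_threshold:
            # Six or more hangs were already recovered within the window,
            # so this hang is escalated instead of recovered.
            return "bugcheck"

        self._recoveries.append(now)
        return "recover"
```

Feeding this model a hang every 5 seconds yields six recoveries followed by a bug check on the seventh hang, whereas hangs spaced 15 seconds apart never reach the threshold.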

Error Messaging

Throughout the process of GPU hang detection and recovery, the desktop is unresponsive and thus unavailable to the user. In the final stages of recovery, a brief screen flash occurs that is similar to the one that occurs when the screen resolution is changed. After the desktop has been successfully recovered, the following informational message appears to the user.

[Figure: Error Messaging]

The message is also logged in the Windows Vista Event Viewer. Diagnosis information is collected in the form of a debug report that is returned to Microsoft through the Online Crash Analysis (OCA) mechanism if the user opts in to provide feedback.

Registry Keys

The following registry keys are documented for testing purposes only. These registry keys should not be manipulated by any applications outside targeted testing or debugging.

The TDR-related registry keys are located under HKLM\System\CurrentControlSet\Control\GraphicsDrivers.

TdrLevel: REG_DWORD. The initial level of recovery. The possible values are:

TdrLevelOff (0) – Detection disabled.

TdrLevelBugcheck (1) – Bug check on detected timeout, that is, no recovery.

TdrLevelRecoverVGA (2) – Recover to VGA (not implemented).

TdrLevelRecover (3) – Recover on timeout. This is the default value.

TdrDelay: REG_DWORD. The number of seconds that the GPU is allowed to delay the preempt request from the scheduler. This is effectively the timeout threshold. The default value is 2.

TdrDdiDelay: REG_DWORD. The number of seconds that the operating system allows threads to leave the driver. After a specified time, the operating system bug checks the system with the code VIDEO_TDR_FAILURE (0x116). The default value is 5.

TdrTestMode: REG_DWORD. Internal test usage.

TdrDebugMode: REG_DWORD. The debugging-related behavior of the TDR process. The possible values are:

TDR_DEBUG_MODE_OFF (0) – Breaks to the kernel debugger before the recovery to allow investigation of the timeout.

TDR_DEBUG_MODE_IGNORE_TIMEOUT (1) – Ignores any timeout.

TDR_DEBUG_MODE_RECOVER_NO_PROMPT (2) – Recovers without breaking into the debugger. This is the default value.

TDR_DEBUG_MODE_RECOVER_UNCONDITIONAL (3) – Recovers even if some recovery conditions are not met (for example, recovers on consecutive timeouts).

TdrLimitTime: REG_DWORD (Windows Vista SP1 and later versions only). The time window within which up to "TdrLimitCount" TDRs are allowed without crashing the system.

TdrLimitCount: REG_DWORD (Windows Vista SP1 and later versions only). The number of TDRs (0x117) that are allowed within "TdrLimitTime" without crashing the system.
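For a test machine, one way to apply these values is a .reg file that is imported with the Registry Editor. The sketch below only builds the text of such a file and is meant for targeted testing, in line with the warning above. The helper name and its validation rules are hypothetical. The value names and ranges come from the list above, and the key path is the documented one with HKLM spelled out as HKEY_LOCAL_MACHINE, which the .reg format requires.

```python
TDR_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers"

# REG_DWORD values described in this paper. TdrTestMode is left out on
# purpose because it is documented as internal test usage only.
DWORD_NAMES = {
    "TdrLevel", "TdrDelay", "TdrDdiDelay",
    "TdrDebugMode", "TdrLimitTime", "TdrLimitCount",
}

# Values that only accept the enumerated settings 0 through 3.
ENUMERATED = {"TdrLevel": range(0, 4), "TdrDebugMode": range(0, 4)}


def build_reg_file(values):
    """Return .reg file text that sets the given TDR values (name -> int)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{TDR_KEY}]"]
    for name, value in values.items():
        if name not in DWORD_NAMES:
            raise ValueError(f"unknown TDR value name: {name}")
        if not isinstance(value, int) or not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} must fit in a REG_DWORD")
        if name in ENUMERATED and value not in ENUMERATED[name]:
            raise ValueError(f"{name} must be between 0 and 3")
        # DWORD data is written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"
```

For example, `build_reg_file({"TdrDelay": 8})` produces a file that raises the timeout threshold from 2 to 8 seconds. Saving the text to disk and importing it are left out of the sketch.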

Next Steps

Graphics hardware vendors:

Ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and gameplay.

Graphics software vendors:

Ensure that the DirectX graphics application does not run at a low frame rate, measured in frames per second (FPS). As the FPS decreases, the likelihood of the GPU getting reset increases. If the application is running at 10 FPS or lower and a complex graphics operation is about to start, the application can insert a flush before that operation.

For running benchmark tests on low-end GPUs, use the aforementioned registry keys that control the TDR timeout. Remember that they should not be used in production systems because changing them affects overall system stability and robustness. Use these keys only as a last resort.
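The flush guidance for low frame rates given above can be expressed as a simple check in the render loop. This is a minimal sketch, assuming the application keeps a short history of recent frame times. The function name and parameters are hypothetical, the 10 FPS threshold is the one quoted above, and the flush itself would be whatever call the application's graphics API provides.

```python
LOW_FPS_THRESHOLD = 10.0  # threshold quoted in the guidance above


def should_flush(recent_frame_seconds, complex_operation_pending,
                 fps_threshold=LOW_FPS_THRESHOLD):
    """Decide whether to insert a flush before the next complex operation.

    recent_frame_seconds:      durations of the most recent frames, in seconds
    complex_operation_pending: True if an expensive operation is about to start
    """
    if not complex_operation_pending or not recent_frame_seconds:
        return False

    average = sum(recent_frame_seconds) / len(recent_frame_seconds)
    if average <= 0:
        return False

    # Average frames per second over the recent history.
    fps = 1.0 / average
    return fps <= fps_threshold
```

Averaging over several frames keeps a single slow frame from triggering a flush. An application could just as well use its own frame rate counter in place of the average.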

System manufacturers:

Work with the graphics hardware vendor to diagnose the TDR debug reports.

Remember that shipping a system that changes the default values of the aforementioned TDR registry keys is a Windows Logo Program violation.

Resources

Queries: If you have questions that are not answered in this document, send e-mail to directx@microsoft.com.

Windows Driver Kit
DirectX Developer Center on MSDN
Windows Logo Program Requirements
Windows SDK for Windows Vista

