
Uniscribe

January 22, 2014 ⁄ General
 

Uniscribe
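Uniscribe is a set of APIs that allow fine control over the processing of complex scripts. A complex script needs special processing to be displayed and edited, because its characters are not laid out in a simple way. The rules that govern the shapes and positions of glyphs are specified in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company.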
The following topics discuss different aspects of processing complex scripts.
About Uniscribe
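Uniscribe is one of several ways to process complex scripts. To put it in context, we begin with a brief description of complex scripts and the special problems they pose, and then discuss the other standard ways of handling complex scripts.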
About Complex Scripts
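A complex script has at least one of the following attributes:
· Allows bidirectional rendering
· Has contextual shaping
· Has combining characters
· Has specialized word-breaking and justification rules
· Filters out illegal character combinations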
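Bidirectional rendering refers to a script's ability to handle text that reads from left to right as well as text that reads from right to left. For example, in bidirectional rendering of Arabic script, the default reading direction is right to left, but some numbers read from left to right. Complex script processing must resolve the differences between the logical order and the visual order of glyphs. It must also handle caret movement and hit testing correctly: mapping between screen positions and character indices, as for text selection or caret display, requires knowledge of the layout algorithm.
Contextual shaping occurs when a character changes shape depending on the characters around it. This happens in English cursive writing, where a lowercase "l" changes shape depending on whether it is preceded by an "a" (which connects low) or an "o" (which connects high). Arabic is a script that exhibits contextual shaping.
Combining characters, or ligatures, are characters that join into one character when placed together. An English example is the "ae" combination, which is sometimes represented by a single character. Arabic is a script with many combining characters.
Specialized word-breaking and justification refer to scripts that have complex rules for dividing words between lines and for justifying text on a line.
Filtering out illegal character combinations is needed when a language does not allow certain combinations of characters. Thai is such a script.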
Built on Tuesday, May 09, 2000
 
Uniscribe
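Uniscribe supports fine control of complex scripts. It supports the complex rules found in scripts such as Arabic, Indic, and Thai. It also handles scripts that are written from right to left, such as Arabic and Hebrew, and supports the mixing of scripts.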
OpenType Font Format
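The Unicode-based Microsoft® OpenType® font format is an extension of the TrueType font file format. OpenType fonts allow mapping between characters and glyphs, and support ligatures, positional forms, alternates, and other substitutions. OpenType fonts can also contain information that supports two-dimensional glyph positioning and glyph attachment, and can contain either TrueType or PostScript outlines.
Layout features in OpenType fonts are organized by script and language, which allows a single font to support multiple writing systems, even within the same text. This keeps text layout operations consistent across applications and avoids unnecessary system overhead. Most of the text layout and linguistic algorithms are contained in Uniscribe, which relieves font developers of the burden of defining general script rules in each font.
Applications can apply their own knowledge or preferences to text layout. OpenType layout fonts can even contain layout rules that duplicate or override those supplied by operating system services. The layered architecture of the operating system's text layout services lets a client choose which layout information to use and how to apply it.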
 
Using Uniscribe
The following sections show code examples for working with Uniscribe.
 
The following code sample calls ScriptGetProperties to check if the script requires glyph shaping.
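const SCRIPT_PROPERTIES **g_ppScriptProperties; // script properties, indexed by eScript
int g_iMaxScript;                               // number of entries in the table
HRESULT hResult;
int i;

ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);
hResult = ScriptItemize( … , pItems, &cItems);
if (SUCCEEDED(hResult)) {
    for (i = 0; i < cItems; i++) {
        // Each table entry is a pointer to a SCRIPT_PROPERTIES structure.
        if (g_ppScriptProperties[pItems[i].a.eScript]->fComplex) {
            // Item [i] is complex script text
            // requiring glyph shaping
        }
    }
}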
ScriptGetProperties
The ScriptGetProperties function returns information about the current scripts.
HRESULT WINAPI ScriptGetProperties(
 const SCRIPT_PROPERTIES ***ppSp, 
  int *piNumScripts 
);
Parameters
ppSp
[out] Receives a pointer to an array of pointers to SCRIPT_PROPERTIES structures, indexed by script.
 
piNumScripts
[out] Receives the number of scripts. Valid script indices range from 0 through NumScripts-1.
Return Values
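If the function succeeds, the return value is 0 (S_OK).
If the function fails, it returns a nonzero value. If any unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted to HRESULT values by using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.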
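Converting a mouse hit x offset to a caret position
By convention, a hit on the leading half of a character places the caret position (cp) before that character, and a hit on the trailing half places it after the character. This can be implemented as follows: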
 
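int iCharPos;
int iCaretPos;
int fTrailing;

// Which character was hit, and was the hit on its leading or trailing half?
ScriptXtoCP(iMouseX, ..., &iCharPos, &fTrailing);
// A hit on the trailing half puts the caret after the character.
iCaretPos = iCharPos + fTrailing;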
For scripts that snap the caret to cluster boundaries, ScriptXtoCP returns fTrailing set to either 0 or the width of the cluster in code points.
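The following portable C sketch models this hit-testing arithmetic for a single left-to-right run whose clusters may span several code points. It is not Uniscribe code: model_x_to_cp and model_caret_from_x are hypothetical names, the cluster arrays stand in for information that shaping and placement would provide, and clamping hits that fall outside the run is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical model of ScriptXtoCP for ONE left-to-right run.
 * clusterLen[k]   : number of code points in cluster k
 * clusterWidth[k] : advance width of cluster k
 * *piCharPos receives the first code point of the hit cluster; *pfTrailing
 * is 0 for the leading half or the cluster length for the trailing half,
 * as for scripts that snap the caret to cluster boundaries. */
static void model_x_to_cp(int x, const int *clusterLen, const int *clusterWidth,
                          int nClusters, int *piCharPos, int *pfTrailing)
{
    int k, left = 0, cp = 0;

    assert(nClusters > 0);
    if (x < 0) {                      /* before the run: clamp (assumption) */
        *piCharPos = 0;
        *pfTrailing = 0;
        return;
    }
    for (k = 0; k < nClusters; k++) {
        int right = left + clusterWidth[k];
        /* Hits past the end of the run are treated as hits on the last cluster. */
        if (x < right || k == nClusters - 1) {
            *piCharPos = cp;
            *pfTrailing = (2 * (x - left) >= clusterWidth[k]) ? clusterLen[k] : 0;
            return;
        }
        left = right;
        cp += clusterLen[k];
    }
}

/* Caret position as in the text: iCaretPos = iCharPos + fTrailing. */
static int model_caret_from_x(int x, const int *clusterLen, const int *clusterWidth,
                              int nClusters)
{
    int iCharPos, fTrailing;
    model_x_to_cp(x, clusterLen, clusterWidth, nClusters, &iCharPos, &fTrailing);
    return iCharPos + fTrailing;
}
```

With clusters of 1, 3 and 1 code points and widths 10, 12 and 10, a hit at x = 17 (right half of the middle cluster) gives iCharPos = 1 and fTrailing = 3, so the caret lands at position 4, after the whole cluster.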
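Displaying the caret in bidirectional strings
In unidirectional text there is no ambiguity about the caret position, because the leading edge of a character is at the same place as the trailing edge of the previous character. In bidirectional text, however, the caret position between runs of opposite direction is ambiguous. For example, in the left-to-right paragraph "helloMAALAS", the last character of "hello" immediately precedes the first character of "salaam". The best place to display the caret depends on whether it is considered to follow the "o" of "hello" or to precede the "s" of "salaam".
Uniscribe uses the following caret conventions: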
State                 Visual caret position
Typing                Trailing edge of the last character typed
Pasting               Trailing edge of the last character pasted
Caret advancing       Trailing edge of the last character passed over
Caret retiring        Leading edge of the last character passed over
Home (key)            Leading edge of the line
End (key)             Trailing edge of the line
 
The caret can be positioned as follows:
if (advancing) {
    ScriptCPtoX(iCharPos-1, TRUE, ..., &iCaretX);
} else {
    ScriptCPtoX(iCharPos, FALSE, ..., &iCaretX);
}
Or, more simply, given an fAdvancing BOOL restricted to TRUE or FALSE:
ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ..., &iCaretX);
ScriptCPtoX handles out-of-range positions logically: it returns the leading edge of the run for iCharPos < 0, and the trailing edge of the run for iCharPos = length.
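The portable C sketch below models one run with one glyph per code point to show how the single call ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ...) and the out-of-range rule place the caret in both directions. The function names are hypothetical, and treating any iCharPos at or beyond the run length as the trailing edge is an assumption; in a real bidirectional line, each run's result would also be offset by the run's visual position.

```c
#include <assert.h>

/* Hypothetical model of ScriptCPtoX for one run, one glyph per code point.
 * adv[i] is the advance width of code point i; rtl is nonzero for a
 * right-to-left run. The result is measured from the run's left edge. */
static int model_cp_to_x(const int *adv, int n, int rtl, int iCP, int fTrailing)
{
    int i, before = 0, total = 0, lead;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        total += adv[i];

    if (iCP < 0)            /* out of range: leading edge of the run */
        lead = 0;
    else if (iCP >= n)      /* out of range: trailing edge (assumption: >= n) */
        lead = total;
    else {
        for (i = 0; i < iCP; i++)
            before += adv[i];
        lead = before + (fTrailing ? adv[iCP] : 0);
    }
    /* lead is the distance from the run's leading edge; an RTL run
       starts at its right end, so mirror it. */
    return rtl ? total - lead : lead;
}

/* Caret placement from the text: one call covers advancing and retiring. */
static int model_caret_x(const int *adv, int n, int rtl, int iCharPos, int fAdvancing)
{
    return model_cp_to_x(adv, n, rtl, iCharPos - fAdvancing, fAdvancing);
}
```

In a left-to-right run, advancing to position 1 and retiring to position 1 give the same x, which is why unidirectional text has no ambiguity; in a right-to-left run the same logical positions are measured from the right end of the run.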
 
Processing Complex Scripts with Uniscribe
Uniscribe provides APIs to support the display and editing of international text, including the complex rules of Middle Eastern and Asian scripts. Uniscribe provides low-level routines for handling fully formatted text, and an easier ScriptString API set for unformatted text.
Using Uniscribe, applications need only manage a backing store of Unicode character codes. Text layout applications do not need to maintain any other buffer or mapping table to track character order. An application only needs to store and manage the order in which the characters were entered by the user, which is the same logical order as defined by Unicode. The application's backing store never changes as a result of layout operations. Uniscribe maintains an index from the reordered clusters to the original character boundaries passed by the application. The following topics are covered in this section.
· Shaping Engines
· Caching
· Displaying Text with Uniscribe
· The ScriptString Functions
· Related Processing for Complex Scripts
· Caret Placement and Hit Testing
· Word Break Points
· Character Clusters
· Notes on ScriptXtoCP and ScriptCPtoX


Shaping Engines
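Uniscribe uses multiple shaping engines that contain the layout knowledge for particular scripts. It also uses the Microsoft® OpenType® layout shaping engine to handle font-specific script features such as glyph generation, extent measurement, and word-break support. Uniscribe manages bidirectional character reordering with the Unicode bidirectional algorithm, and it understands non-OpenType layout font formats for Arabic, Hebrew, and Thai.
The exact code points assigned to each shaping engine may vary, so script numbers other than SCRIPT_UNDEFINED are not published. However, you can test the properties of a script by calling the ScriptGetProperties function, which gives access to the table of all script properties. Applications can use the global script properties to help combine their own layout rules with the script divisions made by the shaping engines.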
 
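All complex script shaping engines, the digit shaping engine, and the ASCII shaping engine validate the font in the HDC before shaping, and return USP_E_SCRIPT_NOT_IN_FONT if the font does not contain enough glyphs or shaping tables. Only scripts with the fComplex property need to be shaped with the script returned by the ScriptItemize function. All other runs can be merged and shaped with SCRIPT_UNDEFINED specified in the SCRIPT_ANALYSIS structure. Note that SCRIPT_UNDEFINED does not fail with USP_E_SCRIPT_NOT_IN_FONT if the font has no support for the characters; missing glyphs are usually displayed as empty rectangles. An application can determine whether a font supports a code point by calling the ScriptGetFontProperties function to obtain the default glyph index, and the ScriptGetCMap function to look up the font glyphs for the Unicode code points. However, some code points can be rendered by a combination of glyphs, for example 00C9 LATIN CAPITAL LETTER E WITH ACUTE. In this case, if a font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap flags 00C9 as unsupported.
To reliably determine font support for a string that contains such code points, call ScriptShape. If it returns S_OK, check the output for missing glyphs.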
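One way to check the output for missing glyphs after an S_OK return from ScriptShape is to compare each glyph with the font's default glyph (the wgDefault member that ScriptGetFontProperties reports), using the logical cluster array to relate glyphs back to characters. The portable C sketch below assumes a left-to-right run, so the cluster array is nondecreasing; the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: does character iChar map to any missing glyph?
 * logClust[i] : index of the first glyph of the cluster containing char i
 *               (left-to-right run assumed, so values are nondecreasing)
 * glyphs      : glyph indices produced by shaping
 * wgDefault   : the font's default (missing) glyph index */
static int model_char_is_missing(const unsigned short *logClust, int cChars,
                                 const unsigned short *glyphs, int cGlyphs,
                                 unsigned short wgDefault, int iChar)
{
    int first, last, j, g;

    assert(iChar >= 0 && iChar < cChars);
    first = logClust[iChar];
    /* The cluster ends where the next cluster starts, or at the last glyph. */
    last = cGlyphs;
    for (j = iChar + 1; j < cChars; j++) {
        if (logClust[j] != first) {
            last = logClust[j];
            break;
        }
    }
    for (g = first; g < last; g++)
        if (glyphs[g] == wgDefault)
            return 1;
    return 0;
}

/* Counts the characters whose cluster contains at least one missing glyph. */
static int model_count_missing(const unsigned short *logClust, int cChars,
                               const unsigned short *glyphs, int cGlyphs,
                               unsigned short wgDefault)
{
    int i, n = 0;
    for (i = 0; i < cChars; i++)
        n += model_char_is_missing(logClust, cChars, glyphs, cGlyphs, wgDefault, i);
    return n;
}
```

Here the first two characters share a two-glyph cluster (for example a base letter plus a combining accent), so both are reported as supported as long as neither glyph in that cluster is the default glyph.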
 
Caching
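Uniscribe caches the Unicode-to-glyph mapping (CMAP), glyph widths, and OpenType script shaping tables. The handle to the tables for a particular font at a particular size is called a script cache. Many Uniscribe functions take both an HDC and a SCRIPT_CACHE parameter. These functions look for information in the script cache first, and use the device context only when the required tables are not already cached. When calling the ScriptShape, ScriptPlace, or ScriptTextOut function, you must supply a pointer to a SCRIPT_CACHE variable, which must be initialized to NULL.
An application can free a script cache at any time. Uniscribe maintains reference counts in its font and shaper caches, and frees font data only when all sizes of the font have been freed. When you are finished with a style (typically a particular combination of font, size, and color attributes), call the ScriptFreeCache function to free the script cache for that style.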
For ScriptShape and ScriptPlace, it is valid to pass a NULL device context. The call usually succeeds, because the required tables are most often already cached. If shaping or placement requires access to a device context, ScriptShape or ScriptPlace returns immediately with the E_PENDING error code; the application must then select the font into the device context and call the function again. This eliminates most calls to the SelectObject function.



  © 2002 Microsoft Corporation. All rights reserved.
 
 
International Features
Displaying Text with Uniscribe
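An application that works with complex scripts faces several problems in formatting and display.
First, the width of complex script text depends on its context, so widths cannot be stored in a simple table.
Second, breaking between words in scripts such as Thai requires dictionary support, because Thai has no separator character between words.
Third, Arabic, Hebrew, Persian, Urdu, and other bidirectional scripts require reordering before display.
Finally, some form of font association is often needed to make complex scripts easy to use.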
 
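To handle these issues, Uniscribe uses the paragraph as the unit of display. Note that this means Uniscribe must be used for the entire paragraph, even if parts of the paragraph are not complex script text.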
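Before using Uniscribe, an application divides the paragraph into runs, that is, strings of characters with the same style. The style depends on what the application implements, but typically includes attributes such as font, size, and color. Uniscribe divides the paragraph into items, that is, strings of characters with the same script and direction. The application merges the item information with its run information to produce runs that are uniform in style, script, and direction.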
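Merging item and run information amounts to taking the union of two sorted lists of boundaries. The C sketch below is a hypothetical helper, not part of Uniscribe; with the real API the item boundaries would come from the iCharPos members of the SCRIPT_ITEM array that ScriptItemize fills in.

```c
#include <assert.h>

/* Hypothetical sketch: merge style-run boundaries with item boundaries.
 * Each input lists the starting character offsets of its segments in
 * increasing order, beginning at 0. outStart receives the starting offsets
 * of the merged runs (one style, one script, one direction each) and must
 * have room for nRuns + nItems entries. Returns the number of merged runs. */
static int model_merge_runs(const int *runStart, int nRuns,
                            const int *itemStart, int nItems, int *outStart)
{
    int i = 0, j = 0, n = 0;

    assert(nRuns > 0 && nItems > 0);
    assert(runStart[0] == 0 && itemStart[0] == 0);
    while (i < nRuns || j < nItems) {
        int next;
        /* Take the smaller of the two pending boundaries. */
        if (j >= nItems || (i < nRuns && runStart[i] <= itemStart[j]))
            next = runStart[i];
        else
            next = itemStart[j];
        outStart[n++] = next;
        /* A boundary present in both lists is emitted once: advance past it in both. */
        if (i < nRuns && runStart[i] == next) i++;
        if (j < nItems && itemStart[j] == next) j++;
    }
    return n;
}
```

For a paragraph with a style change at offset 8 and items starting at 0, 5 and 12, the merged runs start at 0, 5, 8 and 12.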
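Uniscribe identifies the clusters in each run and determines the size of each cluster. A cluster is a script-defined, indivisible group. For European languages a cluster is a single character, but in a language such as Thai it is a group of glyphs. Uniscribe adds up the cluster sizes to determine the size of a run. The application then adds up the run lengths until they overflow a line (that is, reach the margin), and divides the overflowing run between the current line and the next line.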
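The measuring step can be sketched as a running sum over cluster widths. Because a cluster is indivisible, the sketch only ever stops at a cluster boundary. The function name and the idea of passing the remaining line width are assumptions for illustration; in practice the cluster widths would be derived from the glyph advances that ScriptPlace returns.

```c
#include <assert.h>

/* Hypothetical sketch: sum cluster widths within a run and find the first
 * cluster that does not fit in the space left on the line.
 * Returns nClusters if the whole run fits; *pUsed receives the width of
 * the clusters that fit. */
static int model_first_overflow_cluster(const int *clusterWidth, int nClusters,
                                        int spaceLeft, int *pUsed)
{
    int k, used = 0;

    assert(nClusters >= 0 && spaceLeft >= 0);
    for (k = 0; k < nClusters; k++) {
        if (used + clusterWidth[k] > spaceLeft)
            break;                /* this cluster would cross the margin */
        used += clusterWidth[k];
    }
    *pUsed = used;
    return k;
}
```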
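For each line, a map from visual position to run is established. For each run, the code points are shaped into glyphs, which are then positioned and drawn.
With this overview in mind, we can look at the processing in detail and at how Uniscribe fits in. An application lays out, or formats, the text once. It then either saves the glyphs and positions for use in display, or regenerates them each time it displays the text. Typically an application regenerates the glyphs and positions at each display, so the processing is presented as a layout procedure and a display procedure.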
To Lay Out Text Using Uniscribe
This procedure assumes that the application has already divided the paragraph into runs.
1.   Call ScriptRecordDigitSubstitution only when the application starts, or when receiving a WM_SETTINGCHANGE message.
2.   (optional) Call ScriptIsComplex to determine if the paragraph requires complex processing.
3.   For automatic digit substitution, call ScriptApplyDigitSubstitution to prepare the SCRIPT_CONTROL and SCRIPT_STATE structures for ScriptItemize. If the application does its own reordering and layout, it must substitute the proper digits for Unicode U+0030 through U+0039 (the Western digits).
4.   Call ScriptItemize to divide the paragraph into items. If an application already knows the bidirectional order -- for example, because of the keyboard layout used to enter the character -- it can call ScriptItemize with NULL for the SCRIPT_CONTROL and SCRIPT_STATE parameters. This generates items only by shaping engine. The application can then reorder the items using its information.
5.   Merge the item information with the run information to produce runs with a single style, script, and direction.
6.   Call ScriptGetCMap to assign a font to a run and get glyphs. If some glyphs are not supported by the font, either substitute another font or set the eScript member to SCRIPT_UNDEFINED. Note that if a font renders a code point by a combination of glyphs instead of a single glyph, this method may indicate that the code point is unsupported. In this case, call ScriptShape, check for an S_OK return code, and then check the output for missing glyphs.
7.   Call ScriptShape to identify clusters and generate glyphs.
8.   Call ScriptPlace to generate advance widths and x and y offsets for the glyphs, and the width of the run.
9.   Sum the run widths until the line overflows.
10.  Break the run on a word boundary by using the fSoftBreak and fWhiteSpace members in the logical attributes. To break a single character cluster off the run, use the information returned by calling ScriptBreak.
This completes layout of the line. Repeat steps 6 through 10 for each line in the paragraph. However, if the application needed to break the last run on the line, call ScriptShape to reshape the remaining part of the run as the first run on the next line.
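Step 10 can be sketched as backing off from the overflow point to the nearest soft break. The flag arrays mirror the fSoftBreak and fWhiteSpace members of the SCRIPT_LOGATTR structures that ScriptBreak fills in (fSoftBreak marks a valid break before a character). Letting white space hang past the margin, and falling back to the overflow point when no soft break exists, are assumptions of this hypothetical helper.

```c
#include <assert.h>

/* Hypothetical sketch of step 10.
 * fSoftBreak[i]  : nonzero if a line may break before character i
 * fWhiteSpace[i] : nonzero if character i is white space
 * iOverflow      : first character that does not fit on the line
 * Returns the number of characters to keep on the current line. */
static int model_break_run(const unsigned char *fSoftBreak,
                           const unsigned char *fWhiteSpace,
                           int cChars, int iOverflow)
{
    int i;

    assert(iOverflow >= 0 && iOverflow <= cChars);
    /* Assumption: white space may hang past the margin, so skip over it. */
    while (iOverflow < cChars && fWhiteSpace[iOverflow])
        iOverflow++;
    /* Back off to the nearest soft break at or before the overflow point. */
    for (i = iOverflow; i > 0; i--)
        if (i == cChars || fSoftBreak[i])
            return i;
    /* No break opportunity: break at the overflow point (cluster-level
       breaking with ScriptBreak data would refine this). */
    return iOverflow;
}
```

For the text "ab cd ef", an overflow at "d" keeps "ab " on the line, while an overflow at the second space keeps "ab cd ".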
To Display Text Using Uniscribe
This procedure is done for each line. It assumes that the text has already been laid out using Uniscribe, and that the glyphs and positions from the layout process were not saved. If speed is a concern, an application can save the glyphs and positions from the layout procedure and start at step 2.
1.   For each run, in logical order.
a.   If the style has changed since the last run, update the hdc.
b.   Call ScriptShape to generate glyphs for the run.
c.    Call ScriptPlace to generate an advance width and an x,y offset for each glyph.
2.   Establish the correct visual order for the runs in this line:
a.   Extract an array of bidi embedding levels, one per run, from the merged item and run information. The embedding level is given by si.a.s.uBidiLevel, where si is the SCRIPT_ITEM, a is its SCRIPT_ANALYSIS member, and s is the SCRIPT_STATE member of that analysis.
b.   Pass this array to ScriptLayout to generate a map of visual to logical positions.
3.   (optional) To justify the text, either call ScriptJustify or use specialized knowledge of the text. For more information, see Related Processing by Uniscribe.
4.   Use the visual to logical map to display the runs in visual order. Starting at the left end of the line, call ScriptTextOut to display the run given by the first entry in the visual to logical map. For each subsequent entry in the visual to logical map, call ScriptTextOut to display the indicated run to the right of the previously displayed run.
Note that step 2 may be omitted if the text contains no characters from right-to-left scripts, contains no bidi control characters, and the base embedding level is left-to-right. In this case, step 4 becomes: start at the left end of the line and call ScriptTextOut to display the first logical run and then to display each logical run to the right of the previous run.
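ScriptLayout turns per-run embedding levels into a visual-to-logical map. The portable C sketch below models that step under the assumption that it follows reordering rule L2 of the Unicode bidirectional algorithm, and then performs step 4 by placing the runs from left to right. Both function names are hypothetical; in the real procedure ScriptTextOut draws each run at the computed position.

```c
#include <assert.h>

/* Hypothetical model of ScriptLayout: levels[] holds one embedding level per
 * run in logical order; visualToLogical[v] receives the logical index of the
 * run shown at visual position v (left to right). */
static void model_layout(const unsigned char *levels, int cRuns, int *visualToLogical)
{
    int i, level, maxLevel = 0, minLevel = 255;

    assert(cRuns >= 0);
    for (i = 0; i < cRuns; i++) {
        visualToLogical[i] = i;                  /* start in logical order */
        if (levels[i] > maxLevel) maxLevel = levels[i];
        if (levels[i] < minLevel) minLevel = levels[i];
    }
    /* From the highest level down to the lowest odd level, reverse every
       maximal sequence of runs whose level is at least the current level. */
    for (level = maxLevel; level >= (minLevel | 1); level--) {
        i = 0;
        while (i < cRuns) {
            int start, end;
            if (levels[visualToLogical[i]] < level) {
                i++;
                continue;
            }
            start = i;
            while (i < cRuns && levels[visualToLogical[i]] >= level)
                i++;
            for (end = i - 1; start < end; start++, end--) {
                int t = visualToLogical[start];
                visualToLogical[start] = visualToLogical[end];
                visualToLogical[end] = t;
            }
        }
    }
}

/* Step 4: lay runs out left to right in visual order. runWidth is indexed
 * by logical run; runX receives each logical run's left x coordinate. */
static void model_place_runs(const int *visualToLogical, const int *runWidth,
                             int cRuns, int xStart, int *runX)
{
    int v, x = xStart;
    for (v = 0; v < cRuns; v++) {
        int r = visualToLogical[v];
        runX[r] = x;        /* ScriptTextOut would draw run r here */
        x += runWidth[r];
    }
}
```

With levels {1, 2, 2, 1}, for example, the map is {3, 1, 2, 0}: the outer right-to-left runs swap ends while the two embedded left-to-right runs keep their order.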



 
ScriptGetCMap
The ScriptGetCMap function takes a string and returns the glyph indices of the Unicode characters according to the TrueType cmap table or the standard cmap table implemented for old style fonts.
HRESULT WINAPI ScriptGetCMap(
 HDC hdc, 
  SCRIPT_CACHE *psc, 
  const WCHAR *pwcInChars, 
  int cChars, 
  DWORD dwFlags, 
  WORD *pwOutGlyphs 
);
Parameters
hdc
[in] Handle to the device context. This parameter is optional.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
pwcInChars
[in] Pointer to a string of Unicode characters.
cChars
[in] Number of Unicode characters in pwcInChars.
dwFlags
[in] This parameter can be the following value.
Value       Meaning
SGCM_RTL    Indicates that the glyph array pwOutGlyphs should contain mirrored glyphs for those glyphs that have a mirrored equivalent.
 
pwOutGlyphs
[out] Pointer to an array that receives the glyph indices.
Return Values
If all Unicode code points are present in the font, the return value is S_OK.
If the function fails, it can return one of the following nonzero values:
Return value   Meaning
E_HANDLE       The font or the system does not support glyph indices.
S_FALSE        Some of the Unicode code points were mapped to the default glyph.
 
If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
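ScriptGetCMap can be used to determine which characters in a run are supported by the selected font. The caller can scan the returned glyph buffer for the default glyph to find the characters that are not available. The default glyph index for the selected font should be obtained by calling ScriptGetFontProperties.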
The return value indicates the presence of any missing glyphs.
Note that some code points can be rendered by a combination of glyphs as well as by a single glyph -- for example, 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap will show 00C9 is unsupported. To determine the font support for a string that contains these kinds of code points, call ScriptShape. If it returns S_OK, check the output for missing glyphs.
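The U+00C9 case can be illustrated with a miniature, hypothetical cmap that has glyphs for "E" and U+0301 COMBINING ACUTE ACCENT but none for the precomposed letter. The portable C sketch mimics a per-code-point lookup and the documented return convention (0 and 1 are the numeric values of S_OK and S_FALSE); it does not model ScriptShape, which is the call that could combine the two component glyphs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature cmap: code point -> glyph index. Glyph 0 plays the
 * role of the font's default (missing) glyph. */
typedef struct { unsigned int cp; unsigned short glyph; } ModelCmapEntry;

static const ModelCmapEntry model_cmap[] = {
    { 0x0045, 40 },   /* LATIN CAPITAL LETTER E */
    { 0x0301, 95 },   /* COMBINING ACUTE ACCENT */
};

static unsigned short model_lookup(unsigned int cp)
{
    size_t i;
    for (i = 0; i < sizeof model_cmap / sizeof model_cmap[0]; i++)
        if (model_cmap[i].cp == cp)
            return model_cmap[i].glyph;
    return 0;                        /* default glyph */
}

/* Mimics ScriptGetCMap's documented results: returns 0 (like S_OK) if every
 * code point has its own glyph, 1 (like S_FALSE) if any were mapped to the
 * default glyph. */
static int model_get_cmap(const unsigned int *chars, int cChars, unsigned short *outGlyphs)
{
    int i, missing = 0;
    assert(cChars >= 0);
    for (i = 0; i < cChars; i++) {
        outGlyphs[i] = model_lookup(chars[i]);
        if (outGlyphs[i] == 0)
            missing = 1;
    }
    return missing;
}
```

The precomposed U+00C9 is reported as missing even though the font holds both glyphs needed to draw it, which is why the text recommends ScriptShape for such strings.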
ScriptGetGlyphABCWidth
The ScriptGetGlyphABCWidth function returns the ABC width of a given glyph.
HRESULT WINAPI ScriptGetGlyphABCWidth(
  HDC hdc,
  SCRIPT_CACHE *psc,
  WORD wGlyph,
  ABC *pABC
);
Parameters
hdc
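[in] Handle to the device context. This parameter is optional, depending on the contents of psc.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
wGlyph
[in] Glyph to analyze.
pABC
[out] Pointer to an ABC structure that receives the ABC width of wGlyph.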
Return Values
If the ABC width of the glyph is returned, the function returns S_OK.
If the font or system does not support glyph indices, the function returns E_HANDLE. And if any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
The ScriptGetGlyphABCWidth function may be useful for drawing glyph charts. It should not be used for ordinary complex script text formatting.
ABC
The ABC structure contains the width of a character in a TrueType font.
typedef struct _ABC { 
  int     abcA; 
  UINT    abcB; 
  int     abcC; 
} ABC, *PABC; 
Members
abcA
Specifies the A spacing of the character. The A spacing is the distance to add to the current position before drawing the character glyph.
abcB
Specifies the B spacing of the character. The B spacing is the width of the drawn portion of the character glyph.
abcC
Specifies the C spacing of the character. The C spacing is the distance to add to the current position to provide white space to the right of the character glyph.
Remarks
The total width of a character is the summation of the A, B, and C spaces. Either the A or the C space can be negative to indicate underhangs or overhangs.
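The ABC arithmetic can be checked with a small portable C sketch. ModelABC mirrors the member layout of the Win32 ABC structure so that the example compiles without <windows.h>; the helper names are hypothetical.

```c
#include <assert.h>

/* Local mirror of the Win32 ABC structure (abcA, abcB, abcC). */
typedef struct { int abcA; unsigned int abcB; int abcC; } ModelABC;

/* Total width of the character: A + B + C, i.e. how far the pen advances. */
static int model_abc_advance(const ModelABC *p)
{
    return p->abcA + (int)p->abcB + p->abcC;
}

/* Left and right x extents of the drawn (B) part of a glyph whose origin is
 * at penX. A negative A makes the ink start left of the pen position; a
 * negative C makes it extend past the next pen position. */
static void model_abc_ink(const ModelABC *p, int penX, int *pLeft, int *pRight)
{
    assert(p != 0);
    *pLeft = penX + p->abcA;
    *pRight = *pLeft + (int)p->abcB;
}
```

For a glyph with A = -2, B = 10 and C = -1, the pen advances by 7, but the ink reaches 8 units past the origin, so it extends 1 unit beyond the next character position.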
 
SCRIPT_CACHE
SCRIPT_CACHE is an opaque pointer to a Uniscribe font metric cache structure.
typedef void *SCRIPT_CACHE; 
Remarks
The client must allocate and retain one SCRIPT_CACHE variable for each character style (font) used, and must initialize it to NULL.
Many script functions take a combination of HDC and SCRIPT_CACHE. Uniscribe will first attempt to access font data by using the SCRIPT_CACHE and will only inspect the HDC if the required data is not already cached.
The HDC may be passed as NULL. If data required by Uniscribe is already cached, the HDC won't be accessed, and the operation continues normally.
If the HDC is passed as NULL, and Uniscribe needs to access it for any reason, Uniscribe will return E_PENDING.
E_PENDING is returned quickly, allowing the client to avoid time-consuming SelectObject calls. The following example applies to all functions that take a SCRIPT_CACHE and an optional HDC.
hr = ScriptShape(NULL, &sc, ...);
if (hr == E_PENDING) {
    // Select the font into hdc (for example, with SelectObject), then retry.
    hr = ScriptShape(hdc, &sc, ...);
}
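The E_PENDING pattern can be simulated without Windows. In the portable C sketch below, every type and function is a hypothetical stand-in (the real ones are declared in <usp10.h>): the cache pointer starts as NULL like a SCRIPT_CACHE, the first call without a device context reports a pending state, and only then is the font selected. Later calls are served from the cache, so the simulated SelectObject count stays at one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the HRESULT codes used in the text. */
enum { MODEL_S_OK = 0, MODEL_E_PENDING = 1, MODEL_E_OUTOFMEMORY = 2 };

typedef struct { int tablesLoaded; } ModelCache;  /* what a SCRIPT_CACHE would point to */
typedef struct { int fontSelected; } ModelDC;     /* stand-in for a device context */

static int g_selectCalls = 0;   /* counts the simulated (expensive) SelectObject calls */

static void model_select_font(ModelDC *dc)
{
    dc->fontSelected = 1;
    g_selectCalls++;
}

/* Uses cached tables when present; needs the DC only on a cache miss. */
static int model_shape(ModelDC *dc, ModelCache **ppCache)
{
    if (*ppCache != NULL)
        return MODEL_S_OK;            /* served from the cache; dc is not touched */
    if (dc == NULL)
        return MODEL_E_PENDING;       /* tables needed, but no DC was supplied */
    assert(dc->fontSelected);
    *ppCache = malloc(sizeof **ppCache);
    if (*ppCache == NULL)
        return MODEL_E_OUTOFMEMORY;
    (*ppCache)->tablesLoaded = 1;     /* pretend the font tables were read here */
    return MODEL_S_OK;
}

/* The pattern from the text: try with a NULL DC, select the font only on E_PENDING. */
static int model_shape_with_retry(ModelDC *dc, ModelCache **ppCache)
{
    int hr = model_shape(NULL, ppCache);
    if (hr == MODEL_E_PENDING) {
        model_select_font(dc);
        hr = model_shape(dc, ppCache);
    }
    return hr;
}

/* Counterpart of ScriptFreeCache in this model. */
static void model_free_cache(ModelCache **ppCache)
{
    free(*ppCache);
    *ppCache = NULL;
}
```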
ScriptStringAnalyse
The ScriptStringAnalyse function analyzes a plain text string.
 
HRESULT WINAPI ScriptStringAnalyse(
 HDC hdc,
 const void *pString,
 int cString,
 int cGlyphs,
 int iCharset<
