
sed - A Non-interactive Text Editor (by L. E. McMahon, from the Chinese translation)

April 26, 2018 ⁄ General

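Contents

1.1. Command-line Flags
1.2. Order of Application of Editing Commands
1.3. Pattern Space
1.4. Examples
2.1. Line-number Addresses
2.2. Context Addresses
2.3. Number of Addresses
3.1. Whole-line Oriented Functions
3.2. Substitute Function
3.3. Input-output Functions
3.4. Multiple Input-line Functions
3.5. Hold and Get Functions
3.6. Flow-of-Control Functions
3.7. Miscellaneous Functions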

sed - A Non-interactive Text Editor

Lee E. McMahon

Bell Laboratories
Murray Hill, New Jersey 07974

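Chinese translation: 寒蝉退士

Translator's note: the translator offers no warranty for the translation, holds no rights in it, and accepts no liability or obligation for it.
Original: http://cm.bell-labs.com/7thEdMan/vol2/sed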

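Abstract

sed is a non-interactive context editor that runs on the UNIX® operating system. sed is designed to be especially useful in three cases:

1) To edit files too large for comfortable interactive editing.
2) To edit a file of any size when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
3) To perform multiple ‘global’ editing functions efficiently in one pass through the input.

This memorandum constitutes a manual for users of sed.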

August 15, 1978

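Introduction

sed is a non-interactive context editor designed to be especially useful in three cases:

1) To edit files too large for comfortable interactive editing.
2) To edit a file of any size when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
3) To perform multiple ‘global’ editing functions efficiently in one pass through the input.

Since only a few lines of the input reside in memory at one time, and no temporary files are used, the effective size of a file that can be edited is limited only by the requirement that the input and output fit simultaneously into available secondary storage.

Complicated editing scripts can be created separately and given to sed as a command file. For complex edits, this saves considerable typing and its attendant errors. sed running from a command file is much more efficient than any interactive editor known to the author, even one that can be driven by a pre-written script.

The principal losses of function compared to an interactive editor are the lack of relative addressing (because of the line-at-a-time operation) and the lack of immediate verification that a command has done what was intended.

sed is a lineal descendant of the UNIX editor ed. Because of the differences between interactive and non-interactive operation, considerable changes have been made between ed and sed; even confirmed users of ed will frequently be surprised (and probably chagrined) if they rashly use sed without reading Sections 2 and 3 of this document. The most striking family resemblance between the two editors is in the class of patterns (‘regular expressions’) they recognize; the code for matching patterns is copied almost verbatim from the code for ed, and the description of regular expressions in Section 2 is copied almost verbatim from the UNIX Programmer’s Manual[1]. (Both code and description were written by Dennis M. Ritchie.)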

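1. Overall Operation

By default, sed copies the standard input to the standard output, perhaps performing one or more editing commands on each line before writing it to the output. This behavior may be modified by flags on the command line; see Section 1.1 below.

The general format of an editing command is:

[address1,address2][function][arguments]

One or both addresses may be omitted; the format of addresses is given in Section 2. Any number of blanks or tabs may separate the addresses from the function. The function must be present; the available commands are discussed in Section 3. The arguments may be required or optional, according to which function is given; they are discussed in Section 3 under each individual function.

Tab characters and spaces at the beginning of command lines are ignored.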

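1.1. Command-line Flags

Three flags are recognized on the command line:

-n: tells sed not to copy all lines, but only those specified by p functions or by p flags after s functions (see Section 3.3).
-e: tells sed to take the next argument as an editing command.
-f: tells sed to take the next argument as a file name; the file should contain editing commands, one to a line.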
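As a rough illustration of the three flags, the sketch below runs sed with -n and -e on the command line and then with -f and a command file. The scratch files created with mktemp are hypothetical, and the behavior shown is what current sed implementations do, which is assumed to match the memo.

```shell
# Hypothetical scratch files: the sample poem and a one-line command file.
poem=$(mktemp)
script=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# -n turns off the default copy; -e supplies the editing command "1p".
only_first=$(sed -n -e '1p' "$poem")

# -f names a file that holds the editing commands, one to a line.
printf '%s\n' '2,$d' > "$script"
via_file=$(sed -f "$script" "$poem")

printf '%s\n' "$only_first" "$via_file"
rm -f "$poem" "$script"
```

Both invocations leave only the first line of the poem, once by printing it explicitly and once by deleting everything else.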

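1.2. Order of Application of Editing Commands

Before any editing is done (in fact, before any input file is even opened), all the editing commands are compiled into a form that will be moderately efficient during the execution phase (when the commands are actually applied to lines of the input file). The commands are compiled in the order in which they are encountered; this is generally the order in which they will be attempted at execution time. The commands are applied one at a time; the input to each command is the output of all preceding commands.

The default linear order of application of editing commands can be changed by the flow-of-control commands t and b (see Section 3). Even when the order of application is changed by these commands, it is still true that the input line to any command is the output of any previously applied command.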
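A small sketch of the ordering rule: each command sees the line as the previous commands left it, so swapping two substitutions changes the result. The input word is an arbitrary illustration.

```shell
# First command turns "cat" into "dog"; the second then sees "dog" and makes it "bird".
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# Reversed order: "s/dog/bird/" finds nothing yet, so only "cat" -> "dog" takes effect.
reversed=$(printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/')

printf '%s\n' "$chained" "$reversed"
```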

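1.3. Pattern Space

The range of pattern matches is called the pattern space. Ordinarily, the pattern space is one line of the input text, but more than one line can be read into the pattern space by using the N command (see Section 3.4).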

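1.4. Examples

Examples are scattered throughout the text. Except where otherwise noted, the examples all assume the following input text: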

In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.

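(In no case is the output of the sed commands to be considered an improvement on Coleridge.)

Example:

The command

2q

will quit after copying the first two lines of the input. The output will be: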

In Xanadu did Kubla Khan
A stately pleasure dome decree:
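The example can be tried from a shell roughly as follows. The scratch file name is hypothetical; it simply holds the standard input text of this section.

```shell
# Hypothetical scratch file holding the sample text from Section 1.4.
poem=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# 2q: copy lines to the output as usual, then quit once line 2 has been written.
first_two=$(sed 2q "$poem")
printf '%s\n' "$first_two"

rm -f "$poem"
```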

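2. Addresses: Selecting Lines for Editing

Lines in the input file(s) to which editing commands are to be applied can be selected by addresses. Addresses may be either line numbers or context addresses.

The application of a group of commands can be controlled by one address (or address pair) by grouping the commands with curly braces (‘{ }’) (see Section 3.6).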

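2.1. Line-number Addresses

A line number is a decimal integer. As each line is read from the input, a line-number counter is incremented; a line-number address matches (selects) the input line that causes the internal counter to equal the address line number. The counter runs cumulatively through multiple input files; it is not reset when a new input file is opened.

As a special case, the character $ matches the last line of the last input file.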
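The cumulative counter can be observed with two small input files. This is a sketch with hypothetical scratch files; line 3 of the combined input is the first line of the second file, and $ selects only the final line of the final file.

```shell
# Two hypothetical two-line input files.
first=$(mktemp)
second=$(mktemp)
printf 'one\ntwo\n' > "$first"
printf 'three\nfour\n' > "$second"

# The counter keeps running across files, so address 3 lands in the second file.
third=$(sed -n '3p' "$first" "$second")

# $ matches the last line of the last input file only.
last=$(sed -n '$p' "$first" "$second")

printf '%s\n' "$third" "$last"
rm -f "$first" "$second"
```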

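2.2. Context Addresses

A context address is a pattern (‘regular expression’) enclosed in slashes (‘/’). The regular expressions recognized by sed are constructed as follows:

1) An ordinary character (not one of those discussed below) is a regular expression, and matches that character.
2) A circumflex ‘^’ at the beginning of a regular expression matches the null character at the beginning of a line.
3) A dollar sign ‘$’ at the end of a regular expression matches the null character at the end of a line.
4) The characters ‘\n’ match an embedded newline character, but not the newline at the end of the pattern space.
5) A period ‘.’ matches any character except the terminal newline of the pattern space.
6) A regular expression followed by an asterisk ‘*’ matches any number (including 0) of adjacent occurrences of the regular expression it follows.
7) A string of characters in square brackets ‘[ ]’ matches any character in the string, and no others. If, however, the first character of the string is a circumflex ‘^’, the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
8) A concatenation of regular expressions is a regular expression that matches the concatenation of the strings matched by the components of the regular expression.
9) A regular expression between the sequences ‘\(’ and ‘\)’ is identical in effect to the unadorned regular expression, but has side effects that are described under the s command below and in specification 10 immediately below.
10) The expression ‘\d’ means the same string of characters matched by an expression enclosed in ‘\(’ and ‘\)’ earlier in the same pattern. Here d is a single digit; the string specified is the one beginning with the dth occurrence of ‘\(’, counting from the left. For example, the expression ‘^\(.*\)\1’ matches a line beginning with two repeated occurrences of the same string.
11) The null regular expression standing alone (that is, ‘//’) is equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash ‘\’.

For a context address to ‘match’ the input requires that the whole pattern within the address match some portion of the pattern space.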

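2.3. Number of Addresses

The commands in the next section can have 0, 1, or 2 addresses. Under each command the maximum number of allowed addresses is given. A command with more addresses than the maximum allowed is considered an error.

If a command has no addresses, it is applied to every line in the input.

If a command has one address, it is applied to all lines that match that address.

If a command has two addresses, it is applied to the first line that matches the first address, and to all subsequent lines until (and including) the first subsequent line that matches the second address. Then an attempt is made on subsequent lines to match the first address again, and the process is repeated.

Two addresses are separated by a comma.

Examples:

/an/            matches lines 1, 3, 4 in our sample text
/an.*an/        matches line 1
/^an/           matches no lines
/./             matches all lines
/\./            matches line 5
/r*an/          matches lines 1, 3, 4 (number = zero!)
/\(an\).*\1/    matches line 1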
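The address examples can be checked mechanically by pairing each address with the = function (Section 3.7) under -n, which prints only the numbers of the selected lines. This is a sketch: lines_matching is a hypothetical helper, and paste is used only to join the numbers with commas.

```shell
# Hypothetical scratch file holding the sample text.
poem=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# Hypothetical helper: print the numbers of the lines an address selects.
lines_matching() {
    sed -n "${1}=" "$poem" | paste -sd, -
}

an=$(lines_matching '/an/')
twice=$(lines_matching '/\(an\).*\1/')
dot=$(lines_matching '/\./')
anchored=$(lines_matching '/^an/')
# A two-address range: from the first line matching "stately" through "river".
range=$(lines_matching '/stately/,/river/')

printf '%s\n' "$an" "$twice" "$dot" "[$anchored]" "$range"
rm -f "$poem"
```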

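3. Functions

All functions are named by a single character. In the following summary, the maximum number of allowable addresses is given in parentheses, followed by the single-character function name, possible arguments enclosed in angle brackets (< >), an expanded English rendering of the single-character name, and finally a description of what each function does. The angle brackets around the arguments are not part of the argument and should not be typed in actual editing commands.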

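3.1. Whole-line Oriented Functions

(2)d -- delete lines

The d function deletes from the file (does not write to the output) all those lines matched by its address(es).

It also has the side effect that no further commands are attempted on the deleted line; as soon as the d function is executed, a new line is read from the input, and the list of editing commands is restarted from the beginning on the new line.

(2)n -- next line

The n function reads the next line from the input, replacing the current line. The current line is written to the output if it should be. The list of editing commands is continued following the n command.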

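(1)a\
<text> -- append lines

The a function causes the argument <text> to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character (‘\’) immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by a backslash).

Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line that triggered it. The triggering line may be deleted entirely; <text> will still be written to the output.

The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.

(1)i\
<text> -- insert lines

The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.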

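(2)c\
<text> -- change lines

The c function deletes the lines selected by its address(es), and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash, and interior newlines in <text> must be hidden by backslashes.

The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.

After a line has been deleted by a c function, no further commands are attempted on the deleted line.

If text is appended after a line by a or r functions, and the line is subsequently changed, the text inserted by the c function will be placed before the text of the a or r functions. (The r function is described in Section 3.3.)

Note: Within the text put in the output by these functions, leading blanks and tabs will disappear, as always in sed commands. To get leading blanks and tabs into the output, precede the first desired blank or tab by a backslash; the backslash will not appear in the output.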

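Example:

The list of editing commands:

n
a\
XXXX
d

applied to our standard input, produces:

In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.

In this particular case, the same effect would be produced by either of the two following command lists:

n         n
i\        c\
XXXX      XXXX
d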
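The first command list can be stored in a command file and run with -f, roughly as sketched here. The scratch file names are hypothetical. One caveat outside the memo: implementations differ in how n behaves when no input remains, so the handling of the final odd-numbered line may vary; the sketch therefore only inspects the inserted text.

```shell
# Hypothetical scratch files: the sample poem and the command list n, a\, XXXX, d.
poem=$(mktemp)
script=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
cat > "$script" <<'EOF'
n
a\
XXXX
d
EOF

# n writes the odd-numbered line and reads the next; a\ queues XXXX;
# d deletes the even-numbered line, yet the queued text is still written.
result=$(sed -f "$script" "$poem")
printf '%s\n' "$result"

rm -f "$poem" "$script"
```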

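3.2. Substitute Function

One very important function changes parts of lines selected by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute

The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:

Substitute for <pattern>, <replacement>

The <pattern> argument contains a pattern, exactly like the patterns in addresses (see Section 2.2). The only difference between <pattern> and a context address is that the context address must be delimited by slash (‘/’) characters; <pattern> may be delimited by any character other than space or newline.

By default, only the first string matched by <pattern> is replaced; see the g flag below.

The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.) The <replacement> is not a pattern, and the characters that are special in patterns do not have special meaning in <replacement>. Instead, other characters are special:

& is replaced by the string matched by <pattern>.

\d (where d is a single digit) is replaced by the dth substring matched by the parts of <pattern> enclosed in ‘\(’ and ‘\)’. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters (‘\(’). As in patterns, special characters may be made literal by preceding them with a backslash (‘\’).

The <flags> argument may contain the following flags:

g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.

p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.

w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines that are actually substituted by the s function to be written to the file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created.

A single space must separate w and <filename>.

The possibilities of multiple, somewhat different copies of one input line being written are the same as for p.

A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.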

Examples:

The following command, applied to our standard input,

s/to/by/w changes

produces, on the standard output:

In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.

and, in the file ‘changes’:

Through caverns measureless by man
Down by a sunless sea.

If the nocopy option (-n) is in effect, the command:

s/[.,;?:]/*P&*/gp

produces:

A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

/X/s/an/AN/p

produces (assuming nocopy mode):

In XANadu did Kubla Khan

and the command:

/X/s/an/AN/gp

produces:

In XANadu did Kubla KhAN
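The special characters of <replacement> and the free choice of delimiter can be tried on single lines, as in this sketch. The input strings are only illustrations.

```shell
line='In Xanadu did Kubla Khan'

# & stands for whatever <pattern> matched; g repeats the substitution along the line.
marked=$(printf '%s\n' "$line" | sed 's/K[a-z]*/<&>/g')

# \1 and \2 refer to the first and second \( \) groups of <pattern>.
swapped=$(printf '%s\n' "$line" | sed 's/\(Kubla\) \(Khan\)/\2 \1/')

# Any character other than space or newline may delimit; a comma avoids escaping slashes.
dashed=$(printf 'a/b/c\n' | sed 's,/,-,g')

printf '%s\n' "$marked" "$swapped" "$dashed"
```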

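3.3. Input-output Functions

(2)p -- print

The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.

(2)w <filename> -- write on <filename>

The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them. Exactly one space must separate the w and <filename>. A maximum of ten different files may be mentioned in write functions and w flags after s functions, combined.

(1)r <filename> -- read the contents of a file

The read function reads the contents of <filename>, and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line that matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order in which the functions are executed. Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.

Note: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than ten (different) files are mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)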

Example

Assume that the file ‘note1’ has the following contents:

	Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.

Then the following command:

     /Kubla/r note1

produces:

In Xanadu did Kubla Khan
	Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
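The read and write functions can be exercised together in a scratch directory, as sketched below. The directory, the shortened note text, and the file names an-lines and no-such-file are assumptions made for the illustration.

```shell
# Hypothetical scratch directory with the poem and a one-line note file.
dir=$(mktemp -d)
cat > "$dir/poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
printf 'Note: Kubla Khan is more properly Kublai Khan.\n' > "$dir/note1"

# r: the note is appended after the line that matches /Kubla/.
with_note=$(cd "$dir" && sed '/Kubla/r note1' poem)

# w: every line containing "an" is also written to the file an-lines.
(cd "$dir" && sed -n '/an/w an-lines' poem)
written=$(cat "$dir/an-lines")

# r with a file that cannot be opened is treated as reading an empty file.
unchanged=$(cd "$dir" && sed 'r no-such-file' poem)
original=$(cat "$dir/poem")

printf '%s\n' "$with_note"
rm -rf "$dir"
```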

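3.4. Multiple Input-line Functions

Three functions, all spelled with capital letters, deal specially with pattern spaces containing embedded newlines; they are intended principally to provide pattern matches across lines in the input.

(2)N -- Next line

The next input line is appended to the current line in the pattern space; the two input lines are separated by an embedded newline. Pattern matches may extend across the embedded newline(s).

(2)D -- Delete first part of the pattern space

Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.

(2)P -- Print first part of the pattern space

Print up to and including the first newline in the pattern space.

The P and D functions are equivalent to their lower-case counterparts if there are no embedded newlines in the pattern space.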
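Two short sketches of the multi-line functions follow: N to join pairs of lines, and the widely used N/P/D combination that drops adjacent duplicate lines. The inputs are illustrations, and the first has an even number of lines because implementations differ in what N does when no input remains.

```shell
# N pulls the next line into the pattern space; the embedded newline is then
# visible to the pattern as \n and can be replaced.
joined=$(printf 'a\nb\nc\nd\n' | sed -e 'N' -e 's/\n/ /')

# Sliding two-line window: append the next line (except on the last line),
# print the first line unless both lines are identical, then D removes the
# first line and restarts the command list on what is left.
deduped=$(printf 'a\na\nb\nc\nc\n' |
    sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D')

printf '%s\n' "$joined" "$deduped"
```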

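3.5. Hold and Get Functions

Four functions save and retrieve part of the input for possible later use.

(2)h -- hold pattern space

The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).

(2)H -- Hold pattern space

The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.

(2)g -- get contents of hold area

The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).

(2)G -- Get contents of hold area

The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.

(2)x -- exchange

The exchange command interchanges the contents of the pattern space and the hold area.

Example

The commands

1h
1s/ did.*//
1x
G
s/\n/ :/

applied to our standard example, produce: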

In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
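Another familiar use of the hold area, offered here only as an illustrative sketch, is to print the input in reverse line order: each line has the accumulated hold area appended with G, the result is saved back with h, and the pattern space is printed once at the last line. The scratch file is hypothetical.

```shell
# Hypothetical scratch file holding the sample text.
poem=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# 1!G : on every line but the first, append the hold area after the current line
# h   : save the growing, reversed text back into the hold area
# $p  : at the last line, print the whole pattern space (-n suppresses other output)
reversed=$(sed -n -e '1!G' -e 'h' -e '$p' "$poem")
printf '%s\n' "$reversed"

rm -f "$poem"
```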

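3.6. Flow-of-Control Functions

These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.

(2)! -- Don’t

The Don’t command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.

(2){ -- Grouping

The grouping command ‘{’ causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the ‘{’ or on the next line.

The group of commands is terminated by a matching ‘}’ standing on a line by itself.

Groups can be nested.

(0):<label> -- place a label

The label function marks a place in the list of editing commands that may be referred to by b and t functions. The <label> may be any sequence of eight or fewer characters; if two different colon functions have identical labels, a compile-time diagnostic will be generated, and no execution attempted.

(2)b<label> -- branch to label

The branch function causes the sequence of editing commands being applied to the current input line to be restarted immediately after the place where a colon function with the same <label> was encountered. If no colon function with the same label can be found after all the editing commands have been compiled, a compile-time diagnostic is produced, and no execution is attempted.

A b function with no <label> is taken to be a branch to the end of the list of editing commands; whatever should be done with the current input line is done, and another input line is read; the list of editing commands is restarted from the beginning on the new line.

(2)t<label> -- test substitutions

The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to <label>; if not, it does nothing. The flag that indicates that a successful substitution has been executed is reset by:

1) reading a new input line, or

2) executing a t function.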
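The sketch below tries two of the flow-of-control functions: ! to invert an address, and a label with t to repeat a substitution until it no longer succeeds. The label name again and the sample strings are arbitrary choices for the illustration.

```shell
# Hypothetical scratch file holding the sample text.
poem=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# ! applies p to exactly the lines that do NOT match /an/.
without_an=$(sed -n -e '/an/!p' "$poem")

# Loop: turn one leading blank into a dot, then branch back to the label
# while the substitution keeps succeeding; t falls through once it fails.
dotted=$(printf '   indented\n' |
    sed -e ':again' -e 's/^\(\.*\) /\1./' -e 't again')

printf '%s\n' "$without_an" "$dotted"
rm -f "$poem"
```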

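3.7. Miscellaneous Functions

(1)= -- equals

The = function writes to the standard output the line number of the line matched by its address.

(1)q -- quit

The q function causes the current line to be written to the output (if it should be), any appended or read text to be written, and execution to be terminated.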
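A brief sketch of both functions on the sample text: = combined with $ counts the input lines, = with a context address reports where a match occurs, and q stops at the first matching line. The scratch file is hypothetical.

```shell
# Hypothetical scratch file holding the sample text.
poem=$(mktemp)
cat > "$poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# The number of the last line is the number of lines in the input.
count=$(sed -n '$=' "$poem")

# Line number of the line that contains "sunless".
where=$(sed -n '/sunless/=' "$poem")

# Copy the input up to and including the first line that mentions Alph, then quit.
up_to_alph=$(sed '/Alph/q' "$poem")

printf '%s\n' "$count" "$where" "$up_to_alph"
rm -f "$poem"
```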

Reference

[1] Ken Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories, 1978.


Posted 2006-06-27; revised 2006-07-04 20:57; 1 comment.

Reader: lgfang

Time: 2006-06-28 11:20:13  IP address: 192.11.188.★

The original is a man page and cannot be read directly, so I formatted it with emacs:

SED -- A Non-interactive Text Editor

Lee E. McMahon

Context search Editing

Sed is a non-interactive context editor that runs on the UNIX

operating  system.  Sed is  designed to  be especially  useful in

three cases:



1) To edit files too large for comfortable interactive editing;

2) To edit any size file when the sequence of editing commands is

     too complicated to be comfortably typed in interactive mode.

3) To perform multiple  `global' editing functions efficiently in

     one pass through the input.



     This memorandum constitutes a manual for users of sed.



Introduction



     Sed  is  a non-interactive  context  editor  designed to  be

     especially useful in three cases:



1) To edit files too large for comfortable interactive editing;

2) To edit any size file when the sequence of editing commands is

     too complicated to be comfortably typed in interactive mode;

3) To perform multiple  `global' editing functions efficiently in

     one pass through the input.



     Since only  a few lines of  the input reside in  core at one

     time, and no temporary files are used, the effective size of

     file that can  be edited is limited only  by the requirement

     that the input and  output fit simultaneously into available

     secondary storage.



     Complicated  editing scripts can  be created  separately and

     given to  sed as  a command file.   For complex  edits, this

     saves  considerable typing, and  its attendant  errors.  Sed

     running from a command file  is much more efficient than any

     interactive editor known to  the author, even if that editor

     can be driven by a pre-written script.



     The principal  loss of functions compared  to an interactive

     editor  are  lack of  relative  addressing  (because of  the

     line-at-a-time    operation),   and   lack    of   immediate

     verification that a command has done what was intended.



     Sed is a lineal descendant  of the UNIX editor, ed.  Because

     of the  differences between interactive  and non-interactive

     operation,  considerable changes have  been made  between ed

     and  sed; even  confirmed  users of  ed  will frequently  be

     surprised (and  probably chagrined), if they  rashly use sed

     without reading Sections 2 and 3 of this document.  The most

     striking family  resemblance between  the two editors  is in

     the   class  of   patterns   (`regular  expressions')   they

     recognize; the  code for matching patterns  is copied almost

     verbatim  from  the code  for  ed,  and  the description  of

     regular expressions  in Section 2 is  copied almost verbatim

     from  the  UNIX   Programmer's  Manual[1].  (Both  code  and

     description were written by Dennis M. Ritchie.)



1. Overall Operation



     Sed  by default copies  the standard  input to  the standard

     output, perhaps  performing one or more  editing commands on

     each line  before writing it  to the output.   This behavior

     may be  modified by flags  on the command line;  see Section

     1.1 below.



     The general format of an editing command is:



               [address1,address2][function][arguments]



One or both addresses may  be omitted; the format of addresses is

given in  Section 2.  Any number  of blanks or  tabs may separate

the addresses  from the function.  The function  must be present;

the available commands are discussed in Section 3.  The arguments

may  be required  or  optional, according  to  which function  is

given;  again,  they  are  discussed  in  Section  3  under  each

individual function.



     Tab  characters and  spaces at  the beginning  of  lines are

     ignored.



1.1. Command-line Flags



     Three flags are recognized on the command line:

          -n:

               tells sed  not to copy  all lines, but  only those

               specified  by  p  functions  or p  flags  after  s

               functions (see Section 3.3);

          -e:

               tells sed to take  the next argument as an editing

               command;

          -f:

               tells  sed to  take the  next argument  as  a file

               name;  the file  should contain  editing commands,

               one to a line.



1.2. Order of Application of Editing Commands



     Before any editing  is done (in fact, before  any input file

     is even opened), all  the editing commands are compiled into

     a  form  which  will  be  moderately  efficient  during  the

     execution phase  (when the commands are  actually applied to

     lines of the input file).   The commands are compiled in the

     order in  which they are encountered; this  is generally the

     order  in which they  will be  attempted at  execution time.

     The commands  are applied one at  a time; the  input to each

     command is the output of all preceding commands.



     The default linear order  of application of editing commands

     can be changed by the flow-of-control commands, t and b (see

     Section 3).   Even when the order of  application is changed

     by these commands,  it is still true that  the input line to

     any command is the output of any previously applied command.



1.3.  Pattern-space



     The range  of pattern matches  is called the  pattern space.

     Ordinarily, the pattern space is one line of the input text,

     but more than one line can be read into the pattern space by

     using the N command (Section 3.4).



1.4. Examples



     Examples  are scattered throughout  the text.   Except where

     otherwise noted, the examples all assume the following input

     text:



          In Xanadu did Kubla Khan

          A stately pleasure dome decree:

          Where Alph, the sacred river, ran

          Through caverns measureless to man

          Down to a sunless sea.



     (In  no  case  is the  output  of  the  sed commands  to  be

     considered an improvement on Coleridge.)



Example:



     The command



     2q



     will quit  after copying the  first two lines of  the input.

     The output will be:



          In Xanadu did Kubla Khan

          A stately pleasure dome decree:



2. ADDRESSES: Selecting lines for editing



     Lines in the input file(s)  to which editing commands are to

     be applied  can be selected by addresses.   Addresses may be

     either line numbers or context addresses.



     The application of a group  of commands can be controlled by

     one address (or address-pair)  by grouping the commands with

     curly braces (`{ }')(Sec. 3.6.).



2.1. Line-number Addresses



     A line  number is a decimal  integer.  As each  line is read

     from  the input,  a  line-number counter  is incremented;  a

     line-number address  matches (selects) the  input line which

     causes   the   internal  counter   to   equal  the   address

     line-number.  The counter runs cumulatively through multiple

     input  files; it  is  not reset  when  a new  input file  is

     opened.



     As a special case, the  character $ matches the last line of

     the last input file.



2.2. Context Addresses



     A  context  address  is  a  pattern  (`regular  expression')

     enclosed   in  slashes   (`/').   The   regular  expressions

     recognized by sed are constructed as follows:



1) An ordinary character (not one  of those discussed below) is a

     regular expression, and matches that character.



2) A  circumflex `^'  at the  beginning of  a  regular expression

     matches the null character at the beginning of a line.

3) A dollar-sign `$' at the end of a regular expression matches

the null character at the end of a line.

4) The characters `\n' match an imbedded newline character, but

not the newline at the end of the pattern space.

5) A period `.' matches any character except the terminal newline

of the pattern space.

6) A regular expression followed by an asterisk `*' matches any

number (including 0) of adjacent occurrences of the

regular expression it follows.

7) A string of characters in square brackets `[ ]' matches any

character in the string, and no others. If, however, the

first character of the string is circumflex `^', the regular

expression matches any character except the characters in

the string and the terminal newline of the pattern space.

8) A concatenation of regular expressions is a regular expression

which matches the concatenation of strings matched by

the components of the regular expression.

9) A regular expression between the sequences `\(' and `\)' is

identical in effect to the unadorned regular expression,

but has side-effects which are described under the s

command below and specification 10) immediately below.

10) The expression `\d' means the same string of characters

matched by an expression enclosed in `\(' and `\)' earlier

in the same pattern. Here d is a single digit; the string

specified is that beginning with the dth occurrence of `\('

counting from the left. For example, the expression

`^\(.*\)\1' matches a line beginning with two repeated

occurrences of the same string.

11) The null regular expression standing alone (e.g., `//') is

equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a

literal (to match an occurrence of itself in the input),

precede the special character by a backslash `\'.

For a context address to `match' the input requires that the

whole pattern within the address match some portion of the

pattern space.

2.3. Number of Addresses

The commands in the next section can have 0, 1, or 2

addresses. Under each command the maximum number of allowed

addresses is given. For a command to have more addresses

than the maximum allowed is considered an error.

If a command has no addresses, it is applied to every line

in the input.

If a command has one address, it is applied to all lines

which match that address.

If a command has two addresses, it is applied to the first

line which matches the first address, and to all subsequent

lines until (and including) the first subsequent line which

matches the second address. Then an attempt is made on

subsequent lines to again match the first address, and the

process is repeated.

Two addresses are separated by a comma.

Examples:

/an/ matches lines 1, 3, 4 in our sample text

/an.*an/ matches line 1

/^an/ matches no lines

/./ matches all lines

/\./ matches line 5

/r*an/ matches lines 1,3, 4 (number = zero!)

/\(an\).*\1/ matches line 1

3. FUNCTIONS

All functions are named by a single character. In the

following summary, the maximum number of allowable addresses

is given enclosed in parentheses, then the single character

function name, possible arguments enclosed in angles (< >),

an expanded English translation of the single-character

name, and finally a description of what each function does.

The angles around the arguments are not part of the

argument, and should not be typed in actual editing

commands.

3.1. Whole-line Oriented Functions

(2)d -- delete lines The d function deletes from the

file (does not write to the output) all those

lines matched by its address(es). It also

has the side effect that no further commands

are attempted on the corpse of a deleted line;

as soon as the d function is executed, a new line

is read from the input, and the list of

editing commands is re-started from the

beginning on the new line.

(2)n -- next line The n function reads the next line

from the input, replacing the current line.

The current line is written to the output if it

should be. The list of editing commands is

continued following the n command.

(1)a\

<text> -- append lines

The a function causes the argument <text> to be

written to the output after the line matched by

its address. The a command is inherently

multi-line; a must appear at the end of a line,

and <text> may contain any number of lines. To

preserve the one-command-to-a-line fiction, the

interior newlines must be hidden by a backslash

character (`\') immediately preceding the newline.

The <text> argument is terminated by the first

unhidden newline (the first one not immediately

preceded by backslash). Once an a function is

successfully executed, <text> will be written to

the output regardless of what later commands do to

the line which triggered it. The triggering line

may be deleted entirely; <text> will still be

written to the output. The <text> is not scanned

for address matches, and no editing commands are

attempted on it. It does not cause any change in

the line-number counter.

(1)i\

<text> -- insert lines

The i function behaves identically to the a

function, except that <text> is written to the

output before the matched line. All other

comments about the a function apply to the i

function as well.

(2)c\

<text> -- change lines

The c function deletes the lines selected by its

address(es), and replaces them with the lines in

<text>. Like a and i, c must be followed by a

newline hidden by a backslash; and interior new

lines in <text> must be hidden by backslashes.

The c command may have two addresses, and

therefore select a range of lines. If it does,

all the lines in the range are deleted, but only

one copy of <text> is written to the output, not

one copy per line deleted. As with a and i,

<text> is not scanned for address matches, and no

editing commands are attempted on it. It does not

change the line-number counter. After a line has

been deleted by a c function, no further commands

are attempted on the corpse. If text is appended

after a line by a or r functions, and the line is

subsequently changed, the text inserted by the c

function will be placed before the text of the a

or r functions. (The r function is described in

Section 3.3.)

Note: Within the text put in the output by these functions,

leading blanks and tabs will disappear, as always in sed

commands. To get leading blanks and tabs into the output,

precede the first desired blank or tab by a backslash; the

backslash will not appear in the output.

Example:

The list of editing commands:

n

a\

XXXX

d

applied to our standard input, produces:

In Xanadu did Kubla Khan

XXXX

Where Alph, the sacred river, ran

XXXX

Down to a sunless sea.

In this particular case, the same effect would be produced

by either of the two following command lists:

n         n

i\        c\

XXXX      XXXX

d

3.2. Substitute Function

One very important function changes parts of lines selected

by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute The s

function replaces part of a line

(selected by <pattern>) with <replacement>. It

can best be read:

Substitute for <pattern>, <replacement>

The <pattern> argument contains a pattern, exactly

like the patterns in addresses (see 2.2 above).

The only difference between <pattern> and a

context address is that the context address must

be delimited by slash (`/') characters; <pattern>

may be delimited by any character other than space

or newline. By default, only the first string

matched by <pattern> is replaced, but see the g

flag below. The <replacement> argument begins

immediately after the second delimiting character

of <pattern>, and must be followed immediately by

another instance of the delimiting character.

(Thus there are exactly three instances of the

delimiting character.) The <replacement> is not a

pattern, and the characters which are special in

patterns do not have special meaning in

<replacement>. Instead, other characters are

special:

& is replaced by the string matched

by <pattern>

\d (where d is a single digit) is replaced by

the dth substring matched by

parts of <pattern> enclosed in `\('

and `\)'. If nested substrings occur

in <pattern>, the dth is determined by

counting opening delimiters (`\('). As

in patterns, special characters may be

made literal by preceding them with

backslash (`\').

The <flags> argument may contain the following

flags:

g -- substitute <replacement> for all

(non-overlapping) instances of

<pattern> in the line. After a

successful substitution, the scan for

the next instance of <pattern> begins

just after the end of the inserted

characters; characters put into the line

from <replacement> are not rescanned.

p -- print the line if a successful

replacement was done. The p flag

causes the line to be written to the

output if and only if a substitution was

actually made by the s function.

Notice that if several s

functions, each followed by a p

flag, successfully substitute in the

same input line, multiple copies of

the line will be written to the

output: one for each successful

substitution.

w <filename> -- write the line to a file if a

successful replacement was done. The w

flag causes lines which are actually

substituted by the s function to be

written to a file named by <filename>.

If <filename> exists before sed is run,

it is overwritten; if not, it is

created. A single space must separate

w and <filename>. The

possibilities of multiple, somewhat

different copies of one input line

being written are the same as for p. A

maximum of 10 different file names may

be mentioned after w flags and w

functions (see below), combined.

Examples:

The following command, applied to our standard input,

s/to/by/w changes

produces, on the standard output:

In Xanadu did Kubla Khan

A stately pleasure dome decree:

Where Alph, the sacred river, ran

Through caverns measureless by man

Down by a sunless sea.

and, on the file `changes':

Through caverns measureless by man

Down by a sunless sea.

If the nocopy option is in effect, the command:

s/[.,;?:]/*P&*/gp

produces:

A stately pleasure dome decree*P:*

Where Alph*P,* the sacred river*P,* ran

Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

/X/s/an/AN/p

produces (assuming nocopy mode):

In XANadu did Kubla Khan

and the command:

/X/s/an/AN/gp

produces:

In XANadu did Kubla KhAN

3.3. Input-output Functions

(2)p -- print The print function writes the addressed

lines to the standard output file. They are

written at the time the p function is

encountered, regardless of what succeeding editing

commands may do to the lines.

(2)w <filename> -- write on <filename> The write

function writes the addressed lines to the file

named by <filename>. If the file previously

existed, it is overwritten; if not, it is

created. The lines are written exactly as they

exist when the write function is encountered

for each line, regardless of what subsequent

editing commands may do to them. Exactly one

space must separate the w and <filename>. A

maximum of ten different files may be

mentioned in write functions and w flags

after s functions, combined.

(1)r <filename> -- read the contents of a file The read

function reads the contents of <filename>, and

appends them after the line matched by the

address. The file is read and appended

regardless of what subsequent editing commands

do to the line which matched its address. If

r and a functions are executed on the same line,

the text from the a functions and the r functions

is written to the output in the order that

the functions are executed. Exactly one

space must separate the r and <filename>. If a

file mentioned by a r function cannot be opened,

it is considered a null file, not an error, and no

diagnostic is given.

NOTE: Since there is a limit to the number of files that can

be opened simultaneously, care should be taken that no more

than ten files be mentioned in w functions or flags; that

number is reduced by one if any r functions are present.

(Only one read file is open at one time.)

Examples

Assume that the file `note1' has the following contents:

Note: Kubla Khan (more properly Kublai Khan;

1216-1294) was the grandson and most eminent

successor of Genghiz (Chingiz) Khan, and founder

of the Mongol dynasty in China.

Then the following command:

/Kubla/r note1

produces:

In Xanadu did Kubla Khan

Note: Kubla Khan (more properly Kublai Khan;

1216-1294) was the grandson and most eminent

successor of Genghiz (Chingiz) Khan, and founder

of the Mongol dynasty in China.

A stately pleasure dome decree:

Where Alph, the sacred river, ran

Through caverns measureless to man

Down to a sunless sea.

3.4. Multiple Input-line Functions

Three functions, all spelled with capital letters, deal

specially with pattern spaces containing imbedded newlines;

they are intended principally to provide pattern matches

across lines in the input.

(2)N -- Next line The next input line is appended to

the current line in the pattern space; the two

input lines are separated by an imbedded

newline. Pattern matches may extend across the

imbedded newline(s).

(2)D -- Delete first part of the pattern space Delete

up to and including the first newline character

in the current pattern space. If the pattern

space becomes empty (the only newline was the

terminal newline), read another line from the

input. In any case, begin the list of editing

commands again from its beginning.

(2)P -- Print first part of the pattern space Print up

to and including the first newline in the

pattern space.

The P and D functions are equivalent to their lower-case

counterparts if there are no imbedded newlines in the pattern

space.

3.5. Hold and Get Functions

Four functions save and retrieve part of the input for

possible later use.

(2)h -- hold pattern space The h functions copies the

contents of the pattern space into a hold area

(destroying the previous contents of the hold area).

(2)H -- Hold pattern space The H function appends the

contents of the pattern space to the contents of the

hold area; the former and new contents are

separated by a newline.

(2)g -- get contents of hold area The g function copies the

contents of the hold area into the pattern space

(destroying the previous contents of the pattern

space).

(2)G -- Get contents of hold area The G function appends the

contents of the hold area to the contents of the

pattern space; the former and new contents are

separated by a newline.

(2)x -- exchange The exchange command interchanges the

contents of the pattern space and the hold area.

Example

The commands

1h

1s/ did.*//

1x

G

s/\n/ :/

applied to our standard example, produce:

In Xanadu did Kubla Khan :In Xanadu

A stately pleasure dome decree: :In Xanadu

Where Alph, the sacred river, ran :In Xanadu

Through caverns measureless to man :In Xanadu

Down to a sunless sea. :In Xanadu

3.6. Flow-of-Control Functions

These functions do no editing on the input lines, but

control the application of functions to the lines selected

by the address part.

(2)! -- Don't The Don't command causes the next command

(written on the same line), to be applied to

all and only those input lines not selected by the

address part.

(2){ -- Grouping The grouping command `{' causes the

next set of commands to be applied (or not

applied) as a block to the input lines selected by

the addresses of the grouping command. The first

of the commands under control of the grouping may

appear on the same line as the `{' or on the next

line.

The group of commands is terminated by a matching `}'

standing on a line by itself.

Groups can be nested.

(0):<label> -- place a label The label function marks a place in

the list of editing commands which may be referred to by b

and t functions. The <label> may be any sequence of

eight or fewer characters; if two different colon

functions have identical labels, a compile time

diagnostic will be generated, and no execution attempted.

(2)b<label> -- branch to label The branch function causes the

sequence of editing commands being applied to the current

input line to be restarted immediately after the place

where a colon function with the same <label> was

encountered. If no colon function with the same label

can be found after all the editing commands have been

compiled, a compile time diagnostic is produced, and

no execution is attempted. A b function with no <label>

is taken to be a branch to the end of the list of editing

commands; whatever should be done with the current input

line is done, and another input line is read; the list

of editing commands is restarted from the beginning on the

new line.

(2)t<label> -- test substitutions The t function tests whether

any successful substitutions have been made on the current

input line; if so, it branches to <label>; if not, it

does nothing. The flag which indicates that a successful

substitution has been executed is reset by:

1) reading a new input line, or

2) executing a t function.

3.7. Miscellaneous Functions

(1)= -- equals The = function writes to the standard

output the line number of the line matched by

its address.

(1)q -- quit The q function causes the current line to

be written to the output (if it should be),

any appended or read text to be written, and

execution to be terminated.


Reference

[1] Ken Thompson and Dennis M. Ritchie, The UNIX

Programmer's Manual. Bell Laboratories, 1978.


Posted by: simgqdnpfvue

Posted by simgqdnpfvue in the General category; last updated April 26, 2018.