
sed study notes

2013-09-12 ⁄ General


2010-11-20, Saturday, overcast and foggy

 

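## Derive the local directory name from an SVN URL
## e.g. http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-biz/escrow/trunk/ ==> /home/$USER/work/intl-biz/escrow/
##      http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-biz/wsproductbase/client/branches/20101030_7849_1 ==>  /home/$USER/work/intl-biz/wsproductbase/client/
get_path_from_svnurl()
{
	local svnurl=$1
	local basedir=$2	# e.g. /home/$USER/work/ (needs the trailing slash)
	# The first expression is double-quoted so that the shell expands $basedir;
	# inside single quotes sed would insert the literal text "$basedir".
	echo "$svnurl" | sed -e "s#http://svn.alibaba-inc.com/repos/ali_intl/apps/#$basedir#" -e 's#trunk/.*##' -e 's#branches/.*##'
}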
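
To see the quoting issue in isolation, here is a minimal sketch of the same idea. The variable SVN_PREFIX and the base directory /home/demo/work/ are names made up for this illustration; the assumption is that the base directory ends with a slash and contains no "#".

```shell
# Sketch only: SVN_PREFIX and /home/demo/work/ are illustrative names, not from the original script.
SVN_PREFIX='http://svn.alibaba-inc.com/repos/ali_intl/apps/'

get_path_from_svnurl() {
    # $1: SVN URL; $2: local base directory (must end with "/" and contain no "#").
    # The first expression is double-quoted so the shell expands the variables
    # before sed parses the script; the other two strip trunk/ or branches/ and the rest.
    printf '%s\n' "$1" |
        sed -e "s#^${SVN_PREFIX}#$2#" -e 's#trunk/.*##' -e 's#branches/.*##'
}

get_path_from_svnurl "${SVN_PREFIX}intl-biz/escrow/trunk/" /home/demo/work/
get_path_from_svnurl "${SVN_PREFIX}intl-biz/wsproductbase/client/branches/20101030_7849_1" /home/demo/work/
```

With single quotes around the first expression, sed would receive the characters $2 literally and paste them into every path.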

Note that # can be used here in place of / as the delimiter:

The slash as a delimiter

The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - you could use the backslash to quote the slash:

sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
Gulp. Some call this a 'Picket Fence' and it's ugly. It is easier to read if you use an underline instead of a slash as a delimiter:

sed 's_/usr/local/bin_/common/bin_' <old >new
Some people use colons:

sed 's:/usr/local/bin:/common/bin:' <old >new
Others use the "|" character.

sed 's|/usr/local/bin|/common/bin|' <old >new
Pick one you like. As long as it's not in the string you are looking for, anything goes. And remember that you need three delimiters. If you get a "Unterminated `s' command" it's because you are missing one of them.

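However, I found that an address pattern (that is, outside the s command) cannot simply switch delimiters in the same way. Escaping the slashes works:

forrest@ubuntu:~$ sed '/http:\/\/svn.alibaba-inc.com\/repos/!d' /home/forrest/Desktop/cnfm_branches_20101119.txt

whereas this does not (a sed script that begins with # is read as a comment):

forrest@ubuntu:~$ sed '#http://svn.alibaba-inc.com/repos#d#' /home/forrest/Desktop/cnfm_branches_20101119.txt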
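
For what it is worth, POSIX sed does allow a different delimiter in an address, provided it is introduced with a backslash (\#regex#). The sketch below assumes a small made-up sample in place of cnfm_branches_20101119.txt; the function name filter_repo_lines is also just for illustration.

```shell
# Hypothetical sample standing in for cnfm_branches_20101119.txt
sample='some note without a url
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101119_26643_1
http://example.com/other'

# In an address, a custom delimiter must be introduced with a backslash: \#regex#
# "!d" deletes every line that does NOT match, so only matching lines remain.
filter_repo_lines() {
    sed '\#http://svn.alibaba-inc.com/repos#!d'
}

printf '%s\n' "$sample" | filter_repo_lines
```

This keeps the URL readable without escaping each slash.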

Also: a simple example is changing "day" in the "old" file to "night" in the "new" file:
sed s/day/night/ <old >new
Or another way (for Unix beginners),

sed s/day/night/ old >new
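old and new must not be the same file. The shell truncates the output file before sed gets to read it, so the result would be an empty file.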
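
A common workaround, sketched here under the assumption that a scratch directory from mktemp is acceptable, is to write to a temporary file and rename it over the original only if sed succeeded. (GNU sed also has an -i option for in-place editing.)

```shell
# Sketch: "edit in place" by way of a temporary file.
tmpdir=$(mktemp -d)
printf 'day one\nday two\n' > "$tmpdir/old"

# Wrong: the shell empties the file before sed reads it.
#   sed s/day/night/ < "$tmpdir/old" > "$tmpdir/old"

# Write to a temporary file, then replace the original only on success.
sed 's/day/night/' "$tmpdir/old" > "$tmpdir/old.tmp" && mv "$tmpdir/old.tmp" "$tmpdir/old"

cat "$tmpdir/old"
```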
To avoid saving the result of every substitution into a new temporary file, several substitutions can be chained together, much like a pipe:

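Multiple edits

If you need to make several changes to the same file or line, there are three ways to do it. The first is the "-e" option, which tells the program that more than one editing command is being given. For example: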

$ echo The tiger cubs will meet on Tuesday after school | sed -e 's/tiger/wolf/' -e 's/after/before/'
The wolf cubs will meet on Tuesday before school
$
This is a rather cumbersome way to do it, so the "-e" option is not used very widely. A better way is to separate the commands with semicolons:

$ echo The tiger cubs will meet on Tuesday after school | sed 's/tiger/wolf/; s/after/before/'
The wolf cubs will meet on Tuesday before school 
$
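According to the article, the semicolon must be the character immediately following the closing slash; with a space between the two, the command fails with an error message. (This appears to depend on the sed implementation: GNU sed, for one, accepts whitespace before the semicolon.) Both methods work well, but many administrators prefer a third. The key point is that everything between the two single quotes (' ') is interpreted as sed commands. The shell reading the commands does not consider your input complete until you type the second single quote. This means the commands can be entered across several lines, with the shell switching from the PS1 prompt to a continuation prompt (PS2, usually ">"), until the closing quote is typed. Once the closing quote is entered and you press Enter, processing runs and produces the same result, as shown below: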

$ echo The tiger cubs will meet on Tuesday after school | sed '
> s/tiger/wolf/
> s/after/before/'
The wolf cubs will meet on Tuesday before school
$

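I tried these myself: the second method works. The first one seemed to have some problem, but I did not look into it closely and do not know the cause.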
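
For a side-by-side comparison, the three forms can be run against the same input. This is only a sketch; the variable names are made up, and with a POSIX-conforming sed I would expect all three to print the same line.

```shell
input='The tiger cubs will meet on Tuesday after school'

# 1. One -e option per command
with_e=$(printf '%s\n' "$input" | sed -e 's/tiger/wolf/' -e 's/after/before/')

# 2. Commands separated by a semicolon inside one script
with_semicolon=$(printf '%s\n' "$input" | sed 's/tiger/wolf/; s/after/before/')

# 3. Commands separated by newlines inside one quoted script
with_newline=$(printf '%s\n' "$input" | sed '
s/tiger/wolf/
s/after/before/')

printf '%s\n' "$with_e" "$with_semicolon" "$with_newline"
```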

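In practice

2010-11-23, Tuesday, sunny

This morning I came in to merge code. The branches being released today had to be merged into our code first, so I looked up today's release list on Aone. The information copied from Aone, however, looks like this: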

1 intl-aisn tradeManager根据ip展示中文页面 麦俊生  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101119_26643_1 http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/tags/20101123_r_release1 395958 395958 
UK站首页help us挖成天窗 李栋  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_26769_1 
Sourcing Detail底部wholesale产品推荐优化 顾士元  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_25291_2 
招商频道日常发布11.23 黄健  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_26775_1 
深度认证-atm tab页url修改 刘亳  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101119_26397_1 
2 intl-atmgateway cookielog配置修改 麦俊生  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmgateway/branches/20101122_26641_1 http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmgateway/tags/20101123_r_release1 380286 380286 
3 intl-atmlogin cookielog配置修改 麦俊生  http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmlogin/branches/20101122_26641_1 http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmlogin/tags/20101123_r_release1 395965 395965 
...

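There are 83 branches in total. Picking out the pre-release branches by hand is painful. What we want are all the branch URLs marked with tags; for the first release entry, that means extracting:

http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/tags/20101123_r_release1

sed's substitution command is a good fit for this.
The information to extract has the following shape: http://svn.alibaba-inc.com/repos/ali_intl/apps/<application name>/tags/<branch info>
First, reduce every line to its last SVN URL plus whatever follows it, ready for further processing.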

forrest@ubuntu:~/Desktop$ sed 's#.*http://svn.alibaba-inc.com/repos/ali_intl/apps/\(.*\)#http://svn.alibaba-inc.com/repos/ali_intl/apps/\1#; /^http/ !d' < aone_release_20101123.txt > sed_study_1.txt
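
To make the capture group easier to follow, here is the same substitution applied to a single line. The line is an abbreviated, ASCII-only stand-in for the first Aone record, and the variable names APPS, line and last_url are invented for this sketch.

```shell
APPS='http://svn.alibaba-inc.com/repos/ali_intl/apps/'
# Abbreviated stand-in for the first record (description and owner replaced by "demo-entry")
line="1 intl-aisn demo-entry  ${APPS}intl-aisn/branches/20101119_26643_1 ${APPS}intl-aisn/tags/20101123_r_release1 395958 395958"

# The leading .* is greedy, so the prefix matches its LAST occurrence on the line.
# \(.*\) captures everything after that prefix, and \1 puts it back in the replacement.
last_url=$(printf '%s\n' "$line" | sed "s#.*${APPS}\(.*\)#${APPS}\1#")

printf '%s\n' "$last_url"
```

Because of the greedy match, the branches URL earlier on the line is discarded and only the tags URL with its trailing revision numbers survives.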

This yields data like the following:

http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/tags/20101123_r_release1 395958 395958 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_26769_1 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_25291_2 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_26775_1 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101119_26397_1 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmgateway/tags/20101123_r_release1 380286 380286 
...

Two things remain to be handled:
1. Remove the branches that are not under /tags/
2. Strip the revision numbers that follow each tags URL

The first is easy.

forrest@ubuntu:~/Desktop$ sed '/tags/ !d' < sed_study_1.txt > sed_study_2.txt
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/tags/20101123_r_release1 395958 395958 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmgateway/tags/20101123_r_release1 380286 380286 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-atmlogin/tags/20101123_r_release1 395965 395965 
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-billing/tags/20101123_r_release1 380290 380290
...

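The second is awkward with sed unless the match is anchored on the preceding release1.
For pulling a field out of table-like records such as these, though, awk is the most convenient tool: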

forrest@ubuntu:~$ awk '{print $1}'  ~/Desktop/sed_study_1.txt > /home/forrest/Desktop/sed_study_2.txt
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/tags/20101123_r_release1
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_26769_1
http://svn.alibaba-inc.com/repos/ali_intl/apps/intl-aisn/branches/20101122_25291_2
...
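
As a possible refinement, awk can do the /tags/ filtering and the field extraction in a single pass. The sketch below uses three made-up lines in the shape of sed_study_1.txt; the variable names are illustrative.

```shell
APPS='http://svn.alibaba-inc.com/repos/ali_intl/apps/'
# Sample lines in the shape of sed_study_1.txt
extracted="${APPS}intl-aisn/tags/20101123_r_release1 395958 395958
${APPS}intl-aisn/branches/20101122_26769_1
${APPS}intl-atmgateway/tags/20101123_r_release1 380286 380286"

# Keep only lines containing /tags/ and print their first whitespace-separated field.
tag_urls=$(printf '%s\n' "$extracted" | awk '/\/tags\// { print $1 }')

printf '%s\n' "$tag_urls"
```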

Summary:
sed is a powerful yet simple line-oriented stream editor, well suited to simple text processing. As one description puts it:
The result is that nowadays, sed is most commonly used in just two kinds of applications: simple text substitutions (that don't involve fields!), and extractions of lines by number.
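In other situations awk is far more convenient than sed. That is why it pays to know several languages and to understand which scenarios each one suits.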


