002_006 Python: Processing Each Word in a File

December 10, 2017 ⁄ General

The code is as follows (Python 2):

#encoding=utf-8

print '中国'

# Process each word in a file, assuming words are separated by whitespace

''' The contents of D:\123.txt are as follows:
1 a b c 中 国
2 a b c 中 国
'''

# Approach 1: split each line on whitespace
print '------1'
file_object = open(r'd:\123.txt','rU')

for line in file_object:
    for word in line.split():
        print word

file_object.close()

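# Approach 2: regular expression. On Python 2 byte strings, \w matches only
# ASCII letters, digits and underscore, so Chinese characters are skipped.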
print '------2'
import re
re_word = re.compile(r"[\w'-]+")

file_object = open(r'd:\123.txt','rU')

for line in file_object:
    for word in re_word.finditer(line):
        print word.group(0)

file_object.close()

# Approach 3: wrap the logic in a generator
print '------3'

def wordsoffile(thefilepath, line_to_words = str.split):
    # The with block closes the file once the generator finishes or is closed
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word
    
for word in wordsoffile(r'd:\123.txt'):
    print word

The output is as follows:

中国
------1
1
a
b
c
中
国
2
a
b
c
中
国
------2
1
a
b
c
2
a
b
c
------3
1
a
b
c
中
国
2
a
b
c
中
国
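
The output shows that approach 2 drops 中 and 国, because the Python 2 pattern is applied to byte strings. The following is a minimal sketch, not the original author's code, of how approaches 2 and 3 might be combined on Python 3, where `str` patterns are Unicode-aware by default. The names `words_of_file`, `regex_words` and `RE_WORD`, the `encoding` parameter, and the temporary sample file are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# On Python 3, \w in a str pattern also matches CJK characters.
RE_WORD = re.compile(r"[\w'-]+")


def regex_words(line):
    """Return the words in a line using the Unicode-aware pattern above."""
    return RE_WORD.findall(line)


def words_of_file(path, line_to_words=str.split, encoding="utf-8"):
    """Yield each word in the file at `path`, reading one line at a time."""
    # The with block closes the file once the generator finishes or is closed.
    with open(path, encoding=encoding) as the_file:
        for line in the_file:
            yield from line_to_words(line)


if __name__ == "__main__":
    # Hypothetical sample file with the same contents as the original 123.txt.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "123.txt"
        sample.write_text("1 a b c 中 国\n2 a b c 中 国\n", encoding="utf-8")
        for word in words_of_file(sample, regex_words):
            print(word)
```

Under these assumptions the regular-expression splitter yields the same twelve words as the whitespace splitter, including 中 and 国. Passing the splitter as a parameter keeps the file handling in one place, so either strategy can be swapped in without touching the generator.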