Python Intensive Training: Part 2

Summary: feeling worn out.

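Motivation

  1. String processing comes up all the time when writing web crawlers.

Main topics

  1. Splitting strings
  2. Checking how a string starts or ends
  3. Adjusting string formats
  4. Concatenating strings
  5. Aligning strings
  6. Removing unwanted characters
  7. Matching characters
  8. Searching for characters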

1. Splitting strings

  1. Built-in str.split(): splits on only one separator at a time
  2. re.split(): splits on a regular-expression pattern
import re
data_one = "ab;cd|efg|hi,jkl|mn\topq;rst,uv\twx\t  y\nz"
pattern = r";+|,+|\t+|\n+|\s+|\|+"
result = re.split(pattern, data_one)
print(result)  #['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uv', 'wx', '', 'y', 'z']
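
To make the difference between the two approaches concrete, here is a minimal sketch. The sample string and variable names are made up for illustration. It also shows that a character class followed by `+` treats a whole run of mixed separators as one delimiter, so no empty strings appear in the result.

```python
import re

# Hypothetical sample: fields separated by ';', ',', '|' or whitespace.
line = "ab;cd|efg,hi\tjk  lm"

# str.split() accepts a single separator string, so only ';' is handled here.
by_semicolon = line.split(";")

# A character class with '+' treats any run of separators as one delimiter,
# which avoids empty strings between adjacent separators.
fields = re.split(r"[;,|\s]+", line)

print(by_semicolon)  # ['ab', 'cd|efg,hi\tjk  lm']
print(fields)        # ['ab', 'cd', 'efg', 'hi', 'jk', 'lm']
```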

2. Checking how a string starts or ends

  1. str.startswith()
  2. str.endswith()
filename = "learnpython.py"

print(filename.startswith("learn"))

print(filename.endswith(".py"))
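
Both methods also accept a tuple of candidates, which is handy when filtering file names or URLs. A small sketch, with file names invented for the example:

```python
# Hypothetical file names, used only for illustration.
filenames = ["learnpython.py", "notes.txt", "learn_c.c", "spider.py"]

# Both methods accept a tuple and return True if any one candidate matches.
sources = [name for name in filenames if name.endswith((".py", ".c"))]
learn_files = [name for name in filenames if name.startswith(("learn", "study"))]

print(sources)      # ['learnpython.py', 'learn_c.c', 'spider.py']
print(learn_files)  # ['learnpython.py', 'learn_c.c']
```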

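3. Adjusting string formats
Convert 2016-10-31 to 31/10/2016.

  1. re.sub(): substitution
A = "2016-10-31"
# Named groups capture year, month, and day; \g<name> reorders them in the replacement.
print(re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", r"\g<day>/\g<month>/\g<year>", A))
#31/10/2016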
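
The same rearrangement can be sketched in two other ways. Numbered backreferences are shorter when the pattern is small, and parsing with `datetime` is an alternative that also rejects impossible dates. The variable names here are assumptions for the example, not part of the original snippet.

```python
import re
from datetime import datetime

date_text = "2016-10-31"

# Numbered backreferences: group 1 is the year, 2 the month, 3 the day.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date_text)

# Alternative sketch: parse the date, then format it again.
reformatted = datetime.strptime(date_text, "%Y-%m-%d").strftime("%d/%m/%Y")

print(swapped)      # 31/10/2016
print(reformatted)  # 31/10/2016
```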

4. Concatenating strings

  1. +
  2. join
values = ["apple", 'orange', "pear", "banana"]
str_temp = ""
for i in values:
    str_temp += i

print(str_temp)  #appleorangepearbanana
str_other = ''.join(values)    #appleorangepearbanana
str_one = "+".join(values)    #apple+orange+pear+banana
str_two = "====".join((values))    #apple====orange====pear====banana
print(str_other, str_one, str_two)
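
One thing to watch for: `str.join()` only accepts strings, so mixed values (common in scraped data) need converting first. A minimal sketch with made-up values:

```python
# Hypothetical mixed-type values; str.join() only accepts strings.
values = ["apple", 3, "pear", 4.5]

# Convert each item to str first; joining the list directly would raise TypeError.
joined = ", ".join(str(v) for v in values)

print(joined)  # apple, 3, pear, 4.5
```

For long lists, `join` is usually the better choice, because repeated `+=` may build a new intermediate string on each iteration.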

5. Aligning strings

  1. str.ljust()
  2. str.rjust()
  3. str.center()
  4. format()
sentence = 'Shanghai University'

print(sentence.ljust(50))
print(sentence.rjust(50))
print(sentence.center(50))

print(format(sentence, "<50"))
print(format(sentence, ">50"))
print(format(sentence, "^50"))



#Shanghai University                               
#                               Shanghai University
#               Shanghai University                
#Shanghai University                               
#                               Shanghai University
#               Shanghai University
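
All of these also take a fill character, which makes the padding visible. A short sketch (the widths and fill characters are arbitrary choices for the example):

```python
sentence = "Shanghai University"  # 19 characters

# The str methods take the fill character as a second argument.
centered = sentence.center(31, "*")

# In a format spec, the fill character comes before the alignment symbol.
formatted = format(sentence, "*^31")

# f-strings use the same format spec after the colon.
right = f"{sentence:->25}"

print(centered)   # ******Shanghai University******
print(formatted)  # ******Shanghai University******
print(right)      # ------Shanghai University
```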

6. Removing unwanted characters

  1. str.strip()
  2. str.lstrip()
  3. str.rstrip()
  4. re.sub()
words = '============Shanghai++++++University==============='

print(words.strip("="))    #Shanghai++++++University

print(words.lstrip("="))    #Shanghai++++++University===============

print(words.rstrip("="))    #============Shanghai++++++University

word_pattern = r'=+|\++'

print(re.sub(word_pattern, '', words))    #ShanghaiUniversity
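
Two details are easy to miss here: the argument to `strip()` is a set of characters rather than a prefix, and `str.translate()` can delete several characters without a regular expression. A minimal sketch, reusing the same sample string plus one invented string:

```python
words = "============Shanghai++++++University==============="

# strip() treats its argument as a set of characters: any of them is trimmed
# from both ends, but characters in the middle are left alone.
trimmed = "  ==+Shanghai++University+==  ".strip(" =+")

# str.replace() handles one fixed substring at a time.
no_equals = words.replace("=", "")

# str.translate() deletes every character listed in the third argument.
cleaned = words.translate(str.maketrans("", "", "=+"))

print(trimmed)    # Shanghai++University
print(no_equals)  # Shanghai++++++University
print(cleaned)    # ShanghaiUniversity
```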

7. Matching characters

  1. re.match()
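
This section lists `re.match()` without an example, so here is a minimal sketch of how it behaves. The date pattern and sample strings are assumptions chosen for illustration. The key point is that `re.match()` only tries to match at the start of the string.

```python
import re

# Hypothetical pattern for a YYYY-MM-DD date.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

hit = date_pattern.match("2016-10-31 crawl finished")
miss = date_pattern.match("crawl finished 2016-10-31")

# match() only tries at position 0, so the second call returns None.
if hit:
    print(hit.group(0))  # 2016-10-31
    print(hit.groups())  # ('2016', '10', '31')
print(miss)              # None

# fullmatch() additionally requires the pattern to consume the whole string.
is_exact_date = date_pattern.fullmatch("2016-10-31") is not None
print(is_exact_date)     # True
```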

8. Searching for characters

  1. str.find()
  2. re.findall()
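
Neither function is demonstrated above, so the following is a rough sketch of both. The sample text is invented to resemble a crawled snippet.

```python
import re

# Hypothetical snippet of crawled text.
text = "price: 42 yuan, discount: 7 yuan, stock: 130"

# str.find() returns the index of the first occurrence, or -1 if absent.
first_hit = text.find("discount")
missing = text.find("refund")

# re.findall() returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

# With one capture group, findall() returns only the captured part.
labels = re.findall(r"(\w+):", text)

print(first_hit)  # 16
print(missing)    # -1
print(numbers)    # ['42', '7', '130']
print(labels)     # ['price', 'discount', 'stock']
```

Because `find()` signals failure with -1 rather than an exception, it is worth checking the result before using it as a slice index.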

Reference: [Python Cookbook]