2019-12-18 Catching exceptions; web scraping
Catching exceptions
>>> ### Catching exceptions
...
>>> a=10
>>> b=a+'hello'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
>>>
>>> try:
... a=10
... b=a+'hello'
... except Exception as e:
File "<stdin>", line 4
except Exception as e:
^
SyntaxError: invalid syntax
>>>
>>>
>>> try:
... a=10
... b=a+'hello'
... except Exception as e:
... print(e)
...
unsupported operand type(s) for +: 'int' and 'str'
>>>
>>>
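The session above only shows the bare try/except form. A quick sketch of the fuller statement, try/except/else/finally, which is worth knowing here; safe_div and the division example are just illustrations:

```python
def safe_div(x, y):
    try:
        result = x / y
    except ZeroDivisionError as e:
        print("caught:", e)          # runs only when the division fails
        result = None
    else:
        print("no exception raised") # runs only when the try block succeeds
    finally:
        print("always runs")         # cleanup happens in either case
    return result

safe_div(10, 2)   # no exception raised / always runs
safe_div(10, 0)   # caught: division by zero / always runs
```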
>>> ## Open question: when an error occurs, how do we get the database to roll back automatically?
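One possible answer to the open question above, sketched with the standard-library sqlite3 module: wrap the writes in try/except and call rollback() in the handler (a sqlite3 connection can also be used as a context manager, which commits on success and rolls back on error). The in-memory database and the flat table are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE flat (id INTEGER PRIMARY KEY, price INTEGER)")

try:
    conn.execute("INSERT INTO flat (price) VALUES (?)", (3500,))
    conn.execute("INSERT INTO flat (price) VALUES (?)", ("bad" + 1,))  # TypeError
    conn.commit()
except Exception as e:
    conn.rollback()                 # undoes the first INSERT as well
    print("rolled back:", e)

print(conn.execute("SELECT COUNT(*) FROM flat").fetchone()[0])  # 0 rows survive
```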
Web scraping
>>> ### Creating the database
... #|-requests: used to fetch page content
... #|-BeautifulSoup: used to extract elements from the page; both have Chinese-language docs, just add "python" to your search
...
>>>
>>>
>>> ### Installing the modules
... #pip install requests
... #pip install bs4
...
>>> import requests
>>> from bs4 import BeautifulSoup
>>> url='https://bj.lianjia.com/zufang/'
>>> ## Goal of this exercise: starting from the links on the listing page, fetch each rental's details (price, size, location, etc.)
...
>>> response = requests.get(url) ## fetch the raw page
>>> soup = BeautifulSoup(response.text,'lxml') ## response.text is the HTML we fetched; BeautifulSoup parses it with the lxml parser so we can extract elements
>>> links_div = soup.find_all('div',class="content__list--item") ## search for the divs that contain the links
File "<stdin>", line 1
links_div = soup.find_all('div',class="content__list--item") ## search for the divs that contain the links
^
SyntaxError: invalid syntax
>>> links_div = soup.find_all('div',class_="content__list--item") ## search for the divs that contain the links
>>> ## The error occurs because class is a reserved keyword in Python, so BeautifulSoup uses class_ to distinguish the argument
>>> links_div[0]
>>> links=[div.a.get('href') for div in links_div]
>>> ## Next step: wrap this into a function that collects every rental detail link on a listing page and returns them as a list
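The steps above can be wrapped up roughly as follows. The names extract_links and get_links are my own, and the selector assumes Lianjia's markup is the same as in this session; splitting out extract_links keeps the parsing testable without a network call:

```python
import requests
from bs4 import BeautifulSoup

def extract_links(html):
    """Pull the rental detail links out of one listing page's HTML."""
    soup = BeautifulSoup(html, "lxml")
    links_div = soup.find_all("div", class_="content__list--item")
    return [div.a.get("href") for div in links_div]

def get_links(url):
    """Fetch a listing page and return the detail links it contains."""
    return extract_links(requests.get(url).text)

# links = get_links('https://bj.lianjia.com/zufang/')
```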
Other parameters: https://blog.csdn.net/weixin_43930694/article/details/90142678

