Python Web Scraping Basics: Getting Started with the requests Library
Basic usage of requests
A quick start
import requests
r = requests.get('https://www.baidu.com/')
print(type(r)) #<class 'requests.models.Response'>
print(r.status_code) #200
print(r.text) #html
print(r.cookies) #<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
Supported methods
- get()
- post()
- put()
- delete()
- head()
- options()
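Each of these helpers differs only in the HTTP verb it sends. As a minimal offline sketch (nothing goes over the network), the snippet below builds one prepared request per verb with `requests.Request(...).prepare()`. The URL `http://httpbin.org/anything` and the name `METHODS` are chosen here for illustration and are not part of the original examples.

```python
import requests

# The six verbs listed above, in the same order.
METHODS = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]

# Build (but do not send) one request per verb.
prepared = [
    requests.Request(method, "http://httpbin.org/anything").prepare()
    for method in METHODS
]

for p in prepared:
    print(p.method, p.url)
```

To actually send one of them, a call such as `requests.request('PUT', url)` does the same thing as `requests.put(url)`.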
GET requests
Basic example without parameters
import requests
r = requests.get('http://httpbin.org/get')
print(r.text)
"""
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "123.121.82.132, 123.121.82.132",
"url": "https://httpbin.org/get"
}
"""
Requests with parameters
Parameters written directly in the URL
import requests
r = requests.get('http://httpbin.org/get?name=germey&age=22')
print(r.text)
Parameters passed as a dict via params=data
import requests
data = {
'name':'germey',
'age':22
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
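To see what `params=` does to the URL without contacting the server, one option is to prepare the request and inspect it. This is a small sketch; the `tag` parameter in the second call is a made-up example showing how a list value is expanded.

```python
import requests

data = {'name': 'germey', 'age': 22}

# Prepare the request without sending it, then look at the final URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22

# A list value becomes a repeated query parameter ('tag' is hypothetical).
multi = requests.Request('GET', 'http://httpbin.org/get', params={'tag': ['a', 'b']}).prepare()
print(multi.url)  # http://httpbin.org/get?tag=a&tag=b
```

The result is the same URL as the hand-written one in the previous example, but requests takes care of the encoding.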
Parsing the response as JSON
import requests
r = requests.get('http://httpbin.org/get')
print(type(r.text))
print(r.json())
print(type(r.json()))
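`r.json()` raises an exception when the body is not valid JSON, for example when a site answers with an HTML error page. A minimal defensive sketch follows; `parse_json` is a hypothetical helper name, not part of requests.

```python
import requests


def parse_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # The decode error raised by Response.json() is a ValueError subclass.
        return None
```

With the example above, `parse_json(r)` would return the same dict as `r.json()`, and `None` for a response that carries HTML instead.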
Downloading a site's favicon (.ico)
import requests
r = requests.get('https://github.com/favicon.ico')
with open('favicon.ico', 'wb') as f:
    f.write(r.content)  # r.content is the raw bytes of the response
Adding headers
import requests
headers = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/52.0.2743.116 Safari/537.36')
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)
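The reason a custom header helps is that requests otherwise announces itself as `python-requests/<version>`, which some sites reject. The sketch below prints that default and confirms, without sending anything, that a custom `User-Agent` ends up on the prepared request. The shortened User-Agent string is for illustration only.

```python
import requests

# What requests sends when no User-Agent is given, e.g. python-requests/2.22.0
print(requests.utils.default_user_agent())

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4)'}  # shortened example

# Prepare (do not send) the request and inspect the header that would go out.
prepared = requests.Request('GET', 'https://www.zhihu.com/explore', headers=headers).prepare()
print(prepared.headers['User-Agent'])
```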
POST requests
import requests
data = {'name':'germey','age':'22'}
r = requests.post('http://httpbin.org/post',data=data)
print(r.text)
"""
Result:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "22",
"name": "germey"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "18",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"json": null,
"origin": "123.121.82.132, 123.121.82.132",
"url": "https://httpbin.org/post"
}
"""
File upload
import requests
# Open the file in a with block so it is closed once the upload finishes
with open('favicon.ico', 'rb') as f:
    r = requests.post('http://httpbin.org/post', files={'file': f})
print(r.text)
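The POST examples above differ mainly in how the body is encoded. As an offline sketch, the snippet below prepares three requests and prints the `Content-Type` each one would carry. The `json=` argument is an addition not covered in the original examples, and the in-memory bytes are a stand-in for a real `favicon.ico`.

```python
import json
import requests

url = 'http://httpbin.org/post'
data = {'name': 'germey', 'age': '22'}

# data= : URL-encoded form, as in the POST example above
form = requests.Request('POST', url, data=data).prepare()
# json= : the same dict serialized as a JSON document
as_json = requests.Request('POST', url, json=data).prepare()
# files= : multipart upload; (filename, content, content type)
upload = requests.Request(
    'POST', url, files={'file': ('favicon.ico', b'fake icon bytes', 'image/x-icon')}
).prepare()

for name, p in [('data', form), ('json', as_json), ('files', upload)]:
    print(name, '->', p.headers['Content-Type'])

print(form.body)                 # name=germey&age=22
print(json.loads(as_json.body))  # {'name': 'germey', 'age': '22'}
```

The form body is 18 characters long, which matches the `Content-Length` of 18 that httpbin echoed back in the POST result.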
Cookies
Getting cookies
import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
for key, value in r.cookies.items():
    print(key + '=' + value)
Setting cookies
import requests
headers = {
'Cookie':'your cookie string here',
'Host':'www.zhihu.com',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
r = requests.get('https://www.zhihu.com',headers=headers)
print(r.text)
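Besides writing the raw `Cookie` header by hand, requests also accepts a `cookies=` dict and builds the header itself. A minimal offline sketch, using a made-up cookie name and value:

```python
import requests

cookies = {'number': '123456789'}  # example cookie; any name/value works

# Prepare (do not send) the request and inspect the generated header.
prepared = requests.Request('GET', 'http://httpbin.org/cookies', cookies=cookies).prepare()
print(prepared.headers['Cookie'])  # number=123456789
```

The same dict can be passed directly when sending, as in `requests.get(url, cookies=cookies)`.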
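Session persistence
Separate requests do not share a session
Separate calls to requests.get() are independent of each other, much like two different browsers whose cookies do not interfere. You could attach the cookies to every request through a header, but that quickly becomes tedious.
Note: http://httpbin.org/cookies/set/number/123456789 sets a cookie named number to 123456789
http://httpbin.org/cookies returns the cookies the server received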
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
"""
Result:
{
"cookies": {}
}
"""
Maintaining state with a Session
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
"""
Result:
{
"cookies": {
"number": "123456789"
}
}
"""
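The difference between the two results comes down to the cookie jar that a Session carries from one request to the next. The sketch below shows this offline: instead of letting a server set the cookie, it stores one in the jar by hand (an assumption made so the example needs no network) and then compares the headers of a session request and a bare request.

```python
import requests

with requests.Session() as s:
    # Stand-in for a server's Set-Cookie response: store the cookie by hand.
    s.cookies.set('number', '123456789')

    # Every request prepared through the session now carries the cookie.
    with_session = s.prepare_request(
        requests.Request('GET', 'http://httpbin.org/cookies')
    )

# A bare request has no jar to draw from, so no Cookie header is added.
without_session = requests.Request('GET', 'http://httpbin.org/cookies').prepare()

print(with_session.headers.get('Cookie'))     # number=123456789
print(without_session.headers.get('Cookie'))  # None
```

Using the session as a context manager also closes its pooled connections when the block ends.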