Python Web Scraping Basics: Getting Started with the requests Library

Author: 郭然. Published: 2022-09-18.

 


Basic usage of requests

Quick start

import requests

r = requests.get('https://www.baidu.com/')
print(type(r))        # <class 'requests.models.Response'>
print(r.status_code)  # 200
print(r.text)         # the HTML source of the page
print(r.cookies)      # <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

Supported methods

  • get()
  • post()
  • put()
  • delete()
  • head()
  • options()
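
Each helper in the list above corresponds to one HTTP verb. As a rough sketch that is not part of the original article, the snippet below builds one request per verb with requests.Request(...).prepare(), which constructs a request without sending it, so it runs offline. The URL is simply the httpbin address used elsewhere in this article.

```python
import requests

# HTTP verbs matching the helper functions listed above.
METHODS = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]

def prepare_all(url):
    """Build (but do not send) one prepared request per HTTP verb."""
    return [requests.Request(method, url).prepare() for method in METHODS]

prepared = prepare_all("http://httpbin.org/get")
for p in prepared:
    print(p.method, p.url)
```

Calling requests.get(url) or requests.post(url) sends the corresponding request right away; the prepared form is used here only to inspect it without network access.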

GET requests

A basic example without parameters

import requests
r = requests.get('http://httpbin.org/get')
print(r.text)
"""
Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "123.121.82.132, 123.121.82.132",
  "url": "https://httpbin.org/get"
}
"""

Requests with parameters

Parameters written directly in the URL

import requests
r = requests.get('http://httpbin.org/get?name=germey&age=22')
print(r.text)

Parameters passed with params=data

import requests
data = {
    'name': 'germey',
    'age': 22
}
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
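
To see that both styles end up as the same request, the following sketch prepares the request without sending it and prints the final URL. It assumes only the dictionary and URL from the example above.

```python
import requests

data = {'name': 'germey', 'age': 22}

# Build the request without sending it, then inspect the URL that would be used.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

The params dictionary is URL-encoded and appended as the query string, which is why it is usually the more convenient choice when values come from variables.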

Parsing the response as JSON

import requests
r = requests.get('http://httpbin.org/get')
print(type(r.text))    # <class 'str'>
print(r.json())        # the body decoded into Python objects
print(type(r.json()))  # <class 'dict'>
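
Conceptually, r.json() decodes the text body much like json.loads(r.text) would, and it raises an error when the body is not valid JSON. The sketch below illustrates this with the standard library only; the sample string is a trimmed copy of the httpbin output shown earlier, and safe_json is a hypothetical helper name.

```python
import json

# Sample body, trimmed from the httpbin output shown earlier in this article.
text = '{"args": {}, "url": "https://httpbin.org/get"}'

parsed = json.loads(text)   # roughly what r.json() does with r.text
print(type(text))           # <class 'str'>
print(type(parsed))         # <class 'dict'>
print(parsed["url"])        # https://httpbin.org/get

def safe_json(raw):
    """Return the decoded JSON, or None when the body is not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:      # json.JSONDecodeError is a subclass of ValueError
        return None

print(safe_json("<html>not json</html>"))  # None
```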

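Downloading a site's .ico icon

import requests
r = requests.get('https://github.com/favicon.ico')
# r.content holds the raw bytes of the response, so open the file in binary mode
with open('favicon.ico', 'wb') as f:
    f.write(r.content)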
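
For larger binary files it can be preferable not to hold the whole body in memory. The helper below is a minimal sketch of a streamed download, not something from the original article: the name download, the client parameter, the 10-second timeout and the chunk size are all assumptions chosen for illustration.

```python
import requests

def download(url, path, client=requests, chunk_size=8192):
    """Stream `url` to `path` in binary mode and return the bytes written.

    `client` is anything with a requests-style get(); it defaults to the
    requests module itself and could also be a requests.Session.
    """
    written = 0
    with client.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()             # raise on 4xx/5xx responses
        with open(path, 'wb') as f:      # 'wb': write raw bytes, no decoding
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written

# Example call (needs network access):
# download('https://github.com/favicon.ico', 'favicon.ico')
```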

Adding headers

import requests
headers = {
    # Adjacent string literals are joined, so no stray tab ends up in the value
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/52.0.2743.116 Safari/537.36'
    )
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)
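
Custom headers matter because requests otherwise identifies itself with its own User-Agent, as the httpbin output earlier shows ("python-requests/2.22.0"). The following sketch compares the two cases offline by preparing the request through a Session instead of sending it; the variable names are illustrative only.

```python
import requests

url = 'https://www.zhihu.com/explore'
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/52.0.2743.116 Safari/537.36'
    )
}

session = requests.Session()

# Prepare (but do not send) the same request with and without custom headers.
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(requests.Request('GET', url, headers=headers))

print(default_req.headers['User-Agent'])  # starts with "python-requests/"
print(custom_req.headers['User-Agent'])   # the browser-like string above
```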

POST requests

import requests
data = {'name': 'germey', 'age': '22'}
r = requests.post('http://httpbin.org/post', data=data)
print(r.text)
"""
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "123.121.82.132, 123.121.82.132",
  "url": "https://httpbin.org/post"
}
"""

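The "form", "Content-Type" and "Content-Length" fields in the output above follow from how data= is encoded. The sketch below inspects the prepared request offline to show this. The json= variant at the end is not used in the original article and is included only as a contrast to explain the "json": null field.

```python
import json
import requests

data = {'name': 'germey', 'age': '22'}
url = 'http://httpbin.org/post'

# data=... is form-encoded, matching the "form" field in the output above.
form_req = requests.Request('POST', url, data=data).prepare()
print(form_req.body)                       # name=germey&age=22
print(form_req.headers['Content-Type'])    # application/x-www-form-urlencoded
print(form_req.headers['Content-Length'])  # 18

# json=... (an addition for contrast) sends a JSON body instead.
json_req = requests.Request('POST', url, json=data).prepare()
print(json_req.headers['Content-Type'])    # application/json
print(json.loads(json_req.body))           # {'name': 'germey', 'age': '22'}
```
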
File upload

import requests
# Open the file in a with block so it is closed once the upload finishes
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post('http://httpbin.org/post', files=files)
print(r.text)
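
Passing files= makes requests build a multipart/form-data body. As a small offline sketch, the snippet below prepares such a request and inspects it; the (filename, content) tuple and the dummy bytes are placeholders standing in for the real favicon.ico file.

```python
import requests

# A (filename, content) tuple stands in for an open file; the bytes are dummy data.
files = {'file': ('favicon.ico', b'\x00\x00\x01\x00')}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])                  # multipart/form-data
print(b'filename="favicon.ico"' in prepared.body)  # True
```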

Cookies

Getting cookies

import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
# r.cookies is a RequestsCookieJar; items() yields (name, value) pairs
for key, value in r.cookies.items():
    print(key + '=' + value)
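
r.cookies behaves much like a dictionary. To try the same operations without a network call, the sketch below builds a jar by hand; the cookie name and value mirror the BDORZ cookie printed in the quick-start example and are otherwise arbitrary.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a jar by hand that mirrors the cookie printed in the quick-start example.
jar = RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')

for key, value in jar.items():
    print(key + '=' + value)                    # BDORZ=27315

print(jar.get('BDORZ'))                         # 27315
print(requests.utils.dict_from_cookiejar(jar))  # {'BDORZ': '27315'}
```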

Setting cookies

import requests
headers = {
    'Cookie': 'your cookie string',  # placeholder: the Cookie header value copied from your browser
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)
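
As an alternative to writing the Cookie header by hand, requests also accepts a cookies= dictionary and builds the header itself. The sketch below shows the resulting header offline; the cookie name and value are placeholders borrowed from the httpbin examples in this article.

```python
import requests

cookies = {'number': '123456789'}  # placeholder values for illustration

# Prepare the request without sending it and look at the generated header.
prepared = requests.Request('GET', 'http://httpbin.org/cookies', cookies=cookies).prepare()
print(prepared.headers['Cookie'])  # number=123456789
```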

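Session persistence

Separate requests do not share a session

Separate calls such as requests.get() are independent of one another, much like two different browsers whose cookies do not affect each other. You could set the cookies by adding a header to every request, but that is tedious.
Note: http://httpbin.org/cookies/set/number/123456789 sets a cookie named number to 123456789.
http://httpbin.org/cookies returns the cookies sent with the request.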

import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
"""
Output:
{
  "cookies": {}
}
"""

Keeping a session with Session

import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
"""
Output:
{
  "cookies": {
    "number": "123456789"
  }
}
"""

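The difference between the two results comes from the cookie jar that a Session carries between requests. The following offline sketch illustrates this under one assumption: setting the cookie on s.cookies by hand stands in for what the /cookies/set/number/123456789 call would store.

```python
import requests

url = 'http://httpbin.org/cookies'

s = requests.Session()
s.cookies.set('number', '123456789')  # stand-in for the cookie set by the server

# A request prepared through the session picks up the stored cookie ...
with_session = s.prepare_request(requests.Request('GET', url))
print(with_session.headers.get('Cookie'))  # number=123456789

# ... while a request prepared on its own knows nothing about it.
standalone = requests.Request('GET', url).prepare()
print(standalone.headers.get('Cookie'))    # None
```
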
 
