Handling JavaScript-rendered content before parsing HTML with BeautifulSoup

Run the following on the command line:

# requests-html supports Python 3.6+ only; it cannot be installed for Python 2
$ pip3 install requests-html

This installs the requests-html module.
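To confirm that the installation worked before running the scraper, a quick check along the following lines may help. This is only a sketch: the helper name `is_installed` is made up for illustration, and it only tests whether the module can be found on the import path, not whether it works correctly.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module with this name can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # requests-html needs Python 3.6 or newer
    print("Python version OK:", sys.version_info >= (3, 6))
    # The package is installed as "requests-html" but imported as "requests_html"
    print("requests_html found:", is_installed("requests_html"))
    print("bs4 found:", is_installed("bs4"))
```

If either module is reported as missing, the pip3 command was probably run for a different Python interpreter than the one executing the script.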

Here is an example:

# coding=utf-8
from bs4 import BeautifulSoup
from requests_html import HTMLSession

URL = 'https://xcx.xzlzq.net/#/liftDetail?registerCode=31103301042010020002'

# Fetch the page through an HTMLSession so that its JavaScript can be executed
session = HTMLSession()
first_page = session.get(URL)

# Run the page's JavaScript in headless Chromium (downloaded on the first call
# to render()), then wait 5 seconds so that the dynamic content can load
first_page.html.render(sleep=5)

# first_page.html.html now holds the rendered HTML; pass it to BeautifulSoup
soup = BeautifulSoup(first_page.html.html, 'lxml')
links = soup.find_all('div', class_='content')

# Each match is a div; print the text of its first span, skipping divs without one
for link in links:
    if link.span is not None:
        print(link.span.get_text())
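The parsing step can be tried on its own, without network access or a headless browser, by feeding BeautifulSoup a static string. The sketch below makes several assumptions: `SAMPLE_HTML` is an invented stand-in for the rendered HTML (the real page's markup is not known), `extract_span_texts` is a hypothetical helper name, and the built-in `html.parser` is used in place of `lxml` so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for first_page.html.html after render();
# the real page's structure may differ.
SAMPLE_HTML = """
<html><body>
  <div class="content"><span>first item</span></div>
  <div class="content">no span here</div>
  <div class="content"><span>  second item  </span></div>
  <div class="other"><span>ignored</span></div>
</body></html>
"""


def extract_span_texts(html):
    """Return the text of the first span in every div with class "content"."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", class_="content"):
        span = div.find("span")
        # A div without a span would make div.span None, so skip it
        if span is not None:
            texts.append(span.get_text(strip=True))
    return texts


if __name__ == "__main__":
    for text in extract_span_texts(SAMPLE_HTML):
        print(text)
```

Checking for a missing span avoids the AttributeError that `link.span.get_text()` raises when a matching div contains no span.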

Posted by 默默
