How to extract specific <p> tags inside a <div> from HTML using Python

Asked by: 小点点

How can I use Python to extract specific <p> tags inside a <div> from HTML?

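The data I want to extract comes from this page: https://www.adobe.com/support/security/advisories/apsa11-04.html. I only want to extract:

Release date: December 6, 2011
Last updated: January 10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462

Code: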

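import requests
from bs4 import BeautifulSoup

url = "https://www.adobe.com/support/security/advisories/apsa11-04.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Print every <p> inside the body <div> that contains a <strong> label
div = soup.find("div", attrs={"id": "L0C1-body"})
for p in div.find_all("p"):
    if p.find("strong"):
        print(p.text)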

Output:

Release date: December 6, 2011
Last updated: January  10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462
Platform: All
*Note: Adobe Reader for Android and Adobe Flash Player are not affected by this issue.

I don't want the following lines. How can I filter them out?

Platform: All
*Note: Adobe Reader for Android and Adobe Flash Player are not affected by this issue.

1 answer

Anonymous user

If you know that the four <p> tags you want are always the first four after the <h2> tag, you can use this example:

import requests
from bs4 import BeautifulSoup


url = "https://www.adobe.com/support/security/advisories/apsa11-04.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# "h2 ~ p" selects every <p> sibling that follows the <h2>; keep only the first four
txt = "\n".join(
    p.get_text(strip=True, separator=" ") for p in soup.select("h2 ~ p")[:4]
)
print(txt)

Prints:

Release date: December 6, 2011
Last updated: January  10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462
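
Slicing the first four matches depends on the page layout staying the same. An alternative that may be more robust is to filter by the label text inside each <strong> tag instead of by position. The following is a minimal sketch under stated assumptions: the `SAMPLE_HTML` fragment is hypothetical, modeled on the output shown in the question rather than copied from the real page, and `WANTED_LABELS` and `extract_fields` are illustrative names that do not come from the original post. It parses an inline string so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the question's output; the real page's
# markup may differ (for example in nesting or whitespace).
SAMPLE_HTML = """
<div id="L0C1-body">
  <p><strong>Release date:</strong> December 6, 2011</p>
  <p><strong>Last updated:</strong> January 10, 2012</p>
  <p><strong>Vulnerability identifier:</strong> APSA11-04</p>
  <p><strong>CVE number:</strong> CVE-2011-2462</p>
  <p><strong>Platform:</strong> All</p>
  <p><strong>*Note:</strong> Adobe Reader for Android and Adobe Flash Player
     are not affected by this issue.</p>
</div>
"""

# Labels to keep (illustrative name, not part of the original answer).
WANTED_LABELS = (
    "Release date",
    "Last updated",
    "Vulnerability identifier",
    "CVE number",
)


def extract_fields(html, labels=WANTED_LABELS):
    """Return {label: value} for <p> tags whose <strong> label is wanted."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"id": "L0C1-body"})
    fields = {}
    if div is None:
        return fields
    for p in div.find_all("p"):
        strong = p.find("strong")
        if strong is None:
            continue
        # Drop the trailing colon so the label can be compared directly.
        label = strong.get_text(strip=True).rstrip(":")
        if label in labels:
            text = p.get_text(" ", strip=True)
            # Everything after the first colon is the value.
            _, _, value = text.partition(":")
            fields[label] = value.strip()
    return fields


fields = extract_fields(SAMPLE_HTML)
for label, value in fields.items():
    print(f"{label}: {value}")
```

To try this against the live page, pass `requests.get(url).content` to `extract_fields` in place of `SAMPLE_HTML`. Whether the labels match exactly there depends on the actual markup, so treat the label list as something to adjust.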
