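The data I want to extract comes from this page: https://www.adobe.com/support/security/advisories/apsa11-04.html. I only want to extract:
Release date: December 6, 2011
Last updated: January 10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462
Code: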
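import requests
from bs4 import BeautifulSoup

url = "https://www.adobe.com/support/security/advisories/apsa11-04.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
div = soup.find("div", attrs={"id": "L0C1-body"})
for p in div.find_all("p"):
    # Print only the paragraphs that contain a <strong> label
    if p.find("strong"):
        print(p.text)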
Output:
Release date: December 6, 2011
Last updated: January 10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462
Platform: All
*Note: Adobe Reader for Android and Adobe Flash Player are not affected by this issue.
I don't want the following information. How can I filter it out?
Platform: All
*Note: Adobe Reader for Android and Adobe Flash Player are not affected by this issue.
If you know that you always want the first 4 <p> tags after the <h2> tag, you can use this example:
import requests
from bs4 import BeautifulSoup

url = "https://www.adobe.com/support/security/advisories/apsa11-04.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# "h2 ~ p" selects every <p> sibling that follows an <h2>; keep the first 4
txt = "\n".join(
    map(lambda x: x.get_text(strip=True, separator=" "), soup.select("h2 ~ p")[:4])
)
print(txt)
Prints:
Release date: December 6, 2011
Last updated: January 10, 2012
Vulnerability identifier: APSA11-04
CVE number: CVE-2011-2462
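
If the number or order of paragraphs is not guaranteed, filtering by label instead of by position may be more robust. The following is only a sketch of that alternative, not part of the approach above: the names `WANTED_LABELS` and `keep_wanted` are hypothetical, and the sample input is simply the six lines printed by the loop in the question, so the snippet runs without fetching the page.

```python
# Labels to keep, taken from the desired output in the question.
# WANTED_LABELS and keep_wanted are illustrative names, not from any library.
WANTED_LABELS = (
    "Release date:",
    "Last updated:",
    "Vulnerability identifier:",
    "CVE number:",
)


def keep_wanted(lines, labels=WANTED_LABELS):
    """Return the stripped lines that start with one of the wanted labels."""
    # str.startswith accepts a tuple of prefixes, so one call checks all labels
    return [
        line.strip()
        for line in lines
        if line.strip().startswith(tuple(labels))
    ]


# Sample input: the paragraph texts printed by the loop in the question.
paragraphs = [
    "Release date: December 6, 2011",
    "Last updated: January 10, 2012",
    "Vulnerability identifier: APSA11-04",
    "CVE number: CVE-2011-2462",
    "Platform: All",
    "*Note: Adobe Reader for Android and Adobe Flash Player are not affected by this issue.",
]

result = keep_wanted(paragraphs)
print("\n".join(result))
```

With BeautifulSoup, the input list could be built from the parsed page, for example `[p.get_text(strip=True, separator=" ") for p in div.find_all("p")]`, assuming `div` is the container found in the question's code.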