Consider this code:
    divTag = soup.find_all("div", {"class": "classname"})
    print(divTag)
    for tag in divTag:
        ulTag = soup.find_all("ul", {"class": "classname"})
        print(ulTag)
        for tag in ulTag:
            liTag = soup.find_all("li", {"class": "classname"})
            print(liTag)
            for tag in liTag:
                diTag = soup.find_all("div", {"class": "classname"})
                print(diTag)
                for tag in diTag:
                    aTags = tag.find_next("a")
                    value = aTags.string
                    print(value)
It only prints divTag.
Update:
<div class="classname">
<ul auto-load="true" class="classname" data-href="">
<li class="classname">
<div class="classname"><a href="">"value"</a> string <a href="">string1</a> <a class="muted"><abbr class="timeago" title=" 1 Jun, 2015, 10:23 am">7 hours ago</abbr></a>
</div>
</li>
<li>
</li>
</ul>
</div>
I basically want to extract the "string" value in the 'a' tag.
Complete solution using next_sibling:
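    ulTag = soup.find("ul", {"class": "classname"})
    aTags = ulTag.find_all("a")
    for aTag in aTags:
        # next_sibling is whatever node directly follows the <a> tag
        sibling = aTag.next_sibling
        siblingString = str(sibling).strip()
        # skip siblings that are only whitespace
        if len(siblingString) > 0:
            print(siblingString)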
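As a self-contained sketch of the same next_sibling idea, the snippet below parses the markup from the question and collects the text that directly follows each 'a' tag. The names HTML and text_after_links are made up for illustration, and the isinstance check is an added assumption so that a missing sibling or a sibling tag is skipped instead of being turned into a string.

```python
from bs4 import BeautifulSoup, NavigableString

# Markup copied from the question.
HTML = """
<div class="classname">
<ul auto-load="true" class="classname" data-href="">
<li class="classname">
<div class="classname"><a href="">"value"</a> string <a href="">string1</a> <a class="muted"><abbr class="timeago" title=" 1 Jun, 2015, 10:23 am">7 hours ago</abbr></a>
</div>
</li>
<li>
</li>
</ul>
</div>
"""


def text_after_links(html):
    """Return the non-empty text nodes that directly follow an <a> tag.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    ul_tag = soup.find("ul", {"class": "classname"})
    results = []
    for a_tag in ul_tag.find_all("a"):
        sibling = a_tag.next_sibling
        # next_sibling may be None, another Tag, or a text node; keep text only.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                results.append(text)
    return results


print(text_after_links(HTML))
```

For the markup in the question this should leave only the loose text between the first two links.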
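Every one of your calls searches the whole soup instead of the tag found in the previous step, so the nesting is never followed. Search for each tag inside its parent tag instead. Try this: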
    divTag = soup.find_all("div", {"class": "classname"})
    for ulTag in divTag:
        # each find_all below is called on the parent tag, not on soup
        for liTag in ulTag.find_all("li", {"class": "classname"}):
            for tag in liTag.find_all("div", {"class": "classname"}):
                for aTag in tag.find_all('a'):
                    print(aTag.string)
For the HTML you provided, the output is:
"value"
string1
7 hours ago
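To try the scoped search without an existing soup object, here is a minimal runnable sketch that parses the markup from the question and returns the link texts as a list. HTML and link_texts are hypothetical names, and get_text(strip=True) is swapped in for .string as an assumption, since .string returns None when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Markup copied from the question.
HTML = """
<div class="classname">
<ul auto-load="true" class="classname" data-href="">
<li class="classname">
<div class="classname"><a href="">"value"</a> string <a href="">string1</a> <a class="muted"><abbr class="timeago" title=" 1 Jun, 2015, 10:23 am">7 hours ago</abbr></a>
</div>
</li>
<li>
</li>
</ul>
</div>
"""


def link_texts(html):
    """Collect the text of every <a> found by searching parent by parent.

    Hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # Only the first search starts at the soup; the rest start at the parent.
    for outer_div in soup.find_all("div", {"class": "classname"}):
        for li_tag in outer_div.find_all("li", {"class": "classname"}):
            for inner_div in li_tag.find_all("div", {"class": "classname"}):
                for a_tag in inner_div.find_all("a"):
                    texts.append(a_tag.get_text(strip=True))
    return texts


for text in link_texts(HTML):
    print(text)
```

The inner div also matches the first search, but it contains no 'li', so it adds nothing and each link is reported once.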