Currently, in my code, I break down a larger soup and get all the 'td' tags with the following code:
floorplans_all = sub_soup.findAll('td', {"data-label":"Rent"})
floorplan_soup = soup(floorplans_all[0].prettify(), "html.parser")
rent_span = floorplan_soup.findAll('span', {"class":"sr-only"})
print(floorplans_all)
The result is as follows:
<td data-label="Rent" data-selenium-id="Rent_6">
<span class="sr-only">
Monthly Rent
</span>
$2,335 -
<span class="sr-only">
to
</span>
$5,269
</td>
Printing rent_span gives:
[<span class="sr-only">
Monthly Rent
</span>, <span class="sr-only">
to
</span>]
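I can't seem to get "$2,335 -" and "$5,269" out of the above. I have been trying to walk down the HTML tree, but I can't get at the text that sits between the tags.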
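The td element has five child nodes: a whitespace text node, a span, a text node, another span, and a final text node. You can iterate over these children with the children attribute: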
from bs4 import BeautifulSoup

soup = BeautifulSoup(text, 'html.parser')  # text holds the <td> HTML shown above
for child in soup.td.children:
    print(repr(child))
'\n'
<span class="sr-only">
Monthly Rent
</span>
'\n $2,335 -\n '
<span class="sr-only">
to
</span>
'\n $5,269\n '
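As a follow-up to the loop above, here is a minimal sketch of keeping only the text nodes among the direct children. The html string is copied from the question, and the variable name direct_text is made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Sample markup copied from the question.
html = """<td data-label="Rent" data-selenium-id="Rent_6">
<span class="sr-only">
Monthly Rent
</span>
$2,335 -
<span class="sr-only">
to
</span>
$5,269
</td>"""

soup = BeautifulSoup(html, "html.parser")

# Keep direct children that are text nodes and not pure whitespace;
# the span elements are Tag objects, so the isinstance check skips them.
direct_text = [
    child.strip()
    for child in soup.td.children
    if isinstance(child, NavigableString) and child.strip()
]
print(direct_text)  # ['$2,335 -', '$5,269']
```

This does not depend on where the spans sit, only on the prices being direct text children of the td.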
If you want to find the text nodes explicitly, you can search for the span nodes and take the next sibling of each one:
>>> [span.next_sibling.string.strip() for span in soup.td.find_all(class_='sr-only')]
['$2,335 -', '$5,269']
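If the goal is numeric values rather than strings, the same next_sibling idea can be taken one step further. This is only a sketch: parse_rent_range is a hypothetical helper, the regular expression assumes prices are written like "$2,335", and the html string is copied from the question.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<td data-label="Rent" data-selenium-id="Rent_6">
<span class="sr-only">
Monthly Rent
</span>
$2,335 -
<span class="sr-only">
to
</span>
$5,269
</td>"""


def parse_rent_range(td):
    """Hypothetical helper: return the dollar amounts that follow each sr-only span."""
    amounts = []
    for span in td.find_all("span", class_="sr-only"):
        sibling = span.next_sibling
        # Text nodes are str subclasses; this skips None and Tag siblings.
        if isinstance(sibling, str):
            match = re.search(r"\$([\d,]+)", sibling)
            if match:
                amounts.append(int(match.group(1).replace(",", "")))
    return amounts


soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", {"data-label": "Rent"})
rents = parse_rent_range(td)
print(rents)  # [2335, 5269]
```

The isinstance check is there because next_sibling can be None or another tag when the markup differs from the sample.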
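from bs4 import BeautifulSoup

soup = BeautifulSoup(res, 'html.parser')  # res is assumed to hold the page HTML
row = soup.find('td', {'data-label': "Rent"})
print(row.span.text.strip())  # label from the first span
for span in row.find_all('span'):
    # The price is the text node right after each span; drop the trailing " -"
    print(span.next_sibling.strip().rstrip('- '))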
The output looks like this:
Monthly Rent
$2,335
$5,269
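To see why the span text and the prices have to be collected differently, it may help to compare what each access path returns. This is a sketch using the markup from the question; the names labels, all_strings and prices are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<td data-label="Rent" data-selenium-id="Rent_6">
<span class="sr-only">
Monthly Rent
</span>
$2,335 -
<span class="sr-only">
to
</span>
$5,269
</td>"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("td", {"data-label": "Rent"})

# Text inside the spans: only the screen-reader labels.
labels = [span.get_text(strip=True) for span in row.find_all("span")]
print(labels)  # ['Monthly Rent', 'to']

# Every non-empty string under the td, in document order.
all_strings = list(row.stripped_strings)
print(all_strings)  # ['Monthly Rent', '$2,335 -', 'to', '$5,269']

# What remains after removing the labels is the price text.
prices = [s for s in all_strings if s not in labels]
print(prices)  # ['$2,335 -', '$5,269']
```

Filtering by label text assumes no price string happens to equal a label, which holds for this sample.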