Using BeautifulSoup - 白红宇



Published: 2019-06-13


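requests can fetch the contents of a web page, but what comes back is raw HTML markup. We cannot use it directly; it first has to be parsed into a form that is convenient to work with.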

BeautifulSoup can strip the tags away, leaving the data and text that the page actually contains.
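
As a minimal sketch of that idea, the snippet below parses a small hard-coded HTML string and pulls out only the visible text. The helper name extract_text and the sample page are assumptions made for illustration; in practice the HTML would usually come from something like requests.get(url).text.

```python
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string with all tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    # separator=" " joins the text fragments with a single space;
    # strip=True trims whitespace around each fragment and skips empty ones.
    return soup.get_text(separator=" ", strip=True)


# Hypothetical sample page standing in for a downloaded response body.
page = "<html><body><h1>HelloWorld</h1><p>Some <b>bold</b> text</p></body></html>"
print(extract_text(page))
```

get_text() is convenient when only the text matters; the sections that follow use select() instead, to pick out individual elements.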

1. Extracting the content of a specific tag

A single tag

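    from bs4 import BeautifulSoup

    # Sample markup; the href values are placeholders.
    html_sample = (
        '<html><body>'
        '<h1 id="title">HelloWorld</h1>'
        '<a href="#link1" class="link">This is link1</a>'
        '<a href="#link2" class="link">This is link2</a>'
        '</body></html>'
    )

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('h1')  # select() returns a list of matching tags
    print(header[0].text)       # text of the first (and only) h1

Multiple identical tags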

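    from bs4 import BeautifulSoup

    # Sample markup; the href values are placeholders.
    html_sample = (
        '<html><body>'
        '<h1 id="title">HelloWorld</h1>'
        '<a href="#link1" class="link">This is link1</a>'
        '<a href="#link2" class="link">This is link2</a>'
        '</body></html>'
    )

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('a')   # every a tag in the document
    for alink in header:
        print(alink.text)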
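
When only the first match is needed, select_one() can replace select(...)[0]. The sketch below assumes sample markup similar to the above (the href values are placeholders) and shows that select_one() returns None when nothing matches, rather than raising an IndexError.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the href values are placeholders.
html_sample = (
    '<html><body>'
    '<h1 id="title">HelloWorld</h1>'
    '<a href="#link1" class="link">This is link1</a>'
    '<a href="#link2" class="link">This is link2</a>'
    '</body></html>'
)
soup = BeautifulSoup(html_sample, "html.parser")

# select_one() returns the first matching tag, or None if there is none.
first_h1 = soup.select_one("h1")
missing = soup.select_one("h2")

# select() always returns a list, so it can be used in a comprehension.
link_texts = [a.text for a in soup.select("a")]

if first_h1 is not None:
    print(first_h1.text)
print(missing)
print(link_texts)
```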

2. Selecting elements by a CSS attribute

Put # in front of an id

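    from bs4 import BeautifulSoup

    # Sample markup; the href values are placeholders.
    html_sample = (
        '<html><body>'
        '<h1 id="title">HelloWorld</h1>'
        '<a href="#link1" class="link">This is link1</a>'
        '<a href="#link2" class="link">This is link2</a>'
        '</body></html>'
    )

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('#title')  # elements whose id is "title"
    print(header)                   # prints the list of matching tags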

Put . in front of a class

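    from bs4 import BeautifulSoup

    # Sample markup; the href values are placeholders.
    html_sample = (
        '<html><body>'
        '<h1 id="title">HelloWorld</h1>'
        '<a href="#link1" class="link">This is link1</a>'
        '<a href="#link2" class="link">This is link2</a>'
        '</body></html>'
    )

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('.link')   # elements whose class is "link"
    for alink in header:
        print(alink.text)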
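
The tag, id, and class selectors can also be combined in the usual CSS way. The following sketch assumes a slightly extended sample (the extra p element and the href values are made up for illustration) to show how a.link narrows the match compared with .link alone.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the p element and the href values are illustrative.
html_sample = (
    '<html><body>'
    '<h1 id="title">HelloWorld</h1>'
    '<p class="link">Not an anchor</p>'
    '<a href="#link1" class="link">This is link1</a>'
    '<a href="#link2" class="link">This is link2</a>'
    '</body></html>'
)
soup = BeautifulSoup(html_sample, "html.parser")

# ".link" matches any element with class "link", whatever its tag name.
all_link_class = [el.name for el in soup.select(".link")]

# "a.link" matches only a tags that carry class "link".
anchor_texts = [a.text for a in soup.select("a.link")]

# "h1#title" matches an h1 whose id is "title".
title_text = soup.select_one("h1#title").text

print(all_link_class)
print(anchor_texts)
print(title_text)
```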

3. Getting the link (href) inside an a tag

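    from bs4 import BeautifulSoup

    # Sample markup; the href values are placeholders.
    html_sample = (
        '<html><body>'
        '<h1 id="title">HelloWorld</h1>'
        '<a href="#link1" class="link">This is link1</a>'
        '<a href="#link2" class="link">This is link2</a>'
        '</body></html>'
    )

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('a')
    for alink in header:
        print(alink['href'])    # attributes are read like dictionary keys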
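
Indexing with alink['href'] raises a KeyError if an a tag has no href attribute. As a hedged sketch, the snippet below shows two ways around that, using a made-up fragment in which the middle anchor has no href: Tag.get() returns None for a missing attribute, and the CSS attribute selector a[href] skips such tags entirely.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second a tag deliberately has no href.
html = '<a href="/page1">one</a><a>no link</a><a href="/page2">two</a>'
soup = BeautifulSoup(html, "html.parser")

# Tag.get() works like dict.get(): it returns None when the attribute is absent.
hrefs = [a.get("href") for a in soup.select("a")]

# The attribute selector keeps only a tags that actually have an href.
with_href = [a["href"] for a in soup.select("a[href]")]

print(hrefs)
print(with_href)
```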

Reposted from: https://www.cnblogs.com/zlj1992/p/6106653.html
