對于Excel和VBA我所知有限,僅能解決自己遇到的一些問題,并不一定適用于所有場景。以下內(nèi)容建立在了解基本VBA使用以及HTML語言知識的基礎(chǔ)上:
一、前期準(zhǔn)備
就我所知,VBA并不能操作任意瀏覽器及網(wǎng)頁,我們所能做的僅僅是對IE進行一些操作,是的,僅僅是IE。不要告訴我電腦上沒有IE,那樣就可以Exit Sub了。就像Python用import、C#用using一樣,VBA也需要引用一些庫才能對IE進行操作,不過好在同屬微軟產(chǎn)品,所以我們能很簡便的利用VBA自帶的一些庫。
首先我們要做的就是在VBA中引用Micorsoft Internet Controls,看這個名字就知道是幫助我們控制IE頁面用的。
二、網(wǎng)頁操作
引用Micorsoft Internet Controls之后,我們就可以對頁面為所欲為了,不過首頁我們要有個頁面,上帝說要有頁面!
1、打開網(wǎng)頁
我們以在百度搜索“扯乎”關(guān)鍵詞為例:
With CreateObject("internetexplorer.application")
.Visible = True
.Navigate "https://www.baidu.com/s?wd=扯乎"
'關(guān)閉網(wǎng)頁
' .Quit
End With
代碼很簡單,先創(chuàng)建一個IE對象,然后給一些屬性賦值。Visible是可見性,說的是在對網(wǎng)頁進行操作時,這個網(wǎng)頁是不是會被看見。熟練之后可以設(shè)置為False,不僅讓程序在跑的時候有種神秘感(并沒有),還能稍微加快一點速度。
不過有一點要記住,這個網(wǎng)頁我們打開之后并沒有關(guān)閉,也就是說程序結(jié)束后需要手動關(guān)閉,如果網(wǎng)頁不可見是無法手動關(guān)閉的。代碼中注釋的部分就是關(guān)閉網(wǎng)頁用的。Navigate不用多說就是URL。
我們必須要等網(wǎng)頁完全加載完才能開始信息的抓取,這個時候使用到:(從這里開始,所有的代碼都需要寫在With代碼塊中)
While .ReadyState <> 4 Or .Busy
DoEvents
Wend
Busy是網(wǎng)頁忙碌狀態(tài),ReadyState是HTTP的5種就緒狀態(tài),對應(yīng)如下:
- 0:請求未初始化(還沒有調(diào)用 open())。
- 1:請求已經(jīng)建立,但是還沒有發(fā)送(還沒有調(diào)用 send())。
- 2:請求已發(fā)送,正在處理中(通?,F(xiàn)在可以從響應(yīng)中獲取內(nèi)容頭)。
- 3:請求在處理中;通常響應(yīng)中已有部分數(shù)據(jù)可用了,但是服務(wù)器還沒有完成響應(yīng)的生成。
- 4:響應(yīng)已完成;您可以獲取并使用服務(wù)器的響應(yīng)了。
2、獲取信息
我們先把頁面中的所有內(nèi)容抓下來,后期篩選出有用的部分再慢慢給抓取添加條件。
Set dmt = .Document
For i = 0 To dmt.all.Length - 1
Set htMent = dmt.all(i)
With ActiveSheet
.Cells(i + 2, "A") = htMent.tagName
.Cells(i + 2, "B") = TypeName(htMent)
.Cells(i + 2, "C") = htMent.ID
.Cells(i + 2, "D") = htMent.Name
.Cells(i + 2, "E") = htMent.Value
.Cells(i + 2, "F") = htMent.Text
.Cells(i + 2, "G") = htMent.innerText
End With
Next i
這塊代碼和JS有些相似,需要從IE.Document.all中把頁面上所有節(jié)點找出來。這里也提供其他幾種方法:
- getElementById("IDName"):返回第一個內(nèi)部標(biāo)有IDName的標(biāo)簽
- getElementsByName("a") :返回所有的<a>標(biāo)簽,返回值為集合
- getElementsByClassName("css"):返回所有樣式名稱為css的標(biāo)簽,返回值為集合
這些都是在抓取了全部頁面內(nèi)容后幫助篩選有效信息時使用起來比較方便的。當(dāng)然all還是最好用的,因為all也存在all("IDName")以及all.IDName等等用法。
上面代碼部分返回的屬性值都是HTML基本內(nèi)容,就不一一解釋了。
3、填充信息
網(wǎng)抓神器當(dāng)然還是Python,大部分人使用Excel的目的還是在于對頁面內(nèi)容進行自動填充,直接讓表格提交網(wǎng)頁,問卷錄入之類的工作都省心不少。在抓取了頁面內(nèi)容之后,想填充更加是易如反掌的事情,只需要直接給頁面標(biāo)簽的Value屬性賦值就可以了。
不過網(wǎng)頁中除了文本框,可能還存在一些其他沒有Value的標(biāo)簽,比如:下拉菜單、單選框。給這些內(nèi)容賦值就需要一些基本的HTML知識了。
'下拉菜單選擇
.all("select")(0).Selected = True
'單選按鈕選擇
.all("radio").Checked = True
'復(fù)選按鈕選擇
.all("checkbox").Checked = True
下拉菜單是select標(biāo)簽,每個選項都在一個option標(biāo)簽里,所以返回一個集合,需要選中某個選項就要修改對應(yīng)的Selected屬性為True。單選和復(fù)選按鈕都是input標(biāo)簽,區(qū)別在于類型分別是radio和checkbox,要選中某個選項需要修改對應(yīng)的Checked屬性。
三、數(shù)據(jù)接口
有時候我們能直接拿到一些API,通過API返回數(shù)據(jù)當(dāng)然比打開網(wǎng)頁更方便快捷,所使用的方法也有一些不太一樣。
1、請求接口
比如我從網(wǎng)上得到一個能通過城市查詢免費WIFI的API,通過Excel接口訪問就使用下面的代碼:(雖然是免費的,為了避免麻煩還是把我的AppKey隱去了)
Dim http
Set http = CreateObject("Microsoft.XMLHTTP")
http.Open "GET", "http://api./Wifi/QueryByCity", False
http.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
http.send "key=[AppKey]&city=北京&page=1"
這時我們創(chuàng)建的對象就不再是IE,而是HTTP對象。這里用的是ajax的Open方法,GET是數(shù)據(jù)發(fā)送方式,第二個參數(shù)是接口地址,第三個參數(shù)是指定請求方式是否為異步。如果這個API有賬號密碼,分別寫在第四第五個參數(shù)。
setRequestHeader就是給接口發(fā)送一個HTTP協(xié)議頭文件,最后send的內(nèi)容是接口參數(shù)。當(dāng)然,這個QueryString也可以直接寫在URL里,send一個空字符串就可以了。
2、接口返回
接口返回獲取的方式很簡單:
If http.Status = 200 Then Range("A1").Value = http.responseText
這里的HTTP狀態(tài)又變成200了,和之前說好的不一樣啊摔~有興趣可以自己查查具體有哪些。
不過接口返回要么是JSON要么是XML,Excel處理起來十分不方便。這里提供一個處理JSON的方法,是從網(wǎng)上找來的類模塊,具體內(nèi)容放在附錄里。在添加了這個clsJSON類模塊后,對JSON的處理就變得十分簡單了。
將上面的代碼改成:
If http.Status = 200 Then
Dim json$
json = http.responseText
Dim objJSON As New clsJSON, dicJSON As Object
Set dicJSON = objJSON.parse(json)
For i = 1 To dicJSON("result")("data").Count
Sheet1.Cells(i + 1, 1) = dicJSON("result")("data")(i)("name")
Sheet1.Cells(i + 1, 2) = dicJSON("result")("data")(i)("intro")
Sheet1.Cells(i + 1, 3) = dicJSON("result")("data")(i)("address")
Next i
End If
接口返回的示例我也放在附錄里了,根據(jù)接口返回的對象名、數(shù)組名去修改dicJSON后面的內(nèi)容就可以了。這個處理JSON的模塊用的是VBA中字典+集合的原理,所以數(shù)據(jù)處理后的調(diào)用方式也參照字典和集合。
以上是我用Excel+VBA進行網(wǎng)頁操作的一些個人經(jīng)驗,希望能幫助到一些有需要的人。有什么錯漏的地方,也希望知乎大牛批評指正。
附錄一:VBA處理JSON的類模塊
Option Explicit
'================================
' VBA處理JSON文件的類模塊
'
' http://www.
'================================
Const INVALID_JSON As Long = 1
Const INVALID_OBJECT As Long = 2
Const INVALID_ARRAY As Long = 3
Const INVALID_BOOLEAN As Long = 4
Const INVALID_NULL As Long = 5
Const INVALID_KEY As Long = 6
Private Sub Class_Initialize()
End Sub
Private Sub Class_Terminate()
End Sub
Public Function parse(ByRef str As String) As Object
Dim index As Long
index = 1
On Error Resume Next
Call skipChar(str, index)
Select Case Mid(str, index, 1)
Case "{"
Set parse = parseObject(str, index)
Case "["
Set parse = parseArray(str, index)
End Select
End Function
Private Function parseObject(ByRef str As String, ByRef index As Long) As Object
Set parseObject = CreateObject("Scripting.Dictionary")
' "{"
Call skipChar(str, index)
If Mid(str, index, 1) <> "{" Then Err.Raise vbObjectError + INVALID_OBJECT, Description:="char " & index & " : " & Mid(str, index)
index = index + 1
Do
Call skipChar(str, index)
If "}" = Mid(str, index, 1) Then
index = index + 1
Exit Do
ElseIf "," = Mid(str, index, 1) Then
index = index + 1
Call skipChar(str, index)
End If
Dim key As String
' add key/value pair
parseObject.Add key:=parseKey(str, index), Item:=parseValue(str, index)
Loop
End Function
Private Function parseArray(ByRef str As String, ByRef index As Long) As Collection
Set parseArray = New Collection
' "["
Call skipChar(str, index)
If Mid(str, index, 1) <> "[" Then Err.Raise vbObjectError + INVALID_ARRAY, Description:="char " & index & " : " + Mid(str, index)
index = index + 1
Do
Call skipChar(str, index)
If "]" = Mid(str, index, 1) Then
index = index + 1
Exit Do
ElseIf "," = Mid(str, index, 1) Then
index = index + 1
Call skipChar(str, index)
End If
' add value
parseArray.Add parseValue(str, index)
Loop
End Function
Private Function parseValue(ByRef str As String, ByRef index As Long)
Call skipChar(str, index)
Select Case Mid(str, index, 1)
Case "{"
Set parseValue = parseObject(str, index)
Case "["
Set parseValue = parseArray(str, index)
Case """", "'"
parseValue = parseString(str, index)
Case "t", "f"
parseValue = parseBoolean(str, index)
Case "n"
parseValue = parseNull(str, index)
Case Else
parseValue = parseNumber(str, index)
End Select
End Function
Private Function parseString(ByRef str As String, ByRef index As Long) As String
Dim quote As String
Dim char As String
Dim code As String
Call skipChar(str, index)
quote = Mid(str, index, 1)
index = index + 1
Do While index > 0 And index <= Len(str)
char = Mid(str, index, 1)
Select Case (char)
Case "\"
index = index + 1
char = Mid(str, index, 1)
Select Case (char)
Case """", "\\", "/"
parseString = parseString & char
index = index + 1
Case "b"
parseString = parseString & vbBack
index = index + 1
Case "f"
parseString = parseString & vbFormFeed
index = index + 1
Case "n"
parseString = parseString & vbNewLine
index = index + 1
Case "r"
parseString = parseString & vbCr
index = index + 1
Case "t"
parseString = parseString & vbTab
index = index + 1
Case "u"
index = index + 1
code = Mid(str, index, 4)
parseString = parseString & ChrW(Val("&h" + code))
index = index + 4
End Select
Case quote
index = index + 1
Exit Function
Case Else
parseString = parseString & char
index = index + 1
End Select
Loop
End Function
Private Function parseNumber(ByRef str As String, ByRef index As Long)
Dim value As String
Dim char As String
Call skipChar(str, index)
Do While index > 0 And index <= Len(str)
char = Mid(str, index, 1)
If InStr("+-0123456789.eE", char) Then
value = value & char
index = index + 1
Else
If InStr(value, ".") Or InStr(value, "e") Or InStr(value, "E") Then
parseNumber = CDbl(value)
Else
parseNumber = CInt(value)
End If
Exit Function
End If
Loop
End Function
Private Function parseBoolean(ByRef str As String, ByRef index As Long) As Boolean
Call skipChar(str, index)
If Mid(str, index, 4) = "true" Then
parseBoolean = True
index = index + 4
ElseIf Mid(str, index, 5) = "false" Then
parseBoolean = False
index = index + 5
Else
Err.Raise vbObjectError + INVALID_BOOLEAN, Description:="char " & index & " : " & Mid(str, index)
End If
End Function
Private Function parseNull(ByRef str As String, ByRef index As Long)
Call skipChar(str, index)
If Mid(str, index, 4) = "null" Then
parseNull = Null
index = index + 4
Else
Err.Raise vbObjectError + INVALID_NULL, Description:="char " & index & " : " & Mid(str, index)
End If
End Function
Private Function parseKey(ByRef str As String, ByRef index As Long) As String
Dim dquote As Boolean
Dim squote As Boolean
Dim char As String
Call skipChar(str, index)
Do While index > 0 And index <= Len(str)
char = Mid(str, index, 1)
Select Case (char)
Case """"
dquote = Not dquote
index = index + 1
If Not dquote Then
Call skipChar(str, index)
If Mid(str, index, 1) <> ":" Then
Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey
End If
End If
Case "'"
squote = Not squote
index = index + 1
If Not squote Then
Call skipChar(str, index)
If Mid(str, index, 1) <> ":" Then
Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey
End If
End If
Case ":"
If Not dquote And Not squote Then
index = index + 1
Exit Do
End If
Case Else
If InStr(vbCrLf & vbCr & vbLf & vbTab & " ", char) Then
Else
parseKey = parseKey & char
End If
index = index + 1
End Select
Loop
End Function
Public Sub skipChar(ByRef str As String, ByRef index As Long)
While index > 0 And index <= Len(str) And InStr(vbCrLf & vbCr & vbLf & vbTab & " ", Mid(str, index, 1))
index = index + 1
Wend
End Sub
Public Function toString(ByRef obj As Variant) As String
Select Case VarType(obj)
Case vbNull
toString = "null"
Case vbDate
toString = """" & CStr(obj) & """"
Case vbString
toString = """" & encode(obj) & """"
Case vbObject
Dim bFI, i
bFI = True
If TypeName(obj) = "Dictionary" Then
toString = toString & "{"
Dim keys
keys = obj.keys
For i = 0 To obj.Count - 1
If bFI Then bFI = False Else toString = toString & ","
Dim key
key = keys(i)
toString = toString & """" & key & """:" & toString(obj(key))
Next i
toString = toString & "}"
ElseIf TypeName(obj) = "Collection" Then
toString = toString & "["
Dim value
For Each value In obj
If bFI Then bFI = False Else toString = toString & ","
toString = toString & toString(value)
Next value
toString = toString & "]"
End If
Case vbBoolean
If obj Then toString = "true" Else toString = "false"
Case vbVariant, vbArray, vbArray + vbVariant
Dim sEB
toString = multiArray(obj, 1, "", sEB)
Case Else
toString = Replace(obj, ",", ".")
End Select
End Function
Private Function encode(str) As String
Dim i, j, aL1, aL2, c, p
aL1 = Array(&H22, &H5C, &H2F, &H8, &HC, &HA, &HD, &H9)
aL2 = Array(&H22, &H5C, &H2F, &H62, &H66, &H6E, &H72, &H74)
For i = 1 To Len(str)
p = True
c = Mid(str, i, 1)
For j = 0 To 7
If c = Chr(aL1(j)) Then
encode = encode & "\" & Chr(aL2(j))
p = False
Exit For
End If
Next
If p Then
Dim a
a = AscW(c)
If a > 31 And a < 127 Then
encode = encode & c
ElseIf a > -1 Or a < 65535 Then
encode = encode & "\u" & String(4 - Len(Hex(a)), "0") & Hex(a)
End If
End If
Next
End Function
Private Function multiArray(aBD, iBC, sPS, ByRef sPT) ' Array BoDy, Integer BaseCount, String PoSition
Dim iDU, iDL, i ' Integer DimensionUBound, Integer DimensionLBound
On Error Resume Next
iDL = LBound(aBD, iBC)
iDU = UBound(aBD, iBC)
Dim sPB1, sPB2 ' String PointBuffer1, String PointBuffer2
If Err.Number = 9 Then
sPB1 = sPT & sPS
For i = 1 To Len(sPB1)
If i <> 1 Then sPB2 = sPB2 & ","
sPB2 = sPB2 & Mid(sPB1, i, 1)
Next
' multiArray = multiArray & toString(Eval("aBD(" & sPB2 & ")"))
multiArray = multiArray & toString(aBD(sPB2))
Else
sPT = sPT & sPS
multiArray = multiArray & "["
For i = iDL To iDU
multiArray = multiArray & multiArray(aBD, iBC + 1, i, sPT)
If i < iDU Then multiArray = multiArray & ","
Next
multiArray = multiArray & "]"
sPT = Left(sPT, iBC - 2)
End If
Err.Clear
End Function
附錄二:JSON返回示例
{
"resultcode":"200",
"reason":"ReturnSuccessd!",
"result":{
"data":[
{
"name":"北京市法雨合",
"intro":"法雨合0層",
"address":"北京市朝陽區(qū)朝陽區(qū)三里屯",
"google_lat":"39.9372423",
"google_lon":"116.4480615",
"baidu_lat":"39.942952987502",
"baidu_lon":"116.45464108129",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京朝陽西壩河光熙門北里",
"intro":"朝陽西壩河光熙門北里34-8號0層",
"address":"北京市朝陽區(qū)朝陽區(qū)西壩河光熙門北里34號-8號0層",
"google_lat":"39.9635121",
"google_lon":"116.435895",
"baidu_lat":"39.969407173324",
"baidu_lon":"116.44243487981",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京朝陽三里屯北街",
"intro":"",
"address":"北京市朝陽區(qū)朝陽三里屯北街8號0層",
"google_lat":"39.9254286",
"google_lon":"116.4605935",
"baidu_lat":"39.931073085771",
"baidu_lon":"116.46719483818",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京大都酒吧街",
"intro":"",
"address":"北京市朝陽區(qū)元大都酒吧街11號",
"google_lat":"39.975984",
"google_lon":"116.424389",
"baidu_lat":"39.982089966811",
"baidu_lon":"116.43086831752",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京西城前海北沿",
"intro":"",
"address":"北京市西城區(qū)西城前海北沿10號0層",
"google_lat":"39.9369032",
"google_lon":"116.3919335",
"baidu_lat":"39.943215619704",
"baidu_lon":"116.39830652238",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市西城后海南沿36號對面",
"intro":"后海南沿36號對面0層",
"address":"北京市西城區(qū)后海南沿36號",
"google_lat":"39.9396792",
"google_lon":"116.389129",
"baidu_lat":"39.945967638433",
"baidu_lon":"116.39551153315",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市賽百味",
"intro":"ok",
"address":"北京市西城區(qū)中關(guān)村東路18號",
"google_lat":"39.9810991",
"google_lon":"116.3333866",
"baidu_lat":"39.9867766224",
"baidu_lon":"116.34001632032",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市光華路數(shù)碼01",
"intro":"",
"address":"北京市朝陽區(qū)光華路數(shù)碼01大廈0層",
"google_lat":"39.9132392",
"google_lon":"116.4592309",
"baidu_lat":"39.918885961978",
"baidu_lon":"116.46583845234",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市盛銘幫逸園會館",
"intro":"盛銘幫逸園會館0",
"address":"北京市朝陽區(qū)逸園25號",
"google_lat":"39.8710876",
"google_lon":"116.4602965",
"baidu_lat":"39.876744728506",
"baidu_lon":"116.46693498949",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市地平線酒吧",
"intro":"",
"address":"北京市朝陽區(qū)朝陽三里屯北街70號",
"google_lat":"39.9254286",
"google_lon":"116.4605935",
"baidu_lat":"39.931073085771",
"baidu_lon":"116.46719483818",
"province":"北京市",
"city":"北京市"
}
],
"pageinfo":{
"pnums":20,
"current":1
}
}
}
|