Scraping Chinese links with PHP cURL: PHP cURL explained

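Contents:

1. Web scraping with PHP's cURL library
2. How a PHP cURL crawler can collect every link on a web page
3. PHP cURL usage
4. How to fetch the content of a WeChat web page with PHP cURL
5. Several ways to fetch pages with PHP cURL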

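Web scraping with PHP's cURL library

PHP's cURL library makes it simple and efficient to fetch web pages. You run a script, parse the page you fetched, and then have the data you want available programmatically. This works whether you want to pull part of the data from a single link, fetch an XML file and import it into a database, or simply retrieve the content of a page. cURL is a powerful PHP library, and this article describes how to use it.

Enabling cURL

First, check whether your PHP installation has the library enabled. You can find out by calling the phpinfo() function:

<?php phpinfo(); ?>

If the resulting page contains a curl section that reports cURL support as enabled, the library is turned on.

If it does not, you need to configure PHP to enable the library. On Windows this is easy: edit php.ini, find the php_curl.dll line, and remove the semicolon that comments it out, as shown below:

; Uncomment the following line
extension=php_curl.dll

On Linux you need to rebuild PHP with cURL support by adding the --with-curl option to the configure command, unless your distribution ships the extension as a separate package.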
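A script can also check for the extension directly, without reading through the phpinfo() page. The sketch below relies only on the standard extension_loaded() and function_exists() functions; the helper name curlIsAvailable is made up for this illustration.

```php
<?php
// Hypothetical helper: report whether the cURL extension can be used.
function curlIsAvailable(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curlIsAvailable() ? "cURL is enabled\n" : "cURL is not enabled\n";
```

Running this check at the top of a scraping script lets it fail with a clear message instead of a fatal "undefined function" error.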

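A small example

If everything is ready, here is a small program:

<?php
// Initialize a cURL handle
$curl = curl_init();

// Set the URL to fetch (placeholder; substitute the page you want)
curl_setopt($curl, CURLOPT_URL, 'http://example.com/');

// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);

// Return the result as a string instead of printing it to the screen
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run the cURL request and fetch the page
$data = curl_exec($curl);

// Close the cURL handle
curl_close($curl);

// Show the data that was fetched
var_dump($data);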
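The example above does not check whether the request succeeded. As a minimal sketch of how that could be handled, the function below wraps the same calls, adds timeouts, and throws when curl_exec() returns false. The name fetchPage and the ten-second default are assumptions made for illustration, not part of the original article.

```php
<?php
// Hypothetical helper: fetch $url and return the response body, or throw on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit on connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit on the whole transfer
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong (DNS failure, refused connection, ...)
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope at the end of the function.
    return $body;
}
```

A caller would wrap fetchPage('http://example.com/') in a try/catch block and decide there whether to retry or give up.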

How to POST data

The code above fetches a page; the code below POSTs data to one. Suppose there is a form handler at http://www.example.com/sendSMS.php that accepts two form fields, a phone number and the text of the message:

<?php
$phoneNumber = '0000000000'; // placeholder phone number
$message = 'This message was generated by curl and php';
$curlPost = 'pNUMBER=' . urlencode($phoneNumber) . '&MESSAGE=' . urlencode($message) . '&SUBMIT=Send';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/sendSMS.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);
?>

As the program shows, CURLOPT_POST switches the request from the HTTP GET method to POST, and CURLOPT_POSTFIELDS supplies the POST data.
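Concatenating urlencode() calls by hand gets error-prone as the number of fields grows. A possible alternative, sketched here, is to let http_build_query() encode an array of fields. The helper names buildPostBody and postForm, and the placeholder phone number, are invented for this example.

```php
<?php
// Hypothetical helper: encode form fields as an application/x-www-form-urlencoded body.
function buildPostBody(array $fields): string
{
    // The separator is passed explicitly so the result does not depend on php.ini.
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: POST $fields to $url; returns the response, or false on failure.
function postForm(string $url, array $fields)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildPostBody($fields));
    return curl_exec($ch);
}

// Build (but do not send) the body for the SMS form described above.
$body = buildPostBody([
    'pNUMBER' => '0000000000', // placeholder phone number
    'MESSAGE' => 'This message was generated by curl and php',
    'SUBMIT'  => 'Send',
]);
echo $body, "\n";
```

Passing a string to CURLOPT_POSTFIELDS sends it URL-encoded; passing the array itself would make cURL send multipart/form-data instead, which is why the sketch encodes the array first.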

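About proxy servers

The following example shows how to use a proxy server. Note the three proxy options, CURLOPT_HTTPPROXYTUNNEL, CURLOPT_PROXY and CURLOPT_PROXYUSERPWD; the code is simple enough that it needs little explanation.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy:port'); // proxy host and port (placeholder)
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$data = curl_exec($ch);
curl_close($ch);
?>

About SSL and cookies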

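For SSL, that is, the HTTPS protocol, you only need to change http:// to https:// in the CURLOPT_URL value. There is also an option called CURLOPT_SSL_VERIFYHOST, which can be set to verify the site: it checks that the name in the server's certificate matches the host being requested.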
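As a rough sketch of the verification options, the helper below creates a handle for an HTTPS URL with both certificate checks switched on. The function name createVerifiedHandle and the optional CA bundle parameter are assumptions for illustration; CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST and CURLOPT_CAINFO are standard cURL options.

```php
<?php
// Hypothetical helper: create a handle for an HTTPS URL with certificate checks enabled.
function createVerifiedHandle(string $url, ?string $caBundlePath = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Verify that the certificate chain leads to a trusted certificate authority.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    // Verify that the certificate was issued for the host being requested.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    if ($caBundlePath !== null) {
        // Optional: use a specific CA bundle file instead of the system default.
        curl_setopt($ch, CURLOPT_CAINFO, $caBundlePath);
    }
    return $ch;
}
```

The returned handle is then passed to curl_exec() as usual; if verification fails, curl_exec() returns false and curl_error() explains why.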

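For cookies, you need to know the following three options:

CURLOPT_COOKIE: sets a cookie for the current session.

CURLOPT_COOKIEJAR: the file in which cookies are saved when the session ends.

CURLOPT_COOKIEFILE: the file from which cookies are read.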
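The three options are often combined so that a scraper keeps its session between requests. The following is a small sketch under that assumption; the helper names createCookieHandle and addRequestCookie are hypothetical, and the cookie file path is whatever the caller chooses.

```php
<?php
// Hypothetical helper: a handle that loads cookies from, and saves them to, $cookieFile.
function createCookieHandle(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read cookies saved by an earlier run
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // write cookies back when the handle is closed
    return $ch;
}

// Hypothetical helper: send one extra cookie with requests made on this handle.
function addRequestCookie($ch, string $name, string $value): bool
{
    return curl_setopt($ch, CURLOPT_COOKIE, $name . '=' . rawurlencode($value));
}
```

Pointing CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR at the same file means a login cookie received in one run is sent again in the next.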

HTTP server authentication

Finally, let's look at HTTP server authentication.

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, '[username]:[password]');

$data = curl_exec($ch);
curl_close($ch);
?>
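To make it clearer what CURLAUTH_BASIC puts on the wire, the sketch below builds the equivalent request header by hand. With Basic authentication the credentials are joined with a colon and base64-encoded. The function name basicAuthHeader and the sample credentials are illustrative only; in practice CURLOPT_USERPWD does this for you.

```php
<?php
// Build the header that HTTP Basic authentication sends: base64("user:password").
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Sample credentials for illustration only.
echo basicAuthHeader('Aladdin', 'open sesame'), "\n";
```

Because the credentials are only encoded, not encrypted, Basic authentication is best used over HTTPS.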

For more, see the cURL manual. Source: lishixinzhi/Article/program/PHP/201311/21491

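How a PHP cURL crawler can collect every link on a web page

This article continues from the previous two and calls the functions introduced there to build a simple URL collector. PHP code that collects data from the network usually uses file_get_contents, file, or cURL. cURL is said to be faster and more capable than file_get_contents and file, and better suited to scraping. Today we will use cURL to collect all the links on a web page. The example is as follows: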

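<?php
/*
 * Use cURL to collect all the links on hao123.com.
 */
include_once('function.php');

// Assumed address for hao123.com; the URL in the original listing is not preserved.
$url = 'http://www.hao123.com/';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Include the HTTP headers in the result
curl_setopt($ch, CURLOPT_HEADER, 1);
// The page body is needed here, so CURLOPT_NOBODY stays commented out
// curl_setopt($ch, CURLOPT_NOBODY, 1);
// Return the result instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$info = curl_getinfo($ch);
if ($html === false) {
    echo "cURL Error: " . curl_error($ch);
}
curl_close($ch);

$linkarr = _striplinks($html);
// Host part, used to complete relative links
$host = $url;
$linkresult = array();
if (is_array($linkarr)) {
    foreach ($linkarr as $k => $v) {
        $linkresult[$k] = _expandlinks($v, $host);
    }
}
printf("<p>All links on this page:</p><pre>%s</pre>\n", var_export($linkresult, true));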

The contents of function.php are as follows (the two functions from the previous two articles combined):

<?php
function _striplinks($document) {
    // Find <a ... href=...>; the value may be quoted (group 2) or unquoted (group 3)
    preg_match_all('~<\s*a\s.*?href\s*=\s*(["\'])?(?(1)(.*?)\1|([^\s>]+))~is', $document, $links);
    // catenate the non-empty matches from the conditional subpattern
    $match = array();
    foreach ($links[2] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    foreach ($links[3] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    // return the links
    return $match;
}

/*===================================================================*
    Function:   _expandlinks
    Purpose:    expand each link into a fully qualified URL
    Input:      $links          the links to qualify
                $URI            the full URI to get the base from
    Output:     $expandedLinks  the expanded links
*===================================================================*/
function _expandlinks($links, $URI)
{
    $URI_PARTS = parse_url($URI);
    $host = $URI_PARTS["host"];
    // Drop the query string, then the file name and any trailing slash
    preg_match("/^[^\?]+/", $URI, $match);
    $match = preg_replace("|/[^\/\.]+\.[^\/\.]+$|", "", $match[0]);
    $match = preg_replace("|/$|", "", $match);
    $match_part = parse_url($match);
    $match_root = $match_part["scheme"] . "://" . $match_part["host"];
    $search = array(
        "|^http://" . preg_quote($host) . "|i",
        "|^(\/)|i",
        "|^(?!http://)(?!mailto:)|i",
        "|/\./|",
        "|/[^\/]+/\.\./|"
    );
    $replace = array(
        "",
        $match_root . "/",
        $match . "/",
        "/",
        "/"
    );
    $expandedLinks = preg_replace($search, $replace, $links);
    return $expandedLinks;
}
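Regular expressions are one way to pull links out of a page. As an alternative sketch, the same two steps (extract the href values, then expand them against a base URL) can be written with PHP's DOMDocument and parse_url(). The function names extractLinks and resolveLink are invented here, the page is assumed to be UTF-8, and resolveLink deliberately ignores ./ and ../ segments, fragments and query-only links, so treat it as a simplification rather than a full URL resolver.

```php
<?php
// Hypothetical helper: return the href of every <a> element in $html.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    // The XML declaration hints that the input is UTF-8 (an assumption about the page).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical helper: turn $link into an absolute URL using $base (simplified).
function resolveLink(string $base, string $link): string
{
    // Already absolute: starts with a scheme such as http:, https: or mailto:
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $link)) {
        return $link;
    }
    $parts = parse_url($base);
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strpos($link, '//') === 0) {       // protocol-relative: //host/path
        return $parts['scheme'] . ':' . $link;
    }
    if (strpos($link, '/') === 0) {        // root-relative: /path
        return $root . $link;
    }
    // Relative to the directory of the base path
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}
```

A caller would run extractLinks() on the fetched HTML and map resolveLink() over the result, passing the page's own URL as the base.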

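PHP cURL usage

curl is a tool for transferring files using URL syntax. It supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE and LDAP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP upload, Kerberos, HTTP form-based upload, proxies, cookies, user and password authentication, file transfer resume, HTTP proxy tunneling and a large number of other useful tricks.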
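Which of these protocols are actually available depends on how the underlying libcurl was built. A quick way to find out, sketched below, is the standard curl_version() function, which returns an array that includes the libcurl version and the list of supported protocols.

```php
<?php
// Ask the installed libcurl what it supports.
$info = curl_version();

echo 'libcurl version: ', $info['version'], "\n";
echo 'Protocols: ', implode(', ', $info['protocols']), "\n";
echo 'HTTPS supported: ', in_array('https', $info['protocols'], true) ? 'yes' : 'no', "\n";
```

If https is missing from the list, requests to https:// URLs will fail no matter which options are set.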

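How to fetch the content of a WeChat web page with PHP cURL

Here is a brief introduction to a couple of approaches.

1. The file_get_contents function

$content = file_get_contents("URL"); // URL is the address of the page you want to fetch

2. Using the curl extension

The code is as follows:

function getCurl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the content instead of printing it
    // Skip certificate checks; convenient for testing, but insecure
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

PS: the PHP curl extension must be installed.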
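For the first approach, file_get_contents() accepts a stream context that can carry a timeout, and it returns false on failure. The sketch below wraps that in a helper; the name fetchWithStreams and the ten-second default are assumptions for illustration. Fetching remote URLs this way also requires allow_url_fopen to be enabled in php.ini.

```php
<?php
// Hypothetical helper: fetch $location with file_get_contents, returning null on failure.
function fetchWithStreams(string $location, int $timeoutSeconds = 10): ?string
{
    // The "http" options apply to http:// and https:// locations.
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
        ],
    ]);
    // @ suppresses the warning raised on failure; the false return value is checked instead.
    $content = @file_get_contents($location, false, $context);
    return $content === false ? null : $content;
}
```

This keeps the one-line convenience of file_get_contents while giving the caller a clear null to test for when the page cannot be fetched.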

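Several ways to fetch pages with PHP cURL

Fetching through a proxy

Why fetch through a proxy? Take Google as an example: if you scrape Google's data very frequently within a short time, you will soon be unable to fetch anything, because Google restricts your IP address. At that point you can switch to a proxy and fetch again.

The code is as follows:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/'); // placeholder; substitute the page you want
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080');
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // add this if the proxy requires a password
$result = curl_exec($ch);
curl_close($ch);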
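Switching proxies by hand can be automated. As a minimal sketch of that idea, the function below tries a list of proxies in order and returns the first response that arrives with a non-error HTTP status. The name fetchViaProxies, the proxy list format ("host:port" strings) and the status check are assumptions for this illustration.

```php
<?php
// Hypothetical helper: try each proxy in turn; return the first good response, or null.
function fetchViaProxies(string $url, array $proxies, int $timeoutSeconds = 10): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PROXY, $proxy); // "host:port"
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

        $result = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($result !== false && $status >= 200 && $status < 400) {
            return $result;
        }
        // This proxy failed or was blocked; move on to the next one.
    }
    return null;
}
```

A null result means every proxy in the list failed, which is the point at which a scraper would normally back off and wait before trying again.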

Original article by 小藍. If you repost it, please credit the source: https://www.506064.com/zh-tw/n/151349.html
