PHP Scraping Functions: Scraping with PHP

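Contents:

1. Saving a regex scraped with the PHP function preg_match
2. How to scrape Baidu Maps data with PHP
3. Functions commonly used in PHP scraping programs
4. How to fetch a website's meta tags in PHP: the meta function get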

Saving a regex scraped with the PHP function preg_match

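Congratulations. In my view, magic quotes are nothing but a burden and have caused a great deal of confusion. Whenever I set up a system now, I turn magic quotes off straight away to avoid potential problems.

Code-level security should be the concern of whoever writes the code; PHP only needs to take care of system-level security.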

How to scrape Baidu Maps data with PHP

Generally speaking, the simplest way to scrape data with PHP is the file_get_contents function; for more control, the cURL library is recommended.
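The two approaches mentioned above can be sketched as follows. This is a minimal illustration, not a Baidu Maps client: the function names fetch_with_fgc and fetch_with_curl, the User-Agent string, and the timeout value are all assumptions introduced for the example.

```php
<?php
// Hypothetical helper: fetch a URL (or local path) with file_get_contents.
// The stream context sets a timeout and a User-Agent for HTTP requests.
function fetch_with_fgc(string $url, int $timeout = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)', // assumed value
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Hypothetical helper: fetch a URL with cURL, which allows finer control
// (redirects, timeouts, custom headers, cookies, proxies).
function fetch_with_curl(string $url, int $timeout = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // the cURL extension is not loaded
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Usage (the URL is a placeholder):
// $html = fetch_with_curl('https://example.com/');
```

Both helpers return null on failure, so the caller can tell a failed request apart from an empty page.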

Functions commonly used in PHP scraping programs

The code is as follows:

// Get the URL of the current script
function get_php_url()
{
    if (!empty($_SERVER["REQUEST_URI"])) {
        $scriptName = $_SERVER["REQUEST_URI"];
        $nowurl = $scriptName;
    } else {
        $scriptName = $_SERVER["PHP_SELF"];
        if (empty($_SERVER["QUERY_STRING"]))
            $nowurl = $scriptName;
        else
            $nowurl = $scriptName . "?" . $_SERVER["QUERY_STRING"];
    }
    return $nowurl;
}

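// Convert full-width digits to half-width digits
function GetAlabNum($fnum)
{
    $nums = array("０", "１", "２", "３", "４", "５", "６", "７", "８", "９");
    $fnums = "0123456789";
    for ($i = 0; $i <= 9; $i++)
        $fnum = str_replace($nums[$i], $fnums[$i], $fnum);
    // ereg_replace() was removed in PHP 7; preg_replace() is the equivalent call.
    // Strip everything except digits and dots, and drop leading zeros.
    $fnum = preg_replace('/[^0-9.]|^0+/', '', $fnum);
    if ($fnum == "")
        $fnum = 0;
    return $fnum;
}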

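// Convert plain text to HTML: escape angle brackets and turn line breaks into <br/>
function Text2Html($txt)
{
    $txt = str_replace("  ", "　", $txt); // two half-width spaces -> one full-width space
    $txt = str_replace("<", "&lt;", $txt);
    $txt = str_replace(">", "&gt;", $txt);
    $txt = preg_replace("/[\r\n]{1,}/isU", "<br/>\r\n", $txt);
    return $txt;
}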

// Neutralize HTML tags by escaping angle brackets
function ClearHtml($str)
{
    $str = str_replace('<', '&lt;', $str);
    $str = str_replace('>', '&gt;', $str);
    return $str;
}

// Convert root-relative href/src paths to absolute URLs
function relative_to_absolute($content, $feed_url)
{
    preg_match('/(http|https|ftp):\/\//', $feed_url, $protocol);
    // Strip the scheme, then everything after the host name
    $server_url = preg_replace("/(http|https|ftp|news):\/\//", "", $feed_url);
    $server_url = preg_replace("/\/.*/", "", $server_url);
    if ($server_url == '') {
        return $content;
    }
    if (isset($protocol[0])) {
        $new_content = preg_replace('/href="\//', 'href="' . $protocol[0] . $server_url . '/', $content);
        $new_content = preg_replace('/src="\//', 'src="' . $protocol[0] . $server_url . '/', $new_content);
    } else {
        $new_content = $content;
    }
    return $new_content;
}

// Get all links: returns the link texts under 'name' and the targets under 'url'
function get_all_url($code)
{
    preg_match_all('/<a\s+href=["|\']?([^>"\' ]+)["|\']?\s*[^>]*>([^>]+)<\/a>/i', $code, $arr);
    return array('name' => $arr[2], 'url' => $arr[1]);
}

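// Get the content between the given start and end markers
function get_tag_data($str, $start, $end)
{
    if ($start == '' || $end == '') {
        return;
    }
    $str = explode($start, $str);
    if (!isset($str[1])) {
        return; // start marker not found
    }
    $str = explode($end, $str[1]);
    return $str[0];
}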

// Convert each row of an HTML table into a CSV-formatted string; returns an array of rows
function get_tr_array($table)
{
    $table = preg_replace("'<td[^>]*?>'si", '"', $table);
    $table = str_replace("</td>", '",', $table);
    $table = str_replace("</tr>", "{tr}", $table);
    // Strip the remaining HTML tags
    $table = preg_replace("'<[\/\!]*?[^<>]*?>'si", "", $table);
    // Strip whitespace
    $table = preg_replace("'([\r\n])[\s]+'", "", $table);
    $table = str_replace(" ", "", $table);
    $table = explode(",{tr}", $table);
    array_pop($table);
    return $table;
}

// Convert every row and column of an HTML table into an array (for scraping table data)
function get_td_array($table)
{
    $table = preg_replace("'<table[^>]*?>'si", "", $table);
    $table = preg_replace("'<tr[^>]*?>'si", "", $table);
    $table = preg_replace("'<td[^>]*?>'si", "", $table);
    $table = str_replace("</tr>", "{tr}", $table);
    $table = str_replace("</td>", "{td}", $table);
    // Strip the remaining HTML tags
    $table = preg_replace("'<[\/\!]*?[^<>]*?>'si", "", $table);
    // Strip whitespace
    $table = preg_replace("'([\r\n])[\s]+'", "", $table);
    $table = str_replace(" ", "", $table);
    $table = explode('{tr}', $table);
    array_pop($table);
    $td_array = array();
    foreach ($table as $key => $tr) {
        $td = explode('{td}', $tr);
        array_pop($td);
        $td_array[] = $td;
    }
    return $td_array;
}

// Return all English words in a string, sorted; $distinct = true removes duplicates
function split_en_str($str, $distinct = true)
{
    preg_match_all('/([a-zA-Z]+)/', $str, $match);
    if ($distinct == true) {
        $match[1] = array_unique($match[1]);
    }
    sort($match[1]);
    return $match[1];
}
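The link and table helpers above rely on regular expressions, which tend to break on attributes in a different order, nested tags, or unusual quoting. As a hedged alternative, and not the original author's method, the same two tasks can be sketched with PHP's DOMDocument class. The names load_html_fragment, dom_get_all_url, and dom_get_td_array are hypothetical and introduced only for this example.

```php
<?php
// Hypothetical helper: parse an HTML fragment, silencing warnings about sloppy markup.
function load_html_fragment(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint keeps UTF-8 text from being read as ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Counterpart of get_all_url(): link texts under 'name', targets under 'url'.
function dom_get_all_url(string $html): array
{
    $links = ['name' => [], 'url' => []];
    foreach (load_html_fragment($html)->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // skip anchors without a target
        }
        $links['name'][] = trim($a->textContent);
        $links['url'][]  = $a->getAttribute('href');
    }
    return $links;
}

// Counterpart of get_td_array(): one array of trimmed cell texts per table row.
function dom_get_td_array(string $html): array
{
    $rows = [];
    foreach (load_html_fragment($html)->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Unlike the regex versions, these helpers keep spaces inside cell text and also pick up th cells; whether that is desirable depends on the data being scraped.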

How to fetch a website's meta tags in PHP: the meta function get

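For reference:

get_meta_tags: extracts the content attributes of all meta tags in a file and returns them as an array

Description

array get_meta_tags ( string filename [, int use_include_path])

Opens filename and parses it line by line for meta tags. The argument can be a local file or a URL. Parsing stops at </head>.

Setting use_include_path to 1 makes PHP try to open the file along each entry of the standard include path, include_path. This applies only to local files, not to URLs.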

The following example looks at how get_meta_tags(), cURL, and the User-Agent header are used in PHP. The analysis is as follows:

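The get_meta_tags() function collects tags of the form <meta name="A" content="1"><meta name="B" content="2"> from a web page and loads them into a one-dimensional array, with name as the key and content as the value. Names are converted to lowercase, so the tags in this example yield array('a' => '1', 'b' => '2'). Other meta tags are ignored. The function also stops at the </head> tag: meta tags after it are not processed, although meta tags that appear before <head> still are.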
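The behavior described above can be checked with a small local test. This is a minimal sketch that assumes a writable temporary directory; the file contents and variable names are made up for the example.

```php
<?php
// Write a small HTML page to a temporary file so the example needs no network access.
$html = '<html><head>'
      . '<meta name="A" content="1">'
      . '<meta name="B" content="2">'
      . '</head><body>'
      . '<meta name="C" content="3">' // after </head>, so it should be ignored
      . '</body></html>';
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

// get_meta_tags() accepts a local path as well as a URL.
$tags = get_meta_tags($file);
unlink($file);

var_export($tags);
```

The printed array should contain only the keys 'a' and 'b': the names are lowercased, and the tag placed after </head> is never reached.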

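The User-Agent is part of the header information, not visible on the page, that a browser submits when it requests a page from a server. The headers carry several pieces of information, such as cookies and cache-related fields; User-Agent declares the browser type, for example IE, Chrome, or FF.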

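Today, while scraping the meta tags of a web page, I kept getting empty values, even though the page source looked normal when viewed directly. I therefore suspected that the server decides what to output based on the request headers. As a first test, I used get_meta_tags() to fetch a file on the local server and had that file write the header information it received to a file. The result is below, with \ replaced by / for readability: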

array (
  'HTTP_HOST' => '192.168.30.205',
  'PATH' => 'C:/Program Files/Common Files/NetSarang;C:/Program Files/NVIDIA Corporation/PhysX/Common;C:/Program Files/Common Files/Microsoft Shared/Windows Live;C:/Program Files/Intel/iCLS Client/;C:/Windows/system32;C:/Windows;C:/Windows/System32/Wbem;C:/Windows/System32/WindowsPowerShell/v1.0/;C:/Program Files/Intel/Intel(R) Management Engine Components/DAL;C:/Program Files/Intel/Intel(R) Management Engine Components/IPT;C:/Program Files/Intel/OpenCL SDK/2.0/bin/x86;C:/Program Files/Common Files/Thunder Network/KanKan/Codecs;C:/Program Files/QuickTime Alternative/QTSystem;C:/Program Files/Windows Live/Shared;C:/Program Files/QuickTime Alternative/QTSystem/; %JAVA_HOME%/bin;%JAVA_HOME%/jre/bin;',
  'SystemRoot' => 'C:/Windows',
  'COMSPEC' => 'C:/Windows/system32/cmd.exe',
  'PATHEXT' => '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC',
  'WINDIR' => 'C:/Windows',
  'SERVER_SIGNATURE' => '',
  'SERVER_SOFTWARE' => 'Apache/2.2.11 (Win32) PHP/5.2.8',
  'SERVER_NAME' => '192.168.30.205',
  'SERVER_ADDR' => '192.168.30.205',
  'SERVER_PORT' => '80',
  'REMOTE_ADDR' => '192.168.30.205',
  'DOCUMENT_ROOT' => 'E:/wamp/www',
  'SERVER_ADMIN' => 'admin@admin.com',
  'SCRIPT_FILENAME' => 'E:/wamp/www/user-agent.php',
  'REMOTE_PORT' => '59479',
  'GATEWAY_INTERFACE' => 'CGI/1.1',
  'SERVER_PROTOCOL' => 'HTTP/1.0',
  'REQUEST_METHOD' => 'GET',
  'QUERY_STRING' => '',
  'REQUEST_URI' => '/user-agent.php',
  'SCRIPT_NAME' => '/user-agent.php',
  'PHP_SELF' => '/user-agent.php',
  'REQUEST_TIME' => 1400747529,
)
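The dump above contains no HTTP_USER_AGENT entry, which suggests that the request made by get_meta_tags() carried no User-Agent header. One way to test the hypothesis that the target server varies its output by User-Agent is to fetch the page with cURL, set the header explicitly, and then read the meta tags from the returned HTML. The following is a rough sketch under those assumptions: fetch_with_user_agent and meta_tags_from_html are hypothetical helper names, and the User-Agent string and URL in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: fetch a page with an explicit User-Agent header via cURL.
function fetch_with_user_agent(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_USERAGENT      => $userAgent,
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}

// Hypothetical helper: collect <meta name="..." content="..."> pairs from an HTML
// string. Names are lowercased to mirror get_meta_tags(); unlike that function,
// this sketch does not stop at </head>.
function meta_tags_from_html(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->hasAttribute('name')) {
            $tags[strtolower($meta->getAttribute('name'))] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Usage (placeholder URL and User-Agent string):
// $html = fetch_with_user_agent('https://example.com/', 'Mozilla/5.0 (compatible; ExampleScraper/1.0)');
// $tags = $html === null ? [] : meta_tags_from_html($html);
```

Alternatively, setting PHP's user_agent ini directive with ini_set() before calling get_meta_tags() may be enough, since the HTTP stream wrapper sends that value as the User-Agent header.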

Original article by UTDJ. If you repost it, please credit the source: https://www.506064.com/zh-hant/n/139758.html
