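Table of contents:
- 1. Sending HTTP requests with Jsoup: GET and POST, each with and without parameters
- 2. Looking for Java code that really works to simulate logging in to Sina Weibo, so that Jsoup can be used for scraping afterwards. Urgent! Points awarded as soon as the login succeeds!
- 3. Using Jsoup in Java to log in to a page with a CAPTCHA and obtain the post-login cookie
- 4. When scraping web pages with Jsoup or HttpClient, why set data, userAgent, cookie(), timeout(), and post()?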
Sending HTTP requests with Jsoup: GET and POST, each with and without parameters
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
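import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public void JsoupGet() throws Exception {
    // NOTE: the URL is truncated in the source (only ";password=lisi" survives); substitute the real target URL.
    Connection connect = Jsoup.connect(";password=lisi");
    // Begin parameters (leave these two calls out for a request without parameters)
    connect.data("username", "zhangsan");
    connect.data("password", "lisi");
    // End parameters
    Document document = connect.get();
    System.out.println(document.toString());
}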
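public void JsoupPost() throws Exception {
    // NOTE: the URL is truncated in the source (only ";password=lisi" survives); substitute the real target URL.
    Connection connect = Jsoup.connect(";password=lisi");
    // Begin parameters (leave these two calls out for a request without parameters)
    connect.data("username", "zhangsan");
    connect.data("password", "lisi");
    // End parameters
    Document document = connect.post();
    System.out.println(document.toString());
}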
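To make the difference between the two methods concrete: with get(), the data() pairs end up in the URL as a query string, while with post() they are by default sent in the request body as application/x-www-form-urlencoded. The following is a minimal sketch of that encoding using only the JDK, not Jsoup's actual implementation. The class name FormEncodingSketch, its method names, and the example.com URL are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
final class FormEncodingSketch {

    private FormEncodingSketch() {}

    // Encodes key/value pairs as key1=value1&key2=value2, percent-encoding each part.
    static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // GET: the parameters travel in the URL, after '?' (or '&' if a query already exists).
    static String getUrl(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + encode(params);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("username", "zhangsan");
        params.put("password", "lisi");
        // What a GET request would address (placeholder host).
        System.out.println(getUrl("http://example.com/login", params));
        // What a POST request would carry in its body.
        System.out.println(encode(params));
    }
}
```

A request without parameters is simply the case where the map is empty: the GET URL is left unchanged and the POST body is empty.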
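Looking for Java code that really works to simulate logging in to Sina Weibo, so that Jsoup can be used for scraping afterwards. Urgent! Points awarded as soon as the login succeeds!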
package jsoupTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public class JsoupTest {
    public static void main(String[] args) throws IOException {
        Map<String, String> map = new HashMap<>();
        // Fill the map with map.put(...) using the cookies from your own logged-in Weibo session.
        // The string below is a placeholder: replace it with the other user's homepage (by id).
        Response res = Jsoup.connect("the other user's homepage id")
                .cookies(map).method(Method.GET).execute();
        String s = res.body();
        System.out.println(s);
        // Split the body into the chunks that follow each "<script>FM.view" marker.
        String[] ss = s.split("<script>FM.view");
        int i = 0;
        // pl_content_homeFeed
        // pl.content.homeFeed.index
        List<String> list = new ArrayList<>();
        for (String x : ss) {
            // System.out.println(i++ + "======================================");
            // System.out.println(x.substring(0,
            // x.length() > 200 ? 200 : x.length()));
            // System.out.println("===========================================");
            if (x.contains("\"html\":\"")) {
                String value = getHtml(x);
                list.add(value);
                System.out.println(value);
            }
        }
        // content = ss[8].split("\"html\":\"")[1].replaceAll("(\\\\t|\\\\n)",
        // "").replaceAll("\\\\\"", "\"").replaceAll("\\\\/", "/");
        // content = content.substring(0,
        // content.length() <= 13 ? content.length() : content.length() - 13);
        // System.out.println(Native2AsciiUtils.ascii2Native(content));
    }

    public static String getHtml(String s) {
        // Take the text after "html":", remove escaped tabs and newlines,
        // and unescape quotes and slashes.
        String content = s.split("\"html\":\"")[1]
                .replaceAll("(\\\\t|\\\\n)", "").replaceAll("\\\\\"", "\"")
                .replaceAll("\\\\/", "/");
        // Drop the trailing 13 characters when the string is longer than that.
        content = content.substring(0,
                content.length() <= 13 ? content.length()
                        : content.length() - 13);
        // Native2AsciiUtils is a helper class that is not included in the source.
        return Native2AsciiUtils.ascii2Native(content);
    }
}
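getHtml relies on Native2AsciiUtils.ascii2Native, which is not shown in the source. Judging by its name, it presumably turns \uXXXX escape sequences in the JSON payload back into the characters they stand for. Below is a minimal sketch under that assumption; the implementation is hypothetical and is not the original author's helper.

```java
// Hypothetical stand-in for the helper that the source does not include.
final class Native2AsciiUtils {

    private Native2AsciiUtils() {}

    // Replaces every well-formed escape (backslash, 'u', four hex digits) with the
    // character it encodes. All other text is copied through unchanged.
    static String ascii2Native(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length() && isHex(s, i + 2)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if the four characters starting at 'from' are all hexadecimal digits.
    private static boolean isHex(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Malformed or truncated escapes are left as they are rather than rejected, which suits scraped input whose exact shape is not guaranteed.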
Using Jsoup in Java to log in to a page with a CAPTCHA and obtain the post-login cookie
First, the Maven dependencies:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>[3.0.1,)</version> <!-- version range: resolves to the latest available release -->
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.2</version>
    <type>jar</type>
</dependency>
Code:
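import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.openqa.selenium.chrome.ChromeDriverService;

public static void getIndex2() {
    // Earlier runs left many chromedriver processes behind, and I could not work out why.
    // Posts online suggested that this service helps, so I am trying it here; whether it works is unverified.
    ChromeDriverService service = new ChromeDriverService.Builder()
            .usingDriverExecutable(new File("./driver/chromedriver.exe"))
            .usingAnyFreePort()
            .build();
    try {
        service.start();
    } catch (IOException ex) {
        Logger.getLogger(kechengbiaoIndex.class.getName()).log(Level.SEVERE, null, ex);
    }
    // end
    // The main part starts here.
    // First set up the browser driver. I use Chrome: download chromedriver.exe, which must be loaded at startup.
    System.getProperties().setProperty("webdriver.chrome.driver", "./driver/chromedriver.exe");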
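Once the browser has completed the login, CAPTCHA included, the session cookies have to reach Jsoup as a Map<String, String>, which is what Connection.cookies(Map) accepts. One possible route, sketched here under the assumption that the cookies are available as a raw header string (for example, copied from the browser's developer tools), is to split that string into name/value pairs. The class CookieHeaderParser and the sample cookie names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CookieHeaderParser {

    private CookieHeaderParser() {}

    // Parses "name1=value1; name2=value2" into an insertion-ordered map.
    // Pairs without '=' or with an empty name are skipped; a value may itself contain '='.
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Made-up cookie names and values.
        Map<String, String> cookies = parse("sessionid=abc123; token=x=y; lang=zh-tw");
        System.out.println(cookies);
    }
}
```

The resulting map can then be handed to Jsoup with Jsoup.connect(url).cookies(map), in the same way as the Weibo example earlier in this article.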
When scraping web pages with Jsoup or HttpClient, why set data, userAgent, cookie(), timeout(), and post()?
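userAgent makes the request look to the server as if it came from a real browser. Whether you need cookie depends on the server, for example when a page is only served to a logged-in session. timeout hardly needs explaining: if you do not set one, a default timeout applies. As the first section shows, data supplies the request parameters and post() sends the request with the POST method.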
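The same settings exist in virtually every HTTP client. As an illustration, here is a minimal sketch that uses the JDK's built-in HttpURLConnection instead of Jsoup or HttpClient, so that it runs without extra dependencies. The class name RequestSettingsSketch, the method prepare, and any URL, user-agent string, or cookie value passed to it are placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, for illustration only.
final class RequestSettingsSketch {

    private RequestSettingsSketch() {}

    // Prepares (but does not send) a request that carries a browser-like User-Agent,
    // a Cookie header, and explicit timeouts. Nothing goes over the network until
    // the caller reads the response, e.g. with getResponseCode().
    static HttpURLConnection prepare(String url, String userAgent,
                                     String cookieHeader, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setRequestProperty("Cookie", cookieHeader);
            conn.setConnectTimeout(timeoutMillis); // maximum wait to establish the connection
            conn.setReadTimeout(timeoutMillis);    // maximum wait for data once connected
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In Jsoup the corresponding calls on the Connection are userAgent(String), cookie(String, String), and timeout(int), chained before get() or post().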
Original article by NIIU. If you repost it, please credit the source: https://www.506064.com/zh-tw/n/140614.html