Connecting Python to HBase

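Contents:

1. How to bulk-import data into HBase with Python on Ubuntu
2. How to access HBase data from Python
3. Accessing an HBase cluster from Python
4. Can Python write crawler data into HBase?
5. How to access HBase data in Python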

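How to bulk-import data into HBase with Python on Ubuntu

If you can import a single row, you can import rows in bulk: a bulk import is the same write repeated over many rows, usually grouped into batches.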
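A minimal sketch of that idea, kept independent of any HBase library so it runs on its own. The names `batched`, `bulk_import` and `write_batch` are illustrative assumptions, not part of the Thrift API. With a real client, `write_batch` would typically wrap a batch call such as `client.mutateRows` fed with `BatchMutation` objects, which is not shown here.

```python
def batched(rows, batch_size):
    """Yield successive lists of at most batch_size items from rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def bulk_import(rows, write_batch, batch_size=500):
    """Send rows to write_batch in chunks and return the number of rows sent."""
    total = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)  # hypothetical hook: one network call per batch
        total += len(batch)
    return total


# Stand-in writer that only records the batches it receives.
sent = []
rows = [("row%d" % i, "value%d" % i) for i in range(5)]
count = bulk_import(rows, sent.append, batch_size=2)
print(count, [len(b) for b in sent])
```

Grouping rows this way keeps the number of round trips to the server low without holding the whole data set in memory.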

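Configuring Thrift

Python package used: thrift

I use PyCharm Community Edition as my Python IDE. In the project settings, open Project Interpreter, find the package list for the project, click "+", search for hbase-thrift (Python client for HBase Thrift interface), and install the package.

Install Thrift on the server.

See the official site for instructions; you can also install it on your local machine to use it from a terminal.

Thrift Getting Started

For another installation walkthrough, see "Calling HBase from Python: an example".

First, install Thrift.

Download Thrift. I used thrift-0.7.0-dev.tar.gz.

tar xzf thrift-0.7.0-dev.tar.gz
cd thrift-0.7.0-dev
sudo ./configure --with-cpp=no --with-ruby=no
sudo make
sudo make install

Then, in the HBase source package, go to

src/main/resources/org/apache/hadoop/hbase/thrift/

and run

thrift --gen py Hbase.thrift
mv gen-py/hbase/ /usr/lib/python2.4/site-packages/

(The site-packages path depends on your Python version.)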

Example 1: reading data

# coding: utf-8
import csv

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase


def client_conn(host="localhost", port=9090):
    """Open a Thrift connection and return (client, transport)."""
    # Make the socket; 9090 is the default port of the HBase Thrift server.
    socket = TSocket.TSocket(host, port)
    # Buffering is critical. Raw sockets are very slow.
    transport = TTransport.TBufferedTransport(socket)
    # Wrap in a protocol.
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # Create a client to use the protocol encoder.
    client = Hbase.Client(protocol)
    # Connect!
    transport.open()
    return client, transport


if __name__ == "__main__":
    client, transport = client_conn()
    try:
        # To fetch selected columns only:
        # r = client.getRowWithColumns("table name", "row name", ["column name"])
        # print(r[0].columns.get("column name"))
        result = client.getRow("table name", "row name")
    finally:
        # Close the connection once the read is done.
        transport.close()

    data_simple = []
    if result:  # guard against a missing row
        for k, v in result[0].columns.items():
            # k is the column name; v is a cell with .value and .timestamp.
            data_simple.append((v.timestamp, v.value))

    # "wb" is the Python 2 csv mode; on Python 3 use open(path, "w", newline="").
    csvfile_simple = open("data_xy_simple.csv", "wb")
    writer_simple = csv.writer(csvfile_simple)
    writer_simple.writerow(["timestamp", "value"])
    writer_simple.writerows(data_simple)
    csvfile_simple.close()

    print("finished")

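If you know basic Python, you will recognize that result is a list and that result[0].columns is a dict, so result[0].columns.items() yields its key-value pairs. Consult the relevant documentation, or print the variables to inspect their values and types.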
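To make that shape concrete, here is a small sketch that walks a result of the same form without needing an HBase server. `Cell` and `RowResult` are hypothetical stand-ins that only mimic the attributes used in the example (`columns`, `value`, `timestamp`); the real objects come from the generated hbase.ttypes module. `flatten_row` is likewise an illustrative helper.

```python
from collections import namedtuple

# Hypothetical stand-ins for the generated Thrift types.
Cell = namedtuple("Cell", ["value", "timestamp"])
RowResult = namedtuple("RowResult", ["row", "columns"])


def flatten_row(result):
    """Turn a getRow-style result into a sorted list of (column, timestamp, value)."""
    if not result:  # an empty list means there is nothing to flatten
        return []
    columns = result[0].columns  # dict: column name -> cell
    return sorted((name, cell.timestamp, cell.value)
                  for name, cell in columns.items())


fake_result = [RowResult(row="row1", columns={
    "cf:y": Cell(value="2.5", timestamp=1500000000001),
    "cf:x": Cell(value="1.0", timestamp=1500000000000),
})]

for record in flatten_row(fake_result):
    print(record)
```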

Note: in the program above, transport.open() opens the connection; once you are done with it, close it with transport.close().
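One way to make sure the close always happens, even when a read raises, is a small context manager. This is a sketch under the assumption that the transport exposes open() and close() as in the example; `opened` and `FakeTransport` are illustrative names, and the fake class stands in for a real Thrift transport so the snippet runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a transport, hand it to the caller, and always close it afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport(object):
    """Hypothetical stand-in exposing the same open()/close() methods."""

    def __init__(self):
        self.events = []

    def open(self):
        self.events.append("open")

    def close(self):
        self.events.append("close")


fake = FakeTransport()
with opened(fake):
    fake.events.append("work")  # reads or writes would go here
print(fake.events)
```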

So far this only covers reading data; other HBase operations will be added in later updates.

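How to access HBase data from Python

Accessing HBase from Python requires an additional library, usually Thrift. Space does not allow a detailed walkthrough of calling HBase through Thrift here.

Search (for example on Baidu) for "Python thrift" or "python hbase" to find further material.

The following example is for reference only.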

# coding: utf-8
import csv

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase


def client_conn(host="localhost", port=9090):
    """Open a Thrift connection and return (client, transport)."""
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    transport.open()
    return client, transport


if __name__ == "__main__":
    client, transport = client_conn()
    try:
        result = client.getRow("table name", "row name")
    finally:
        transport.close()

    data_simple = []
    if result:  # guard against a missing row
        for k, v in result[0].columns.items():
            data_simple.append((v.timestamp, v.value))

    # "wb" is the Python 2 csv mode; on Python 3 use open(path, "w", newline="").
    csvfile_simple = open("data_xy_simple.csv", "wb")
    writer_simple = csv.writer(csvfile_simple)
    writer_simple.writerow(["timestamp", "value"])
    writer_simple.writerows(data_simple)
    csvfile_simple.close()

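Accessing an HBase cluster from Python

The hbase-thrift project wraps the HBase Thrift interface and hides the low-level details, so that users can conveniently access an HBase cluster through it. This is how Python reaches HBase via Thrift.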

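Can Python write crawler data into HBase?

On a server where the HBase service is already installed, the HBase Thrift definition files are installed automatically, under /usr/lib/hbase/include/thrift.

Use these definitions to generate the Python bindings for the HBase Thrift interface. The command is:

thrift --gen py hbase2.thrift

On success, the command creates a directory named gen-py, which contains the Python version of the HBase package.

The main files are:

Hbase.py defines the methods that the HBase client can call.
ttypes.py defines the data types that the HBase client sends and receives.

Put the generated HBase package in your project code, or in the dependency directory of your Python environment (site-packages), and it can then be imported.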
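If you prefer not to copy the package into site-packages, a third option is to add the generated directory to the import path at run time. The sketch below assumes the gen-py directory sits in the current working directory; `add_generated_package` is an illustrative helper, not part of Thrift.

```python
import os
import sys


def add_generated_package(gen_dir="gen-py"):
    """Put gen_dir at the front of sys.path (once) and return its absolute path."""
    path = os.path.abspath(gen_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


generated = add_generated_package()  # assumption: gen-py is in the current directory
print(generated in sys.path)
# After this, `from hbase import Hbase` should resolve if gen-py/hbase exists.
```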

How to access HBase data in Python

A command-line script that reads HBase data from Python:

#!/usr/bin/python
import getopt
import sys
import time

from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase


def usage():
    print('''Usage :
-h: Show help information;
-l: Show all tables in HBase;
-t {table} : Show table descriptors;
-t {table} -k {key} : Show the cells of the row;
-t {table} -k {key} -c {column} : Show the column;
-t {table} -k {key} -c {column} -v {versions} : Show more versions;
(written by liuhuorong@koudai.com)
''')


class geilihbase:
    def __init__(self):
        # Connect to a Thrift server on the local machine, default port 9090.
        self.transport = TBufferedTransport(TSocket("127.0.0.1", 9090))
        self.transport.open()
        self.protocol = TBinaryProtocol.TBinaryProtocol(self.transport)
        self.client = Hbase.Client(self.protocol)

    def __del__(self):
        self.transport.close()

    def glisttable(self):
        # Print the name of every table.
        for table in self.client.getTableNames():
            print(table)

    def ggetColumnDescriptors(self, table):
        # Print each column family and its descriptor.
        rarr = self.client.getColumnDescriptors(table)
        if rarr:
            for (k, v) in rarr.items():
                print("%-20s\t%s" % (k, v))

    def gget(self, table, key, column):
        # Print the latest cell of one column. Timestamps are in milliseconds.
        rarr = self.client.get(table, key, column)
        if rarr:
            print("%-15s %-20s\t%s" % (rarr[0].timestamp, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(rarr[0].timestamp / 1000)), rarr[0].value))

    def ggetrow(self, table, key):
        # Print every column of one row.
        rarr = self.client.getRow(table, key)
        if rarr:
            for (k, v) in rarr[0].columns.items():
                print("%-20s\t%-15s %-20s\t%s" % (k, v.timestamp, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(v.timestamp / 1000)), v.value))

    def ggetver(self, table, key, column, versions):
        # Print up to `versions` versions of one column.
        rarr = self.client.getVer(table, key, column, versions)
        if rarr:
            for row in rarr:
                print("%-15s %-20s\t%s" % (row.timestamp, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(row.timestamp / 1000)), row.value))


def main(argv):
    tablename = ""
    key = ""
    column = ""
    versions = ""
    try:
        opts, args = getopt.getopt(argv, "lht:k:c:v:", ["help", "list"])
    except getopt.GetoptError:
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(0)
        elif opt in ("-l", "--list"):
            ghbase = geilihbase()
            ghbase.glisttable()
            sys.exit(0)
        elif opt == '-t':
            tablename = arg
        elif opt == '-k':
            key = arg
        elif opt == '-c':
            column = arg
        elif opt == '-v':
            versions = int(arg)

    # Dispatch on the most specific combination of options first.
    if tablename and key and column and versions:
        ghbase = geilihbase()
        ghbase.ggetver(tablename, key, column, versions)
        sys.exit(0)
    if tablename and key and column:
        ghbase = geilihbase()
        ghbase.gget(tablename, key, column)
        sys.exit(0)
    if tablename and key:
        ghbase = geilihbase()
        ghbase.ggetrow(tablename, key)
        sys.exit(0)
    if tablename:
        ghbase = geilihbase()
        ghbase.ggetColumnDescriptors(tablename)
        sys.exit(0)

    usage()
    sys.exit(1)


if __name__ == "__main__":
    main(sys.argv[1:])
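The script divides each timestamp by 1000 before formatting it because HBase cell timestamps are, by default, milliseconds since the Unix epoch, while the time module works in seconds. The conversion can be sketched in isolation as follows; `format_hbase_timestamp` is an illustrative helper, and it defaults to UTC (unlike the script, which uses local time) so that the output does not depend on the machine's time zone.

```python
import time


def format_hbase_timestamp(ms, utc=True):
    """Format a millisecond epoch timestamp as 'YYYY-MM-DD HH:MM:SS'."""
    seconds = ms // 1000  # drop the millisecond part
    to_struct = time.gmtime if utc else time.localtime
    return time.strftime("%Y-%m-%d %H:%M:%S", to_struct(seconds))


print(format_hbase_timestamp(0))              # 1970-01-01 00:00:00
print(format_hbase_timestamp(1500000000000))  # 2017-07-14 02:40:00
```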

Original article by 小藍. If you republish it, please credit the source: https://www.506064.com/zh-tw/n/301343.html
