python中gdelt库用法(python gdal教程)

本文目录一览:

怎么在Apache Spark中使用IPython Notebook

IPython Configuration

This installation workflow loosely follows the one contributed by Fernando Perez here. This should be performed on the machine where the IPython Notebook will be executed, typically one of the Hadoop nodes.

First create an IPython profile for use with PySpark.

1

ipython profile create pyspark

This should have created the profile directory ~/.ipython/profile_pyspark/. Edit the file~/.ipython/profile_pyspark/ipython_notebook_config.py to have:

1

2

3

4

5

c = get_config()

c.NotebookApp.ip = ‘*’

c.NotebookApp.open_browser = False

c.NotebookApp.port = 8880 # or whatever you want; be aware of conflicts with CDH

If you want a password prompt as well, first generate a password for the notebook app:

1

python -c ‘from IPython.lib import passwd; print passwd()’ ~/.ipython/profile_pyspark/nbpasswd.txt

and set the following in the same …/ipython_notebook_config.py file you just edited:

1

2

PWDFILE=’~/.ipython/profile_pyspark/nbpasswd.txt’

c.NotebookApp.password = open(PWDFILE).read().strip()

Finally, create the file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py with the following contents:

1

2

3

4

5

6

7

8

9

import os

import sys

spark_home = os.environ.get(‘SPARK_HOME’, None)

if not spark_home:

raise ValueError(‘SPARK_HOME environment variable is not set’)

sys.path.insert(0, os.path.join(spark_home, ‘python’))

sys.path.insert(0, os.path.join(spark_home, ‘python/lib/py4j-0.8.1-src.zip’))

execfile(os.path.join(spark_home, ‘python/pyspark/shell.py’))

Starting IPython Notebook with PySpark

IPython Notebook should be run on a machine from which PySpark would be run on, typically one of the Hadoop nodes.

First, make sure the following environment variables are set:

1

2

3

4

5

# for the CDH-installed Spark

export SPARK_HOME=’/opt/cloudera/parcels/CDH/lib/spark’

# this is where you specify all the options you would normally add after bin/pyspark

export PYSPARK_SUBMIT_ARGS=’–master yarn –deploy-mode client –num-executors 24 –executor-memory 10g –executor-cores 5′

Note that you must set whatever other environment variables you want to get Spark running the way you desire. For example, the settings above are consistent with running the CDH-installed Spark in YARN-client mode. If you wanted to run your own custom Spark, you could build it, put the JAR on HDFS, set theSPARK_JAR environment variable, along with any other necessary parameters. For example, see here for running a custom Spark on YARN.

Finally, decide from what directory to run the IPython Notebook. This directory will contain the .ipynb files that represent the different notebooks that can be served. See the IPython docs for more information. From this directory, execute:

1

ipython notebook –profile=pyspark

Note that if you just want to serve the notebooks without initializing Spark, you can start IPython Notebook using a profile that does not execute the shell.py script in the startup file.

Example Session

At this point, the IPython Notebook server should be running. Point your browser to , which should open up the main access point to the available notebooks. This should look something like this:

This will show the list of possible .ipynb files to serve. If it is empty (because this is the first time you’re running it) you can create a new notebook, which will also create a new .ipynb file. As an example, here is a screenshot from a session that uses PySpark to analyze the GDELT event data set:

The full .ipynb file can be obtained as a GitHub gist.

原创文章,作者:EPMWJ,如若转载,请注明出处:https://www.506064.com/n/129936.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
EPMWJEPMWJ
上一篇 2024-10-03 23:27
下一篇 2024-10-03 23:27

相关推荐

  • Python计算阳历日期对应周几

    本文介绍如何通过Python计算任意阳历日期对应周几。 一、获取日期 获取日期可以通过Python内置的模块datetime实现,示例代码如下: from datetime imp…

    编程 2025-04-29
  • Python中引入上一级目录中函数

    Python中经常需要调用其他文件夹中的模块或函数,其中一个常见的操作是引入上一级目录中的函数。在此,我们将从多个角度详细解释如何在Python中引入上一级目录的函数。 一、加入环…

    编程 2025-04-29
  • 如何查看Anaconda中Python路径

    对Anaconda中Python路径即conda环境的查看进行详细的阐述。 一、使用命令行查看 1、在Windows系统中,可以使用命令提示符(cmd)或者Anaconda Pro…

    编程 2025-04-29
  • Python列表中负数的个数

    Python列表是一个有序的集合,可以存储多个不同类型的元素。而负数是指小于0的整数。在Python列表中,我们想要找到负数的个数,可以通过以下几个方面进行实现。 一、使用循环遍历…

    编程 2025-04-29
  • Python周杰伦代码用法介绍

    本文将从多个方面对Python周杰伦代码进行详细的阐述。 一、代码介绍 from urllib.request import urlopen from bs4 import Bea…

    编程 2025-04-29
  • 蝴蝶优化算法Python版

    蝴蝶优化算法是一种基于仿生学的优化算法,模仿自然界中的蝴蝶进行搜索。它可以应用于多个领域的优化问题,包括数学优化、工程问题、机器学习等。本文将从多个方面对蝴蝶优化算法Python版…

    编程 2025-04-29
  • Python字典去重复工具

    使用Python语言编写字典去重复工具,可帮助用户快速去重复。 一、字典去重复工具的需求 在使用Python编写程序时,我们经常需要处理数据文件,其中包含了大量的重复数据。为了方便…

    编程 2025-04-29
  • python强行终止程序快捷键

    本文将从多个方面对python强行终止程序快捷键进行详细阐述,并提供相应代码示例。 一、Ctrl+C快捷键 Ctrl+C快捷键是在终端中经常用来强行终止运行的程序。当你在终端中运行…

    编程 2025-04-29
  • Python清华镜像下载

    Python清华镜像是一个高质量的Python开发资源镜像站,提供了Python及其相关的开发工具、框架和文档的下载服务。本文将从以下几个方面对Python清华镜像下载进行详细的阐…

    编程 2025-04-29
  • Python程序需要编译才能执行

    Python 被广泛应用于数据分析、人工智能、科学计算等领域,它的灵活性和简单易学的性质使得越来越多的人喜欢使用 Python 进行编程。然而,在 Python 中程序执行的方式不…

    编程 2025-04-29

发表回复

登录后才能评论