Installing Python 3 on CDH 6.3.2

news/2024/5/2 16:10:29/Article source: https://blog.csdn.net/m0_48830183/article/details/127012354

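Background: I need PySpark or plain Python to read remote files automatically, but the CDH cluster ships with Python 2.7.5. Python 3 is where things are heading, so I decided to install Python 3 myself. The installation steps below follow guides found online, and I carried out every step hands-on.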

1.1 System version information

[root@cdh06 soft]# lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.6.1810 (Core) 
Release:	7.6.1810
Codename:	Core

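1.2 Spark and Python information
The environment is configured on the CDH platform. Only one Spark version is installed, which the system reports as 2.4.0, and the Python that ships with the system is 2.7.5.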

[root@cdh06 soft]# pyspark
Python 2.7.5 (default, Jun 28 2022, 15:30:04) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/
Using Python version 2.7.5 (default, Jun 28 2022 15:30:04)
SparkSession available as 'spark'.
>>>
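  2. Install the Python 3.6 environment
    According to the guides this walkthrough follows, PySpark on this platform supports Python up to 3.6, so Python 3.6 is the version installed here.
    The steps must be carried out on the master node and on every slave node.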
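
Because each installation step has to be repeated on every node, it can help to script the repetition. The following is a minimal sketch and not part of the original walkthrough: the host list is hypothetical, passwordless SSH as root is assumed, and the function only builds the ssh command lines rather than running them.

```python
import shlex


def build_remote_commands(hosts, command, user="root"):
    """Return one ssh argv list per host; nothing is executed here."""
    return [["ssh", "%s@%s" % (user, host), command] for host in hosts]


# Hypothetical node names; replace with the real master and slave hosts.
hosts = ["cdh02", "cdh05", "cdh06"]
install = "sudo yum -y install yum-utils"

for argv in build_remote_commands(hosts, install):
    # shlex.quote makes the printed line safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in argv))
```

Each argv list could be handed to subprocess.run once the printed commands look right.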

2.1 Install yum-utils
yum-utils is a collection of extension tools for yum.
This assumes yum is already installed on the machine.
sudo yum -y install yum-utils

[root@cdh06 soft]# sudo yum -y install yum-utils
Loaded plugins: fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Determining fastest mirrors
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
base                                                 | 3.6 kB  00:00:00
cloudera-manager                                     | 2.9 kB  00:00:00
extras                                               | 2.9 kB  00:00:00
updates                                              | 2.9 kB  00:00:00
(1/2): extras/7/x86_64/primary_db                    | 250 kB  00:00:00
(2/2): updates/7/x86_64/primary_db                   |  17 MB  00:00:01
Resolving Dependencies
--> Running transaction check
---> Package yum-utils.noarch.0.1.1.31-50.el7 will be updated
---> Package yum-utils.noarch.0.1.1.31-54.el7_8 will be an update
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package          Arch          Version                 Repository        Size
================================================================================
Updating:
 yum-utils        noarch        1.1.31-54.el7_8         base             122 k
Transaction Summary
================================================================================
Upgrade  1 Package
Total download size: 122 k
Downloading packages:
No Presto metadata available for base
yum-utils-1.1.31-54.el7_8.noarch.rpm                 | 122 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : yum-utils-1.1.31-54.el7_8.noarch                            1/2
  Cleanup    : yum-utils-1.1.31-50.el7.noarch                              2/2
  Verifying  : yum-utils-1.1.31-54.el7_8.noarch                            1/2
  Verifying  : yum-utils-1.1.31-50.el7.noarch                              2/2
Updated:
  yum-utils.noarch 0:1.1.31-54.el7_8
Complete!

2.2 Install the CentOS development tools
These tools are needed to compile code.

sudo yum -y groupinstall development

[root@cdh06 soft]# sudo yum -y groupinstall development
Loaded plugins: fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
There is no installed groups file.
Maybe run: yum groups mark convert (see man yum)
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package autoconf.noarch.0.2.69-11.el7 will be installed
--> Processing Dependency: perl(Data::Dumper) for package: autoconf-2.69-11.el7.noarch
---> Package automake.noarch.0.1.13.4-3.el7 will be installed
--> Processing Dependency: perl(Thread::Queue) for package: automake-1.13.4-3.el7.noarch
--> Processing Dependency: perl(TAP::Parser) for package: automake-1.13.4-3.el7.noarch
---> Package bison.x86_64.0.3.0.4-2.el7 will be installed
---> Package byacc.x86_64.0.1.9.20130304-3.el7 will be installed
---> Package cscope.x86_64.0.15.8-10.el7 will be installed
---> Package ctags.x86_64.0.5.8-13.el7 will be installed
---> Package diffstat.x86_64.0.1.57-4.el7 will be installed
---> Package doxygen.x86_64.1.1.8.5-4.el7 will be installed
---> Package flex.x86_64.0.2.5.37-6.el7 will be installed
---> Package gcc.x86_64.0.4.8.5-44.el7 will be installed
--> Processing Dependency: libgomp = 4.8.5-44.el7 for package: gcc-4.8.5-44.el7.x86_64
--> Processing Dependency: cpp = 4.8.5-44.el7 for package: gcc-4.8.5-44.el7.x86_64
--> Processing Dependency: libgcc >= 4.8.5-44.el7 for package: gcc-4.8.5-44.el7.x86_64
--> Processing Dependency: glibc-devel >= 2.2.90-12 for package: gcc-4.8.5-44.el7.x86_64
---> Package gcc-c++.x86_64.0.4.8.5-44.el7 will be installed
--> Processing Dependency: libstdc++-devel = 4.8.5-44.el7 for package: gcc-c++-4.8.5-44.el7.x86_64
--> Processing Dependency: libstdc++ = 4.8.5-44.el7 for package: gcc-c++-4.8.5-44.el7.x86_64
---> Package gcc-gfortran.x86_64.0.4.8.5-44.el7 will be installed
--> Processing Dependency: libquadmath-devel = 4.8.5-44.el7 for package: gcc-gfortran-4.8.5-44.el7.x86_64
--> Processing Dependency: libquadmath = 4.8.5-44.el7 for package: gcc-gfortran-4.8.5-44.el7.x86_64
--> Processing Dependency: libgfortran = 4.8.5-44.el7 for package: gcc-gfortran-4.8.5-44.el7.x86_64
--> Processing Dependency: libgfortran.so.3()(64bit) for package: gcc-gfortran-4.8.5-44.el7.x86_64
---> Package git.x86_64.0.1.8.3.1-23.el7_8 will be installed
--> Processing Dependency: perl-Git = 1.8.3.1-23.el7_8 for package: git-1.8.3.1-23.el7_8.x86_64
--> Processing Dependency: perl(Term::ReadKey) for package: git-1.8.3.1-23.el7_8.x86_64
--> Processing Dependency: perl(Git) for package: git-1.8.3.1-23.el7_8.x86_64
--> Processing Dependency: perl(Error) for package: git-1.8.3.1-23.el7_8.x86_64
---> Package indent.x86_64.0.2.2.11-13.el7 will be installed
---> Package intltool.noarch.0.0.50.2-7.el7 will be installed
--> Processing Dependency: perl(XML::Parser) for package: intltool-0.50.2-7.el7.noarch
--> Processing Dependency: gettext-devel for package: intltool-0.50.2-7.el7.noarch
---> Package libtool.x86_64.0.2.4.2-22.el7_3 will be installed
........
// 24 packages to download in total; 10 packages to update
................
Installed:
  autoconf.noarch 0:2.69-11.el7            automake.noarch 0:1.13.4-3.el7
  bison.x86_64 0:3.0.4-2.el7               byacc.x86_64 0:1.9.20130304-3.el7
  cscope.x86_64 0:15.8-10.el7              ctags.x86_64 0:5.8-13.el7
  diffstat.x86_64 0:1.57-4.el7             doxygen.x86_64 1:1.8.5-4.el7
  flex.x86_64 0:2.5.37-6.el7               gcc.x86_64 0:4.8.5-44.el7
  gcc-c++.x86_64 0:4.8.5-44.el7            gcc-gfortran.x86_64 0:4.8.5-44.el7
  git.x86_64 0:1.8.3.1-23.el7_8            indent.x86_64 0:2.2.11-13.el7
  intltool.noarch 0:0.50.2-7.el7           libtool.x86_64 0:2.4.2-22.el7_3
  patchutils.x86_64 0:0.3.3-5.el7_9        rcs.x86_64 0:5.9.0-7.el7
  redhat-rpm-config.noarch 0:9.1.0-88.el7.centos
  rpm-build.x86_64 0:4.11.3-48.el7_9       rpm-sign.x86_64 0:4.11.3-48.el7_9
  subversion.x86_64 0:1.7.14-16.el7        swig.x86_64 0:2.0.10-5.el7
  systemtap.x86_64 0:4.0-13.el7
Dependency Installed:
  cpp.x86_64 0:4.8.5-44.el7                dwz.x86_64 0:0.11-3.el7
  gettext-common-devel.noarch 0:0.19.8.1-3.el7
  gettext-devel.x86_64 0:0.19.8.1-3.el7    glibc-devel.x86_64 0:2.17-326.el7_9
  glibc-headers.x86_64 0:2.17-326.el7_9
  kernel-debug-devel.x86_64 0:3.10.0-1160.76.1.el7
  kernel-headers.x86_64 0:3.10.0-1160.76.1.el7
  libgfortran.x86_64 0:4.8.5-44.el7        libquadmath.x86_64 0:4.8.5-44.el7
  libquadmath-devel.x86_64 0:4.8.5-44.el7  libstdc++-devel.x86_64 0:4.8.5-44.el7
  perl-Data-Dumper.x86_64 0:2.145-3.el7    perl-Error.noarch 1:0.17020-2.el7
  perl-Git.noarch 0:1.8.3.1-23.el7_8       perl-TermReadKey.x86_64 0:2.30-20.el7
  perl-Test-Harness.noarch 0:3.28-3.el7    perl-Thread-Queue.noarch 0:3.02-2.el7
  perl-XML-Parser.x86_64 0:2.41-10.el7     perl-srpm-macros.noarch 0:1-8.el7
  subversion-libs.x86_64 0:1.7.14-16.el7   systemtap-client.x86_64 0:4.0-13.el7
  systemtap-devel.x86_64 0:4.0-13.el7
Dependency Updated:
  gettext.x86_64 0:0.19.8.1-3.el7          gettext-libs.x86_64 0:0.19.8.1-3.el7
  libgcc.x86_64 0:4.8.5-44.el7             libgomp.x86_64 0:4.8.5-44.el7
  libstdc++.x86_64 0:4.8.5-44.el7          rpm.x86_64 0:4.11.3-48.el7_9
  rpm-build-libs.x86_64 0:4.11.3-48.el7_9  rpm-libs.x86_64 0:4.11.3-48.el7_9
  rpm-python.x86_64 0:4.11.3-48.el7_9      systemtap-runtime.x86_64 0:4.0-13.el7
Complete!

2.3 Install the IUS third-party repository package
Installing this package lets yum pick up newer software versions when installing packages.
sudo yum -y install https://repo.ius.io/ius-release-el7.rpm

[root@cdh06 soft]# sudo yum -y install https://repo.ius.io/ius-release-el7.rpm
Loaded plugins: fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
ius-release-el7.rpm                                  | 8.2 kB  00:00:00
Examining /var/tmp/yum-root-udneDv/ius-release-el7.rpm: ius-release-2-1.el7.ius.noarch
Marking /var/tmp/yum-root-udneDv/ius-release-el7.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package ius-release.noarch.0.2-1.el7.ius will be installed
--> Processing Dependency: epel-release = 7 for package: ius-release-2-1.el7.ius.noarch
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
--> Running transaction check
---> Package epel-release.noarch.0.7-11 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package           Arch        Version          Repository             Size
================================================================================
Installing:
 ius-release       noarch      2-1.el7.ius      /ius-release-el7      4.5 k
Installing for dependencies:
 epel-release      noarch      7-11             extras                 15 k
Transaction Summary
================================================================================
Install  1 Package (+1 Dependent package)
Total size: 19 k
Total download size: 15 k
Installed size: 29 k
Downloading packages:
epel-release-7-11.noarch.rpm                         |  15 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : epel-release-7-11.noarch                                    1/2
  Installing : ius-release-2-1.el7.ius.noarch                              2/2
  Verifying  : ius-release-2-1.el7.ius.noarch                              1/2
  Verifying  : epel-release-7-11.noarch                                    2/2
Installed:
  ius-release.noarch 0:2-1.el7.ius
Dependency Installed:
  epel-release.noarch 0:7-11
Complete!

2.4 Install Python

1. Install Python
sudo yum -y install python36u

[root@cdh06 soft]# sudo yum -y install python36u
Loaded plugins: fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                 | 6.1 kB  00:00:00
 * base: mirrors.cn99.com
 * epel: ftp.riken.jp
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
epel                                                 | 4.7 kB  00:00:00
ius                                                  | 1.3 kB  00:00:00
epel/x86_64/primary_db         FAILED
https://mirror.misakamikoto.network/fedora-epel/7/x86_64/repodata/7e09d0257e4d6d597cc84629bac5836c3789baf6aff6a46a7e0e6f1404a260b6-primary.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below wiki article
https://wiki.centos.org/yum-errors
If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
(1/4): epel/x86_64/group_gz                          |  97 kB  00:00:00
(2/4): ius/x86_64/primary                            |  55 kB  00:00:00
(3/4): epel/x86_64/updateinfo                        | 1.1 MB  00:00:01
(4/4): epel/x86_64/primary_db                        | 7.0 MB  00:00:03
ius                                                                  217/217
Package python36 is obsoleted by python3, trying to install python3-3.6.8-18.el7.x86_64 instead
Resolving Dependencies
--> Running transaction check
---> Package python3.x86_64.0.3.6.8-18.el7 will be installed
--> Processing Dependency: python3-libs(x86-64) = 3.6.8-18.el7 for package: python3-3.6.8-18.el7.x86_64
--> Processing Dependency: python3-setuptools for package: python3-3.6.8-18.el7.x86_64
--> Processing Dependency: python3-pip for package: python3-3.6.8-18.el7.x86_64
--> Processing Dependency: libpython3.6m.so.1.0()(64bit) for package: python3-3.6.8-18.el7.x86_64
--> Running transaction check
---> Package python3-libs.x86_64.0.3.6.8-18.el7 will be installed
---> Package python3-pip.noarch.0.9.0.3-8.el7 will be installed
---> Package python3-setuptools.noarch.0.39.2.0-10.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package                Arch        Version            Repository       Size
================================================================================
Installing:
 python3                x86_64      3.6.8-18.el7       updates          70 k
Installing for dependencies:
 python3-libs           x86_64      3.6.8-18.el7       updates         6.9 M
 python3-pip            noarch      9.0.3-8.el7        base            1.6 M
 python3-setuptools     noarch      39.2.0-10.el7      base            629 k
Transaction Summary
================================================================================
Install  1 Package (+3 Dependent packages)
Total download size: 9.3 M
Installed size: 47 M
Downloading packages:
(1/4): python3-3.6.8-18.el7.x86_64.rpm               |  70 kB  00:00:00
(2/4): python3-pip-9.0.3-8.el7.noarch.rpm            | 1.6 MB  00:00:00
(3/4): python3-setuptools-39.2.0-10.el7.noarch.rpm   | 629 kB  00:00:00
(4/4): python3-libs-3.6.8-18.el7.x86_64.rpm          | 6.9 MB  00:00:02
--------------------------------------------------------------------------------
Total                                       3.4 MB/s | 9.3 MB  00:00:02
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : python3-libs-3.6.8-18.el7.x86_64                            1/4
  Installing : python3-3.6.8-18.el7.x86_64                                 2/4
  Installing : python3-setuptools-39.2.0-10.el7.noarch                     3/4
  Installing : python3-pip-9.0.3-8.el7.noarch                              4/4
  Verifying  : python3-setuptools-39.2.0-10.el7.noarch                     1/4
  Verifying  : python3-libs-3.6.8-18.el7.x86_64                            2/4
  Verifying  : python3-3.6.8-18.el7.x86_64                                 3/4
  Verifying  : python3-pip-9.0.3-8.el7.noarch                              4/4
Installed:
  python3.x86_64 0:3.6.8-18.el7
Dependency Installed:
  python3-libs.x86_64 0:3.6.8-18.el7
  python3-pip.noarch 0:9.0.3-8.el7
  python3-setuptools.noarch 0:39.2.0-10.el7
Complete!

2. Check the Python version

[root@cdh06 soft]# python3.6 -V
Python 3.6.8
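
The same check can be done from inside the interpreter, which is handy when verifying many nodes. This is a small sketch rather than anything from the original post; the minimum of 3.6 is an assumption taken from the version this guide installs.

```python
import sys

MINIMUM = (3, 6)  # assumption: the version this guide installs


def meets_minimum(version_info, minimum=MINIMUM):
    """True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


current = sys.version_info
status = "OK" if meets_minimum(current) else "too old"
print("Python %d.%d.%d: %s" % (current[0], current[1], current[2], status))
```

Run it with python3.6 on each node; the system Python 2.7.5 would report "too old".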

3. Next, install python36u-devel, the IUS package that provides the Python 3 libraries and header files.
sudo yum -y install python36u-devel

[root@cdh06 soft]# sudo yum -y install python36u-devel
Loaded plugins: fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * epel: ftp.riken.jp
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
Package python36-devel is obsoleted by python3-devel, trying to install python3-devel-3.6.8-18.el7.x86_64 instead
Resolving Dependencies
--> Running transaction check
---> Package python3-devel.x86_64.0.3.6.8-18.el7 will be installed
--> Processing Dependency: python3-rpm-macros for package: python3-devel-3.6.8-18.el7.x86_64
--> Processing Dependency: python3-rpm-generators for package: python3-devel-3.6.8-18.el7.x86_64
--> Running transaction check
---> Package python3-rpm-generators.noarch.0.6-2.el7 will be installed
---> Package python3-rpm-macros.noarch.0.3-34.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
 Package                    Arch        Version          Repository     Size
================================================================================
Installing:
 python3-devel              x86_64      3.6.8-18.el7     updates       217 k
Installing for dependencies:
 python3-rpm-generators     noarch      6-2.el7          base           20 k
 python3-rpm-macros         noarch      3-34.el7         base          8.1 k
Transaction Summary
================================================================================
Install  1 Package (+2 Dependent packages)
Total download size: 244 k
Installed size: 678 k
Downloading packages:
(1/3): python3-rpm-macros-3-34.el7.noarch.rpm        | 8.1 kB  00:00:00
(2/3): python3-devel-3.6.8-18.el7.x86_64.rpm         | 217 kB  00:00:00
(3/3): python3-rpm-generators-6-2.el7.noarch.rpm     |  20 kB  00:00:00
--------------------------------------------------------------------------------
Total                                       1.1 MB/s | 244 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : python3-rpm-generators-6-2.el7.noarch                       1/3
  Installing : python3-rpm-macros-3-34.el7.noarch                          2/3
  Installing : python3-devel-3.6.8-18.el7.x86_64                           3/3
  Verifying  : python3-devel-3.6.8-18.el7.x86_64                           1/3
  Verifying  : python3-rpm-macros-3-34.el7.noarch                          2/3
  Verifying  : python3-rpm-generators-6-2.el7.noarch                       3/3
Installed:
  python3-devel.x86_64 0:3.6.8-18.el7
Dependency Installed:
  python3-rpm-generators.noarch 0:6-2.el7
  python3-rpm-macros.noarch 0:3-34.el7
Complete!

2.5 Configure the Python environment (important!)

  1. Virtual environment (recommended)
    Use the venv module
[root@cdh06 soft]# python3.6 -m venv py3
[root@cdh06 soft]# source py3/bin/activate
(py3) [root@cdh06 soft]# python -V
Python 3.6.8
(py3) [root@cdh06 soft]# deactivate
[root@cdh06 soft]# 
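
The venv step can also be driven from Python with the standard-library venv module, which is what python3.6 -m venv calls under the hood. The sketch below is illustrative only: it creates the environment in a throwaway temporary directory (an assumption, the walkthrough uses ./py3) and skips pip so that no network access is needed.

```python
import os
import tempfile
import venv


def create_env(env_dir):
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False skips bootstrapping pip, so no network access is needed.
    venv.create(env_dir, with_pip=False)
    # POSIX layout (bin/python), matching the CentOS hosts in this guide.
    return os.path.join(env_dir, "bin", "python")


demo_dir = os.path.join(tempfile.mkdtemp(), "py3")  # throwaway location
print(create_env(demo_dir))
```

The printed path is the interpreter that "source py3/bin/activate" would put first on PATH.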

Modify the environment variables:

[root@cdh06 soft]# vim /etc/profile
## Add the following line
export PYSPARK_PYTHON=python3
[root@cdh06 soft]# source /etc/profile
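
PYSPARK_PYTHON tells PySpark which interpreter to launch. When a job is started from a Python wrapper instead of a login shell, the variable can be set on the child environment directly. This is a hedged sketch: the helper name pyspark_env is made up for illustration, and it only prepares the environment dictionary without launching anything.

```python
import os
import shutil


def pyspark_env(interpreter="python3", base_env=None):
    """Return a copy of the environment with PYSPARK_PYTHON set."""
    env = dict(os.environ if base_env is None else base_env)
    # Prefer an absolute path when the interpreter can be found on PATH,
    # otherwise fall back to the bare name as /etc/profile does above.
    resolved = shutil.which(interpreter, path=env.get("PATH"))
    env["PYSPARK_PYTHON"] = resolved or interpreter
    return env


env = pyspark_env()
print(env["PYSPARK_PYTHON"])
```

The returned dictionary can be passed as env= to subprocess.run when starting pyspark or spark-submit.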

2. Verify pyspark:

[root@cdh06 soft]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/09/23 17:15:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/
Using Python version 3.6.8 (default, Nov 16 2020 16:55:22)
SparkSession available as 'spark'.
>>> 

Success!

3. Configure the Python environment variable in CM (Cloudera Manager)

1. Set the path of the python command via export:
1.1 First look up the python3 path:

[root@cdh05 bin]# whereis python3
python3: /usr/bin/python3 /usr/bin/python3.6 /usr/bin/python3.6m /usr/bin/python3.6-config /usr/bin/python3.6m-config /usr/bin/python3.6m-x86_64-config /usr/lib/python3.6 /usr/lib64/python3.6 /usr/include/python3.6m /usr/share/man/man1/python3.1.gz
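
The whereis output mixes executables, library directories, and man pages, so the interpreter path has to be picked out before it goes into an export line. The sketch below does that for the output shown above. It is an illustration only: both helper names are hypothetical, and the export line assumes the variable being set is PYSPARK_PYTHON, as in the /etc/profile step earlier.

```python
def first_executable(whereis_line, name="python3"):
    """Pick the path whose final component is exactly `name`."""
    _, _, rest = whereis_line.partition(":")
    for candidate in rest.split():
        if candidate.rsplit("/", 1)[-1] == name:
            return candidate
    return None


def export_line(path, variable="PYSPARK_PYTHON"):
    """Render a shell export line for the chosen interpreter."""
    return "export %s=%s" % (variable, path)


line = (
    "python3: /usr/bin/python3 /usr/bin/python3.6 /usr/bin/python3.6m "
    "/usr/lib/python3.6 /usr/lib64/python3.6 /usr/include/python3.6m"
)
print(export_line(first_executable(line)))
```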

1.2 Modify spark_env in the configuration
After making the change, return to the CM home page and restart the affected services as prompted.

4. pyspark command test
1. Obtain Kerberos credentials (omitted)
2. Test with the pyspark command

[root@cdh05 ~]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/
Using Python version 3.6.8 (default, Nov 16 2020 16:55:22)
SparkSession available as 'spark'.
>>> x = sc.parallelize([1,2,3])
>>> y = x.flatMap(lambda x: (x, 100*x, x**2))
>>> print(x.collect())
[1, 2, 3]                                                                       
>>> print(y.collect())
[1, 100, 1, 2, 200, 4, 3, 300, 9]                                               
>>> 
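
For readers unfamiliar with flatMap: it applies the function to every element and then flattens the resulting sequences into one list. The following plain-Python sketch reproduces the result above without Spark; flat_map is a local stand-in written for illustration, not a PySpark API.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and flatten the results one level."""
    return list(chain.from_iterable(func(item) for item in items))


x = [1, 2, 3]
y = flat_map(lambda v: (v, 100 * v, v ** 2), x)
print(x)  # [1, 2, 3]
print(y)  # [1, 100, 1, 2, 200, 4, 3, 300, 9]
```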

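5. Submit a PySpark job with spark-submit
This demo uses spark-submit to submit a PySpark job that reads data from HDFS, converts it to a DataFrame, registers it as a table, runs a SQL query with a filter condition, and writes the query result back to HDFS.

1. Create a test directory under /tmp on HDFS and upload the test data to /tmp/test/
Use the put command to upload the file
Use the cat command to view it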

[root@cdh02 ~]# hadoop fs -mkdir /tmp/test/
[root@cdh02 ~]# hadoop fs -put people.txt /tmp/test
[root@cdh02 ~]# hadoop fs -cat /tmp/test/people.txt
anand,14
oner,19
carol,14
job,17
mary,20
divid,20
Eric,16
Faerl,28
rice,25
kumar,30
zhuli,16
marfer,23
rakie,19
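
The job in this section selects people aged 13 to 19 inclusive. To know in advance which rows the Spark job should return, the same parse and filter can be sketched in plain Python. This is only a local cross-check on the sample data listed above, not the Spark job itself; the function names are made up for illustration.

```python
SAMPLE = """anand,14
oner,19
carol,14
job,17
mary,20
divid,20
Eric,16
Faerl,28
rice,25
kumar,30
zhuli,16
marfer,23
rakie,19"""


def parse_people(text):
    """Turn 'name,age' lines into (name, age) tuples."""
    people = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, age = line.split(",")
        people.append((name, int(age)))
    return people


def teenagers(people, low=13, high=19):
    """Keep people whose age lies in [low, high]."""
    return [(name, age) for name, age in people if low <= age <= high]


for name, age in teenagers(parse_people(SAMPLE)):
    print(name, age)
```

Seven of the thirteen sample rows fall in the range.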

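2. Upload the PySpark program to a node of the CDH cluster that hosts the Spark Gateway role and has Python 3 installed

PySparkTest_to_HDFS.py is in the pysparktest directory, with the following content: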

# Initialize the SQLContext
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, Row

conf = SparkConf().setAppName('PySparkTest_to_HDFS')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the text file and convert each line to a Row.
lines = sc.textFile("/tmp/test/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Create a DataFrame and register it as a table.
# createOrReplaceTempView replaces registerTempTable, deprecated since Spark 2.0.
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")

# Run the SQL query: ages from 13 to 19 inclusive.
teenagers = sqlContext.sql("SELECT name,age FROM people WHERE age >= 13 AND age <= 19")

# Save the query result to HDFS (Parquet is the default format).
teenagers.write.save("/tmp/test/teenagers")
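
If the submission needs to be scripted, the spark-submit command line can be assembled in Python. The sketch below only builds the argument list and does not run it. The interpreter path and the use of YARN as master are assumptions based on the environment shown earlier, and spark.pyspark.python is the Spark configuration property that plays the same role as PYSPARK_PYTHON.

```python
import shlex


def spark_submit_argv(script, python="/usr/bin/python3", master="yarn"):
    """Build a spark-submit argument list; nothing is executed here."""
    return [
        "spark-submit",
        "--master", master,
        # Per-job equivalent of exporting PYSPARK_PYTHON.
        "--conf", "spark.pyspark.python=" + python,
        script,
    ]


argv = spark_submit_argv("PySparkTest_to_HDFS.py")
print(" ".join(shlex.quote(part) for part in argv))
```

Passing argv to subprocess.run on the gateway node would submit the job.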

3. Submit the PySpark job to the cluster with the spark-submit command

[root@cdh02 ~]# spark-submit pyspark_to_hdfs.py 
22/09/23 19:01:29 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.2
22/09/23 19:01:29 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/__driver_logs__/driver.log
22/09/23 19:01:29 INFO spark.SparkContext: Submitted application: PySparkTest_to_HDFS
22/09/23 19:01:29 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:01:29 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:01:29 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:01:29 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:01:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:01:29 INFO util.Utils: Successfully started service 'sparkDriver' on port 39084.
22/09/23 19:01:29 INFO spark.SparkEnv: Registering MapOutputTracker
22/09/23 19:01:29 INFO spark.SparkEnv: Registering BlockManagerMaster
22/09/23 19:01:29 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/23 19:01:29 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/23 19:01:29 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-c25c60ee-4275-4735-af0c-b1f1d78f9a9d
22/09/23 19:01:29 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/09/23 19:01:29 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/09/23 19:01:29 INFO util.log: Logging initialized @1640ms
22/09/23 19:01:29 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2018-09-05T05:11:46+08:00, git hash: 3ce520221d0240229c862b122d2b06c12a625732
22/09/23 19:01:30 INFO server.Server: Started @1701ms
22/09/23 19:01:30 INFO server.AbstractConnector: Started ServerConnector@4b6333cc{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:01:30 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26a1432a{/jobs,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d4a97e9{/jobs/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f614298{/jobs/job,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@237cfc5{/jobs/job/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bdbe5b5{/stages,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e7b11d4{/stages/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aafbfca{/stages/stage,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@448bf3a6{/stages/stage/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@138d4512{/stages/pool,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55344422{/stages/pool/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@42a707c7{/storage,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59aadaf6{/storage/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2076a104{/storage/rdd,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5794e47c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d775876{/environment,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@22055a44{/environment/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37eeb181{/executors,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@47290404{/executors/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@508a507f{/executors/threadDump,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58d5f750{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1af53772{/static,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44594929{/,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d9423fd{/api,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a6bdfe7{/jobs/job/kill,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c7f6901{/stages/stage/kill,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://cdh02:4040
22/09/23 19:01:30 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:30 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
22/09/23 19:01:30 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
22/09/23 19:01:30 INFO conf.Configuration: resource-types.xml not found
22/09/23 19:01:30 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/09/23 19:01:30 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (41127 MB per container)
22/09/23 19:01:30 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/09/23 19:01:30 INFO yarn.Client: Setting up container launch context for our AM
22/09/23 19:01:30 INFO yarn.Client: Setting up the launch environment for our AM container
22/09/23 19:01:30 INFO yarn.Client: Preparing resources for our AM container
22/09/23 19:01:30 INFO yarn.Client: Uploading resource file:/tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/__spark_conf__2487934838763565949.zip -> hdfs://nameservice1/user/root/.sparkStaging/application_1660017172277_0243/__spark_conf__.zip
22/09/23 19:01:31 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:01:31 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:01:31 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:01:31 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:01:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:01:31 INFO yarn.Client: Submitting application application_1660017172277_0243 to ResourceManager
22/09/23 19:01:31 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:31 INFO impl.YarnClientImpl: Submitted application application_1660017172277_0243
22/09/23 19:01:32 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:32 INFO yarn.Client: Application report for application_1660017172277_0243 (state: ACCEPTED)
22/09/23 19:01:32 INFO yarn.Client: 
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.users.root
     start time: 1663930891063
     final status: UNDEFINED
     tracking URL: http://cdh02:8088/proxy/application_1660017172277_0243/
     user: root
22/09/23 19:01:33 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:33 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh02,cdh03, PROXY_URI_BASES -> http://cdh02:8088/proxy/application_1660017172277_0243,http://cdh03:8088/proxy/application_1660017172277_0243, RM_HA_URLS -> cdh02:8088,cdh03:8088), /proxy/application_1660017172277_0243
22/09/23 19:01:33 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
22/09/23 19:01:33 INFO yarn.Client: Application report for application_1660017172277_0243 (state: RUNNING)
22/09/23 19:01:33 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.110.17.36
     ApplicationMaster RPC port: -1
     queue: root.users.root
     start time: 1663930891063
     final status: UNDEFINED
     tracking URL: http://cdh02:8088/proxy/application_1660017172277_0243/
     user: root
22/09/23 19:01:33 INFO cluster.YarnClientSchedulerBackend: Application application_1660017172277_0243 has started running.
........
22/09/23 19:01:41 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
22/09/23 19:01:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, cdh06, executor 1, partition 0, NODE_LOCAL, 7910 bytes)
22/09/23 19:01:41 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on cdh06:39890 (size: 78.0 KB, free: 398.6 MB)
22/09/23 19:01:42 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 2)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, cdh06, executor 1, partition 1, NODE_LOCAL, 7910 bytes)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1009 ms on cdh06 (executor 1) (1/2)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 112 ms on cdh06 (executor 1) (2/2)
22/09/23 19:01:42 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
22/09/23 19:01:42 INFO scheduler.DAGScheduler: ResultStage 1 (save at NativeMethodAccessorImpl.java:0) finished in 1.134 s
22/09/23 19:01:42 INFO scheduler.DAGScheduler: Job 1 finished: save at NativeMethodAccessorImpl.java:0, took 1.139997 s
22/09/23 19:01:42 INFO datasources.FileFormatWriter: Write Job 71c5176c-9db0-4399-b3bc-71a64b666fe3 committed.
22/09/23 19:01:42 INFO datasources.FileFormatWriter: Finished processing stats for write job 71c5176c-9db0-4399-b3bc-71a64b666fe3.
22/09/23 19:01:42 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/09/23 19:01:42 INFO server.AbstractConnector: Stopped Spark@4b6333cc{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:01:42 INFO ui.SparkUI: Stopped Spark web UI at http://cdh02:4040
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/09/23 19:01:42 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/23 19:01:42 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,services=List(),started=false)
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Stopped
22/09/23 19:01:42 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/23 19:01:42 INFO memory.MemoryStore: MemoryStore cleared
22/09/23 19:01:42 INFO storage.BlockManager: BlockManager stopped
22/09/23 19:01:42 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/09/23 19:01:42 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/23 19:01:42 INFO spark.SparkContext: Successfully stopped SparkContext
22/09/23 19:01:42 INFO util.ShutdownHookManager: Shutdown hook called
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/pyspark-f00c77c5-73c2-4775-b09d-94cbee658a49
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0a5b450a-a378-41cf-b64e-5d37a7a5237d

4. The job ran successfully
Check the YARN web UI to confirm.

The information above shows that the job ran successfully.
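
When the YARN UI is not at hand, the same conclusion can be read from the spark-submit output itself. The following is a minimal sketch, not part of the original walkthrough: the function name `summarize_submit_log` and the two regular expressions are assumptions written against the log lines shown above, and they may need adjusting for other Spark versions.

```python
import re

# Patterns written against the yarn.Client and FileFormatWriter lines shown in the log above.
APP_REPORT = re.compile(r"Application report for (application_\d+_\d+) \(state: (\w+)\)")
WRITE_COMMITTED = re.compile(r"FileFormatWriter: Write Job \S+ committed\.")


def summarize_submit_log(log_text):
    """Return the application id, the YARN states seen (in order), and whether a write job committed."""
    app_id = None
    states = []
    for match in APP_REPORT.finditer(log_text):
        app_id = match.group(1)
        if match.group(2) not in states:
            states.append(match.group(2))
    return {
        "app_id": app_id,
        "states": states,
        "write_committed": WRITE_COMMITTED.search(log_text) is not None,
    }


# Three lines copied from the log above.
sample = """\
22/09/23 19:01:32 INFO yarn.Client: Application report for application_1660017172277_0243 (state: ACCEPTED)
22/09/23 19:01:33 INFO yarn.Client: Application report for application_1660017172277_0243 (state: RUNNING)
22/09/23 19:01:42 INFO datasources.FileFormatWriter: Write Job 71c5176c-9db0-4399-b3bc-71a64b666fe3 committed.
"""
summary = summarize_submit_log(sample)
print(summary)
```

The application id it extracts is the one to look up in the YARN UI.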

5. List the generated files:

[root@cdh02 ~]# hadoop fs -ls /tmp/test/teenagers
Found 3 items
-rw-r--r--   3 root supergroup          0 2022-09-23 19:01 /tmp/test/teenagers/_SUCCESS
-rw-r--r--   3 root supergroup        703 2022-09-23 19:01 /tmp/test/teenagers/part-00000-54158861-54c1-4e3e-b8c5-36bff50444bf-c000.snappy.parquet
-rw-r--r--   3 root supergroup        656 2022-09-23 19:01 /tmp/test/teenagers/part-00001-54158861-54c1-4e3e-b8c5-36bff50444bf-c000.snappy.parquet
[root@cdh02 ~]# 

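The output is in Parquet, a binary format, so it cannot be inspected directly with text commands. Instead, we verify the file contents in pyspark.

The job submitted above with spark-submit filters by age in its SQL query (described as 13 to 19), so reading the output back in pyspark lets us check that the rows fall in the expected range. Note that the output shown here includes ages up to 25, which matches the 13-to-29 condition used in the script in section 6 rather than 13 to 19.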

>>> parquetFile = sqlContext.read.parquet("/tmp/test/teenagers")
>>> parquetFile.registerTempTable("parquetTable")
>>> sqlContext.sql("select * from parquetTable").show()
+------+---+                                                                    
|  name|age|
+------+---+
| anand| 14|
|  oner| 19|
| carol| 14|
|   job| 17|
|  mary| 20|
| divid| 20|
|  Eric| 16|
|  rice| 25|
| zhuli| 16|
|marfer| 23|
| rakie| 19|
+------+---+
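
The range check can also be done in code rather than by eye. This is a minimal sketch under stated assumptions: `ages_outside_range` and `load_rows` are hypothetical helper names, and `load_rows` assumes pyspark is importable and uses the `SparkSession` entry point available in Spark 2.x. The sample rows are copied from the output shown above.

```python
def ages_outside_range(rows, low, high):
    """Return the (name, age) pairs whose age is not within [low, high]."""
    return [(name, age) for name, age in rows if not (low <= age <= high)]


def load_rows(path):
    """Read the Parquet output with Spark and return (name, age) tuples.

    Assumes pyspark is installed; the import stays inside the function so the
    range check above can be used without a Spark installation.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("verify_teenagers").getOrCreate()
    return [(r["name"], r["age"]) for r in spark.read.parquet(path).collect()]


# Rows copied from the pyspark output shown above.
shown_rows = [
    ("anand", 14), ("oner", 19), ("carol", 14), ("job", 17), ("mary", 20),
    ("divid", 20), ("Eric", 16), ("rice", 25), ("zhuli", 16), ("marfer", 23),
    ("rakie", 19),
]

outside_13_19 = ages_outside_range(shown_rows, 13, 19)
outside_13_29 = ages_outside_range(shown_rows, 13, 29)
print(outside_13_19)
print(outside_13_29)
```

On the cluster, `ages_outside_range(load_rows("/tmp/test/teenagers"), 13, 19)` would run the same check against the real files.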

6. Writing data to MySQL with PySpark
1. Add the following code to the job above

# Initialize sqlContext
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, Row

conf = SparkConf().setAppName('PySpar_to_MySQL')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the text file and convert each line into a Row.
lines = sc.textFile("/tmp/test/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Create a DataFrame and register it as a table.
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.registerTempTable("people")

# Run the SQL query; the condition is an age between 13 and 29
teenagers = sqlContext.sql("SELECT name,age FROM people WHERE age >= 13 AND age <= 29")

# Append the result to the MySQL table over JDBC
url = "jdbc:mysql://10.110.17.37:3306/mes_gd"
table = "teenagers"
prop = {"user":"xxx","password":"xxx@xx96"}
teenagers.write.jdbc(url, table, "append", prop)
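
The connection settings in the script above can be factored into a small helper so that credentials are not hard-coded. This is a sketch under assumptions rather than the original author's code: `mysql_jdbc_options` is a hypothetical helper, and the environment variable names `MYSQL_USER` and `MYSQL_PASSWORD` are made up for illustration. The `driver` entry names the class shipped in MySQL Connector/J 5.1.x.

```python
import os


def mysql_jdbc_options(host, port, database, user, password):
    """Build the (url, properties) pair that DataFrameWriter.jdbc expects."""
    url = "jdbc:mysql://{}:{}/{}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        # Driver class in MySQL Connector/J 5.1.x; naming it explicitly is optional
        "driver": "com.mysql.jdbc.Driver",
    }
    return url, properties


# MYSQL_USER / MYSQL_PASSWORD are hypothetical variable names, not from the original.
url, prop = mysql_jdbc_options(
    "10.110.17.37", 3306, "mes_gd",
    os.environ.get("MYSQL_USER", "xxx"),
    os.environ.get("MYSQL_PASSWORD", ""),
)
print(url)

# With the DataFrame built by the job above:
# teenagers.write.jdbc(url, "teenagers", mode="append", properties=prop)
```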

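2. Load the MySQL driver JAR into Spark's environment from the command line, then run the job
A MySQL driver JAR happened to be available locally; the commands below add it to Spark's environment.
First copy the driver JAR into the /opt/cloudera/parcels/CDH/lib/spark/jars directory.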

[root@cdh02 ~]# cp mysql-connector-java-5.1.47.jar /opt/cloudera/parcels/CDH/lib/spark/jars
[root@cdh02 ~]# export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/CDH/lib/spark/jars/mysql-connector-java-5.1.47.jar
[root@cdh02 ~]# spark-submit pyspark_to_mysql.py
22/09/23 19:24:36 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.2
22/09/23 19:24:36 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966/__driver_logs__/driver.log
22/09/23 19:24:36 INFO spark.SparkContext: Submitted application: pyspark_to_mysql
22/09/23 19:24:36 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:24:36 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:24:36 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:24:36 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:24:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:24:36 INFO util.Utils: Successfully started service 'sparkDriver' on port 45020.
......
22/09/23 19:24:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/09/23 19:24:52 INFO server.AbstractConnector: Stopped Spark@58529c8c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:24:52 INFO ui.SparkUI: Stopped Spark web UI at http://cdh02:4040
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/09/23 19:24:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/23 19:24:52 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,services=List(),started=false)
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Stopped
22/09/23 19:24:52 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/23 19:24:52 INFO memory.MemoryStore: MemoryStore cleared
22/09/23 19:24:52 INFO storage.BlockManager: BlockManager stopped
22/09/23 19:24:52 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/09/23 19:24:52 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/23 19:24:52 INFO spark.SparkContext: Successfully stopped SparkContext
22/09/23 19:24:52 INFO util.ShutdownHookManager: Shutdown hook called
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966/pyspark-d8c43dbb-9758-46a7-8359-7269d20f61f3
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9940c301-63b4-417b-a9b9-aefa1ea7dff8

The job ran successfully!
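
As an alternative to exporting SPARK_CLASSPATH, the driver JAR can be passed on the spark-submit command line with the `--driver-class-path` and `--jars` options. This is not what the run above did; the sketch below only assembles such a command, and `build_submit_command` is a hypothetical helper name. Whether both options are needed depends on the deploy mode, so treat it as a starting point.

```python
import shlex


def build_submit_command(script, driver_jar, master="yarn"):
    """Return a spark-submit argument list that ships one extra JAR with the job."""
    return [
        "spark-submit",
        "--master", master,
        # Put the JAR on the driver's classpath
        "--driver-class-path", driver_jar,
        # Make the JAR available to the executors as well
        "--jars", driver_jar,
        script,
    ]


jar = "/opt/cloudera/parcels/CDH/lib/spark/jars/mysql-connector-java-5.1.47.jar"
cmd = build_submit_command("pyspark_to_mysql.py", jar)
print(" ".join(shlex.quote(part) for part in cmd))

# To actually launch the job: subprocess.run(cmd, check=True)
```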

3. Check in YARN whether the job ran successfully
4. Verify that the MySQL table contains data

mysql> use mes_gd;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------------+
| Tables_in_mes_gd   |
+--------------------+
| ae_order_equipment |
| ae_order_materials |
| ae_order_physical  |
| clicks             |
| letter             |
| student            |
| teenagers          |
| tinvbill_daishuyun |
| tinvbill_kongming  |
| tinvbill_shulan    |
+--------------------+
10 rows in set (0.00 sec)

mysql> select * from teenagers;
+------+---+                                                                    
|  name|age|
+------+---+
| anand| 14|
|  oner| 19|
| carol| 14|
|   job| 17|
|  mary| 20|
| divid| 20|
|  Eric| 16|
|  rice| 25|
| zhuli| 16|
|marfer| 23|
| rakie| 19|
+------+---+

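That completes the verification.
Note: writing to MySQL requires the MySQL JDBC driver JAR to be loaded into Spark's environment. The MySQL table does not have to exist beforehand; pyspark creates it automatically when writing the data.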
