Technical validation: building a Neo4j graph database of tourist attractions

Published 2024/7/27 8:40:05. Source: https://blog.csdn.net/qq_39813001/article/details/136596461

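Contents

  • Technical validation: building a Neo4j graph database of tourist attractions
    • Preface
    • Building the databases from the base data
      • Python 3 source code
      • KG result
      • Optimizing KG import performance
      • Building the PostgreSQL database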


Preface

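This chapter covers two hands-on tasks:
(1) Building the Neo4j knowledge graph (KG) directly from two CSV files of navigation POIs: parks and attractions.
(2) Building the PostgreSQL (pg) database. The Ctrip POIs are loaded into the poibaseinfo table of the tripdata database, and the parks and attractions from the navigation POIs are then imported into the same table.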

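Building the databases from the base data

Python 3 source code

The code below performs the initial import of the CSV data into the KG. Incremental updates would require changes to the code. Note also that the star ratings and travel seasons are generated at random and have no basis in reality.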

```python
import csv
import random

from py2neo import Graph, Node, NodeMatcher, Relationship

import geohash  # the local geohash.py listed below


def importCSV2NeoKG(graph, csvPath, csvType):
    # NodeMatcher is used only for lookups (read-only queries)
    node_Match = NodeMatcher(graph)
    # Stored as-is in the graph: 春季/夏季/秋季/冬季 = spring/summer/autumn/winter
    seasons = ["春季", "夏季", "秋季", "冬季"]
    stars = ["A", "AA", "AAA", "AAAA", "AAAAA"]

    with open(csvPath, "r", encoding="utf-8") as f:
        datas = list(csv.reader(f))
    print("CSV loaded, rows:", len(datas))

    # Remove duplicate rows. A POI is identified by name + district (columns 0 and 1).
    # A set catches duplicates anywhere in the file, not only on adjacent rows.
    newDatas = []
    seen = set()
    for data in datas:
        key = (data[0], data[1])
        if key in seen:
            continue
        seen.add(key)
        newDatas.append(data)
    print("Duplicate CSV rows removed")

    # City node 北京 (Beijing). merge() only creates it if it does not exist yet.
    # (The original check compared a NodeMatch object with None, which is never true.)
    nodeCity_new = Node("chengshi", name="北京")
    graph.merge(nodeCity_new, "chengshi", "name")

    for row in newDatas:
        # District node, linked to the city by a 属于 ("belongs to") relationship.
        # Merging the node first keeps the city node from being merged under the quxian label.
        nodeQu_new = Node("quxian", name=row[1])
        graph.merge(nodeQu_new, "quxian", "name")
        graph.merge(Relationship(nodeQu_new, "属于", nodeCity_new))

        # geohash.encode(lat, lon, precision): column 4 is passed as latitude, column 3 as longitude
        geoxy_encode = geohash.encode(row[4], row[3], 6)
        nodeJingdian = Node(csvType, name=row[0], quyu=row[1], jianjie=row[0],
                            dizhi=row[2], zuobiao=geoxy_encode)

        # Only create the POI if no node with the same name and district exists yet
        jingdianMatch = node_Match.match(csvType, name=row[0]).where(quyu=row[1]).first()
        if jingdianMatch is None:
            graph.create(nodeJingdian)
            graph.create(Relationship(nodeJingdian, "位于", nodeQu_new))  # 位于 = "located in"

            # Random travel season (旅游时间). merge() keeps one shared node per season.
            nodeTime = Node("traveltime", time=random.choice(seasons))
            graph.merge(Relationship(nodeJingdian, "旅游时间", nodeTime), "traveltime", "time")

            # Random star rating (星级). merge() keeps one shared node per rating.
            nodeAAA = Node("Stars", star=random.choice(stars))
            graph.merge(Relationship(nodeJingdian, "星级", nodeAAA), "Stars", "star")


if __name__ == '__main__':
    graph = Graph("bolt://localhost:7687", auth=("neo4j", "neo4j?"))
    print("Connected to Neo4j")
    importCSV2NeoKG(graph, "公园2050101Attr.csv", "gongyuan")
    print("gongyuan ok")
    importCSV2NeoKG(graph, "景点2050201and20600102Attr.csv", "jingdian")
    print("jingdian ok")
```
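The script above handles only the initial load. For an incremental update, one option is to compare the CSV against the (name, district) keys already stored in the graph and import only the rows that are new. The sketch below assumes you have already fetched those keys yourself, for example with py2neo's `graph.run("MATCH (n:jingdian) RETURN n.name AS name, n.quyu AS quyu")`. The helper name `rows_to_add` is hypothetical. The sketch does not detect changed attributes (address, coordinates) of POIs that already exist.

```python
def rows_to_add(rows, existing_keys):
    """Return the CSV rows whose (name, district) key is not yet in the graph.

    rows: CSV rows (lists of strings); column 0 = name, column 1 = district,
          as in importCSV2NeoKG.
    existing_keys: iterable of (name, district) tuples already in Neo4j
          (hypothetical input: fetch it yourself, e.g. with a MATCH ... RETURN query).
    Duplicates inside the CSV itself are dropped too; the first occurrence wins.
    """
    seen = set(existing_keys)  # copy, so the caller's collection is not modified
    new_rows = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key = (row[0], row[1])
        if key in seen:
            continue
        seen.add(key)
        new_rows.append(row)
    return new_rows


if __name__ == "__main__":
    # Made-up demo rows, not real POI data
    csv_rows = [
        ["Park A", "Haidian", "addr 1", "116.30", "39.99"],
        ["Park B", "Chaoyang", "addr 2", "116.48", "39.92"],
        ["Park A", "Haidian", "addr 1", "116.30", "39.99"],  # duplicate in the file
    ]
    already_in_graph = {("Park B", "Chaoyang")}
    for row in rows_to_add(csv_rows, already_in_graph):
        print(row[0], row[1])  # only Park A / Haidian is new
```

The returned rows can then be passed through the same node and relationship creation loop as `importCSV2NeoKG`.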

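The coordinates are stored as geohashes. I tried installing several geohash libraries, but every one of them failed with errors, so in the end I copied the source code straight into a .py file.
geohash.py is as follows: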

```python
from __future__ import division
from collections import namedtuple
from builtins import range
import decimal
import math

base32 = '0123456789bcdefghjkmnpqrstuvwxyz'


def _indexes(geohash):
    if not geohash:
        raise ValueError('Invalid geohash')

    for char in geohash:
        try:
            yield base32.index(char)
        except ValueError:
            raise ValueError('Invalid geohash')


def _fixedpoint(num, bound_max, bound_min):
    """
    Return given num with precision of 2 - log10(range)

    Params
    ------
    num: A number
    bound_max: max bound, e.g max latitude of a geohash cell(NE)
    bound_min: min bound, e.g min latitude of a geohash cell(SW)

    Returns
    -------
    A decimal
    """
    try:
        decimal.getcontext().prec = math.floor(2 - math.log10(bound_max - bound_min))
    except ValueError:
        decimal.getcontext().prec = 12
    return decimal.Decimal(num)


def bounds(geohash):
    """
    Returns SW/NE latitude/longitude bounds of a specified geohash::

        |      .| NE
        |    .  |
        |  .    |
     SW |.      |

    :param geohash: string, cell that bounds are required of
    :returns: a named tuple of namedtuples Bounds(sw(lat, lon), ne(lat, lon)).

    >>> bounds = geohash.bounds('ezs42')
    >>> bounds
    >>> ((42.583, -5.625), (42.627, -5.58))
    >>> bounds.sw.lat
    >>> 42.583
    """
    geohash = geohash.lower()

    even_bit = True
    lat_min = -90
    lat_max = 90
    lon_min = -180
    lon_max = 180

    # 5 bits for a char. So divide the decimal by power of 2, then AND 1
    # to get the binary bit - fast modulo operation.
    for index in _indexes(geohash):
        for n in range(4, -1, -1):
            bit = (index >> n) & 1
            if even_bit:
                # longitude
                lon_mid = (lon_min + lon_max) / 2
                if bit == 1:
                    lon_min = lon_mid
                else:
                    lon_max = lon_mid
            else:
                # latitude
                lat_mid = (lat_min + lat_max) / 2
                if bit == 1:
                    lat_min = lat_mid
                else:
                    lat_max = lat_mid
            even_bit = not even_bit

    SouthWest = namedtuple('SouthWest', ['lat', 'lon'])
    NorthEast = namedtuple('NorthEast', ['lat', 'lon'])
    sw = SouthWest(lat_min, lon_min)
    ne = NorthEast(lat_max, lon_max)
    Bounds = namedtuple('Bounds', ['sw', 'ne'])
    return Bounds(sw, ne)


def decode(geohash):
    """
    Decode geohash to latitude/longitude. Location is approximate centre of the
    cell to reasonable precision.

    :param geohash: string, cell that bounds are required of
    :returns: Namedtuple with decimal lat and lon as properties.

    >>> geohash.decode('gkkpfve')
    >>> (70.2995, -27.9993)
    """
    (lat_min, lon_min), (lat_max, lon_max) = bounds(geohash)

    lat = (lat_min + lat_max) / 2
    lon = (lon_min + lon_max) / 2

    lat = _fixedpoint(lat, lat_max, lat_min)
    lon = _fixedpoint(lon, lon_max, lon_min)
    Point = namedtuple('Point', ['lat', 'lon'])
    return Point(lat, lon)


def encode(lat, lon, precision):
    """
    Encode latitude, longitude to a geohash.

    :param lat: latitude, a number or string that can be converted to decimal.
        Ideally pass a string to avoid floating point uncertainties.
        It will be converted to decimal.
    :param lon: longitude, a number or string that can be converted to decimal.
        Ideally pass a string to avoid floating point uncertainties.
        It will be converted to decimal.
    :param precision: integer, 1 to 12 representing geohash levels up to 12.
    :returns: geohash as string.

    >>> geohash.encode('70.2995', '-27.9993', 7)
    >>> gkkpfve
    """
    lat = decimal.Decimal(lat)
    lon = decimal.Decimal(lon)

    index = 0  # index into base32 map
    bit = 0    # each char holds 5 bits
    even_bit = True
    lat_min = -90
    lat_max = 90
    lon_min = -180
    lon_max = 180
    ghash = []

    while len(ghash) < precision:
        if even_bit:
            # bisect E-W longitude
            lon_mid = (lon_min + lon_max) / 2
            if lon >= lon_mid:
                index = index * 2 + 1
                lon_min = lon_mid
            else:
                index = index * 2
                lon_max = lon_mid
        else:
            # bisect N-S latitude
            lat_mid = (lat_min + lat_max) / 2
            if lat >= lat_mid:
                index = index * 2 + 1
                lat_min = lat_mid
            else:
                index = index * 2
                lat_max = lat_mid
        even_bit = not even_bit

        bit += 1
        if bit == 5:
            # 5 bits gives a char in geohash. Start over
            ghash.append(base32[index])
            bit = 0
            index = 0

    return ''.join(ghash)


def adjacent(geohash, direction):
    """
    Determines adjacent cell in given direction.

    :param geohash: cell to which adjacent cell is required
    :param direction: direction from geohash, string, one of n, s, e, w
    :returns: geohash of adjacent cell

    >>> geohash.adjacent('gcpuyph', 'n')
    >>> gcpuypk
    """
    if not geohash:
        raise ValueError('Invalid geohash')
    # A tuple, not the string 'nsew', so substrings such as '' or 'ns' are rejected
    if direction not in ('n', 's', 'e', 'w'):
        raise ValueError('Invalid direction')

    neighbour = {
        'n': ['p0r21436x8zb9dcf5h7kjnmqesgutwvy',
              'bc01fg45238967deuvhjyznpkmstqrwx'],
        's': ['14365h7k9dcfesgujnmqp0r2twvyx8zb',
              '238967debc01fg45kmstqrwxuvhjyznp'],
        'e': ['bc01fg45238967deuvhjyznpkmstqrwx',
              'p0r21436x8zb9dcf5h7kjnmqesgutwvy'],
        'w': ['238967debc01fg45kmstqrwxuvhjyznp',
              '14365h7k9dcfesgujnmqp0r2twvyx8zb'],
    }
    border = {
        'n': ['prxz',     'bcfguvyz'],
        's': ['028b',     '0145hjnp'],
        'e': ['bcfguvyz', 'prxz'],
        'w': ['0145hjnp', '028b'],
    }

    last_char = geohash[-1]
    parent = geohash[:-1]  # parent is hash without last char

    typ = len(geohash) % 2

    # check for edge-cases which don't share common prefix
    if last_char in border[direction][typ] and parent:
        parent = adjacent(parent, direction)

    index = neighbour[direction][typ].index(last_char)
    return parent + base32[index]


def neighbours(geohash):
    """
    Returns all 8 adjacent cells to specified geohash::

        | nw | n | ne |
        |  w | * | e  |
        | sw | s | se |

    :param geohash: string, geohash neighbours are required of
    :returns: neighbours as namedtuple of geohashes with properties n,ne,e,se,s,sw,w,nw

    >>> neighbours = geohash.neighbours('gcpuyph')
    >>> neighbours
    >>> ('gcpuypk', 'gcpuypm', 'gcpuypj', 'gcpuynv', 'gcpuynu', 'gcpuyng', 'gcpuyp5', 'gcpuyp7')
    >>> neighbours.ne
    >>> gcpuypm
    """
    n = adjacent(geohash, 'n')
    ne = adjacent(n, 'e')
    e = adjacent(geohash, 'e')
    s = adjacent(geohash, 's')
    se = adjacent(s, 'e')
    w = adjacent(geohash, 'w')
    sw = adjacent(s, 'w')
    nw = adjacent(n, 'w')
    Neighbours = namedtuple('Neighbours',
                            ['n', 'ne', 'e', 'se', 's', 'sw', 'w', 'nw'])
    return Neighbours(n, ne, e, se, s, sw, w, nw)
```
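The import scripts call `encode()` with precision 6 (Neo4j) and precision 7 (PostgreSQL). Each base32 character carries 5 bits, and `encode()` alternates longitude and latitude bits starting with longitude, so the cell size for a given precision can be computed directly. The sketch below does this. Its kilometre conversion assumes a spherical Earth with about 111.32 km per degree, so the figures are approximate. The helper names are hypothetical.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree on a spherical Earth


def geohash_cell_size_deg(precision):
    """Return (height, width) of one geohash cell in degrees (latitude, longitude)."""
    bits = 5 * precision        # each base32 character encodes 5 bits
    lon_bits = (bits + 1) // 2  # bits alternate starting with longitude, so
    lat_bits = bits // 2        # longitude gets the extra bit when the total is odd
    return 180 / 2 ** lat_bits, 360 / 2 ** lon_bits


def geohash_cell_size_km(precision, lat_deg):
    """Approximate (height, width) of one cell in km at latitude lat_deg."""
    h_deg, w_deg = geohash_cell_size_deg(precision)
    # A degree of longitude shrinks with cos(latitude); a degree of latitude does not
    return h_deg * KM_PER_DEGREE, w_deg * KM_PER_DEGREE * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    for p in (6, 7):
        h, w = geohash_cell_size_km(p, 39.9)  # 39.9°N, assumed latitude for Beijing
        print("precision {}: {:.3f} km x {:.3f} km".format(p, h, w))
```

At that latitude, a precision-6 cell is roughly 0.61 km × 0.94 km and a precision-7 cell roughly 0.15 km × 0.12 km. A 7-character geohash therefore pins a POI down to a cell of about 150 m by 120 m.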

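KG result

Start Neo4j from the command line (on Windows):
neo4j.bat console

Once Neo4j is running, the graph can be browsed in Neo4j Browser, by default at http://localhost:7474.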

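Optimizing KG import performance

The Python code above uses py2neo's basic methods, where every match, create and merge call is a separate round trip to the database. In my own tests, once the graph reached 30,000 to 50,000 nodes, the import slowed down dramatically and took hours.

Searching on Baidu, I found an alternative shared by a more experienced developer: collect all nodes and relationships in lists and write them as one Subgraph in a single transaction. With this approach, creating 500,000 nodes and 500,000 relationships took under 4 minutes end to end. That time covers building the nodes and relationships, appending them to the lists and writing them to the database. The test ran in a VM.
The code is as follows: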

```python
import datetime

from progressbar import ETA, Bar, Percentage, ProgressBar, Timer
from py2neo import Graph, Node, Relationship, Subgraph


def batch_create(graph, nodes_list, relations_list):
    # Wrap all nodes and relationships in one Subgraph and create it in a single transaction
    subgraph = Subgraph(nodes_list, relations_list)
    tx_ = graph.begin()
    tx_.create(subgraph)
    graph.commit(tx_)


if __name__ == '__main__':
    # Connect to Neo4j
    graph = Graph("bolt://localhost:7687", auth=("neo4j", "neo4j?"))
    # Build one batch of nodes and relationships in memory
    nodes_list = []      # nodes in this batch
    relations_list = []  # relationships in this batch
    nodeCity_new = Node("chengshi", name="北京")
    nodes_list.append(nodeCity_new)
    widgets = ['CSV -> KG progress: ', Percentage(), ' ', Bar('#'), ' ', Timer(), ' ', ETA(), ' ']
    bar = ProgressBar(widgets=widgets, maxval=500000)
    bar.start()
    # 500,000 test district nodes, each linked to the city by a 属于 ("belongs to") relationship
    for i in range(0, 500000):
        bar.update(i + 1)
        nodeQu_new = Node("quxian", name="Test{0}".format(i))
        nodes_list.append(nodeQu_new)
        rel1 = Relationship(nodeQu_new, "属于", nodeCity_new)
        relations_list.append(rel1)
    bar.finish()
    current_time = datetime.datetime.now()
    print("current_time:    " + str(current_time))
    # Create all nodes/relationships in one go
    batch_create(graph, nodes_list, relations_list)
    current_time = datetime.datetime.now()
    print("current_time:    " + str(current_time))
    print("batch ok")
```
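Writing all 500,000 nodes in one transaction means the whole Subgraph is held in memory and sent at once. A common variation, sketched below as an assumption rather than the method measured above, is to split the work into fixed-size batches and call `batch_create` once per batch. The helper names and the batch size of 10,000 are hypothetical. Each district node travels in the same batch as its `属于` relationship. The sketch also assumes the city node is created first, and that py2neo's `create()` reuses nodes already bound to the graph instead of creating them again. Verify that behaviour on your py2neo version.

```python
from itertools import islice


def chunked(items, size):
    """Yield lists of at most `size` items, taken in order from any iterable."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(pairs, create_batch, size=10000):
    """Send (node, relationship) pairs to create_batch(nodes, rels) in batches.

    create_batch is any callable taking (nodes, rels), e.g.
    functools.partial(batch_create, graph). Keeping each node next to its own
    relationship means no batch refers to a node that has not been written yet.
    Returns the number of batches sent.
    """
    batches = 0
    for batch in chunked(pairs, size):
        nodes = [node for node, _ in batch]
        rels = [rel for _, rel in batch]
        create_batch(nodes, rels)
        batches += 1
    return batches


if __name__ == "__main__":
    # Stand-in for batch_create that only records the batch sizes
    sizes = []
    demo_pairs = [("node%d" % i, "rel%d" % i) for i in range(25)]
    n = load_in_batches(demo_pairs, lambda ns, rs: sizes.append(len(ns)), size=10)
    print(n, sizes)  # 3 [10, 10, 5]
```

With the script above, this would mean calling `graph.create(nodeCity_new)` before the loop and then `load_in_batches(zip(nodes_list[1:], relations_list), partial(batch_create, graph))` (with `partial` from `functools`). The slice is needed because `nodes_list[0]` is the city node, which has no relationship of its own.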

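Building the PostgreSQL database

Load the Ctrip POIs into the poibaseinfo table of the tripdata database, then import the parks and attractions from the navigation POIs into the same table.

Ctrip POI import code: psycopg2_004.py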

```python
import csv

import psycopg2
from progressbar import ETA, Bar, Percentage, ProgressBar, Timer

import geohash  # the local geohash.py from the Neo4j section


#
# Import the Ctrip crawler CSV data into PostgreSQL
#
def importCtripCSV2PG(cur, csvpath, csvcity, csvprovice):
    with open(csvpath, "r", encoding="utf-8") as f:
        datas = list(csv.reader(f))
    print("csv datas number = {}".format(len(datas)))

    widgets = ['Crawler data -> PG progress: ', Percentage(), ' ', Bar('#'), ' ', Timer(), ' ', ETA(), ' ']
    bar = ProgressBar(widgets=widgets, maxval=len(datas))
    bar.start()

    sCol = ("namec,namee,tags,brief,ticket,ticketmin,ticketadult,ticketchild,ticketold,"
            "ticketstudent,scores,scorenumber,opentime,spendtime,introduceinfo,salesinfo,"
            "x,y,geos,photourl,city,province,contry")
    # One %s placeholder per column. psycopg2 quotes and escapes every value itself,
    # so the manual replace("'", "''") on a few fields is no longer needed.
    sql = "insert into poibaseinfo({}) values ({})".format(
        sCol, ",".join(["%s"] * len(sCol.split(","))))

    for i, data in enumerate(datas):
        bar.update(i + 1)
        # Skip empty lines and the header row (first column "名称" = "name")
        if not data or data[0] == "名称":
            continue
        # geohash.encode(lat, lon, precision): column 5 is passed as latitude, column 4 as longitude
        geoxy_encode = geohash.encode(data[5], data[4], 7)
        values = [data[0], data[1], data[6], data[7], data[8], data[9], data[13], data[16],
                  data[14], data[15], data[10], data[11], data[18], data[17], data[19],
                  data[20], data[5], data[4], geoxy_encode, data[12],
                  csvcity, csvprovice, "中国"]
        # A failed statement aborts the whole PostgreSQL transaction, so wrap each row
        # in a savepoint and roll back only that row when its INSERT fails
        cur.execute("SAVEPOINT row_sp")
        try:
            cur.execute(sql, values)
        except psycopg2.Error as e:
            cur.execute("ROLLBACK TO SAVEPOINT row_sp")
            print(e)
        else:
            cur.execute("RELEASE SAVEPOINT row_sp")
    bar.finish()


if __name__ == '__main__':
    user = "postgres"
    pwd = "your_password"
    port = "5432"
    hostname = "127.0.0.1"
    conn = psycopg2.connect(database="tripdata", user=user, password=pwd, host=hostname, port=port)
    print(conn)
    cur = conn.cursor()
    # LIMIT 0 returns no rows but still fills cur.description with the table's columns
    cur.execute("select * from poibaseinfo limit 0")
    cols = cur.description
    print("PG cols number = {}".format(len(cols)))
    # Import the CSV file into PG
    csvPath = "pois_bj_ctrip.csv"
    importCtripCSV2PG(cur, csvPath, "北京", "北京")
    # Import the other CSV files into PG
    # TODO...
    conn.commit()
    cur.close()
    conn.close()
    print("ok")
```
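The script leaves importing the other CSV files as a TODO. As a starting point for loading the navigation parks and attractions into the same table, here is a hypothetical helper that builds a parameterized INSERT from one navigation-POI row. It makes three assumptions:

- The row layout matches the one used in the Neo4j import: 0 name, 1 district, 2 address, 3 longitude, 4 latitude.
- The caller computes the geohash.
- `x` holds longitude and `y` holds latitude.

The Ctrip loader fills `x`/`y` from its columns 5/4, so check that both sources follow the same convention. Only columns that appear in the Ctrip `sCol` list are used.

```python
# Columns of poibaseinfo filled from a navigation-POI row (a subset of the Ctrip sCol list)
NAV_COLUMNS = ["namec", "x", "y", "geos", "city", "province", "contry"]


def nav_poi_insert(row, geos, city, province, country="中国"):
    """Return (sql, values) for inserting one navigation POI into poibaseinfo.

    Assumed row layout: 0 name, 1 district, 2 address, 3 longitude, 4 latitude.
    District and address are left out because no matching column is assumed here.
    """
    placeholders = ",".join(["%s"] * len(NAV_COLUMNS))
    sql = "INSERT INTO poibaseinfo ({}) VALUES ({})".format(",".join(NAV_COLUMNS), placeholders)
    values = [row[0], row[3], row[4], geos, city, province, country]
    return sql, values


if __name__ == "__main__":
    # Made-up demo row; "geohash-here" stands in for geohash.encode(row[4], row[3], 7)
    sql, values = nav_poi_insert(["Park A", "Haidian", "addr 1", "116.30", "39.99"],
                                 "geohash-here", "北京", "北京")
    print(sql)
    print(values)
```

In a loop over the navigation CSV, each row could then be written with `cur.execute(*nav_poi_insert(row, geohash.encode(row[4], row[3], 7), "北京", "北京"))`.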

