neo4j使用指南

现在的位置: 首页 > 综合 > 正文

2013年08月25日 ⁄ 综合 ⁄ 共 5805字 ⁄ 字号小中大 ⁄ 评论关闭

目前neo4j文档只限于官方文档，本文将neo4j官方主要的文档整理了一下，供需要的人参考，内容也会不断更新！

开始

特点
1. ACID事务
2. 高可用
3. 可伸缩到亿级结点、关系
4. 高速遍历
部署模式

Single Instance Multiple Instances

Embedded EmbeddedGraphDatabase HighlyAvailableGraphDatabase

Standalone Neo4j Server not yet available
Maven配置

$HOME/.m2/setting.xml增加如下配置：
<profiles><profile><repositories><repository><id>neo4j-public-repository</id><name>Publically available Maven 2 repository for Neo4j</name><url>http://m2.neo4j.org</url><snapshots><enabled>true</enabled></snapshots></repository></repositories></profile></profiles>

项目的pom.xml中增加如下配置：
<dependencies><dependency><groupId>org.neo4j</groupId><artifactId>neo4j</artifactId><version>{neo4j.version}</version></dependency></dependencies>

安装
- windows
  1. 从http://neo4j.org/下载
  2. 解压安装文件
  3. 配置环境变量：%NEO4J_HOME%和%PATH%
  4. 安装系统服务：%NEO4J_HOME%/bin/InstallNeo4j.bat
  5. 启动：%NEO4J_HOME%/bin/Neo4j.bat
- Linux基本同windows
关键词
- node：结点
- relationship：关系
- property：属性
- indexer：索引器
- traverser：遍历器
Get Started

neo4j建模

Design Guide
Domain Modeling
一个具体的例子：从建模到类设计到编码全过程
code tips
1. wrap nodes：将node包装
2. Subreferences：设置一个root的子引用结点（相当于namespace），再在此结点下建立你应用域的结点
工具neoclipse

REST API

使用curl进行测试：curl -H http_header -X [GET,POST,PUT,DELETE] -d post_data your_url
-H http的header
-X http的类型
-d post的数据
Accept:application/json -- 可以接受的回复编码格式
Content-type:application/json -- 内容类型格式
例如：
curl -H Accept:application/json -H Content-type:application/json -X GET http://localhost:7474/db/data/
curl -H Accept:application/json -H Content-Type:application/json -X post -d '{"key1":"value1"}' http://localhost:7474/db/data/node

更多REST API信息

索引

分类
1. 内部的自然索引：因为其本身就相当于tree结构（一般数据库也用tree做索引，如B树），被traverser使用作遍历
2. 使用集成索引：如Lucene
可以对结点、关系进行索引
为了更新一个索引，必须先移除，再添加
某一索引按key缓存，可设置缓存大小：setCacheCapacity
旧索引机制
- IndexService
- 不要实例化多个LuceneIndexService，因为内部其指向同一个index数据源
- LuceneFulltextIndexService 和LuceneIndexService两者指向不同的index数据源
- IndexService文档
新索引机制
- 索引文档

事务

事务Overview
嵌套事务：只有一个高层事务，所有嵌套事务均在高层事务内
事务必须在try finally块中，事务必须显示success()
死锁
不要让事务太小（耗I/O），也不要太大（耗内存）
批量插入
- 特点：
  1. 用于数据初始化，批量写入
  2. 非事务
  3. 非线程安全
  4. 性能稍好
  5. 可能corrupt数据集
  6. 非批量插入时，内存被分配在堆外；在批量插入时，内存被分配在堆内；因此，需要保证足够的堆大小
Spring事务配置：具体见IMDB Example Transaction

<tx:annotation-driven /><bean id="transactionManager"class="org.springframework.transaction.jta.JtaTransactionManager"><property name="transactionManager" ref="neo4jTransactionManagerService" /><property name="userTransaction" ref="neo4jUserTransactionService" /></bean><bean id="neo4jTransactionManagerService" class="org.neo4j.kernel.impl.transaction.SpringTransactionManager" /><bean id="neo4jUserTransactionService" class="org.neo4j.kernel.impl.transaction.UserTransactionImpl"><constructor-arg index="0" ref="graphDbService" /></bean>

@Transactional
public void newActors( final List<ActorData> actorList ){// code that operates on the graph}

性能

性能关注

CPU：32位、64位
Memory：至少1G，推荐4～8G
Disk：至少SCSI、EIDE，推荐SSD、SATA
FileSystem ：支持fsync，fdatasync，至少ext3，推荐ext4、zfs
Software：Java 1.6+，OS：Windows，Linux等

性能调整（OS保留内存+File Mapping内存+JVM内存）

两个方面：caches和jvm
1. caches
  - File buffer cache（File Mapping）
    - 低层次缓存，操作系统内存映射，尽可能使用操作系统特性
    - 事务提交时，logic log被立即fdatasync to disk，但是数据文件不是被立即存flush到磁盘上，直到logic log进行了rotate，增长logic log大小可以减少flush到磁盘的次数（可以在conf/neo4j-wrapper.conf中的wrapper.logfile.maxsize参数配置）
    - 防止不必要的page flush，配置OS当dirty page达到一定大小时才flush到磁盘（Linux OS参数配置）
    - 延迟写，直到逻辑日志轮转（sync、fsync、fdatasync）
    - 逻辑日志可用于崩溃恢复
    - 定长记录
      - nodestore 9bytes
      - relationshipstore 33bytes
      - propertystore 25bytes（primitive属性）
      - stringstore 133bytes（指针存在propertystore中，具体数据存在此处，120bytes有效）
      - arraystore 133bytes（指针存在propertystore中，具体数据存在此处，120bytes有效）
    - 可以配置缓存在JVM内存中，也可以不在JVM内存中
    - 相关配置参数
      1. use_memory_mapped_buffers：true使用操作系统内存mapping，false使用堆内存作映射
      2. neostore.nodestore.db.mapped_memory：结点file memory mapping
      3. neostore.relationshipstore.db.mapped_memory：关系file memory mapping
      4. neostore.propertystore.db.mapped_memory：属性file memory mapping
      5. neostore.propertystore.db.strings.mapped_memory：字符串属性file memory mapping
      6. neostore.propertystore.db.arrays.mapped_memory：数组属性file memory mapping
      7. string_block_size：property中类型string块大小，默认为120bytes（块大小在创建后不能改变，如果一个字符串大小大于块大小，字符串会分布在多个块中，有性能影响）
      8. array_block_size：property中类型array块大小，默认为120bytes（块大小在创建后不能改变，同上）
      9. dump_configuration：是否启动时dump配置
    - batch_inserter模式时，只需考虑node和relationship，因为在此模式下一般property在写入后不会读取，node和relationship因为会在其中保存属性等会被读取
  - Object cache
    - 高层次缓存
    - 以优化形式存储，利于快速遍历
    - 缓存在JVM堆中
    - 延迟加载策略
    - LRU策略
    - 相关配置参数：cache_type：值为soft（default）（SoftReference）、weak（WeakReference）、none（始终保持在jvm中，容易OOM）
2. jvm
  - 参数：-d32或-d64（32位或64位），-server，内存相关参数（内存大小、分代比例、线程堆栈大小-Xss（所有线程共享？））、垃圾收集器（推荐使用CMS收集器），OOM时自动产生内存dump
  - 大量使用NIO包，很多堆外内存分配
  - 要防止page in or out（观察swap交换情况，不要给OS保留太小内存）
  - 要考虑jvm的pause time（暂停时间） / throughput（吞吐量）
  - NUMA structure：version 1.6.0 update 18实现，-XX:+UseNUMA
  - 在堆中每个对象大小
    1. Node：688bytes
    2. Relationship：392bytes
    3. Property：232bytes
    4. Primitive：24bytes
    5. String：(64 + 2 * len(string))bytes
    6. Relationships：？？？
3. 写性能：事务大小（事务小，I/O多；事务大，内存多）；避免dirty page flushed to disk unexpectedly（不需要flush时flush）
4. 更多配置，见Configuration Settings
5. 更多性能，见Performance Guide

配置

neo4j-wrapper.conf：logfile相关配置，jvm相关参数配置
conf/neo4j-server.properties：neo4j参数配置文件
1. org.neo4j.server.database.location：db位置，相对于NEO4J_HOME
2. org.neo4j.server.webserver.port：端口号，默认7474
3. org.neo4j.server.webadmin.rrdb.location：运行数据metrics信息文件，默认data/graph.db/../rrd
4. org.neo4j.server.webadmin.data.url：REST API endpoint url，默认/db/data/
5. org.neo4j.server.webadmin.management.uri：管理界面url，默认/db/manage/
6. org.neo4j.server.db.tuning.properties：指定性能参数配置文件，默认conf/neo4j.properties
conf/neo4j.properties：neo4j低层参数配置文件
conf/log4j.properties：log4j配置文件
conf/coord-wrapper.conf：ha结点启动相关配置
conf/coord.cfg：ha结点的zookeeper相关配置

neo4j高可用部署

neo4j-ha架构图
功能
1. fault-tolerant
2. horizontally scaling
3. slave可写，同时同步到master，异步同步到其他他slave；对于master的写，会异步同步到slave
使用Apache Zookeeper保持各节点间的状态信息
HA配置参数
1. ha. machine_id：server id
2. ha. server：master的host和port
3. ha. zoo_keeper_servers：zookeeper连接，以逗号分隔
4. ha. pull_interval：slave从master拉数据的间隔
更多信息，见neo4j_ha_cluster
例子：examples

neo4j shell

配置文件加入：enable_remote_shell = true（默认端口为1337）或enable_remote_shell = port = 1331（指定端口）
neo4j-shell -host -port
man可以查找命令，具体命令参数用man

neo4j备份

参数：keep_logical_logs：true

neo4j监控

监控，见Monitor

neo4j安全

Trouble Shooting

neo4j中的所有api都是线程安全的，因此不需要额外的同步；串行，性能是否可以？

学步园

neo4j使用指南

开始

neo4j建模

REST API

索引

事务

性能

性能关注

性能调整（OS保留内存+File Mapping内存+JVM内存）

配置

neo4j高可用部署

neo4j shell

neo4j备份

neo4j监控

neo4j安全

Trouble Shooting

相关文档

作者: rococo

书签

最新文章New

本站推荐

返回首页

	Single Instance	Multiple Instances
Embedded	EmbeddedGraphDatabase	HighlyAvailableGraphDatabase
Standalone	Neo4j Server	not yet available