现在的位置: 首页 > 综合 > 正文

GREENPLUM中的gp_dist_random()函数

2013年10月05日 ⁄ 综合 ⁄ 共 13519字 ⁄ 字号 评论关闭

虽说GREENPLUM不是开源的,可是自带的一些脚本还是可以看一看的。最近在学习其中的几个,期间经常看到使用gp_dist_random()函数,奇怪自己在ADMIN文档里并没有看到过这个函数啊,去搜了一下也没搜到。试了一试,大概知道了是个什么功能。

首先连到MASTER上

[gpadmin1@hadoop7 ~]$ psql
psql (8.2.13)
Type "help" for help.

testutf8=# select count(1) from gp_dist_random('pg_class');
 count
-------
  1633
(1 row)

testutf8=# select count(1) from pg_class;
 count
-------
   272
(1 row)

testutf8=# select * from gp_id;
 gpname | numsegments | dbid | content
--------+-------------+------+---------
 GP     |           6 |    1 |      -1
(1 row)

testutf8=# select 272*6;
 ?column?
----------
     1632
(1 row)

 

看出些端倪来了吧,单节点*节点个数=gp_dist_random()函数的结果

因为我们在日常使用的过程中MASTER以及各个SEGMENT的PG_CLASS正常情况下应该都是一致的,所以这样乘是合理的。我这里差了1(1633-1632=1)是因为我之前有直接连到某个SEGMENT上直接做操作,没有通过MASTER,因此可能会有一些不一致。

 

也就是说,gp_dist_random()函数的作用就是从MASTER可以查到某一个表在各个SEGMENT上的情况(不包含MASTER)。

当然,还是有一些限制条件的,整个函数不是对每个表都有效的。

1、这类表必须是单独连到各个POSTGRES进程上创建的,不能是直接连上MASTER在整个集群内进行创建。

2、在各个SEGMENT上表的OID要是一致的。这一点可能就比较难控制了,不过刚才我在实验环境中由于没其他人操作,各个SEGMENT都很一致,所以还是实验成功了。

 

下面贴一下结果看看吧

连接到MASTER所在端口号5345的那个POSTGRES进程

[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 5345 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# insert into t1 values(1);
INSERT 0 1
testutf8=# insert into t1 values(12);
INSERT 0 1
testutf8=# select * from t1;
 id
----
  1
 12
(2 rows)

现在正常通过MASTER连接到集群上,查询一下试试

testutf8=# select * from t1;
 id
----
  1
 12
(2 rows)

testutf8=# select * from gp_dist_random('t1');
ERROR:  relation with OID 43268 does not exist  (seg0 slice1 hadoop7:20001 pid=1511)

这里报错了,因为t1表这个对象现在只存在MASTER的那个POSTGRES进程上,在其他各个SEGMENT上并不存在。因此我们需要单独连接到各个SEGMENT上创建这个对象。

[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop8 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop8 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE
testutf8=# /q
[gpadmin1@hadoop8 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20002 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# create table t1(id int);
CREATE TABLE

在这个SEGMENT上插入两条数据试试

testutf8=# insert into t1 values(2);
INSERT 0 1
testutf8=# insert into t1 values(22);
INSERT 0 1
testutf8=# select * from t1;
 id
----
  2
 22
(2 rows)

testutf8=# select * from gp_dist_random('t1');
 id
----
  2
 22
(2 rows)

可以看到,在某个SEGMENT上也可以使用gp_dist_random()函数,只不过仍然只能看到本SEGMENT上的数据而已。

这时我们切换回到正常通过MASTER连接到集群上的窗口,查询一下看看

testutf8=# select * from gp_dist_random('t1');
 id
----
  2
 22
(2 rows)

这时就通过函数gp_dist_random()就查看到了在SEGMENT上插入的那两条数据,注意这里并不包含在MASTER上的1,12两条数据。

再连接到某个SEGMENT上插入两条数据试试

[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop9 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# insert into t1 values(33);
INSERT 0 1
testutf8=# insert into t1 values(3);

INSERT 0 1

然后再回到正常通过MASTER连接到集群上的窗口,查询一下看看

testutf8=# select * from gp_dist_random('t1');
 id
----
  2
 22
 33
  3
(4 rows)

新增的两条数据也看到了。

 

在GREENPLUM的源程序中使用这个函数来进行MASTER和各个SEGMENT之间的数据比对。

因为正常情况下都会通过MASTER来进行操作,因此不会存在这种特殊类型的表,所以暂时还没想到这个函数能够应用在其他什么地方。

不过对于有些系统表,我们还是可以使用这个函数来做一些事情的。比如说我在MASTER上发起了一个查询,虽然通过pg_stat_activity可以看到当前执行的查询,不过如果一个查询好久好久都没有结束,那么我可能会想它是不是死掉了,想查看一下这个这个查询究竟有没有在各个SEGMENT上执行。

在MASTER上创建一个test表,插入多点数据,然后发起一个JOIN多次的查询

testutf8=# drop table test;
DROP TABLE
testutf8=# create table test as select * from pg_class;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'relname' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 275
testutf8=# insert into test select * from test;
INSERT 0 275
testutf8=# insert into test select * from test;
INSERT 0 550
testutf8=# insert into test select * from test;
INSERT 0 1100
testutf8=# insert into test select * from test;
INSERT 0 2200
testutf8=# insert into test select * from test;
INSERT 0 4400
testutf8=# insert into test select * from test;
INSERT 0 8800
testutf8=# insert into test select * from test;
INSERT 0 17600
testutf8=# insert into test select * from test;
INSERT 0 35200
testutf8=# insert into test select * from test;
INSERT 0 70400
testutf8=# insert into test select * from test;
INSERT 0 140800
testutf8=#
testutf8=# select * from test a cross join test b cross join test c;

 

另外开启一个窗口连接

通过函数gp_dist_random('pg_stat_activity')可以看到pg_stat_activity在各个SEGMENT上的情况,这里是不包含MASTER的

testutf8=# select * from gp_dist_random('pg_stat_activity');
 datid | datname  | procpid | sess_id | usesysid | usename  |                       current_query                       | waiting |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
 25691 | testutf8 |   20837 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.597638+08 | 10.20.151.11 |       37041
 25691 | testutf8 |   20741 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.967478+08 | 10.20.151.11 |       37023
 25691 | testutf8 |   20658 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.918739+08 | 10.20.151.11 |       36992
 25691 | testutf8 |   18629 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.113912+08 | 10.20.151.11 |       46839
 25691 | testutf8 |   20705 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 11:01:23.691262+08 | 10.20.151.93 |       51429
 25691 | testutf8 |   20839 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.601519+08 | 10.20.151.11 |       50706
 25691 | testutf8 |   20743 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.970788+08 | 10.20.151.11 |       50688
 25691 | testutf8 |   20660 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.922336+08 | 10.20.151.11 |       50657
 25691 | testutf8 |   18631 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.134902+08 | 10.20.151.11 |       36656
 25691 | testutf8 |   20706 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 11:01:23.693834+08 | 10.20.151.93 |       59344
 25691 | testutf8 |   14200 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.683182+08 | 10.20.151.11 |       46190
 25691 | testutf8 |   14104 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.044488+08 | 10.20.151.11 |       46172
 25691 | testutf8 |   14049 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.997564+08 | 10.20.151.11 |       46141
 25691 | testutf8 |   12203 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.361576+08 | 10.20.151.11 |       41952
 25691 | testutf8 |   14092 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 11:01:23.75539+08  | 10.20.151.93 |       55913
 25691 | testutf8 |   14202 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.686193+08 | 10.20.151.11 |       41996
 25691 | testutf8 |   14106 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.047272+08 | 10.20.151.11 |       41978
 25691 | testutf8 |   14051 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:01.004602+08 | 10.20.151.11 |       41947
 25691 | testutf8 |   12205 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.365583+08 | 10.20.151.11 |       57566
 25691 | testutf8 |   13836 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 10:53:35.973226+08 | 10.20.151.93 |       44622
 25691 | testutf8 |    6246 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.678602+08 | 10.20.151.11 |       47749
 25691 | testutf8 |    6229 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.042156+08 | 10.20.151.11 |       47731
 25691 | testutf8 |    6188 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.992359+08 | 10.20.151.11 |       47700
 25691 | testutf8 |    4638 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.315053+08 | 10.20.151.11 |       42496
 25691 | testutf8 |    6057 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 10:56:10.860697+08 | 10.20.151.93 |       47089
 25691 | testutf8 |    6244 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.667617+08 | 10.20.151.11 |       48144
 25691 | testutf8 |    6227 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:45.038651+08 | 10.20.151.11 |       48126
 25691 | testutf8 |    6186 |      30 |       10 | gpadmin1 | select * from gp_dist_random('pg_stat_activity');         | f       | 2010-11-30 11:02:41.733726+08 | 2010-11-30 11:00:00.984353+08 | 10.20.151.11 |       48095
 25691 | testutf8 |    4636 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.211114+08 | 10.20.151.11 |       37456
 25691 | testutf8 |    6214 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 11:01:23.758726+08 | 10.20.151.93 |       42670
(30 rows)

 

下面这样查询,只能看到MASTER上的pg_stat_activity
内容


testutf8=# select * from pg_stat_activity;
 datid | datname  | procpid | sess_id | usesysid | usename  |                       current_query                       | waiting |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
 25691 | testutf8 |   20728 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:01:32.185858+08 | 2010-11-30 11:01:32.495914+08 | 10.20.151.11 |       38653
 25691 | testutf8 |   20655 |      30 |       10 | gpadmin1 | select * from pg_stat_activity;                           | f       | 2010-11-30 11:02:48.213233+08 | 2010-11-30 10:59:59.05798+08  |              |          -1
 25691 | testutf8 |   18621 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:40.245568+08 |              |          -1
(3 rows)

testutf8=#

连到某个特定的SEGMETN上看一下

testutf8=# /q
[gpadmin1@hadoop7 ~]$ PGOPTIONS="-c gp_session_role=utility" psql -h hadoop7 -p 20001 -U r1
psql (8.2.13)
Type "help" for help.

testutf8=# select * from pg_stat_activity;
 datid | datname  | procpid | sess_id | usesysid | usename  |                       current_query                       | waiting |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+---------+----------+----------+-----------------------------------------------------------+---------+-------------------------------+-------------------------------+--------------+-------------
 25691 | testutf8 |   20837 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:02:35.597638+08 | 10.20.151.11 |       37041
 25691 | testutf8 |   20741 |      29 |       10 | gpadmin1 | <IDLE>                                                    | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 11:01:44.967478+08 | 10.20.151.11 |       37023
 25691 | testutf8 |   20889 |      -1 |    26271 | r1       | select * from pg_stat_activity;                           | f       | 2010-11-30 11:04:16.063552+08 | 2010-11-30 11:04:13.766148+08 | 10.20.151.11 |       55612
 25691 | testutf8 |   18629 |      29 |       10 | gpadmin1 | select * from test a cross join test b cross join test c; | f       | 2010-11-30 11:02:35.595449+08 | 2010-11-30 10:08:54.113912+08 | 10.20.151.11 |       46839
 25691 | testutf8 |   20705 |      -1 |       10 | gpadmin1 | <IDLE>                                                    | f       |                               | 2010-11-30 11:01:23.691262+08 | 10.20.151.93 |       51429
(5 rows)

抱歉!评论已关闭.