
Expanding a Greenplum Cluster

October 31, 2013

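We may have sized the cluster when we designed the system architecture, but reality is often hard to predict. When the load from the business keeps growing and the system can no longer cope, it is probably time to expand the cluster.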

In Greenplum (GP), the gpexpand utility helps us expand an existing cluster.

The whole process roughly consists of the following stages:

1、Preparing

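Prepare the machines, set up the software environment, and verify them with the tools that ship with GP, such as gpcheckos. Create the data directories for the new segments, keeping their layout consistent with the existing cluster.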

2、Initializing New Segments

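This stage needs an input file. You can write it by hand or generate it interactively with gpexpand; according to the Admin Guide, GP recommends the latter. The detailed steps are in the Admin Guide, so here I simply record what I did.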

 

//An overview of the existing cluster

[gpadmin1@hadoop1 conf]$ gpstate -c
20101029:14:19:59:gpstate:hadoop1:gpadmin1-[INFO]:-Starting gpstate with args: -c
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.0.1.0 build 1'
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-Obtaining Segment details from master...
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:--Current GPDB mirror list and status
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:--Type = Group
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Status                             Data State     Primary   Datadir                          Port    Mirror    Datadir                          Port
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop1   /home/gpadmin1/gpdatap1/aligp0   30000   hadoop2   /home/gpadmin1/gpdatam1/aligp0   40000
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop1   /home/gpadmin1/gpdatap2/aligp1   30001   hadoop2   /home/gpadmin1/gpdatam2/aligp1   40001
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop2   /home/gpadmin1/gpdatap1/aligp2   30000   hadoop3   /home/gpadmin1/gpdatam1/aligp2   40000
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop2   /home/gpadmin1/gpdatap2/aligp3   30001   hadoop3   /home/gpadmin1/gpdatam2/aligp3   40001
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop3   /home/gpadmin1/gpdatap1/aligp4   30000   hadoop1   /home/gpadmin1/gpdatam1/aligp4   40000
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop3   /home/gpadmin1/gpdatap2/aligp5   30001   hadoop1   /home/gpadmin1/gpdatam2/aligp5   40001
20101029:14:20:00:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------

 

//Create the input file interactively with the gpexpand command

[gpadmin1@hadoop1 conf]$ gpexpand
20101029:14:21:33:gpexpand:hadoop1:gpadmin1-[INFO]:-Querying gpexpand schema for current expansion state

System Expansion is used to add segments to an existing GPDB array.
gpexpand did not detect a System Expansion that is in progress.

Before initiating a System Expansion, you need to provision and burn-in
the new hardware.  Please be sure to run gpcheckperf/gpcheckos to make
sure the new hardware is working properly.

Please refer to the Admin Guide for more information.

Would you like to initiate a new System Expansion Yy|Nn (default=N):
> y

This utility can handle some expansion scenarios by asking a few questions.
More complex expansions can be done by providing an input file with
the --input <file>.  Please see the docs for the format of this file.

       The current system appears to be non-standard.
       The address value for hadoop2 does not correspond to a standard address.
       gpexpand may not be able to symmetrically distribute the new segments appropriately.
       It is recommended that you specify your own input file with appropriate values.

Are you sure you want to continue with this gpexpand session? Yy|Nn (default=N):
> y

Enter a comma separated list of new hosts you want
to add to your array.  Do not include interface hostnames.
**Enter a blank line to only add segments to existing hosts**[]:
> hadoop7,hadoop8

You must now specify a mirroring strategy for the new hosts.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
adding more hosts than the number of segments per host to use this.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be adding at least 2 hosts in order to use this.

What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> grouped

    By default, new hosts are configured with the same number of primary
    segments as existing hosts.  Optionally, you can increase the number
    of segments per host.

    For example, if existing hosts have two primary segments, entering a value
    of 2 will initialize two additional segments on existing hosts, and four
    segments on new hosts.  In addition, mirror segments will be added for
    these new primary segments if mirroring is enabled.
   
//Enter the number of new primary segments to add, if any. By default, new hosts are initialized with the same number of primary segments as existing hosts. Optionally, you can increase the number of segments per host.
//If you want to increase the number of segments per host, enter a number greater than zero. This number of additional segments will be initialized on all hosts. For example, if existing hosts currently have two segments per host, entering a value of 2 will initialize two additional segments on existing hosts, and four new segments on new hosts.

How many new primary segments per host do you want to add? (default=0):
> 0

Generating configuration file...

20101029:14:22:10:gpexpand:hadoop1:gpadmin1-[INFO]:-Generating input file...

Input configuration files were written to 'gpexpand_inputfile_20101029_142210' and 'None'.
Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20101029_142210 -D aligputf8
               
20101029:14:22:10:gpexpand:hadoop1:gpadmin1-[INFO]:-Exiting...
[gpadmin1@hadoop1 conf]$ ls
gpexpand_inputfile_20101029_142210  host1  host2  hostlist_all  initgp  initgp.bak

 

//Let's look at the contents of the input file

[gpadmin1@hadoop1 conf]$ cat gpexpand_inputfile_20101029_142210
hadoop7:hadoop7:30000:/home/gpadmin1/gpdatap1/aligp6:15:6:p:10000
hadoop8:hadoop8:40000:/home/gpadmin1/gpdatam1/aligp6:21:6:m:20000
hadoop7:hadoop7:30001:/home/gpadmin1/gpdatap2/aligp7:16:7:p:10001
hadoop8:hadoop8:40001:/home/gpadmin1/gpdatam2/aligp7:22:7:m:20001
hadoop8:hadoop8:30000:/home/gpadmin1/gpdatap1/aligp8:17:8:p:10000
hadoop7:hadoop7:40000:/home/gpadmin1/gpdatam1/aligp8:19:8:m:20000
hadoop8:hadoop8:30001:/home/gpadmin1/gpdatap2/aligp9:18:9:p:10001
hadoop7:hadoop7:40001:/home/gpadmin1/gpdatam2/aligp9:20:9:m:20001
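Before handing a file like this to gpexpand, it can help to check it mechanically. The following is only a minimal sketch: the field names (hostname, address, port, data directory, dbid, content, role, replication port) are my reading of the sample above and of the gp_segment_configuration columns, not something gpexpand prints, and the checks assume a mirrored cluster like this one.

```python
from collections import namedtuple

# Field names are an assumption inferred from the sample file and from the
# gp_segment_configuration columns; gpexpand itself does not label them.
SegmentEntry = namedtuple(
    "SegmentEntry",
    "hostname address port datadir dbid content role replication_port",
)


def parse_input_file(text):
    """Parse gpexpand input-file text into a list of SegmentEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) != 8:
            raise ValueError("expected 8 colon-separated fields: %r" % line)
        host, addr, port, datadir, dbid, content, role, repl = fields
        entries.append(SegmentEntry(host, addr, int(port), datadir,
                                    int(dbid), int(content), role, int(repl)))
    return entries


def check_entries(entries):
    """Return a list of problems; an empty list means the file looks sane."""
    problems = []
    seen_ports = set()
    by_content = {}
    for e in entries:
        key = (e.hostname, e.port)
        if key in seen_ports:
            problems.append("port %d used twice on %s" % (e.port, e.hostname))
        seen_ports.add(key)
        by_content.setdefault(e.content, []).append(e)
    for content, group in sorted(by_content.items()):
        roles = sorted(e.role for e in group)
        # Assumes mirroring: one primary (p) and one mirror (m) per content ID.
        if roles != ["m", "p"]:
            problems.append("content %d has roles %s" % (content, roles))
        elif group[0].hostname == group[1].hostname:
            problems.append("content %d: primary and mirror share a host" % content)
    return problems


SAMPLE = """\
hadoop7:hadoop7:30000:/home/gpadmin1/gpdatap1/aligp6:15:6:p:10000
hadoop8:hadoop8:40000:/home/gpadmin1/gpdatam1/aligp6:21:6:m:20000
hadoop7:hadoop7:30001:/home/gpadmin1/gpdatap2/aligp7:16:7:p:10001
hadoop8:hadoop8:40001:/home/gpadmin1/gpdatam2/aligp7:22:7:m:20001
hadoop8:hadoop8:30000:/home/gpadmin1/gpdatap1/aligp8:17:8:p:10000
hadoop7:hadoop7:40000:/home/gpadmin1/gpdatam1/aligp8:19:8:m:20000
hadoop8:hadoop8:30001:/home/gpadmin1/gpdatap2/aligp9:18:9:p:10001
hadoop7:hadoop7:40001:/home/gpadmin1/gpdatam2/aligp9:20:9:m:20001
"""

if __name__ == "__main__":
    entries = parse_input_file(SAMPLE)
    for problem in check_entries(entries):
        print(problem)
    print("%d entries checked" % len(entries))
```

The checks are deliberately shallow (duplicate ports per host, one primary and one mirror per content ID, and the pair on different hosts). They do not verify that the directories exist on the new hosts.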

 

3、Redistributing Tables

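//Next, run gpexpand with the input file. Note that this run only initializes the new segments; the tables themselves are redistributed by the later gpexpand -d run.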

[gpadmin1@hadoop1 conf]$ gpexpand -i gpexpand_inputfile_20101029_142210 -D aligputf8
20101029:14:23:01:gpexpand:hadoop1:gpadmin1-[INFO]:-Querying gpexpand schema for current expansion state
20101029:14:23:02:gpexpand:hadoop1:gpadmin1-[INFO]:-Readying Greenplum Database for a new expansion
20101029:14:23:14:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database aligputf8 for unalterable tables...
20101029:14:23:14:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database postgres for unalterable tables...
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database template1 for unalterable tables...
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database aligputf8 for tables with unique indexes...
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database postgres for tables with unique indexes...
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-Checking database template1 for tables with unique indexes...
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-Creating segment template
20101029:14:23:15:gpexpand:hadoop1:gpadmin1-[INFO]:-VACUUM FULL on the catalog tables
20101029:14:23:19:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting copy of segment dbid 1 to locaiton /home/gpadmin1/gpmaster/gpexpand_10292010_15322
20101029:14:23:30:gpexpand:hadoop1:gpadmin1-[INFO]:-Copying postgresql.conf from existing segment into template
20101029:14:23:31:gpexpand:hadoop1:gpadmin1-[INFO]:-Copying pg_hba.conf from existing segment into template
20101029:14:23:31:gpexpand:hadoop1:gpadmin1-[INFO]:-Adding new segments into template pg_hba.conf
20101029:14:23:31:gpexpand:hadoop1:gpadmin1-[INFO]:-Creating schema tar file
20101029:14:23:32:gpexpand:hadoop1:gpadmin1-[INFO]:-Distributing template tar file to new hosts
20101029:14:23:35:gpexpand:hadoop1:gpadmin1-[INFO]:-Configuring new segments (primary)
20101029:14:23:36:gpexpand:hadoop1:gpadmin1-[INFO]:-Configuring new segments (mirror)
20101029:14:23:38:gpexpand:hadoop1:gpadmin1-[INFO]:-Backing up pg_hba.conf file on original segments
20101029:14:23:38:gpexpand:hadoop1:gpadmin1-[INFO]:-Copying new pg_hba.conf file to original segments
20101029:14:23:38:gpexpand:hadoop1:gpadmin1-[INFO]:-Configuring original segments
20101029:14:23:38:gpexpand:hadoop1:gpadmin1-[INFO]:-Cleaning up temporary template files
20101029:14:23:47:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting Greenplum Database in restricted mode
20101029:14:23:58:gpexpand:hadoop1:gpadmin1-[INFO]:-Stopping database
20101029:14:24:09:gpexpand:hadoop1:gpadmin1-[INFO]:-Configuring new segment filespaces
20101029:14:24:09:gpexpand:hadoop1:gpadmin1-[INFO]:-Cleaning up databases in new segments.
20101029:14:24:09:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting master in utility mode
20101029:14:24:10:gpexpand:hadoop1:gpadmin1-[INFO]:-Stopping master in utility mode
20101029:14:24:16:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting Greenplum Database in restricted mode
20101029:14:24:25:gpexpand:hadoop1:gpadmin1-[INFO]:-Creating expansion schema
20101029:14:24:30:gpexpand:hadoop1:gpadmin1-[INFO]:-Populating gpexpand.status_detail with data from database aligputf8
20101029:14:24:31:gpexpand:hadoop1:gpadmin1-[INFO]:-Populating gpexpand.status_detail with data from database postgres
20101029:14:24:33:gpexpand:hadoop1:gpadmin1-[INFO]:-Populating gpexpand.status_detail with data from database template1
20101029:14:24:35:gpexpand:hadoop1:gpadmin1-[INFO]:-Stopping Greenplum Database
20101029:14:24:48:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting Greenplum Database
20101029:14:24:56:gpexpand:hadoop1:gpadmin1-[INFO]:-Starting new mirror segment synchronization
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-************************************************
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-Initialization of the system expansion complete.
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-To begin table expansion onto the new segments
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-rerun gpexpand
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-************************************************
20101029:14:25:09:gpexpand:hadoop1:gpadmin1-[INFO]:-Exiting...

 

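//We can now check the system configuration: the new segments have been added successfully, and the only step left is to redistribute the tables. At this stage GP changes all tables to DISTRIBUTED RANDOMLY; once the tables have been redistributed, they get their original distribution policy back.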

[gpadmin1@hadoop1 conf]$ psql
psql (8.2.14)
Type "help" for help.

aligputf8=# select * from gp_segment_configuration;
 dbid | content | role | preferred_role | mode | status | port  | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
    1 |      -1 | p    | p              | s    | u      |  5342 | hadoop1  | hadoop1 |                  |
    2 |       0 | p    | p              | s    | u      | 30000 | hadoop1  | hadoop1 |            10000 |
    4 |       2 | p    | p              | s    | u      | 30000 | hadoop2  | hadoop2 |            10000 |
    6 |       4 | p    | p              | s    | u      | 30000 | hadoop3  | hadoop3 |            10000 |
    3 |       1 | p    | p              | s    | u      | 30001 | hadoop1  | hadoop1 |            10001 |
    5 |       3 | p    | p              | s    | u      | 30001 | hadoop2  | hadoop2 |            10001 |
    7 |       5 | p    | p              | s    | u      | 30001 | hadoop3  | hadoop3 |            10001 |
    8 |       0 | m    | m              | s    | u      | 40000 | hadoop2  | hadoop2 |            20000 |
    9 |       1 | m    | m              | s    | u      | 40001 | hadoop2  | hadoop2 |            20001 |
   10 |       2 | m    | m              | s    | u      | 40000 | hadoop3  | hadoop3 |            20000 |
   11 |       3 | m    | m              | s    | u      | 40001 | hadoop3  | hadoop3 |            20001 |
   12 |       4 | m    | m              | s    | u      | 40000 | hadoop1  | hadoop1 |            20000 |
   13 |       5 | m    | m              | s    | u      | 40001 | hadoop1  | hadoop1 |            20001 |
   14 |      -1 | m    | m              | s    | u      |  5342 | hadoop2  | hadoop2 |                  |
   17 |       8 | p    | p              | s    | u      | 30000 | hadoop8  | hadoop8 |            10000 |
   21 |       8 | m    | m              | s    | u      | 40000 | hadoop7  | hadoop7 |            20000 |
   18 |       9 | p    | p              | s    | u      | 30001 | hadoop8  | hadoop8 |            10001 |
   22 |       9 | m    | m              | s    | u      | 40001 | hadoop7  | hadoop7 |            20001 |
   15 |       6 | p    | p              | s    | u      | 30000 | hadoop7  | hadoop7 |            10000 |
   19 |       6 | m    | m              | s    | u      | 40000 | hadoop8  | hadoop8 |            20000 |
   16 |       7 | p    | p              | s    | u      | 30001 | hadoop7  | hadoop7 |            10001 |
   20 |       7 | m    | m              | s    | u      | 40001 | hadoop8  | hadoop8 |            20001 |
(22 rows)
[gpadmin1@hadoop1 conf]$ gpstate -c
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-Starting gpstate with args: -c
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.0.1.0 build 1'
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-Obtaining Segment details from master...
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:--Current GPDB mirror list and status
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:--Type = Group
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Status                             Data State     Primary   Datadir                          Port    Mirror    Datadir                          Port
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop1   /home/gpadmin1/gpdatap1/aligp0   30000   hadoop2   /home/gpadmin1/gpdatam1/aligp0   40000
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop1   /home/gpadmin1/gpdatap2/aligp1   30001   hadoop2   /home/gpadmin1/gpdatam2/aligp1   40001
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop2   /home/gpadmin1/gpdatap1/aligp2   30000   hadoop3   /home/gpadmin1/gpdatam1/aligp2   40000
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop2   /home/gpadmin1/gpdatap2/aligp3   30001   hadoop3   /home/gpadmin1/gpdatam2/aligp3   40001
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop3   /home/gpadmin1/gpdatap1/aligp4   30000   hadoop1   /home/gpadmin1/gpdatam1/aligp4   40000
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop3   /home/gpadmin1/gpdatap2/aligp5   30001   hadoop1   /home/gpadmin1/gpdatam2/aligp5   40001
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop7   /home/gpadmin1/gpdatap1/aligp6   30000   hadoop8   /home/gpadmin1/gpdatam1/aligp6   40000
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop7   /home/gpadmin1/gpdatap2/aligp7   30001   hadoop8   /home/gpadmin1/gpdatam2/aligp7   40001
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop8   /home/gpadmin1/gpdatap1/aligp8   30000   hadoop7   /home/gpadmin1/gpdatam1/aligp8   40000
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:-   Primary Active, Mirror Available   Synchronized   hadoop8   /home/gpadmin1/gpdatap2/aligp9   30001   hadoop7   /home/gpadmin1/gpdatam2/aligp9   40001
20101029:14:59:21:gpstate:hadoop1:gpadmin1-[INFO]:--------------------------------------------------------------
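The gp_segment_configuration listing can also be checked with a short script rather than by eye. This is a rough sketch that assumes the rows were exported in unaligned, tuples-only form (for example with `psql -At`), so each row is one pipe-separated line. The column positions follow the header shown in the psql output, and the helper names are my own.

```python
from collections import Counter, defaultdict


def parse_rows(text):
    """Return (dbid, content, role, hostname) tuples from pipe-separated rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = [c.strip() for c in line.split("|")]
        # Columns: dbid, content, role, preferred_role, mode, status,
        # port, hostname, address, replication_port, san_mounts
        rows.append((int(cols[0]), int(cols[1]), cols[2], cols[7]))
    return rows


def primaries_per_host(rows):
    """Count primary segments per host, ignoring the master (content -1)."""
    return Counter(host for _, content, role, host in rows
                   if role == "p" and content >= 0)


def badly_mirrored(rows):
    """Return content IDs lacking a primary/mirror pair on different hosts."""
    placement = defaultdict(dict)
    for _, content, role, host in rows:
        if content >= 0:
            placement[content][role] = host
    bad = []
    for content in sorted(placement):
        pair = placement[content]
        if "p" not in pair or "m" not in pair or pair["p"] == pair["m"]:
            bad.append(content)
    return bad


ROWS = """\
1|-1|p|p|s|u|5342|hadoop1|hadoop1||
2|0|p|p|s|u|30000|hadoop1|hadoop1|10000|
4|2|p|p|s|u|30000|hadoop2|hadoop2|10000|
6|4|p|p|s|u|30000|hadoop3|hadoop3|10000|
3|1|p|p|s|u|30001|hadoop1|hadoop1|10001|
5|3|p|p|s|u|30001|hadoop2|hadoop2|10001|
7|5|p|p|s|u|30001|hadoop3|hadoop3|10001|
8|0|m|m|s|u|40000|hadoop2|hadoop2|20000|
9|1|m|m|s|u|40001|hadoop2|hadoop2|20001|
10|2|m|m|s|u|40000|hadoop3|hadoop3|20000|
11|3|m|m|s|u|40001|hadoop3|hadoop3|20001|
12|4|m|m|s|u|40000|hadoop1|hadoop1|20000|
13|5|m|m|s|u|40001|hadoop1|hadoop1|20001|
14|-1|m|m|s|u|5342|hadoop2|hadoop2||
17|8|p|p|s|u|30000|hadoop8|hadoop8|10000|
21|8|m|m|s|u|40000|hadoop7|hadoop7|20000|
18|9|p|p|s|u|30001|hadoop8|hadoop8|10001|
22|9|m|m|s|u|40001|hadoop7|hadoop7|20001|
15|6|p|p|s|u|30000|hadoop7|hadoop7|10000|
19|6|m|m|s|u|40000|hadoop8|hadoop8|20000|
16|7|p|p|s|u|30001|hadoop7|hadoop7|10001|
20|7|m|m|s|u|40001|hadoop8|hadoop8|20001|
"""

if __name__ == "__main__":
    rows = parse_rows(ROWS)
    for host, count in sorted(primaries_per_host(rows).items()):
        print("%s: %d primaries" % (host, count))
    print("badly mirrored content IDs:", badly_mirrored(rows))
```

An even count of primaries on every host and an empty list of badly mirrored content IDs is what one would hope to see after a grouped-mirroring expansion like this one.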

//Run the redistribution (-d limits the session to the given hh:mm:ss duration)

[gpadmin1@hadoop1 conf]$ gpexpand -d 00:30:00
20101029:15:08:54:gpexpand:hadoop1:gpadmin1-[INFO]:-Querying gpexpand schema for current expansion state
20101029:15:09:00:gpexpand:hadoop1:gpadmin1-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20101029:15:09:00:gpexpand:hadoop1:gpadmin1-[INFO]:-Exiting...

//A special schema is created during the process

aligputf8=# select oid,* from pg_namespace;
  oid  |      nspname       | nspowner |               nspacl               
-------+--------------------+----------+-------------------------------------
  8001 | gp_toolkit         |       10 | {gpadmin1=UC/gpadmin1,=U/gpadmin1}
    99 | pg_toast           |       10 |
  3012 | pg_bitmapindex     |       10 |
 16987 | gpexpand           |       10 |
  6104 | pg_aoseg           |       10 |
    11 | pg_catalog         |       10 | {gpadmin1=UC/gpadmin1,=U/gpadmin1}
  2200 | public             |       10 | {gpadmin1=UC/gpadmin1,=UC/gpadmin1}
 10673 | information_schema |       10 | {gpadmin1=UC/gpadmin1,=U/gpadmin1}
(8 rows)

aligputf8=# select relname from pg_class where relnamespace=16987;
      relname      
--------------------
 status
 status_detail
 expansion_progress
(3 rows)

 

//Below are some notes on gpexpand from the documentation

About the Expansion Schema
At initialization time, gpexpand creates an expansion schema. If you do not specify a particular database at initialization time (gpexpand -D), the schema is created in the database indicated by the PGDATABASE environment variable.
The expansion schema stores metadata for each table in the system so that its status can be tracked throughout the expansion process. It consists of two tables and a view for tracking the progress of an expansion operation:
•gpexpand.status
•gpexpand.status_detail
•gpexpand.expansion_progress
You can control aspects of the expansion process by modifying gpexpand.status_detail. For example, removing a record from this table prevents the table from being expanded across new segments. By updating the rank value for a record, you can control the order in which tables are processed for redistribution.
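As an illustration of the last point, here is a hedged sketch that only builds the SQL text for adjusting gpexpand.status_detail; it does not connect to a database. Two things are assumptions from my recollection of the Admin Guide and should be verified with `\d gpexpand.status_detail` first: that the table-name column is called fq_name, and that lower rank values are redistributed first. The table names are hypothetical.

```python
def quote_literal(value):
    """Quote a string as a SQL literal by doubling single quotes.

    Simplified on purpose: backslashes are not handled.
    """
    return "'" + value.replace("'", "''") + "'"


def rank_statements(ordered_tables):
    """Build SQL so the given tables are redistributed first, in order.

    Assumes lower rank values are processed first and that the column
    holding the schema-qualified table name is called fq_name.
    """
    default_rank = len(ordered_tables) + 1
    stmts = ["UPDATE gpexpand.status_detail SET rank = %d;" % default_rank]
    for position, name in enumerate(ordered_tables, start=1):
        stmts.append(
            "UPDATE gpexpand.status_detail SET rank = %d WHERE fq_name = %s;"
            % (position, quote_literal(name))
        )
    return stmts


def skip_statement(table):
    """Build SQL that removes a table's record so it is not expanded."""
    return ("DELETE FROM gpexpand.status_detail WHERE fq_name = %s;"
            % quote_literal(table))


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for stmt in rank_statements(["public.orders", "public.lineitem"]):
        print(stmt)
    print(skip_statement("public.scratch"))
```

The generated statements would be run in the database that holds the gpexpand schema, after initialization and before starting redistribution.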

 

4、Removing the Expansion Schema

//Remove the gpexpand schema so that the next gpexpand operation can start cleanly

[gpadmin1@hadoop1 conf]$ gpexpand -c
20101029:15:32:49:gpexpand:hadoop1:gpadmin1-[INFO]:-Querying gpexpand schema for current expansion state

Do you want to dump the gpexpand.status_detail table to file? Yy|Nn (default=Y):
> n
20101029:15:34:20:gpexpand:hadoop1:gpadmin1-[INFO]:-Removing gpexpand schema
20101029:15:34:37:gpexpand:hadoop1:gpadmin1-[INFO]:-Cleanup Finished.  exiting...              
       
[gpadmin1@hadoop1 conf]$        

 
