
Automating partitioning, sharding and failover with MongoDB


Source: http://blog.boxedice.com/2010/08/03/automating-partitioning-sharding-and-failover-with-mongodb/

 

Two of the most eagerly anticipated features for MongoDB, the database backend we use for our server monitoring service, Server Density, are auto sharding and replica sets. Sharding will allow us to let MongoDB handle distribution of data across any number of nodes to maximise use of disk space and dynamically load balance queries. Replica sets provide automated failover and redundancy, so you can be sure your data exists on any number of servers across multiple data centres.

This functionality has been in development for some time and it is finally entering stable in the upcoming v1.6.0 release, due out in a few days. This post will take you through the basics of setting up a MongoDB cluster using auto sharding and ensuring you have full failover using replica sets.

Starting up the replica set

You can have any number of members in a replica set and your data will exist in full on each member of the set. This allows you to have servers distributed across data centres and geographies to ensure full redundancy. One server is always the primary, to which reads and writes are sent, with the other members being secondaries that accept reads only. In the event of the primary failing, another member will take over automatically.

The video embedded in the original post shows the setup process.

You need a minimum of 2 members in each set and they must both be started before the set becomes available. We will start the first one on server1A now:

./mongod --rest --shardsvr --replSet set1/server1B
  • --rest
    This enables the admin web UI, which is useful for viewing the status of the set. It is publicly accessible via HTTP on port 28017, so ensure it is properly firewalled.
  • --shardsvr
    This enables sharding on this instance, which will be configured later.
  • --replSet
    This uses the form setname/serverList. You must give each set a name (“set1”) and specify at least 1 other member for the set. You do not need to specify them all – if the instance gets terminated, it will re-read the config from the specified servers when it comes back up.

When in production you will want to use the --fork
and --logpath
parameters so that mongod spawns off into a separate process and
continues running when you close your console. They’re not used here so
we can see the console output. Further tips about running MongoDB in the
real world can be found here
.
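For example, a production start line for the same member might look something like the sketch below. The log path is purely illustrative (pick whatever suits your environment); note that --fork expects a log file to be given via --logpath.

# log path is just an example; --fork needs --logpath so mongod can detach
./mongod --rest --shardsvr --replSet set1/server1B --fork --logpath /var/log/mongodb/mongod.log --logappend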

The naming convention we are using for the server hostname is server[set][server], so this is server A in set 1 that will be connecting to server B in set 1. This just makes it a little easier to explain, but in real usage these will need to be actual hostnames that resolve correctly.

If you are running the instances on different ports, you must specify the ports as part of the parameters e.g. --replSet set1/server1B:1234,server1C:1234

You will see mongod
start up with the following output to the console:

Sun Aug  1 04:27:15 [initandlisten] waiting for connections on port 27017
Sun Aug  1 04:27:15 [initandlisten] ******
Sun Aug  1 04:27:15 [initandlisten] creating replication oplog of size: 944MB... (use --oplogSize to change)
Sun Aug  1 04:27:15 allocating new datafile data/local.ns, filling with zeroes...
Sun Aug  1 04:27:15 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug  1 04:27:15 done allocating datafile data/local.ns, size: 16MB,  took 0.036 secs
Sun Aug  1 04:27:15 allocating new datafile data/local.0, filling with zeroes...
Sun Aug  1 04:27:15 done allocating datafile data/local.0, size: 64MB,  took 0.163 secs
Sun Aug  1 04:27:15 allocating new datafile data/local.1, filling with zeroes...
Sun Aug  1 04:27:16 done allocating datafile data/local.1, size: 128MB,  took 0.377 secs
Sun Aug  1 04:27:16 allocating new datafile data/local.2, filling with zeroes...
Sun Aug  1 04:27:19 done allocating datafile data/local.2, size: 1024MB,  took 3.019 secs
Sun Aug  1 04:27:25 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug  1 04:27:35 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)
Sun Aug  1 04:27:43 [initandlisten] ******
Sun Aug  1 04:27:43 [websvr] web admin interface listening on port 28017
Sun Aug  1 04:27:45 [initandlisten] connection accepted from 127.0.0.1:43135 #1
Sun Aug  1 04:27:45 [startReplSets] replSet can't get local.system.replset config from self or any seed (yet)

Next, we start the second member of the set (server1B). This will be connecting to server1A, the instance we just set up.

./mongod --rest --shardsvr --replSet set1/server1A

You will see similar console output on both servers, something like:

Sun Aug  1 04:27:53 [websvr] web admin interface listening on port 28017
Sun Aug  1 04:27:55 [initandlisten] connection accepted from server1A:38289 #1
Sun Aug  1 04:27:56 [initandlisten] connection accepted from 127.0.0.1:48610 #2
Sun Aug  1 04:27:56 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Sun Aug  1 04:27:56 [startReplSets] replSet   have you ran replSetInitiate yet?

Initialising the replica set

Now that the two instances are communicating, you need to initialise the replica set. This only needs to be done on one of the servers (either one; it doesn’t matter), so from the MongoDB console on that server:

./mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> cfg = {
... _id : "set1",
... members : [
... { _id : 0, host : "server1A:27017" },
... { _id : 1, host : "server1B:27017" }
... ] }
> rs.initiate(cfg)
{
    "info" : "Config now saved locally.  Should come online in about a minute.",
    "ok" : 1
}

Here I created the config object and specified the members manually.
This is because I wanted to specify the ports but you can just execute

rs.initiate()

and the current server, plus any members you specified in the command-line parameters when starting mongod, will be added automatically. You may also want to specify some extra options, all of which are documented here.

Perhaps the most important of these extra options is priority.
Setting this for each host will allow you to determine in which order
they become primary during failover. This is useful if you have 3
servers, 2 in the same data centre and 1 outside for disaster recovery.
You might want the second server in the same DC to become primary first;
setting its priority higher than the outside server allows this.

> cfg = {
_id : "set1",
members : [
{ _id : 0, host : "server1A:27017", priority : 2},
{ _id : 1, host : "server1B:27017", priority : 1},
{ _id : 2, host : "server1C:27017", priority : 0.5}
] }

The web console enabled by the --rest parameter can be accessed at the standard port 28017, e.g. http://example.com:28017. This shows the live status of the mongod instance.
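If you would rather check from a terminal than a browser, you can fetch the same status page over plain HTTP, for example with curl (the hostname here is just an example):

# hostname is just an example; use whichever member you want to inspect
curl http://server1A:28017/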

MongoDB REST UI - Replica Set Status

Adding another server to the set

Adding a new server (server1C) to the replica set is really easy. Start the instance up specifying any one of the other members (or all of them) as part of the parameters:

./mongod --rest --shardsvr --replSet set1/server1A

Then, on the primary server, connect to the MongoDB console and execute the add command:

./mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.add("server1C:27017")
{ "ok" : 1 }

This server will then become part of the set and will immediately start syncing with the other members.
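You can watch the new member come online from the mongo console of any member with rs.status(), which lists every member and its current state; the new server will typically show as recovering until its initial sync completes. A minimal check:

> rs.status()   // lists every member and its current state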

Setting up sharding

Now that we have our 3-member replica set, we can configure sharding. This has 3 parts:

  • Shard servers – the mongod instances. We have already set these up with the --shardsvr parameter when starting each mongod.
  • Config servers – these are mongod instances run with the --configsvr parameter that store the metadata for the shard. As per the documentation, “a production shard cluster will have three config server processes, each existing on a separate machine. Writes to config servers use a two-phase commit to ensure an atomic and replicated transaction of the shard cluster’s metadata.”
  • mongos – processes that your clients connect to, which route queries to the appropriate shards. They are self contained and will usually be run on each of your application servers.

Sharding Diagram

The video embedded in the original post shows the setup process.

Config servers

Having already set up the shard servers above, the next step is to set up the config servers. We need 3 of these, which will exist on each of our shard servers but can be on their own, lower-spec machines if you wish. They will not require high-spec servers as they will have relatively low load, but they should be positioned redundantly so a machine or data centre failure will not take them all offline.

./mongod --configsvr --dbpath config/
  • --configsvr
    This enables config server mode on this mongod instance.
  • --dbpath
    Since I am running this config server on a server that already has another mongod instance running, a separate data path is specified. This isn’t necessary if the config server is running on its own.

This is executed on each of our 3 servers already running the shards. The console output will be something like this:

Sun Aug  1 08:14:30 db version v1.5.7, pdfile version 4.5
Sun Aug  1 08:14:30 git version: 5b667e49b1c88f201cdd3912b3d1d1c1098a25b4
Sun Aug  1 08:14:30 sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Sun Aug  1 08:14:30 [initandlisten] diagLogging = 1
Sun Aug  1 08:14:30 [initandlisten] waiting for connections on port 27019
Sun Aug  1 08:14:30 [websvr] web admin interface listening on port 28019

Nothing else happens until you connect the mongos to the config servers.

Router processes

The router processes are what you connect your clients to. They download the metadata from the config servers and then route queries to the correct shard servers. They stay up to date and are independent of each other, so they require no redundancy per se. Specify each config server in comma-separated form:

./mongos --configdb server1A,server1B,server1C
Sun Aug  1 08:19:44 mongodb-linux-x86_64-1.5.7/bin/mongos db version v1.5.7, pdfile version 4.5 starting (--help for usage)
Sun Aug  1 08:19:44 git version: 5b667e49b1c88f201cdd3912b3d1d1c1098a25b4
Sun Aug  1 08:19:44 sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Sun Aug  1 08:19:44 SyncClusterConnection connecting to [server1A:27019]
Sun Aug  1 08:19:44 SyncClusterConnection connecting to [server1B:27019]
Sun Aug  1 08:19:44 SyncClusterConnection connecting to [server1C:27019]
Sun Aug  1 08:19:54 [websvr] web admin interface listening on port 28017
Sun Aug  1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1A:27019]
Sun Aug  1 08:19:54 SyncClusterConnection connecting to [server1A:27019]
Sun Aug  1 08:19:54 waiting for connections on port 27017
Sun Aug  1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1B:27019]
Sun Aug  1 08:19:54 SyncClusterConnection connecting to [server1B:27019]
Sun Aug  1 08:19:54 [Balancer] SyncClusterConnection connecting to [server1C:27019]
Sun Aug  1 08:19:54 SyncClusterConnection connecting to [server1C:27019]

Create the shard

Now that you have all the mongo server instances running, you need to create the shard. Connect to the mongos instance using the MongoDB console, switch to the admin database and then add the shard.

./mongo
MongoDB shell version: 1.5.7
connecting to: test
> use admin
switched to db admin
> db.runCommand( { addshard : "set1/server1A,server1B,server1C", name : "shard1" } );
{ "shardAdded" : "shard1", "ok" : 1 }
> db.runCommand( { listshards : 1 } );
{
    "shards" : [
        {
            "_id" : "shard1",
            "host" : "server1A,server1B,server1C"
        }
    ],
    "ok" : 1
}

Note that the list of server hostnames includes the replica set name, in the form [setName]/[servers]. The set name is what you gave the replica set via --replSet when you started the mongod instances above. In our case we called it set1.

There are a number of config options here, including the ability to set a maximum size on the shard so you can control disk space usage.
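For example, assuming the maxSize option (specified in MB) described in the sharding documentation, you could cap this shard at roughly 100GB when adding it; a sketch only, with an illustrative value:

> db.runCommand( { addshard : "set1/server1A,server1B,server1C", name : "shard1", maxSize : 100000 } ); // maxSize in MB, illustrative value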

Shard the database

We can now finally use the shard by enabling sharding on a database and executing a couple of test queries. Here we will use the test database, via the MongoDB console connected to the mongos instance:

> use admin
switched to db admin
> db.runCommand( { enablesharding : "test" } );
{ "ok" : 1 }
> use test
switched to db test
> db.hats.insert({hats: 5})
> db.hats.find()
{ "_id" : ObjectId("4c5568021fd8e7e6a0636729"), "hats" : 5 }

You can confirm this is working from the console output on each of the shards themselves:

Sun Aug  1 08:26:42 [conn6] CMD fsync:  sync:1 lock:0
Sun Aug  1 08:26:42 [initandlisten] connection accepted from 10.255.62.79:38953 #7
Sun Aug  1 08:26:42 allocating new datafile data/test.ns, filling with zeroes...
Sun Aug  1 08:26:42 done allocating datafile data/test.ns, size: 16MB,  took 0.046 secs
Sun Aug  1 08:26:42 allocating new datafile data/test.0, filling with zeroes...
Sun Aug  1 08:26:42 done allocating datafile data/test.0, size: 64MB,  took 0.183 secs
Sun Aug  1 08:26:42 [conn6] building new index on { _id: 1 } for test.hats
Sun Aug  1 08:26:42 [conn6] Buildindex test.hats idxNo:0 { name: "_id_", ns: "test.hats", key: { _id: 1 } }
Sun Aug  1 08:26:42 [conn6] done for 0 records 0secs
Sun Aug  1 08:26:42 [conn6] insert test.hats 248ms
Sun Aug  1 08:26:42 [conn6] fsync from getlasterror
Sun Aug  1 08:26:42 allocating new datafile data/test.1, filling with zeroes...
Sun Aug  1 08:26:42 done allocating datafile data/test.1, size: 128MB,  took 0.402 secs

Sharding the collection

The database test is sharded now, but the documents will only exist on a single shard. To actually use the automated partitioning of data, you need to shard at the collection level. For example, setting the shard key on a timestamp will cause MongoDB to partition the data across shards based on that timestamp, e.g. day 1 on shard 1, day 2 on shard 2, etc.
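As a rough sketch of that timestamp case (the events collection and its ts field are hypothetical, not part of this walkthrough), the command would look something like:

> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.events", key : { ts : 1 } } ) // events/ts are hypothetical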

In our example above, we only have 1 shard and the test document is
very simple but we could create a shard key on the number of hats:

> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.hats.hats", key : { hats : 1 } } )
{ "collectionsharded" : "test.hats.hats", "ok" : 1 }

In the mongos console you will see:

Mon Aug  2 22:10:55 [conn1] CMD: shardcollection: { shardcollection: "test.hats.hats", key: { hats: 1.0 } }
Mon Aug  2 22:10:55 [conn1] enable sharding on: test.hats.hats with shard key: { hats: 1.0 }
Mon Aug  2 22:10:55 [conn1] no chunks for:test.hats.hats so creating first: ns:test.hats.hats at: shard1:set1/server1A,server1B,server1C lastmod: 1|0 min: { hats: MinKey } max: { hats: MaxKey }

The default chunk size is 50MB so data will not start to be distributed to multiple shards until you hit that.
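If you want to see chunks split and migrate sooner while testing, mongos accepts a --chunkSize parameter in MB; a sketch, assuming the flag is available in your build:

# --chunkSize is in MB; 1MB is only sensible for testing
./mongos --configdb server1A,server1B,server1C --chunkSize 1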

Notes on failover

  • Terminating a mongod instance, either a config server or a shard server, will have no effect on the availability of the data and the ability to perform all operations on it. A detailed explanation of what happens when certain servers fail can be found here.
  • However, in our case, if 2 out of the 3 members of a replica set fail, the set will become read only even though there is a server remaining online (source).
  • As such, a 3-member replica set with 2 members in one data centre and 1 member in another has a point of failure if you want the set to remain fully operational in the event of a DC outage – the 1 server on its own. To protect against this you would need to have 4 members – 2 per data centre.
  • Multiple replica sets per shard are not supported (source).
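Given the read-only behaviour described above, it is worth knowing at any moment which member is acting as primary. A minimal check from any member’s mongo console (the exact fields returned may vary by version):

> db.isMaster()   // reports whether this member is primary and which host currently is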

Future expansion

Now that this is set up, adding additional shards is as easy as provisioning a new replica set and using the addShard command on that new set. The data will be balanced automatically, so you have real horizontal scaling.
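As a sketch, assuming a hypothetical second replica set named set2 running on server2A, server2B and server2C, adding it from the mongos console would look much like the first shard:

> use admin
switched to db admin
> db.runCommand( { addshard : "set2/server2A,server2B,server2C", name : "shard2" } ); // set2/server2* are hypothetical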

We have not yet deployed sharding and replica sets into production
with Server Density – this is on our roadmap so I’ll be reporting back
in a couple of months when we have been using it for some time. Subscribe
to stay up to date!
