Saturday, March 10, 2007

OpenPoker how it works

Writing low-pain massively scalable multiplayer servers

Another oldie. First penned for DevMaster over a year ago. Original article comes with a heated forum discussion!

I have since exited the poker business and removed related articles and links from this site. The Erlang source code is still available, though. I’ll dissect it in detail in a series of articles over the next few weeks.

Let me know if you have any trouble viewing this article. Use resizing page controls above to expand the page if you need to.

Introduction

This article describes an alternative approach to building massively scalable online multiplayer systems using my OpenPoker project as an example. OpenPoker is a massively multiplayer poker server with fault-tolerance, load balancing and unlimited scalability built-in. The source code to OpenPoker is available from my site under the GPL and comes in under 10,000 lines of code of which about 1/3 are dedicated to testing.

I prototyped extensively before coming up with the final version of OpenPoker and tried Delphi, Python, C#, C/C++ and Scheme. I also wrote a full-blown poker engine in Common Lisp. While I did spend over 9 months on research and prototyping, the final rewrite only took about 6 weeks of coding. I attribute most of the time and cost savings to choosing Erlang as my platform.

By comparison, it took a team of 4-5 people about 9 months to build the old OpenPoker. The original team also built a Windows poker client but even if I cut development time in half to account for this 1.5 month, it is far from 18 months that I will end up with. In today's world of bloated game development budgets such savings are nothing to sneeze at!

What is Erlang

I suggest you browse through the Erlang FAQ before continuing but I'll give you a quick summary here…

Erlang is a functional, dynamically typed language with built-in support for concurrency. It was specifically designed by Ericsson for telecommunications applications such as controlling a switch or converting protocols, and thus is particularly suitable for building distributed, soft real-time concurrent systems.

Applications written in Erlang are often composed of hundreds or thousands of lightweight processes communicating via message passing. Context switching between Erlang processes is typically one or two orders of magnitude cheaper than switching between threads in a C program.

It's easy to write distributed applications in Erlang because its distribution mechanisms are transparent: programs need not be aware that they are distributed.

The Erlang runtime environment is a virtual machine (VM), much like the Java virtual machine. This means that code compiled on one architecture runs anywhere. The runtime system also allows code in a running system to be updated without interrupting the program and the byte code can be compiled to native code when you need that extra boost.

Please head to the Erlang site and check out the excellent resources in the Getting started, Documentation and Examples sections.

Why Erlang

The concurrency model built into Erlang makes it particularly suitable for writing online multiplayer servers.

A massively scalable multiplayer backend in Erlang is built as a "cluster" with different "nodes" dedicated to different tasks. An Erlang node is an instance of the Erlang VM and you can run multiple Erlang nodes/VMs on your desktop, laptop or server. One node per CPU is recommended.

Erlang nodes track all other nodes connected to them. All you need to do to add a new node to the cluster is point it to an existing node. As soon as the two nodes establish contact all other nodes in the cluster become aware of the new node.

Erlang processes send messages to other processes using a process id which encodes information about the node where the process is running. Processes need not be aware of where other processes are located to communicate with them. A bunch of Erlang nodes linked together can be viewed as a grid or supercomputing facility.

Players, NPCs and other entities in massively multiplayer games are best modelled as concurrently running processes but concurrency is notoriously hard to work with. Erlang makes concurrency easy.

Erlang's bit syntax∞ makes it trivial to work with binary data and bests the structure packing/unpacking facilities of Perl and Python. This makes Erlang particularly suitable for handling binary network protocols.

The OpenPoker architecture

Everything in OpenPoker is a process. Players, bots, games, pots, etc. are all processes. For every poker client connected to OpenPoker there's a player "proxy" handling network messages. Depending on whether the player is logged in, some messages are ignored while others are passed to the process handling card game logic.

The card game process is an uber-state machine composed of state machine modules for every stage of the game. This lets me treat card game logic as a Lego constructor and add new card games by putting together the state machine building blocks. Take a look at the start function in cardgame.erl if you want to learn more about my approach.

The card game state machine lets different messages through depending on the game stage. It also uses a separate game process to handle the machinery common to all games such as keeping track of players, pots, limits and so on. When simulating 27,000 poker games on my laptop I found that I had about 136,000 players and close to 800,000 processes in total.

That said, I would like to focus on how Erlang makes it simple to implement scalability, fault tolerance and load balancing using OpenPoker as an example. My approach is not particular to poker or card games. The same approach can be used to quickly put together massively scalable multiplayer backends, do it cheaply and with a minimum amount of pain.

Scalability

I implement scalability and load-balancing by means of a multi-tier architecture. The first tier is represented by gateway nodes. Game server nodes form tier two and Mnesia "master" nodes can be thought of as the third tier.

Mnesia is the Erlang real-time distributed database. The Mnesia FAQ has a good explanation but Mnesia is basically a fast, replicating, in-memory database. There are no objects in Erlang but Mnesia can be thought of as object-oriented as it can store any Erlang data.

There are two types of Mnesia nodes: those that write to disk and those that do not. Regardless of this, all Mnesia nodes keep their data in memory. Mnesia master nodes in OpenPoker are nodes that write to disk. Gateways and game servers pick up their database from Mnesia masters upon startup and are memory-only nodes.

There's a handy set of command-line arguments that you can give to the Erlang VM and interpreter when starting up to tell Mnesia where the master database is located. After the new local Mnesia node establishes contact with the master Mnesia node, the new node becomes part of the master node’s cluster.

Assuming that the master nodes are located on hosts apple and orange, adding a new gateway, game server, etc. node to your OpenPoker cluster is as simple as

erl -mnesia extra_db_nodes \['db@apple','db@orange'\] -s mnesia start

where

-s mnesia start

is equivalent to starting Mnesia from the erlang shell like this

erl -mnesia extra_db_nodes \['db@apple','db@orange'\]
Erlang (BEAM) emulator version 5.4.8 [source] [hipe] [threads:0]

Eshell V5.4.8 (abort with ^G)
1> mnesia:start().
ok

OpenPoker keeps configuration information in Mnesia tables and this information is automatically downloaded by new nodes as soon as Mnesia starts. Zero configuration required!

Fault tolerance

OpenPoker lets me grow as high as I want by adding cheap Linux boxes to my server farm. Put together a couple of racks of 1U servers and you can easily handle 500,000 or even 1,000,000 players online. This would work just as well for a MMORPG as for poker.

I can dedicate some boxes to run gateway nodes and some to be database masters that write database transactions to disk. I can dedicate the rest of my boxes to run my game servers. I can limit game servers to accept a maximum of, say, 5000 simultaneous players so that no more than 5000 players are affected when my game server box crashes.

It's important to note that no information is lost when a game server crashes since all the Mnesia database transactions are replicated in real-time to all other nodes running Mnesia, game server nodes included.

In case of errors some assistance from the game client is required for the player to smoothly reconnect to the OpenPoker cluster. As soon as the poker client notices a network error it should connect to the gateway, receive a new game server address in a hand-off packet and reconnect to the new game server. What happens then is a little tricky as different types of reconnect scenarios need to be handled.

OpenPoker will handle the following reconnect scenarios:

  1. The game server crashed
  2. The client crashed or timed out due to a network error
  3. The player is online on a different connection
  4. The player is online on a different connection and is in a game

The most common scenario will probably be a poker client that disconnected due to a network error. A less likely but still possible scenario is a client reconnecting from one computer while already playing at another.

Each OpenPoker game buffers packets sent to players and every reconnecting poker client will first receive all the game packets since the game started before starting to receiving packets as usual. OpenPoker uses TCP connections so I don't need to worry about packet ordering – packets will simply arrive in proper order.

Every poker client connection is represented by two OpenPoker processes: the socket process and the actual player process. A visitor process with restricted functionality is used until the player logs in. Visitors cannot join games, for example. The socket process will be dead after a poker client disconnects while the player process will still be alive.

A player process can notice a dead socket when attempting to forward a game packet and should put itself into auto-play mode or fold the hand. The login code will check for the combination of a dead socket and live player process when reconnecting. The code to determine the condition looks like this:

login({atomic, [Player]}, [_Nick, Pass|_] = Args)
when is_record(Player, player) ->
Player1 = Player#player {
socket = fix_pid(Player#player.socket),
pid = fix_pid(Player#player.pid)
},
Condition = check_player(Player1, [Pass],
[
fun is_account_disabled/2,
fun is_bad_password/2,
fun is_player_busy/2,
fun is_player_online/2,
fun is_client_down/2,
fun is_offline/2
]),
...

whereas the conditions themselves will be determined by the following code:

is_player_busy(Player, _) ->
{Online, _} = is_player_online(Player, []),
Playing = Player#player.game /= none,
{Online and Playing, player_busy}.

is_player_online(Player, _) ->
SocketAlive = Player#player.socket /= none,
PlayerAlive = Player#player.pid /= none,
{SocketAlive and PlayerAlive, player_online}.

is_client_down(Player, _) ->
SocketDown = Player#player.socket == none,
PlayerAlive = Player#player.pid /= none,
{SocketDown and PlayerAlive, client_down}.

is_offline(Player, _) ->
SocketDown = Player#player.socket == none,
PlayerDown = Player#player.pid == none,
{SocketDown and PlayerDown, player_offline}.

Notice that the first thing the login function does is to fix up dead process ids. This makes processing simple down the road and is accomplished with the following bits of code:

fix_pid(Pid)
when is_pid(Pid) ->
case util:is_process_alive(Pid) of
true ->
Pid;
_ ->
none
end;

fix_pid(Pid) ->
Pid.

and

-module(util).

-export([isprocessalive/1]).



isprocessalive(Pid)
when is_pid(Pid) ->
rpc:call(node(Pid), erlang, isprocessalive, [Pid]).


A process id in Erlang includes the id of the node where the process is running. is_pid(Pid) tells me if its argument is a process id (pid) but cannot tell me if the process is alive or dead. Erlang’s built-in erlang:is_process_alive(Pid) tells me whether a local process (running on the same node) is dead or alive. There's no variant of is_process_alive for checking remote nodes.

Fortunately, I can use the Erlang rpc facility together with node(pid) to call is_process_alive() on the remote node. In fact, this will work just as well on the local node so the code above functions as a universal distributed process checker.

All that is left to do is to act on the various login conditions. In the simplest case where the player is offline I start a player process, connect the player to the socket and update the player record.

login(Player, player_offline, [Nick, _, Socket]) ->
{ok, Pid} = player:start(Nick),
OID = gen_server:call(Pid, 'ID'),
gen_server:cast(Pid, {'SOCKET', Socket}),
Player1 = Player#player {
oid = OID,
pid = Pid,
socket = Socket
},
{Player1, {ok, Pid}}.

Should the player login information not match I can return an error and increase the number of bad login attempts. If this number exceeds a predefined maximum I disable the account like this:

login(Player, bad_password, _) ->
N = Player#player.login_errors + 1,
{atomic, MaxLoginErrors} =
db:get(clusterconfig, 0, maxlogin_errors),
if
N > MaxLoginErrors ->
Player1 = Player#player {
disabled = true
},
{Player1, {error, ?ERRACCOUNTDISABLED}};
true ->
Player1 = Player#player {
login_errors = N
},
{Player1, {error, ?ERRBADLOGIN}}
end;

login(Player, account_disabled, _) ->
{Player, {error, ?ERRACCOUNTDISABLED}};


Logging out the player involves finding the player process id using their Object ID (which is just a number), stopping the player process and updating the player record in the database. This is accomplished by the following bit of code:

logout(OID) ->
case db:find(player, OID) of
{atomic, [Player]} ->
player:stop(Player#player.pid),
{atomic, ok} = db:set(player, OID,
[{pid, none},
{socket, none}]);
_ ->
oops
end.

With logout out of the way I can address the various reconnect conditions. If the player is online but idle, i.e. hanging out in the lobby or watching a game (drinking a Bud? Wazzup!) and is reconnecting from a different computer, I can just log them out and log them back in as if they were offline:

login(Player, player_online, Args) ->
logout(Player#player.oid),
login(Player, player_offline, Args);

If the player was idle when their poker client disconnected then all I need to do is replace the socket process id in the player record and tell the player process about the new socket.

login(Player, clientdown, [, _, Socket]) ->
gen_server:cast(Player#player.pid, {'SOCKET', Socket}),
Player1 = Player#player {
socket = Socket
},
{Player1, {ok, Player#player.pid}};

If the player was in a game then we run the code above and tell the game to resend the event history.

login(Player, player_busy, Args) ->
Temp = login(Player, client_down, Args),
cardgame:cast(Player#player.game,
{'RESEND UPDATES', Player#player.pid}),
Temp;

Overall, a combination of a real-time replicating database, a poker client that knows to reconnect to a different game server and some crafty login code allows me to provide a high degree of fault tolerance transparently to the player.

Load balancing

I can build my OpenPoker cluster from as many game server nodes as I want . I might want to allocate, say, 5000 players per game server and spread the load among the active game servers in my cluster. I can add new game servers at any time and they will automatically make themselves available to accept new players.

Gateway nodes spread the player load among the active game servers in the OpenPoker cluster. The job of a gateway node is to pick a random game server, ask it for the number of players connected and its address, host and port number where the game server is running. As soon as the gateway finds a game server where the number of players connected is less than the preset maximum it will return the address of that game server to the connected poker client and close the connection.

There's absolutely no load on gateway nodes and connections to them are extremely short-lived. You can have any cheap box acting as your gateway node.

Nodes should generally come at least in pairs so that if one node fails another one can take over. You would need a mechanism like Round-robin DNS to employ more than a single gateway node.

How do gateways learn about game servers?

OpenPoker uses the Erlang Distributed Named Process Groups facility to group game servers. The group is globally visible on all nodes, this happens automatically. New game servers join the game server group and when a game server node goes down it's automatically deleted.

This is what the code to find a game server with a maximum capacity of MaxPlayers looks like:

find_server(MaxPlayers) ->
case pg2:getclosestpid(?GAME_SERVERS) of
Pid when is_pid(Pid) ->
{Time, {Host, Port}} = timer:tc(gen_server, call, [Pid, 'WHERE']),
Count = gen_server:call(Pid, 'USER COUNT'),
if
Count <>
io:format("~s:~w: ~w players~n", [Host, Port, Count]),
{Host, Port};
true ->
io:format("~s:~w is full...~n", [Host, Port]),
find_server(MaxPlayers)
end;
Any ->
Any
end.

pg2:get_closest_pid() returns a random game server process id since a gateway node is not expected to run any game servers. If a process id of the game server is returned I ask the game server for its address (host and port) as well the number of players connected. So long as the number of players connected is less than the maximum I return the game server address to the caller, otherwise I keep looking.

Multiple-outlet powerstrip middleware

OpenPoker is open source software and I have been pitching it to various poker vendors recently. All the vendors have the same problem with scalability and fault tolerance, even after several years of development. Some have recently finished major rewrites of their server software while others are just embarking on this journey. All of the vendors are heavily invested in their Java infrastructure and, understandably, do not want to switch to Erlang.

Still, it sounds to me like there is a need to be filled. The more I think about it the more it looks like Erlang can still be used to provide a cost-efficient solution while keeping things simple and straightforward. I see this solution as a multiple-outlet electrical power strip, just like the one you are probably using right now.

You write your game server as a simple socket-based server that uses a database backend. In fact, more likely than not this is how your game server is written now. Your game server is the standard electrical plug and multiple instances of your game server are plugged into my power outlets while players flow in through the other end.

You supply the game servers and I provide you with scalability, load balancing, and fault tolerance. I keep players connected to the power strip and monitor your game servers, restarting them as needed. I switch your players to another game server when one goes down and you can plug in as many game servers as you like into my power outlets.

The power strip middleware is a black box sitting between your players and your servers and likely won't even require any changes to your code. You will get all the benefits of a highly scalable, load-balanced, fault-tolerant solution while keeping your investment and modifying little of your existing infrastructure.

You can write this middleware in Erlang today, run it on a Linux box with a kernel specially tuned for a high number of TCP connections and put this box in a demilitarized zone while keeping your game servers behind a firewall. Even if you don't, I suggest that you take a close look at Erlang today and think about using it to simplify your massively multiplayer server architectures. And I will be here to help!

No comments: