El Farol with local imitation on a grid

Project for "Controversies in Game Theory" by Lukas Münzel, Pascal Gamma, and Nandor Kovacs

Introduction

Inspired by the paper on cooperation and defection on a grid that was presented in class [1], we decided to study El Farol with players on a grid who all go to the same bar but only exchange information locally.

The broad approach of studying El Farol with players imitating the strategies of their neighbors is admittedly not novel [2-6]. However, this existing literature does not seem to have discouraged previous authors from studying their favored setup. Given that the specifics of our setup were developed entirely by us before we were aware of any similar studies, it seems unlikely that the exact setup we study has been published before.

Furthermore, all the existing literature we stumbled upon in three hours of literature search does in fact differ from our setup in some way. For example, in Kalinowski et al. [2] the players know the previous decisions of their neighbors, whereas in our setup they are only aware of the previous attendance rates; Slanina [3] uses imitator and leader roles, with imitation being costly; and Chen et al. [6] play on a random graph instead of a grid. We therefore feel that our work is not entirely derivative and hope it will be of interest to the reader.

Preliminaries

Nash equilibria of El Farol

A simple strategy is to go to the bar with probability \(0.6\) and stay at home with probability \(0.4\). If all players use this strategy, none can improve their payoff by changing to a different strategy [7]. Therefore, this strategy is a symmetric (non-cooperative) Nash equilibrium.

Furthermore, if \(60\%\) of the players always go while the rest always stay at home, no single player can increase their expected payoff by deviating, so this is a Nash equilibrium as well. Such Nash equilibria are called corner solutions [7].

Optimal strategy given fixed opponents

Nearly all strategies we handcraft for El Farol estimate, in one way or another, the anticipated occupancy of the bar and go to the bar if and only if this expected occupancy is less than \(60\%\). In expectation, these estimates equal the mean occupancy. This is also the case for most strategies proposed in the original El Farol paper [8] by Brian Arthur.

But is this actually optimal? Suppose we are dropped into an environment in which all other players employ a strategy that is potentially random but does not depend on previous occupancy rates. So let \(X\) be the random variable representing the fraction of other players attending. Then it is in fact, in general, not optimal for the “anticipated occupancy” to be the mean of \(X\).

Indeed, the expected return of a strategy is given by

\[\begin{align*} P(\text{going},\text{bar is not full}) + P(\text{not going},\text{bar is full}) &=P(\text{going})P(\text{bar is not full}\;|\;\text{going}) + P(\text{not going})P(\text{bar is full}\;|\;\text{not going}) \\ &\approx P(\text{going})P(X < 0.6) + P(\text{not going})P(X > 0.6) \end{align*}\]

Note that this is only approximate because there is some small chance that our decision to go is actually what shifts the bar from not being full to being full.

This approximation yields another result of interest to us: when \(P(X > 0.6) = 0.5\), the payoff above reduces to \(0.5 \cdot P(\text{going}) + 0.5 \cdot P(\text{not going}) = 0.5\), so all strategies give essentially identical payoff in expectation. Thus, if, say, all other players are playing the Nash equilibrium, all possible strategies for the remaining player are equivalent.
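As a sanity check, here is a minimal Monte Carlo sketch of this indifference claim (illustrative names and parameters, not part of our simulation code): the other players attend i.i.d. with probability \(0.6\), so \(P(X > 0.6) \approx 0.5\), and every attendance probability \(q\) of the remaining player should yield roughly the same expected payoff of about \(0.5\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_others, n_trials = 999, 100_000

# fraction X of the other players attending in each trial
x = rng.binomial(n_others, 0.6, size=n_trials) / n_others

for q in [0.0, 0.3, 0.6, 1.0]:
    go = rng.random(n_trials) < q
    # success: going while the bar is not full, or staying while it is full
    success = np.where(go, x < 0.6, x > 0.6)
    print(f"q = {q:.1f}: expected payoff ~ {success.mean():.3f}")
```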

Simulations

Setup

We simulate multiple iterations of the El Farol bar game on a two-dimensional grid of side length \(n\) with \(n^2\) players and one bar. Every iteration consists of five rounds of El Farol, in each of which every player decides whether or not to go to the bar based on the attendance rates in previous rounds. As in the original paper [8], we consider the bar crowded if more than \(60\%\) of the players choose to attend. We use a symmetric payoff: a player’s action is considered a success if they go to the bar when it is not crowded or stay at home when it is crowded. Each player receives a payoff of \(1\) for each successful round and \(0\) for each unsuccessful one.

At the end of each iteration, each player may choose to adopt the strategy of one of their neighbors based on its performance over the last five rounds.
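For illustration, here is a minimal sketch of one iteration under these rules (structure and names are ours, not the exact project code); each strategy is a function mapping the attendance history to a go/stay decision.

```python
import numpy as np

N = 100          # grid side length, N^2 players
ROUNDS = 5       # rounds of El Farol per iteration
THRESHOLD = 0.6  # the bar is crowded above 60% attendance

def play_iteration(strategies, history, rng):
    """Play one iteration; return per-player scores over its ROUNDS rounds."""
    scores = np.zeros((N, N))
    for _ in range(ROUNDS):
        # each player decides based on their current strategy and the history
        go = np.array([[strategies[i][j](history, rng) for j in range(N)]
                       for i in range(N)])
        attendance = go.mean()
        crowded = attendance > THRESHOLD
        # symmetric payoff: 1 for attending a non-crowded bar
        # or for staying home when it is crowded, 0 otherwise
        scores += ~go if crowded else go
        history.append(attendance)
    return scores
```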

Handcrafted strategies with discrete updates

Methods

First, each player chooses uniformly at random from a list of strategies similar to those proposed in the original paper [8]. At the end of each iteration, each player may adopt a strategy from their Von Neumann neighbourhood.

More specifically, we play on a grid of size \(100 \times 100\) with up to \(16\) distinct strategies. In each round, every player decides, based on their current strategy, whether to attend the bar or stay at home. The attendance rate is computed and the players are rewarded according to the symmetric payoff described above. At the end of each iteration, each player’s strategy is then updated by the following procedure:

Let \(P_0\) be the current player and \(P_1, \dots, P_4\) their neighbors, with scores \(S_0, \dots, S_4\) respectively. With probability \(R\), which we call the retention rate, the player retains their current strategy. Otherwise, they adopt the strategy of player \(P_i\) with probability \(\frac{\exp(S_i/T)}{\sum_{j=0}^4 \exp(S_j/T)}\), where \(T > 0\) is called the temperature. Notice that as \(T\) approaches \(0\), this softmax converges to the argmax, and as \(T\) approaches infinity, it converges to the uniform distribution. By a slight abuse of notation, we therefore denote the argmax by \(T=0\).
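In code, the update for a single player might look as follows (a sketch with our naming; `scores[0]` is the player’s own score and `scores[1:]` those of the four neighbors):

```python
import numpy as np

def pick_strategy_index(scores, R, T, rng):
    """Return the index (0 = self, 1..4 = neighbors) of the strategy to adopt."""
    if rng.random() < R:          # retention: keep the current strategy
        return 0
    if T == 0.0:                  # T = 0 denotes the argmax by our convention
        return int(np.argmax(scores))
    logits = np.asarray(scores, dtype=float) / T
    logits -= logits.max()        # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()          # softmax weights exp(S_i / T)
    return int(rng.choice(len(scores), p=probs))
```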

Results and Discussion

Observe first that if we set both the retention rate \(R\) and the temperature \(T\) to zero, the game is deterministic and entirely determined by the initial configuration: each player always adopts the best strategy in their neighborhood.

We randomly initialize the grid with two simple strategies: always go and always stay at home. In the first iteration, roughly half of the players go to the bar, so the bar is not crowded. Thus, the always go strategy receives the higher payoff and, consequently, every player with at least one neighbor using it adopts it. There are then too many players attending the bar in the next iteration, making it crowded. However, because the players only look at their immediate neighborhood when adapting their strategies, some always stay home players remain, namely those that happened to be completely surrounded by others with the same strategy. Here we see the advantage of only considering other strategies in a small neighborhood: if we considered the best strategies in a larger radius, the always stay home strategy would die out completely and all players would be worse off.

In our setup, we instead observe pulsating behavior: the bar is overcrowded in one iteration and under-attended in the next because too many players adopt what would have been the best strategy in the previous iteration. This overshooting is the reason why we do not observe convergence to the corner solution.

[Figure: attendance and strategy distribution over time]

Two simple strategies are enough for the attendance rate to stay around 60%. (T=0.0, R=0.0)

When we introduce more strategies, an interesting phenomenon emerges: the random strategy dominates and eventually all players adopt it. This appears to happen because the random strategy is the only one that is robust to its own popularity: it performs the same regardless of how many players use it. Most other strategies, on the other hand, tend to suffer from their own success: when many players follow the same deterministic rule, the bar is consistently occupied far above or far below its \(60\%\) capacity.

[Figure: attendance and strategy distribution over time]

The random strategy outperforms all the others. (T=0.0, R=0.0)

Removing the random strategy results in an equilibrium in which multiple strategies coexist. Interestingly, the specific set of surviving strategies is not fixed and depends on the initial configuration. As long as the strategies are sufficiently uncorrelated, they can coexist with the same kind of pulsating behavior as seen in the simple case above. However, the dynamics are a lot more complex because of the number of strategies at play. In fact, we occasionally see tipping points where the ratio of the strategies suddenly changes significantly from one iteration to the next. This can be explained by the dependence of many strategies on the average attendance rate of all previous rounds: this number changes continuously over time, but as soon as it crosses a certain threshold (e.g., \(60\%\) for the simple case of the full history average strategy), the behavior of the agents, and consequently their performance, changes immediately.

[Figure: attendance and strategy distribution over time]

Multiple strategies coexist when the random strategy is not present. (T=0.0, R=0.0)

As the retention rate and temperature are increased, players no longer immediately adopt the best strategy in their neighborhood. With \(R=0.9\), a player updates their strategy only every \(10\) iterations on average. The higher temperature means players sometimes also adopt a strategy that did not strictly perform best. This leads to less overshooting, which is why we observe far less fluctuation both in the attendance rates and in the frequency with which each strategy is played.

[Figure: attendance and strategy distribution over time]

Smoother convergence with higher retention rate and temperature. (T=1.0, R=0.9, no random strategy)

Even though the behaviour is much more stable with \(T=1.0\) and \(R=0.9\), the random strategy still dominates when it is present. That this happens only after roughly \(1800\) iterations instead of the previous roughly \(250\) can be attributed to the decelerating effect of a higher retention rate. This convergence to the symmetric Nash equilibrium appears to be a remarkably robust phenomenon: even small variations in the parameters or minor modifications to the update mechanism consistently led to this behavior over multiple runs.

[Figure: attendance and strategy distribution over time]

The random strategy still dominates. (T=1.0, R=0.9)

Random strategies with “gradient descent” updates

Methods

We now investigate continuously parametrized strategies that update via “gradients” instead of outright copying successful strategies. More concretely, at the beginning of each iteration, each player \(k\) has a parameter \(p_k \in [0,1]\). In each round of that iteration, the player goes to the bar with probability \(p_k\), sampled independently across rounds and players.

After all rounds of the iteration have been played, the players count and compare their successes (a success being defined as going to the bar when it is less than \(60\%\) full or, vice versa, not going when it is). We choose the player \(l\) from the four players in the Von Neumann neighbourhood of \(k\) with the highest number of successes (or any maximal neighbor should this not be unique). Then we update \(p_k\) by increasing it by some \(\delta > 0\) if \(p_l > p_k\) or, if \(p_l < p_k\), decreasing it by \(\delta\).

In our tests for this section, we also add periodic boundary conditions, mapping the grid to a torus. So, for example, the top left cell has the bottom left and the top right cells as neighbours. This has the advantage that if \(p_1, \dots, p_{n^2}\) are initially sampled i.i.d. from some distribution \(\mathcal D\) on \([0,1]\), they remain identically distributed even after the updates described above.
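A vectorized sketch of this update (our structure, not the exact project code; `np.roll` implements the periodic boundary, and clipping to \([0,1]\) is our addition to keep the probabilities valid):

```python
import numpy as np

def gradient_step(p, scores, delta):
    """Move each p_k by delta towards the p of its best-scoring
    Von Neumann neighbor on the torus. p and scores are (n, n) arrays."""
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # down, up, right, left
    neigh_scores = np.stack([np.roll(scores, s, axis=(0, 1)) for s in shifts])
    neigh_p = np.stack([np.roll(p, s, axis=(0, 1)) for s in shifts])
    best = np.argmax(neigh_scores, axis=0)        # ties: first maximal neighbor
    p_best = np.take_along_axis(neigh_p, best[None], axis=0)[0]
    # increase p_k by delta if p_l > p_k, decrease it if p_l < p_k
    return np.clip(p + delta * np.sign(p_best - p), 0.0, 1.0)
```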

We also compare against a rule where every player takes a step of size roughly \(\delta\) towards what the optimal strategy would be if all other players kept their strategies fixed. Concretely, we estimate the median attendance given the current distribution of the \(p_k\) and increase \(p_k\) by a \(\delta'\) chosen uniformly between \(0\) and \(2\delta\) if the median is smaller than \(0.6\) (and decrease it otherwise). The reason we don’t simply move all players by the same \(\delta\) is that this would just shift the entire distribution without ever decreasing its variance.
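A sketch of this global rule (our naming; the Monte Carlo estimate of the median attendance is one possible implementation):

```python
import numpy as np

def global_step(p, delta, rng, n_samples=200):
    """Nudge every p_k in the globally optimal direction by a noisy step."""
    # estimate the median attendance under the current p's via Monte Carlo
    attendance = (rng.random((n_samples,) + p.shape) < p).mean(axis=(1, 2))
    direction = 1.0 if np.median(attendance) < 0.6 else -1.0
    # delta' ~ Uniform(0, 2*delta) keeps some spread in the population
    step = rng.uniform(0.0, 2.0 * delta, size=p.shape)
    return np.clip(p + direction * step, 0.0, 1.0)
```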

Results and Discussion

We first take \(\mathcal D\) to be the distribution assigning probability \(0.5\) each to \(0.25\) and \(0.95\). With local updates (i.e., updating towards the best strategy in the Von Neumann neighborhood) and a grid side length of \(128\), we get the following result:

[Video: time evolution of the distribution of the \(p_k\)]

We observe a combination of diffusion and a form of convergence towards the Nash equilibrium given by \(p=0.6\).

We also include a plot of the entropy of the distribution of the \(p_k\) over time for different values of \(\delta\):

Evolution of entropy for different values of \(\delta\)

One can observe an initial increase in entropy as the probability mass diffuses away from \(0.25\) and \(0.95\), followed by a decrease as it concentrates more and more around \(0.6\). Also note that, as one might expect, this transition occurs faster the higher the “learning rate” \(\delta\) is.
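For reference, the entropy curves can be computed as the Shannon entropy of a histogram of the \(p_k\) (the binning below is our choice):

```python
import numpy as np

def distribution_entropy(p, bins=50):
    """Shannon entropy (in nats) of the binned distribution of the p_k."""
    hist, _ = np.histogram(p, bins=bins, range=(0.0, 1.0))
    q = hist / hist.sum()
    q = q[q > 0]              # by convention 0 * log(0) = 0
    return float(-np.sum(q * np.log(q)))
```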

We see some form of convergence to the Nash equilibrium when doing local updates, but no convergence at all when doing the globally optimal updates:

Time evolution of the distribution of \(p\), initially taking only the two values described above.
Local updates on the left, global updates on the right

Time evolution of the distribution of \(p\), initially uniformly distributed.
Local updates on the left, global updates on the right

In fact, this makes sense. When doing globally optimal updates with some noise, the distribution shifts until the median occupation hovers around \(60\%\). After that, there is no further drift in the distribution of the \(p_k\), because all strategies are then equivalent, as elaborated in a previous paragraph.

When doing local updates, on the other hand, there is noise in determining which neighbor has the best strategy, so players sometimes move towards neighbors that are not actually best. This biases each \(p_k\) towards the mean of its neighbors, and thus biases the time evolution of the distribution towards its mean, which, as we verify below, hovers around \(0.6\).

We also quickly verify that the mean \(p\) does in fact eventually hover around \(0.6\), even when the initial distribution is uniform:


Average attendance over time with local and global update rule

Again, this makes sense: when, for example, the median attendance is below \(0.6\), we get a net increase of the mean \(p\), since neighbours with higher \(p\) perform better on average. In the global case, the picture is even clearer, because we increase all \(p_k\) whenever the median attendance is below \(0.6\).

Conclusion

The simulations with the handcrafted strategies and discrete updates demonstrate the advantage of local over global information exchange and updates. Restricting players to only observe strategies in their immediate neighborhood helps to preserve a more diverse set of strategies, which in turn helps to keep the attendance rate closer to \(60\%\).

Most notably, we observe that the random strategy dominates across the vast majority of parameter settings. Unlike deterministic strategies, which tend to perform worse as they become more widely adopted, the random strategy performs consistently regardless of its popularity. It is therefore able to spread to the whole population, resulting in our setup consistently converging to the symmetric Nash equilibrium.

Interestingly, we observe the same behaviour when analyzing the continuously parametrized strategies with gradient updates: trying to make decisions based on global information results in unstable, non-convergent behaviour, while updating only based on what seems optimal locally results in convergence to the Nash equilibrium.

Appendix

Explanation of the handcrafted strategies

Let \(b_i\) be the attendance ratio \(i\) rounds ago.

Always Go: Always goes to the bar.
Never Go: Never goes to the bar.
Predict from yesterday: Goes to the bar if less than \(60\%\) were there in the last round.
Predict from day before yesterday: Goes to the bar if less than \(60\%\) were there two rounds ago.
Random: Goes if a uniformly random number in \([0, 1]\) is less than \(0.6\). Note that this is the same as \(\operatorname{Ber}(0.6)\).
Moving Average (\(m\)): Goes if the average attendance of the last \(m\) rounds was less than \(60\%\).
Full History Average: Goes if the average attendance over all previous rounds was less than \(60\%\).
Even History Average: Goes if the average attendance of the even rounds was less than \(60\%\).
Complex Formula: Goes if \(\frac{1}{2} \left[ \sqrt{\frac{1}{2}(b_1^2 + b_2^2)} + b_3 \right]\) is less than \(60\%\).
Drunkard: Goes if the average attendance was less than \(65\%\), because he really likes to go to the bar.
Stupid Nerd: Goes if the average attendance was less than \(55\%\), because he does not really like to go to the bar.
Generalized Mean: Goes if \(\left( \frac{1}{m} \sum_{i=1}^m b_i^r \right)^{1/r}\) is less than \(60\%\). Notice that for \(r=1\) this is the arithmetic mean, for \(r=2\) the root mean square, and for \(r=-1\) the harmonic mean.

In the initial rounds, when strategies require historical attendance data that is not yet available, players temporarily fall back to the random strategy.
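For concreteness, here are a few of these strategies written as predicates over the attendance history (a sketch with our naming; `history[-i]` corresponds to \(b_i\) above):

```python
import numpy as np

def random_strategy(history, rng):
    return rng.random() < 0.6                 # Ber(0.6)

def predict_from_yesterday(history, rng):
    if not history:                           # no data yet: fall back to random
        return random_strategy(history, rng)
    return history[-1] < 0.6                  # b_1 < 60%

def generalized_mean(history, rng, m=5, r=2.0):
    if len(history) < m:                      # not enough data yet
        return random_strategy(history, rng)
    b = np.asarray(history[-m:])
    return float(np.mean(b ** r)) ** (1.0 / r) < 0.6
```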

Academic integrity statement on responsible use of AI:

  1. We extensively used code generation tools such as Cursor and Claude Code. All code used to generate the results in this report was proofread by us to a sufficient degree that we feel comfortable being held accountable for any bugs that may be present in it.
  2. Not a single word of this report was copy-pasted from generative AI output. Such tools were used to get feedback, brainstorm, find grammatical or orthographic mistakes, and suggest better formulations, but never to copy a phrase verbatim.

Signed,

Lukas F. Münzel, Pascal E. Gamma, Nandor Kovacs

Small remark on time-evolution of distribution

One can observe that we do not appear to get honest-to-goodness convergence to the Nash equilibrium in the videos showing the time evolution of the distribution of the \(p_k\). Rather, we seem to get stuck once a relatively narrow peak develops. We attribute this to our taking discrete, relatively large steps (\(\delta \approx 0.003\)), which at some point prevent us from getting any closer to \(p=0.6\).

Bibliography

  1. Helbing, D., Szolnoki, A., Perc, M., & Szabó, G. (2010). Evolutionary Establishment of Moral and Double Moral Standards through Spatial Interactions.

  2. Kalinowski, T., Schulz, H.-J., & Briese, M. (2000). Cooperation in the Minority Game with local information.

  3. Slanina, F. (2000). Social organization in the Minority Game model.

  4. Burgos, E., Ceva, H., & Perazzo, R. P. J. (2002). The evolutionary minority game with local coordination.

  5. Shang, L., & Wang, X. (2006). A modified evolutionary minority game with local imitation.

  6. Chen, J., & Quan, H. (2009). Effect of imitation in evolutionary minority game on small-world networks.

  7. Franke, R. (2003). Reinforcement learning in the El Farol model.

  8. Arthur, W. B. (1994). Inductive Reasoning and Bounded Rationality.
