Participants
The study complied with all relevant ethical regulations. The study protocol was approved by the Institute of Neuroscience and Psychology Ethics Committee at the University of Glasgow, and written informed consent was obtained in accordance with its guidelines. Twenty-seven same-sex pairs of adult human participants took part in the fMRI experiment. This number was determined based on a priori estimates of the sample size necessary to ensure replicability on a task of similar length97. All were recruited from the participant database of the Department of Psychology at the University of Glasgow. For each pair, one participant was in the scanner and the other in an adjacent room. Two pairs were removed from the analysis: one for excessive head movements inside the scanner, the other for a technical problem with the scanner. The remaining 25 pairs of participants (7 male pairs, 18 female pairs) were all right-handed, had normal or corrected-to-normal vision, reported no history of psychiatric, neurological or major medical problems, and were free of psychoactive medications at the time of the study.
All participants played the Space Dilemma in pairs. Before starting the game they were given a set of instructions explaining that they had to imagine foraging for food in a territory (a straight line representing the territory; Fig. 1) and that they had to predict the position of the food. They were told that in each trial the target food would appear somewhere in the territory, its position randomly sampled from a predefined uniform probability distribution. They were shown examples of possible outcomes of a trial (Fig. 1) and were given information about the conditions of the game. During the game, in each trial, they were presented with a bar moving across the space (representing their location) and asked to commit to a location by pressing a button as the bar passed through it. Participants therefore chose their location in the space through the timing of a button press, indicating their choice by pressing one of three buttons on a response box. The bar took 4 s to move from one end of the space to the other. Once stopped, it remained at the chosen location for the remainder of the 4 s. This location signalled their prediction about the target position. The two participants played simultaneously, first making their predictions and then watching the other player's response (for 11.5 s). After both players had responded, the target was shown (for 1.5 s). Inter-trial intervals were 22.5 s long. In each trial, the participant who made the best prediction (minimising the distance d to the target) was indicated as the trial's winner through the colour of the target, obtaining a reward that depended on the distance to the target: the shorter the distance, the higher the reward. In the rare circumstance where players were equidistant from the target, the reward was split in half between the two players, who were both winners in the trial.
In order to enforce different social contexts, we introduced a reward distribution rule whereby each trial's reward would be shared between the winner and the loser according to the rule
$$R_{win}=\alpha R\,;\quad R_{lose}=\left(1-\alpha\right)R$$
(2)
where α is a trade-off factor controlling the redistribution between winners and losers in each trial. By redistributing the reward between winner and loser, the latter also benefits from the co-player minimising their distance to the target. Increasing the amount of redistribution (decreasing α below 1) constitutes an incentive to work out a cooperative strategy that decreases the average distance of the winner from the target (that is, irrespective of who the winner is) and therefore increases the reward available in each trial, which is then redistributed. Decreasing the amount of redistribution (increasing α above 1) can instead lead to punishment for the losers, adding an incentive to compete to win the trial.
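As a concrete illustration of Eq. (2), the short sketch below (not part of the original task code; the trial reward value is arbitrary) shows how a trial's reward is split between winner and loser for the three values of α used in the study.

```matlab
% Sketch of the redistribution rule in Eq. (2); the trial reward is arbitrary.
Rtrial = 10;                   % hypothetical reward earned on a trial
alphas = [0.5 1 2];            % cooperative, intermediate, competitive contexts
for a = alphas
    Rwin  = a * Rtrial;        % winner's share
    Rlose = (1 - a) * Rtrial;  % loser's share (negative when a > 1)
    fprintf('alpha = %.1f: winner %6.1f, loser %6.1f\n', a, Rwin, Rlose);
end
```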
All participants first took part in a behavioural session where they were randomly paired with one another and played three sessions of the game in three different conditions specified by the value of the trade-off factor α. In the first condition (α=0.5, cooperative condition), the reward was shared equally between the two players, irrespective of the winner. In the second condition (α=2, competitive condition), the winner got twice the amount of the reward, while the other player lost from their initial stock an amount equivalent to the reward. In the third condition (α=1, intermediate condition), the winner got the full amount of the reward and the other player got nothing. Participants were instructed about the different reward distributions (through a panel similar to Fig. 2c). In total, participants played 60 trials in each of the three conditions, for a total of 180 trials.
At the end of the behavioural session, participants were asked to fill in a questionnaire assessing their understanding of the game together with their social value orientation98. If they showed a good understanding of the task and were eligible for fMRI scanning, they were later invited to the fMRI session, which occurred 13 weeks later. In total, 81 participants took part in the behavioural session and 54 participated in the fMRI session.
In the fMRI sessions, participants were matched with an unfamiliar co-player they had not played with in the behavioural session, and it was emphasised that they should not assume anything about their co-player's behaviour in the game. We did not use deception: participants briefly met before the experiment, when a coin toss determined who would go into the scanner and who would play the game in a room adjacent to the fMRI control room. In both the behavioural and fMRI sessions participants were rewarded according to their performance in the game, with a fixed fee of £6 and £8, respectively, plus up to an additional £9 based on task performance. At the end of the fMRI sessions, participants were asked to describe their strategy in the different social contexts. Their responses revealed a good understanding of the social implications of their choices (Supplementary Table 4). In both the behavioural and fMRI sessions, the order of the conditions was kept constant (cooperation-competition-intermediate) as we wanted all couples to have the same history of interactions.
Visual stimuli were generated from client computers using Presentation software (Neurobehavioral Systems) controlled by a common server running the master script in MATLAB. The stimuli were presented to the two players simultaneously. Each experiment was preceded by a short tutorial in which players experienced a few trials of each of the three conditions, allowing them to probe the effect of the variability in the task parameters.
We computed a payoff matrix for the Space Dilemma in the following way. Since the target position in each trial is random, the reward in each trial is also random, but because the target position is sampled from a uniform distribution, each position in the space is associated with an expected payoff that depends on the position of the other player (Fig. 1b). In a two-player game, the midpoint maximises the chance of winning the trial. For simplicity we therefore assume that players either compete, positioning themselves at the midpoint of the space and maximising their chance of winning, or cooperate, deviating from this position by a distance Δ to sample the space and maximise the dyad's reward. For all combinations of competitive and cooperative choices, we can build an expected (average) payoff matrix that depends parametrically on Δ. We define R as the expected reward for each of two players cooperating with each other, T as the expected temptation payoff for someone who competes against a cooperating player, S as the sucker payoff for a cooperator betrayed by their partner, and P as the punishment payoff when both players compete. R, T, S and P can be computed analytically by integrating over all possible positions of the target and are equal to:
$$R=\frac{3}{8}+\frac{\Delta}{2}-\Delta^{2}$$
(3)
$$T=\alpha\left(\frac{3}{8}+\frac{\Delta}{2}-\frac{\Delta^{2}}{8}\right)+\left(1-\alpha\right)\left(\frac{3}{8}-\frac{5\Delta^{2}}{8}\right)$$
(4)
$$S=\alpha\left(\frac{3}{8}-\frac{5\Delta^{2}}{8}\right)+\left(1-\alpha\right)\left(\frac{3}{8}+\frac{\Delta}{2}-\frac{\Delta^{2}}{8}\right)$$
(5)
The expected reward for cooperative players, R, is the same in all conditions. This is because the expected reward is equal to the average of the possible rewards associated with winning and losing, and players who cooperate with equal Δ have an equal chance of winning the trial.
Therefore R = (R_win + R_lose)/2 = (αR_trial + (1−α)R_trial)/2 = R_trial/2, which does not depend on α. The same holds for the expected reward for competitive players, P. When one player cooperates and the other competes, however, the players do not have the same chance of winning a trial, and therefore T and S also depend on α. For α=0.5 the reward is shared equally no matter what players do, so if one competes against a cooperator both have the same expected payoff:
$$T=S=\frac{3}{8}+\frac{\Delta}{4}-\frac{3\Delta^{2}}{8}$$
(7)
For α=2, T diverges quickly from S as
$$T-S=\frac{3}{2}\left(\Delta+\Delta^{2}\right)$$
(8)
We also computed the expected payoffs by simulating 10,000 trials of two players competing and/or cooperating by Δ in the three conditions of the game, and the results matched the analytical solutions. For the intermediate and competitive conditions, for all values of Δ it is also true that T > R > P > S, demonstrating that the Space Dilemma in these conditions is a continuous probabilistic form of Prisoner's Dilemma in the strong sense. For Δ < 0.4, in all conditions the payoff for a dyad always cooperating is higher than for one where one player always competes and the other always cooperates, or where both alternate cooperation and competition (2R > T + S); therefore for Δ < 0.4 the Space Dilemma is a probabilistic form of iterated Prisoner's Dilemma. Furthermore, for all conditions the maximum payoff for the dyad is reached for Δ = 0.25.
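The sketch below illustrates a Monte Carlo check of this kind. It is a minimal re-implementation, not the original analysis script, and it assumes that the trial reward equals 1 minus the winner's distance to the target (an assumption consistent with Eq. (3)), that two cooperators sit at 0.5 ± Δ on opposite sides of the midpoint, and that ties are split equally.

```matlab
% Monte Carlo check of the expected payoffs R, T, S and P (illustrative sketch).
rng(1);
nTrials = 10000;
delta   = 0.25;                          % deviation of a cooperator from the midpoint
alpha   = 2;                             % competitive context (use 0.5, 1 or 2)
t       = rand(nTrials, 1);              % uniformly sampled target positions

% position pairs: both cooperate, player1 competes vs cooperator, both compete
pairs  = {[0.5 - delta, 0.5 + delta], [0.5, 0.5 + delta], [0.5, 0.5]};
labels = {'R (coop-coop)', 'T/S (compete vs coop)', 'P (compete-compete)'};

for k = 1:numel(pairs)
    x  = pairs{k};
    d1 = abs(t - x(1));  d2 = abs(t - x(2));
    Rtrial = 1 - min(d1, d2);            % reward earned on the trial (assumption)
    win1   = d1 < d2;  tie = d1 == d2;
    % payoffs of the two players under the redistribution rule of Eq. (2)
    p1 = alpha * Rtrial .* win1 + (1 - alpha) * Rtrial .* (~win1 & ~tie) ...
         + 0.5 * Rtrial .* tie;
    p2 = alpha * Rtrial .* (~win1 & ~tie) + (1 - alpha) * Rtrial .* win1 ...
         + 0.5 * Rtrial .* tie;
    fprintf('%s: player1 = %.3f, player2 = %.3f\n', labels{k}, mean(p1), mean(p2));
end
```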
To model behaviour in the game we fitted eighteen different models belonging to three different classes, all assuming that players implement some form of tit-for-tat. The first class of models (S1-S4) is based on the assumption that players decide their behaviour simply based on the last observed behaviour of their counterpart, by reciprocating either their last position, their last change in position, or a combination of the two. A second class of models goes further in assuming that a player learns to anticipate the co-player's position in a fashion that is predicted quantitatively by a Bayesian learner (Bayesian models B1-B8). The eight Bayesian models differ in how this expectation is mapped into a choice, allowing for different degrees of influence of the context, the counterpart's behaviour and the player's own bias. A third class of models assumes that participants chose what to do based not only on the other player's behaviour but also on the outcome of each trial, with different assumptions on how winning a trial should change their behaviour in the next (becoming more or less cooperative). This class of models effectively assumed that the player's behaviour would be shaped by the reward collected (Reward models R1-R6 in Fig. 3d).
For simplicity, we remapped positions in the space onto a cooperation space so that choosing the midpoint (competitive position) corresponds to minimum cooperation while going to the extreme ends of the space (either x=0 or x=1) corresponds to maximum cooperation. The cooperation level θ is therefore symmetrical about the midpoint and is defined as
$$\theta=\left|x-0.5\right|/0.5 \qquad (\mathrm{S}1\text{-}\mathrm{S}4,\ \mathrm{B}1\text{-}\mathrm{B}8,\ \mathrm{R}1\text{-}\mathrm{R}6)$$
(9)
All models include a precision parameter capturing intrinsic response variability linked to the sensory-motor precision of the participant, such that, given each model's prediction about the player's decision, the actual choice is normally distributed around that prediction with standard deviation equal to the inverse of the precision parameter, constrained to be in the range (0:10000).
For models S1-S4, we assumed that participants were simply reacting to their counterpart's most recent choice. Model S1 assumes that players attempt to reciprocate their co-player's level of cooperation θ. As the model operates in a symmetrical cooperation space, this implies matching the co-player's expected level of cooperation in the opposite hemifield.
$${choice}\left(t\right) \sim N\left(\theta\left(t-1\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{S}1)$$
(10)
Model S2 assumes that players attempt to reciprocate their co-player's update in their level of cooperation, moving from their own previous position, plus a fixed SocialBias parameter capturing their a priori desired level of cooperation, constrained to be in the range (-1000:1000).
$${choice}\left(t\right) \sim N\left(\mathrm{SocialBias}+{choice}\left(t-1\right)+\Delta\theta(t-1);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{S}2)$$
(11)
Model S3 was identical to model S2 except that it had three different SocialBias parameters, one for each social context. Model S4 assumed that players would reciprocate their co-player's last level of cooperation scaled by a multiplicative TitXTat parameter, constrained to be in the range (0:2). If this parameter is bigger than 1, a participant cooperates more than their counterpart.
$${choice}\left(t\right) \sim N\left(\mathrm{SocialBias}+\mathrm{TitXTat}\cdot\theta\left(t-1\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{S}4)$$
(12)
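For illustration, the sketch below generates one-trial predictions for models S1, S2 and S4 using hypothetical parameter values and previous-trial quantities; it is not the fitting code itself. θ is the co-player's previous cooperation level as defined in Eq. (9).

```matlab
% One-trial predictions of the reactive models S1, S2 and S4 (illustrative values).
theta_prev  = 0.6;     % co-player's cooperation on trial t-1 (Eq. 9)
choice_prev = 0.4;     % player's own previous cooperation level
dtheta_prev = 0.1;     % co-player's last change in cooperation
SocialBias  = 0.05;    % a priori shift towards cooperation (hypothetical)
TitXTat     = 0.9;     % reciprocation gain, constrained to (0:2)
Precision   = 20;      % inverse of the choice standard deviation

mu_S1 = theta_prev;                               % S1: mirror the co-player
mu_S2 = SocialBias + choice_prev + dtheta_prev;   % S2: follow their update
mu_S4 = SocialBias + TitXTat * theta_prev;        % S4: scaled reciprocation

sd = 1 / Precision;                               % choice noise around each prediction
choices = [mu_S1 mu_S2 mu_S4] + sd * randn(1, 3);
fprintf('S1 %.2f  S2 %.2f  S4 %.2f\n', choices);
```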
For models B1-B8, we used a Bayesian decision framework that has been shown to explain well how humans learn in social contexts32,99 to model how participants made decisions in the task and how the social context (reward distribution) can modulate these decisions. Our ideal Bayesian learner was assumed to update its expectation about the co-player's level of cooperation on a trial-by-trial basis by observing the position of the counterpart. In our Bayesian framework, knowledge about θ has two sources: a prior distribution P(θ), based initially on the social context and thereafter on past experience, and a likelihood function P(D|θ), based on the observed position of the counterpart in the last trial. The product of prior and likelihood is the posterior distribution that defines the expectation about the counterpart's position in the next trial:
$$P\left(\theta\left(t+1\right)\right)=P(\theta(t+1)|D)=\frac{P\left(D|\theta\left(t\right)\right)\, P(\theta(t))}{P(D)} \qquad (\mathrm{B}1\text{-}\mathrm{B}8)$$
(13)
According to Bayesian decision theory (Berger, 1985; O'Reilly et al., 2013), the posterior distribution P(θ|D) captures all the information that the participant has about θ. In the first trial of a block, when players have no evidence about past positions of the co-player, we chose normal priors corresponding to the social context: in the competition context μ_prior = 0, in the cooperation context μ_prior = 1, and in the intermediate context, where the winner takes all, μ_prior = 0.5; in all cases the prior standard deviation is fixed to σ_prior = 0.05, which heuristically speeds up the fit. The likelihood function is also assumed to be a normal distribution centred on the observed location of the co-player, with standard deviation fixed to the average variability in the positions observed so far in the block (that is, in all trials up to the one in which θ is estimated). Being the product of two Gaussian distributions, the posterior distribution is also Gaussian. All distributions are computed for all values of θ in the linear space at a resolution of dθ = 0.01.
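A minimal sketch of one such grid-based update is shown below; the prior mean, the observed position and the likelihood standard deviation are illustrative values rather than quantities taken from the data.

```matlab
% Grid-based Bayesian update of the expectation about the co-player's position.
dtheta   = 0.01;                          % grid resolution stated in the text
theta    = 0:dtheta:1;                    % cooperation space
gauss    = @(x, mu, sd) exp(-0.5*((x - mu)/sd).^2) / (sd*sqrt(2*pi));

mu_prior = 0.5;  sd_prior = 0.05;         % intermediate-context prior
obs      = 0.7;                           % co-player position observed on this trial
sd_lik   = 0.1;                           % running sd of observed positions (illustrative)

prior      = gauss(theta, mu_prior, sd_prior);
likelihood = gauss(theta, obs, sd_lik);
posterior  = prior .* likelihood;
posterior  = posterior / (sum(posterior) * dtheta);     % normalisation (the P(D) term)

coplayer_exp_pos = sum(theta .* posterior) * dtheta;    % expectation used in Eq. (14)
fprintf('expected co-player cooperation next trial: %.3f\n', coplayer_exp_pos);
```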
While all Bayesian models assume that players update their expectations about the co-player's choices, they differ in how they translate these expectations into their own choices. We built eight Bayesian models of increasing complexity. All models include a Precision parameter. Model B1 simply assumes that players aim to reciprocate the expected position of the co-player (coplayer_exp_pos).
$${coplayer\_exp\_pos}\,(t)=E\left(P\left(\theta(t)\right)\right) \qquad (\mathrm{B}1\text{-}\mathrm{B}8)$$
(14)
$${choice}\left(t\right) \sim N\left({coplayer\_exp\_pos}\left(t\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}1)$$
(15)
Model B2 assumes that players aim for a level of cooperation shifted relative to coplayer_exp_pos. This shift is captured by the SocialBias parameter, which sets an a priori tendency to be more or less cooperative; all further Bayesian models include it.
$${choice}\left(t\right) \sim N\left({coplayer\_exp\_pos}\left(t\right)+\mathrm{SocialBias};\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}2)$$
(16)
Model B3 further assumes that participants can fluctuate in how much they reciprocate their co-player's cooperation. This effect is modelled by multiplying coplayer_exp_pos by a TitXTat parameter.
$${choice}\left(t\right) \sim N\left(\mathrm{TitXTat}\cdot{coplayer\_exp\_pos}\left(t\right)+\mathrm{SocialBias};\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}3)$$
(17)
Model B4 further assumes that players keep track of the target position, updating their expectations after each trial with a Bayesian update similar to the one used for the co-player's position. They then decide their level of cooperation based on the prediction of model B3 plus a linear term that depends on the expected position of the target, scaled by a TargetBias parameter. As the target was random, we did not expect this model to significantly improve the fit compared to model B3.
$${choice}\left(t\right) \sim N\left(\mathrm{TitXTat}\cdot{coplayer\_exp\_pos}\left(t\right)+\mathrm{SocialBias}+\mathrm{TargetBias}\cdot E\left(P\left(x_{target}\right)\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}4)$$
(18)
Model B5 further assumes that participants modulate how much they are willing to reciprocate their co-player's behaviour based on the social risk associated with the context. In this model the tit-for-tat term takes the form of a multiplicative TitXTat factor
$$\mathrm{TitXTat\ factor}=\frac{1}{1+q\_risk \cdot social\_risk} \qquad (\mathrm{B}5)$$
(19)
$${choice}\left(t\right) \sim N\left(\mathrm{TitXTat\ factor}\cdot{coplayer\_exp\_pos}\left(t\right)+\mathrm{SocialBias}+\mathrm{TargetBias}\cdot E\left(P\left(x_{target}\right)\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}5)$$
(20)
where q_risk is a parameter capturing the sensitivity to the social risk induced by the context, which is proportional to the redistribution parameter α:
$$social\_risk=2\alpha -1 \qquad (\mathrm{B}5\text{-}\mathrm{B}8)$$
(21)
Models B6, B7 and B8 do not include the target term. They all model the TitXTat factor with two parameters, as in
$$\mathrm{TitXTat\ factor}=\frac{\mathrm{TitXTat}}{1+q\_risk \cdot social\_risk} \qquad (\mathrm{B}6\text{-}\mathrm{B}8)$$
(22)
$${choice}\left(t\right) \sim N\left(\mathrm{TitXTat\ factor}\cdot{coplayer\_exp\_pos}\left(t\right);\ 1/\mathrm{Precision}\right) \qquad (\mathrm{B}6\text{-}\mathrm{B}8)$$
(23)
Models B7 and B8 further assume that participants estimate the probability that their co-player will betray their expectations and behave more competitively than expected. This is computed by updating betrayal expectations after each trial in a Bayesian fashion, using the difference between the observed and expected position of the co-player to update a distribution over all possible discrepancies. This produces, for each trial, an expected level of change in the co-player's position. Models B7 and B8 both weigh this expected betrayal with a betrayal-sensitivity parameter and add the betrayal term either to the social risk, increasing it by an amount proportional to the expected betrayal (model B7), or to the choice prediction, shifting it towards competition by an amount proportional to the expected betrayal (model B8). Model B6 does not include any modelling of betrayal.
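A minimal sketch of the choice rule shared by models B6-B8 (Eqs. (21)-(23)) is given below for a single trial; all parameter values and the Bayesian expectation are illustrative, not fitted values.

```matlab
% One-trial choice prediction under the B6 rule (Eqs. 21-23); illustrative values.
alpha            = 2;                    % competitive context
q_risk           = 0.8;                  % sensitivity to social risk (hypothetical)
TitXTat          = 1.1;                  % context-independent reciprocation (hypothetical)
Precision        = 20;
coplayer_exp_pos = 0.35;                 % Bayesian expectation from Eq. (14)

social_risk    = 2 * alpha - 1;                          % Eq. (21)
TitXTat_factor = TitXTat / (1 + q_risk * social_risk);   % Eq. (22)
mu_choice      = TitXTat_factor * coplayer_exp_pos;      % mean of Eq. (23)
choice         = mu_choice + randn / Precision;          % noisy realised choice
fprintf('predicted cooperation level: %.3f (realised %.3f)\n', mu_choice, choice);
```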
For models R1-R6, we assumed that participants simply adjusted their position based on the feedback received in the previous trial. Model R1 assumed that after losing, players would become more competitive and, after winning, more cooperative. These updates in different directions are captured by two parameters, Shift_win and Shift_lose, both constrained to be in the range (0:10).
$${choice}\left(t\right) \sim N\left({choice}(t-1)\pm \mathrm{Shift}_{({win},{lose})};\ 1/\mathrm{Precision}\right) \qquad (\mathrm{R}1)$$
(24)
Model R2 assumed that after losing, players would shift their position in the opposite direction to their shift in the previous trial, while after winning, they would keep shifting in the same direction. These updates are captured by two parameters, Shift_win and Shift_lose, both constrained to be in the range (0:10).
$${choice}(t) \sim N\left({choice}(t-1)\pm \mathrm{Shift}_{({win},{lose})}\cdot \mathrm{sign}(\Delta {choice}(t-1));\ 1/\mathrm{Precision}\right) \qquad (\mathrm{R}2)$$
(25)
Models R3 and R4 are similar to models R1 and R2 in how they update the position following a win or a loss, but players now also take into account their co-player's last level of cooperation, scaled by a multiplicative TitXTat parameter, and their own a priori tendency to be more or less cooperative, captured by a SocialBias parameter.
$${choice}\left(t\right) \sim N\left(\mathrm{SocialBias}+\mathrm{TitXTat}\cdot\theta\left(t-1\right)\pm \mathrm{Shift}_{({win},{lose})};\ 1/\mathrm{Precision}\right) \qquad (\mathrm{R}3)$$
(26)
$${choice}(t) \sim N\left(\mathrm{SocialBias}+\mathrm{TitXTat}\cdot\theta(t-1)\pm \mathrm{Shift}_{({win},{lose})}\cdot \mathrm{sign}(\Delta {choice}(t-1));\ 1/\mathrm{Precision}\right) \qquad (\mathrm{R}4)$$
(27)
Models R5 and R6 are identical to models R1 and R2, with the only difference that each choice is fitted using the actual value of the previous choice made by the player rather than its fitted value (to prevent underfitting because of recursive errors).
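The sketch below illustrates the feedback-driven update of model R1 (Eq. (24)) for a single trial, with illustrative parameter values.

```matlab
% One-trial update under model R1 (Eq. 24): losses push towards competition
% (lower cooperation), wins push towards cooperation. Illustrative values.
choice_prev = 0.5;              % previous cooperation level
won_last    = false;            % outcome of the previous trial
Shift_win   = 0.05;             % constrained to (0:10)
Shift_lose  = 0.10;             % constrained to (0:10)
Precision   = 20;

if won_last
    mu = choice_prev + Shift_win;     % move towards cooperation
else
    mu = choice_prev - Shift_lose;    % move towards competition
end
choice = mu + randn / Precision;      % noisy realised choice
fprintf('R1 prediction %.3f, realised choice %.3f\n', mu, choice);
```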
We fit all models to individual participants' data from all three social contexts using custom scripts in MATLAB and the MATLAB function fmincon. The log-likelihood was computed for each model as
$$LL\left({model}\right)=\sum_{{subjects}}\sum_{t} LL({choice}(t))$$
(28)
where
$$LL({choice}(t))=\log\left(\sqrt{\frac{\mathrm{Precision}}{2\pi }}\cdot \exp\left(-0.5\cdot \left(({choice}(t)-{prediction}(t))\cdot \mathrm{Precision}\right)^{2}\right)\right)$$
(29)
We compared models by computing the Bayesian Information Criterion
$$BIC\left({model}\right)=k\log \left(n\right)-2\, LL({model})$$
(30)
where k is the number of parameters of each model and n is the number of trials multiplied by the number of participants.
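The following sketch shows how Eqs. (28)-(30) combine for a toy set of choices and predictions; in the actual fit, fmincon minimised the negative of this log-likelihood over the model parameters.

```matlab
% Log-likelihood (Eq. 29) and BIC (Eq. 30) for a toy data set (illustrative values).
choices     = [0.40 0.55 0.62];          % observed cooperation levels
predictions = [0.42 0.50 0.60];          % model predictions for the same trials
Precision   = 20;                        % fitted precision parameter
k           = 4;                         % number of free parameters (e.g. model B6)
n           = numel(choices);            % trials x participants in the real fit

LL_trial = log(sqrt(Precision / (2 * pi))) ...
           - 0.5 * ((choices - predictions) * Precision).^2;
LL  = sum(LL_trial);                     % Eq. (28), summed over trials (and subjects)
BIC = k * log(n) - 2 * LL;               % Eq. (30)
fprintf('LL = %.2f, BIC = %.2f\n', LL, BIC);
```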
All Bayesian models significantly outperformed both the simple reactive models and the reward-based ones. To validate this modelling approach and confirm that players were trying to predict the other's position rather than just reciprocating preceding choices, we ran a regression model explaining participants' choices based on both the last position of the co-player and its Bayesian expectation in the following trial (see Supplementary Fig. 6b).
The winning model was B6, a Bayesian model containing features that account for people's biases towards cooperativeness, for how the behaviour of the other player influences subsequent choices, and for the influence of the social context. In this model, participants choose where to position themselves in each trial based on (21), (22) and (23).
Precision, SocialBias, TitXTat and q_risk are the four free parameters of the model. Note that TitXTat is a parameter capturing the context-independent amount of tit-for-tat, which is then normalised by the context-dependent social risk.
We assessed the degree to which we could reliably estimate model parameters given our fitting procedure. More specifically, we generated one simulated behavioural data set (i.e., choices for an interacting couple over 60 trials in the three different social contexts) using the average parameters originally estimated on the real behavioural data. Additionally, we generated five more simulated behavioural data sets using five parameter sets randomly sampled from the ranges used in the original fit. For each simulated behavioural data set we ran the winning model B6, this time fitting the generated data to identify the set of model parameters that maximised the log-likelihood, in the same way as for the original behavioural data. To assess the recoverability of the parameters we repeated this procedure 10 times for each simulated data set (i.e., 60 repetitions). The recoverability of the parameters was high in almost all cases, as can be seen in Supplementary Fig. 6c.
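A schematic version of this simulate-and-refit loop is sketched below for a single context and the B6 choice rule only; the generating parameters, the parameter bounds and the stand-in Bayesian expectations are illustrative, and the real pipeline operated on the full model across all three contexts.

```matlab
% Schematic parameter-recovery check (not the original pipeline): simulate choices
% from known B6-style parameters in one context, refit with fmincon, then compare.
rng(2);
nTrials  = 60;
alpha    = 2;                                   % competitive context only
exp_pos  = rand(1, nTrials);                    % stand-in Bayesian expectations
true_par = [1.1, 0.8, 20];                      % [TitXTat, q_risk, Precision] (illustrative)

predict = @(p) p(1) / (1 + p(2) * (2*alpha - 1)) * exp_pos;        % Eqs. (21)-(23)
choices = predict(true_par) + randn(1, nTrials) / true_par(3);     % simulated data

% negative log-likelihood of Eq. (29), to be minimised
negLL = @(p) -sum(log(sqrt(p(3) / (2*pi))) ...
                  - 0.5 * ((choices - predict(p)) * p(3)).^2);

opts   = optimoptions('fmincon', 'Display', 'off');
lb     = [0 0 1e-3];   ub = [2 10 10000];       % illustrative parameter bounds
fitted = fmincon(negLL, [1 0.5 10], [], [], [], [], lb, ub, [], opts);
disp([true_par; fitted])                        % generating vs recovered parameters
```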
The Bayesian framework allowed us to derive how the counterpart's position influenced participants' initial impressions of the level of cooperation needed in a given context. Within this framework, we measured how much the posterior distribution over the co-player's position differs from the prior distribution. We did so by computing, for each trial, the Kullback-Leibler divergence (KLD) between the posterior and prior probability distributions over the co-player's response. This difference formally represents the degree to which P2 violated P1's expectation and is a trial-by-trial measure of a social prediction error that triggers a change in P1's belief, guiding future decisions. A greater KL divergence indicates a larger cooperation-competition update. We therefore estimated a social prediction error signal by computing the surprise each player experienced when observing the co-player's position, given their current expectation. In the following equation, where p and q represent respectively the prior and posterior density functions over the co-player's position, the KL divergence is given by:
$$KLD\left(p,\, q\right)=-\int p\left(x\right)\log q\left(x\right)dx+\int p\left(x\right)\log p\left(x\right)dx=\int p\left(x\right)\left(\log p\left(x\right)-\log q\left(x\right)\right)dx$$
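Numerically, the KLD was evaluated on the same cooperation grid as the prior and posterior; a minimal sketch with illustrative Gaussian prior and posterior is given below.

```matlab
% Numerical sketch of the trial-wise KL divergence between prior (p) and
% posterior (q) over the co-player's position; distributions are illustrative.
dtheta = 0.01;
theta  = 0:dtheta:1;
gauss  = @(x, mu, sd) exp(-0.5*((x - mu)/sd).^2) / (sd*sqrt(2*pi));

p = gauss(theta, 0.50, 0.05);  p = p / (sum(p) * dtheta);   % prior
q = gauss(theta, 0.58, 0.04);  q = q / (sum(q) * dtheta);   % posterior after one trial

KLD = sum(p .* (log(p) - log(q))) * dtheta;   % discrete approximation of the integral
fprintf('social prediction error (KLD) = %.3f nats\n', KLD);
```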