<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Posts | Matteo Lisi</title>
    <link>http://mlisi.xyz/post/</link>
      <atom:link href="http://mlisi.xyz/post/index.xml" rel="self" type="application/rss+xml" />
    <description>Posts</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2024 Matteo Lisi</copyright><lastBuildDate>Sat, 26 Sep 2020 00:00:00 +0000</lastBuildDate>
    <image>
      <url>http://mlisi.xyz/img/shademe.png</url>
      <title>Posts</title>
      <link>http://mlisi.xyz/post/</link>
    </image>
    
    <item>
      <title>How much is this game worth?</title>
      <link>http://mlisi.xyz/post/question-interview/</link>
      <pubDate>Sat, 26 Sep 2020 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/question-interview/</guid>
      <description>


&lt;p&gt;This question was posed during an interview for an AI / data science position at a global financial firm:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Consider a game of chance in which a player can roll a die up to 3 times. They win an amount of money proportional to the outcome of their last roll (1, 2, 3, 4, 5, or 6 £). They need not use all 3 throws: they can stop earlier and collect their winnings. You are the house in this game; what is the minimum amount of £ that you can charge for playing this game such that you won’t take losses in the long term?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
 
&lt;/p&gt;
&lt;details&gt;
&lt;p&gt;&lt;summary&gt;&lt;mark&gt; &lt;strong&gt;Answer by Oliver Perkins.&lt;/strong&gt; &lt;/mark&gt;&lt;/summary&gt;&lt;/p&gt;
&lt;p&gt;Oliver Perkins pointed out that this can be calculated working backwards from the last throw:&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-theme=&#34;dark&#34;&gt;
&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;
£4.65? &lt;br&gt;&lt;br&gt;As the punter on roll 3, E(£) = 3.5, so we stick on 4+ on throw 2. Knowing this our EV for the final 2 throws is (0.5*3.5)+(0.5*5)=4.25. Therefore we stick on 5+ on the first throw so E(£) = (0.667*4.25)+(0.333*5.5) = ~4.64
&lt;/p&gt;
— Oli Perkins 🔥🌍🏳️
🌈 (&lt;span class=&#34;citation&#34;&gt;@OliPerkins2&lt;/span&gt;) &lt;a href=&#34;https://twitter.com/OliPerkins2/status/1310254755424464898?ref_src=twsrc%5Etfw&#34;&gt;September 27, 2020&lt;/a&gt;
&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;p&gt;His solution makes perfect sense and differs from my initial one in that the player would accept a 4 at throw 2 (I have now updated my answer to take that into account). This small difference in strategy is rational, since it increases the overall expected value of the game by &lt;span class=&#34;math inline&#34;&gt;\(\approx\)&lt;/span&gt; 0.05 relative to a strategy in which the player does not settle for anything less than 5 at any throw.&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;
 
&lt;/p&gt;
&lt;details&gt;
&lt;p&gt;&lt;summary&gt;&lt;mark&gt; &lt;strong&gt;Alternative approach.&lt;/strong&gt; &lt;/mark&gt;&lt;/summary&gt;&lt;/p&gt;
&lt;p&gt;
 
&lt;/p&gt;
&lt;p&gt;Let’s denote by &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{C}\)&lt;/span&gt; the price that the player pays to play the game, and by &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{W}\)&lt;/span&gt; the amount they win. A possible strategy could be to keep playing until the winnings are &lt;em&gt;at least&lt;/em&gt; equal to the price, i.e. until &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{W} \ge \mathcal{C}\)&lt;/span&gt; (and perhaps continue after that, if continuing increases the expected win).&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Assume the player continues until they get a 6. Let &lt;span class=&#34;math inline&#34;&gt;\(D_1, D_2, D_3\)&lt;/span&gt; denote the outcomes of throws 1, 2, and 3, respectively. The probability of getting a 6 is&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
p\left(\mathcal{W} =6\right) = \frac{1}{6} + \underbrace{\left( 1 - \frac{1}{6} \right) \times  \frac{1}{6}}_{p \left(D_1&amp;lt;6, \, D_2=6\right)} +  \underbrace{\left( 1 - \frac{1}{6} \right)^2 \times  \frac{1}{6}}_{p \left(D_1&amp;lt;6, \, D_2&amp;lt;6, \, D_3=6\right)} = \frac{91}{216} \approx 0.42
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If instead the player aims for at least &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt;, the probability of getting it is&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
p\left(\mathcal{W} \ge 5 \right) = \frac{2}{6} + \underbrace{\left( 1 - \frac{2}{6} \right) \times  \frac{2}{6}}_{p \left(D_1&amp;lt;5, \, D_2 \ge 5\right)} +  \underbrace{\left( 1 - \frac{2}{6} \right)^2 \times  \frac{2}{6}}_{p \left(D_1&amp;lt;5, \, D_2&amp;lt;5, \, D_3 \ge 5\right)} = \frac{19}{27} \approx 0.70
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Say the player obtains a 5 on the first throw. Is it worth continuing?&lt;/p&gt;
&lt;p&gt;The probability of getting at least &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt; in the next two throws is &lt;span class=&#34;math inline&#34;&gt;\(\frac{2}{6} + \left(1 - \frac{2}{6} \right)\times \frac{2}{6} = \frac{20}{36} \approx 0.55\)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The probability of getting a &lt;span class=&#34;math inline&#34;&gt;\(6\)&lt;/span&gt; in the next two throws is &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{6} + \left(1 - \frac{1}{6} \right)\times \frac{1}{6} = \frac{11}{36} \approx 0.30\)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If the player reaches the last throw without having obtained at least &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt;, the remaining outcomes, &lt;span class=&#34;math inline&#34;&gt;\(1\)&lt;/span&gt; to &lt;span class=&#34;math inline&#34;&gt;\(4\)&lt;/span&gt;, are equally likely, each with probability &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{4}\)&lt;/span&gt;. Thus continuing after obtaining &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt; at the first throw has an expected value of &lt;span class=&#34;math display&#34;&gt;\[\frac{11}{36}\times6 + \left(\frac{20}{36} - \frac{11}{36}\right)\times 5 + \left(1 - \frac{20}{36}\right)\times \sum_{i=1}^4 \frac{1}{4}i \approx 4.19\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The value of stopping after having obtained &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt; is &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt;; thus the player should stop rather than continue.&lt;/p&gt;
&lt;hr /&gt;
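As a numerical cross-check of the probabilities and the continuation value above, here is a minimal Python sketch (the variable names are mine, not from the post):

```python
from fractions import Fraction

# P(getting a 6 within 3 throws, rolling again whenever the face is not a 6)
p6 = Fraction(1, 6)
p_six_in_three = p6 + (1 - p6) * p6 + (1 - p6) ** 2 * p6

# P(getting at least a 5 within 3 throws, stopping on a 5 or a 6)
p5 = Fraction(2, 6)
p_five_in_three = p5 + (1 - p5) * p5 + (1 - p5) ** 2 * p5

# expected value of giving up a 5 on throw 1 and rolling up to twice more:
# a 6 somewhere in two throws, else a 5, else the last throw is uniform on 1-4
p_six_in_two = Fraction(11, 36)    # 1/6 + (5/6)(1/6)
p_five_in_two = Fraction(20, 36)   # 2/6 + (4/6)(2/6)
ev_continue = (p_six_in_two * 6
               + (p_five_in_two - p_six_in_two) * 5
               + (1 - p_five_in_two) * Fraction(1 + 2 + 3 + 4, 4))

print(p_six_in_three, p_five_in_three, float(ev_continue))
```

Exact rational arithmetic confirms 91/216, 19/27, and a continuation value of 151/36, which is below 5.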
&lt;p&gt;
 
&lt;/p&gt;
&lt;div id=&#34;expected-value-of-the-game&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Expected value of the game&lt;/h2&gt;
&lt;p&gt;If &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{C}=5\)&lt;/span&gt; we have that &lt;span class=&#34;math inline&#34;&gt;\(p\left(\mathcal{W} &amp;gt; 5 \right) = \frac{91}{216} \approx 0.42\)&lt;/span&gt;, and also that &lt;span class=&#34;math inline&#34;&gt;\(p\left(\mathcal{W} &amp;gt; 4 \right) = \frac{19}{27} \approx 0.70\)&lt;/span&gt;. Thus in the long run the player will win more than they paid about &lt;span class=&#34;math inline&#34;&gt;\(42\)&lt;/span&gt;% of the time, will be even with the house &lt;span class=&#34;math inline&#34;&gt;\(28\)&lt;/span&gt;% of the time, and will lose money (obtaining any number from &lt;span class=&#34;math inline&#34;&gt;\(1\)&lt;/span&gt; to &lt;span class=&#34;math inline&#34;&gt;\(4\)&lt;/span&gt;, with equal probability) about &lt;span class=&#34;math inline&#34;&gt;\(30\)&lt;/span&gt;% of the time.&lt;/p&gt;
&lt;p&gt;To see if this is a good deal we calculate the expected value assuming the player continues until they either obtain at least a &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt; or complete the 3 throws. The probability of getting at least 5 in three throws is &lt;span class=&#34;math inline&#34;&gt;\(\frac{19}{27}\)&lt;/span&gt;, and conditional on getting at least 5 the two outcomes 6 and 5 are equally likely, each with probability &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{2}\)&lt;/span&gt;. For a price of &lt;span class=&#34;math inline&#34;&gt;\(5\)&lt;/span&gt; a player who follows this strategy is expected to incur a loss in the long run, since the expected value is:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\underbrace{\frac{19}{27}\frac{1}{2}\times6 + \frac{19}{27}\frac{1}{2}\times5 + \left(1 - \frac{19}{27}\right) \frac{1}{4}\times \sum_{i=1}^4 i}_{\text{expected value if player keep playing until } \mathcal{W} \ge 5} = \frac{83}{18} \approx 4.61\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Now, say the player obtains a 4 at the second throw; should they keep it? Yes, since the expected value of the last throw is &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{6}\sum_{i=1}^6 i=3.5\)&lt;/span&gt;. To take this into account we need a slightly different calculation in which the outcomes are considered separately for each throw (the underbraces indicate the acceptable scores at each throw):&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
\underbrace{\frac{2}{6}\frac{1}{2}\times \left(5+6\right)}_{\text{5 or 6 in 1st throw}} + 
\underbrace{\left(1-\frac{2}{6}\right)\frac{3}{6}\frac{1}{3}\times \left(4+5+6\right)}_{\text{4, 5 or 6 in 2nd throw}}
+ \underbrace{\left(1-\frac{2}{6}\right)\frac{3}{6} \frac{1}{6}\times \sum_{i=1}^6 i}_{\text{any number in last throw}} = \frac{14}{3} \approx 4.67
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;mark&gt; Thus, the house needs to set a price &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{C}\)&lt;/span&gt; of at least &lt;span class=&#34;math inline&#34;&gt;\(\frac{14}{3}\approx 4.67\)&lt;/span&gt;, otherwise it risks incurring losses in the long run. &lt;/mark&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;/div&gt;
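The two expected values above (83/18 for a player who only ever stops on 5 or more, and 14/3 for the optimal strategy) can also be obtained mechanically by backward induction over the number of throws left. A short Python sketch (my own illustration, not part of the original post):

```python
from fractions import Fraction

faces = [Fraction(f) for f in range(1, 7)]

def optimal_value(throws_left):
    """Expected win with `throws_left` throws remaining, stopping optimally."""
    if throws_left == 1:
        return sum(faces) / 6                       # E = 7/2 on the last throw
    cont = optimal_value(throws_left - 1)
    # keep the current face only if it beats the value of rolling again
    return sum(max(f, cont) for f in faces) / 6

def threshold_value(throws_left, threshold=5):
    """Expected win if the player only ever stops on `threshold` or more."""
    if throws_left == 1:
        return sum(faces) / 6
    cont = threshold_value(throws_left - 1, threshold)
    return sum(f if f >= threshold else cont for f in faces) / 6

print(optimal_value(2))       # 17/4  = 4.25: stick on 4+ at throw 2
print(optimal_value(3))       # 14/3  ~ 4.67: stick on 5+ at throw 1
print(threshold_value(3))     # 83/18 ~ 4.61: always waiting for a 5 or 6
```

The recursion makes the "stick on 4 at throw 2" point explicit: with one throw left the continuation value is 3.5, so any face of 4 or more is worth keeping.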
</description>
    </item>
    
    <item>
      <title>Setting Kazam to correctly record full-screen on HiDPI displays in Ubuntu</title>
      <link>http://mlisi.xyz/post/kazam/</link>
      <pubDate>Thu, 11 Jun 2020 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/kazam/</guid>
      <description>


&lt;p&gt;In this new covid-19 world I find myself needing to record full-screen videos more and more often, for example for lectures. This is something one can do live with Zoom, but Zoom is not the most practical option for non-live recordings.&lt;/p&gt;
&lt;p&gt;In Ubuntu there is a nice piece of software, the Kazam screencaster, that is perfect for the job, except that it does not correctly detect the screen size if you have a high pixel density (HiDPI) display: you end up with a video cropped to only the top-left corner of the screen.&lt;/p&gt;
&lt;p&gt;There is a simple patch to fix that issue, which I describe here in case it’s useful to someone else and for the benefit of my future self.&lt;/p&gt;
&lt;p&gt;First, you need to find the files &lt;code&gt;gstreamer.py&lt;/code&gt; and &lt;code&gt;prefs.py&lt;/code&gt; in the Kazam installation. For me they were in &lt;code&gt;/usr/lib/python3/dist-packages/kazam/backend/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, you have to patch these files so that they take into account the screen scaling factor, which is obtained from the &lt;code&gt;get_monitor_scale_factor&lt;/code&gt; function in the Gdk library.&lt;/p&gt;
&lt;p&gt;This can be done by adding these lines to the file &lt;code&gt;gstreamer.py&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt; scale = self.video_source[&amp;#39;scale&amp;#39;]
 startx = startx * scale 
 starty = starty * scale 
 endx = endx * scale 
 endy = endy * scale &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;They should be added around lines 120 or so, right after the properties &lt;code&gt;endx&lt;/code&gt; and &lt;code&gt;endy&lt;/code&gt; are set up (&lt;code&gt;endy = starty + height - 1&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Next, open the file &lt;code&gt;prefs.py&lt;/code&gt; and, around line 324, change this bit&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;for i in range(self.default_screen.get_n_monitors()):
   rect = self.default_screen.get_monitor_geometry(i)
   self.logger.debug(&amp;quot;  Monitor {0} - X: {1}, Y: {2}, W: {3}, H: {4}&amp;quot;.format(i,
                                      rect.x,
                                      rect.y,
                                      rect.width,
                                      rect.height))

   self.screens.append({&amp;quot;x&amp;quot;: rect.x,
                        &amp;quot;y&amp;quot;: rect.y,
                        &amp;quot;width&amp;quot;: rect.width,
                        &amp;quot;height&amp;quot;: rect.height})&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;into this&lt;/p&gt;
&lt;pre class=&#34;python&#34;&gt;&lt;code&gt;for i in range(self.default_screen.get_n_monitors()):
   rect = self.default_screen.get_monitor_geometry(i)
   scale = self.default_screen.get_monitor_scale_factor(i)

   self.logger.debug(&amp;quot;  Monitor {0} - X: {1}, Y: {2}, W: {3}, H: {4}, scale: {5}&amp;quot;.format(i,
                                      rect.x,
                                      rect.y,
                                      rect.width,
                                      rect.height,
                                      scale))

   self.screens.append({&amp;quot;x&amp;quot;: rect.x,
                        &amp;quot;y&amp;quot;: rect.y,
                        &amp;quot;width&amp;quot;: rect.width,
                        &amp;quot;height&amp;quot;: rect.height,
                        &amp;quot;scale&amp;quot;: scale})&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s it! Restart Kazam and the next fullscreen recording should work OK.&lt;/p&gt;
&lt;p&gt;Thanks to user sllorente for describing the patch &lt;a href=&#34;https://bugs.launchpad.net/ubuntu/+bug/1283424&#34;&gt;here&lt;/a&gt;!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Defending the null hypothesis</title>
      <link>http://mlisi.xyz/post/defending-the-null/</link>
      <pubDate>Mon, 09 Sep 2019 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/defending-the-null/</guid>
      <description>&lt;p&gt;A friend was working on a paper and found himself in the situation of having to defend the null hypothesis that a particular effect is absent (or not measurable) when tested under more controlled conditions than those used in previous studies. He asked for some practical advice: &lt;em&gt;&amp;ldquo;what would convince you as as a reviewer of a null result?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;No statistical test can &amp;ldquo;prove&amp;rdquo; a null result (intended as the point-null hypothesis that an effect of interest is zero). You can however: (&lt;strong&gt;i&lt;/strong&gt;) present evidence that the data are more likely under the null hypothesis than under the alternative; or (&lt;strong&gt;ii&lt;/strong&gt;) put a cap on the size of the effect, which could enable you to argue that any effect, if present, is so small that it can be considered theoretically or pragmatically irrelevant.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(&lt;strong&gt;i&lt;/strong&gt;) is the Bayesian approach and requires calculating a Bayes factor, that is, the ratio between the average (or marginal) likelihoods of the data under the null and the alternative hypothesis. Note that Bayes factor calculations are highly influenced by the priors (e.g. the prior expectations about the effect size). Luckily, in the case of a single comparison (e.g. a t-test), there is a popular way of computing Bayes factors which requires minimal assumptions about the effect of interest, as it is developed using uninformative or minimally informative priors: the JZS prior (technically, this corresponds to assuming a Cauchy prior on the standardized effect size and an uninformative Jeffreys prior on the variance of the measurements). It was derived in a paper by Rouder et al. (2009), and there is an easy-to-use R implementation in the package &lt;a href=&#34;https://cran.r-project.org/web/packages/BayesFactor/index.html&#34;&gt;BayesFactor&lt;/a&gt; (see the function &lt;code&gt;ttestBF()&lt;/code&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(&lt;strong&gt;ii&lt;/strong&gt;) is the frequentist alternative. In a frequentist approach you don&amp;rsquo;t express belief in a hypothesis in terms of probability; uncertainty is characterized in relation to the data-generating process (e.g. how many times would you reject the null if you repeated the experiment a zillion times? Probability is interpreted as the long-run frequency in an imaginary, very large series of replications). Under this approach you can estimate the maximum size that the effect could plausibly have, given that you did not detect it in your experiment. Daniel Lakens has written an easy-to-use package for this, called TOSTER; see &lt;a href=&#34;https://cran.rstudio.com/web/packages/TOSTER/vignettes/IntroductionToTOSTER.html&#34;&gt;this vignette&lt;/a&gt; for an introduction.&lt;/p&gt;
&lt;/blockquote&gt;
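TOSTER is an R package; purely as an illustration of the two one-sided tests (TOST) logic behind equivalence testing, here is a Python sketch on simulated data. The equivalence bounds and sample size are arbitrary choices of mine, not from the post:

```python
# TOST: test H0 "mean <= low" and H0 "mean >= high"; rejecting both
# bounds the true effect within (low, high).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2020)
x = rng.normal(loc=0.0, scale=1.0, size=1000)  # data simulated under the null

low, high = -0.2, 0.2                          # equivalence bounds (raw units)
n, mean, sd = len(x), x.mean(), x.std(ddof=1)
se = sd / np.sqrt(n)

t_low = (mean - low) / se                      # one-sided test against low
t_high = (mean - high) / se                    # one-sided test against high
p_low = stats.t.sf(t_low, df=n - 1)
p_high = stats.t.cdf(t_high, df=n - 1)

# declare equivalence only if BOTH one-sided tests reject
p_tost = max(p_low, p_high)
print(p_tost)
```

With a true effect of zero and a reasonably large sample, the TOST p-value comes out small, i.e. the effect is bounded within the chosen equivalence region.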
&lt;p&gt;&lt;img src=&#34;https://imgs.xkcd.com/comics/null_hypothesis.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Installing RStan on HPC cluster</title>
      <link>http://mlisi.xyz/post/rstan-cluster/</link>
      <pubDate>Sun, 04 Aug 2019 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/rstan-cluster/</guid>
      <description>


&lt;p&gt;This took me some time to get working, so I’ll write down the details here for the benefit of my future self and anyone else facing similar issues.&lt;/p&gt;
&lt;p&gt;To run R in the &lt;a href=&#34;https://docs.hpc.qmul.ac.uk/&#34;&gt;Apocrita&lt;/a&gt; cluster (which runs CentOS 7) first load the modules&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;module load R
module load gcc&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(gcc is required to compile the packages from source.)&lt;/p&gt;
&lt;p&gt;Before starting you should make sure that you don’t have any previous installation of RStan in your system. From an R terminal, type:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;remove.packages(&amp;quot;rstan&amp;quot;)
remove.packages(&amp;quot;StanHeaders&amp;quot;)
if (file.exists(&amp;quot;.RData&amp;quot;)) file.remove(&amp;quot;.RData&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One problem that I had initially was (I think) due to Rcpp and rstan having been installed with different compilers or compilation flags.
Thanks to the IT support at Queen Mary University, the correct C++ toolchain configuration that did the trick for me is the following:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;CXX14 = g++ -std=c++1y
CXX14FLAGS = -O3 -Wno-unused-variable -Wno-unused-function -fPIC&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To write the correct configuration in the &lt;code&gt;~/.R/Makevars&lt;/code&gt; file from an R terminal:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dotR &amp;lt;- file.path(Sys.getenv(&amp;quot;HOME&amp;quot;), &amp;quot;.R&amp;quot;)
if (!file.exists(dotR)) dir.create(dotR)
M &amp;lt;- file.path(dotR, &amp;quot;Makevars&amp;quot;)
if (!file.exists(M)) file.create(M)
cat(&amp;quot;\nCXX14 = g++ -std=c++1y&amp;quot;, &amp;quot;CXX14FLAGS = -O3 -Wno-unused-variable -Wno-unused-function -fPIC&amp;quot;, file = M, sep = &amp;quot;\n&amp;quot;, append = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, install RStan:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Sys.setenv(MAKEFLAGS = &amp;quot;-j4&amp;quot;) # use four cores for compilation
install.packages(&amp;quot;rstan&amp;quot;, type = &amp;quot;source&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that in my case it worked correctly without running the CentOS 7.0-specific instructions indicated on the &lt;a href=&#34;https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux#special-note-centos-70&#34;&gt;rstan installation page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Another thing that I did, although I am not sure it is strictly necessary, was to install RStan in a fresh R library, that is, in a directory containing only the packages necessary to run RStan.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Attitude shift towards Remain in European Elections obscured in press by rebranded Farage party</title>
      <link>http://mlisi.xyz/post/eu/</link>
      <pubDate>Mon, 27 May 2019 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/eu/</guid>
      <description>


&lt;p&gt;The European Election results reveal a shift towards parties that support a soft Brexit or no Brexit at all, compared to the 2014 vote. Major news outlets in the UK and Europe instead claim that a hard Brexit has gained support. To see why this is a misguided conclusion, just look at the numbers:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/EU_election.png&#34; alt=&#34;Almost complete vote share results of UK’s EU elections 2019, put in perspective.&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Almost complete vote share results of UK’s EU elections 2019, put in perspective.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Parties supporting strong UK independence have lost an overwhelming number of voters, with UKIP losing 24.2% of the vote and the Conservatives losing 14.8%. This loss is only partially recovered by Farage’s Brexit Party, which gained 31.6%, suggesting that a substantial proportion of voters switched to parties favouring a soft or no Brexit.&lt;/p&gt;
&lt;p&gt;Amongst parties favouring closer European ties, those vocally against Brexit show major wins (Lib Dems 13.4%, Greens 4.2%), whilst the Labour party, which lacks a clear Brexit stance, has lost 11.3% of its votes.&lt;/p&gt;
&lt;p&gt;The numerical shift towards parties against Brexit is obscured by Farage’s clever rebranding of UKIP. The apparent overnight success of Farage’s new Brexit Party, now the largest single party, is interpreted by major news outlets such as the Guardian and France 24 as a victory for hard Brexiteers. This conclusion overlooks that the vast majority of the gained votes are funneled directly from Farage’s own former party, UKIP. The rebranding allows Farage to claim a victory of 28 new seats over 2014, instead of the correct increase of 5.&lt;/p&gt;
&lt;p&gt;Despite media claims of a triumph for Farage’s party, the numbers actually suggest rising scepticism towards Brexit.&lt;/p&gt;
&lt;p&gt;(After making this plot we realized that the Guardian had written a &lt;a href=&#34;https://www.theguardian.com/politics/2019/may/27/remain-hard-brexit-what-uk-european-election-results-tell-us&#34;&gt;perspective article&lt;/a&gt; along the same lines.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.ucl.ac.uk/~ucjttb1/&#34;&gt;Tessa Dekker&lt;/a&gt; &amp;amp; Matteo Lisi&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Ephemeral patterns of complexity</title>
      <link>http://mlisi.xyz/post/ephemeral/</link>
      <pubDate>Fri, 05 Apr 2019 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/ephemeral/</guid>
      <description>


&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/complexity.png&#34; alt=&#34;From ‘The Big Picture: On the Origins of Life, Meaning, and the Universe Itself’ by Sean Carroll, www.preposterousuniverse.com/bigpicture&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;From ‘&lt;em&gt;The Big Picture: On the Origins of Life, Meaning, and the Universe Itself&lt;/em&gt;’ by Sean Carroll, &lt;a href=&#34;https://www.preposterousuniverse.com/bigpicture/&#34;&gt;www.preposterousuniverse.com/bigpicture&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Bayesian model selection at the group level</title>
      <link>http://mlisi.xyz/post/bms/</link>
      <pubDate>Fri, 25 Jan 2019 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/bms/</guid>
      <description>




&lt;p&gt;In experimental psychology and neuroscience, the classical approach when comparing different models that make quantitative predictions about the behavior of participants is to aggregate the predictive ability of each model (e.g. as quantified by the Akaike Information Criterion) across participants, and then see which one provides the best performance on average. Although correct, this approach neglects the possibility that different participants might use different strategies, each best described by a different competing model. To account for this, Stephan et al. &lt;span class=&#34;citation&#34;&gt;(Stephan et al. 2009)&lt;/span&gt; proposed a more conservative approach in which models are treated as random effects that can differ between subjects and have a fixed (unknown) distribution in the population. The relevant statistical quantity is the frequency with which each model prevails in the population. Note that this differs from the definition of random effects in classical statistics, where random-effects models have multiple sources of variation, e.g. within- and between-subject variance. A useful and popular way to summarize the results of this analysis is to report the models’ &lt;em&gt;exceedance probabilities&lt;/em&gt;, which measure how likely it is that any given model is more frequent than all the other models in the set. The following exposition is largely based on Stephan et al.’s paper &lt;span class=&#34;citation&#34;&gt;(Stephan et al. 2009)&lt;/span&gt;.&lt;/p&gt;
&lt;div id=&#34;model-evidence&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Model evidence&lt;/h1&gt;
&lt;p&gt;Let’s say we have an experiment with &lt;span class=&#34;math inline&#34;&gt;\(N\)&lt;/span&gt; participants, indexed &lt;span class=&#34;math inline&#34;&gt;\(\left(1,\dots,N\right)\)&lt;/span&gt;. Their performance is quantitatively predicted by a set of &lt;span class=&#34;math inline&#34;&gt;\(K\)&lt;/span&gt; competing models, indexed &lt;span class=&#34;math inline&#34;&gt;\(\left(1,\dots,K\right)\)&lt;/span&gt;. The behaviour of subject &lt;span class=&#34;math inline&#34;&gt;\(n\)&lt;/span&gt; can be fit by model &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; by finding the value(s) of the parameter(s) &lt;span class=&#34;math inline&#34;&gt;\(\theta_k\)&lt;/span&gt; that maximize the likelihood of the data &lt;span class=&#34;math inline&#34;&gt;\(y_n\)&lt;/span&gt; under the model. In a fully Bayesian setting each unknown parameter has a prior probability distribution, and the quantity of choice for comparing the goodness of fit of models is the marginal likelihood, that is
&lt;span class=&#34;math display&#34;&gt;\[
  p \left(y_n \mid k \right) = \int p\left(y_n \mid k, \theta_k \right) \, p\left(\theta_k \right) d\theta_k.
\]&lt;/span&gt;
By integrating over the prior probability of the parameters, the marginal likelihood provides a measure of the evidence in favour of a specific model while taking into account the complexity of the model. We might also do something simpler and approximate the model evidence using e.g. the Akaike Information Criterion.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;models-as-random-effects&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Models as random effects&lt;/h1&gt;
&lt;p&gt;We are interested in finding which model does better at predicting behavior, while allowing different participants to use different strategies, represented by different models. To achieve this we treat the models as random effects and assume that the frequencies or probabilities of the models in the population, &lt;span class=&#34;math inline&#34;&gt;\((r_1, \dots, r_K)\)&lt;/span&gt;, are described by a Dirichlet distribution with parameters &lt;span class=&#34;math inline&#34;&gt;\(\boldsymbol{\alpha } = \alpha_1, \dots, \alpha_K\)&lt;/span&gt;,
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
p\left(r \mid  \boldsymbol{\alpha } \right) &amp;amp; = \text{Dir} \left(r, \boldsymbol{ \alpha } \right) \\
&amp;amp; = \frac{1}{\mathbf{B} \left(\boldsymbol{ \alpha }  \right)} \prod_{i=1}^K r_i^{\alpha_i -1} \nonumber
\end{align}.
\]&lt;/span&gt;
Where the normalizing constant &lt;span class=&#34;math inline&#34;&gt;\(\mathbf{B} \left(\boldsymbol{ \alpha } \right)\)&lt;/span&gt; is the multivariate Beta function. The probabilities &lt;span class=&#34;math inline&#34;&gt;\(r\)&lt;/span&gt; generate ‘switches’, or indicator variables, &lt;span class=&#34;math inline&#34;&gt;\(m_1, \dots, m_N\)&lt;/span&gt;, where &lt;span class=&#34;math inline&#34;&gt;\(m_{nk} \in \left \{ 0, 1\right \}\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(\sum_{k=1}^K m_{nk}=1\)&lt;/span&gt;. These indicator variables prescribe the model for subject &lt;span class=&#34;math inline&#34;&gt;\(n\)&lt;/span&gt;: &lt;span class=&#34;math inline&#34;&gt;\(p(m_{nk}=1)=r_k\)&lt;/span&gt;.
Given the probabilities &lt;span class=&#34;math inline&#34;&gt;\(r\)&lt;/span&gt;, the indicator variables thus have a multinomial distribution, that is
&lt;span class=&#34;math display&#34;&gt;\[
p\left(m_n \mid  \mathbf{r} \right) =  \prod_{k=1}^K r_k^{m_{nk}}.
\]&lt;/span&gt;
The graphical model that summarizes these dependencies is shown in the following graph:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/bms.png&#34; alt=&#34;&#34; style=&#34;width:50.0%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
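The generative process just described can be sketched in a few lines of numpy (the notation and the numbers are mine, for illustration only): model frequencies drawn from a Dirichlet distribution, then one multinomial model assignment per subject.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 20                       # K candidate models, N subjects
alpha = np.ones(K)                 # flat Dirichlet prior

r = rng.dirichlet(alpha)           # population model frequencies (sum to 1)
m = rng.multinomial(1, r, size=N)  # one-hot indicators m_n, p(m_nk = 1) = r_k
counts = m.sum(axis=0)             # how many subjects each model "explains"

print(r, counts)
```

Each row of `m` is a one-hot vector selecting the model assumed to generate that subject's data, matching the constraint that the indicators sum to one across models.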
&lt;div id=&#34;variational-bayesian-approach&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Variational Bayesian approach&lt;/h1&gt;
&lt;p&gt;The goal is to estimate the parameters &lt;span class=&#34;math inline&#34;&gt;\(\boldsymbol{\alpha}\)&lt;/span&gt; that define the posterior distribution of model frequencies given the data, &lt;span class=&#34;math inline&#34;&gt;\(p \left( r \mid y \right)\)&lt;/span&gt;. To do so we need an estimate of the model evidence &lt;span class=&#34;math inline&#34;&gt;\(p \left(m_{nk}=1 \mid y_n \right)\)&lt;/span&gt;, that is, the belief that model &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; generated the data of subject &lt;span class=&#34;math inline&#34;&gt;\(n\)&lt;/span&gt;. There are many possible approaches that can be used to estimate the model evidence, either exactly or approximately. Importantly, these estimates need to be normalized so that they sum to one across models; if one were using the Akaike Information Criterion, for instance, the AIC values should be transformed into Akaike weights &lt;span class=&#34;citation&#34;&gt;(Burnham and Anderson 2002)&lt;/span&gt;.&lt;/p&gt;
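As a concrete illustration of the normalization step, here is a small sketch of turning AIC values into Akaike weights; the AIC values are made-up example numbers, not from any real dataset:

```python
import numpy as np

aic = np.array([210.3, 204.1, 207.8])   # one subject, K = 3 candidate models
delta = aic - aic.min()                 # AIC differences from the best model
w = np.exp(-0.5 * delta)
w /= w.sum()                            # Akaike weights: sum to 1 across models

print(np.round(w, 3))                   # the middle (lowest-AIC) model dominates
```

The resulting weights can then play the role of the normalized per-subject model evidences in the hierarchical scheme.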
&lt;div id=&#34;generative-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Generative model&lt;/h2&gt;
&lt;p&gt;Given the graphical model illustrated above, the joint probability of parameters and data can be expressed as
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
p \left( y, r, m \right) &amp;amp; = p \left( y \mid m \right) \, p \left( m \mid r \right) \, p \left( r \mid \boldsymbol{\alpha} \right) \\
&amp;amp; = p \left( r \mid \boldsymbol{\alpha} \right) \left[ \prod_{n=1}^N p \left( y_n \mid m_n \right) \, p\left(m_n \mid r \right) \right] \nonumber \\
&amp;amp; = \frac{1}{\mathbf{B} \left(\boldsymbol{ \alpha }  \right)} \left[ \prod_{k=1}^K r_k^{\alpha_k -1} \right] \left[ \prod_{n=1}^N p \left( y_n \mid m_n\right) \, \prod_{k=1}^K r_k^{m_{nk}} \right] \nonumber \\
&amp;amp; = \frac{1}{\mathbf{B} \left(\boldsymbol{ \alpha }  \right)} \left[ \prod_{k=1}^K r_k^{\alpha_k -1} \right] \prod_{n=1}^N \prod_{k=1}^K \left[ p \left( y_n \mid m_{nk} \right) \, r_k \right]^{m_{nk}}. \nonumber
\end{align}
\]&lt;/span&gt;
And the log probability is
&lt;span class=&#34;math display&#34;&gt;\[
\log p \left( y, r, m \right)  = - \log \mathbf{B} \left(\boldsymbol{ \alpha }  \right)
+ \sum_{k=1}^K \left(\alpha_k -1 \right) \log r_k
+ \sum_{n=1}^N \sum_{k=1}^K m_{nk} \left( \log p \left( y_n \mid m_{nk} \right) + \log r_k\right).
\]&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;variational-approximation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Variational approximation&lt;/h2&gt;
&lt;p&gt;In order to fit this hierarchical model following the variational approach one needs to define an approximate posterior distribution over model frequencies and assignments, &lt;span class=&#34;math inline&#34;&gt;\(q\left(r,m\right)\)&lt;/span&gt;, which is assumed to be adequately described by a mean-field factorisation, that is &lt;span class=&#34;math inline&#34;&gt;\(q\left(r,m\right) = q\left(r\right) \, q\left(m\right)\)&lt;/span&gt;. The two densities are proportional to the exponentiated &lt;em&gt;variational energies&lt;/em&gt; &lt;span class=&#34;math inline&#34;&gt;\(I(m), I(r)\)&lt;/span&gt;, which are essentially the un-normalized approximate log-posterior densities, that is
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
q\left(r\right) &amp;amp; \propto e^{I(r)}, \, q\left(m\right)\propto e^{I(m)} \\
I(r) &amp;amp; = \left&amp;lt; \log p \left( y, r, m \right) \right&amp;gt;_{q(m)} \\
I(m) &amp;amp; = \left&amp;lt; \log p \left( y, r, m \right) \right&amp;gt;_{q(r)}
\end{align}
\]&lt;/span&gt;
For the approximate posterior over model assignment &lt;span class=&#34;math inline&#34;&gt;\(q(m)\)&lt;/span&gt; we first compute &lt;span class=&#34;math inline&#34;&gt;\(I(m)\)&lt;/span&gt; and then an appropriate normalization constant. From the expression of the joint log-probability above, removing all the terms that do not depend on &lt;span class=&#34;math inline&#34;&gt;\(m\)&lt;/span&gt;, we have that the un-normalized approximate log-posterior (the variational energy) can be expressed as
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
I(m) &amp;amp; = \int \log p \left( y, r, m \right) \, q(r) \, dr \\
&amp;amp; = \sum_{n=1}^N \sum_{k=1}^K m_{nk} \left[ \log p \left( y_n \mid m_{nk} \right) + \int q(r_k) \log r_k \, d r_k \right] \nonumber \\
&amp;amp; = \sum_{n=1}^N \sum_{k=1}^K m_{nk} \left[ \log p \left( y_n \mid m_{nk} \right) + \psi (\alpha_k) -\psi \left(  \alpha_S \right) \right] \nonumber
\end{align}
\]&lt;/span&gt;
where &lt;span class=&#34;math inline&#34;&gt;\(\alpha_S = \sum_{k=1}^K \alpha_k\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(\psi\)&lt;/span&gt; is the digamma function. If you wonder (as I did when reading this the first time) where the hell the digamma function comes from: it is here due to a property of the Dirichlet distribution, which says that the expected value of &lt;span class=&#34;math inline&#34;&gt;\(\log r_k\)&lt;/span&gt; can be computed as
&lt;span class=&#34;math display&#34;&gt;\[
\mathbb{E} \left[\log r_k \right] = \int p(r_k) \log r_k \, d r_k = \psi (\alpha_k) -\psi \left( \sum_{k=1}^K \alpha_k \right)
\]&lt;/span&gt;&lt;/p&gt;
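&lt;p&gt;This property is easy to verify numerically, comparing a Monte Carlo estimate of the expectation with the digamma expression (a quick Python sketch, with an arbitrary choice of &lt;span class=&#34;math inline&#34;&gt;\(\boldsymbol{\alpha}\)&lt;/span&gt;):&lt;/p&gt;

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
alpha = np.array([2.0, 3.0, 5.0])          # arbitrary Dirichlet parameters

# Monte Carlo estimate of E[log r_k] from Dirichlet samples
samples = rng.dirichlet(alpha, size=200_000)
mc_estimate = np.log(samples).mean(axis=0)

# analytic expression: psi(alpha_k) - psi(sum of the alphas)
analytic = digamma(alpha) - digamma(alpha.sum())

print(mc_estimate)
print(analytic)   # the two should agree up to Monte Carlo error
```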
&lt;p&gt;From this, we have that the un-normalized posterior belief that model &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; generated data from subject &lt;span class=&#34;math inline&#34;&gt;\(n\)&lt;/span&gt; is
&lt;span class=&#34;math display&#34;&gt;\[
u_{nk} =  \exp {\left[ \log p \left( y_n \mid m_{nk} \right) + \psi (\alpha_k) -\psi \left(  \alpha_S \right) \right]}
\]&lt;/span&gt;
and the normalized belief is
&lt;span class=&#34;math display&#34;&gt;\[
g_{nk} = \frac{u_{nk}}{\sum_{j=1}^K u_{nj}}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We also need to compute the approximate posterior density &lt;span class=&#34;math inline&#34;&gt;\(q(r)\)&lt;/span&gt;, and we begin as above by computing the un-normalized approximate log-posterior, or variational energy,
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
I(r) &amp;amp; = \int \log p \left( y, r, m \right) \, q(m) \, dm \\
&amp;amp; = \sum_{k=1}^K \left[\log r_k \left(\alpha_{0k} -1 \right) +  \sum_{n=1}^N g_{nk} \log r_k \right]
\end{align}
\]&lt;/span&gt;
The logarithm of a Dirichlet density is &lt;span class=&#34;math inline&#34;&gt;\(\log \text{Dir} (r , \boldsymbol{\alpha}) = \sum_{k=1}^K \log r_k \left(\alpha_{0k} -1 \right) + \dots\)&lt;/span&gt;, therefore the parameters of the approximate posterior are
&lt;span class=&#34;math display&#34;&gt;\[
  \alpha_k = \alpha_{0k} + \sum_{n=1}^N g_{nk}
\]&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;iterative-algorithm&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Iterative algorithm&lt;/h2&gt;
&lt;p&gt;The algorithm &lt;span class=&#34;citation&#34;&gt;(Stephan et al. 2009)&lt;/span&gt; proceeds by iteratively estimating the posterior belief that a given model generated the data from a certain subject, integrating out the prior probabilities of the models (the &lt;span class=&#34;math inline&#34;&gt;\(r_k\)&lt;/span&gt; predicted by the Dirichlet distribution that describes the frequency of models in the population) in log-space as described above. Next, the parameters of the approximate Dirichlet posterior are updated, which gives new priors to integrate out from the model evidence, and so on until convergence. Convergence is assessed by keeping track of how much the vector &lt;span class=&#34;math inline&#34;&gt;\(\boldsymbol{\alpha}\)&lt;/span&gt; changes from one iteration to the next; for example, it is common to consider that the procedure has converged when &lt;span class=&#34;math inline&#34;&gt;\(\left\Vert \boldsymbol{\alpha}_{t} - \boldsymbol{\alpha}_{t-1} \right\Vert &amp;lt; 10^{-4}\)&lt;/span&gt;, where &lt;span class=&#34;math inline&#34;&gt;\(\left\Vert \cdot \right\Vert\)&lt;/span&gt; denotes the Euclidean norm.&lt;/p&gt;
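&lt;p&gt;Putting the pieces together, the whole iterative scheme fits in a few lines. Below is a sketch (in Python, for illustration; not the SPM implementation), assuming the log model evidences are supplied as an &lt;span class=&#34;math inline&#34;&gt;\(N \times K\)&lt;/span&gt; matrix:&lt;/p&gt;

```python
import numpy as np
from scipy.special import digamma

def vb_model_selection(log_evidence, alpha0=1.0, tol=1e-4, max_iter=1000):
    # log_evidence: N subjects x K models array of log model evidences
    N, K = log_evidence.shape
    alpha = np.full(K, alpha0)
    for _ in range(max_iter):
        # un-normalized log posterior beliefs, priors integrated out in log-space
        log_u = log_evidence + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)   # for numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)           # normalized beliefs g_nk
        alpha_new = alpha0 + g.sum(axis=0)          # Dirichlet posterior update
        # stop when the change in alpha has a norm below tol
        converged = np.less(np.linalg.norm(alpha_new - alpha), tol)
        alpha = alpha_new
        if converged:
            break
    return alpha, g
```

&lt;p&gt;A quick sanity check: since each row of the beliefs sums to one, the posterior counts must satisfy &lt;span class=&#34;math inline&#34;&gt;\(\sum_k \alpha_k = \sum_k \alpha_{0k} + N\)&lt;/span&gt;.&lt;/p&gt;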
&lt;/div&gt;
&lt;div id=&#34;exceedance-probabilities&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exceedance probabilities&lt;/h2&gt;
&lt;p&gt;After having found the optimised values of &lt;span class=&#34;math inline&#34;&gt;\(\boldsymbol{\alpha}\)&lt;/span&gt;, one popular way to report the results and rank the models is by their exceedance probability, defined as the (second-order) probability that a given model is more frequent in the population than any of the alternative models, that is
&lt;span class=&#34;math display&#34;&gt;\[
\forall j \in \left\{1, \dots, K, j \ne k \right\}, \,\,\, \varphi_k = p \left(r_k &amp;gt; r_j \mid y, \boldsymbol{\alpha} \right).
\]&lt;/span&gt;
In the case of &lt;span class=&#34;math inline&#34;&gt;\(K&amp;gt;2\)&lt;/span&gt; models, the exceedance probabilities &lt;span class=&#34;math inline&#34;&gt;\(\varphi_k\)&lt;/span&gt; are computed by generating random samples from univariate Gamma densities and then normalizing. Specifically, each multivariate Dirichlet sample is composed of &lt;span class=&#34;math inline&#34;&gt;\(K\)&lt;/span&gt; independent random samples &lt;span class=&#34;math inline&#34;&gt;\((x_1, \dots, x_K)\)&lt;/span&gt; distributed according to the density &lt;span class=&#34;math inline&#34;&gt;\(\text{Gamma}\left(\alpha_i, 1\right) = \frac{x_i^{\alpha_i-1} e^{-x_i}}{\Gamma(\alpha_i)}\)&lt;/span&gt;, which are then normalized by taking &lt;span class=&#34;math inline&#34;&gt;\(z_i = \frac{x_i}{ \sum_{i=1}^K x_i}\)&lt;/span&gt;. The exceedance probability &lt;span class=&#34;math inline&#34;&gt;\(\varphi_k\)&lt;/span&gt; for each model &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; is then computed as
&lt;span class=&#34;math display&#34;&gt;\[
\varphi_k = \frac{\sum \mathop{\bf{1}}_{z_k&amp;gt;z_j, \forall j \in \left\{1, \dots, K, j \ne k \right\} }}{ \text{n. of samples}}
\]&lt;/span&gt;
where &lt;span class=&#34;math inline&#34;&gt;\(\mathop{\bf{1}}_{\dots}\)&lt;/span&gt; is the indicator function (&lt;span class=&#34;math inline&#34;&gt;\(\mathop{\bf{1}}_{x&amp;gt;0} = 1\)&lt;/span&gt; if &lt;span class=&#34;math inline&#34;&gt;\(x&amp;gt;0\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(0\)&lt;/span&gt; otherwise), summed over the total number of multivariate samples drawn.&lt;/p&gt;
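&lt;p&gt;The sampling scheme just described can be sketched as follows (a Python sketch, for illustration):&lt;/p&gt;

```python
import numpy as np

def exceedance_prob(alpha, n_samples=100_000, seed=0):
    # Monte Carlo exceedance probabilities for a Dirichlet(alpha) posterior
    rng = np.random.default_rng(seed)
    # each Dirichlet draw: K independent Gamma(alpha_k, 1) draws, normalized
    x = rng.gamma(shape=alpha, size=(n_samples, len(alpha)))
    z = x / x.sum(axis=1, keepdims=True)
    # count how often each model has the largest sampled frequency
    winners = z.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_samples
```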
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;code&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Code!&lt;/h1&gt;
&lt;p&gt;All this is already implemented in Matlab code in &lt;a href=&#34;https://www.fil.ion.ucl.ac.uk/spm/software/spm12/&#34;&gt;SPM 12&lt;/a&gt;. However, if you don’t like Matlab, I have translated it into R and put it into a &lt;a href=&#34;https://github.com/mattelisi/bmsR&#34;&gt;package on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-Burnham2002&#34;&gt;
&lt;p&gt;Burnham, Kenneth P., and David R. Anderson. 2002. &lt;em&gt;Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach&lt;/em&gt;. 2nd edition. New York, US: Springer New York. &lt;a href=&#34;https://doi.org/10.1007/b97636&#34;&gt;https://doi.org/10.1007/b97636&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Stephan2009&#34;&gt;
&lt;p&gt;Stephan, Klaas Enno, Will D. Penny, Jean Daunizeau, Rosalyn J. Moran, and Karl J. Friston. 2009. “Bayesian model selection for group studies.” &lt;em&gt;NeuroImage&lt;/em&gt; 46 (4). Elsevier Inc.: 1004–17. &lt;a href=&#34;https://doi.org/10.1016/j.neuroimage.2009.03.025&#34;&gt;https://doi.org/10.1016/j.neuroimage.2009.03.025&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Bayesian multilevel models using R and Stan (part 1)</title>
      <link>http://mlisi.xyz/post/bayesian-multilevel-models-r-stan/</link>
      <pubDate>Thu, 01 Mar 2018 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/bayesian-multilevel-models-r-stan/</guid>
      <description>


&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/turtlepile.jpg&#34; alt=&#34;Photo ©Roxie and Lee Carroll, www.akidsphoto.com.&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Photo ©Roxie and Lee Carroll, www.akidsphoto.com.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In my previous lab I was known for promoting the use of multilevel, or mixed-effects, models among my colleagues. (The slides on the &lt;a href=&#34;https://mattelisi.github.io/#notes&#34;&gt;/misc&lt;/a&gt; section of this website are part of this effort.) Multilevel models should be the standard approach in fields like experimental psychology and neuroscience, where the data are naturally grouped according to “observational units”, i.e. individual participants. I agree with Richard McElreath when he writes that &lt;em&gt;“multilevel regression deserves to be the default form of regression”&lt;/em&gt; (see &lt;a href=&#34;http://xcelab.net/rmpubs/rethinking/Statistical_Rethinking_sample.pdf&#34;&gt;here&lt;/a&gt;, section 1.3.2) and that, at least in our fields, studies not using a multilevel approach should justify that choice.&lt;/p&gt;
&lt;p&gt;In &lt;span class=&#34;math inline&#34;&gt;\(\textsf{R}\)&lt;/span&gt;, the easiest way to fit multilevel linear and generalized linear models is provided by the &lt;code&gt;lme4&lt;/code&gt; library &lt;span class=&#34;citation&#34;&gt;(Bates et al. 2014)&lt;/span&gt;. &lt;code&gt;lme4&lt;/code&gt; is a great package, which allows users to test different models easily and painlessly. However, it also has some limitations: it can fit only classical forms of linear and generalized linear models, and cannot, for example, be used to fit psychometric functions that take attention lapses into account (see &lt;a href=&#34;https://mattelisi.github.io/post/model-averaging/&#34;&gt;here&lt;/a&gt;). Also, &lt;code&gt;lme4&lt;/code&gt; fits multilevel models within a frequentist approach, and thus does not allow one to incorporate prior knowledge into the model, or to use regularizing priors to reduce the risk of overfitting. For this reason, I have recently started using &lt;a href=&#34;http://mc-stan.org&#34;&gt;Stan&lt;/a&gt;, through its &lt;a href=&#34;http://mc-stan.org/users/interfaces/rstan.html&#34;&gt;&lt;span class=&#34;math inline&#34;&gt;\(\textsf{R}\)&lt;/span&gt;Stan&lt;/a&gt; interface, to fit multilevel models in a Bayesian setting, and I find it great! It certainly requires more effort to define the models, but I think that the flexibility offered by software like Stan is well worth the time spent learning how to use it.&lt;/p&gt;
&lt;p&gt;For people like me, used to working with &lt;code&gt;lme4&lt;/code&gt;, Stan can be a bit discouraging at first. The approach to writing the model is quite different, and it requires specifying all the distributional assumptions explicitly. Also, implementing models with correlated random effects requires some specific notions of linear algebra. So I prepared a first tutorial showing how to analyse in Stan one of the most common introductory examples of mixed-effects models, the &lt;code&gt;sleepstudy&lt;/code&gt; dataset (contained in the &lt;code&gt;lme4&lt;/code&gt; package). This will be followed by another tutorial showing how to use this approach to fit datasets where the dependent variable is a binary outcome, as is the case for most psychophysical data.&lt;/p&gt;
&lt;div id=&#34;the-sleepstudy-example&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;The &lt;code&gt;sleepstudy&lt;/code&gt; example&lt;/h1&gt;
&lt;p&gt;This dataset contains part of the data from a published study &lt;span class=&#34;citation&#34;&gt;(Belenky et al. 2003)&lt;/span&gt; that examined the effect of sleep deprivation on reaction times. (This is a sensible topic: think, for example, of long-distance truck drivers.) The dataset contains the average reaction times for the 18 subjects of the sleep-deprived group, for the first 10 days of the study, up to the recovery period.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(lme4)
Loading required package: Matrix
str(sleepstudy)
&amp;#39;data.frame&amp;#39;:   180 obs. of  3 variables:
 $ Reaction: num  250 259 251 321 357 ...
 $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
 $ Subject : Factor w/ 18 levels &amp;quot;308&amp;quot;,&amp;quot;309&amp;quot;,&amp;quot;310&amp;quot;,..: 1 1 1 1 1 1 1 1 1 1 ...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The model I want to fit to the data will contain both random intercepts and slopes; in addition the correlation between the random effects should also be estimated. Using &lt;code&gt;lme4&lt;/code&gt;, this model could be estimated by using&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lmer(Reaction ~ Days + (Days | Subject), sleepstudy)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The model could be formally notated as
&lt;span class=&#34;math display&#34;&gt;\[
y_{ij} = \beta_0 + u_{0j} + \left( \beta_1 + u_{1j} \right) \cdot {\rm{Days}} + e_{ij}
\]&lt;/span&gt;
where &lt;span class=&#34;math inline&#34;&gt;\(\beta_0\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(\beta_1\)&lt;/span&gt; are the fixed effects parameters (intercept and slope), &lt;span class=&#34;math inline&#34;&gt;\(u_{0j}\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(u_{1j}\)&lt;/span&gt; are the subject specific random intercept and slope (the index &lt;span class=&#34;math inline&#34;&gt;\(j\)&lt;/span&gt; denotes the subject), and &lt;span class=&#34;math inline&#34;&gt;\(e \sim\cal N \left( 0,\sigma_e^2 \right)\)&lt;/span&gt; is the (normally distributed) residual error. The random effects &lt;span class=&#34;math inline&#34;&gt;\(u_0\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(u_1\)&lt;/span&gt; have a multivariate normal distribution, with mean 0 and covariance matrix &lt;span class=&#34;math inline&#34;&gt;\(\Omega\)&lt;/span&gt;
&lt;span class=&#34;math display&#34;&gt;\[
\left[ {\begin{array}{*{20}{c}}
{{u_0}}\\
{{u_1}}
\end{array}} \right] \sim\cal N \left( {\left[ {\begin{array}{*{20}{c}}
0\\
0
\end{array}} \right],\Omega  = \left[ {\begin{array}{*{20}{c}}
{\sigma _0^2}&amp;amp;{{\mathop{\rm cov}} \left( {{u_0},{u_1}} \right)}\\
{{\mathop{\rm cov}} \left( {{u_0},{u_1}} \right)}&amp;amp;{\sigma _1^2}
\end{array}} \right]} \right)
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;In Stan, fitting this model requires preparing a separate text file (usually saved with the ‘.stan’ extension), containing several “blocks”. The 3 main types of blocks in Stan are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;data&lt;/code&gt;&lt;/strong&gt; all the dependent and independent variables need to be declared in this block&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;parameters&lt;/code&gt;&lt;/strong&gt; here one should declare the free parameters of the model; what Stan does, essentially, is use an MCMC algorithm to draw samples from the posterior distribution of the parameters given the data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;model&lt;/code&gt;&lt;/strong&gt; here one should define the likelihood function and, if used, the priors&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, we will use two other types of blocks, &lt;strong&gt;&lt;code&gt;transformed parameters&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;generated quantities&lt;/code&gt;&lt;/strong&gt;. The first is necessary because we are also estimating the full correlation matrix of the random effects. We will parametrize the covariance matrix via the Cholesky factor of the correlation matrix (see &lt;a href=&#34;https://mattelisi.github.io/post/simulating-correlated-variables-with-the-cholesky-factorization/&#34;&gt;my post on the Cholesky factorization&lt;/a&gt;), and in the &lt;code&gt;transformed parameters&lt;/code&gt; block we will multiply the random effects by the Cholesky factor, transforming them so that they have the intended correlation matrix. The &lt;code&gt;generated quantities&lt;/code&gt; block can be used to compute any additional quantities we may want, once for each sample; I will use it to transform the Cholesky factor back into the correlation matrix (this step is not essential but makes examining the model easier).&lt;/p&gt;
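&lt;p&gt;To see what this transformation does, here is the same computation outside Stan (a Python sketch, with made-up values for the standard deviations and the correlation):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
J = 10_000                              # many "subjects", to check the moments
sigma_u = np.array([0.1, 0.02])         # made-up random-effect standard deviations
rho = 0.4                               # made-up intercept-slope correlation
corr = np.array([[1.0, rho], [rho, 1.0]])
L = np.linalg.cholesky(corr)            # lower-triangular Cholesky factor

z = rng.standard_normal((2, J))         # uncorrelated standard-normal draws
u = np.diag(sigma_u) @ L @ z            # as diag_pre_multiply(sigma_u, L_u) * z_u

print(np.corrcoef(u)[0, 1])             # close to rho
print(u.std(axis=1))                    # close to sigma_u
```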
&lt;div id=&#34;data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Data&lt;/h2&gt;
&lt;p&gt;RStan requires the data to be organized in a list object. It can be done with the following command&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;d_stan &amp;lt;- list(Subject = as.numeric(factor(sleepstudy$Subject, 
    labels = 1:length(unique(sleepstudy$Subject)))), Days = sleepstudy$Days, 
    RT = sleepstudy$Reaction/1000, N = nrow(sleepstudy), J = length(unique(sleepstudy$Subject)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that I also included two scalar variables, &lt;code&gt;N&lt;/code&gt; and &lt;code&gt;J&lt;/code&gt;, indicating respectively the number of observations and the number of subjects. &lt;code&gt;Subject&lt;/code&gt; was a categorical factor, but to input it into Stan I transformed it into an integer index. I also rescaled the reaction times so that they are in seconds instead of milliseconds.&lt;/p&gt;
&lt;p&gt;These variables can be declared in Stan with the following block. We need to declare the variable type (e.g. real or integer, similarly to programming languages such as C++) and for vectors we need to declare their length (hence the need for the two scalar variables &lt;code&gt;N&lt;/code&gt; and &lt;code&gt;J&lt;/code&gt;). Note that variables can be given lower and upper bounds. See the Stan reference manual for more information on the variable types.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;data {
  int&amp;lt;lower=1&amp;gt; N;            //number of observations
  real RT[N];                //reaction times

  int&amp;lt;lower=0,upper=9&amp;gt; Days[N];   //predictor (days of sleep deprivation)

  // grouping factor
  int&amp;lt;lower=1&amp;gt; J;                   //number of subjects
  int&amp;lt;lower=1,upper=J&amp;gt; Subject[N];  //subject id
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;parameters&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Parameters&lt;/h2&gt;
&lt;p&gt;Here is the parameter block. Stan will draw samples from the posterior distribution of all the parameters listed here. Note that for parameters representing standard deviations it is necessary to set the lower bound to 0 (variances and standard deviations cannot be negative). This is equivalent to estimating the logarithm of the standard deviation (which can be either positive or negative) and exponentiating it before computing the likelihood (because &lt;span class=&#34;math inline&#34;&gt;\(e^x&amp;gt;0\)&lt;/span&gt; for any &lt;span class=&#34;math inline&#34;&gt;\(x\)&lt;/span&gt;). Note that we also have one parameter for the standard deviation of the residual errors (which was implicit in &lt;code&gt;lme4&lt;/code&gt;).
The random effects are parametrized by a 2 x &lt;code&gt;J&lt;/code&gt; random-effect matrix &lt;code&gt;z_u&lt;/code&gt;, and by the Cholesky factor of the correlation matrix &lt;code&gt;L_u&lt;/code&gt;. I have also added the transformed parameters block, where the Cholesky factor is first pre-multiplied by the diagonal matrix formed by the vector of random-effect standard deviations &lt;code&gt;sigma_u&lt;/code&gt;, and then multiplied by the random-effect matrix, to obtain a random-effects matrix with the intended correlations, which will be used in the model block below to compute the likelihood of the data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;parameters {
  vector[2] beta;                   // fixed-effects parameters
  real&amp;lt;lower=0&amp;gt; sigma_e;            // residual std
  vector&amp;lt;lower=0&amp;gt;[2] sigma_u;       // random effects standard deviations

  // declare L_u to be the Choleski factor of a 2x2 correlation matrix
  cholesky_factor_corr[2] L_u;

  matrix[2,J] z_u;                  // random effect matrix
}

transformed parameters {
  // this transform random effects so that they have the correlation
  // matrix specified by the correlation matrix above
  matrix[2,J] u;
  u = diag_pre_multiply(sigma_u, L_u) * z_u;

}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Model&lt;/h2&gt;
&lt;p&gt;Finally the model block. Here we can define priors for the parameters, and then write the likelihood of the data given the parameters. The likelihood function corresponds to the model equation we saw before.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;model {
  real mu; // conditional mean of the dependent variable

  //priors
  L_u ~ lkj_corr_cholesky(1.5); // LKJ prior for the correlation matrix
  to_vector(z_u) ~ normal(0,2);
  sigma_e ~ normal(0, 5);       // prior for residual standard deviation
  beta[1] ~ normal(0.3, 0.5);   // prior for fixed-effect intercept
  beta[2] ~ normal(0.2, 2);     // prior for fixed-effect slope

  //likelihood
  for (i in 1:N){
    mu = beta[1] + u[1,Subject[i]] + (beta[2] + u[2,Subject[i]])*Days[i];
    RT[i] ~ normal(mu, sigma_e);
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the correlation matrix, the Stan manual suggests using an LKJ prior&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;. This prior has a single shape parameter, &lt;span class=&#34;math inline&#34;&gt;\(\eta\)&lt;/span&gt;: if you set &lt;span class=&#34;math inline&#34;&gt;\(\eta=1\)&lt;/span&gt; then you have effectively a uniform prior distribution over any (Cholesky factor of a) 2x2 correlation matrix. For values &lt;span class=&#34;math inline&#34;&gt;\(\eta&amp;gt;1\)&lt;/span&gt; you get instead a more conservative prior, with a mode at the identity matrix (where the correlations are 0). For more information about the LKJ prior see page 556 of the Stan reference manual, version 2.17.0, and also &lt;a href=&#34;http://www.psychstatistics.com/2014/12/27/d-lkj-priors/&#34;&gt;this page&lt;/a&gt; for an intuitive demonstration.&lt;/p&gt;
&lt;p&gt;Importantly, I have used (weakly) informative priors for the fixed-effect estimates. We know from the literature that simple reaction times are around 300 ms, hence the prior for the intercept, which represents the average reaction time at Day 0, i.e. before sleep deprivation. We expect reaction times to increase with sleep deprivation, so for the slope I have used a Gaussian prior centered on a small positive value (0.2 seconds), which represents the increase in reaction times with each day of sleep deprivation, with a very broad standard deviation (2 seconds), which could accommodate also negative or very different slope values if needed. It may be useful to visualize the priors with a plot.
&lt;img src=&#34;http://mlisi.xyz/post/2018-03-4-Bayesian-multilevel-models-R-Stan_files/figure-html/fig1-1.png&#34; width=&#34;624&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;generated-quantities&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Generated quantities&lt;/h2&gt;
&lt;p&gt;Finally, we can add one last block to the model file, to store for each sampling iteration the correlation matrix of the random effects, which can be computed by multiplying the Cholesky factor with its transpose.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;generated quantities {
  matrix[2, 2] Omega;
  Omega = L_u * L_u&amp;#39;; // so that it returns the correlation matrix
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;estimating-the-model&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Estimating the model&lt;/h1&gt;
&lt;p&gt;Having written all the above blocks in a separate text file (I called it “sleep_model.stan”), we can call Stan from R with the following commands. I run 4 independent chains (each chain is a stochastic process which sequentially generates random values; they are called &lt;em&gt;chains&lt;/em&gt; because each sample depends on the previous one), each for 2000 samples. The first 1000 samples are the &lt;em&gt;warmup&lt;/em&gt; (sometimes called &lt;em&gt;burn-in&lt;/em&gt;), which are intended to allow the sampling process to settle into the posterior distribution; these samples will not be used for inference. Each chain is independent from the others, therefore having multiple chains is also useful to check convergence (e.g. by looking at whether all chains converged to the same region of the parameter space). Additionally, having multiple chains allows one to compute a statistic which is also used to check convergence: this is called &lt;span class=&#34;math inline&#34;&gt;\(\hat R\)&lt;/span&gt; and it is based on the ratio of the between-chain variance to the within-chain variance. If the sampling has converged then &lt;span class=&#34;math inline&#34;&gt;\({\hat R} \approx 1 \pm 0.01\)&lt;/span&gt;.
When we call the function &lt;code&gt;stan&lt;/code&gt;, it will compile a C++ program which produces samples from the joint posterior of the parameters using a powerful variant of MCMC sampling, called &lt;em&gt;Hamiltonian Monte Carlo&lt;/em&gt; (see &lt;a href=&#34;http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/&#34;&gt;here&lt;/a&gt; for an intuitive explanation of the sampling algorithm).&lt;/p&gt;
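&lt;p&gt;To give an idea of how &lt;span class=&#34;math inline&#34;&gt;\(\hat R\)&lt;/span&gt; works, here is a simplified version (a Python sketch; the statistic actually computed by Stan also splits each chain in half):&lt;/p&gt;

```python
import numpy as np

def rhat_basic(chains):
    # chains: (n_chains, n_samples) array of post-warmup draws for one parameter
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled posterior-variance estimate
    return float(np.sqrt(var_plus / W))      # approaches 1 at convergence
```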
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rstan)
options(mc.cores = parallel::detectCores())  # tell Stan to use multiple cores if available
sleep_model &amp;lt;- stan(file = &amp;quot;sleep_model.stan&amp;quot;, data = d_stan, 
    iter = 2000, chains = 4)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One way to check the convergence of the model is to plot the chain of samples. They should look like a &lt;em&gt;“fat, hairy caterpillar which does not bend”&lt;/em&gt; &lt;span class=&#34;citation&#34;&gt;(Sorensen, Hohenstein, and Vasishth 2016)&lt;/span&gt;, suggesting that the sampling was stable at the posterior.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;traceplot(sleep_model, pars = c(&amp;quot;beta&amp;quot;), inc_warmup = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://mlisi.xyz/post/2018-03-4-Bayesian-multilevel-models-R-Stan_files/figure-html/fig2-1.png&#34; width=&#34;480&#34; /&gt;
There is a &lt;code&gt;print()&lt;/code&gt; method for visualising the estimates of the parameters. The values of the &lt;span class=&#34;math inline&#34;&gt;\({\hat R}\)&lt;/span&gt; (&lt;code&gt;Rhat&lt;/code&gt;) statistic also confirm that the chains converged. The method automatically reports credible intervals for the parameters (computed with the percentile method from the samples of the posterior distribution).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(sleep_model, pars = c(&amp;quot;beta&amp;quot;), probs = c(0.025, 0.975), 
    digits = 3)
Inference for Stan model: sleep_model_v1.
5 chains, each with iter=6000; warmup=3000; thin=1; 
post-warmup draws per chain=3000, total post-warmup draws=15000.

         mean se_mean    sd  2.5% 97.5% n_eff Rhat
beta[1] 0.255       0 0.006 0.243 0.268  6826    1
beta[2] 0.011       0 0.001 0.008 0.013  7830    1

Samples were drawn using NUTS(diag_e) at Sat Sep 22 17:15:42 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And we can visualize the posterior distributions as histograms (here for the fixed-effects parameters and the standard deviations of the corresponding random effects).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(sleep_model, plotfun = &amp;quot;hist&amp;quot;, pars = c(&amp;quot;beta&amp;quot;, &amp;quot;sigma_u&amp;quot;))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;http://mlisi.xyz/post/2018-03-4-Bayesian-multilevel-models-R-Stan_files/figure-html/fig3-1.png&#34; width=&#34;384&#34; /&gt;
Finally, we can also examine the correlation matrix of random-effects.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(sleep_model, pars = c(&amp;quot;Omega&amp;quot;), digits = 3)
Inference for Stan model: sleep_model_v1.
5 chains, each with iter=6000; warmup=3000; thin=1; 
post-warmup draws per chain=3000, total post-warmup draws=15000.

            mean se_mean    sd   2.5%   25%   50%   75% 97.5% n_eff  Rhat
Omega[1,1] 1.000     NaN 0.000  1.000 1.000 1.000 1.000 1.000   NaN   NaN
Omega[1,2] 0.221   0.007 0.344 -0.546 0.011 0.251 0.467 0.807  2228 1.001
Omega[2,1] 0.221   0.007 0.344 -0.546 0.011 0.251 0.467 0.807  2228 1.001
Omega[2,2] 1.000   0.000 0.000  1.000 1.000 1.000 1.000 1.000   160 1.000

Samples were drawn using NUTS(diag_e) at Sat Sep 22 17:15:42 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;Rhat&lt;/code&gt; value for the first entry of the correlation matrix is NaN. This is expected for variables that remain constant during sampling. We can check that this variable resulted in a series of identical values during sampling with the following command&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;all(unlist(extract(sleep_model, pars = &amp;quot;Omega[1,1]&amp;quot;)) == 1)  # all values are =1 ?
[1] TRUE&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That’s all! You can check for yourself that the values are sufficiently similar to what we would obtain using &lt;code&gt;lmer&lt;/code&gt;, and experiment with how the estimates change when more informative priors are used. For more examples of how to fit linear mixed-effects models using Stan I recommend the article by Sorensen &lt;span class=&#34;citation&#34;&gt;(Sorensen, Hohenstein, and Vasishth 2016)&lt;/span&gt;, which also shows how to implement &lt;em&gt;crossed&lt;/em&gt; random effects of subjects and items (words), as is conventional in linguistics.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-Bates2014&#34;&gt;
&lt;p&gt;Bates, D, M Maechler, B Bolker, and S Walker. 2014. “lme4: Linear mixed-effects models using Eigen and S4.” R package version 1.1-7. &lt;a href=&#34;http://cran.r-project.org/package=lme4&#34;&gt;http://cran.r-project.org/package=lme4&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Belenky2003&#34;&gt;
&lt;p&gt;Belenky, Gregory, Nancy J Wesensten, David R Thorne, Maria L Thomas, Helen C Sing, Daniel P Redmond, Michael B Russo, and J Balkin, Thomas. 2003. “Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study.” &lt;em&gt;Journal of Sleep Research&lt;/em&gt; 12 (1): 1–12. &lt;a href=&#34;https://doi.org/10.1046/j.1365-2869.2003.00337.x&#34;&gt;https://doi.org/10.1046/j.1365-2869.2003.00337.x&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Sorensen2016&#34;&gt;
&lt;p&gt;Sorensen, Tanner, Sven Hohenstein, and Shravan Vasishth. 2016. “Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists.” &lt;em&gt;The Quantitative Methods for Psychology&lt;/em&gt; 12 (3): 175–200. &lt;a href=&#34;https://doi.org/10.20982/tqmp.12.3.p175&#34;&gt;https://doi.org/10.20982/tqmp.12.3.p175&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;The LKJ prior is named after the authors, see: Lewandowski, D., Kurowicka, D., and Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. &lt;em&gt;Journal of Multivariate Analysis&lt;/em&gt;, 100:1989–2001&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Simulating correlated variables with the Cholesky factorization</title>
      <link>http://mlisi.xyz/post/simulating-correlated-variables-with-the-cholesky-factorization/</link>
      <pubDate>Sun, 21 Jan 2018 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/simulating-correlated-variables-with-the-cholesky-factorization/</guid>
      <description>


&lt;p&gt;Generating random variables with a given variance-covariance matrix can be useful for many purposes. For example, it is useful for generating random intercepts and slopes with given correlations when simulating a multilevel, or mixed-effects, model (e.g. see &lt;a href=&#34;https://rpubs.com/adrbart/random_slope_simulation&#34;&gt;here&lt;/a&gt;). This can be achieved efficiently with the &lt;a href=&#34;https://en.wikipedia.org/wiki/Cholesky_decomposition&#34;&gt;Cholesky factorization&lt;/a&gt;. In linear algebra, a factorization or decomposition is the expression of a matrix as a product of matrices. More specifically, the Cholesky factorization decomposes a positive-definite, symmetric&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; matrix into the product of a triangular matrix and its conjugate transpose; in other words, it is a method for finding the &lt;em&gt;square root&lt;/em&gt; of a matrix. The square root of a matrix &lt;span class=&#34;math inline&#34;&gt;\(C\)&lt;/span&gt; is another matrix &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt; such that &lt;span class=&#34;math inline&#34;&gt;\({L^T}L = C\)&lt;/span&gt;.&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Suppose you want to create 2 variables, having a Gaussian distribution, and a positive correlation, say &lt;span class=&#34;math inline&#34;&gt;\(0.7\)&lt;/span&gt;. The first step is to define the correlation matrix
&lt;span class=&#34;math display&#34;&gt;\[C = \left( {\begin{array}{*{20}{c}}
1&amp;amp;{0.7}\\
{0.7}&amp;amp;1
\end{array}} \right)\]&lt;/span&gt;
Elements on the diagonal can be understood as the correlation of each variable with itself, and are therefore 1, while elements off the diagonal specify the desired correlation. In &lt;span class=&#34;math inline&#34;&gt;\(\textsf{R}\)&lt;/span&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;C &amp;lt;- matrix(c(1,0.7,0.7,1),2,2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next one can use the &lt;code&gt;chol()&lt;/code&gt; function to compute the Cholesky factor. (The function provides the upper triangular square root of &lt;span class=&#34;math inline&#34;&gt;\(C\)&lt;/span&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;L &amp;lt;- chol(C)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you multiply the transpose of &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt; by &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt; you get back the original correlation matrix (&lt;span class=&#34;math inline&#34;&gt;\(\textsf{R}\)&lt;/span&gt; output below).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;t(L) %*% L
     [,1] [,2]
[1,]  1.0  0.7
[2,]  0.7  1.0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we need another matrix with the desired standard deviations on the diagonal (in this example I choose 1 and 2)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tau &amp;lt;- diag(c(1,2))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Multiply that matrix by the lower triangular square root of the correlation matrix (obtained by taking the transpose of &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt;)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Lambda &amp;lt;- tau %*% t(L)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can generate values for 2 independent random variables &lt;span class=&#34;math inline&#34;&gt;\(z\sim\cal N\left( {0,1} \right)\)&lt;/span&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Z &amp;lt;- rbind(rnorm(1e4),rnorm(1e4))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, to introduce the correlations, multiply them by the matrix &lt;code&gt;Lambda&lt;/code&gt; obtained above&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;X &amp;lt;- Lambda %*% Z&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now plot the results
&lt;img src=&#34;http://mlisi.xyz/post/2018-01-21-generating-correlated-random-variables-with-the-cholesky-factorization_files/figure-html/fig1-1.png&#34; width=&#34;729.6&#34; /&gt;
We can verify that the correlation estimated from the sample is close to the generative value.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# correlation in the generated sample
cor(X[1,],X[2,])
[1] 0.7093591&lt;/code&gt;&lt;/pre&gt;
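The whole recipe can also be sketched outside of R; here is a minimal Python/NumPy version of the same steps, included purely as an illustration (note that `np.linalg.cholesky` returns the *lower* triangular factor, so no transpose is needed):

```python
import numpy as np

rng = np.random.default_rng(1)

C = np.array([[1.0, 0.7],
              [0.7, 1.0]])            # target correlation matrix
tau = np.diag([1.0, 2.0])             # desired standard deviations

L = np.linalg.cholesky(C)             # lower triangular factor: L @ L.T == C
Lam = tau @ L                         # scale by the standard deviations

Z = rng.standard_normal((2, 100_000)) # independent N(0, 1) variables
X = Lam @ Z                           # correlated sample

print(np.corrcoef(X))                 # off-diagonal entries close to 0.7
print(X.std(axis=1))                  # close to (1, 2)
```

The result matches the R pipeline above: the sample correlation is close to 0.7 and the marginal standard deviations are close to 1 and 2.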
&lt;div id=&#34;why-does-it-work&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Why does it work?&lt;/h1&gt;
&lt;p&gt;The covariance matrix of the initial, uncorrelated sample is &lt;span class=&#34;math inline&#34;&gt;\(\mathbb{E} \left( Z Z^T \right) = I\)&lt;/span&gt;, that is the identity matrix, since they have zero mean and unit variance &lt;span class=&#34;math inline&#34;&gt;\(z\sim\cal N\left( {0,1} \right)\)&lt;/span&gt;&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s suppose that the desired covariance matrix is &lt;span class=&#34;math inline&#34;&gt;\(\Sigma\)&lt;/span&gt;; since it is symmetric and positive-definite it is possible to obtain the Cholesky factorization &lt;span class=&#34;math inline&#34;&gt;\(L{L^T} = \Sigma\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;If we then compute a new random vector as &lt;span class=&#34;math inline&#34;&gt;\(X=LZ\)&lt;/span&gt;, we have that its covariance matrix is
&lt;span class=&#34;math display&#34;&gt;\[
\begin{align}
\mathbb{E} \left(XX^T\right) &amp;amp;= \mathbb{E} \left((LZ)(LZ)^T \right) \\
&amp;amp;= \mathbb{E} \left(LZ Z^T L^T\right) \\
&amp;amp;= L \mathbb{E} \left(ZZ^T \right) L^T \\
&amp;amp;= LIL^T = LL^T = \Sigma \\
\end{align}
\]&lt;/span&gt;
Therefore the new random vector &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt; has the covariance matrix &lt;span class=&#34;math inline&#34;&gt;\(\Sigma\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The third step is justified because the expected value is a linear operator, so that &lt;span class=&#34;math inline&#34;&gt;\(\mathbb{E}(cX) = c\mathbb{E}(X)\)&lt;/span&gt;. Note also that &lt;span class=&#34;math inline&#34;&gt;\((AB)^T = B^T A^T\)&lt;/span&gt;: the order of the factors reverses.&lt;/p&gt;
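The derivation can be checked numerically; a short Python sketch with an arbitrary covariance matrix Σ (the particular values are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])          # arbitrary symmetric positive-definite Sigma
L = np.linalg.cholesky(Sigma)           # L @ L.T == Sigma

Z = rng.standard_normal((2, 200_000))   # E(Z Z^T) = I (zero mean, unit variance)
X = L @ Z                               # new random vector

print(np.cov(X))                        # approaches Sigma as the sample grows
```

The sample covariance of `X` converges to `Sigma`, as the derivation predicts.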
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;Actually, the Cholesky factorization can be obtained for all &lt;a href=&#34;https://en.wikipedia.org/wiki/Hermitian_matrix&#34;&gt;&lt;em&gt;Hermitian&lt;/em&gt;&lt;/a&gt; matrices. Hermitian matrices are a complex extension of real symmetric matrices. A symmetric matrix is one that is equal to its transpose, which implies that its entries are symmetric with respect to the diagonal. In a Hermitian matrix, symmetric entries with respect to the diagonal are complex conjugates, i.e. they have the same real part, and an imaginary part with equal magnitude but opposite sign. For example, the complex conjugate of &lt;span class=&#34;math inline&#34;&gt;\(x+iy\)&lt;/span&gt; is &lt;span class=&#34;math inline&#34;&gt;\(x-iy\)&lt;/span&gt; (or, equivalently, &lt;span class=&#34;math inline&#34;&gt;\(re^{i\theta}\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(re^{-i\theta}\)&lt;/span&gt;). Real symmetric matrices can be considered a special case of Hermitian matrices in which the imaginary component &lt;span class=&#34;math inline&#34;&gt;\(y\)&lt;/span&gt; (or &lt;span class=&#34;math inline&#34;&gt;\(\theta\)&lt;/span&gt;) is &lt;span class=&#34;math inline&#34;&gt;\(0\)&lt;/span&gt;.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;Note that I am using the convention of the &lt;span class=&#34;math inline&#34;&gt;\(\textsf{R}\)&lt;/span&gt; software, where the function &lt;code&gt;chol()&lt;/code&gt;, which computes the factorization, returns the &lt;em&gt;upper triangular&lt;/em&gt; factor of the Cholesky decomposition. It is perhaps more commonly assumed that the Cholesky decomposition returns the &lt;em&gt;lower triangular&lt;/em&gt; factor &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt;, in which case &lt;span class=&#34;math inline&#34;&gt;\(L{L^T} = C\)&lt;/span&gt;.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;More generally the variance-covariance matrix is &lt;span class=&#34;math inline&#34;&gt;\(\Sigma = \mathbb{E}\left( {X{X^T}} \right) - \mathbb{E}\left( X \right) \mathbb{E}\left(X \right)^T\)&lt;/span&gt;. &lt;span class=&#34;math inline&#34;&gt;\(\mathbb{E}\)&lt;/span&gt; indicates the expected value.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Multi-model estimation of psychophysical parameters</title>
      <link>http://mlisi.xyz/post/model-averaging/</link>
      <pubDate>Fri, 08 Dec 2017 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/model-averaging/</guid>
      <description>


&lt;p&gt;In the study of human perception we often need to measure how sensitive an observer is to a stimulus variation, and how their sensitivity changes with the context or with experimental manipulations. In many applications this can be done by estimating the slope of the psychometric function&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;, a parameter that relates to the precision with which the observer can make judgements about the stimulus. A psychometric function is generally characterized by 2-3 parameters: the slope, the threshold (or criterion), and an optional lapse parameter, which indicates the rate at which attention lapses (i.e. &lt;em&gt;stimulus-independent&lt;/em&gt; errors) occur.&lt;/p&gt;
&lt;p&gt;As an example, consider the situation where an observer is asked to judge whether a signal (it can be anything, from the orientation angle of a line on a screen, or the pitch of a tone, to the speed of a car or the approximate number of people in a crowd) is above or below a given reference value, call it zero. The experimenter presents the observer with many signals of different intensities, and the observer responds by making a binary choice (larger/smaller than the reference), under two different contextual conditions (before/after having a pint, with different headphones, etc.). These two conditions are expected to result in different sensitivities, and the experimenter is interested in estimating as precisely as possible the difference in sensitivity&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;. The psychometric function for one observer in the two conditions might look like this (figure below).&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/psyfun.png&#34; alt=&#34;Psychometric functions. Each point is a response (0 or 1; some vertical jitter is added for clarity), and the lines represent the fitted psychometric model (here a cumulative Gaussian psychometric function). The two facets of the plot represent the two different conditions. It can be seen that precision seems to differ across conditions: judgements made under condition ‘2’ are more variable, indicating reduced sensitivity.&#34; style=&#34;width:70.0%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Psychometric functions. Each point is a response (0 or 1; some vertical jitter is added for clarity), and the lines represent the fitted psychometric model (here a cumulative Gaussian psychometric function). The two facets of the plot represent the two different conditions. It can be seen that precision seems to differ across conditions: judgements made under condition ‘2’ are more variable, indicating reduced sensitivity.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Our focus is on the psychometric slope, and we are not really interested in measuring the lapse rate; however it is still important to take lapses into account: it has been shown that not accounting for lapses can have a large influence on the estimates of the slope &lt;span class=&#34;citation&#34;&gt;(Wichmann and Hill 2001)&lt;/span&gt;.&lt;/p&gt;
&lt;div id=&#34;the-problem-with-lapses&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The problem with lapses&lt;/h3&gt;
&lt;p&gt;Different observers may lapse at quite different rates, and for some of them the lapse rate is probably so small that it can be considered negligible. Also, we usually don’t have hypotheses about lapses, or about whether they should or should not vary across conditions.
We can base our analysis on different assumptions about when observers may have attention lapses:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;they may never lapse (or they do so with a small, negligible frequency);&lt;/li&gt;
&lt;li&gt;they may lapse at a fairly large rate, but the rate is assumed constant across conditions (reasonable, especially if conditions are randomly interleaved);&lt;/li&gt;
&lt;li&gt;they may lapse with variable rate across conditions.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These assumptions will lead to three different psychometric models. The number can increase if we also consider different functional forms for the relationship between stimulus and choice; here for simplicity I will consider only psychometric models based on the cumulative Gaussian function (equivalent to a &lt;em&gt;probit&lt;/em&gt; analysis),
&lt;span class=&#34;math inline&#34;&gt;\(\Phi (\frac{x-\mu}{\sigma}) = \frac{1}{2}\left[ {1 + {\rm{erf}}\left( {\frac{{x - \mu }}{{\sigma \sqrt 2 }}} \right)} \right]\)&lt;/span&gt;,
where the mean &lt;span class=&#34;math inline&#34;&gt;\(\mu\)&lt;/span&gt; would correspond to the threshold parameter, &lt;span class=&#34;math inline&#34;&gt;\(\sigma\)&lt;/span&gt; to the slope, and &lt;span class=&#34;math inline&#34;&gt;\(x\)&lt;/span&gt; is the stimulus intensity.
In our case the first assumption (&lt;em&gt;zero lapses&lt;/em&gt;) would lead to the simplest psychometric model
&lt;span class=&#34;math display&#34;&gt;\[
\Psi (x, \mu_i, \sigma_i)= \Phi (\frac{x-\mu_i}{\sigma_i})
\]&lt;/span&gt;
where the subscript &lt;span class=&#34;math inline&#34;&gt;\(i\)&lt;/span&gt; indicates that the values of both mean &lt;span class=&#34;math inline&#34;&gt;\(\mu_i\)&lt;/span&gt; and slope &lt;span class=&#34;math inline&#34;&gt;\(\sigma_i\)&lt;/span&gt; are specific to the condition &lt;span class=&#34;math inline&#34;&gt;\(i\)&lt;/span&gt;.
The second assumption (&lt;em&gt;fixed lapse rate&lt;/em&gt;) could correspond to the model
&lt;span class=&#34;math display&#34;&gt;\[
\Psi (x, \mu_i, \sigma_i, \lambda)= \lambda + (1-2\lambda) \Phi (\frac{x-\mu_i}{\sigma_i})
\]&lt;/span&gt;
where the parameter &lt;span class=&#34;math inline&#34;&gt;\(\lambda\)&lt;/span&gt; corresponds to the probability of the observer making a random error. Note that this is assumed to be fixed with respect to the condition (no subscript).
Finally, the last assumption (&lt;em&gt;variable lapse rate&lt;/em&gt;) suggests the model
&lt;span class=&#34;math display&#34;&gt;\[
\Psi (x, \mu_i, \sigma_i, \lambda_i)= \lambda_i + (1-2\lambda_i) \Phi (\frac{x-\mu_i}{\sigma_i})
\]&lt;/span&gt;
where all the parameters are allowed to vary between conditions.&lt;/p&gt;
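The three variants differ only in how the lapse parameter enters the function. A minimal sketch in Python (the names `Phi` and `Psi` are mine, not from the simulation code below):

```python
from math import erf, sqrt

def Phi(x, mu, sigma):
    """Cumulative Gaussian: probability of responding 'larger'."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def Psi(x, mu, sigma, lam=0.0):
    """Psychometric function with lapse rate lam.

    lam = 0           -> the zero-lapse model;
    one shared lam    -> the fixed-lapse model;
    per-condition lam -> the variable-lapse model.
    """
    return lam + (1.0 - 2.0 * lam) * Phi(x, mu, sigma)

print(Psi(0.0, 0.0, 1.0))         # 0.5 at threshold, regardless of lam
print(Psi(10.0, 0.0, 1.0, 0.05))  # saturates at 1 - lam = 0.95
```

The lapse rate compresses the function's range from [0, 1] to [λ, 1 − λ], which is what makes ignoring lapses distort the slope estimate.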
&lt;p&gt;We thus have three different models, but no prior information to decide which one is more likely to be correct in our case. Also, we acknowledge that there are individual differences, and each observer in our sample may conform to one of the three assumptions with equal probability. Hence, ideally, we would like to find a way to deal with lapses, and to find the best estimates of the slope values &lt;span class=&#34;math inline&#34;&gt;\(\sigma_i\)&lt;/span&gt;, without committing to one of the three models.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;multi-model-inference&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Multi-model inference&lt;/h1&gt;
&lt;p&gt;One possible solution to this problem is provided by a &lt;em&gt;multi-model&lt;/em&gt;, or model averaging, approach &lt;span class=&#34;citation&#34;&gt;(Burnham and Anderson 2002)&lt;/span&gt;. This requires calculating the &lt;a href=&#34;https://en.wikipedia.org/wiki/Akaike_information_criterion&#34;&gt;AIC (Akaike Information Criterion)&lt;/a&gt;&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; for each model and subject, and then combining the estimates according to the Akaike weights of the models. To compute the Akaike weights one typically proceeds by first transforming the AIC values into differences with respect to the AIC of the best candidate model (i.e. the one with the lowest AIC)
&lt;span class=&#34;math display&#34;&gt;\[
{\Delta _m} = {\rm{AI}}{{\rm{C}}_m} - \min {\rm{AIC}}
\]&lt;/span&gt;
From the differences in AIC, we can obtain an estimate of the relative likelihood of the model &lt;span class=&#34;math inline&#34;&gt;\(m\)&lt;/span&gt; given the data
&lt;span class=&#34;math display&#34;&gt;\[
\mathcal{L} \left( {m|{\rm{data}}} \right) \propto \exp \left( { - \frac{1}{2}{\Delta _m}} \right)
\]&lt;/span&gt;
Then, to obtain the Akaike weight &lt;span class=&#34;math inline&#34;&gt;\(w_m\)&lt;/span&gt; of the model &lt;span class=&#34;math inline&#34;&gt;\(m\)&lt;/span&gt;, the relative likelihoods are normalized (divided by their sum)
&lt;span class=&#34;math display&#34;&gt;\[
{w_m} = \frac{{\exp \left( { - \frac{1}{2}{\Delta _m}} \right)}}{{\mathop \sum \limits_{k = 1}^K \exp \left( { - \frac{1}{2}{\Delta _k}} \right)}}
\]&lt;/span&gt;
Finally, one can compute the model-averaged estimate of the parameter&lt;a href=&#34;#fn4&#34; class=&#34;footnote-ref&#34; id=&#34;fnref4&#34;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;, &lt;span class=&#34;math inline&#34;&gt;\(\hat {\bar \sigma}\)&lt;/span&gt;, by combining the estimate of each model according to their Akaike weight
&lt;span class=&#34;math display&#34;&gt;\[
\hat {\bar \sigma} = \sum\limits_{k = 1}^K {{w_k}\hat \sigma_k } 
\]&lt;/span&gt;&lt;/p&gt;
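The weighting steps above amount to only a few lines of code. A minimal Python sketch, with made-up AIC values and slope estimates used purely for illustration:

```python
import numpy as np

aic = np.array([210.3, 208.1, 211.9])     # hypothetical AICs of the 3 models
sigma_hat = np.array([1.10, 1.25, 1.32])  # hypothetical slope estimates

delta = aic - aic.min()                   # differences from the best model
w = np.exp(-0.5 * delta)                  # relative likelihoods
w /= w.sum()                              # Akaike weights (sum to 1)

sigma_avg = np.dot(w, sigma_hat)          # model-averaged estimate
print(w, sigma_avg)
```

The lowest-AIC model receives the largest weight, and the averaged estimate always lies within the range spanned by the individual models' estimates.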
&lt;/div&gt;
&lt;div id=&#34;simulation-results&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Simulation results&lt;/h1&gt;
&lt;p&gt;Model averaging seems a sensible approach to deal with the uncertainty about which form of the model is best suited to our data. To see whether it is worth the extra work of fitting 3 models instead of just one, I ran a simulation in which I repeatedly fit the three models and compared their estimates with the model-averaged estimate, for different sample sizes. In all the simulations, each observer is generated by randomly drawing parameters from a Gaussian distribution that summarizes the distribution of the parameters in the population. Hence, I know the &lt;em&gt;true&lt;/em&gt; difference in sensitivity in the population, and by simulating and fitting the models I can test which estimation procedure is more &lt;em&gt;efficient&lt;/em&gt;. In statistics, a procedure or estimator is said to be more efficient than another when it provides a better estimate with the same number of, or fewer, observations. The notion of “better” clearly relies on the choice of a cost function, which can be, for example, the mean squared error (as it is here).&lt;/p&gt;
&lt;p&gt;Additionally, in my simulations each simulated observer could, &lt;em&gt;with equal probability&lt;/em&gt; &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{3}\)&lt;/span&gt;, either never lapse, lapse at a constant rate across conditions, or lapse at a higher rate in the more difficult condition (condition ‘2’, where the judgements are less precise). The lapse rates were drawn uniformly from the interval [0.01, 0.1], and could reach 0.15 in condition ‘2’. Each simulated observer ran 250 trials per condition (similar to the figure at the top of this page). I simulated datasets from &lt;span class=&#34;math inline&#34;&gt;\(n=5\)&lt;/span&gt; to &lt;span class=&#34;math inline&#34;&gt;\(n=50\)&lt;/span&gt;, using 100 iterations for each sample size (only 85 in the case of &lt;span class=&#34;math inline&#34;&gt;\(n=50\)&lt;/span&gt;, because the simulation was taking too long and I needed my laptop for other stuff). For simplicity I assumed that the parameters were not correlated across observers&lt;a href=&#34;#fn5&#34; class=&#34;footnote-ref&#34; id=&#34;fnref5&#34;&gt;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;. I also had my simulated observers use the same criterion across the two conditions, although this may not necessarily be true.
The quantity of interest here is the difference in slope between the two conditions, that is &lt;span class=&#34;math inline&#34;&gt;\(\sigma_2 - \sigma_1\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;First, I examined the mean squared error of each of the models’ estimates, and of the model-averaged estimate. This is the average squared difference between the estimate and the true value.
&lt;img src=&#34;http://mlisi.xyz/img/mse.png&#34; alt=&#34;&#34; style=&#34;width:60.0%&#34; /&gt;
The results show (unless my colour blindness fooled me) that the model-averaged estimate always attains the smallest error. Note also that the error tends to decrease exponentially with the sample size. Interestingly, the worst model seems to be the one that allows the lapse rate to vary across conditions. This may be because the change in the lapse rate across conditions was, when present, relatively small, but also because this model has a larger number of parameters, and thus produces more variable estimates (that is, with higher standard errors) than smaller models. Indeed, given that I know the ‘true’ values of the parameters in this simulation setting, I can divide the error into its two subcomponents, variance and bias (see &lt;a href=&#34;http://scott.fortmann-roe.com/docs/BiasVariance.html&#34;&gt;this page&lt;/a&gt; for a nice introduction to the bias-variance tradeoff). The bias is the difference between the expected estimate (averaged over many repetitions/iterations) of the same model and the true quantity that we want to estimate. The variance is simply the variability of the model estimates, i.e. how much they oscillate around the expected estimate.&lt;/p&gt;
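This decomposition, mean squared error = bias² + variance, can be verified numerically on any collection of repeated estimates; a small Python sketch with a simulated biased, noisy estimator (the numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
true_value = 1.0
# repeated estimates from a hypothetical estimator with bias 0.1 and noise sd 0.2
estimates = true_value + 0.1 + 0.2 * rng.standard_normal(10_000)

mse = np.mean((estimates - true_value) ** 2)
bias = np.mean(estimates) - true_value
variance = np.var(estimates)

print(mse, bias**2 + variance)  # the two quantities agree exactly
```

The identity holds exactly for any sample (up to floating-point rounding), which is what allows the error curves to be split into the separate bias and variance plots below.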
&lt;p&gt;Here is a plot of the variance. Indeed, it can be seen that the variable-lapse model, which has more parameters, is the one that produces the most variable estimates. There is, however, little difference between the other two models’ estimates and the multi-model estimate
&lt;img src=&#34;http://mlisi.xyz/img/variance.png&#34; alt=&#34;&#34; style=&#34;width:60.0%&#34; /&gt;&lt;/p&gt;
&lt;p&gt;And here is the bias. This is very satisfactory, as it shows that while all individual models produced biased estimates, the bias of the model-averaged estimates is zero, or very close to zero.
&lt;img src=&#34;http://mlisi.xyz/img/bias.png&#34; alt=&#34;&#34; style=&#34;width:60.0%&#34; /&gt;
In sum, by averaging models of different levels of complexity according to their relative likelihood, I was able to simultaneously minimize the variance and reduce the bias of my estimates, and achieve greater efficiency. Model averaging seems to be the ideal procedure in this specific setting, where each observer belongs to one of the three categories (i.e., conforms to one of the three assumptions) with equal probability. However, I think (although I haven’t checked) that it would perform well even in cases where a single “type” of observer is largely predominant over the others.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;code&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Code&lt;/h1&gt;
&lt;p&gt;The (clumsily written) code for the simulations is shown below:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# This load some handy functions that are required below
library(RCurl)
script &amp;lt;- getURL(&amp;quot;https://raw.githubusercontent.com/mattelisi/miscR/master/miscFunctions.R&amp;quot;, ssl.verifypeer = FALSE)
eval(parse(text = script))

set.seed(1)

# sim parameters
n_sim &amp;lt;- 100
sample_sizes &amp;lt;- seq(5, 100, 5)

# parameters
R &amp;lt;- 3 # range of signal levels (-R, R)
n_trial &amp;lt;- 500
mu_par &amp;lt;- c(0, 0.25) # population (mean, std.)
sigma_par &amp;lt;- c(1, 0.25)
sigmaDiff_par &amp;lt;- c(1, 0.5)
lapse_range &amp;lt;- c(0.01, 0.1)

# start
res &amp;lt;- {}
for(n_subjects in sample_sizes){
for(iteration in 1:n_sim){  
    
    # make dataset
    d &amp;lt;- {}
    for(i in 1:n_subjects){
        d_ &amp;lt;- data.frame(x=runif(n_trial)*2*R-R, 
                    condition=as.factor(rep(1:2,n_trial/2)), 
                    id=i, r=NA)

        r_i &amp;lt;- runif(1) # draw observer type (wrt lapses)

        if(r_i&amp;lt;1/3){
            # no lapses
            par1 &amp;lt;- c(rnorm(1,mu_par[1],mu_par[2]), 
                abs(rnorm(1,sigma_par[1],sigma_par[2])),
                0)
            par2 &amp;lt;- c(par1[1], 
                par1[2]+abs(rnorm(1,sigmaDiff_par[1],sigmaDiff_par[2])),
                0) 

        }else if(r_i&amp;gt;=1/3 &amp;amp; r_i&amp;lt;2/3){
            # fixed lapses
            l_i &amp;lt;- runif(1)*diff(lapse_range) + lapse_range[1]
            par1 &amp;lt;- c(rnorm(1,mu_par[1],mu_par[2]), 
                abs(rnorm(1,sigma_par[1],sigma_par[2])),
                l_i)
            par2 &amp;lt;- c(par1[1], 
                par1[2]+abs(rnorm(1,sigmaDiff_par[1],sigmaDiff_par[2])), 
                l_i) 

        }else{
            # varying lapses
            l_i_1 &amp;lt;- runif(1)*diff(lapse_range) + lapse_range[1]
            l_i_2 &amp;lt;- l_i_1 + (runif(1)*diff(lapse_range) + lapse_range[1])/2
            par1 &amp;lt;- c(rnorm(1,mu_par[1],mu_par[2]), 
                abs(rnorm(1,sigma_par[1],sigma_par[2])),
                l_i_1)
            par2 &amp;lt;- c(par1[1], 
                par1[2]+abs(rnorm(1,sigmaDiff_par[1],sigmaDiff_par[2])), 
                l_i_2) 
        }

        ## simulate observer
        for(i in 1:sum(d_$condition==&amp;quot;1&amp;quot;)){
            d_$r[d_$condition==&amp;quot;1&amp;quot;][i] &amp;lt;- rbinom(1,1,
                psy_3par(d_$x[d_$condition==&amp;quot;1&amp;quot;][i],par1[1],par1[2],par1[3]))
        }
        for(i in 1:sum(d_$condition==&amp;quot;2&amp;quot;)){
            d_$r[d_$condition==&amp;quot;2&amp;quot;][i] &amp;lt;- rbinom(1,1,
                psy_3par(d_$x[d_$condition==&amp;quot;2&amp;quot;][i],par2[1],par2[2],par2[3]))
        }
        d &amp;lt;- rbind(d,d_)
    }
    
    
    ## model fitting

    # lapse assumed to be 0
    fit0 &amp;lt;- {}
    for(j in unique(d$id)){
        m0 &amp;lt;- glm(r~x*condition, family=binomial(probit),d[d$id==j,])
        sigma_1 &amp;lt;- 1/coef(m0)[2]
        sigma_2 &amp;lt;- 1/(coef(m0)[2] + coef(m0)[4])
        fit0 &amp;lt;- rbind(fit0, data.frame(id=j, sigma_1, sigma_2, 
                loglik=logLik(m0), aic=AIC(m0), model=&amp;quot;zero_lapse&amp;quot;) )
    }
    
    # fix lapse rate 
    start_p &amp;lt;- c(rep(c(0,1),2), 0)
    l_b &amp;lt;- c(rep(c(-5, 0.05),2), 0)
    u_b &amp;lt;- c(rep(c(5, 20), 2), 0.5)
    fit1 &amp;lt;- {}

    for(j in unique(d$id)){
        ftm &amp;lt;- optimx::optimx(par = start_p, lnorm_3par_multi , 
                d=d[d$id==j,],  method=&amp;quot;bobyqa&amp;quot;, 
                lower =l_b, upper =u_b)
        
        negloglik &amp;lt;- ftm$value
        aic &amp;lt;- 2*5 + 2*negloglik
        # fitted parameters are the first n numbers of optimx output
        sigma_1&amp;lt;-unlist(ftm [1,2])
        sigma_2&amp;lt;-unlist(ftm [1,4])
        fit1 &amp;lt;- rbind(fit1, data.frame(id=j, sigma_1, sigma_2, 
                loglik=-negloglik, aic, model=&amp;quot;fix_lapse&amp;quot;)  )
    }
    
    
    # varying lapse rate
    start_p &amp;lt;- c(0,1, 0)
    l_b &amp;lt;- c(-5, 0.05, 0)
    u_b &amp;lt;- c(5, 20, 0.5)
    fit2 &amp;lt;- {}
    for(j in unique(d$id)){
        # fit condition 1
        ftm &amp;lt;- optimx::optimx(par = start_p, lnorm_3par , 
                d=d[d$id==j &amp;amp; d$condition==&amp;quot;1&amp;quot;,],  
                method=&amp;quot;bobyqa&amp;quot;, lower =l_b, upper =u_b) 
        negloglik_1 &amp;lt;- ftm$value; sigma_1 &amp;lt;- unlist(ftm [1,2])
        # fit condition 2
        ftm &amp;lt;- optimx::optimx(par = start_p, lnorm_3par , 
                d=d[d$id==j &amp;amp; d$condition==&amp;quot;2&amp;quot;,],  
                method=&amp;quot;bobyqa&amp;quot;, lower =l_b, upper =u_b) 
        negloglik_2 &amp;lt;- ftm$value; sigma_2 &amp;lt;- unlist(ftm [1,2])

        aic &amp;lt;- 2*6 + 2*(negloglik_1 + negloglik_2)
        fit2 &amp;lt;- rbind(fit2, data.frame(id=j, sigma_1, sigma_2, 
                loglik=-negloglik_1-negloglik_2, aic, model=&amp;quot;var_lapse&amp;quot;))
    }
    
    # compute estimates of the change in slope
    effect_0 &amp;lt;- mean((fit0$sigma_2-fit0$sigma_1))
    effect_1 &amp;lt;- mean((fit1$sigma_2-fit1$sigma_1))
    effect_2 &amp;lt;- mean((fit2$sigma_2-fit2$sigma_1))
    
    effect_av &amp;lt;- {}
    for(j in unique(fit0$id)){
        dj &amp;lt;- rbind(fit0[fit0$id==j,], fit1[fit1$id==j,], fit2[fit2$id==j,])
        min_aic &amp;lt;- min(dj$aic)
        dj$delta &amp;lt;- dj$aic - min_aic
        den &amp;lt;- sum(exp(-0.5*c(dj$delta)))
        dj$w &amp;lt;- exp(-0.5*dj$delta) / den
        effect_av &amp;lt;- c(effect_av, sum((dj$sigma_2-dj$sigma_1) * dj$w))
    }
    effect_av &amp;lt;- mean(effect_av)
    
    # store results
    res &amp;lt;- rbind(res, data.frame(effect_0, effect_1, effect_2, 
            effect_av, effect_true=sigmaDiff_par[1], 
            n_subjects, n_trial, iteration))

}
}

## PLOT RESULTS
library(ggplot2)
library(reshape2)

res$err0 &amp;lt;- (res$effect_0 -1)^2
res$err1 &amp;lt;- (res$effect_1 -1)^2
res$err2 &amp;lt;- (res$effect_2 -1)^2
res$errav &amp;lt;- (res$effect_av -1)^2

# plot MSE
ares &amp;lt;- aggregate(cbind(err0,err1,err2,errav)~n_subjects, res, mean)
ares &amp;lt;- melt(ares, id.vars=c(&amp;quot;n_subjects&amp;quot;))
levels(ares$variable) &amp;lt;- c(&amp;quot;no lapses&amp;quot;, &amp;quot;fixed lapse rate&amp;quot;, &amp;quot;variable lapse rate&amp;quot;, &amp;quot;model averaged&amp;quot;)
ggplot(ares,aes(x=n_subjects, y=value, color=variable))+geom_line(size=1)+nice_theme+scale_color_brewer(palette=&amp;quot;Dark2&amp;quot;,name=&amp;quot;model&amp;quot;)+labs(x=&amp;quot;number of subjects&amp;quot;,y=&amp;quot;mean squared error&amp;quot;)+geom_hline(yintercept=0,lty=2,size=0.2)

# plot variance
ares &amp;lt;- aggregate(cbind(effect_0,effect_1,effect_2,effect_av)~n_subjects, res, var)
ares &amp;lt;- melt(ares, id.vars=c(&amp;quot;n_subjects&amp;quot;))
levels(ares$variable) &amp;lt;- c(&amp;quot;no lapses&amp;quot;, &amp;quot;fixed lapse rate&amp;quot;, &amp;quot;variable lapse rate&amp;quot;, &amp;quot;model averaged&amp;quot;)
ggplot(ares,aes(x=n_subjects, y=value, color=variable))+geom_line(size=1)+nice_theme+scale_color_brewer(palette=&amp;quot;Dark2&amp;quot;,name=&amp;quot;model&amp;quot;)+labs(x=&amp;quot;number of subjects&amp;quot;,y=&amp;quot;variance&amp;quot;)

# plot bias
ares &amp;lt;- aggregate(cbind(effect_0,effect_1,effect_2,effect_av)~n_subjects, res, mean)
ares &amp;lt;- melt(ares, id.vars=c(&amp;quot;n_subjects&amp;quot;))
levels(ares$variable) &amp;lt;- c(&amp;quot;no lapses&amp;quot;, &amp;quot;fixed lapse rate&amp;quot;, &amp;quot;variable lapse rate&amp;quot;, &amp;quot;model averaged&amp;quot;)
ares$value &amp;lt;- ares$value -1
ggplot(ares,aes(x=n_subjects, y=value, color=variable))+geom_hline(yintercept=0,lty=2,size=0.2)+geom_line(size=1)+nice_theme+scale_color_brewer(palette=&amp;quot;Dark2&amp;quot;,name=&amp;quot;model&amp;quot;)+labs(x=&amp;quot;number of subjects&amp;quot;,y=&amp;quot;bias&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
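&lt;p&gt;The model-averaging step inside the loop above boils down to the Akaike-weight formula &lt;span class=&#34;math inline&#34;&gt;\(w_i = e^{-\Delta_i/2} / \sum_j e^{-\Delta_j/2}\)&lt;/span&gt;, with &lt;span class=&#34;math inline&#34;&gt;\(\Delta_i = \rm{AIC}_i - \min_j \rm{AIC}_j\)&lt;/span&gt;. As an illustration, here is the same computation as a stand-alone function (a Python sketch mirroring the R loop, not part of the original script):&lt;/p&gt;

```python
import math

def akaike_weights(aics):
    """Akaike weights: w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2),
    where delta_i = AIC_i - min(AIC)."""
    min_aic = min(aics)
    rel_lik = [math.exp(-0.5 * (a - min_aic)) for a in aics]  # relative likelihoods
    total = sum(rel_lik)
    return [r / total for r in rel_lik]
```

&lt;p&gt;The model-averaged effect is then the weighted sum of the per-model estimates, exactly as in the &lt;code&gt;effect_av&lt;/code&gt; computation above.&lt;/p&gt;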
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-Burnham2002&#34;&gt;
&lt;p&gt;Burnham, Kenneth P., and David R. Anderson. 2002. &lt;em&gt;Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach&lt;/em&gt;. 2nd edition. New York, US: Springer New York. &lt;a href=&#34;https://doi.org/10.1007/b97636&#34;&gt;https://doi.org/10.1007/b97636&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Wichmann2001&#34;&gt;
&lt;p&gt;Wichmann, F. A., and N. J. Hill. 2001. “The psychometric function: I. Fitting, sampling, and goodness of fit.” &lt;em&gt;Perception &amp;amp; Psychophysics&lt;/em&gt; 63 (8): 1293–1313. &lt;a href=&#34;http://www.ncbi.nlm.nih.gov/pubmed/11800458&#34;&gt;http://www.ncbi.nlm.nih.gov/pubmed/11800458&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;The psychometric function is a statistical model that predicts the probability of the observer’s response (e.g. “stimulus A has a larger/smaller intensity than stimulus B”), conditional on the stimulus and the experimental condition.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;A good experimenter should do that (estimate the size of the difference). A “bad” experimenter might just be interested in obtaining &lt;span class=&#34;math inline&#34;&gt;\(p&amp;lt;.05\)&lt;/span&gt;. See &lt;a href=&#34;http://cerco.ups-tlse.fr/-Charte-statistique-?lang=fr&#34;&gt;this page&lt;/a&gt;, compiled by Jean-Michel Hupé, for some references and guidelines against &lt;span class=&#34;math inline&#34;&gt;\(p\)&lt;/span&gt;-hacking and the misuse of statistical tools in neuroscience.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;The AIC of a model is computed as &lt;span class=&#34;math inline&#34;&gt;\({\rm{AIC}} = 2k - 2\log \left( \mathcal{L} \right)\)&lt;/span&gt;, where &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt; is the number of free parameters, and &lt;span class=&#34;math inline&#34;&gt;\(\mathcal{L}\)&lt;/span&gt; is the maximum value of the likelihood function of that model.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn4&#34;&gt;&lt;p&gt;Here it is a parameter common to all models. See Burnham &amp;amp; Anderson’s book for methods to deal with other situations &lt;span class=&#34;citation&#34;&gt;(Burnham and Anderson 2002)&lt;/span&gt;.&lt;a href=&#34;#fnref4&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn5&#34;&gt;&lt;p&gt;Such correlation, when present, can be modelled using a mixed-effects approach. See my tutorial on mixed-effects models in the &lt;a href=&#34;http://mattelisi.github.io/#notes&#34;&gt;‘misc’&lt;/a&gt; section of this website.&lt;a href=&#34;#fnref5&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Listing&#39;s law, and the mathematics of the eyes</title>
      <link>http://mlisi.xyz/post/listing-s-law-and-the-mathematics-of-the-eyes/</link>
      <pubDate>Wed, 27 Sep 2017 00:00:00 +0000</pubDate>
      <guid>http://mlisi.xyz/post/listing-s-law-and-the-mathematics-of-the-eyes/</guid>
      <description>


&lt;p&gt;&lt;em&gt;Brief intro to the mathematical formalism used to describe rotations of the eyes in 3D (including the torsional component).&lt;/em&gt;
&lt;img src=&#34;http://mlisi.xyz/img/3Deyecoord_Haslwanter1995.png&#34; alt=&#34;3D eye coordinate systems in the primary reference position, left panel, and after a leftward rotation, right panel (Haslwanter 1995). &#34; /&gt;&lt;/p&gt;
&lt;p&gt;The human eye is approximately a sphere about 23 mm in diameter, and mechanically it behaves like a ball in a ball-and-socket joint. Because there is a functionally distinguished axis - the visual axis, that is the line of gaze, or more precisely the imaginary straight line passing through both the center of the pupil and the center of the fovea - eye movements are usually divided into &lt;em&gt;gaze direction&lt;/em&gt; and &lt;em&gt;cyclotorsion&lt;/em&gt; (or simply &lt;em&gt;torsion&lt;/em&gt;): gaze direction refers to the direction of the visual axis, while torsion indicates the rotation of the eyeball about that axis. Modern video-based eyetrackers can record movements of the visual axis, but they do not provide data about torsion. It turns out that there is an elegant mathematical relationship that constrains the torsion of the eye for every direction of gaze.
This relationship is known as Listing’s law, named after the German mathematician &lt;a href=&#34;https://en.wikipedia.org/wiki/Johann_Benedict_Listing&#34;&gt;Johann Benedict Listing (1808-1882)&lt;/a&gt;. Listing’s law is best understood by looking at how the 3D orientation of the eye can be formally described.&lt;/p&gt;
&lt;div id=&#34;mathematics-of-3d-eye-movements&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Mathematics of 3D eye movements&lt;/h1&gt;
&lt;p&gt;3D eye position can be specified by characterising the 3D rotation that brings the eye to the current position from an arbitrary reference or &lt;em&gt;primary&lt;/em&gt; position&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;, typically defined as the position the eye assumes when looking straight ahead with the head in a normal, upright position. This rotation can be described by the 3-by-3 rotation matrix &lt;span class=&#34;math inline&#34;&gt;\(\bf{R}\)&lt;/span&gt;, which describes the rotation of three-dimensional coordinates by a certain angle about a certain axis. To define this matrix formally, consider the coordinate system &lt;span class=&#34;math inline&#34;&gt;\(\{ \vec{h}_1,\vec{h}_2,\vec{h}_3 \}\)&lt;/span&gt; (a coordinate system is defined by a set of linearly independent vectors; e.g. here &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_1 = (1,0,0)\)&lt;/span&gt;, corresponding to the &lt;span class=&#34;math inline&#34;&gt;\(x\)&lt;/span&gt; axis) as the &lt;em&gt;head-centered&lt;/em&gt; coordinate system, where the axis &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_1\)&lt;/span&gt; corresponds to the visual axis when the eye is in the reference position, and let &lt;span class=&#34;math inline&#34;&gt;\(\{\vec{e}_1,\vec{e}_2,\vec{e}_3\}\)&lt;/span&gt; be an &lt;em&gt;eye-centered&lt;/em&gt; coordinate system, where &lt;span class=&#34;math inline&#34;&gt;\(\vec{e}_1\)&lt;/span&gt; always corresponds to the visual axis, regardless of the orientation of the eye (see the figure at the top of this page). Any orientation of the eye can be described by a matrix &lt;span class=&#34;math inline&#34;&gt;\(\bf{R}\)&lt;/span&gt; such that
&lt;span class=&#34;math display&#34;&gt;\[
{{\vec{e}}_i} = {\bf{R}} {{\vec{h}}_i}
\]&lt;/span&gt;
where &lt;span class=&#34;math inline&#34;&gt;\(i=1,2,3\)&lt;/span&gt;. This rotation matrix is straightforward for 1D rotations. For example, a purely horizontal rotation of an angle &lt;span class=&#34;math inline&#34;&gt;\(\theta\)&lt;/span&gt; around the axis &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_3\)&lt;/span&gt; is formulated as
&lt;span class=&#34;math display&#34;&gt;\[
\bf{R}_3 \left( \theta  \right) = \left( {\begin{array}{*{20}{c}}
{\cos \theta }&amp;amp;{ - \sin \theta }&amp;amp;0\\
{\sin \theta }&amp;amp;{\cos \theta }&amp;amp;0\\
0&amp;amp;0&amp;amp;1
\end{array}} \right)
\]&lt;/span&gt;
The first two columns of the matrix indicate the new coordinates of the first (i.e., &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_1\)&lt;/span&gt;) and second (&lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_2\)&lt;/span&gt;) basis vectors of the eye-centered coordinate system after the rotation, expressed in the initial head-centered coordinate system. The third basis vector, &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_3\)&lt;/span&gt;, is the axis of rotation and does not change. Things become more complicated for 3D rotations, i.e. rotations of the fixed eye-centered coordinate system to any new orientation. These can be obtained by calculating a sequence of 3 different rotations about the three fixed axes and multiplying the corresponding matrices: &lt;span class=&#34;math inline&#34;&gt;\(\bf{R} = \bf{R}_3 \left( \theta \right) \bf{R}_2 \left( \phi \right) \bf{R}_1 \left( \psi \right)\)&lt;/span&gt;. Although the first two rotations are sufficient to specify the orientation of the visual axis, the third is necessary to specify the torsional component and thus fully determine the 3D orientation of the eye. Importantly, the order of the three rotations matters - rotations are not commutative, so applying them in a different order yields a different result - and needs to be specified arbitrarily (when specified in this order, the sequence is referred to as a &lt;em&gt;Fick sequence&lt;/em&gt;). This representation of 3D orientations is neither very efficient (9 values, while only 3 are necessary) nor practical for computations; additionally, one needs to arbitrarily define the order of the rotations in the sequence.&lt;/p&gt;
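&lt;p&gt;A minimal numeric sketch of these ideas (plain Python, with the single-axis matrices written under the conventions used here; not code from the original post) makes the non-commutativity easy to verify:&lt;/p&gt;

```python
import math

def R3(theta):
    # rotation about the h3 axis (a purely horizontal gaze rotation)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def R2(phi):
    # rotation about the h2 axis (a purely vertical gaze rotation)
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    # 3x3 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, v):
    # rotate the vector v by the matrix R
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]
```

&lt;p&gt;Applying &lt;code&gt;R3(theta)&lt;/code&gt; to &lt;span class=&#34;math inline&#34;&gt;\(\vec{h}_1 = (1,0,0)\)&lt;/span&gt; returns the first column of the matrix, &lt;span class=&#34;math inline&#34;&gt;\((\cos \theta, \sin \theta, 0)\)&lt;/span&gt;, while &lt;code&gt;matmul(R3(t), R2(p))&lt;/code&gt; and &lt;code&gt;matmul(R2(p), R3(t))&lt;/code&gt; give different matrices.&lt;/p&gt;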
&lt;div id=&#34;quaternions-and-rotation-vectors&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Quaternions and rotation vectors&lt;/h2&gt;
&lt;p&gt;An alternative way to describe rotations is with &lt;em&gt;quaternions&lt;/em&gt;. Quaternions can be looked upon as four-dimensional vectors, although they are more commonly split into a real scalar part and an imaginary vector part; they are in fact an extension of the complex numbers. They have the form
&lt;span class=&#34;math display&#34;&gt;\[
q_0 + q_1i + q_2j + q_3k = \left( q_0,\vec{q} \cdot \vec{I} \right) = \left( r,\vec{v} \right)
\]&lt;/span&gt;
where
&lt;span class=&#34;math display&#34;&gt;\[
\vec{q} = \left( \begin{array}{*{20}{c}}
{q_1}\\
{q_2}\\
{q_3}
\end{array} \right)
\]&lt;/span&gt;
and
&lt;span class=&#34;math display&#34;&gt;\[
\vec{I} = \left( \begin{array}{*{20}{c}}
{i}\\
{j}\\
{k}
\end{array} \right)
\]&lt;/span&gt;
&lt;span class=&#34;math inline&#34;&gt;\(i,j,k\)&lt;/span&gt; are the quaternion units. These can be multiplied according to the following formula, discovered by &lt;a href=&#34;https://en.wikipedia.org/wiki/William_Rowan_Hamilton&#34;&gt;Hamilton&lt;/a&gt; in 1843
&lt;span class=&#34;math display&#34;&gt;\[
i^2 = j^2 = k^2 = ijk =  - 1
\]&lt;/span&gt;
This formula may seem strange, but it determines all the possible products of &lt;span class=&#34;math inline&#34;&gt;\(i,j,k\)&lt;/span&gt;, such as &lt;span class=&#34;math inline&#34;&gt;\(ij=k\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(ji=-k\)&lt;/span&gt;. Note that the products of the basis elements are not commutative. There is a visual trick to remember the multiplication rules, based on the following diagram:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;http://mlisi.xyz/img/quatrule.png&#34; alt=&#34;Multiplying quaternions. Multiplying two elements in the clockwise direction gives the next element along the same direction (e.g. jk=i). The same is for counter-clockwise directions, except that the result is negative (e.g. kj=-i). &#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Multiplying quaternions. Multiplying two elements in the clockwise direction gives the next element along the same direction (e.g. &lt;span class=&#34;math inline&#34;&gt;\(jk=i\)&lt;/span&gt;). The same holds in the counter-clockwise direction, except that the result is negative (e.g. &lt;span class=&#34;math inline&#34;&gt;\(kj=-i\)&lt;/span&gt;). &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Quaternions can be used to represent rotations. For example, a rotation of an angle &lt;span class=&#34;math inline&#34;&gt;\(\theta\)&lt;/span&gt; around the axis defined by the unit vector &lt;span class=&#34;math inline&#34;&gt;\(\vec{u} = (u_1, u_2,u_3) = u_1i + u_2j + u_3k\)&lt;/span&gt;&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; can be described by the following quaternion
&lt;span class=&#34;math display&#34;&gt;\[
\cos \frac{\theta}{2} + \sin \frac{\theta}{2}\left( u_1i + u_2j + u_3k \right)
\]&lt;/span&gt;
The direction of the rotation is given by the &lt;a href=&#34;https://en.wikipedia.org/wiki/Right-hand_rule#A_rotating_body&#34;&gt;right-hand rule&lt;/a&gt;.
Successive rotations can be combined using the formula for quaternion multiplication. The product of two quaternions can be computed by multiplying their elements as if they were two polynomials, while keeping track of the ordering of the basis elements, since their multiplication is not commutative. This is a desirable property if we want to represent rotations, which, as seen earlier, are also not commutative. Quaternion multiplication can also be expressed in the modern language of dot and cross products
&lt;span class=&#34;math display&#34;&gt;\[
\left( r_1,\vec{v_1} \right) \left( r_2,\vec{v_2} \right) = 
\left( r_1 r_2 - \vec{v_1} \cdot \vec{v_2},\;\; r_1\vec{v_2} + r_2\vec{v_1} +\vec{v_1} \times \vec{v_2} \right)
\]&lt;/span&gt;
where “&lt;span class=&#34;math inline&#34;&gt;\(\cdot\)&lt;/span&gt;” is the &lt;a href=&#34;https://en.wikipedia.org/wiki/Dot_product&#34;&gt;dot product&lt;/a&gt; and “&lt;span class=&#34;math inline&#34;&gt;\(\times\)&lt;/span&gt;” is the &lt;a href=&#34;https://en.wikipedia.org/wiki/Cross_product&#34;&gt;cross product&lt;/a&gt;.&lt;/p&gt;
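&lt;p&gt;The multiplication rules above are easy to check numerically. As an illustration (a Python sketch; representing a quaternion as a plain tuple &lt;code&gt;(q0, q1, q2, q3)&lt;/code&gt; is an assumption of this example):&lt;/p&gt;

```python
def qmul(a, b):
    """Hamilton product of two quaternions (q0, q1, q2, q3); equivalent to
    (r1, v1)(r2, v2) = (r1*r2 - v1.v2, r1*v2 + r2*v1 + v1 x v2)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,   # scalar part
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,   # i component
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,   # j component
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)   # k component

# the quaternion units
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
```

&lt;p&gt;With this, &lt;code&gt;qmul(i, j)&lt;/code&gt; gives &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt;, &lt;code&gt;qmul(j, i)&lt;/code&gt; gives &lt;span class=&#34;math inline&#34;&gt;\(-k\)&lt;/span&gt;, and &lt;code&gt;qmul(i, i)&lt;/code&gt; gives &lt;span class=&#34;math inline&#34;&gt;\(-1\)&lt;/span&gt;, reproducing Hamilton’s rules.&lt;/p&gt;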
&lt;p&gt;In sum, quaternions are pretty useful for computing transformations in 3D. One can use quaternions to combine any sequence of 3D rotations about arbitrary axes (using quaternion multiplication), as well as to rotate any 3D Euclidean vector about any arbitrary axis. A quaternion can also be transformed into a 3D rotation matrix (formula &lt;a href=&#34;https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation#Quaternion-derived_rotation_matrix&#34;&gt;here&lt;/a&gt;), which may then be used in 3D graphics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rotation vectors&lt;/strong&gt; are an even more succinct representation of rotations. Indeed, the scalar component of the quaternion (&lt;span class=&#34;math inline&#34;&gt;\(q_0\)&lt;/span&gt;) does not add any information that is not already in the vector part, so a rotation can be effectively described by just 3 numbers. The rotation vector &lt;span class=&#34;math inline&#34;&gt;\(\vec{r}\)&lt;/span&gt;, which corresponds to a rotation of an angle &lt;span class=&#34;math inline&#34;&gt;\(\theta\)&lt;/span&gt; about an axis &lt;span class=&#34;math inline&#34;&gt;\(\vec{n}\)&lt;/span&gt;, is defined as
&lt;span class=&#34;math display&#34;&gt;\[
\vec{r} = \tan \left( \frac{\theta}{2} \right) \vec{n}
\]&lt;/span&gt;
which can also be written in terms of the equivalent quaternion &lt;span class=&#34;math inline&#34;&gt;\(\textbf{q}\)&lt;/span&gt;
&lt;span class=&#34;math display&#34;&gt;\[
\textbf{q}=\left( q_0, \vec{q} \right) = \left( \cos \left(\frac{\theta}{2}\right), \sin \left(\frac{\theta}{2}\right)\vec{n} \right)
\]&lt;/span&gt;
as
&lt;span class=&#34;math display&#34;&gt;\[ \vec{r} = \frac{\vec{q}}{q_0} \]&lt;/span&gt;.&lt;/p&gt;
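&lt;p&gt;To make the link between quaternions and rotation vectors concrete, here is a short sketch (illustrative Python; the function names are my own):&lt;/p&gt;

```python
import math

def quat_from_axis_angle(n, theta):
    # q = (cos(theta/2), sin(theta/2) * n), with n a unit vector
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), s * n[0], s * n[1], s * n[2])

def rotation_vector(q):
    # r = q_vec / q0, which equals tan(theta/2) * n
    q0, qx, qy, qz = q
    return (qx / q0, qy / q0, qz / q0)
```

&lt;p&gt;For example, for a rotation of 0.8 rad about &lt;span class=&#34;math inline&#34;&gt;\(\vec{n}=(0,1,0)\)&lt;/span&gt;, the second component of the resulting rotation vector matches &lt;span class=&#34;math inline&#34;&gt;\(\tan(0.4)\)&lt;/span&gt; up to floating-point precision, and the other components are zero.&lt;/p&gt;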
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;donders-law-and-listings-law&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Donders’ law and Listing’s law&lt;/h1&gt;
&lt;p&gt;Donders’ law (1848) states that the eye uses only two degrees of freedom while fixating, although mechanically it has three. In other words, the torsional component of an eye movement is not arbitrary: it is uniquely determined by the direction of the visual axis and is independent of previous eye movements. From the material reviewed above it should be clear how any 3D eye orientation can be fully described as a rotation about a given axis from a primary reference position. This also allows formulating Donders’ law more specifically, according to what is known as Listing’s law &lt;span class=&#34;citation&#34;&gt;(Helmholtz 1910; Haustein 1989)&lt;/span&gt;: “&lt;em&gt;There exists a certain eye position from which the eye may reach any other position of fixation by a rotation around an axis perpendicular to the visual axis. This particular position is called primary position&lt;/em&gt;”. This means that &lt;em&gt;all possible eye positions&lt;/em&gt; can be reached from the primary position by a single rotation about an axis perpendicular to the visual axis. Since they are all perpendicular to the visual axis, all rotation axes that satisfy Listing’s law lie in the same plane (&lt;em&gt;Listing’s plane&lt;/em&gt;). The law can be tested with eyetracking equipment that also measures the torsional component (such as scleral coils): results have shown that the standard deviation from Listing’s plane of empirically measured rotation vectors is only about 0.5-1 deg &lt;span class=&#34;citation&#34;&gt;(Haslwanter 1995)&lt;/span&gt;. Formally, for any orientation of the visual axis, defined by the rotation vector &lt;span class=&#34;math inline&#34;&gt;\(\vec{a}\)&lt;/span&gt; and measured from the primary position &lt;span class=&#34;math inline&#34;&gt;\(\vec{h_1}=(1,0,0)\)&lt;/span&gt;,
&lt;span class=&#34;math display&#34;&gt;\[
\vec{h_1} \cdot \vec{a} = 0
\]&lt;/span&gt;
This simply indicates that the rotation about the visual axis is 0 and that, as a consequence, all the rotation axes lie in a frontal plane.&lt;/p&gt;
&lt;p&gt;Going back to the beginning: knowing the coordinates of Listing’s plane, one can compute the rotation vector that corresponds to the current eye position from the recording of the 2D gaze location on a screen. In the simplest case, we assume that the primary position corresponds to the observer fixating the center of the screen, &lt;span class=&#34;math inline&#34;&gt;\((0,0)\)&lt;/span&gt;. What is the rotation vector that describes the 3D eye orientation when the observer fixates the location &lt;span class=&#34;math inline&#34;&gt;\((s_x, s_y)\)&lt;/span&gt;? Say the position on screen is measured in cm, and we know that the distance of the eye from the screen is &lt;span class=&#34;math inline&#34;&gt;\(L\)&lt;/span&gt; cm. The rotation angle can be computed as &lt;span class=&#34;math inline&#34;&gt;\(\theta = \rm{atan} \frac{\sqrt{s_x^2+s_y^2}}{L}\)&lt;/span&gt;, while the angle that defines the orientation of the rotation axis within Listing’s plane is &lt;span class=&#34;math inline&#34;&gt;\(\alpha = \rm{atan2}(s_y,s_x)\)&lt;/span&gt;. The complete rotation vector is then
&lt;span class=&#34;math display&#34;&gt;\[
\vec{r} = \tan \left( \frac{\theta}{2}\right) \cdot \left( {\begin{array}{*{20}{c}}
0\\
{\cos \alpha }\\
{ - \sin \alpha }
\end{array}} \right)
\]&lt;/span&gt;
This vector describes a particular eye position as a rotation from the reference position, and it does not have a torsional component (that is, a component along &lt;span class=&#34;math inline&#34;&gt;\(\vec{h_1}\)&lt;/span&gt;). Indeed, Listing’s law implies that all possible eye positions can be reached from the primary reference position without a torsional component. However, vectors describing rotations from and to positions different from the primary one do &lt;em&gt;not&lt;/em&gt;, in general, lie in Listing’s plane. For Listing’s law to hold, such vectors must lie in a plane whose orientation depends on the current eye position; more specifically, the vector perpendicular to that plane lies exactly halfway between the current and the primary eye position &lt;span class=&#34;citation&#34;&gt;(Tweed and Vilis 1990)&lt;/span&gt;.&lt;/p&gt;
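&lt;p&gt;The computation described above can be sketched as follows (illustrative Python; screen coordinates in cm, with the primary position at the screen center, as assumed in the text):&lt;/p&gt;

```python
import math

def listing_rotation_vector(s_x, s_y, L):
    """Rotation vector of the eye fixating the screen location (s_x, s_y),
    with the eye at distance L from the screen (all in cm)."""
    theta = math.atan(math.hypot(s_x, s_y) / L)   # rotation angle
    alpha = math.atan2(s_y, s_x)                  # axis orientation in Listing's plane
    t = math.tan(theta / 2)
    # no component along h1: the rotation axis lies in Listing's plane
    return (0.0, t * math.cos(alpha), -t * math.sin(alpha))
```

&lt;p&gt;Fixating the screen center gives the zero vector (the primary position), while any other fixation gives a vector with a zero first (torsional) component, as required by Listing’s law.&lt;/p&gt;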
&lt;hr /&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-Haslwanter1995&#34;&gt;
&lt;p&gt;Haslwanter, Thomas. 1995. “Mathematics of three-dimensional eye rotations.” &lt;em&gt;Vision Research&lt;/em&gt; 35 (12): 1727–39. &lt;a href=&#34;https://doi.org/10.1016/0042-6989(94)00257-M&#34;&gt;https://doi.org/10.1016/0042-6989(94)00257-M&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Haustein1989&#34;&gt;
&lt;p&gt;Haustein, Werner. 1989. “Considerations on Listing’s Law and the primary position by means of a matrix description of eye position control.” &lt;em&gt;Biological Cybernetics&lt;/em&gt; 60 (6): 411–20. &lt;a href=&#34;https://doi.org/10.1007/BF00204696&#34;&gt;https://doi.org/10.1007/BF00204696&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Helmholtz1910&#34;&gt;
&lt;p&gt;Helmholtz, Hermann von. 1910. &lt;em&gt;Handbuch der Physiologischen Optik&lt;/em&gt;. Hamburg: Voss.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Tweed1990&#34;&gt;
&lt;p&gt;Tweed, Douglas, and Tutis Vilis. 1990. “Geometric relations of eye position and velocity vectors during saccades.” &lt;em&gt;Vision Research&lt;/em&gt; 30 (1): 111–27. &lt;a href=&#34;https://doi.org/10.1016/0042-6989(90)90131-4&#34;&gt;https://doi.org/10.1016/0042-6989(90)90131-4&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Euler%27s_rotation_theorem&#34;&gt;Euler’s theorem&lt;/a&gt; guarantees that a rigid body can always move from one orientation to any other through a single rotation about a fixed axis.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;Saying that &lt;span class=&#34;math inline&#34;&gt;\(\vec{u}\)&lt;/span&gt; is a unit vector indicates that it has length 1, i.e. &lt;span class=&#34;math inline&#34;&gt;\(\left| \vec{u} \right| = \sqrt{u_1^2 + u_2^2 + u_3^2} = 1\)&lt;/span&gt;.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
