How far are we from reproducible research in neuroinformatics? – A case study
Jason Webster took the audience on a journey through his attempt to reproduce the results of a computational neuroscience paper published in Scientific Reports: “The neural representation of personally familiar and unfamiliar faces in the distributed system for face perception.”
The goal of the case study was to see how easy it would be to reproduce the results of a published scientific paper, starting from the data and code accompanying it and following the methods it describes.
Jason began his presentation with an overview of the paper, summarizing its research questions and the key methods used in the analysis. The paper focuses on how the human brain processes familiar and unfamiliar faces, and on the set of brain regions associated with face recognition. Through new analysis methods, the authors seek to update the current model of those regions.
One of the methods discussed by Jason in his talk was Representational Similarity Analysis (RSA), which allows one to compare response patterns to stimuli across brain regions and models. A stimulus is presented to the brain, or to a computational model, and the resulting response pattern is analyzed. Response patterns from different brain regions are compared using the correlation matrix of responses and the corresponding dissimilarity matrix, and the same technique can be used to compare response patterns across subjects. Jason showed a dissimilarity matrix examining how particular brain regions respond to animate (body and face) and inanimate (natural and artificial) objects. RSA thus lets one visualize how similar or different the response patterns to different stimuli are across brain regions and across subjects (a minimal code sketch of this computation appears below). One of the key questions answered in the paper is which areas of the brain respond more strongly to familiar faces.

Jason also mentioned Multivoxel Pattern Analysis (MVPA), which additionally takes into account the relationships between multiple voxels in a brain fMRI (also sketched below). This differs from analyses based on generalized linear models (GLMs), which only assess whether the response of a given voxel is higher or lower under one condition than another. Because the brain’s exact positioning inside the skull differs between individuals, Jason noted that MVPA is also useful for aligning brain regions across subjects’ fMRI data.

The next part of the presentation focused on Jason’s experience of using the provided code and data to reconstruct the results in the paper. Working with the code did not turn out to be straightforward: upon running it, Jason got errors right away. After some troubleshooting, he discovered that the code depended on functions available in a different version of Python but not in the current one. In effect, code accompanying a paper published in September 2017 was already obsolete by November 2017, and there was no documentation that would have let a user anticipate this before running it.

Perhaps the most important take-home message from this experience was that researchers need to formally specify dependencies in the documentation of their code if they intend it to be usable by third parties (see the sketch below). Jason added that the code was nicely written, but there were holes in the documentation: for example, he could not figure out how the authors obtained their Regions of Interest (ROIs), and among the files in the accompanying data there was no documentation about which file should be used at which step.

The presentation then transitioned into a round-table discussion when Anisha Keshavan asked a follow-up question on this point: is there a standard way to name files in computational projects, so that it is easier to see what each file does and how it fits into the analysis? Ariel Rokem replied, recommending Bill Noble’s paper, “A Quick Guide to Organizing Computational Biology Projects”, as a useful read on that question (a sketch of such a layout appears below).

The session ended with a discussion of what can be done to make code last. Someone suggested the use of Docker or Singularity, which may keep code in a usable form for a year or two.
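To make the RSA computation described above concrete, here is a minimal sketch in NumPy. The data are random stand-ins for real fMRI response patterns and the shapes are arbitrary; nothing here comes from the paper itself.

```python
import numpy as np

# Illustrative stand-in data: one response pattern per stimulus
# (4 stimuli x 100 voxels of random numbers, not real fMRI data).
rng = np.random.RandomState(0)
patterns = rng.randn(4, 100)

# Correlate every pair of stimulus patterns, then turn the
# correlation matrix into a dissimilarity matrix (1 - r).
correlations = np.corrcoef(patterns)
rdm = 1.0 - correlations

# Two such matrices (e.g., from two brain regions, or a region and a
# computational model) can themselves be compared by correlating
# their upper triangles.
upper = np.triu_indices_from(rdm, k=1)
print(rdm[upper])
```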
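Similarly, MVPA is often implemented as pattern classification. The sketch below, using scikit-learn, shows one common variant under assumed data shapes and labels; it is not the analysis from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Illustrative stand-in data: 40 trials x 200 voxels, with the first
# 20 trials labeled as one condition (e.g., familiar faces) and the
# last 20 as the other (unfamiliar faces).
rng = np.random.RandomState(0)
X = rng.randn(40, 200)
y = np.repeat([0, 1], 20)

# Unlike a voxel-by-voxel GLM contrast, the classifier uses the joint
# pattern across all voxels; above-chance cross-validated accuracy
# suggests the condition is encoded in that pattern.
scores = cross_val_score(LinearSVC(), X, y, cv=5)
print(scores.mean())
```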
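On the dependency point, even a small guard at the top of a script, together with a pinned requirements file, can spare a third party hours of troubleshooting. The version bound and file name below are illustrative assumptions, not the paper’s actual requirements.

```python
import sys

# Fail fast with a clear message instead of cryptic errors later.
# The version bound and file name are illustrative, not taken from
# the paper's materials.
if sys.version_info < (3, 6):
    sys.exit("This analysis requires Python >= 3.6; "
             "see requirements.txt for pinned dependencies.")
```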
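And on the file-organization question, a layout roughly in the spirit of Noble’s recommendations might look like this (a paraphrase, not a verbatim excerpt from his paper):

```
project/
  doc/        notes and manuscript drafts
  data/       raw input data, treated as read-only
  src/        analysis code
  results/    outputs in dated subdirectories, e.g. results/2017-11-02/
  README      what each file is and in what order to run things
```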
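As a sketch of that container approach, a minimal Dockerfile pinning both the interpreter and the dependencies could look like the following; the image tag, file names, and entry point (`requirements.txt`, `run_analysis.py`) are hypothetical:

```dockerfile
# Illustrative sketch: pin the interpreter and the dependencies so
# the environment can be rebuilt later. Image tag, file names, and
# entry point are assumptions, not taken from the paper's materials.
FROM python:3.6

COPY requirements.txt /analysis/requirements.txt
RUN pip install --no-cache-dir -r /analysis/requirements.txt

COPY . /analysis
WORKDIR /analysis

CMD ["python", "run_analysis.py"]
```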
Even then, once something changes in the Docker installation itself, we are not sure what happens. So is there a permanent solution, a way to write code that gives reproducible results at any point in time? We concluded that currently there is none. But that is a place where neuroinformaticians could step in and work toward a creative solution for writing code and documentation in a way that allows independent researchers to access, understand, and run others’ code and reproduce their results.