This page was created specifically for newcomers to the lab.
Please try to finish one or more of the following tasks (or follow the instructions Gao gave you).
In this task you are going to install basic software and packages needed for the analysis, and work with git.
Here you need to install conda, SoS and docker for the following tasks. Please follow this setup_instruction.
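After installation, a quick check that the tools are visible on your `PATH` can save debugging time later. The sketch below is only an illustration and is not part of the setup instructions. It assumes the command names `conda`, `sos`, `docker` and `git`, which may differ on your system.

```python
import shutil

# Command names are assumptions; adjust them if your installation differs.
REQUIRED_TOOLS = ["conda", "sos", "docker", "git"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Return one human-readable status line per tool."""
    lines = []
    for name, path in found.items():
        status = path if path else "NOT FOUND"
        lines.append(f"{name:<8}{status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(find_tools())))
```

A tool reported as `NOT FOUND` is either not installed or installed in a location your shell does not search, for example a conda environment that has not been activated.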
All you need to do here is clone the git repository for this assignment and work with it using bash. You should receive an email notification asking you to join the GitHub repo. Please contact us if you have not received it.
You should learn about using git if you haven't used it before. If you are not familiar with git, please walk through the handbook to learn basic bash/git.
Note: In order to finish this task, you need to finish task one first.
This task is a simple exercise in computational biology research practice. It should give you an example of how to work with IPython notebooks in JupyterLab. Regardless of your focus (methods development or applied data analysis), all computational procedures in your daily research must be well documented, organized and version controlled (using git) for review. To that end, you can communicate your results, together with the code that generated them, in a self-contained document, i.e. the notebook.
Learn from this LMM example the suggested format for writing up and reporting a computational analysis. The suggested format is as follows:
1. Title and, in the same notebook cell, **a brief one-sentence summary** of what the notebook is about.
2. Motivation or Aims: describe the problem under investigation.
3. Methods overview: a high-level description of methods used to solve the problem.
4. Main conclusion (not applicable to a pure workflow notebook): take-home message from your investigations.
5. Data input and output (if applicable): describe data used and generated from the notebook.
6. The rest of the notebook: multiple sections of detailed steps, with interactive code / workflows and narratives, as well as diagnostic summary statistics, plots and tables at each step.
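The suggested format above can be turned into an empty template to start from. The following is a minimal sketch, not the lab's official template. It assumes the standard Jupyter notebook JSON layout (format version 4); the function names and the `analysis_template.ipynb` file name are made up for illustration.

```python
import json

# Section headings follow items 2 to 5 of the suggested format.
SECTIONS = [
    "Motivation",
    "Methods overview",
    "Main conclusion",
    "Data input and output",
]


def markdown_cell(text):
    """Build one Markdown cell in Jupyter notebook format v4."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def make_skeleton(title, summary):
    """Return a notebook dict: title + summary cell, then one cell per section."""
    # Item 1: title and one-sentence summary share the first cell.
    cells = [markdown_cell(f"# {title}\n\n{summary}")]
    cells += [markdown_cell(f"## {name}\n\nTODO") for name in SECTIONS]
    # Item 6: placeholder for the detailed, step-by-step sections.
    cells.append(markdown_cell("## Analysis steps\n\nTODO"))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    notebook = make_skeleton("LMM minimal working example",
                             "One-sentence summary goes here.")
    # Hypothetical output file name.
    with open("analysis_template.ipynb", "w") as handle:
        json.dump(notebook, handle, indent=1)
```

Kernel metadata is left empty in this sketch, so Jupyter should ask you to choose a kernel when you first open the file.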
Here in the Jupyter notebook, we use the SoS suite as the workflow system (pipeline tool). SoS is super cool in that it supports multiple languages for interactive analysis in one notebook! For example, in this LMM example we use Bash, R and Python all together.
What you need to do for this task is to run the minimal working example (MWE) of this LMM example. You will need to run BOLT-LMM and REGENIE; we have provided the example of fastGWA here. You can find the data here: LMM_MWE. Please send us only your report of the results, which should include at least a Manhattan plot and a QQ plot. It would be even better if you could put together a notebook in the format suggested above.
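As a rough guide to what the two required plots show, the sketch below computes their plotting coordinates from association p-values. It is an illustration under stated assumptions, not the code used in the LMM example: the input is a generic list of `(chromosome, position, p-value)` tuples with integer chromosomes, the function names are hypothetical, and reading the actual BOLT-LMM or REGENIE output files is left out because their column layout is not described here.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Under the null, the i-th smallest of n p-values is expected near
    (i - 0.5) / n, so the points should follow the diagonal.
    All p-values must be greater than 0.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [
        (-math.log10((i - 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed, start=1)
    ]


def manhattan_points(records):
    """Return (x, -log10(p)) pairs for a Manhattan plot.

    `records` holds (chromosome, base_pair_position, pvalue) tuples.
    Chromosomes are laid end to end along the x axis.
    """
    records = sorted(records)
    # Largest position seen on each chromosome, used as its plotted length.
    length = {}
    for chrom, pos, _ in records:
        length[chrom] = max(length.get(chrom, 0), pos)
    # Offset of a chromosome = total length of all chromosomes before it.
    offset, running = {}, 0
    for chrom in sorted(length):
        offset[chrom] = running
        running += length[chrom]
    return [(offset[c] + pos, -math.log10(p)) for c, pos, p in records]
```

Either list of pairs can be passed to the scatter function of whichever plotting library you prefer. In the QQ plot, points rising well above the diagonal across the whole range, rather than only in the upper tail, usually point to inflated test statistics.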
If you use a Debian-based Linux desktop (Debian or Ubuntu), here are some recommendations on setting up your machine.
Note: Task one is not a prerequisite of this task.
RStudio is an alternative to JupyterLab for interactive analysis (see Rmd below for more details). This task requires some R skills, as well as some minimal background in data science/machine learning/biostats.
Please download this R_assignment and the Data for this assignment to finish the task, and send us your results, including the Rmd file and the HTML report.
Please Google or ask us if you have any technical questions.
Please add any comments into your result/report and email the final version to Gao.
- Learn interactive data analysis from these examples, which use SoS Notebook to allow multiple languages inside one notebook (you can find and run them at: http://sosworkflows.com):
- How to organize computational research projects
- This paper, this paper and this post.