
eQTL discovery pipeline for the GTEx Consortium

This repository contains all components of the eQTL discovery pipeline used by the GTEx Consortium, including data normalization, QTL mapping, and annotation steps. This document describes the pipeline used for the V7 and V8 data releases; for settings specific to the V6p analyses presented in [GTEx Consortium, 2017], please see the last section.

Docker image

The GTEx eQTL pipeline components are provided in a Docker image, available at https://hub.docker.com/r/broadinstitute/gtex_eqtl/

To download the image, run:

docker pull broadinstitute/gtex_eqtl:V8

Image contents and pipeline components

The following tools are included in the Docker image:

FastQTL: QTL mapping software (Ongen et al., Bioinformatics, 2016)
R 3.2
Python 3.5

Prerequisites

The following input files are needed:

VCF file with genotype information. Must be bgzip compressed and indexed with tabix.
Expression tables in GCT format. Two tables are needed: read counts and normalized (FPKM or TPM).
Gene annotation in GTF format.

Running the pipeline

Additional documentation and details about parameter choices are provided on the GTEx Portal.

This pipeline requires gene-level expression data. A collapsed reference GTF can be generated for this purpose using the collapse_annotation.py script available in the gene model directory. In the code below, it is assumed that ${annotation_gtf} was generated using this script.

1) Generate normalized expression in BED format

The expression data are normalized as follows:

Read counts are normalized between samples using TMM (Robinson & Oshlack, Genome Biology, 2010)
Genes are selected based on the following expression thresholds:
- ≥0.1 TPM in ≥20% samples AND
- ≥6 reads (unnormalized) in ≥20% samples
Each gene is inverse normal transformed across samples.

eqtl_prepare_expression.py ${tpm_gct} ${counts_gct} ${annotation_gtf} \
    ${sample_participant_lookup} ${vcf_chr_list} ${prefix} \
    --tpm_threshold 0.1 \
    --count_threshold 6 \
    --sample_frac_threshold 0.2 \
    --normalization_method tmm
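To make the selection and transformation steps above concrete, here is a minimal sketch in plain Python. It is not the code in eqtl_prepare_expression.py: the function names are hypothetical, TMM normalization is not reimplemented, and the rank-to-quantile convention (average ranks divided by n + 1) is an assumption chosen for illustration.

```python
from statistics import NormalDist


def passes_expression_thresholds(tpm, counts, tpm_threshold=0.1,
                                 count_threshold=6, sample_frac_threshold=0.2):
    """Return True if one gene meets both thresholds in enough samples.

    tpm and counts are per-sample values for a single gene (same sample order).
    """
    min_samples = sample_frac_threshold * len(tpm)
    n_tpm = sum(1 for v in tpm if v >= tpm_threshold)
    n_counts = sum(1 for v in counts if v >= count_threshold)
    return n_tpm >= min_samples and n_counts >= min_samples


def inverse_normal_transform(values):
    """Map one gene's values across samples to standard normal quantiles.

    Assumption: tied values share their average rank, and rank r out of n
    samples maps to the quantile r / (n + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf(r / (n + 1)) for r in ranks]
```

With the defaults shown, a gene is kept only if at least 20% of samples reach 0.1 TPM and at least 20% of samples reach 6 reads; the transform is then applied to each retained gene separately.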

The file ${vcf_chr_list} lists the chromosomes in the VCF and can be generated using:

tabix --list-chroms ${vcf} > ${vcf_chr_list}

The file ${sample_participant_lookup} must contain two columns, sample_id and participant_id, mapping IDs in the expression files to IDs in the VCF (these can be the same).
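As an illustration of the lookup file layout, the sketch below writes a two-column table with the header described above. It assumes the file is tab-delimited and that participant IDs can be derived from the first two hyphen-separated fields of each sample ID; both are assumptions, and the IDs shown are made up. If your IDs follow a different scheme, replace the hypothetical participant_from_sample helper with your own mapping.

```python
import csv
import io


def participant_from_sample(sample_id):
    """Hypothetical helper: keep the first two hyphen-separated fields."""
    return "-".join(sample_id.split("-")[:2])


def write_lookup(sample_ids, handle):
    """Write a sample_id / participant_id table (assumed tab-delimited)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "participant_id"])
    for sample_id in sample_ids:
        writer.writerow([sample_id, participant_from_sample(sample_id)])


# Example with made-up sample IDs, written to an in-memory buffer.
buffer = io.StringIO()
write_lookup(["GTEX-AAAA-0001-SM-XXXXX", "GTEX-BBBB-0002-SM-YYYYY"], buffer)
lookup_text = buffer.getvalue()
```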

This step generates the following BED file and index:

${prefix}.expression.bed.gz
${prefix}.expression.bed.gz.tbi

2) Calculate PEER factors

Rscript run_PEER.R ${prefix}.expression.bed.gz ${prefix} ${num_peer}

The number of PEER factors was selected as a function of sample size (N):

15 factors for N < 150
30 factors for 150 ≤ N < 250
45 factors for 250 ≤ N < 350
60 factors for N ≥ 350

For information on how these thresholds were determined, please see the Supplementary Information of [GTEx Consortium, 2017].
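When processing several tissues in a script, the sample-size rule above can be encoded as a small helper to set ${num_peer}. This is a convenience sketch; the function name is hypothetical and is not part of the pipeline.

```python
def num_peer_factors(n_samples):
    """Return the number of PEER factors for a given sample size N."""
    if n_samples < 150:
        return 15
    if n_samples < 250:
        return 30
    if n_samples < 350:
        return 45
    return 60
```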

This step will generate 3 files:

${prefix}.PEER_residuals.txt
${prefix}.PEER_alpha.txt
${prefix}.PEER_covariates.txt

3) Combine covariates

This step generates a combined covariates file, containing genotype PCs, PEER factors, and additional explicit covariates (e.g., genotyping platform).

combine_covariates.py ${prefix}.PEER_covariates.txt ${prefix} \
    --genotype_pcs ${genotype_pcs} \
    --add_covariates ${add_covariates}

The covariate files should have one covariate per row, with an identifier in the first column, and a header line with sample identifiers. This step will generate the file ${prefix}.combined_covariates.txt
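The expected layout (one covariate per row, sample identifiers in the header) can be illustrated with a short sketch that stacks several such tables after aligning their sample columns. This is not the logic of combine_covariates.py; it assumes tab-delimited input, and the function names and covariate labels (PEER1, PC1) are hypothetical.

```python
import csv
import io


def read_covariates(text):
    """Parse a tab-delimited covariate table into (sample_ids, {id: values})."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return header[1:], {row[0]: row[1:] for row in body}


def combine_covariate_tables(tables):
    """Stack covariate tables, reordering columns to match the first table."""
    ref_samples = tables[0][0]
    combined = {}
    for samples, covariates in tables:
        index = {s: i for i, s in enumerate(samples)}
        missing = [s for s in ref_samples if s not in index]
        if missing:
            raise ValueError("samples missing from a table: %s" % missing)
        for name, values in covariates.items():
            if name in combined:
                raise ValueError("duplicate covariate: %s" % name)
            combined[name] = [values[index[s]] for s in ref_samples]
    return ref_samples, combined


# Made-up example: the second table lists the samples in a different order.
peer = read_covariates("ID\tS1\tS2\nPEER1\t0.1\t-0.2\n")
pcs = read_covariates("ID\tS2\tS1\nPC1\t0.5\t0.3\n")
samples, merged = combine_covariate_tables([peer, pcs])
```

Aligning on sample identifiers rather than column position guards against covariate files whose columns are ordered differently.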

4) Run FastQTL

A wrapper script for multithreaded execution is provided in the Docker image (/opt/fastqtl/python/run_FastQTL_threaded.py) and at https://github.com/francois-a/fastqtl

# nominal pass
run_FastQTL_threaded.py ${vcf} ${prefix}.expression.bed.gz ${prefix} \
    --covariates ${prefix}.combined_covariates.txt \
    --window 1e6 --chunks 100 --threads 16

# permutation pass
run_FastQTL_threaded.py ${vcf} ${prefix}.expression.bed.gz ${prefix} \
    --covariates ${prefix}.combined_covariates.txt \
    --window 1e6 --chunks 100 --threads 16 \
    --permute 1000 10000

The following files will be generated (the first by the nominal pass, the second by the permutation pass):

${prefix}.allpairs.txt.gz
${prefix}.egenes.txt.gz

Using docker

The steps described above can be run using Docker. This assumes that the $path_to_data directory contains all required input files.

# Docker command for step 1:
docker run --rm -v $path_to_data:/data -t broadinstitute/gtex_eqtl:V8 /bin/bash \
    -c "/src/eqtl_prepare_expression.py /data/${tpm_gct} /data/${counts_gct} \
        /data/${annotation_gtf} /data/${sample_participant_lookup} /data/${vcf_chr_list} ${prefix} \
        --tpm_threshold 0.1 --count_threshold 6 --sample_frac_threshold 0.2 --normalization_method tmm"

V6p pipeline settings

Expression normalization

The expression data were normalized as follows:

Genes were selected based on the following expression thresholds:
- >0.1 RPKM in ≥10 samples AND
- ≥6 reads (unnormalized) in ≥10 samples
RPKMs were normalized between samples using quantile normalization
Each gene was inverse normal transformed across samples.
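For readers unfamiliar with quantile normalization between samples, the sketch below shows the basic idea in plain Python: each sample's values are replaced by the mean, across samples, of the values at the same within-sample rank. This is an illustration, not the V6p code; the function name is hypothetical, and tied values are simply assigned consecutive ranks in input order, which is an assumption.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of gene rows)."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # Sort the values of each sample (column) independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_samples)]
    # Reference distribution: mean across samples at each rank.
    reference = [sum(col[i] for col in sorted_cols) / n_samples
                 for i in range(n_genes)]
    # Replace each value with the reference value at its within-sample rank.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        order = sorted(range(n_genes), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


# Made-up example: 4 genes (rows) x 3 samples (columns).
example = [[5, 4, 3], [2, 1, 4], [3, 3, 6], [4, 2, 8]]
normalized = quantile_normalize(example)
```

After normalization every sample has the same set of values, so differences between samples in overall distribution are removed before the per-gene inverse normal transform.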
