If the species you need is not in our species list, please contact us!
Powered by GTXLab of Genetalks.
Chinese documentation (中文说明).
- What is GTX.Zip?
- Product Series
- Supported Bioinformatic Analysis Softwares
- Feature
- Environment Requirements
- How to Install?
- Let's Do It!
- Usage
- Rbin Files Downloads
- GTZ Ecology Softwares
- Change Log
- FAQ
- Contact Us
- License
GTX.Zip (or GTZ for short) is a professional FASTQ/BAM compressor that can also serve as a general-purpose data compression tool, developed by GTXLab of Genetalks Inc. GTX.Zip rapidly compresses DNA sequencing files and directories at a very high compression rate and generates a single compressed file, which simplifies data storage, distribution, and transmission. Unlike other compression tools, GTX.Zip focuses on high compression rate, high speed, and convenient data extraction.
- GTX.Zip Professional is a stand-alone version that provides local compression. It is run from the command line to compress and decompress local genomic data.
Product | Version | Description | How to Get |
---|---|---|---|
GTX.Zip Professional | V1.0.1 | For companies, institutions, and individual users with large amounts of local sequencing data | Install |
GTX.Zip Enterprise | V1.0.1 | For large enterprises and data centers with PB-level sequencing data that require distributed compression on their own computing clusters | Contact Us |
GTX.Zip Cloud | V1.0.1 | For companies that distribute and store large amounts of sequencing data in the cloud | http://gtz.io |
- BWA 0.7 for GTX.Zip: BWA is the most widely used software package for mapping DNA sequences, and this build can read XXX.gtz files directly. It consists of two programs: bwa 0.7 and bwa-opt 0.7.
- bwa-opt 0.7 is the optimized version. It is about 30% faster than standard bwa, and its mapping results are identical to those of standard bwa.
- BOWTIE / BOWTIE2 for GTX.Zip: an ultrafast, memory-efficient tool for aligning sequencing reads to long reference sequences. It can read XXX.gtz files directly, and you can use it exactly as you would the official version.
- BOWTIE for GTX.Zip is based on BOWTIE 1.2.2.
- BOWTIE2 for GTX.Zip is based on BOWTIE2 2.3.4.3.
GTX.Zip compressor system features:
- High Compression Ratio: The system implements context-model compression with a variety of optimized prediction models, while balancing concurrency against memory consumption, and thus achieves an extremely high compression rate. GTX.Zip can compress a FASTQ file to as little as 2.53% of its original size. Its output is about 3 to 6 times smaller than that of gzip, which can save up to 80% of storage space and transfer costs.
Data file | Compression rate of GTX.Zip | Compression rate of gzip (fastq.gz) |
---|---|---|
Nova_wes_1.fq | 2.53% | 17.15% |
Nova_wes_2.fq | 3.45% | 18.34% |
nova_wgs_1.fq | 3.18% | 17.55% |
nova_wgs_2.fq | 3.93% | 18.66% |
nova_rna_1.fq | 4.56% | 17.70% |
nova_rna_2.fq | 5.39% | 18.94% |
- High Performance: GTX.Zip fully exploits CPU concurrency, the Haswell CPU architecture, and instruction sets such as AVX2 and BMI2, so it reaches high compression speed even on an ordinary computing server, with a throughput of 1100 MB/s on a single compression node. GTX.Zip Enterprise supports large-scale distributed compression.
- Safety Guarantee: Thanks to its high speed, GTX.Zip runs a decompress-and-restore test during compression. Compression is reported complete only after the restored data has been confirmed identical to the source data. MD5 validation is also performed to ensure data integrity.
- Software Ecology: GTX.Zip provides command-line and GUI decompression software for Linux, Mac OS X, and Windows. It also provides SDK interfaces for languages such as Python, C, and C++, so that third-party developers can read and write gtz files (the GTX.Zip compression format) directly. For example, gtz versions of bcl2fastq, fastp, and BWA are now supported by the community.
If you want to get this software, please go to GTZ Ecology Softwares.
- Nirvana Plan: As enterprise-level software, GTX.Zip includes a Nirvana plan for high-availability requirements, to ensure that users can restore compressed data to the original data even under extreme conditions. The Nirvana plan's dual availability protection strategy is as follows:
  - GTX.Zip is multi-site hosted. The http://gtz.io website, GitHub, and other sites will permanently host all versions of GTX.Zip, so that it remains available to everyone, free of charge, at any time.
  - To ensure that compressed data can be restored to the original file under any conditions, a micro decompression program embedded in the compressed data can be extracted first and then used to decompress the file.
  - For usage, see the Nirvana plan steps in the Usage section.
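The figures in the compression-rate table of the feature list can be turned into a direct comparison with gzip. The sketch below only does arithmetic on two rows copied from that table; the helper name `ratio_vs_gzip` is made up for illustration and is not part of GTX.Zip.

```sh
#!/bin/sh
# ratio_vs_gzip is a hypothetical helper. Arguments: the GTX.Zip rate and
# the gzip rate, both as percentages of the original file size.
ratio_vs_gzip() {
    awk -v gtz="$1" -v gz="$2" 'BEGIN {
        printf "%.1fx smaller, %.1f%% less space than gzip\n", gz / gtz, (1 - gtz / gz) * 100
    }'
}

ratio_vs_gzip 2.53 17.15   # Nova_wes_1.fq
ratio_vs_gzip 5.39 18.94   # nova_rna_2.fq
```

For these two rows the result is roughly 6.8x and 3.5x smaller than the gzip output, which is in line with, or slightly above, the 3 to 6 times range quoted in the feature list.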
- 64-bit Linux system (CentOS >= 6.5; Ubuntu >= 12.04, < 18.04)
- For good performance, a computing server with 32 cores and 64 GB of memory is recommended (at least 4 cores and 8 GB of memory), or one with the same configuration as the AWS C4.8xlarge machine.
- Way one
Run command (recommended)
curl -SL https://gtz.io/gtz_latest.run -o /tmp/gtz.run && sh /tmp/gtz.run
After the first installation, run source ~/.bashrc or log out and back in; you can then run gtz and gtz_index from any directory.
Or download the installation file (GTX.Zip Professional), then install:
sh gtz_latest.run
As above, after the first installation you need to run source ~/.bashrc or log out and back in.
- Way two
Run command (recommended)
sudo curl -SL https://gtz.io/gtz_latest.run -o /tmp/gtz.run && sudo sh /tmp/gtz.run
Or download the installation file (GTX.Zip Professional), then install:
sudo sh gtz_latest.run
After the installation is complete, you can run gtz and gtz_index from any directory.
- Verify installation
Run:
gtz -v
If the software version information appears, the installation was successful.
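As a quick post-install check, the sketch below confirms that both commands are on the PATH before printing the version. The helper name `check_tool` is hypothetical and not shipped with GTX.Zip; only `gtz -v`, `gtz`, and `gtz_index` come from this README.

```sh
#!/bin/sh
# check_tool is a hypothetical helper, not shipped with GTX.Zip.
# It returns 0 when the named command is found on PATH, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (try 'source ~/.bashrc' or log in again)"
        return 1
    fi
}

# Print the version only when both tools are available.
if check_tool gtz && check_tool gtz_index; then
    gtz -v
fi
```

If either tool is reported missing right after a non-sudo install, the most likely cause is that ~/.bashrc has not been re-read yet.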
GTX.Zip Professional needs to be installed on the current machine. If it is not, please see How to Install.
- Make the bin file to enable high-rate compression
Taking human as the example species, make the index file (bin file) that GTX.Zip requires for high-rate compression.
- Download rbin file No. 1 ("1" is the number of the human rbin file in the gtz_index list); gtz_index saves it to the default path (~/.config/gtz):
gtz_index download 1
or download the rbin file from here (Homo_sapiens rbin file).
- Make the bin file (this may need 100 GB of free disk space, more than 28 GB of memory, and about 10 minutes):
gtz_index makeindex ~/.config/gtz/Homo_sapiens_bcacac9064331276504f27c6cf40e580.rbin
* bin file: the index file used for high compression. The default file path is "~/.config/gtz/".
* rbin file: the compact index file used for decompression. The default file path is "~/.config/gtz/".
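In this walkthrough the human bin file passed to --bin-file has the same base name as the rbin file, with `.bin` in place of `.rbin`. Assuming that naming holds for other species too (an assumption drawn from the human example, not a documented rule), a small helper can derive the bin path and skip the expensive makeindex step when the bin file already exists. `bin_for_rbin` is a hypothetical name.

```sh
#!/bin/sh
# bin_for_rbin is a hypothetical helper: it maps an rbin path to the
# bin path with the same base name (assumed naming, see the note above).
bin_for_rbin() {
    case "$1" in
        *.rbin) printf '%s\n' "${1%.rbin}.bin" ;;
        *)      echo "not an rbin file: $1" >&2; return 1 ;;
    esac
}

rbin="$HOME/.config/gtz/Homo_sapiens_bcacac9064331276504f27c6cf40e580.rbin"
bin=$(bin_for_rbin "$rbin")
echo "$bin"

# Only build the index when the rbin exists and the bin does not yet.
if [ -f "$rbin" ] && [ ! -f "$bin" ] && command -v gtz_index >/dev/null 2>&1; then
    gtz_index makeindex "$rbin"
fi
```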
- Compress the sample FASTQ file
gtz sample.fq -o sample.fq.gtz --bin-file ~/.config/gtz/Homo_sapiens_bcacac9064331276504f27c6cf40e580.bin
sample.fq can be downloaded from https://gtz.io/sample.fq (a 2 GB FASTQ file extracted from real WES data produced on a NovaSeq).
* gtz can also compress fastq.gz files directly.
1. Compress sample.fq into the current directory.
gtz sample.fq
2. Compress sample.fq into the out folder under the current directory.
gtz sample.fq -o ./out/sample.fq.gtz
***If the species is not specified with '--bin-file', GTZ automatically recognizes the species of the data being compressed.***
3. Perform high compression with a bin file from the Homo folder under the current directory.
gtz sample.fq --bin-file ./Homo/Homo_sapiens_bcacac9064331276504f27c6cf40e580.bin
1. Decompress sample.fq.gtz into the current directory. If the species rbin file is not under "~/.config/gtz/", GTZ automatically downloads it from the cloud to "~/.config/gtz/".
gtz -d sample.fq.gtz
2. Decompress sample.fq.gtz using the rbin files in the directory "~/Homo".
gtz -d sample.fq.gtz --rbin-path ~/Homo
3. Decompress sample.fq.gtz into the Homo folder under the current path.
gtz -d sample.fq.gtz --out-dir ./Homo
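To convince yourself that a round trip is lossless, you can compress a file, decompress it into a separate folder, and compare checksums. The sketch below uses only the gtz options shown in the examples above (`-o`, `-d`, `--out-dir`) plus the standard `cksum` utility. The helper `same_content` and the folder name `./roundtrip` are made up for illustration, and the script assumes the extracted file keeps the name sample.fq.

```sh
#!/bin/sh
# same_content is a hypothetical helper: succeeds when both files have
# the same CRC and byte count according to POSIX cksum.
same_content() {
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Round trip, run only when gtz and the sample file are present.
if command -v gtz >/dev/null 2>&1 && [ -f sample.fq ]; then
    mkdir -p ./roundtrip
    gtz sample.fq -o ./roundtrip/sample.fq.gtz
    gtz -d ./roundtrip/sample.fq.gtz --out-dir ./roundtrip
    if same_content sample.fq ./roundtrip/sample.fq; then
        echo "round trip OK"
    else
        echo "round trip MISMATCH" >&2
    fi
fi
```

gtz already verifies its output after compression unless -n is given, so this is an independent second check rather than a required step.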
usage: gtz [-h] [-o OUT] [-b INDEX_BIN] [-d DECOMPRESS] [-O OUT_DIR]
-h, --help show this help message and exit
-o OUT, --out OUT specify the GTZ file name after compression
-b BIN_FILE, --bin-file BIN_FILE specify the bin file name for high compression
-s, --suggest turn on automatic species identification for
compression, it's invalid when -b is specified
-B BIN_PATH, --bin-path BIN_PATH specify the directory in which the bin file resides
when automatic species recognition is turned on.
-n, --no-verify do not verify after compression is completed.
-d DECOMPRESS, --decompress DECOMPRESS decompress
-O OUT_DIR, --out-dir OUT_DIR specify the save path of the extracted file
-c, --stdout decompress to terminal
-z, --fastq-to-fastq-gz decompress fastq to fastq.gz, it's valid only for FASTQ
-r RBIN_PATH, --rbin-path RBIN_PATH specify the path where the rbin file resides
-p PARALLEL_NUM,--parallel specify parallel number for compression or decompression,
default equal CPU logical cores
-f, --force force overwrite of output file
-e, --no-keep don't keep input files
-v, --version display version number
Interaction mode:
gtz_index
Shows the supported species and guides you step by step through creating bin files interactively.
Manual mode:
1. Show the list of supported species; the index number is the input for the gtz_index download command.
gtz_index list
2. Download the rbin file with index No. 1 in the species list.
gtz_index download 1
3. Make the bin and rec files from the rbin file "./Homo/Homo_sapiens_bcacac9064331276504f27c6cf40e580.rbin".
gtz_index makeindex ./Homo/Homo_sapiens_bcacac9064331276504f27c6cf40e580.rbin
gtz_index <command> [options]
list show the currently supported species
download <index> <path_to> download the rbin file (species reference sequence); path_to is optional
makeindex <rbin_path> make the index from the rbin reference file
Let's start the Nirvana plan! Suppose we have a gtz file named sample.fq.gtz.
Step 1:
Run the following command to extract the embedded program gtz_reborn into the current directory:
sed -e 's/\[GTZ_REBORN_BEGIN\]/\n&/;' sample.fq.gtz | sed -n '/\[GTZ_REBORN_BEGIN\]/,/\[GTZ_REBORN_END\]/p' | sed -e 's/.*\[GTZ_REBORN_BEGIN\]//g' -e 's/\[GTZ_REBORN_END\].*//g' | tar -zxvf -
Step 2:
If sample.fq.gtz is a high-compression file, download the corresponding FASTA file as prompted, and then extract the file.
If sample.fq.gtz is not a high-compression file, the FASTQ file can be extracted directly.
./gtz_reborn -d sample.fq.gtz
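The sed pipeline in Step 1 relies on the markers [GTZ_REBORN_BEGIN] and [GTZ_REBORN_END] being present in the gtz file. A quick pre-check, sketched below with the hypothetical helper name `has_reborn`, can tell you whether a given file carries the embedded program before you attempt the extraction.

```sh
#!/bin/sh
# has_reborn is a hypothetical helper: succeeds when both marker strings
# used by the Step 1 sed pipeline occur somewhere in the file.
has_reborn() {
    grep -q -F '[GTZ_REBORN_BEGIN]' "$1" && grep -q -F '[GTZ_REBORN_END]' "$1"
}

if [ -f sample.fq.gtz ]; then
    if has_reborn sample.fq.gtz; then
        echo "embedded gtz_reborn found"
    else
        echo "no embedded gtz_reborn markers"
    fi
fi
```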
If the species you need is not in our species list, please contact us!
- 1. BWA for GTZ
- 2. BCL2FASTQ for GTZ
- 3. STAR for GTZ
- 4. BOWTIE for GTZ
- 5. BOWTIE2 for GTZ
- 6. TOPHAT for GTZ
- 7. HISAT2 for GTZ
- 8. MEGAHIT for GTZ
- 9. FASTQC for GTZ
- 10. FASTP for GTZ
- 11. MINIMAP2 for GTZ
- How to Install?
sudo curl -SL https://gtz.io/bwagtz_latest.run -o /tmp/bwagtz.run && sudo sh /tmp/bwagtz.run
Or download the installation file (GTX.Zip bwa-gtz), then run the following command in the directory containing it:
sudo sh bwagtz_latest.run
Complete the installation by following the prompts.
- How to Use?
GTX.Zip's support package for BWA includes bwa-gtz and bwa-opt-gtz, both based on bwa 0.7.17. Both add the ability to read GTZ files directly, and their functions are otherwise identical to those of mainline bwa. bwa-opt-gtz also optimizes the structure of the BWA lookup table, which can save more than one third of the running time without changing the alignment results. Because of these changes to the lookup table's data structure, bwa-opt-gtz is incompatible with index files generated by the original bwa: following the standard bwa steps, first regenerate the index with bwa-opt-gtz, and then align with bwa-opt-gtz.
The differences between bwa-gtz and bwa-opt-gtz are as follows:
(1) bwa-gtz can directly use an index produced by official BWA, and its performance matches that of official BWA.
(2) bwa-opt-gtz cannot directly use an index produced by official BWA; the index must be rebuilt with bwa-opt-gtz, but it then runs about one third faster than official BWA.
export GTZ_RBIN_PATH=/path/rbin/
bwa-gtz mem ref.fa read1.fq.gtz read2.fq.gtz -o aln-pe.sam
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up bwa-gtz: when bwa-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
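One way to apply this advice in a pipeline script is to export GTZ_RBIN_PATH only when the directory really contains an rbin file, and otherwise leave the default lookup and download behaviour untouched. This is a minimal sketch: `use_local_rbin` is a hypothetical helper, and /path/rbin/ is the placeholder directory from the example, not a real location.

```sh
#!/bin/sh
# use_local_rbin is a hypothetical helper: export GTZ_RBIN_PATH only
# when the directory actually contains at least one .rbin file.
use_local_rbin() {
    for f in "$1"/*.rbin; do
        if [ -f "$f" ]; then
            GTZ_RBIN_PATH=$1
            export GTZ_RBIN_PATH
            return 0
        fi
    done
    return 1
}

# /path/rbin/ is the placeholder directory from the example above.
if use_local_rbin /path/rbin/; then
    echo "using local rbin files in $GTZ_RBIN_PATH"
else
    echo "no local rbin found; the tool will fall back to ~/.config/gtz or download"
fi
```

The same helper applies unchanged to the other GTZ-enabled tools in this document, since they all read the same environment variable.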
bwa-opt-gtz index ref.fa
export GTZ_RBIN_PATH=/path/rbin/
bwa-opt-gtz mem ref.fa read1.fq.gtz read2.fq.gtz -t 4 -o aln-pe.sam
- performance
With sufficient server resources, bwa-opt-gtz performs about one third better than official bwa. The following test data were collected in the same environment (4 threads specified):
bwa mem ref.fa read1.fq.gz read2.fq.gz -t 4 -o aln-pe.sam
bwa-gtz mem ref.fa read1.fq.gtz read2.fq.gtz -t 4 -o aln-pe.sam
bwa-opt-gtz mem ref.fa read1.fq.gtz read2.fq.gtz -t 4 -o aln-pe.sam
Server configuration: 16 core CPU, 64G memory; file size: read1.fq.gz(1.8G), read2.fq.gz(1.8G), read1.fq.gtz(0.3G), read2.fq.gtz(0.3G)
Software | bwa | bwa-gtz | bwa-opt-gtz |
---|---|---|---|
Time consumption | 50m14.06s | 51m37.67s | 39m18.86s |
Memory consumption | 5.888G | 10.56G | 19.84G |
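To read the timing figures as a relative gain, the sketch below converts the two wall-clock times for bwa and bwa-opt-gtz to seconds and compares them. It is plain arithmetic on the numbers reported above; the helper name `to_seconds` is made up for illustration.

```sh
#!/bin/sh
# to_seconds is a hypothetical helper: converts "50m14.06s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Wall-clock times copied from the reported measurements.
bwa=$(to_seconds 50m14.06s)
opt=$(to_seconds 39m18.86s)

# Share of time saved and speed-up factor of bwa-opt-gtz over bwa.
awk -v a="$bwa" -v b="$opt" 'BEGIN {
    printf "time saved: %.1f%%\n", (1 - b / a) * 100
    printf "speed-up:   %.2fx\n", a / b
}'
```

For this particular run the result is about 21.7% less time, or roughly a 1.28x speed-up, which is somewhat below the one-third figure quoted above; the exact gain presumably depends on the data and hardware.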
- How to Install?
Without sudo, run the following command (recommended):
curl -SL https://gtz.io/bcl2fastq_gtz_latest.run -o /tmp/bcl2fastqgtz.run && sh /tmp/bcl2fastqgtz.run
After the first installation, run source ~/.bashrc or log out and back in; you can then run bcl2fastq-gtz from any directory.
Or download the installation file (GTX.Zip bcl2fastq-gtz), then install:
sh bcl2fastq_gtz_latest.run
As above, after the first installation you need to run source ~/.bashrc or log out and back in.
With sudo, run the following command (recommended):
sudo curl -SL https://gtz.io/bcl2fastq_gtz_latest.run -o /tmp/bcl2fastqgtz.run && sudo sh /tmp/bcl2fastqgtz.run
Or download the installation file (GTX.Zip bcl2fastq-gtz), then install:
sudo sh bcl2fastq_gtz_latest.run
After the installation is complete, you can run bcl2fastq-gtz from any directory.
- How to Use?
GTX.Zip's support package for bcl2fastq is based on bcl2fastq v2.20.0.422.
By default the output is in gtz format; with the --no-bgtzf-compression parameter the output is in gz format.
bcl2fastq-gtz -i ./data/BaseCalls -R ./outdir/run --interop-dir ./outdir/interop -o ./outdir/result --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --barcode-mismatches 0 --use-bases-mask y*,i7,i7,y* >./outdir/bcl2fastq.log 2>&1 || touch bcl2fastq.err
bcl2fastq-gtz -i ./data/BaseCalls -R ./outdir/run --interop-dir ./outdir/interop -o ./outdir/result --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --barcode-mismatches 0 --use-bases-mask y*,i7,i7,y* --bin_file Homo_sapiens_bcacac9064331276504f27c6cf40e580.bin >./outdir/bcl2fastq.log 2>&1 || touch bcl2fastq.err
bcl2fastq-gtz -i ./data/BaseCalls -R ./outdir/run --interop-dir ./outdir/interop -o ./outdir/result --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --barcode-mismatches 0 --use-bases-mask y*,i7,i7,y* --no-bgtzf-compression >./outdir/bcl2fastq.log 2>&1 || touch bcl2fastq.err
-
performance
bcl2fastq -i ./data/BaseCalls -R ./outdir/run --interop-dir ./outdir/interop -o ./outdir/result --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --barcode-mismatches 0 --use-bases-mask y*,i7,i7,y* >./outdir/bcl2fastq.log 2>&1 || touch bcl2fastq.err
bcl2fastq-gtz -i ./data/BaseCalls -R ./outdir/run --interop-dir ./outdir/interop -o ./outdir/result --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --barcode-mismatches 0 --use-bases-mask y*,i7,i7,y* >./outdir/bcl2fastq.log 2>&1 || touch bcl2fastq.err
Server configuration: 16-core CPU, 64 GB memory; input size: ./data/BaseCalls, 40 GB in total
Total size of the output folder: 40 GB for bcl2fastq, 16 GB for bcl2fastq-gtz
The official STAR release supports the GTZ format directly. After installing GTZ and STAR:
- First step
Make the index file with STAR:
STAR --runMode genomeGenerate --genomeDir /path/to/genomeDir --genomeFastaFiles /path/xxx.fasta
For details, refer to the official STAR documentation.
- Second step
Perform the mapping operation, for example:
export GTZ_RBIN_PATH=/path/rbin/
STAR --genomeDir /path/to/genomeDir --readFilesIn read1.fq.gtz read2.fq.gtz --readFilesCommand gtz -c -d
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up GTZ: when GTZ needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
STAR --genomeDir /path/to/genomeDir --readFilesIn read1.fq.gtz read2.fq.gtz --readFilesCommand gtz -r /path/to/gtz_rbin_dir/ -c -d
* In this example, the directory containing the rbin file is specified with -r; the effect is the same as in the first method.
- How to Install?
sudo curl -SL https://gtz.io/bowtiegtz_latest.run -o /tmp/bowtiegtz.run && sudo sh /tmp/bowtiegtz.run
Or download the installation file (GTX.Zip bowtie-gtz), then run the following command in the directory containing it:
sudo sh bowtiegtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After installation, three executables are generated: bowtie-gtz, bowtie-build-gtz, and bowtie-inspect-gtz. If you answered y to "create a soft link to /usr/bin" during installation, these executables can be run directly from any directory; otherwise, switch to the installation directory and run them as ./bowtie-gtz.
GTX.Zip's support package for bowtie is based on bowtie 1.2.2. bowtie-gtz can directly use an index produced by official bowtie, so you can build the index with either bowtie-build or bowtie-build-gtz. Likewise, bowtie-inspect-gtz and bowtie-inspect behave exactly the same.
bowtie-build-gtz ref.fa ref_index
export GTZ_RBIN_PATH=/path/rbin/
bowtie-gtz -S ref_index reads.fq.gtz eg.sam
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up bowtie-gtz: when bowtie-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
-
performance
- How to Install?
sudo curl -SL https://gtz.io/bowtie2gtz_latest.run -o /tmp/bowtie2gtz.run && sudo sh /tmp/bowtie2gtz.run
Or download the installation file (GTX.Zip bowtie2-gtz), then run the following command in the directory containing it:
sudo sh bowtie2gtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After installation, three executables are generated: bowtie2-gtz, bowtie2-build-gtz, and bowtie2-inspect-gtz. If you answered y to "create a soft link to /usr/bin" during installation, these executables can be run directly from any directory; otherwise, switch to the installation directory and run them as ./bowtie2-gtz.
GTX.Zip's support package for bowtie2 is based on bowtie2 2.3.4.3. bowtie2-gtz can directly use an index produced by official bowtie2, so you can build the index with either bowtie2-build or bowtie2-build-gtz. Likewise, bowtie2-inspect-gtz and bowtie2-inspect behave exactly the same.
bowtie2-build-gtz ref.fa ref_index
export GTZ_RBIN_PATH=/path/rbin/
bowtie2-gtz -x ref_index -1 reads_1.fq.gtz -2 reads_2.fq.gtz -S eg2.sam -p 4 --reorder
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up bowtie2-gtz: when bowtie2-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
-
performance
export GTZ_RBIN_PATH=/path/rbin/
bowtie2 -x ref_index -1 reads_1.fq.gz -2 reads_2.fq.gz -S eg2.sam -p 4 --reorder
bowtie2-gtz -x ref_index -1 reads_1.fq.gtz -2 reads_2.fq.gtz -S eg2.sam -p 4 --reorder
Server configuration: 16 core CPU, 64G memory; file size: read1.fq.gz(1.55G), read2.fq.gz(1.78G), read1.fq.gtz(0.43G), read2.fq.gtz(0.61G)
Software | bowtie2 | bowtie2-gtz |
---|---|---|
CPU consumption (average) | 400 | 445 |
Memory consumption (average) | 0.19G | 12.92G |
Time consumption | 63m41.06s | 61m56.67s |
- How to Install?
sudo curl -SL https://gtz.io/tophatgtz_latest.run -o /tmp/tophatgtz.run && sudo sh /tmp/tophatgtz.run
Or download the installation file (GTX.Zip tophat-gtz), then run the following command in the directory containing it:
sudo sh tophatgtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After installation, you can run tophat-gtz directly without any other dependencies. (If bowtie/bowtie2 is not installed in the environment, it is installed automatically along with tophat-gtz.)
If you answered y to "create a soft link to /usr/bin" during installation, you can run tophat-gtz from any directory; otherwise, switch to the installation directory and run it as ./tophat-gtz.
GTX.Zip's support package for tophat is based on tophat 2.1.2. It adds the ability to read GTZ files directly; all other functions are identical to those of mainline tophat.
bowtie2-build ref.fa ref_index
* Note: It is recommended to build the index with bowtie2 or bowtie2-gtz, because among bowtie versions tophat only works with bowtie 1.1.2 and earlier.
export GTZ_RBIN_PATH=/path/rbin/
tophat-gtz -o report_dir ref_index reads_1.fq.gtz reads_2.fq.gtz
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up tophat-gtz: when tophat-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
-
performance
export GTZ_RBIN_PATH=/path/rbin/
tophat -o report_dir -p 4 ref_index reads_1.fq.gz reads_2.fq.gz
tophat-gtz -o report_dir -p 4 ref_index reads_1.fq.gtz reads_2.fq.gtz
Server configuration: 16 core CPU, 64G memory; file size: read1.fq.gz(1.55G), read2.fq.gz(1.78G), read1.fq.gtz(0.43G), read2.fq.gtz(0.61G)
Software | tophat | tophat-gtz |
---|---|---|
Time consumption | 133m12.61s | 134m43.02s |
- How to Install?
sudo curl -SL https://gtz.io/hisat2gtz_latest.run -o /tmp/hisat2gtz.run && sudo sh /tmp/hisat2gtz.run
Or download the installation file (GTX.Zip hisat2-gtz), then run the following command in the directory containing it:
sudo sh hisat2gtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After the installation is complete, the executables and related scripts, such as hisat2-gtz and hisat2-build, are generated in the installation directory. If you answered "y" to "create a soft link to /usr/bin" during installation, you can run hisat2-gtz and hisat2-build directly from any directory; otherwise, switch to the installation directory and run them as ./hisat2-gtz. GTX.Zip's support package for hisat2 is based on hisat2 2.1.0. It adds the ability to read gtz files directly; all other functions are identical to those of mainline hisat2.
hisat2-build -p 4 ~/GCF_000001405.37_GRCh38.p11_genomic.fna genome
export GTZ_RBIN_PATH=/path/rbin/
hisat2-gtz -x genome -1 reads_1.fq.gtz -2 reads_2.fq.gtz -S result.sam
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up hisat2-gtz: when hisat2-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
-
performance
export GTZ_RBIN_PATH=/path/rbin/
hisat2 -x genome -1 reads_1.fq.gz -2 reads_2.fq.gz -S gz.sam -p 16 --reorder
hisat2-gtz -x genome -1 reads_1.fq.gtz -2 reads_2.fq.gtz -S gtz.sam -p 16 --reorder
Server configuration: 16 core CPU, 64G memory; file size: read1.fq.gz(7.3G), read2.fq.gz(7.3G), read1.fq.gtz(1.6G), read2.fq.gtz(1.8G)
Software | hisat2 | hisat2-gtz |
---|---|---|
Time consumption | 8m25.845s | 10m47.930s |
- How to Install?
sudo curl -SL https://gtz.io/megahitgtz_latest.run -o /tmp/megahitgtz.run && sudo sh /tmp/megahitgtz.run
Or download the installation file (GTX.Zip megahit-gtz), then run the following command in the directory containing it:
sudo sh megahitgtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After installation, megahit-gtz (along with megahit_asm_core, megahit_sdbg_build, and megahit_toolkit) is generated.
If you answered y to "create a soft link to /usr/bin" during installation, you can run megahit-gtz from any directory; otherwise, switch to the installation directory and run it as ./megahit-gtz.
GTX.Zip's support package for megahit is based on megahit 1.1.3. It adds the ability to read GTZ files directly; all other functions are identical to those of mainline megahit.
megahit-gtz -1 pe1.fq -2 pe2.fq -o out
export GTZ_RBIN_PATH=/path/rbin/
megahit-gtz -1 pe1.fq.gtz -2 pe2.fq.gtz -o out
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up megahit-gtz: when megahit-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
-
performance
megahit -t 16 -o out -1 pe1.fq.gz -2 pe2.fq.gz
export GTZ_RBIN_PATH=/path/rbin/
megahit-gtz -t 16 -o out -1 pe1.fq.gtz -2 pe2.fq.gtz
Server configuration: 16 core CPU, 64G memory; file size: read1.fq.gz(1.55G), read2.fq.gz(1.78G), read1.fq.gtz(0.43G), read2.fq.gtz(0.61G)
Software | megahit | megahit-gtz |
---|---|---|
Time consumption | 67m38.381s | 66m44.151s |
- How to Install?
sudo curl -SL https://gtz.io/fastqc_gtz_latest.run -o /tmp/fastqc2gtz.run && sudo sh /tmp/fastqc2gtz.run
Or download the installation file (GTX.Zip fastqc-gtz), then run the following command in the directory containing it:
sudo sh fastqc_gtz_latest.run
Complete the installation by following the prompts.
- How to Use?
After the installation is complete, the executable and related scripts are generated in the installation directory. If you answered "y" to "create a soft link to /usr/bin" during installation, you can run fastqc-gtz directly from any directory; otherwise, switch to the installation directory and run it as ./fastqc-gtz. GTX.Zip's support package for fastqc is based on fastqc 0.11.8. It adds the ability to read gtz files directly; all other functions are identical to those of mainline fastqc.
export GTZ_RBIN_PATH=/path/rbin/
fastqc-gtz -t 1 reads_1.fq.gtz -o ~/result_directory
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up fastqc-gtz: when fastqc-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
- How to Install?
Without sudo, run the following command (recommended):
curl -SL https://gtz.io/fastpgtz_latest.run -o /tmp/fastpgtz.run && sh /tmp/fastpgtz.run
After the first installation, run source ~/.bashrc or log out and back in; you can then run fastp-gtz from any directory.
Or download the installation file (GTX.Zip fastp-gtz), then install:
sh fastpgtz_latest.run
As above, after the first installation you need to run source ~/.bashrc or log out and back in.
With sudo, run the following command (recommended):
sudo curl -SL https://gtz.io/fastpgtz_latest.run -o /tmp/fastpgtz.run && sudo sh /tmp/fastpgtz.run
Or download the installation file (GTX.Zip fastp-gtz), then install:
sudo sh fastpgtz_latest.run
After the installation is complete, you can run fastp-gtz from any directory.
- How to Use?
GTX.Zip's support package for fastp is based on fastp 0.19.5. Both input and output may be GTZ or non-GTZ files; when the output file name ends with .gtz, fastp-gtz compresses the output in GTZ format.
examples:
Output GTZ Format:
fastp-gtz -i in.fq -o out.fq.gtz --bin_file in.fq.species.bin
Output non-GTZ format:
fastp-gtz -i in.fq -o out.fq
For how to use --bin_file, refer to the sections below.
examples:
export GTZ_RBIN_PATH=/path/rbin/
fastp-gtz -i in.R1.fq.gtz -I in.R2.fq.gtz -o out.R1.fq.gtz -O out.R2.fq.gtz --bin_file in.fq.species.bin
Command description:
- export GTZ_RBIN_PATH=/path/rbin/ Setting this environment variable is recommended but not required. It specifies the search path for the rbin file when the input is a high-compression GTZ file; see the working principle below for details.
- --bin_file Specifying this parameter is recommended but not required. It names the bin file of the species that the paired-end input files belong to; when it is specified, fastp-gtz compresses the output files at a high compression rate. See the working principle below for details.
When the input is a GTZ file, fastp-gtz works in four steps:
(A) read in.gtz -> (B) decompress to in.fq -> (C) process the data -> (D) compress the result to GTZ
Note
If in.gtz in step A is a high-compression file, step B requires the corresponding rbin file, and there are two ways to provide it:
Mode one: You have the rbin file locally and specify its path with the environment variable export GTZ_RBIN_PATH=/path/rbin. The program then completes step B using the local rbin file.
Mode two: You do not have the rbin file locally, or you do not specify it through the environment variable. In this case the program automatically downloads the rbin file from the network, which takes some time.
fastp-gtz Analysis Data
Mode one: The bin file is not specified through --bin_file. fastp-gtz automatically recognizes, based on the bin and rec files under ~/.config/gtz/, which species in.R1.fq.gtz and in.R2.fq.gtz each belong to, and then uses the bin file of that species for compression. The automatic identification takes some time. If there are no bin and rec files under ~/.config/gtz/, or no species is identified, normal compression is used.
Mode two: The bin file is specified through --bin_file. fastp-gtz uses that bin file directly for high compression.
-
-
performance
fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz
export GTZ_RBIN_PATH=/path/rbin/
fastp-gtz -i in.R1.fq.gtz -I in.R2.fq.gtz -o out.R1.fq.gtz -O out.R2.fq.gtz --bin_file in.fq.species.bin
Server configuration: 16 core CPU, 64G memory; file size: in.R1.fq.gz(1.55G), in.R2.fq.gz(1.78G), in.R1.fq.gtz(0.43G), in.R2.fq.gtz(0.61G)
The total size of the fastp output files out.R1.fq.gz and out.R2.fq.gz is 3.3G, and the total size of the fastp-gtz output files out.R1.fq.gtz and out.R2.fq.gtz is 1G.
- How to Install?
Without sudo, run the following command (recommended):
curl -SL https://gtz.io/minimap2_gtz_latest.run -o /tmp/minimap2_gtz_latest.run && sh /tmp/minimap2_gtz_latest.run
After the first installation, run source ~/.bashrc or log out and back in; you can then run minimap2-gtz from any directory.
Or download the installation file (GTX.Zip minimap2-gtz), then install:
sh minimap2_gtz_latest.run
As above, after the first installation you need to run source ~/.bashrc or log out and back in.
With sudo, run the following command (recommended):
sudo curl -SL https://gtz.io/minimap2_gtz_latest.run -o /tmp/minimap2_gtz_latest.run && sudo sh /tmp/minimap2_gtz_latest.run
Or download the installation file (GTX.Zip minimap2-gtz), then install:
sudo sh minimap2_gtz_latest.run
After the installation is complete, you can run minimap2-gtz from any directory.
- How to Use?
After installation, minimap2-gtz will be generated.
export GTZ_RBIN_PATH=/path/rbin/
minimap2-gtz -ax asm20 ref.fa pacbio-ccs.fq.gtz > aln.sam
* In this example, the environment variable GTZ_RBIN_PATH specifies the path of the rbin file. Setting "export GTZ_RBIN_PATH=/path/rbin/" is optional, but if you know where the rbin file is, specifying it speeds up minimap2-gtz: when minimap2-gtz needs an rbin file and cannot find it under the default path ~/.config/gtz, it downloads the file over the network, which takes time.
minimap2 -t 16 -a Arab.mmi Arab_E822-R02-I_good_1.fq.gz > Arab_p.sam
export GTZ_RBIN_PATH=/path/rbin/
minimap2-gtz -t 16 -a Arab.mmi Arab_E822-R02-I_good_1.fq.gz.gtz > Arab_gtz_p.sam
Server configuration: 16 core CPU, 64G memory;
Software | minimap2 | minimap2-gtz |
---|---|---|
Time consumption | 2m57s | 3m57.151s |
Current latest version: gtz-1.2.3 [2019/01/24]
Historical versions: Change Log
The Frequently Asked Questions are intended to help newcomers understand how we work! Click here!
If you have any questions, feel free to contact [email protected], or create a new GitHub issue.
See LICENSE for details.