
About ART

ART is a set of efficient simulators that generate synthetic sequence reads for next-generation sequencing platforms, including Illumina, 454 and SOLiD. It is useful for assessing the results and performance of computational methods and software tools for ChIP-seq data analysis, polymorphism discovery and genome assembly. ART was used as a primary tool for the simulation study of the 1000 Genomes Project.

Features

Availability

ART is freely available to the public.


Usage

=============================================================================
ART FOR 454 READ SIMULATION 
=============================================================================
USAGE:
	ART_454 INPUT_SEQ_FILE OUTPUT_FILE_PREFIX FOLD_COVERAGE
	ART_454 INPUT_SEQ_FILE OUTPUT_FILE_PREFIX FOLD_COVERAGE MEAN_FLAG_LEN STD_DEV

EXAMPLES:
	ART_454 ref_genome.fa ./outdir/out_file 20
	ART_454 ref_genome.fa ./outdir/out_file 20 1000 20


=============================================================================
ART FOR ILLUMINA READ SIMULATION 
=============================================================================
X version: sequence read quality is based on an Illumina read error profile
USAGE:
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE MEAN_FLAG_LEN STD_DEV
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE ERROR_SCALE_FACTOR
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE MEAN_FLAG_LEN STD_DEV ERROR_SCALE_FACTOR

EXAMPLES:
	art ref_genome.fa out_seq_read 32 10
	art ref_genome.fa out_seq_read 32 10 200 20
	art ref_genome.fa out_seq_read 32 10 1.2
	art ref_genome.fa out_seq_read 32 10 200 20 1.2

Q version: sequence read quality is based on an empirical distribution of quality scores
USAGE:
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE [ID]
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE MEAN_FLAG_LEN STD_DEV [ID]
EXAMPLES:
	art reference_DNA.fa outfile_prefix 32 10
	art reference_DNA.fa outfile_prefix 32 10 300 30


=============================================================================
ART FOR SOLiD READ SIMULATION 
=============================================================================
USAGE:
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE 
	ART INPUT_SEQ_FILE OUTPUT_FILE_PREFIX LEN_READ FOLD_COVERAGE MEAN_FLAG_LEN STD_DEV
	
EXAMPLES:
	art ref_genome.fa out_seq_read 32 10
	art ref_genome.fa out_seq_read 32 10 200 20
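
The usage forms above are all positional, so they are easy to assemble from a script. The following is a minimal Python sketch, not part of ART: the function names, the keyword names and the executable name passed in are assumptions for illustration, and the optional ERROR_SCALE_FACTOR and [ID] arguments are left out.

```python
import subprocess
from typing import List, Optional


def build_art_command(
    executable: str,
    input_seq_file: str,
    output_prefix: str,
    fold_coverage: int,
    len_read: Optional[int] = None,
    mean_frag_len: Optional[int] = None,
    std_dev: Optional[int] = None,
) -> List[str]:
    """Assemble the positional arguments in the order shown in the USAGE lines."""
    # The paired-end forms take the mean fragment length and STD_DEV together.
    if (mean_frag_len is None) != (std_dev is None):
        raise ValueError("give both the mean fragment length and STD_DEV, or neither")

    cmd = [executable, input_seq_file, output_prefix]
    # LEN_READ appears in the Illumina and SOLiD usage lines but not in ART_454's.
    if len_read is not None:
        cmd.append(str(len_read))
    cmd.append(str(fold_coverage))
    if mean_frag_len is not None:
        cmd += [str(mean_frag_len), str(std_dev)]
    return cmd


def run_art(cmd: List[str]) -> None:
    """Run the assembled command; assumes the executable is installed and on PATH."""
    subprocess.run(cmd, check=True)
```

For example, build_art_command("art", "ref_genome.fa", "out_seq_read", 10, len_read=32, mean_frag_len=200, std_dev=20) yields the argument list for the paired-end example above.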
	

Input parameters: 

    1. INPUT_SEQ_FILE: reference DNA sequence in FASTA format
    2. OUTPUT_FILE_PREFIX: common prefix of the output file names
    3. LEN_READ: read length (Illumina and SOLiD only; the ART_454 usage takes no read length)
    4. FOLD_COVERAGE: fold coverage of the reads over the input sequence
    5. MEAN_FLAG_LEN: mean fragment size for paired-end reads
    6. STD_DEV: standard deviation of the fragment size for paired-end reads
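
To get a feel for how LEN_READ and FOLD_COVERAGE translate into data volume, the usual definition of fold coverage (total read bases divided by reference length) can be turned into a rough read-count estimate. This is a back-of-the-envelope sketch under that definition only; how ART itself rounds the count or treats paired-end reads is not described here, and the function name is hypothetical.

```python
import math


def estimate_read_count(genome_length: int, read_length: int, fold_coverage: float) -> int:
    """Estimate how many reads give the requested fold coverage.

    Uses coverage = (number of reads * read length) / genome length,
    rounded up so the target coverage is reached.
    """
    if genome_length <= 0 or read_length <= 0 or fold_coverage < 0:
        raise ValueError("lengths must be positive and coverage non-negative")
    total_bases = fold_coverage * genome_length
    return math.ceil(total_bases / read_length)


# A 1,000,000 bp reference at 10-fold coverage with 32-base reads.
print(estimate_read_count(1_000_000, 32, 10))  # 312500
```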

Output files:

    1. *.fa files: sequence reads in FASTA format
    2. *.qual files: quality scores in FASTA format
    3. *.aln files: alignments of the reads to the input genome
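
The read and quality files can be paired up with a few lines of Python. The sketch below assumes the common FASTA-style quality layout, in which each record has a ">" header followed by whitespace-separated integer scores, and that the .fa and .qual files share read identifiers; neither detail is confirmed on this page, and the function names are hypothetical. The *.aln layout is not described here, so it is not parsed.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta_like(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, body) for each '>'-headed record; body lines are joined by spaces."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, " ".join(chunks)


def pair_reads_with_quals(
    fa_lines: Iterable[str], qual_lines: Iterable[str]
) -> Dict[str, Tuple[str, List[int]]]:
    """Map each read ID to (sequence, quality scores), checking that lengths agree."""
    seqs = {name: body.replace(" ", "") for name, body in read_fasta_like(fa_lines)}
    paired: Dict[str, Tuple[str, List[int]]] = {}
    for name, body in read_fasta_like(qual_lines):
        if name not in seqs:
            raise KeyError(f"no sequence found for quality record {name!r}")
        quals = [int(q) for q in body.split()]
        if len(quals) != len(seqs[name]):
            raise ValueError(f"length mismatch for read {name!r}")
        paired[name] = (seqs[name], quals)
    return paired
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example pair_reads_with_quals(open("out_seq_read.fa"), open("out_seq_read.qual")), where the file names are placeholders built from the output prefix.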

Datasets

Simulation datasets for the 1000 Genomes Project

Authors

Weichun Huang in collaboration with Gabor Marth



Copyright © 2008-2009 Weichun Huang