Config

Configures output directories, input and output file paths for each step of the ProDuSe pipeline, and configuration files for each script.

produse_config.py performs the following tasks:

  • Creates a main output directory (produse_analysis_direcetory) in the specified directory
  • Creates a log file listing the supplied arguments in the main output directory
  • Creates a Makefile in the main output directory (Depreciated, has no purpose)
  • (If necessary) Creates normal and bwa indexes for the designated reference genome

For each sample supplied:

  • Creates a sample-specific directory and subdirectories in the main output directory using the sample’s name
  • Symbolically Links input files into this sample-specific directory
  • Creates a config file for each step in the ProDuSe pipeline, containing sample-specific parameters

Run Using

produse configure_produse

or

python /path/to/ProDuSe/ProDuSe/configure_produse.py

Parameters

-f –fastqs:
If running a single sample, the fastq files for the sample to be analyzed.
Two arguments must be supplied.
A single sample directory names ‘Sample’ will be created.
Mutually exclusive with -sc –sample_config.
-sc –sample_config:
 
A configuration file listing sample names, fastq locations, and sample-specific parameters.
A directory will be created for each sample, using the name supplied.
Mutually exclusive with -f –fastqs.
-d –output_directory:
 The directory to output ProDuSe results. Default is the current working directory.
-r –reference:Reference genome build, in fasta format.
-c –config:A configuration file listing parameters to be supplied to each stage of the ProDuSe pipeline.

Additional Considerations

While indexes will be automatically generated if none are present in the reference genome directory, this make take a significant amount of time.