You may specify options to the GeneMark program on the command line, or set default
options using the DEFMAT and GMARGS environment variables. Every option starts with the
'-' symbol followed by a letter that identifies the option. Most options must be followed
by an argument. For example, the -m option selects a sequence matrix file:
'-m ecoli_4.mat'.
Analysis Options:
-a number
    A priori probability of coding. This is the probability that a randomly selected
    sequence fragment is coding. The default value is 0.5 and should work well in most
    circumstances. Values between 0.01 and 0.99 are permitted.
-c filename
    Alternative codon translation table, for organisms whose genetic code differs from
    the standard one. 'filename' specifies a file in which each line lists a codon, its
    single-letter IUPAC amino acid translation ('*' for stop codons) and, optionally,
    the word 'start' or 'rare_start' to mark start codons.
-m filename
    Matrix file. 'filename' specifies the GeneMark model to use when calculating the
    coding potential function. The program searches a default location for this file
    and also searches the path specified by the MATPATH environment variable.
-s number
    Step size (in nucleotides) of the sliding evaluation window along the sequence. The
    default value of 12 is adequate; large values may produce strange results. Changing
    this value changes the number of data points evaluated by the program but does not
    affect the accuracy of prediction.
-t number
    Threshold value. Regions and ORFs with a mean coding potential higher than this
    value are identified as coding signals. The default value of 0.5 should be adequate.
    Values between 0.01 and 0.99 are permitted.
-w number
    Window size (in nucleotides). This sets the size of the scanning window. Larger
    values may cause the program to underpredict small coding regions. Smaller values
    result in diminished prediction accuracy. The default value of 96 should be adequate
    in most circumstances.
-v
    Verbose. Print out a confirmation at each stage of the program's execution.
-D
    Data. When this option is specified, GeneMark also writes machine-readable versions
    of the .lst and .ps files generated by the -l and -g options (described below).
    These files end with the suffixes .ldata and .gdata, respectively.
-R filename
    Ribosome binding site pattern file. If provided, the program will score ribosome
    binding sites near putative gene starts according to parameters provided in an
    RBS pattern file.
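The a priori probability set with -a acts as a prior in a Bayesian decision. As an illustration only, this sketch applies Bayes' rule to two hypotheses, coding and non-coding, given the likelihood of a fragment under each. GeneMark reports coding potential separately for each reading frame, so its actual computation is more detailed than this; the function name `posterior_coding` and its likelihood inputs are hypothetical.

```python
def posterior_coding(prior: float, lik_coding: float, lik_noncoding: float) -> float:
    """Posterior probability that a fragment is coding (simplified two-hypothesis Bayes rule).

    prior         -- a priori probability of coding (the value passed with -a)
    lik_coding    -- likelihood of the fragment under a coding model (hypothetical input)
    lik_noncoding -- likelihood of the fragment under a non-coding model (hypothetical input)
    """
    if not 0.01 <= prior <= 0.99:
        raise ValueError("prior must lie between 0.01 and 0.99")
    numerator = prior * lik_coding
    denominator = numerator + (1.0 - prior) * lik_noncoding
    return numerator / denominator


# Equal likelihoods leave the prior unchanged.
print(posterior_coding(0.5, 1.0, 1.0))  # 0.5
# A twofold likelihood ratio in favour of coding raises a 0.5 prior to 2/3.
print(posterior_coding(0.5, 2.0, 1.0))
```

Lowering the prior makes the same evidence less likely to be called coding, which is why the default of 0.5 treats both hypotheses evenly.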
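A file in the format described for -c might be read as follows. The sketch assumes whitespace-separated fields and skips blank lines; how GeneMark itself handles letter case, comments, or malformed lines is not documented here, and both `parse_codon_file` and the sample lines are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_codon_file(lines: Iterable[str]) -> Tuple[Dict[str, str], Set[str], Set[str]]:
    """Parse lines of the form 'CODON AA [start|rare_start]'.

    Returns (translation table, start codons, rare start codons).
    Assumes whitespace-separated fields; '*' marks a stop codon.
    """
    table: Dict[str, str] = {}
    starts: Set[str] = set()
    rare_starts: Set[str] = set()
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"unexpected line: {raw!r}")
        codon, amino_acid = fields[0].upper(), fields[1]
        if len(codon) != 3 or len(amino_acid) != 1:
            raise ValueError(f"bad codon or amino acid: {raw!r}")
        table[codon] = amino_acid
        if len(fields) == 3:
            if fields[2] == "start":
                starts.add(codon)
            elif fields[2] == "rare_start":
                rare_starts.add(codon)
            else:
                raise ValueError(f"unknown flag: {raw!r}")
    return table, starts, rare_starts


# Hypothetical file contents: TGA is read as Trp (W) instead of a stop.
example = ["ATG M start", "GTG V rare_start", "TGA W", "TAA *"]
codons, starts, rare = parse_codon_file(example)
print(codons["TGA"], sorted(starts), sorted(rare))  # W ['ATG'] ['GTG']
```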
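When GeneMark is run from a script, the ranges documented for the analysis options can be checked before the command is launched. This sketch builds an argument list from a few of those options; it assumes the program is invoked as `gm`, as in the examples on this page, and `build_gm_args` is a hypothetical helper. The requirement that -s and -w be positive is our own sanity check, not a documented rule.

```python
from typing import List, Optional


def build_gm_args(
    sequence_file: str,
    matrix: str,
    prior: Optional[float] = None,      # -a
    threshold: Optional[float] = None,  # -t
    step: Optional[int] = None,         # -s
    window: Optional[int] = None,       # -w
    verbose: bool = False,              # -v
) -> List[str]:
    """Build an argument list for gm, checking the ranges documented for -a and -t."""
    args = ["gm", "-m", matrix]
    for flag, value in (("-a", prior), ("-t", threshold)):
        if value is not None:
            if not 0.01 <= value <= 0.99:
                raise ValueError(f"{flag} must lie between 0.01 and 0.99, got {value}")
            args += [flag, str(value)]
    for flag, size in (("-s", step), ("-w", window)):
        if size is not None:
            if size <= 0:
                raise ValueError(f"{flag} must be a positive number of nucleotides")
            args += [flag, str(size)]
    if verbose:
        args.append("-v")
    args.append(sequence_file)  # options first, sequence file last
    return args


print(build_gm_args("cyaY", "ecoli_4.mat", threshold=0.6))
# ['gm', '-m', 'ecoli_4.mat', '-t', '0.6', 'cyaY']
```

The returned list can be passed unchanged to Python's `subprocess.run` when `gm` is on the PATH; keeping the arguments as a list avoids shell quoting problems.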
Graphic Options:
The GeneMark program can generate a PostScript file containing a graphical depiction
of the coding potential function in all six reading frames. This graphic can be very
useful for visualizing the data. The PostScript graphic is placed in a file with the
suffix '.ps'. Graphic options are set by specifying '-g' followed by any combination
of the letters in the table below. For example:
    gm -gnos -m ecoli_4.mat cyaY
creates a graph containing start codon, stop codon, and open reading frame indicators
(in addition to the coding potential function) and places it in the file cyaY.ps.
0
    (zero) Cancel all previous graphic options. If no options are selected, no
    graphical output is made.
f
    Frameshift. Indicate possible frame-shift errors with a vertical arrow.
k
    Use an alternative scale-labeling scheme that labels the scale in round units.
l
    Landscape. Print the graph in landscape orientation rather than portrait (the
    default).
n
    Ends. Indicate stop codon positions with a descending tick.
o
    ORF. Indicate open reading frames with a horizontal line.
r
    Region. Indicate regions between stop codons where significant coding potential
    is detected. These regions are shown with a grey bar.
s
    Start. Indicate start and rare start codons with upward and small upward ticks,
    respectively.
x
    Exon. Indicate possible exon boundaries based solely on coding potential
    information. Boundaries are indicated with angle brackets.
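The letters after -g (and likewise after -l, -o and -r in the sections below) behave as a set of flags in which '0' clears everything selected so far. The sketch below models that behaviour, assuming that repeated occurrences of the same option accumulate in command-line order; the letter sets are taken from the tables on this page, and `collect_letters` is a hypothetical name.

```python
from typing import Iterable, Set

# Valid letters per option group, taken from the tables in this document.
VALID_LETTERS = {
    "-g": set("0fklnorsx"),  # graphic
    "-l": set("0orqx"),      # listing
    "-o": set("0npq"),       # ORF-related
    "-r": set("0npq"),       # ROI-related
}


def collect_letters(option: str, args: Iterable[str]) -> Set[str]:
    """Return the letters in effect for `option` after scanning `args` left to right.

    '0' cancels all letters selected earlier for the same option.
    An empty result means no output of that kind is produced.
    """
    valid = VALID_LETTERS[option]
    selected: Set[str] = set()
    for arg in args:
        if not arg.startswith(option) or arg == option:
            continue  # not this option, or no letters attached
        for letter in arg[len(option):]:
            if letter not in valid:
                raise ValueError(f"unknown letter {letter!r} for {option}")
            if letter == "0":
                selected.clear()
            else:
                selected.add(letter)
    return selected


print(sorted(collect_letters("-g", ["-gnos", "-m", "ecoli_4.mat", "cyaY"])))  # ['n', 'o', 's']
print(sorted(collect_letters("-g", ["-gnos", "-g0r", "cyaY"])))  # ['r']
```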
You may also specify a zoom level using the '-z' option. The number of data points
graphed per page is divided by this number. So, to view the graphical output at 0.5x
zoom (twice as many data points per page):
    gm -gnos -z 0.5 -m ecoli_5.mat cyaY
GeneMark PostScript output can be sent to a printer or viewed interactively on your
computer.
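The -w and -s values from the analysis options determine how many data points there are to graph, and -z scales how many of them fit on a page. This sketch assumes that windows start at the first nucleotide and must fit entirely inside the sequence, and it uses a hypothetical baseline `base_per_page`, because neither GeneMark's edge handling nor its default points-per-page figure is given on this page.

```python
import math
from typing import List


def window_starts(seq_len: int, window: int = 96, step: int = 12) -> List[int]:
    """0-based start positions of every full window, advanced by `step` nt (assumed scheme)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if seq_len < window:
        return []
    return list(range(0, seq_len - window + 1, step))


def pages_needed(n_points: int, base_per_page: int, zoom: float = 1.0) -> int:
    """Pages needed when base_per_page / zoom data points are graphed per page."""
    if zoom <= 0 or base_per_page <= 0:
        raise ValueError("zoom and base_per_page must be positive")
    per_page = base_per_page / zoom  # -z divides the points per page
    return math.ceil(n_points / per_page)


points = len(window_starts(1000))  # a 1,000-nt sequence with -w 96 -s 12
print(points)  # 76
print(pages_needed(points, 50))  # 2 (hypothetical baseline of 50 points per page)
print(pages_needed(points, 50, zoom=0.5))  # 1 (100 points per page)
```

Doubling the step size roughly halves the number of data points, while the size of each window, and hence what each point measures, stays the same.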
Listing Options:
The GeneMark program can generate a text file containing summary results of the
program's analysis. Summary information is placed in a file with the suffix '.lst'.
Listing options are selected using the option '-l' followed by any combination of the
letters from the table below. For example:
    gm -lo -m ecoli_5.mat cyaY
would generate a list, cyaY.lst, of open reading frames with a mean coding potential
greater than the threshold value (set with -t).
0
    (zero) Cancel all previous listing options. If no options are selected, no summary
    output is made.
o
    ORF. List open reading frames. If an RBS pattern file is specified (with -R), an
    evaluation of RBS sites near putative gene starts is also provided.
r
    Region. List regions between stop codons where there is significant coding
    potential.
q
    Quiet. Suppress comments and header information (this makes the output easier to
    use with scripts and other programs).
x
    Exon. List regions between putative acceptor/donor sites with significant coding
    potential.
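The -lo example keeps ORFs whose mean coding potential exceeds the -t threshold. The comparison itself can be sketched as follows; the per-point coding potentials and the candidate intervals (half-open index ranges into the list of potentials) are assumed inputs, and `regions_above_threshold` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def regions_above_threshold(
    potentials: Sequence[float],
    regions: Sequence[Tuple[int, int]],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Keep regions whose mean coding potential is strictly higher than the threshold.

    potentials -- coding potential at each evaluated data point
    regions    -- half-open (start, end) index ranges into `potentials`
    """
    if not 0.01 <= threshold <= 0.99:
        raise ValueError("threshold must lie between 0.01 and 0.99")
    kept = []
    for start, end in regions:
        window = potentials[start:end]
        if window and sum(window) / len(window) > threshold:
            kept.append((start, end))
    return kept


potentials = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]
print(regions_above_threshold(potentials, [(0, 2), (2, 5), (1, 6)]))  # [(2, 5), (1, 6)]
```

Because the test uses the mean, a long candidate with a strong core, such as (1, 6) above, can pass even though some of its points fall below the threshold.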
ORF-related Options:
The GeneMark program can automatically write out, in FASTA format, open reading frames
with high coding potential, either as nucleotide sequences or as amino acid
translations. The results are placed in a file with the suffix '.orf'. The ORF-related
options are specified with the option '-o' followed by any combination of letters from
the table below. For example:
    gm -op -m ecoli_4.mat cyaY
creates protein translations of high-scoring ORFs and places them in cyaY.orf.
0
    (zero) Cancel all previous ORF-related options. If no options are selected, no
    ORF-related output is made.
n
    Nucleotides. Write out the nucleotide sequences of the high-scoring ORFs.
p
    Protein. Write out the amino acid translations of the high-scoring ORFs.
q
    Quiet. Suppress comments and header information (this makes the output easier to
    use with scripts and other programs).
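To show what the 'p' output contains, this sketch translates a nucleotide ORF with the standard genetic code and formats the result as a FASTA record. It is not GeneMark's code: the record header `cyaY_orf1` and the 60-residue line width are hypothetical choices, and a table supplied with -c would replace the standard one.

```python
from typing import Dict

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str, code: Dict[str, str] = STANDARD_CODE) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    seq = nucleotides.upper()
    return "".join(code.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


print(fasta_record("cyaY_orf1", translate("ATGGCCTAA")), end="")
# >cyaY_orf1
# MA*
```

For the 'n' option the same `fasta_record` call would simply receive the nucleotide string instead of its translation.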
ROI-related Options:
The GeneMark program defines a unit called a "region of interest" (ROI): a region
between two stop codons in the same reading frame that has significant coding
potential. Such regions need not lie within an open reading frame, and they may
indicate coding regions where start and stop codons have been masked by errors in the
sequence or other circumstances.
GeneMark can automatically write out, in FASTA format, regions of interest with high
coding potential, either as nucleotide sequences or as amino acid translations. The
results are placed in a file with the suffix '.rgn'. The ROI-related options are
specified with the option '-r' followed by any combination of letters from the table
below. For example:
    gm -rp -m ecoli_4.mat cyaY
creates protein translations of high-scoring regions of interest and places them in
cyaY.rgn.
0
    (zero) Cancel all previous ROI-related options. If no options are selected, no
    ROI-related output is made.
n
    Nucleotides. Write out the nucleotide sequences of the high-scoring ROIs.
p
    Protein. Write out the amino acid translations of the high-scoring ROIs.
q
    Quiet. Suppress comments and header information (this makes the output easier to
    use with scripts and other programs).
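A region of interest, as defined above, is built on the stretch between two in-frame stop codons. The sketch below lists those stretches on the forward strand for one reading frame; scoring them for coding potential is left out, only the three standard stop codons are assumed, and `stop_to_stop_regions` is a hypothetical name.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code; an alternative table may differ


def stop_to_stop_regions(seq: str, frame: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals between consecutive in-frame stops.

    `frame` is 0, 1 or 2 on the forward strand. Each interval begins just after
    one stop codon and ends just before the next; empty intervals are skipped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    stops = [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]
    return [(a + 3, b) for a, b in zip(stops, stops[1:]) if b > a + 3]


print(stop_to_stop_regions("TAAATGGCCTGA", 0))  # [(3, 9)]
```

Unlike an ORF search, nothing here requires a start codon, which is what lets such regions flag coding sequence whose start has been lost to a sequencing error.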