Chapter 5: Setup
Setting up parameters for GPMAW.
The Setup|Setup system dialog box contains most of the default data needed by GPMAW. For setting up digest mass databases (Setup|Make digest databases), please see Chapter 8, ‘Database mass search’. All these data are saved in the GPMAW.INI file (Appendix A).
Please read this chapter carefully, as most of the layout and day-to-day operation of GPMAW depends on these settings.

These options will check/un-check the corresponding options in the ‘Print sequence’ dialog box (Chapter 3.1).
Omit modification info: Information about sequence modifications and cross-links will not be printed.
Extended: Print elemental composition and amino acid composition data. List cross-linked residues.
2x line spacing: Print sequence with double line spacing.
Defines the default display of masses as either average or monoisotopic (see Appendix C.1). This default can easily be changed for each sequence window individually (Chapter 3.1) by clicking the Av./Mo. button. Most other windows also allow the mass type to be changed on the fly. Please note that the mass type of the peptide window is set independently of the sequence window.
% or ppm: Determines whether the precision is reported as a percentage or in ppm (parts per million). 0.01% equals 100 ppm. When working at high precision (e.g. better than 0.2%), you should use ‘ppm’ due to the better accuracy.
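The two units differ only by a constant factor of 10,000, as the 0.01% = 100 ppm example shows. The minimal Python sketch below converts between them and computes the ppm deviation of a measured mass. The function names are illustrative only and are not part of GPMAW.

```python
def percent_to_ppm(percent: float) -> float:
    """Convert a relative precision given in percent to parts per million."""
    return percent * 10_000.0


def ppm_to_percent(ppm: float) -> float:
    """Convert a relative precision given in ppm to percent."""
    return ppm / 10_000.0


def deviation_ppm(measured: float, calculated: float) -> float:
    """Relative deviation of a measured mass from the calculated mass, in ppm."""
    return (measured - calculated) / calculated * 1e6


# 0.01 % corresponds to 100 ppm; 0.2 % corresponds to 2000 ppm.
print(percent_to_ppm(0.01), percent_to_ppm(0.2))
```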
The frames are resizable parts of a window that show information in addition to what is present in the main window. Frames are available for the following windows:
1. Sequence window (Chapter 2). Contains information on modified residues, modified terminals, cross-links and pI.
2. MS/MS window (Chapter 10.1). Displays a sorted list of all masses displayed in the main window. The two displays are linked, so that clicking on a value in the frame will highlight the corresponding value in the main window.
The advanced page allows you to choose different tables for the calculation of the pI of peptides and proteins. The possibilities are:
1. B. Skoog & A. Wichmann, Trends in Anal. Chem. 3, 82-83 (1986)
2. Free amino acids
3. Rickard, Strohl & Nielsen, Anal. Biochem. 197, 197-207 (1991)
In all cases the algorithm of Skoog and Wichmann is used.
The sequence information dialog (Chapter 3.8) displays all three pI values.
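The text does not spell out the Skoog and Wichmann procedure. A common way to compute a pI, which the sketch below assumes, is to sum the Henderson-Hasselbalch charge contributions of the termini and ionizable side chains, then search for the pH at which the net charge is zero (here by bisection, since the net charge decreases steadily with pH). The pKa values are approximate textbook placeholders and not any of the three tables offered by GPMAW. The same `net_charge` function illustrates what a "charge at the chosen pH" means.

```python
# Approximate pKa values for illustration only (NOT GPMAW's tables).
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = [PKA_N_TERM] + [PKA_POSITIVE[aa] for aa in sequence if aa in PKA_POSITIVE]
    negative = [PKA_C_TERM] + [PKA_NEGATIVE[aa] for aa in sequence if aa in PKA_NEGATIVE]
    # Fraction of each basic group that is protonated (+1).
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive)
    # Fraction of each acidic group that is deprotonated (-1).
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative)
    return charge


def isoelectric_point(sequence: str, low: float = 0.0, high: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Find the pH where the net charge is zero by bisection."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # zero or negative: the pI lies at a lower pH
    return (low + high) / 2


print(round(isoelectric_point("EECSVPVCGQDR"), 2))
```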
Search tolerance: Default tolerance for mass searches, digest mass searches, etc. The value can be changed before each search.
Show multi-charged: Displays multiply charged species by default when showing the results of a mass search (Chapter 6.1).
Determines whether the initial display of the mass search results is in the long or the short tabular form (Chapter 6.1). This parameter is also updated dynamically when you change the form in the mass search result table.

The peptide parameters determine the initial display/print/copy parameters for the peptide window (the result of protein cleavage, see Chapter 9.4). Most parameters, except the copy parameters, can be changed on the fly.
Determines whether peptide masses are reported with 2 or 4 decimals. The internal precision of the mass calculations as carried out by the program is not changed. The calculated precision is dependent on the values entered in the mass files (Chapter 4.2, default is 5 decimals for average masses and 6 decimals for monoisotopic masses).
Shows masses as either average or monoisotopic masses on the screen.
Note: The peptide window can have a different default mass setting than the other windows. For example, the sequence window will usually use the ‘average mass’ setting, while the peptide window will be in monoisotopic mass mode due to the higher resolution obtainable for peptides.
Displays amino acid residues in the peptide list using either 1- or 3-letter code. Can be changed dynamically in each peptide window.
Sorts the peptide list by number (position in the protein), mass, HPLC index, Bull & Breese index [H.B. Bull & K. Breese, Arch. Biochem. Biophys., 161, 665 (1974)], charge (with a secondary sort by number) or sequence (alphabetical sorting based on 1-letter code).
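The charge sort is the only option with a secondary key. In Python, such a two-level sort can be expressed as a tuple key. This is a minimal sketch. The `Peptide` record and `sort_peptides` helper are hypothetical, ascending order is an assumption, and the HPLC and Bull & Breese sorts are omitted because they need index tables not given here.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    number: int      # position of the peptide in the protein
    sequence: str    # 1-letter code
    mass: float
    charge: float    # net charge at the chosen pH


def sort_peptides(peptides, key: str):
    """Sort a peptide list by one of the criteria described above."""
    if key == "number":
        return sorted(peptides, key=lambda p: p.number)
    if key == "mass":
        return sorted(peptides, key=lambda p: p.mass)
    if key == "charge":
        # Primary key: charge; secondary key: peptide number.
        return sorted(peptides, key=lambda p: (p.charge, p.number))
    if key == "sequence":
        # Alphabetical sorting based on the 1-letter code.
        return sorted(peptides, key=lambda p: p.sequence)
    raise ValueError(f"unknown sort key: {key}")
```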
When the peptide table is copied to the clipboard this option will determine whether delimiters are space characters (text) or tab characters. Use the text form when you copy to a report, and tab delimited when you copy to a spreadsheet.
When you copy the peptide table to clipboard this option selects whether the peptide amino acid sequence is copied in total (‘Full sequence:’ EECSVPVCGQDR) or the central part of the sequence is replaced by … (‘Limited sequence:’ EEC…QDR).
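The two copy options can be sketched as follows: a tab-delimited form for spreadsheets, a space-padded form for reports, and a limited sequence that keeps only the ends. The helper names are hypothetical. Keeping three residues at each end is an assumption based on the EEC…QDR example, not a documented GPMAW rule.

```python
def limit_sequence(seq: str, keep: int = 3) -> str:
    """Replace the central part of a sequence by an ellipsis, e.g. EEC…QDR."""
    if len(seq) <= 2 * keep:
        return seq  # too short to abbreviate
    return seq[:keep] + "…" + seq[-keep:]


def format_rows(rows, tab_delimited: bool) -> str:
    """Tab-separated text for spreadsheets, or space-padded columns for reports."""
    if tab_delimited:
        return "\n".join("\t".join(str(c) for c in row) for row in rows)
    # Pad each column to the width of its widest entry.
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " ".join(str(c).ljust(w) for c, w in zip(row, widths)).rstrip()
        for row in rows
    )


print(limit_sequence("EECSVPVCGQDR"))  # EEC…QDR
```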
Determines the pH at which the charge of the peptides in the peptide list is calculated. This setting is also used in various other places, such as the protein window frame and the peptide info.
The ‘Low mass cutoff’ determines the mass below which peptides are hidden in the peptide list. The main use for this option is in MALDI mass analysis, where low mass values are not observed in the mass spectrum. The cutoff can be set between 100 and 1000 Da using the slider.
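As a minimal sketch of the cutoff behaviour, assuming the slider value is simply clamped to the 100 to 1000 Da range and that a peptide exactly at the cutoff is still shown (both assumptions, not documented behaviour), the filtering amounts to:

```python
MIN_CUTOFF, MAX_CUTOFF = 100.0, 1000.0  # slider range given in the text (Da)


def visible_peptides(masses, cutoff: float):
    """Return the peptide masses that are not hidden by the low mass cutoff."""
    cutoff = min(max(cutoff, MIN_CUTOFF), MAX_CUTOFF)  # keep within slider range
    return [m for m in masses if m >= cutoff]
```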
As the information needed for different experiments varies, you can specify what should be reported for each peptide. For flexibility, you can define two different layouts: a primary layout (shown when the peptide window opens) and an alternate layout. You switch between the two layouts with a button in the peptide window (Chapter 9.4).
The layouts are shown as two white lines displaying the selected column headers. The exact column layout can be edited by pressing the button at the right end of each line.
The column layout is edited through a number of drop-down selection boxes. The leftmost column is shown at the top of the dialog box. Up to six columns can be selected (in addition to peptide number and sequence). If you do not want a particular column, set it to ‘(none)’. The peptide number and the peptide sequence are always placed as the first and last column, respectively.
The parameters you can choose to show are: Number (always in the first column), Sequence (always in the last column), MH+ (MH⁺), MH2+ (MH₂²⁺), MH3+ (MH₃³⁺), MH4+ (MH₄⁴⁺), MH- (MH⁻), MH2- (MH₂²⁻), M (neutral mass), From-to, HPLC index (reversed-phase retention index), pI (at the chosen pH, see above) and B&B (Bull & Breese index).
For a discussion of the individual parameters please see ‘Protein cleavage’, Chapter 9.1.
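The multiply charged columns follow from the neutral mass M: an ion carrying n extra protons appears at m/z = (M + n·m_p)/n, and an ion with n protons removed at (M − n·m_p)/n. The sketch below assumes the monoisotopic proton mass of about 1.007276 Da. GPMAW may use a slightly different ion mass internally, particularly for average masses.

```python
PROTON_MASS = 1.007276  # Da, monoisotopic mass of a proton (assumed value)


def ion_mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M+nH]n+ for charge > 0, [M-nH]n- for charge < 0, M for 0."""
    if charge == 0:
        return neutral_mass  # the 'M' column: neutral mass
    n = abs(charge)
    sign = 1 if charge > 0 else -1
    return (neutral_mass + sign * n * PROTON_MASS) / n


# Columns MH+, MH2+, MH3+, MH4+, MH- and MH2- for a peptide of mass 1000 Da.
for z in (1, 2, 3, 4, -1, -2):
    print(z, round(ion_mz(1000.0, z), 4))
```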
The system colors determine how sequences, graphs etc. are displayed on the screen. By selecting appropriate colors, you can make the reading of information faster and safer. As computer monitors vary in clarity and color fidelity, you are encouraged to experiment with various color combinations.

The left-hand list shows the colors currently defined in GPMAW. The list is divided into two columns: the left-hand column shows colored text on a white (actually light gray) background, and the right-hand column shows black text on a colored background. Each color is used in only one of these two modes, so choose the color by looking at the relevant column.
Highlight 1-3: These colors are used as background when highlighting sequence residues. You should use light but bright colors (e.g. check whether the black characters are easy to read in the right-hand column).
Highlight 4: This is the color of modified residues. It is not a background color, so you should choose the color based on the left-hand column.
Pre/PostAA: The color of the residues before and after the identified sequence in the mass search window (Chapter 6.1).
AA text: The sequence in the sequence window – usually black.
Graph1-4: The colors of the four curves that can be displayed in the various graphs. As the lines are often thin and difficult to distinguish from the background, you should use vivid, dark colors (check them in the left-hand column).
Sequence num.: The color of the subscript numbers in the sequence window.
Aux: This color is not used at present.
Dot1-4: The color of the dots in the ‘frame’ of the mass search window (Chapter 6.1).
Note: Several windows can print in color (e.g. the sequence window, the peptide window and several graphs). When printing to a monochrome printer (e.g. a laser printer), the different colors are printed as various shades of gray. By experimenting with different colors, you will most likely find colors that print as useful shades of gray. If you use both a color printer and a monochrome printer, you may have to change the color table when changing printers. This is most easily done by setting up different users (see Chapter 5.7).
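One way to pre-screen colors before test printing is to estimate the gray shade each RGB color becomes. The sketch below assumes the common ITU-R BT.601 luma weights. Real printer drivers may convert colors differently, and the minimum difference of 40 gray levels is an arbitrary illustrative threshold.

```python
def gray_level(rgb) -> int:
    """Approximate gray level (0-255) of an RGB color using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def distinguishable(color1, color2, min_difference: int = 40) -> bool:
    """True if two colors are likely to print as clearly different grays."""
    return abs(gray_level(color1) - gray_level(color2)) >= min_difference


# Pure red and pure green map to rather different gray levels.
print(gray_level((255, 0, 0)), gray_level((0, 255, 0)))
```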
A button on this page resets all colors to the GPMAW default colors.
You can edit a color either by selecting the relevant line and clicking the corresponding button, or by double-clicking the colored line.
In either case you will get the standard Windows ‘Color’ dialog box.
The current color will be selected (dotted line around the color). You can now select a new color from the ones displayed or press the ‘Define Custom Colors’ button to select a new color from the advanced dialog box layout.
Click OK to select the color.
During installation, the various components of GPMAW are installed into the following directory structure:

The main directory is set to C:\GPMAW, but can be changed by the user during installation. Although the program can be installed to any directory, it is recommended that you use the default c:\gpmaw\ as future updates will be much easier to perform.
Below this directory, four directories are created:
\BIN contains the main gpmaw3.exe executable program file, the gpmaw3.ini file (which stores the initialization data between sessions), the validation file gpmaw3.chk and the help file gpmaw3.hlp. Additional helper programs like DBIndex (Chapter 12.3) are also installed here.
\DATABASE is the default location for digest mass databases. Each digest mass database consists of three files: a data file (.DAT), a name file (.NAM) and an information file (.INF). See Chapter 8.2 for a description of how to create digest mass databases. If your hard disk is partitioned into several drives or you work across a network, you are likely to place the databases elsewhere.
\SYSTEM contains various common files that are shared between different users/sessions: modification files (.MOD), mass files (.MSS) and highlight profiles (.HPR). The system directory can be changed on the Setup|Setup system|Directories page, but unless you have compelling reasons you should leave it in the default state.
\USER contains the files that are individual for a session: GPMAW sequence databases (.SEQ), peptide mass files (.PEP), peak mass lists (.PKS) and peptide mass search result files (.PMS). You can create several different user directories and switch between them on the Setup|Setup system|Directories page. If you make different directories for your projects/users, you should set up GPMAW for different users (see Chapter 5.7).
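The default layout can be summarized as a mapping from file extension to subdirectory. This minimal sketch uses the extensions listed above and the default main directory C:\GPMAW. It only reflects the installation defaults, since the working, database and system directories can all be changed in the setup. The helper name `default_location` is hypothetical.

```python
from pathlib import PureWindowsPath

ROOT = PureWindowsPath(r"C:\GPMAW")  # default main directory

# File types and the subdirectory they are placed in by default.
SUBDIR_BY_EXTENSION = {
    ".exe": "BIN", ".ini": "BIN", ".chk": "BIN", ".hlp": "BIN",
    ".dat": "DATABASE", ".nam": "DATABASE", ".inf": "DATABASE",
    ".mod": "SYSTEM", ".mss": "SYSTEM", ".hpr": "SYSTEM",
    ".seq": "USER", ".pep": "USER", ".pks": "USER", ".pms": "USER",
}


def default_location(filename: str) -> PureWindowsPath:
    """Default path of a GPMAW file, based on its extension."""
    extension = PureWindowsPath(filename).suffix.lower()
    return ROOT / SUBDIR_BY_EXTENSION[extension] / filename


print(default_location("gpmaw3.ini"))  # C:\GPMAW\BIN\gpmaw3.ini
```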
On the 'Directories' page of the system setup you can specify a different working directory, different digest mass search directories, and a different system directory. As the database mass search databases can be huge (> 20 MB), it can be advantageous to locate them on a central server or another large shared disk in a network.

If you want to change a directory, either double-click the corresponding line in the list, or select the line and press the corresponding button. Then navigate to the correct directory.
Note: You can set up GPMAW for different users or projects by setting up ‘Users’ (see Chapter 5.7) or by using multiple icons on the desktop with different command-line parameters (see Appendix D).
The entries labeled 'EMBL sequence files' and 'EMBL index files' are the locations of the index and sequence files of the Swiss-Prot database as delivered on the EMBL CD-ROM (Appendix A). Due to the relatively slow speed of CD-ROMs, it can be advantageous to copy the index files to a hard disk in order to speed up searches (the index files comprise: freetext.pnx, freetext.hit, freetext.trg, brief.idx, entrynam.pnx, entrynam.idx, taxon.hit, taxon.pnx, taxon.trg). Copying the sequence files as well does not yield a significant speed increase (but can be convenient if the hard disk capacity is large enough). If the index and sequence files are copied to the same directory, the two directory entries will be identical.
The 'Atlas sequence files' and the 'Atlas index files' are for the PIR protein sequence database (PIR1.SEQ, PIR2.SEQ, PIR3.SEQ and PATCHX.SEQ), and configuration is similar to the Swiss-Prot database above (the index files comprise pir1.inx, pir2.inx, pir3.inx and patchx.inx).
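Copying the index files to a hard disk can be scripted. This minimal sketch copies the Swiss-Prot or PIR index files named above from a source directory (e.g. the CD-ROM) to a target directory, and reports any file it cannot find. The helper name and the directories you pass in are hypothetical. Afterwards, point the corresponding 'index files' entry to the target directory.

```python
import shutil
from pathlib import Path

SWISSPROT_INDEX_FILES = [
    "freetext.pnx", "freetext.hit", "freetext.trg", "brief.idx",
    "entrynam.pnx", "entrynam.idx", "taxon.hit", "taxon.pnx", "taxon.trg",
]
PIR_INDEX_FILES = ["pir1.inx", "pir2.inx", "pir3.inx", "patchx.inx"]


def copy_index_files(source_dir, target_dir, names):
    """Copy the named index files; return the names that were not found."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        src = source / name
        if src.exists():
            shutil.copy2(src, target / name)  # copy2 also keeps timestamps
        else:
            missing.append(name)
    return missing
```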
If you download the SWISS-PROT or the PIR database from the Internet, you cannot search it this way, as the index files are not published.
From the 'Digest src.' page you control the initial settings of the search parameters for the digest mass search (see Chapter 8).

The search limit parameters are more fully discussed in Chapter 8.
Mass range: The smallest and largest protein mass to search for mass hits. Usually, you know the approximate mass of the protein in question, but you should enter a wide margin in order to compensate for fragments, pre- and pro-proteins in the database. You should normally have a lower limit of 10 kDa (to exclude a large number of very small fragments) and an upper limit of 100 kDa (to exclude a small number of very large proteins that tend to give false positives).
Precision: The mass precision of the input search masses when searching the database. Can be in either % or ppm as defined on the System page.
Minimum prec.: If you are unable to determine the low masses with the relative precision given above, enter the minimum attainable precision here; otherwise enter 0.0.
Monomass: When using high-resolution mass spectrometers, you are able to obtain monoisotopic mass values at low masses (e.g. below m/z 3000). As these values are usually more precise than average data, it is advantageous to use them. To use the high mass (average) values as well, set the monoisotopic crossover mass to the value below which your masses are monoisotopic. As the digest database contains both sets of values, GPMAW can search both types simultaneously.
Overlaps: How many uncleaved potential cleavage sites (missed cleavages) should be allowed in the target peptides. For example, the tryptic peptide LIPKTGHNEDRKSVR contains three internal potential cleavage sites (after K4, R11 and K12) and will have an overlap value of 3.
Min. hits: The minimum number of peptide masses that have to fit in order to be entered into the final score list.
Mass type: Default ion type for the mass input table, determined by your mass spectrum.
Note: The search is fastest when no overlaps have been specified. This is partly because each overlap adds a search overhead and partly because a slightly different algorithm is used.
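Two of the search limits lend themselves to a short sketch: the search window derived from the precision, and the overlap (missed cleavage) count. The sketch assumes that the minimum precision acts as an absolute floor in Da on the half-width of the window, and it uses the standard trypsin rule (cleavage after K or R, except before P). Neither assumption is confirmed by this manual, and all function names are hypothetical.

```python
def search_window(mass: float, precision: float, unit: str = "ppm",
                  minimum: float = 0.0):
    """Return (low, high) limits of the search window around a search mass.

    precision: relative precision in '%' or 'ppm' (as set on the System page).
    minimum: assumed absolute floor for the half-width, in Da (0.0 disables it).
    """
    if unit not in ("ppm", "%"):
        raise ValueError("unit must be 'ppm' or '%'")
    factor = 1e-6 if unit == "ppm" else 1e-2
    half_width = max(mass * precision * factor, minimum)
    return mass - half_width, mass + half_width


def use_monoisotopic(mass: float, crossover: float) -> bool:
    """Masses below the monoisotopic crossover are compared as monoisotopic."""
    return mass < crossover


def missed_cleavages(peptide: str) -> int:
    """Count internal trypsin sites (after K or R, not followed by P)."""
    return sum(
        1 for i, aa in enumerate(peptide[:-1])
        if aa in "KR" and peptide[i + 1] != "P"
    )
```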
The scoring parameters determine how hits are evaluated. A hit is defined as a database value that falls within the search window defined by a search mass. You should feel free to experiment with various values, as most likely there is no universal magic setup for the search parameters.
Overlaps: The score for a given number of overlaps. Non-overlapping peptides are given the highest values. Peptides containing one or two overlaps are also common and should be given a high score. Overlaps of four and more are quite rare (at least they are rarely observed).
Score type: At present three different scoring types are supported. Linear: scores are not modified. Score/NumPep: the score is divided by the number of peptides present in the database protein. Score/Square root: the score is divided by the square root of the number of peptides of the protein found in the database. The last two scoring types compensate for the larger number of chance hits to large proteins. Score/NumPep tends to overcompensate, while Score/Square root usually compensates satisfactorily for ‘normal’ proteins in the 20-150 kDa mass range.
Precis./2 and Precis./4: If the hit is closer than half/quarter of the given precision for the search peptide, an additional score is added to the total.
Sequence and Compos.: The score given for a match of a sequence or an amino acid composition.
Optimization: The ‘hit’ list from the peptide mass search can be re-searched using optimized parameters. Here you can choose to include an increased number of overlaps (missed cleavages) and/or to perform a linear fit on the hits from the first search and use this modified ‘calibration’ for a second search.
Autoload correct mass file: When the program performs the second pass search, all mass calculations are redone. In order for this to function correctly the right mass file has to be loaded. If this option is checked, the file will be loaded automatically, otherwise you will be asked.
Show pI in results: If the original sequence database is available on-line, the program will calculate the pI of each result hit when presenting the result table.
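To make the scoring parameters concrete, here is a minimal sketch of how they might combine for one database protein. It assumes that every hit adds the score for its overlap count, that the Precis./2 and Precis./4 bonuses are cumulative, and that Min. hits counts search masses with at least one hit. GPMAW's actual scoring may differ in these details, and the function and score-type names are hypothetical.

```python
import math


def score_protein(search_masses, peptides, tolerance, overlap_scores,
                  half_bonus=0.0, quarter_bonus=0.0,
                  score_type="linear", min_hits=1):
    """Score one database protein against a list of search masses.

    peptides: (mass, overlaps) pairs for the protein's digest peptides.
    tolerance: half-width of the search window in Da.
    overlap_scores: overlap_scores[k] is the score for a hit with k overlaps.
    Returns None if fewer than min_hits search masses found a hit.
    """
    score, hits = 0.0, 0
    for target in search_masses:
        matched = False
        for mass, overlaps in peptides:
            error = abs(mass - target)
            if error > tolerance or overlaps >= len(overlap_scores):
                continue  # outside the window, or more overlaps than allowed
            matched = True
            score += overlap_scores[overlaps]
            if error <= tolerance / 2:   # 'Precis./2' bonus
                score += half_bonus
            if error <= tolerance / 4:   # 'Precis./4' bonus (assumed cumulative)
                score += quarter_bonus
        hits += matched
    if not peptides or hits < min_hits:
        return None  # not entered into the final score list
    if score_type == "score/numpep":
        return score / len(peptides)
    if score_type == "score/sqrt":
        return score / math.sqrt(len(peptides))
    return score  # 'linear': the score is not modified
```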

'3-letter display': If checked, the sequence window will show amino acid residues in 3-letter code. If not checked residues will be shown in 1-letter code.
'Reduced Cys (SH)': If checked, cross-links are not displayed or calculated. If unchecked, cross-links are displayed as red lines (Cross-links, Chapter 3.5). Cys residues are calculated as mass 103 Da when reduced (SH) and as 102 Da when oxidized (SS).
'Highlight global': If checked, all sequence windows opened on the desktop will be highlighted whenever the highlight command is executed (Chapter 3, Highlight residues).
'Keep highlight': If checked the highlight dialog box will remember settings between executions. The two options 'Highlight global' and 'Keep highlight' can be changed at run-time (Chapter 3.2).
‘Display modula 5’: Sequence windows will display a number of residues per line that is a multiple of 5, e.g. 55 residues per line, not 56 or 54. Although most useful when displaying 1-letter code, it also works for 3-letter code.
Number 10th residue: When checked every 10th residue in the sequence window will be labeled with a subscript number when displaying 3-letter code. The color of the number depends on the color setup (Chapter 5.3). In 1-letter mode every 10th residue will have a small vertical line.
Fit window height to seq.: When checked, all newly opened sequence windows will have a height that just fits the displayed sequence. See also ‘Default window size’ below.
Ask user before exit: If checked, GPMAW will pop up a dialog box asking you to confirm that you really want to close the program.
Autoload last sequence: GPMAW will try to load the most recently accessed sequence automatically when the program is started next. If you work repeatedly with the same sequence this feature can save you a little time when restarting the program.
Autosize forms: When the system font is changed, some dialog boxes also change in order to accommodate the new font size. Sometimes GPMAW may have problems resizing correctly. If you experience this problem, try checking this box to force the program to recalculate the size of dialog boxes.
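The ‘Display modula 5’ and ‘Number 10th residue’ options can be mimicked in plain text. In this minimal sketch, the number of residues per line is rounded down to a multiple of 5, and a '|' on the line below each row stands in for the small vertical line GPMAW draws under every 10th residue in 1-letter mode. The helper names and the text rendering are illustrative assumptions, not GPMAW's drawing code.

```python
def residues_per_line(available: int, modulo_5: bool = True) -> int:
    """Residues that fit on one line, optionally rounded down to a multiple of 5."""
    if modulo_5:
        return max(available - available % 5, 5)  # never fewer than 5 residues
    return max(available, 1)


def layout_1_letter(sequence: str, available: int) -> list:
    """Split a sequence into lines, with a '|' mark under every 10th residue."""
    per_line = residues_per_line(available)
    lines = []
    for start in range(0, len(sequence), per_line):
        chunk = sequence[start:start + per_line]
        # Residue numbers are 1-based: mark positions 10, 20, 30, ...
        marks = "".join(
            "|" if (start + i + 1) % 10 == 0 else " " for i in range(len(chunk))
        )
        lines.extend([chunk, marks.rstrip()])
    return lines
```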
This option enables you to determine the initial display state of GPMAW:
· Normal: The program will open in a window that will take up approximately 1/3 of the screen.
· Maximized: The program will be displayed covering the whole screen area.
· Minimized: The program will be minimized to the task bar. This feature is most useful if you add GPMAW to the Windows ‘Startup’ folder in order to automatically start GPMAW whenever you start your computer.
Click the font button to select any monospaced font installed on the system for the sequence window. You can also select a different font size. If you check the ‘Include daughter window’ box, the selected font will also be used for display in the peptide window (Chapter 9.4) and the mass search window (Chapter 6.1).
The default font is Courier New in size 10 point.
This option, when enabled, defines the initial size of the main GPMAW program window and the initial size of the sequence window.
If the ‘don’t care’ box is checked, the values entered will have no effect.
Pressing the corresponding button reads the current size of the program window / the size of the topmost sequence window and puts the values into the relevant boxes (X = width, Y = height). The values can also be edited manually.
If you have checked the ‘Fit height to seq.’ box above, the height parameter entered here will be ignored.
Note: It is possible to enter values larger than the current screen size. This will result in parts of the program / sequence window being inaccessible. In this case you should reopen the setup box and enter new values.

The BLAST setup page works in concert with the ‘Local BLAST homology search’ (see Chapter 7.2).
The BLAST homology search uses the NCBI BLAST program ‘blastall.exe’. In a normal GPMAW installation, this file is installed in the C:\gpmaw\bin\ directory. Along with this file, you will need the following files: formatdb.exe, blosum45, blosum62, blosum80, pam30, pam70 and seqcode.val.
Note: If you are unable to locate these files, you can download them from the NCBI FTP site as a compressed self-extracting file (ftp://ncbi.nlm.nih.gov/blast/executables/blastz.exe). When you have decompressed the blastz.exe file, copy the files mentioned above to the \gpmaw\bin\ directory. The remaining files in the download are not used at present.
In order to use the local BLAST you need to tell GPMAW the location of the ‘blastall.exe’ file and you need a database in BLAST format.
Pressing the ‘Install BLAST’ button presents a ‘File Open’ dialog box that you use to locate the ‘blastall.exe’ file. By default it is located in c:\gpmaw\bin\, but you can place it wherever you like. The other files mentioned above have to be placed in the same directory for GPMAW to locate them. If the ‘blastall.exe’ file is in the \bin\ directory, GPMAW will usually locate it automatically.
In order to run a homology search, you need a protein database to compare against. Such databases can be generated from any FastA formatted protein database; please see Appendix B for how to obtain one. If you have obtained your copy of GPMAW on a CD-ROM, you will usually find two databases (Swiss-Prot and EMBL-nr) on the disk, ready for use. The databases can be the same as the ones used for retrieving sequences (Chapter 2.6) and for the peptide digest database search (Chapter 8).
When you press the ‘Format’ button, you will be asked to open the FastA formatted database to be converted. The actual formatting is carried out by an external program ‘formatdb.exe’ that is called by GPMAW. Do not close the black DOS box that opens when this function is called! It will close automatically when the database formatting is finished.
When finished with the conversion, GPMAW will ask whether you want it added to the list of BLAST databases. When you have done so, the database will be available from the ‘Local BLAST’ option (Chapter 7.2).
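For reference, the formatting and search steps correspond to the legacy NCBI command-line tools. The sketch below only builds the command lines; it does not claim to reproduce the exact arguments GPMAW passes. It assumes the default C:\gpmaw\bin location and uses the legacy options `-i` (input file), `-p T` (protein database) for formatdb, and `-p blastp`, `-d` (database base name), `-i` (query), `-M` (matrix), `-e` (expectation value) for blastall. The helper names are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath

BLAST_DIR = PureWindowsPath(r"C:\gpmaw\bin")  # assumed default location


def formatdb_command(fasta_file: str) -> list:
    """Command line converting a FastA protein database to BLAST format."""
    return [str(BLAST_DIR / "formatdb.exe"), "-i", fasta_file, "-p", "T"]


def blastall_command(query_file: str, database: str,
                     matrix: str = "BLOSUM62", evalue: float = 10.0) -> list:
    """Command line for a protein-protein (blastp) search of a local database."""
    return [str(BLAST_DIR / "blastall.exe"), "-p", "blastp",
            "-d", database, "-i", query_file,
            "-M", matrix, "-e", str(evalue)]


def run(command: list) -> str:
    """Run a command (on a machine where the tools exist) and return its output."""
    return subprocess.run(command, check=True, capture_output=True, text=True).stdout
```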
If you have a ready-made BLAST formatted database, you can add it to the list by pressing the corresponding button. You will be asked to locate the ‘.psq’ file of the database set.
You can remove entries from the list by pressing the button for removing entries. Note: This function only removes the reference to the database, not the actual database.
Hint: A BLAST database consists of three files with the extensions .phr, .pin and .psq (e.g. swiss.phr, swiss.pin and swiss.psq). The total space required by the three files is slightly larger than the original FastA formatted database. If your only purpose is to perform BLAST searches (i.e. no sequence retrieval or peptide mass searches), you can delete the FastA database after the BLAST database has been generated.
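Because a BLAST database is referenced through its ‘.psq’ file but needs all three files, a quick completeness check can save a failed search. This minimal sketch derives the sibling file names from the ‘.psq’ path and lists any that are missing. The helper names are hypothetical, and large multi-volume databases, which use additional numbered files, are not handled.

```python
from pathlib import Path

BLAST_EXTENSIONS = (".phr", ".pin", ".psq")


def blast_database_files(psq_path) -> list:
    """The three files of a BLAST protein database, derived from its .psq file."""
    psq = Path(psq_path)
    # with_suffix replaces only the final extension, so 'swiss.v1.psq' -> 'swiss.v1.phr'.
    return [psq.with_suffix(ext) for ext in BLAST_EXTENSIONS]


def missing_blast_files(psq_path) -> list:
    """Database files that do not exist on disk."""
    return [p for p in blast_database_files(psq_path) if not p.exists()]
```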
The concept of users can be used in two ways:
I. Multiple users can use the same installed version of GPMAW but have different preferences and directories to store individual data.
II. A single user can have different projects located in different directories. Each project (or user) can furthermore have different preferences, which is very useful if you work with different mass instruments having different resolutions.
To create a new user, select Setup|User|New user and enter a name of no more than eight characters. The current .INI file is then saved under this name. Any preferences you have set or will set before closing the program will be saved to the new user.
Selecting an existing user loads the preferences from that user's .INI file. Changes made to the current user profile are not saved before the new profile is loaded.
You remove a user by selecting Setup|User|Remove user and entering the name of an existing user when asked in the dialog box.
The Default option loads the default gpmaw.ini file.
The currently loaded user profile is displayed in the title bar and after the 'User' option in the 'Setup' menu.
See also Appendix D on how to set up GPMAW for different users to start directly from a given shortcut.