Chapter 4: Edit
Editing protein sequences, mass and modification files

You can edit the sequence in the currently selected sequence window by selecting Edit|Edit sequence, by pressing the corresponding button in the toolbar, or by right-clicking on the window and selecting 'Edit' from the local menu.
Note: The currently active window has to be a sequence window before you can edit the sequence. However, you can always start editing a new sequence (see the end of this section).
The sequence is edited in the large multi-line editor in the top part of the dialog box. The sequence can only be edited in 1-letter code; to view it in 3-letter code, you have to return to the sequence window. The editor supports cut and paste, so you can copy sequences to the clipboard from other applications and paste them into the editor. You can also highlight text and cut, copy and paste it within the editor.
Note: Inside the editor you have to use keyboard shortcuts, e.g. Ctrl-X or Shift-Del for cut, Ctrl-C or Ctrl-Ins for copy, and Ctrl-V or Shift-Ins for paste. Alternatively, you can use the pop-up menu (right-click in the edit box).
If you resize the dialog box, the sequence edit control will resize along with the dialog box. The rest of the dialog box controls will not change size or position.
The name of the sequence is edited in the edit line below the status panels. The name can be at most 250 characters long. If you need to store more information about the protein, use the ‘Annotation’ page (Info|Annotation, or the corresponding button in the toolbar of the sequence window; see Chapter 3).
If you paste a sequence in FastA format from the clipboard, the FastA header line is automatically copied into the name line, while the rest of the record is pasted into the editor.
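The header/sequence split described above can be illustrated with a small Python sketch. It assumes plain FastA text (one '>' header line followed by sequence lines) and simply stops at a second '>' header; it is not GPMAW's own code, and `split_fasta` is a hypothetical name.

```python
def split_fasta(text: str) -> tuple[str, str]:
    """Split pasted FastA text into (name, sequence).

    The header line (without '>') becomes the name; the remaining lines
    are joined into one sequence string. Anything after a second '>'
    header is ignored in this sketch.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    name = ""
    if lines and lines[0].startswith(">"):
        name = lines[0][1:].strip()
        lines = lines[1:]
    sequence_lines = []
    for line in lines:
        if line.startswith(">"):  # start of a second record
            break
        sequence_lines.append(line)
    return name, "".join(sequence_lines)


name, seq = split_fasta(">pep1 Test peptide\nACDEF\nGHIK\n")
print(name)  # pep1 Test peptide
print(seq)   # ACDEFGHIK
```

Text without a header line is returned with an empty name, which corresponds to pasting a bare sequence.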
If you know the accession number of the protein, you can enter it in the small edit box between the sequence and name boxes. If you load from an indexed FastA database (Appendix B), the accession number is loaded along with the sequence. If entered, the accession number is shown in the sequence window title bar (e.g. “[P35247] Pulmonary surfactant….”).
Note: Although GPMAW does not use the accession number directly to identify proteins, you are strongly encouraged to always enter it, as it is a unique identifier in the respective databases.
If you paste a sequence that is in single-letter code but not in upper case, you have to convert it to upper case using the upper-case button (described below).
If the sequence contains extra non-sequence characters (e.g. numbers, spaces and carriage returns), just press the clean-up button, which removes all characters not defined as single-letter codes in the current mass file. For more information on mass files, see Chapter 4.2, “Edit mass file”. The clean-up button is described in detail below.
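A minimal Python sketch of the two clean-up steps, assuming a mass file that defines only 'X' and the 20 standard one-letter codes (`ALLOWED`, `to_upper` and `clean` are hypothetical names). In this sketch the order matters: lower-case letters are not defined in the assumed alphabet, so removing undefined characters before converting to upper case would discard the whole sequence.

```python
# Hypothetical mass-file alphabet: 'X' plus the 20 standard one-letter codes.
# User-defined residues in a real mass file would extend this set.
ALLOWED = set("XACDEFGHIKLMNPQRSTVWY")


def to_upper(text: str) -> str:
    """Convert pasted text to upper case (capital letters)."""
    return text.upper()


def clean(text: str, allowed: set[str] = ALLOWED) -> str:
    """Drop every character that is not a defined one-letter residue code."""
    return "".join(ch for ch in text if ch in allowed)


raw = "1 acdef ghik 10\n11 lmn"
print(clean(to_upper(raw)))  # ACDEFGHIKLMN
print(repr(clean(raw)))      # '' (lower-case letters are not in ALLOWED)
```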
When importing sequences you can also use the File|Import ASCII functions, either as import from clipboard or import from file (Chapter 2.5).
The panels just below the sequence editor show editing status and molecular mass information.
The first panel shows the position of the editor text cursor (the value is the number of the preceding residue), and the next panel the position of the mouse cursor. If part of the sequence is highlighted, the middle panel will show the first and last residue that is highlighted, otherwise the panel will show the total length of the protein. The last two panels show the monoisotopic and average masses of the intact protein as defined in the editor.
The bottom panel is a two-page notebook that shows either the modifications made to the protein or the amino acid and elemental composition of the protein. The composition panel is updated whenever a change is made in the sequence editor.
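The mass panels can be reproduced in outline with the Python sketch below. It assumes typical monoisotopic and average atomic masses for C, H, N, O and S (GPMAW takes its values from its own atomic mass table, so results may differ slightly) and the standard residue compositions, with the default termini H1 and O1H1 together adding one water. Names such as `composition` and `masses` are hypothetical, and letters outside the table raise a `KeyError`.

```python
# Assumed atomic masses in Da: (monoisotopic, average).
ATOMS = {
    "C": (12.0, 12.0107),
    "H": (1.00782503, 1.00794),
    "N": (14.0030740, 14.0067),
    "O": (15.9949146, 15.9994),
    "S": (31.9720707, 32.065),
}
ELEMENTS = ("C", "H", "N", "O", "S")

# Residue compositions (amino acid minus water) as counts of (C, H, N, O, S).
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}


def composition(sequence: str) -> dict[str, int]:
    """Elemental composition: residues + H1 (N-terminus) + O1H1 (C-terminus)."""
    total = dict.fromkeys(ELEMENTS, 0)
    for aa in sequence:
        for element, n in zip(ELEMENTS, RESIDUES[aa]):
            total[element] += n
    total["H"] += 2  # H1 from the N-terminus plus H1 from O1H1
    total["O"] += 1  # O1 from the C-terminal free acid
    return total


def masses(sequence: str) -> tuple[float, float]:
    """Return (monoisotopic, average) mass of the free peptide in Da."""
    comp = composition(sequence)
    mono = sum(n * ATOMS[e][0] for e, n in comp.items())
    avg = sum(n * ATOMS[e][1] for e, n in comp.items())
    return mono, avg


mono, avg = masses("GG")
print(f"{mono:.4f} {avg:.3f}")  # 132.0535 132.118
```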
Removes all characters in the edit box that are not defined in the current mass file (1-letter residue identifiers). This function is very useful when you paste a sequence from another application that contains numbers, space characters etc.
Converts the text in the edit box to upper case (capital letters). This is necessary if you paste a sequence in lower case from another application. When you enter characters from the keyboard, they are automatically converted to upper case.
Prints the sequence. You can select a 1- or 3-letter residue printout. The printout is similar to printing from the sequence window (Chapter 3.1).
N-terminus / C-terminus: Select the modification of each terminus of the sequence from the corresponding drop-down list box. To edit the contents of these drop-down list boxes, select Edit|Edit mass file and choose the N-terminal or C-terminal tab (see Chapter 4.2 below).
Opens the 'Edit cross-links' dialog box (see Chapter 3.5), enabling you to modify cross-links. Cross-links are shown in the list box below the button.
Opens the ‘Select modification’ dialog box (see Chapter 3.6). Unlike when you double-click on residues in the sequence window, the ‘Select modification’ dialog box always opens with residue 1 selected. Modifications are shown in the list box below the button.
The sequence offset lets you specify that the numbering of the sequence should not start at one. This is typically used when you have cut a sequence out of another sequence or when you are working with a pre- or pro-sequence. The offset can be either positive or negative. When an offset is specified, the residue number in the status panel is shown in red.
Enables you to change the font size in the sequence edit box in 1-point steps. The font is changed dynamically.
Determines whether the table on the composition page shows amino acid residue composition or elemental composition. The table is updated for every change made to the sequence.
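The amino acid composition table amounts to counting residues. This Python sketch uses `collections.Counter` and, as an assumed extra, reports mole percentages; GPMAW's actual table layout may differ, and `residue_composition` is a hypothetical name.

```python
from collections import Counter


def residue_composition(sequence: str) -> list[tuple[str, int, float]]:
    """Return (residue, count, mol %) for each residue type, sorted by letter."""
    counts = Counter(sequence)
    total = len(sequence)
    return [(aa, n, 100.0 * n / total) for aa, n in sorted(counts.items())]


for aa, n, pct in residue_composition("AAGC"):
    print(f"{aa} {n} {pct:.1f}%")
# A 2 50.0%
# C 1 25.0%
# G 1 25.0%
```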

The Edit|Edit new sequence command (also available as a toolbar button) is identical to the 'Edit sequence' command discussed above, except that the name and sequence fields of the dialog box are initially empty. In addition, when the dialog box is closed, a new sequence window opens on the GPMAW desktop.
The ‘Edit new sequence’ dialog box can be used as an alternative to File|Import ASCII|From clipboard: paste into the sequence edit box, then use the buttons that convert the text to upper case and remove all non-sequence characters.
Hint: As the mass panels are updated with every entry in the edit box, you can use the editor to check the mass of a short peptide simply by typing or pasting it into the edit box and modifying it as needed. A new sequence window is created only when you select ‘OK’.
The Edit|Edit mass files command actually controls four different mass tables: The mass file, the N-terminal, the C-terminal and the atom mass table. The mass tables are crucial for the working of GPMAW. In addition to the masses they also define the amino acid residues (name, 1- and 3-letter code). The mass files reside in the ‘System’ directory as defined in ‘Setup – Directories’, by default this is c:\gpmaw\system\.
Chapter 1.5 contains an overview of all the essential tables of GPMAW.
GPMAW always needs a mass file in order to work. The default file loaded at startup is called AA_MASS.MSS. If this file is not found during startup, or if errors are encountered, a default mass file is constructed internally; you are recommended to save it as AA_MASS.MSS.
Each mass file contains 32 entries. The first is for unknown residues, usually called ‘X’. The next 20 are the standard amino acid residues, while the last 11 are user-definable and can be given any name (be careful not to use punctuation marks, $ or * as the single-letter residue code).
For each residue you have to enter the 1-letter code, the 3-letter code, a name (10 characters), and a composition (the atoms have to be defined in the atomic masses table, see below and end of chapter). The average mass column is only for verification of the mass and cannot be edited.
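The layout and naming rules above can be captured in a small data structure. The Python sketch below is an assumption about how such a table might be checked, not GPMAW's file format: `ResidueEntry`, `slot_role` and `validate_mass_file` are hypothetical, and rejecting duplicate 1-letter codes is an added assumption.

```python
import string
from dataclasses import dataclass

ENTRY_COUNT = 32                      # 1 unknown + 20 standard + 11 user-definable
MAX_NAME_LENGTH = 10
FORBIDDEN = set(string.punctuation)   # includes '$' and '*'


@dataclass
class ResidueEntry:
    one_letter: str
    three_letter: str
    name: str
    composition: str  # atoms must exist in the atomic mass table


def slot_role(index: int) -> str:
    """Role of a 0-based slot in the mass table."""
    if index == 0:
        return "unknown"
    if index <= 20:
        return "standard"
    return "user-defined"


def validate_mass_file(entries: list[ResidueEntry]) -> list[str]:
    """Return a list of problems found (empty if the table looks valid)."""
    problems = []
    if len(entries) != ENTRY_COUNT:
        problems.append(f"expected {ENTRY_COUNT} entries, found {len(entries)}")
    seen = set()
    for i, entry in enumerate(entries):
        code = entry.one_letter
        if len(code) != 1 or code.isspace() or code in FORBIDDEN:
            problems.append(f"entry {i}: invalid 1-letter code {code!r}")
        elif code in seen:
            problems.append(f"entry {i}: duplicate 1-letter code {code!r}")
        seen.add(code)
        if len(entry.name) > MAX_NAME_LENGTH:
            problems.append(f"entry {i}: name longer than {MAX_NAME_LENGTH} characters")
    return problems


cys = ResidueEntry("C", "Cys", "Cysteine", "C3H5N1O1S1")
print(validate_mass_file([cys]))  # ['expected 32 entries, found 1']
```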
The 'extra' residues in the mass table are best used for modified residues that occur in many copies or across several sequences. Single-residue modifications are best handled by modifying the individual residue (see Chapter 3.6, Amino acid modifications). If you modify a whole residue type (e.g. pyridylethylate all cysteine residues), this is best done by changing the composition and full name of Cys (do not change the 1-letter code) and saving the mass file under a new name (e.g. pe_cys for pyridylethyl cysteine). You can then switch all cysteines to the modified form just by selecting the new mass file in the toolbar of the main window.

One button saves changes to the current mass file (whose name is shown at the bottom of the dialog box), while another enables you to save the whole list to a new mass file.
The two tabs N-terminal and C-terminal are identical in setup and differ only in the terminal they define.

For each modification you enter a name and an elemental composition (see Ch. 4.4).
Important: In the N-terminal table the first entry has to be ‘Hydrogen’, H1, and in the C-terminal table the first entry has to be ‘Free acid’, O1H1. The first entry is automatically selected whenever you load a new sequence, start editing a new sequence, perform cleavages, etc.
All compositions are given relative to amino acid residues, not free amino acids; see the end of this chapter for how to enter compositions (formulas).
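Written out, the convention above means that a peptide's mass is the sum of its residue masses plus the masses of the two terminal entries. This is a worked illustration only: the amide entry N1H2 used below is a hypothetical example, not necessarily an entry in GPMAW's terminal tables, and the atomic masses are typical monoisotopic values.

```latex
\begin{align*}
M_{\text{peptide}} &= m(\text{N-term}) + \sum_{i=1}^{n} m(r_i) + m(\text{C-term}) \\
m(\mathrm{H_1}) + m(\mathrm{O_1H_1}) &= m(\mathrm{H_2O}) \\
\Delta m_{\text{amide}} &= \bigl[m(\mathrm{N}) + 2\,m(\mathrm{H})\bigr] - \bigl[m(\mathrm{O}) + m(\mathrm{H})\bigr] \\
 &= m(\mathrm{N}) + m(\mathrm{H}) - m(\mathrm{O}) \approx -0.984016\ \text{Da}
\end{align*}
```

With the default entries the two termini contribute exactly one water, which turns the sum of residue masses into the mass of the free peptide. Using 14.003074 (N), 1.007825 (H) and 15.994915 (O), an amidated C-terminus comes out about 0.984 Da lighter than the free acid.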
Terminal modifications are saved in the system directory as a file called ‘TERMINALS.MSS’ when the program is closed.

The atomic masses are the basis for all mass calculations carried out in GPMAW. All atoms used in compositions in mass files, modifications etc. have to be defined in the atom mass table. The table can contain 10 atomic masses, and the values are saved in the GPMAW.INI file and are always loaded upon startup. If the INI file is not found, default values are loaded.
A reset button restores the default values in the table. If you want to experiment with different values, remember to note down the previous values or make a copy of the GPMAW.INI file.
The modification files provide a quick way to select a modification when modifying a residue (Chapter 3.6, ‘Amino acid modifications’). They are also used when you perform a mass search of a protein: by including a modification file in the search, you can check whether any of the search masses could correspond to a peptide carrying a modification specified in the file.
The ‘Edit modification database’ dialog box works on the currently loaded modification file. If no file is loaded, you first have to load one using the load button. After editing, the changes must be saved to a file using the save button. The loaded file remains the ‘active’ modification file after the dialog box is closed. The modification files are saved in the ‘System’ subdirectory of the ‘GPMAW’ directory (see Chapter 5.4).
Each modification file can contain up to 30 entries. Each entry consists of a name, a formula, and a number of residues for which the modification is valid. If no valid residues are specified, the modification is taken to be valid for all residues.
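The validity rule (an empty residue list means the modification applies to every residue) can be sketched in Python as follows. `Modification`, `applies_to` and `check_entry_count` are hypothetical names, and the formulas shown are illustrative only.

```python
from dataclasses import dataclass, field

MAX_ENTRIES = 30  # entries per modification file


@dataclass
class Modification:
    name: str
    formula: str                                     # e.g. 'O1'
    residues: set[str] = field(default_factory=set)  # empty set = all residues

    def applies_to(self, residue: str) -> bool:
        """An empty residue list means the modification is valid everywhere."""
        return not self.residues or residue in self.residues


def check_entry_count(mods: list[Modification]) -> None:
    """Reject files with more entries than the stated maximum."""
    if len(mods) > MAX_ENTRIES:
        raise ValueError(f"a modification file holds at most {MAX_ENTRIES} entries")


oxidation = Modification("Oxidation", "O1", {"M"})
generic = Modification("Generic", "C2H2O1")  # no residues listed
print(oxidation.applies_to("M"), oxidation.applies_to("C"))  # True False
print(generic.applies_to("K"))                               # True
```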

Whenever the focus moves to a new row, the name and mass of the current line are shown to the right of the table. When editing a line, you have to move to another line and back again before the mass of the line is recalculated (the program needs a complete formula in order to calculate the mass). Click on the composition editor button to open the ‘Composition editor’ (see following section and Chapter 12.2).
The last column enables the entry when checked. This column is normally only relevant for mass searches: with every entry of a large modification file enabled, the result list can become very long and include modifications that are known to be irrelevant under the current circumstances.
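To illustrate why the enable column matters for mass searches, here is a simplified Python sketch that reports which enabled modifications could explain the difference between an observed mass and a calculated peptide mass. The fixed tolerance, the precomputed `delta_mass` values and the names `SearchModification` and `explain_mass` are assumptions for the example; GPMAW's actual search procedure is not described here.

```python
from dataclasses import dataclass


@dataclass
class SearchModification:
    name: str
    delta_mass: float  # mass change in Da, assumed precomputed from the formula
    enabled: bool      # the check box in the last column


def explain_mass(observed: float, peptide_mass: float,
                 mods: list[SearchModification],
                 tolerance: float = 0.02) -> list[str]:
    """Names of enabled modifications that account for observed - peptide_mass."""
    difference = observed - peptide_mass
    return [m.name for m in mods
            if m.enabled and abs(difference - m.delta_mass) <= tolerance]


mods = [
    SearchModification("Oxidation", 15.9949, True),
    SearchModification("Phosphorylation", 79.9663, True),
    SearchModification("Sodium adduct", 21.9819, False),  # disabled entry
]
print(explain_mass(1015.995, 1000.0, mods))  # ['Oxidation']
print(explain_mass(1021.982, 1000.0, mods))  # [] because the match is disabled
```

Disabling entries simply removes them from the candidate list, which is how the result list stays short.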
The composition in the mass file, the N- and C-terminals, and the modifications all follow the same rules.
The composition of the residue or modification is entered using the atom abbreviations specified in the atomic mass table (see above), each followed by the number of atoms. Atoms lost from the composition are preceded by a minus sign '-'. For example, hydrolyzing an amide removes one nitrogen atom and two hydrogen atoms but adds one oxygen atom and one hydrogen atom, which gives '-N1H1+O1'. Note that the negative (lost) atoms have to precede the positive (gained) ones.
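The formula rules above can be turned into a small parser. This Python sketch assumes element symbols are one capital letter optionally followed by a lower-case letter, and treats a missing count as 1 (the manual always writes the count explicitly). It rejects formulas in which a '-' group follows a '+' group. `parse_composition` is a hypothetical name.

```python
import re

# One atom token: element symbol plus an optional count (missing count = 1).
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_composition(formula: str) -> dict[str, int]:
    """Parse a formula such as 'C3H5N1O1S1' or '-N1H1+O1' into net atom counts.

    Atoms after '-' are lost (negative counts); atoms after '+', or with
    no sign at all, are gained. Lost atoms must come before gained ones.
    """
    counts: dict[str, int] = {}
    sign = 1
    seen_positive = False
    formula = formula.replace(" ", "")
    pos = 0
    while pos < len(formula):
        ch = formula[pos]
        if ch in "+-":
            new_sign = 1 if ch == "+" else -1
            if new_sign < 0 and seen_positive:
                raise ValueError("negative atoms must precede positive ones")
            sign = new_sign
            pos += 1
            continue
        match = TOKEN.match(formula, pos)
        if not match:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
        element, digits = match.groups()
        n = int(digits) if digits else 1
        counts[element] = counts.get(element, 0) + sign * n
        if sign > 0:
            seen_positive = True
        pos = match.end()
    return counts


print(parse_composition("-N1H1+O1"))    # {'N': -1, 'H': -1, 'O': 1}
print(parse_composition("C3H5N1O1S1"))  # {'C': 3, 'H': 5, 'N': 1, 'O': 1, 'S': 1}
```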
In several of the edit boxes you can activate the composition editor, either by clicking on its button or by double-clicking in the relevant formula field.

This opens the ‘Elemental composition’ dialog box with the composition of the current selection. You can modify the composition either by entering the relevant numbers directly in the number boxes or by using the up/down arrows next to them. Both positive and negative numbers can be entered; negative numbers are only meaningful when editing post-translational modifications.
A reset button sets all number fields to zero.
The ‘Composition’ field shows the total composition and cannot be edited (but you may highlight and copy). The ‘Average mass’ field is for information only.
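The reverse direction, writing a set of atom counts in the same notation (similar to what the read-only ‘Composition’ field displays), might look like the Python sketch below. It assumes atoms are listed in the order given and that a purely positive composition is written without a leading '+', as in the manual's examples 'H1' and 'O1H1'; `format_composition` is a hypothetical name.

```python
def format_composition(counts: dict[str, int]) -> str:
    """Write atom counts as a formula: lost atoms first, then gained atoms."""
    lost = "".join(f"{el}{-n}" for el, n in counts.items() if n < 0)
    gained = "".join(f"{el}{n}" for el, n in counts.items() if n > 0)
    if lost and gained:
        return f"-{lost}+{gained}"
    if lost:
        return f"-{lost}"
    return gained  # zero counts are omitted


print(format_composition({"N": -1, "H": -1, "O": 1}))  # -N1H1+O1
print(format_composition({"C": 2, "H": 3, "O": 1}))    # C2H3O1
```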
See also Chapter 12.1 ‘Composition calculator’.