Chapter

8

Database mass search

Identifying proteins based on mass spectrometric peptide maps or directly in a database based on mass.

Introduction to digest mass search                                                  8.1

This is a very powerful and sensitive way of identifying proteins. The method uses the idea that the masses of the peptides generated from a given protein by a specific enzyme (i.e. the peptide mass map) are specific for each protein in the protein database. The peptide mass map is usually so redundant that even a small sub-fraction of the peptides is sufficient for identification.

In practical terms you need in the order of 6-8 peptide masses in the mass range 1000-2500 Da in order to get a reasonable ‘hit’ in the database. The mass precision has to be reasonably good (0.02% or better). As the number of proteins in the databases gets larger, you can expect to need more peptides and/or higher precision. Masses below 1000 Da are often not very specific (i.e. a given mass is shared among many proteins), while above 2500 Da the mass precision is usually not very good and the peptides often contain missed cleavage points (i.e. they are overlapping peptides).
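The idea of a peptide mass map can be illustrated with a minimal Python sketch. It is not GPMAW code: the function names are hypothetical, the cleavage rule is the common textbook rule for trypsin (cut after K or R unless the next residue is P), and the residue masses are standard monoisotopic values with unmodified cysteine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if aa in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass (M) of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mass_map(sequence, low=1000.0, high=2500.0):
    """Sorted peptide masses inside the mass range that is useful for searching."""
    masses = (peptide_mass(p) for p in tryptic_peptides(sequence))
    return sorted(m for m in masses if low <= m <= high)
```

Calling `mass_map` on a protein sequence returns the list of masses that a digest search would try to match against the input peak list.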

The sensitivity of the method is entirely dependent on the mass spectrometric identification of peptides, and is usually in the sub-picomole range. Samples can be pure proteins in solution, isolated by gel electrophoresis or by other means.

A major limitation is that, generally, only proteins present in the database can be identified, e.g. you cannot count on finding homologous proteins.

For references, please see end of section.

The GPMAW search.

The search in GPMAW is based on a scoring system. The system is quite flexible and the user can easily change the scores and thus optimize the search for particular systems. The scores are set in ‘Setup’ on the ‘Digest src.’ page. The scoring system is divided into four parts:

1.       Direct match: A score is given based on the number of overlaps (missed cleavages) in the database peptide (e.g. zero overlap may give a score of 10, 1 overlap 8, 2 overlaps 6 etc.). This reflects the fact that the more uncleaved cleavage points a peptide contains, the more unlikely it is.

2.       Better fit: If the difference between the search mass and the database peptide mass is better than half/quarter of the given precision, an additional score is given. This enables you to specify a looser search precision than you actually have (e.g. 200 ppm instead of 100 ppm), which lets you catch outliers in your search data while still enabling true ‘hits’ to get to the top of the result list.

3.       Scoring type: The normal search type is ‘Linear’ where scores are listed directly. Use this when you specify a narrow mass search range (e.g. search 30-60 kDa proteins). When you search a large mass range (e.g. 10-150 kDa) you should use one of the alternative scoring types. Alternatives are ‘Score/NumPep’ (score divided by the number of peptides in the database protein) and ‘Score/Sqrt(NumPep)’ (score divided by the square root of the number of peptides). The score calculated by these types will compensate for the fact that large proteins tend to give false positives. The ‘Score/NumPep’ type tends to over-compensate (favor small proteins).

4.       Sequence tags: Finally you can give a score to a sequence tag (a short sequence you have identified in the protein, e.g. by ms/ms experiments) and to an amino acid composition (in some cases you can identify certain residues to be present in a given peptide).
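The first three parts of the scoring system can be sketched as follows. This is only an illustration under stated assumptions: the overlap scores are the example values quoted above, while the two bonus values and all function names are hypothetical and do not come from GPMAW.

```python
import math

OVERLAP_SCORE = {0: 10, 1: 8, 2: 6}  # example values from the text
HALF_BONUS = 2      # hypothetical bonus for a fit within half the precision
QUARTER_BONUS = 4   # hypothetical bonus for a fit within a quarter


def peptide_score(error_ppm, precision_ppm, overlaps):
    """Score one peptide match: direct match plus 'better fit' bonus."""
    err = abs(error_ppm)
    if err > precision_ppm or overlaps not in OVERLAP_SCORE:
        return 0  # outside the search precision or too many overlaps
    score = OVERLAP_SCORE[overlaps]
    if err <= precision_ppm / 4:
        score += QUARTER_BONUS
    elif err <= precision_ppm / 2:
        score += HALF_BONUS
    return score


def protein_score(raw_score, num_peptides, score_type="Linear"):
    """Apply the scoring type to the summed peptide scores of one protein."""
    if score_type == "Linear":
        return raw_score
    if score_type == "Score/NumPep":
        return raw_score / num_peptides
    if score_type == "Score/Sqrt(NumPep)":
        return raw_score / math.sqrt(num_peptides)
    raise ValueError(f"unknown score type: {score_type}")
```

With a search precision of 200 ppm, a match that is 30 ppm off receives the larger bonus, so a true ‘hit’ measured at 100 ppm precision still rises above a chance hit near the edge of the window.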

In order to speed up the search dramatically, the protein database is ‘pre-digested’ with the cleavage agent used in the search (e.g. trypsin).
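Why pre-digestion helps can be seen in a small sketch: once all peptide masses are computed and sorted, each input mass is located by binary search instead of re-digesting every protein. The in-memory layout below is an assumption made for illustration and says nothing about the actual digest database file format; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(peptide_masses_by_protein):
    """Flatten {protein_id: [masses]} into parallel lists sorted by mass."""
    pairs = sorted(
        (mass, pid)
        for pid, masses in peptide_masses_by_protein.items()
        for mass in masses
    )
    masses = [m for m, _ in pairs]
    ids = [pid for _, pid in pairs]
    return masses, ids


def lookup(masses, ids, search_mass, precision_ppm):
    """Return the IDs of all proteins with a peptide inside the mass window."""
    tol = search_mass * precision_ppm / 1e6
    lo = bisect_left(masses, search_mass - tol)
    hi = bisect_right(masses, search_mass + tol)
    return ids[lo:hi]
```

The index is built once per cleavage agent and mass file, which mirrors the fact that a digest database has to be generated before searching.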

Setup digest mass databases                                                           8.2

Before making a digest search you have to set up a number of parameters in the Setup|Setup system dialog box on the ‘Peptide src.’ page (Chapter 5.5). Furthermore, you have to generate digest databases based on a protein sequence database in FastA format.

Directories

The digest database directory (Setup, directories page) specifies the directory where GPMAW looks for digest databases. By default this directory is C:\GPMAW\DATABASE, but it can be located anywhere on a local hard drive, CD-ROM or network. It is strongly recommended that you place the digest database on your local hard drive as the actual search is heavily I/O dependent. The protein database itself can be placed on a slower medium (e.g. network or CD-ROM) as the speed penalty in retrieving sequences is much less.

Peptide search parameters

The peptide search tab specifies search parameters and scoring parameters. The search parameters can be changed in the 'Digest mass search parameters' page of the Setup system (Chapter 5.5).

Make digest database

Digest databases can be generated from most protein databases in FastA or PIR/NBRF format (see Appendix B). Swiss-Prot is in a different format and is not accepted directly as input, but has to be converted to FastA format first. The EMBL and NCBI non-redundant protein databases need to be modified slightly as the entries can have extremely long sequence names.

The conversion of databases and reduction of complexity can be carried out by the ‘Dbindex’ utility (Appendix B). More detailed information on the databases and how to obtain them can be found in Appendix B.

The creation of digest databases is carried out by answering a series of questions in a multipage dialog box (a ‘wizard’). Before you start the wizard you should make certain that you have a proper FastA formatted database ready (see Appendix B) and that you have sufficient space on your hard disk. The final databases will typically take up space corresponding to approximately one quarter to one third of the original database.

The wizard is started from the main menu option Setup|Make digest database.

Mass file: The initial choice in the wizard is to select the mass file pertinent to the digest search. The drop-down selection box is similar to the one in the main menu. The choice is usually between different modifications of cysteine. If you choose the default file AA_MASS.MSS (i.e. Cys is defined as mass 102/103) you should also choose whether Cys is in the oxidized or reduced state. Press the ‘Next’ button to go to the next choice.

The selections made on each page of the wizard are shown in the left-hand list. You can at any point use the ‘Previous’ button to go back and make changes.

 

Database: In the top edit box you enter the position of the FastA formatted protein database to convert. You can either enter the file path and name manually or you can use the ‘Open file’ button to the right of the edit line. The ‘Output directory’ is where you want to place the digest database. If the ‘Syncronize..’ check-box is checked the output directory will match the database directory. If you want the output to be placed in a different directory than the database you have to un-check this box before entering/selecting the output directory.

Note: You will not be able to proceed from this page before you have selected a valid digest database.

Next you have to select the cleavage agent (enzyme). The drop-down selection box is similar to the one in automatic digest (Chapter 9.1). This is also where you go to make necessary changes. The Cleavage parameters are changed automatically when you make a selection of an enzyme (cleavage agent). However, this box can be edited if you need to make changes.

The ‘Filename’ determines the name of the final digest database file. This is created automatically from the first four characters of the enzyme (cleavage agent) and the first four characters of the mass file. You can change the name to a more appropriate one before going to the next page.

In the final page of the wizard you can add a comment to the digest database. By default information regarding the state of Cys is included, but you can enter any information up to a maximum of 80 characters. When you press the ‘Finish’ button, the digest database files will be created. This is typically a process that takes 1-5 minutes. A dialog with a progress meter will show the development of the database. As the protein database is an ASCII (text) file, the actual state of the progress meter will only be an approximation.

When the creation of the digest database is finished you will see a temporary dialog stating that the mass file has been reinstated. This is because during creation of the digest database, the mass file of your choice (wizard page 1) has been temporarily loaded.

Digest mass search - data input                                                       8.3

Database selection

When you start a search the program will ask you to select a digest database if no database has been previously selected or the previously selected database is no longer present. If you have previously run digest mass search and the previously selected database is still present, it will be selected automatically.

The ‘Open database’ dialog box is a standard Windows open file dialog. Make certain that you open a digest database (a .DA2 file) that has been prepared based on the correct mass file (particularly modifications of cysteine). Make also certain that you have the same mass file selected when you carry out the actual digest database search.

You can change database at any time by selecting the ‘Database’ button.

Database information

Pressing the ‘Info’ button when a database is opened shows the database information dialog.

Select database: By clicking on the drop-down list box at the top, you can select between the digest databases present in the currently selected digest database directory.

Database information: The panel below will show the characteristics and data entered when the currently opened database was created.

Database on-line: If the protein database is present in the current directory or the location specified in line three, the database is available for information during searching, and the message 'Database available on-line' will be displayed below the panel. If not, the message will be 'Database not available!'. In this case you will not be able to retrieve a sequence, obtain pI information, or view the extended report.

Input of search data:

In the left column you enter the peptide masses to search for. Whenever a mass is entered manually, it will be selected in the next column (indicated with a check-mark, 'X'). Masses can be selected and deselected by checking and unchecking this box. In the next column you can enter a sequence or composition to search for. The sequence has to be in the standard 1-letter residue code. The search program searches first for a peptide match, and subsequently for the sequence/composition. In the last column you enter S if you want to search for a sequence or C to search for a composition.

Peptide mass list

Edit and pre-screen: Enables you to quickly select, de-select and remove masses from the search list. You can also pre-screen the list against a pre-compiled list of masses (e.g. a list of autodigest and/or common background peaks). See also mass search, Chapter 6.1.

Load and Save: Enable you to load and save peptide mass lists. GPMAW can read peak lists from PerSeptive (GRAMS), Bruker Daltonik and Hewlett Packard laser-tof mass spectrometers (other file formats will be supported in the future, please check the online help or contact Lighthouse data). Saving is only supported for GPMAW's own peak file format (.PKS, see Appendix A).

Search limits:

Mass range: The minimum and maximum mass of the protein to search for. Used to select only part of the database. You should always give a large allowance for variations in mass assignments (pre- and pro-proteins, fragments etc.). The low limit can usually be left at 10-20 kDa. This will leave out a large number of fragments and very small proteins that in almost all cases will be irrelevant for the mass search. The high limit can be set at 100-200 kDa. This is to remove the influence of a small number of very large proteins in the database that tend to give false positives due to the large number of random hits. This is particularly important if you run the current setup with the ‘Score type’ as ‘Linear’, and less important if you have selected ‘Score divided by the square root of the number of peptides’ (Score/Sqrt(NumPep)).

Precision: Precision of the mass data obtained. This will either be in % or ppm as defined in Setup (Chapter 5.1).

Min. Prec.: Minimum precision of the mass data. If you have difficulties in assigning mass data with absolute precision, you can set this to the best attainable precision, otherwise set it to 0.0.

Monoiso<: This field determines the crossover point for monoisotopic masses. If you have a high-resolution mass spectrometer your low mass ions will usually be isotopically resolved, enabling you to read the more precise monoisotopic mass. However, above a certain m/z you can no longer resolve the isotopes and you have to revert to average masses. If you only use average masses, you set the ‘Monoiso<’ value to 0.

Note: If you enter monoisotopic mass values, you have to enter a value in the ‘Monoiso<’ field higher than the largest monoisotopic mass in your list!

Max. overlap: Specifies the maximum number of overlapping peptides that can be allowed in a search mass (e.g. the tryptic peptide GFESRNITK contains an internal tryptic cleavage site and is thus an overlapping peptide with a value of 1). Searches are much faster using a value of 0, but a value of 1 or 2 will usually give a more realistic search pattern (see also 'Optimize' under results below).

Min. hits: This value sets the minimum number of peptides that have to match the input masses before being added to the score list. As the score list is sorted during the search, the highest scores are always kept in the list even when there is an 'overflow' of hits. A low value will slow down the search while important hits might be lost with a high value.

Mass type: M-H, M or M+H has to be selected depending on the input mass type.

Change limits: If any of the above parameters has to be changed, you press this button and make the changes in the resulting dialog box. The ‘Save as default’ check-box enables you to save the entered values as default values. Default values can also be entered in the ‘Setup dialog’ on the ‘Digest src.’ page (Chapter 5.5).
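Three of the search limits above (‘Mass range’, ‘Monoiso<’ and ‘Max. overlap’) can be illustrated with a short sketch. The function names are hypothetical, the cleavage rule is the usual textbook rule for trypsin, and treating the ‘Monoiso<’ value as a strict upper bound is an assumption based on the field name.

```python
def protein_in_range(protein_kda, low_kda, high_kda):
    """'Mass range': only proteins inside the window are searched."""
    return low_kda <= protein_kda <= high_kda


def mass_kind(mass, monoiso_below):
    """'Monoiso<': 'M' (monoisotopic) below the crossover, else 'A' (average)."""
    return "M" if mass < monoiso_below else "A"


def count_overlaps(peptide):
    """'Max. overlap': number of internal, uncleaved tryptic sites.

    Assumes trypsin cuts after K or R unless the next residue is P.
    The C-terminal residue is not counted, as it ends the peptide.
    """
    return sum(
        1
        for i in range(len(peptide) - 1)
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


def peptide_allowed(peptide, max_overlap):
    """True if the peptide has no more overlaps than the search allows."""
    return count_overlaps(peptide) <= max_overlap
```

For the example peptide GFESRNITK, `count_overlaps` finds the internal arginine and returns 1, so the peptide is only considered when ‘Max. overlap’ is 1 or higher.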

 

Digest mass search - status window                                                8.4

When you start a search, GPMAW will check whether your currently loaded mass file is identical to the mass file used to generate the digest database. If there is a difference you will be asked if you want to change to the database mass file. The ‘correct’ mass file is only important when you look at the ‘Detailed report’ where the calculated mass values are calculated dynamically. In all other instances, the values are based on the ones saved in the digest database.

During a search, the status window will be displayed. The red horizontal bar shows the progress of the search. When the search is finished the dialog closes and the result list is displayed.

Maximum score: The maximum score encountered in the search.

Proteins searched: The number of proteins in the database that fall within the specified mass window (min. and max. mass).

Total in database: Total number of proteins encountered. When the search has finished, this value equals the total number of proteins in the database.

Found: The number of proteins in the database that have at least the 'min. hits' number of peptides with a mass that fits the search specifications. A '>' in front of the number means that the maximum number of proteins that can be reported has been exceeded (at present 500).

Digest mass search - results                                                             8.5

The results dialog box lists all the proteins found that match the search criteria, up to a maximum of 500. If this limit is exceeded, only the 500 highest-scoring protein hits will be listed.
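One way to keep only the highest-scoring proteins while a search is still running is a bounded min-heap. The sketch below is an assumption about how such a list could be kept, not a description of GPMAW's internals; the class name and its attributes are hypothetical.

```python
import heapq


class ScoreList:
    """Keep the `capacity` best (score, protein_id) entries seen so far."""

    def __init__(self, capacity=500, min_hits=4):
        self.capacity = capacity
        self.min_hits = min_hits
        self.overflow = False  # True once more proteins qualified than fit
        self._heap = []        # min-heap: the weakest kept entry is on top

    def add(self, score, hits, protein_id):
        if hits < self.min_hits:
            return  # too few matching peptides, never enters the list
        entry = (score, protein_id)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            self.overflow = True
            # Push the new entry, then drop whichever entry is weakest.
            heapq.heappushpop(self._heap, entry)

    def ranked(self):
        """Entries sorted with the best score first."""
        return sorted(self._heap, reverse=True)
```

The `overflow` flag corresponds to the '>' shown in front of the ‘Found’ count in the status window when the reporting limit has been exceeded.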

The dialog is divided into three parts:

1.       Top left shows search information on the protein selected in the score table. Only the first occurrence of peptide ‘hits’ will be shown.

2.       Top right shows graphically the precision of the hits listed in the left box.

3.       Bottom shows the actual protein score table including number of hits, score, short name, ID, mass coverage, pI and mass.

Score table

The score table is initially sorted by the score (column 3). After performing optimization, the list will be sorted by the optimized score (column 4). By right-clicking on the table you can select either sorting order from the pop-up menu.

Table content:

#: Line number of table.

Hit: Number of peptides that fit the input masses without/with optimization. If several peptides fit the same input mass, only the first one will be reported.

Score: The score calculated for the given protein based on the scoring system specified under Setup (Chapter 5.5, Digest mass search). If a given search peptide results in more than one ‘hit’, only the first is displayed and counted as part of the score even if later ‘hits’ have a higher precision. The extended report (see below) displays all possible ‘hits’ in the target protein.

Opt. sc.: The optimized score after optimization (see below – Toolbar | Optimize).

Name: Name of the protein truncated to 32 characters (except for the note below). The full name of the database entry is shown in the detailed report (see below). Notice that even though you search a non-redundant database you will often experience multiple hits of the same protein. This is because non-redundant databases are seldom really non-redundant, but contain a multitude of proteins with only one or a few amino acid differences. This is particularly noticeable when you hit a protein that has been analyzed by X-ray or NMR analysis (e.g. the proteins with ‘pdb’ in the ID in the figure of the score table).

ID: ID or accession number of the protein. These numbers are unique and enable a positive identification in the relevant databases. If the database used is a combined (non-redundant) database (like Owl, NCBI-nr, EMBL-nr) the ID field will often show the origin database. Likely abbreviations are: sp – Swiss-Prot; spn – Swiss-Prot New; spt – Swiss-Prot TREMBL; tr – TREMBL; trn – TREMBL new; gp – GenPept; PIR – Protein Identification Resource; pdb – Protein Data Bank (Brookhaven 3D structure database).

Cov.: Coverage of the identified peptide in the given protein (calculated as mass percentage).

pI: The calculated pI of the protein. This value is only available if the feature has been turned on in Setup (Chapter 5.5, Digest mass search) and the database is available on-line. The algorithm used is unable to calculate the pI for some proteins; in these cases the pI is reported as 0.0. Three different pI tables are available for calculation, please see Chapter 5.6 Setup Advanced.

kDa: The mass of the intact protein.

Note: If the complete FastA formatted database is available on-line, the first 25 proteins will be loaded and the pI and full name entered into the list irrespective of the setting of the pI calculations.

Hit evaluation help

In order to evaluate quickly whether a ‘hit’ is significant or not, three panels along the top of the ‘hit list’ display relevant information on the currently selected ‘hit’. Whenever a new line is selected in the hit list, all three panels are updated. Two of the panels, the left-most ‘Mass list’ and the central ‘Mass graph’, can be turned on and off by checking the corresponding boxes in the command bar. The status of these check boxes is remembered between sessions.

The three evaluation windows are divided by resizable splitters, which you may grab and move with the mouse. Likewise the division between the hit list and the evaluation windows is a resizable splitter.

If the entire window is expanded horizontally, only the ‘Search precision’ window will expand. The other windows have to be expanded manually.

Search information

When a protein is selected in the score table, information about the number and precision of the peptides constituting the ‘hit’ for the selected entry will be shown in the top left score table. If multiple ‘hits’ are present in the protein, only the first ‘hit’ will be displayed and counted as part of the score.

The list box displays the input peptides in the left column (if the input mass list was entered as M+H+, the mass of a proton will be subtracted). The central column lists the ‘hit’ peptides from the database and the right column lists the mass difference in Da and in ppm (parts per million) – a precision of 0.1% is equal to 1000 ppm.
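The arithmetic behind these columns is simple enough to sketch. The function names are hypothetical, and the sign convention (measured minus calculated) is an assumption; the proton mass and the percent/ppm relation are standard values.

```python
PROTON = 1.00728  # mass of a proton in Da


def to_neutral(mass, mass_type):
    """Convert an input mass of type 'M-H', 'M' or 'M+H' to the neutral M."""
    if mass_type == "M+H":
        return mass - PROTON
    if mass_type == "M-H":
        return mass + PROTON
    if mass_type == "M":
        return mass
    raise ValueError(f"unknown mass type: {mass_type}")


def mass_difference(measured, calculated):
    """Return the difference in Da and in ppm (measured minus calculated)."""
    diff = measured - calculated
    return diff, diff / calculated * 1e6


def percent_to_ppm(percent):
    """1% is 10 000 ppm, so 0.1% is 1000 ppm and 0.02% is 200 ppm."""
    return percent * 1e4
```

For example, an input value of 1001.00728 entered as M+H corresponds to a neutral mass of 1000.0 Da, and a deviation of 0.1 Da at that mass is 100 ppm.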

Just above the score table there is a blue line representing the selected protein. The green lines show the relative size and position of the ‘hit’ peptides in the protein.

Mass graph

The mass graph displays theoretical peptide masses for the currently selected protein as well as the input search masses. The graph is essentially identical to the corresponding graph in mass search (Chapter 6.1).

Theoretical peptide masses are shown as gray lines with ‘straight’ peptides without overlaps going up, while peptides containing a single overlap (or missed cleavage) will point down.

Search peptides will be red and pointing up when there is a hit, and they will be blue and point down when they do not fit with any theoretical mass.

The graph can be zoomed by clicking on it twice, once on the upper and once on the lower zoom limit. After the first click, the mass clicked on will be shown in the bottom right corner. You can reset to default mass display by double-clicking in the window.

Search precision

The precision of the currently selected ‘hit’ in the protein score table can be viewed graphically in the top right box.

The x-axis shows the masses from 500 to 4000 Da and the y-axis the precision in ppm. The scale is the current mass search precision defined in the mass input dialog.

Each peptide mass ‘hit’ is displayed as a short red bar. If the ‘hit’ is the result of optimization it will be drawn in dark red color.

The graph is a visual aid in determining the validity of the current ‘hit’. In the above example there is a correct ‘hit’ having a calibration offset of approx. 45 ppm while below is a typical ‘random’ hit that shows random fluctuations around zero. Another typical calibration error is when you have a more or less constant offset. This will show up as a sloping line.

Although the graph is a convenient aid in determining false positives, you should be careful in the interpretation, as peptide masses are not randomly distributed but fall into mass ranges.
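A systematic calibration error of this kind can be estimated by fitting a straight line through the mass errors of the hits and looking at what remains. The sketch below uses an ordinary least-squares fit in plain Python; it is an illustration of the principle only, and the way GPMAW performs the linear fit in ‘Optimize’ may differ. The function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all masses are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_errors(masses, errors_ppm):
    """Errors that remain after removing the fitted calibration line."""
    slope, intercept = linear_fit(masses, errors_ppm)
    return [e - (slope * m + intercept) for m, e in zip(masses, errors_ppm)]
```

For a correct hit with a calibration error the residuals collapse towards zero, whereas the errors of a random hit stay scattered after the fit.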

Toolbar

The buttons in the toolbar are placed in a band that can be ‘torn off’ with the mouse and positioned anywhere on the screen. When the band is a free-floating window it can be resized.

The table commands are from left to right:

Load tbl. and Save tbl.: Enable you to load and save the score table. The main reason for saving the score table is to compare different digests of the same protein (see below). If you load a table from disk, you will not be able to view the information that requires the search data (optimization) or the protein database on-line (pI).

Optimize: The optimization works only on the proteins in the score table. In the System setup | Digest mass search (Chapter 5.5) you can turn on the optimization by linear fit and set the number of overlaps (missed cleavages) to use. The linear fit is carried out for each of your hits against the given protein in the database. This will increase the score for all proteins; however, correct hits are more likely to benefit than chance hits. The article by V. Egelhofer contains more details. At the same time the number of overlaps will be increased to the number specified in the setup.

Print: Print the score table. You will be given the option of printing the first page of the list (default) or the whole list. In most cases the first page will be sufficient.

Redo: Repeat the search. You will be returned to the data input dialog box with all the search data intact, thus enabling you to redo the search using other parameters.

Redo NI: Similar to the above command except that only peptide masses not identified for the currently selected protein will be reused for the next search, i.e. the identified masses are deleted.

Get sequence: If the database is available on-line, this button will be enabled, and by pressing it, the currently selected protein from the score table will be retrieved from the database and displayed in GPMAW as a sequence window. Peptides that have been identified during the mass search will be underlined and colored (Pre/post AA see Chapter 5.3, System colors).

Extended report: Displays the extended report (also called the second pass search) for the currently selected protein. See below. This option is only available if the database is on-line.

Database information: Displays a dialog showing the search database name, database comment, enzyme used in creating the database, enzyme cleavage specificity and number of proteins searched/present in the protein database. The information is essentially the information saved in the ‘.INF’ file.

Setup: Opens the Setup system dialog box on the digest search parameter page. Any changes you make will not take effect until your next search.

Help: Context sensitive help.

Exit: Closes the digest search result table.

Detailed report (second pass search)

Select the ‘Protein|View report’, the ‘Detailed report’ button in the toolbar or just double click on an item in the score table to open the ‘Detailed report’ dialog.

This dialog lists the available information on the currently selected protein in a separate window. If you select a different protein in the score table and request the Extended report, a new window will not open, but the content of the sequence report window will change to reflect the newly selected protein. The report only displays non-optimized data (i.e. without mass shift and max. overlaps).

In addition to straight ‘hits’ the report will also show potential ‘hits’ that correspond to oxidized methionines (i.e. peptides having a mass 16 Da higher and containing at least one methionine).
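The oxidized methionine check can be sketched as follows. The function name and the input layout are hypothetical; the mass shift is the monoisotopic mass of one oxygen atom, and only a single oxidation per peptide is considered, as in the description above.

```python
OXIDATION = 15.9949  # monoisotopic mass of one oxygen atom in Da


def oxidized_met_hits(measured, peptides, precision_ppm):
    """Find peptides that fit `measured` when one methionine is oxidized.

    `peptides` is a list of (sequence, calculated_mass) pairs taken from
    the digest of the selected protein.
    """
    hits = []
    for sequence, calculated in peptides:
        if "M" not in sequence:
            continue  # no methionine, so oxidation cannot explain the shift
        target = calculated + OXIDATION
        if abs(measured - target) <= target * precision_ppm / 1e6:
            hits.append(sequence)
    return hits
```

A peptide without methionine is never reported, even if its mass happens to fit the measured value minus 16 Da.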

The detailed report includes:

The full protein name as it appears in the FastA database.

ID is accession number. Other accession numbers may appear in the name.

The mass is the average mass calculated based on the currently selected mass file.

The pI is a calculated value and should be regarded as indicative only. In ‘Setup - Advanced’ you can choose between different tables for calculating the pI. Appendix C lists the tables for pI calculation.

The sequence is shown with identified peptides in red upper case characters and non-identified residues as blue lower case characters. When printing the report, the sequence will be in black and white, but you will be able to differentiate identified residues due to upper/lower case.

The coverage is percentage of residues in identified peptides (unlike the coverage in the hit list which is mass percentage identified and may be higher due to multiple peptides covering the same residues).
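The difference between the two coverage values can be shown with a small sketch. How GPMAW calculates them internally is not documented here, so both functions are assumptions that follow the wording above: residue coverage counts each covered residue once, while mass coverage sums the masses of the identified peptides.

```python
def residue_coverage(protein_length, spans):
    """Percentage of residues covered; `spans` are 1-based inclusive ranges."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


def mass_coverage(protein_mass, peptide_masses):
    """Percentage of the protein mass accounted for by the hit peptides."""
    return 100.0 * sum(peptide_masses) / protein_mass
```

Two identified peptides spanning residues 1-10 and 6-20 of a 100-residue protein cover 20% of the residues, but because residues 6-10 are contained in both peptides their mass is counted twice in the mass percentage.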

The peptide mass table: 

Unlike the mass table presented in the overall hit list above, which only lists the first occurrence of multiple peptides that fit the search profile, the peptide mass table in the detailed report includes all peptides that fit the search mass profile.

·       Measured: Measured mass (data input corrected for protonation)

·       Calculated: Calculated masses based on database entry (‘No match’ means that the given input peptide was not identified in the protein displayed).

·       AM/ov: A - average mass; M - monoisotopic mass; ov - number of overlapping cleavage sites.

·       Diff: Mass difference (in Da.) between measured and calculated mass.

·       0/00 or ppm: Mass difference as parts in 1000 or ppm (parts per million) as selected in Setup (Chapter 5.1).

·       Res: Position of identified peptide.

·       Seq: Identified peptide sequence with one preceding and following residue.

Below the peptide mass table is: Average differences, number of matches and mismatches. Then follows a list of potential ‘hits’ if the methionines are oxidized.

Finally, some reference data on the database and the input parameters are given. If the ‘List peptides’ check-box in the toolbar is checked, the report ends with a list of all theoretical peptide masses for the given protein as found in the digest mass database (peptides with overlaps are not listed).

The toolbar.

Prev. / Next: Display the previous/next protein in the ‘hit’ list. If you double click in the score table of the parent window, the report will be updated to reflect the ‘hit’ clicked upon.

Copy: Copies the content of the report window to the clipboard. The copy on the clipboard will not contain any formatting characters.

Save: Saves the content of the report to disk. This is an ASCII (text) file and can easily be incorporated in a report. It is not particularly amenable for spreadsheet analysis.

Print: Prints the report.

Close: Close the report window.

List peptides: If checked a list of all potential peptides in the target protein will be listed at the bottom of the report.

Help: Open the context sensitive help.

 

Multiple digest mass search                                                             8.6

The multiple digest mass search option enables you to run digest search on multiple peak lists without operator intervention.

You start by selecting the database. Like in the normal database search described above, the most recently used database is automatically selected if present.

The peak files to analyze are either dragged into the list box in the left part of the dialog from File Explorer, or they are selected using the ‘Open file’ dialog box that can be activated by pressing the ‘Add file(s)’ button. Multiple files can be selected in one operation.

Files can be removed from the list by highlighting the file name and pressing the ‘Remove file’ button.

Each search shows a dialog with a progress bar like the single search above. After the last search you are back in GPMAW.

Note: The multiple digest mass search only works on disk files (peak lists). Furthermore, all searches have to be performed using identical parameters.

Options.

Save report: At the end of each search, the result list is saved as a .PMS file which enables you to retrieve the results, perform optimization, view the detailed report and rerun the search using different parameters.

Pages to print: Determines how many pages of the result list are to be printed. The default is 1 page (if you save the report, you can go back and print more pages).

Change limits: These are the search limits for the current digest search. The limits are identical to the search limits for the single digest search command (see above).

Combine digest mass search                                                            8.7

If the specificity of your digest mass search is too low, you can perform multiple mass searches, either using different input parameters or, preferably, different digests, and combine the search results afterwards.

After you have performed a search, you can save the search results in a PMS file (see above and Appendix A). You then select Search|Combine digest search.

In the left-hand dialog box, you select the digest results that you want to combine (2-5), press the button and all the selected digest search files will be compared and entries having the same ID will have their scores combined. The final list will be sorted and displayed in the 'Combined results' table.

Note: The list of PMS files is taken from the currently defined ‘User directory’ (see chapter 5.4).

Notice that the table does not have any links to the original database. The different digest searches have to be carried out on the same database, as the comparison is based on the ID field and different databases will most likely have different IDs for the same protein.
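Combining the results of several digest searches by ID can be sketched as below. The text only states that the scores are combined, so adding them is an assumption, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def combine_results(tables):
    """Combine 2-5 score tables, each a {protein_id: score} mapping.

    Scores of entries with the same ID are added (assumed rule) and the
    combined list is returned with the highest total first.
    """
    if not 2 <= len(tables) <= 5:
        raise ValueError("select between 2 and 5 result tables")
    totals = defaultdict(float)
    for table in tables:
        for protein_id, score in table.items():
            totals[protein_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A protein that scores moderately in two different digests ends up above a protein that scores well in only one of them, which is the reason for combining searches in the first place.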

Protein mass search                                                                         8.8

Instead of using the peptide masses from a digest, you may also use the mass of the intact protein.

However, this approach is fraught with dangers. Unlike the digest mass search where the information is usually redundant, the search for a protein only contains a single piece of data. Furthermore, the likelihood of the protein being modified is very high. For example, an initiating methionine may be present or absent in the protein and/or in the database sequence. Residues may be chemically modified (e.g. oxidation of methionine), the possibility of adduct ions is greater, and, finally, the protein may be post-translationally modified.

If you take the appropriate precautions, you may be able to use the database to search for proteins.

Database

You can use the same kind of databases as used for digest mass search (Chapter 8.2). However, you have to make certain that the correct mass file was used for constructing the digest database. For peptide mass searches you will often derivatise cysteine (e.g. with vinylpyridine) while proteins will often be either oxidized or reduced.

Data input and searching

Data input is quite simple, as the only parameters needed are:

·          Protein mass

·          Precision

·          Database

The mass of the protein is taken as the average mass in Dalton. The precision is also in Dalton and is +/-. The database option works just like in peptide database searching, Chapter 8.3. If a search database has been used, the previously opened database is automatically opened. You can select a new database by pressing the ‘Database’ button and selecting a new database (.DA2 file). Pressing the ‘i’ button will open a dialog with information on the protein database.

Only integer data can be entered in the ‘Search for’ and ‘Precision’ data fields.

When search data has been entered you press the ‘Do search’ button. A status bar moves across the dialog box just below the buttons, indicating the progress of the search.

The time for a search is in the order of 10-15 seconds for a non-redundant database (~500 000 proteins).

The results are sorted with the best hit at the top. If the ‘hit’ mass is ~131 Da higher than the search mass, it is because the ‘Check for initiating Met’ is checked, and the corresponding protein sequence both fits with an additional Met and the sequence actually starts with Met.
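The protein mass search with the initiating methionine check can be sketched as follows. The function name and input layout are hypothetical, ranking by mass deviation is an assumption about what ‘best hit’ means, and 131.19 Da is the standard average residue mass of methionine.

```python
MET_AVG = 131.19  # average residue mass of methionine in Da


def protein_mass_search(search_mass, precision, proteins, check_init_met=True):
    """Return (protein_id, mass) pairs sorted with the closest fit first.

    `proteins` is an iterable of (protein_id, sequence, average_mass);
    `precision` is the allowed deviation in Da (plus or minus).
    """
    hits = []
    for protein_id, sequence, mass in proteins:
        deviation = abs(mass - search_mass)
        if deviation <= precision:
            hits.append((deviation, protein_id, mass))
        elif check_init_met and sequence.startswith("M"):
            # The database sequence may still carry an initiating Met
            # that is absent from the measured protein.
            deviation = abs(mass - (search_mass + MET_AVG))
            if deviation <= precision:
                hits.append((deviation, protein_id, mass))
    hits.sort()
    return [(protein_id, mass) for _, protein_id, mass in hits]
```

A database protein that is about 131 Da heavier than the search mass is only reported when its sequence actually starts with Met and the option is switched on.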

Options

Check for initiating Met: When checked, proteins in the database will be checked for the presence of a methionine in position 1. If the methionine is present in the database sequence and the protein mass fits the search mass + the mass of Met, the protein will be added to the search results.

Print: Prints the search results including the search data.

Retrieve: Select a hit result and press the ‘Retrieve’ button to load the corresponding protein into GPMAW as a new sequence window. Alternatively you may double click on the relevant entry to open it as a sequence window in GPMAW.

References (digest mass search)

W.J. Henzel, T.M. Billeci, J.T. Stults & S.C. Wong, Proc. Natl. Acad. Sci. (USA) 90, 5011 (1993).

M. Mann, P. Højrup & P. Roepstorff, Biol. Mass Spectrom. 22, 338 (1993).

D.J.C. Pappin, P. Højrup and A.J. Bleasby, Current Biology 3, 327 (1993).

P. James, M. Quadroni, E. Carafoli & G. Gonnet, Biochem. Biophys. Res. Commun. 195, 58 (1993).

J.R. Yates, S. Speicher, P.R. Griffin & T. Hunkapiller, Anal. Biochem. 214, 397 (1993).

V. Egelhofer, K. Büssow, C. Luebbert, H. Lehrach and E. Nordhoff: Improvements in Protein Identification by MALDI-TOF-MS Peptide Mapping, Anal. Chem., 72, 2741-2750 (2000).