Specification of FASTA files
PPP requires its input in (multiple) FASTA format, subject to the following conditions:
- The input must be plain (ASCII) text containing only "printable" characters, i.e. you should be able to view the file in a plain-text editor such as Windows Notepad. Microsoft Word .doc files, and any other files that are not saved as "plain text", will not work!
- The file must not contain a "global header", i.e. lines that begin with multiple ">" characters.
- Each sequence must be given as a header, i.e. a single description line that begins with a single ">" character, followed by one or more lines that contain the amino acid sequence in IUPAC one-letter code. Empty lines should be avoided.
Here is an example of the content of a valid multiple FASTA file (containing 3 sequences), as accepted by PPP:
>Q9NEU5|GSCR2_CAEEL Hypothetical protein Y39B6A.33 in chromosome V - Caenorhabditis elegans
MVAGKRTGAAKGSRHNKKYWRKGTNIDDIEDSIHIKSRQAATGGVISEMKDEDLFIVDRT
ATANKPVVPKLTKKQQAALEKITKNITQEHVTLPKPSTTSKILKKPAKLPRGNAILALKK
GPKAAAPAAKKKNFDVWTTDLTPKIPKSKLENQEAAEHFLKVVKKKQPKTPGKSITSLLP
AVQIAEGGASYNPESAEYQEYVAKIAGEEQKLIDHEAKIKAGIEPQWEKVTTEHERFLEM
AEGLRIHPKYGKDDEEEEEAGNSEKSMKTGGEAEPKSQRVECDRMTKEQKKKKAKAQKLD
KEEKRRLEEKAKEQDSHNVYRTKQLHKELDEEEKQRHEESEVRKKEKLINKLTKRQQLGK
GKFVDAEDPFLLQEELTGNLRQLKPQGHVLDDRMKSLQRRNMLPIGGDKEKRRIKNRLKS
KVVEKRSAKNIVKGSRVI
>Q9TZM2|MO25L_CAEEL Hypothetical MO25-like protein T27C10.3 in chromosome I - Caenorhabditis elegans
MFITTWCSNDAPDSSVPRTRSKTKFTPKVRDLLFHCRYEIVDIVDEGYDMPASIGFYNDI
IREFVTDDFCLTSLFEDNGQDSNNIQHRNDGCIWSIFDRLMNTNKFRDFDVIQGTFDTLQ
IIFFTNHESANNFIKNNLPRFMQTLHKLIACSNFFIQAKSFKFLNELFTAQTNYETRSLW
MAEPAFIKLVVLAIQSNKHAVRSRAVSILEIFIRNPRNSPEVHEFIGRNRNVLIAFFFNS
APIHYYQGSPNEKEDAQYARMAYKLLNWDMQRPFTQEQLQDFEEGWTHQKKMREEQLVRT
CFHDNNPRLPKVNHVYRTRIIPNQQFLREPLKFGNPFRQ
>O18211|MO25M_CAEEL Hypothetical MO25-like protein Y53C12A.4 in chromosome II - Caenorhabditis elegans
MLKPLFGKADKTPADVVKNLRDALLVIDRHGTNTSERKVEKAIEETAKMLALAKTFIYGS
DANEPNNEQVTQLAQEVYNANVLPMLIKHLHKFEFECKKDVASVFNNLLRRQIGTRSPTV
EYLAARPEILITLLLGYEQPDIALTCGSMLREAVRHEHLARIVLYSEYFQRFFVFVQSDV
FDIATDAFSTFKDLMTKHKNMCAEYLDNNYDRFFGQYSALTNSENYVTRRQSLKLLGELL
LDRHNFSTMNKYITSPENLKTVMELLRDKRRNIQYEAFHVFKIFVANPNKPRPITDILTR
NRDKLVEFLTAFHNDRTNDEQFNDEKAYLIKQIQELRV
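If you want to check a file against the conditions listed above before submitting it, a small script can do this. The following Python function is only a minimal sketch and is not part of PPP. The name check_fasta, the exact set of accepted residue letters (including the ambiguity codes B, Z, X, J and the rare residues U, O), and the acceptance of lowercase letters are assumptions made for illustration; the server may be stricter or more lenient.

```python
# Hypothetical helper, not part of PPP: checks text against the conditions above.
# Assumed residue alphabet: the 20 standard amino acids plus B, Z, X, J, U, O.
IUPAC_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZXJUO")


def check_fasta(text):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    header_line = None    # line number of the current record's header, if any
    has_residues = False  # True once the current record has a sequence line
    records = 0

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string left by a final newline

    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r")  # tolerate Windows line endings
        # Condition 1: printable ASCII only (codes 32 to 126).
        if any(not (32 <= ord(ch) <= 126) for ch in line):
            problems.append(f"line {lineno}: non-printable or non-ASCII character")
            continue
        if not line.strip():
            problems.append(f"line {lineno}: empty line (should be avoided)")
            continue
        # Condition 2: no "global header" lines starting with multiple ">".
        if line.startswith(">>"):
            problems.append(f"line {lineno}: line begins with multiple '>' characters")
            continue
        # Condition 3: one header line, then one or more sequence lines.
        if line.startswith(">"):
            if header_line is not None and not has_residues:
                problems.append(f"line {header_line}: header without sequence")
            records += 1
            header_line = lineno
            has_residues = False
        else:
            if header_line is None:
                problems.append(f"line {lineno}: sequence data before first header")
            # Lowercase letters are accepted here; this is an assumption.
            bad = set(line.strip().upper()) - IUPAC_AMINO_ACIDS
            if bad:
                problems.append(
                    f"line {lineno}: unexpected characters {''.join(sorted(bad))!r}"
                )
            has_residues = True

    if header_line is not None and not has_residues:
        problems.append(f"line {header_line}: header without sequence")
    if records == 0:
        problems.append("no sequences found")
    return problems
```

To use it, read your file as text (for example with open("my_sequences.fasta", encoding="ascii", errors="replace").read(), where the file name is a placeholder) and pass the result to check_fasta. Each returned entry names the line number and the condition that appears to be violated.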