New tools for MIAPE Generation
Emilio Salazar Doñate
Bioinformatics Group
CNB – CSIC

Overview
• The motivation.
• What is an API?
• What is the MIAPE API?
• How can I use it?
• A use-case of the MIAPE API: the Pride MIAPE Converter.
• Future plans.

Motivation
Reporting MIAPEs (Minimum Information About Protein Experiments) is recommended by the most prestigious journals.
At present, creating these reports is a laborious manual task, and scientists very often skip it.
On top of that, the duplication of data is obviously unnecessary: if the experiment data already exist (raw data, XML files, ...), why must I report them again?
The logical approach is to automate the generation of MIAPEs. That is what the Java MIAPE API aims to do.

API (Application Programming Interface)
An API is an abstraction that defines and describes an interface for interacting with a set of functions used by the components of a software system.
The software that provides the functions described by an API is said to be an implementation of the API.
An API is therefore not an application, but a tool to be used programmatically by third-party developers.

APIs can be classified by how they are used:
• Local: they are installed on the end user's machine (the client) and need no online interaction at all.
  - Example: the APIs bundled in the standard libraries of a programming language such as Java, C, ...
• Remote: they are provided by other machines (servers) somewhere on the internet. These servers normally have great computational power and help with heavy processes that ordinary computers could not execute.
  - Example: the BLAST service at EBI can be accessed programmatically via a web service.
• Hybrid: some components must be downloaded to the client and others require remote access.
  - Example: the Google Maps API.

Why use an API?
To illustrate the problem, consider a typical situation: how to sort a list of integers?
{2, 71, 38, 16, ..., An-1, An}

a) Beginners in programming:
Repeatedly search for the smallest remaining number and append it to the end of the sorted list.
Complexity: O(n²)

b) Computer geeks:

    function mergesort(array A[x..y])
    begin
        // a range with at least two elements is split in half
        if (y - x >= 1):
            integer m := int((x + y) / 2)
            array A1 := mergesort(A[x..m])
            array A2 := mergesort(A[(m + 1)..y])
            return merge(A1, A2)
        else:
            return A
    end

    function merge(array A1[0..n1], array A2[0..n2])
    begin
        integer p1 := 0
        integer p2 := 0
        array R[0..(n1 + n2 + 1)]
        while (p1 <= n1 or p2 <= n2):
            // take from A1 when A2 is exhausted or A1's head is not larger
            if (p1 <= n1 and (p2 > n2 or A1[p1] <= A2[p2])):
                R[p1 + p2] := A1[p1]
                p1 := p1 + 1
            else:
                R[p1 + p2] := A2[p2]
                p2 := p2 + 1
        return R
    end

Complexity: O(n log n)

If n = 1000:
a) about 1,000,000 operations (n²)
b) about 9,966 operations (n log2 n), less than 1% of a)

c) Using the Java Collections API:
Collections.sort(List list)
The Javadoc states: "The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance."

Java MIAPE API
The Java MIAPE API is a set of libraries and web services that provide functionality to retrieve, store and exchange MIAPE data in an efficient manner. It has four main modules:
• XML module: parses and creates XML files with MIAPE information.
• Database Manager: includes a web service that connects to the ProteoRed database.
• MIAPE Factory: allows the creation of MIAPE data regardless of the source of the information.
• Entity MIAPE: the connection between the modules.

API Architecture: [diagram]

Why use the Java MIAPE API:
• It is developed in Java, a widely used, platform-independent language.
• Its implementation is based on widely used design patterns to ensure performance.
• It is built in a modular way to encourage extension and reuse of code.
• It provides a storage system for the user (although a customized one can be used instead).
• The software can store and retrieve some of the HUPO-PSI XML standards (PRIDE at present) in MIAPE format.
• It is fully tested and documented.

Java MIAPE API Usage
• Manually, behind a user interface: the user enters the MIAPE data by hand.
  - It is tedious and time consuming.
  - It is very accurate, as the information is complete and not redundant.
  - It does not need intermediate sources such as XML.
  - Already available at http://www.proteored.org
• Automatically, by parsing an XML file.
  - It saves the user time, although it increases the computational time.
  - The accuracy is lower, as the mapping between the data is far from 100%.
  - If the experiment information is already stored as XML, this is the natural way to use the API.
• Integrated in third-party software.
  - The mapping between the data is done by the programmers, which means more work for them but none for the users.
  - The accuracy can reach 100% when properly used.
  - Full functionality is available only from Java.

A Use-Case: Pride-MIAPE Converter
[Diagram: Converter, Database, MIAPE Generator Tool]

Most common XML format: PRIDE.
File sizes vary widely: from less than 1 MB to 13 GB.

Issues:
• Mapping problems: the XML allows anything as a user parameter, which makes accurate identification of each value difficult.
    <cvParam cvLabel="PSI" accession="PSI:1000008" name="Ionization Type" value="ESI" />
• Text size problems: the XML allows text of any length in its fields, but the database has a limit. Storing an over-long value either throws an exception or truncates the text, losing information.
• File upload problems: a traditional web application has limits on file uploads, and since files can be as large as 13 GB, the development must find a workaround.
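The first two issues can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from the Java MIAPE API: the class name, the lookup table, the field name "ionizationType" and the 255-character limit are all hypothetical, since the slides give neither the real mapping rules nor the real database limit. It uses the DOM parser bundled with the JDK, which is acceptable for a single element; multi-gigabyte PRIDE files would need a streaming parser (SAX or StAX) instead.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CvParamMappingSketch {

    // Hypothetical column limit: the slides do not state the real database limit.
    static final int MAX_FIELD_LENGTH = 255;

    // Hypothetical lookup from a controlled-vocabulary accession to a MIAPE field name.
    static final Map<String, String> ACCESSION_TO_FIELD = new HashMap<>();
    static {
        ACCESSION_TO_FIELD.put("PSI:1000008", "ionizationType");
    }

    /**
     * Parses a single cvParam element and returns {fieldName, value},
     * or null when the accession is not in the lookup table.
     */
    static String[] mapCvParam(String cvParamXml) {
        Element param;
        try {
            param = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cvParamXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed cvParam element", e);
        }
        String field = ACCESSION_TO_FIELD.get(param.getAttribute("accession"));
        if (field == null) {
            return null; // unknown parameter: leave it for manual completion
        }
        return new String[] { field, fitToColumn(param.getAttribute("value")) };
    }

    /** Truncates text to the column limit and warns, instead of failing at insert time. */
    static String fitToColumn(String text) {
        if (text.length() <= MAX_FIELD_LENGTH) {
            return text;
        }
        System.err.println("Warning: value truncated from " + text.length()
                + " to " + MAX_FIELD_LENGTH + " characters");
        return text.substring(0, MAX_FIELD_LENGTH);
    }

    public static void main(String[] args) {
        String xml = "<cvParam cvLabel=\"PSI\" accession=\"PSI:1000008\" "
                + "name=\"Ionization Type\" value=\"ESI\" />";
        String[] mapped = mapCvParam(xml);
        System.out.println(mapped[0] + " = " + mapped[1]); // prints "ionizationType = ESI"
    }
}
```

Keying the lookup on the accession rather than on the free-text name is one way to make identification less fragile. Parameters with unknown accessions return null, so they can be left for manual completion rather than guessed.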
• A one-to-one mapping between the PRIDE file and the MS and MSI MIAPEs is not possible, but the conversion provides a very good starting point that includes the spectra and the identification of proteins and peptides (entering these manually is nearly impossible). The MIAPE Generator web tool is available to complete the missing information.
• Instead of a traditional web site, the application is a client-server application in which the file is transferred via FTP (File Transfer Protocol), which is far more reliable than HTTP for this purpose. The drawback is that the user needs to have Java installed on the computer.

A Use-Case: Pride-MIAPE Converter: Current State
A beta version is available at http://proteo.cnb.csic.es:9999/miape-webservice-pride/
It uses a test database and a test version of the MIAPE Generator Tool until a definitive version is online.
It should accept files of any size, although processing times vary a lot depending on network traffic.
The most illustrative example is to map to both the MS and MSI MIAPEs, as it uses all the information in the MIAPE and generates the mapping between the two MIAPE types.

Launch the application with Java Web Start.
First enter a user name and password (they must match an existing account in the ProteoRed database).
Select a PRIDE file from your local computer and the type of MIAPE to generate, and press Start.
After some time, the application will have uploaded the file to the FTP server and stored it as a MIAPE. A browser window will then open at the corresponding address, showing what was stored in the database.

Future Plans
The API is not fully functional yet; it is expected to be finished in May.
More XML formats will be supported for storage as MIAPEs:
• mzML
• mzIdentML
• GelML

Thank you!