Introduction to programming languages
CSC8304 – Computing Environments for Bioinformatics - Lecture 6

Objectives
Concepts of programming
Programming languages
Development of computer programs

Why computer programs?
Problems:
  • Arranging the text of a letter
  • Collecting and maintaining data about customers
  • Calculating the best investment portfolio
  • Taking a photo with your mobile phone
  • Synchronising the components of a car engine
Computer programs aim to solve such problems, which involve electronically stored and processed data

Solving problems
Problem description
Data collection
Problem analysis (including data analysis)
Designing a solution
Implementing the solution

Algorithms
Algorithm = systematic processing of actual or virtual data
Specification of input and output data
Specification of methods of data processing
E.g. Euclid’s greatest common divisor algorithm:
  • a, b two positive integers – what is their gcd?
  • x = a, y = b
  • If x > y then n = x, d = y otherwise n = y, d = x
  • n = q * d + r (with 0 <= r < d), x = d, y = r
  • If y = 0 then gcd = x, otherwise repeat from the third step

Early computers
Binary data entry – punch cards
Machine language: e.g. MOV A,B; LLR; etc.
Difficult to program – easy to make errors

Constants and variables
Constant: a fixed value, e.g. 5
Named constant: a fixed value with a name, e.g. a=5
Variable x – a placeholder for a value (e.g. number, text)
‘:=’ assignment of a value to a variable: the contents of the variable with the given name take the specified value
x := x + 1 makes sense: the right-hand side is evaluated first, then the result is stored in x
  • x := 5, x := x+1, now the value of x is 6
Other variables: s := ‘Hello!’, y := (2, ‘apples’, ‘table’)

Data types
Data is stored in named variables
Variable: name + type + contents
The type determines what kind of contents the variable may have: e.g. integer, floating point real, string, combination of other data types
E.g.
  • int x: x := 5 is allowed, x := 5.1 is not allowed
  • string s: s := ‘hello kids’ is allowed, s := 3 is not allowed
Type definition for combined types:
  • addr = record (int nr, string st, string ct, string pc)
  • addr a, a := (5, ‘Hyde’, ‘York’, ‘YO2 4RH’)

Operators
Operators: +, -, concatenate, <=
  • a:=5+3, s:=concatenate(‘hot’, ‘dog’), a<=5
Each type has a range of operators that can be applied to variables of that type
Operator overloading: some operators apply in different ways to data of different types
In the case of subtypes, e.g. integer as a subtype of real, additional operators may apply to the subtype – e.g. integer division

Early programming languages
Fortran, Cobol
Easier to read and write than machine code
Introduce flow control

Flow control – 1 (conditions)
If-then-else
Branching depending on a condition
If <condition> then <Tblock> else <Fblock>
E.g.
  • If x=5 then a:=2 else a:=1
  • If (signal, left) then (turn, left) else (turn, right)

Flow control – 2 (loops)
for – fixed length cycling
for <init statement>, <increment statement>, <condition statement>, <execution statement>
E.g.
  • for {i:=1,a:=1}, i:=i+1, i<=100, do a:=a*i;
while, repeat – variable length cycling
while <condition statement>, <execution statement>
repeat <execution statement>, <condition statement>
E.g.
  • while i<100, do a:=a*i, i:=i+1
  • repeat a:=a*i, i:=i+1, until i=100

Structured programming
Structured programming was introduced in the late 1960s – early 1970s
Pascal, C
Flow control is packaged into procedures; data are separated between program structures
This leads to better understanding, better design, and better programs with fewer errors

Procedures and functions
Procedures: blocks of program code containing flow control structures, with a set of specified input data and a set of specified output data
Functions: similar to procedures, but they generate a single output value (i.e. they behave like a mathematical function)
Procedures are called with a set of actual values for their formal input variables and a set of variables for their formal output variables
E.g.
  • procedure Draw (int x,y,z,w)
  • procedure Prediction (int x,y,z; var int a,b)
  • int function Length (string s)
  • Length(‘hello’)
  • Draw(10,10,50,50)

Recursive procedures
Recursive (recurrent) procedure: a procedure that calls itself
Data separation: each call works on its own local variables
E.g.
  • Procedure Gcd (int a,b; var int g)
      int x,y,r,q,n,d
      x:=a; y:=b;
      if x>y then {n:=x; d:=y} else {n:=y; d:=x};
      q:=n div d; r:=n - q*d;
      x:=d; y:=r;
      if y=0 then g:=x else Gcd(x,y,g);
    end;

Object oriented programming
Object oriented programming emerged in the 1970s and became the mainstream programming paradigm in the late 1980s – early 1990s
Aims:
  • Better description of real world problems
  • Better software design
  • Increased reliability of large software systems
Smalltalk, Delphi, C++, C#, Java

Classes and objects – 1
Class: encapsulation of data and data manipulation, such that interaction with the outside is kept to the necessary minimum
Class: attributes and methods – some visible from the outside, most visible only inside
E.g.
  • Class Square
      int llx,lly,dx,color
      Create
      Destroy
      Draw
      FillDraw
Square S, S.Create – an object is an instance of a class

Classes and objects – 2
Classes can be defined as derivatives of other classes – inheritance
Derived classes inherit attributes and methods from the parent class; they may add further attributes and methods, and may redefine some of the inherited ones
E.g.
  • Class Rectangle (Square)
      int dy (new attribute)
      (int llx,lly,dx,color – inherited)
      Draw (redefined)
      FillDraw (redefined)
      Rotate (new method)
      (Create, Destroy – inherited)

Flow control with exceptions
Objects are instances of classes and many objects exist simultaneously – concurrent execution of objects
Objects interact by sending messages – i.e. invoking those methods of other objects that are visible from the outside
Flow control: try – catch – throw
Exception: execution that cannot proceed correctly for some reason
E.g.
    try
      R.Draw;
      return(‘OK’);
    catch (exception e)
      throw GraphicsExceptionFault;
      return(‘Error’);

Functional programming
Everything is written as a function; the program is a combination of functions
LISP
Applied in AI (Artificial Intelligence)

Declarative programming
Instructions are not necessarily specified directly
What is wanted is declared, but how to get it is not specified
Prolog – logic programming used in AI
SQL – database language
Declarative programming is closer to natural language than imperative programming (which describes how to do things – e.g. C, C++, Java), but it may lead to much longer execution times

Compilation vs. interpretation
Compilation: the program is translated into a sequence of machine code instructions that can be executed directly by the processor – the whole program is translated (compiled) at once, and the compiled program is executed only after translation has finished – compilers
Interpretation: the program is processed by taking instructions/declarations one by one; each is translated into a short piece of machine code that is executed, then the next instruction/declaration is interpreted – the program is translated (interpreted) as it is executed, and at any time only a small part is translated into machine code – interpreters
Compilers usually generate faster running programs, while interpreters leave more room for interactive use of programs

Interpreted or compiled?
BASIC
C/C++
Java
R
Matlab
Perl

Reusable software
Developing software takes a long time – it is desirable to re-use existing software to solve sub-problems of new problems
Re-use is facilitated by documentation – a description of what the program does and why
Early programming languages offered little support for re-use
Object oriented programming languages provide much more support for re-use

Component-based programming
Component-based programming is the current major trend in software development
New software is built by combining existing components in novel ways – it relies heavily on the re-use of existing software
E.g. classes or objects can be purchased or used as service providers; most of the software does not have to be written from scratch – for example the handling of a printer or the reading of standard file formats (like XML)

Software development
Problem analysis
Data analysis
Design
Development and integration
Prototype
Testing
Use and maintenance

Software development: problem analysis
What is the problem that needs a software solution?
E.g.
  • Management of databases in a uniform manner
  • Visualisation of complex scientific data
Identification of users
Collection of information and data about user needs and requirements
Analysis of collected information and data

Software development: data & design
Collection and analysis of relevant data
Analysis of data formats – needs and requirements
Design of the relevant information flow
Design of data structures supporting the information flow
Design of the data processing

Software development: integration & implementation
Development of software components implementing the design
Acquisition of existing components, based on the design requirements and on an analysis of the features of existing components
Integration of existing components, and writing of integration software and of any other components that cannot be bought off the shelf

Software development: prototype & testing
Development of a small-scale prototype to test functionality
Testing of the components of the software system – test scenarios, use cases
Elimination and correction of faults and errors

Software development: use and maintenance
Installation, and training of users
Deployment of the software
Maintenance
Updates and patches

Summary
Algorithms
History of programming languages:
  • machine code
  • early languages: Fortran, Cobol
  • structured programming: Pascal, C
  • object oriented programming: C++, C#, Java
  • functional programming: Lisp
  • declarative programming: SQL
Constants, variables, data types
Flow control structures: if-then-else, for, while, repeat
Procedures and functions
Classes: encapsulation, inheritance
Compilers and interpreters
Software development process

Q&A
Is it true that Java is a declarative language?
Is it true that only variables of the same type can be compared by comparison operators?
Can we use the ‘for’ flow control mechanism to execute the same set of operations 10 or 20 times, depending on the value of some processed data?
Is it true that a class is an instance of an object?
Can we use the try-catch-throw flow control in concurrent environments, with many objects executing at the same time?
Can we develop a prototype of a piece of software before meeting the users to collect user requirements?
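
Worked example: Euclid's algorithm in Python

The Gcd procedure shown earlier in the lecture can be tried out in a real language. The following is a minimal sketch in Python, which is an assumption since the slides use Pascal-like pseudocode. The function names `gcd_recursive` and `gcd_iterative` are hypothetical and chosen only for illustration. The recursive version follows the pseudocode step by step, except that it returns the result instead of writing it into an output variable. The iterative version shows the same idea with a while loop.

```python
def gcd_recursive(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers, mirroring the Gcd pseudocode."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    # n is the larger value (the dividend), d the smaller one (the divisor).
    n, d = (a, b) if a > b else (b, a)
    q = n // d       # integer division, 'div' in the pseudocode
    r = n - q * d    # remainder of the division
    if r == 0:
        return d     # the divisor divides evenly, so it is the gcd
    return gcd_recursive(d, r)  # the procedure calls itself on smaller numbers


def gcd_iterative(a: int, b: int) -> int:
    """Same algorithm written with a while loop instead of recursion."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    x, y = a, b
    while y != 0:
        x, y = y, x % y  # replace (x, y) by (y, remainder of x divided by y)
    return x


if __name__ == "__main__":
    print(gcd_recursive(48, 18))
    print(gcd_iterative(48, 18))
```

Each recursive call has its own copies of n, d, q and r, which is the data separation mentioned on the slide about procedures that call themselves.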
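
Worked example: for and while loops in Python

The loop examples from the flow control slides multiply a running product by a counter. A rough Python equivalent is sketched below. The function names `factorial_for` and `factorial_while` and the parameter `n` are assumptions introduced for illustration; the slides use the fixed bound 100.

```python
def factorial_for(n: int) -> int:
    """Fixed length cycling: the number of iterations is known before the loop starts."""
    a = 1
    for i in range(1, n + 1):  # i takes the values 1, 2, ..., n
        a = a * i
    return a


def factorial_while(n: int) -> int:
    """Variable length cycling: the condition is re-checked before every iteration."""
    a = 1
    i = 1
    while i <= n:
        a = a * i
        i = i + 1
    return a


if __name__ == "__main__":
    print(factorial_for(5))
    print(factorial_while(5))
```

Both functions compute the same product. The difference is in how the loop is controlled: the for loop fixes its range up front, while the while loop could just as well test a condition that depends on data computed inside the loop.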
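
Worked example: classes, inheritance and exceptions in Python

The Square and Rectangle classes and the try-catch example from the object oriented slides can be approximated in Python as below. This is only a sketch under several assumptions: `GraphicsError` is a hypothetical stand-in for GraphicsExceptionFault, `draw` returns a text description instead of drawing on a screen, and the helper `try_draw` returns ‘Error’ rather than throwing a further exception. In Python the catch and throw keywords are spelled `except` and `raise`.

```python
class GraphicsError(Exception):
    """Hypothetical exception type, standing in for GraphicsExceptionFault."""


class Square:
    def __init__(self, llx: int, lly: int, dx: int, color: int) -> None:
        # Attributes: lower-left corner, side length, colour.
        self.llx = llx
        self.lly = lly
        self.dx = dx
        self.color = color

    def draw(self) -> str:
        if self.dx <= 0:
            raise GraphicsError("side length must be positive")
        return f"square at ({self.llx}, {self.lly}) with side {self.dx}"


class Rectangle(Square):  # Rectangle is derived from Square
    def __init__(self, llx: int, lly: int, dx: int, dy: int, color: int) -> None:
        super().__init__(llx, lly, dx, color)  # inherited attributes
        self.dy = dy                           # new attribute

    def draw(self) -> str:  # redefined method
        if self.dx <= 0 or self.dy <= 0:
            raise GraphicsError("side lengths must be positive")
        return f"rectangle at ({self.llx}, {self.lly}) of size {self.dx} x {self.dy}"


def try_draw(shape: Square) -> str:
    """Return 'OK' if the shape can be drawn, 'Error' if drawing raises GraphicsError."""
    try:
        shape.draw()
        return "OK"
    except GraphicsError:
        return "Error"


if __name__ == "__main__":
    s = Square(0, 0, 5, 1)          # s is an object, an instance of the class Square
    r = Rectangle(0, 0, 5, -2, 1)   # invalid height, so drawing fails
    print(try_draw(s))
    print(try_draw(r))
```

Because Rectangle inherits from Square, `try_draw` accepts both kinds of object and calls whichever `draw` method belongs to the object it receives.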