Daniel C C Hamm UniS
CS380 – Week 10 Notes

Real-World Languages

Introduction
    Phase one
    Phase two
Categories of language
    Database languages
    Unix languages
        Shell scripts
        Scripting languages
    Computer scientists' languages
    AI languages
    “Amateur” languages
    Web languages
Linking of program units and “objects”
    The Filter model
        Important note
    The GUI model

Introduction

We have been looking at the main families of languages and the main programming paradigms. However, much of the work done in “real life” falls outside these paradigms. We can divide the development of these real-world languages into two phases: the one that predominated in the early to mid 90s, and the model that began to emerge in the late 90s and is developing rapidly in the first couple of years of the new millennium.
Phase one

While C++ was the predominant language amongst professional programmers in the 90s, even this was true only up to a point. The following categories of user have always been outside the mainstream:

- Database users – tended to use their own tools, encapsulated within so-called 4GLs (4th Generation Languages [1]) that came with database systems.
- Unix administrators and “power users” – the Unix shell is a powerful programming language in its own right, and is largely a matrix in which other “filters” are incorporated.
- Computer scientists, who obviously developed their own languages, amongst which advanced object-oriented languages played an increasing part.
- Researchers into AI and other “non-procedural” areas – as well as developing their own tools, they tended to use declarative (non-procedural) languages such as Lisp and Prolog.
- Amateur programmers – not a pejorative term. They tend to be subject specialists for whom the programming language is a quick and easy tool. Basic was the predominant language.
- Web developers – an unknown breed at the beginning of the decade, but increasing in importance.

[1] The generations are as follows:
1. Direct machine code (numbers had to be typed or, more likely, entered by toggle switches on a machine console).
2. Assembly language – a symbolic language in which instructions had a 1 to 1 correspondence with the equivalent machine code instructions.
3. Traditional procedural languages, in which a high-level instruction had a direct mapping onto one or more machine code instructions. Of course, complex control structures did not necessarily map in quite as straightforward a fashion. Moreover, the object model introduces new confusion – at the lowest level of code there is still a direct relationship between program code and machine code, but a programmer does not necessarily have access to the source code of an object, and anyway this style of programming does not lend itself to “drilling down” through the object hierarchy unless one is interested in the way things work.
4. Languages that are a combination of: a database definition language such as SQL; traditional procedural code; and user-interface code, loosely based on the Windows event-driven model.

© DCCH/UniS/533581878/ 08-Mar-16 rev 13 Page 1/4

Phase two

The situation towards the end of the 90s and into the 00s has changed. Basic in particular has developed to become a more-or-less fully functional language. The web is rapidly becoming the predominant model for new applications, and represents a fundamental paradigm shift. Web-based programming tools are developing rapidly, and will soon be classed among the mainstream languages.

Java, still an experimental tool with a debatable future 1-2 years ago, is now in the greatest demand (demand for C++ programmers has diminished by ~20% over recent years, whereas competent Java programmers can command £800 per day). Microsoft has developed a new suite of .NET tools, with a new language, C# (which bears an uncanny resemblance to Java), and at last a fully object-oriented Visual Basic (which has developed into the “workhorse” language amongst all but dedicated professional programmers, and sometimes amongst them too).

Web-based languages (based on HTML but moving rapidly beyond) are another fast-growing area. An important development is XML – like HTML, derived from SGML, but giving structure to data by use of data definition tags. Thus, it is possible to extract and manipulate data by name, rather than by knowledge of its position within a document. Allied to these are scripting languages, which are not restricted to web development, but play an important part in “gluing” together applications.

Java, while nominally web-based, can also be used for local applications; Corel, always a pioneer, decided to rewrite its Office application suite in Java – prematurely as it turned out, but it makes a point.

We shall examine these developments by language category.
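As a brief aside on the XML point made under Phase two, the idea of extracting data by name rather than by position can be sketched as follows. This is a minimal illustration in Python using the standard xml.etree.ElementTree module; the record layout and the tag names (course, code, week, title) are invented for the example and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag names below are invented for illustration.
RECORD_A = (
    "<course><code>CS380</code><week>10</week>"
    "<title>Real-World Languages</title></course>"
)
# The same data with the elements in a different order.
RECORD_B = (
    "<course><title>Real-World Languages</title>"
    "<week>10</week><code>CS380</code></course>"
)


def read_course(xml_text):
    """Return (code, week, title), looking each field up by its tag name."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("code"),
        int(root.findtext("week")),
        root.findtext("title"),
    )


course_a = read_course(RECORD_A)
course_b = read_course(RECORD_B)

print(course_a)
print(course_a == course_b)
```

Because each field is looked up by its tag, both records yield the same values even though their elements appear in a different order; code that relied on position would have to change whenever the layout did.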
Categories of language

Database languages

The predominant database language was, and is, SQL. It is designed specifically for defining and manipulating relational databases (still the predominant model), and is described in PLPP (p385) as “intermediate between the relational algebra and calculus”. The objects (in the loose sense) on which SQL operates are database relations. See also under Linking of program units and “objects” below.

Note that such languages are often embedded – that is, a traditional language (such as C++) is enhanced to include database operations. Often, the code would be fed to a pre-processor, which would translate it into straight C++ with added library calls. [2] The alternative to embedded SQL is the 4GL, as described above under Phase one.

[2] Out of interest, such an approach was used with the original version of C++, which was translated by a pre-processor into C. It is said that this accounts for both the strengths and shortcomings of the language.

Unix languages

There is a raft of languages, originally developed for Unix but now generally available, which are primarily designed for text manipulation. The Unix shell script languages (based on various shells or command processors, e.g. the Korn and C shells) are the starting point.
Shell scripts

Shell scripts rely on:

- a basic set of operators supplied by the shell (not by the operating system, as is sometimes confusingly stated, although obviously dependent on features within the operating system) – piping and redirection being the most common;
- a fairly basic syntax (allowing looping, parameter substitution etc.);
- the existence of a wide range of tools or “filters” which take standard input and produce standard output (many supplied with the operating system, but user-definable also);
- the pervasiveness of certain constructs throughout the system – one of the most important of which is the regular expression.

Shell scripts are “fairly unique” to Unix, in the sense that they do not transport well. There are scripting languages on other systems, REXX on OS/2 being a lamented exemplar (although it has migrated onto NT and other systems). The notorious COMMAND.COM in DOS and Win9X is so primitive as to be virtually useless, although it has its devotees. CMD.EXE, under WinNT, employs virtually the same syntax but works properly (e.g. scripts can be chained together by pipes or redirection, which they cannot be under Win9X).

Scripting languages

While the Unix shell script is unique to Unix, the scripting languages it has spawned (sed, awk, PERL) have migrated successfully onto other platforms, most notably PERL on Windows NT.

Computer scientists' languages

It goes without saying that computer scientists develop their own languages to illustrate or explore various concepts.

AI languages

AI languages – languages for handling expert systems, natural language translation, neural networks, etc. – tend to be based on the declarative programming model, which encompasses Lisp, Prolog etc. Logic programming (as in Prolog) is especially apposite for inference engines.

“Amateur” languages

If there is a single language which has traditionally been used by non-professional programmers, it is Basic.
Basic has evolved enormously since Dartmouth College in the 60s; it is now impossible to talk about the language without mentioning the implementation. By far the most widely used is Microsoft Basic, in its Visual Basic form. Even though it has been available for many years, it is only now (with the next .NET release) that it is fully OO, fully compiled, and almost on a par with C++ for all except systems work. It is often the choice of subject experts who wish to implement their understanding without becoming mired in a “proper” programming language. In recent years, its relative slowness and appetite for memory have become increasingly unimportant as computer capacity has increased.

Web languages

Unsurprisingly, languages specifically for programming the Web have developed enormously in recent years.

HTML

This was the original document markup language [3], and therefore not a programming language as such.

[3] A markup language is generally used for marking text with non-textual features – layout, colour, fonts etc.

Scripting languages

These have always been around (see Unix languages above and The Filter model (scripting languages) below). Some existing languages, notably PERL, have been found ideal for developing CGI scripts. Others, such as VBScript and JScript (Microsoft) and JavaScript (the world), have been developed specifically for the Web. Since they can be embedded directly within HTML, in effect they turn the latter into more of a conventional programming language.

PERL (Practical Extraction and Report Language) is more than just a scripting language; it was first developed in 1987 for monitoring large software projects and generating reports. It is now primarily used for server-side Web programming, and is continually evolving to meet the demands of this environment.
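To give a feel for what a server-side CGI script actually does, here is a minimal sketch. It is written in Python rather than PERL purely for illustration, and the function name build_response and the form field name are assumptions made up for this example. The only parts taken from the CGI convention itself are that the web server passes the query in the QUERY_STRING environment variable and that the script replies on standard output with a header, a blank line, and then the page.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Build a complete CGI reply (header, blank line, HTML body)."""
    params = parse_qs(query_string)
    # "name" is a hypothetical form field; fall back to a default if absent.
    name = params.get("name", ["world"])[0]
    # Escape user input before placing it in the page.
    body = "<html><body><p>Hello, %s</p></body></html>" % escape(name)
    return "Content-Type: text/html\r\n\r\n" + body


# When run by a web server as a CGI script, the query arrives in QUERY_STRING
# and whatever is written to standard output is sent back to the browser.
response = build_response(os.environ.get("QUERY_STRING", ""))
sys.stdout.write(response)
```

The script is itself a kind of filter: it takes its input from the environment the server sets up and writes its result to standard output, which is why text-processing languages proved such a natural fit.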
Linking of program units and “objects”

Incorporating other programs or program units has long been recognised as a useful way of proceeding. There are two models – the filter model and the object model (my own terms) – associated respectively with Unix (70s onwards) and GUI-based environments, exemplified by (but not restricted to) Windows (mid-90s onwards).

The Filter model (scripting languages)

As always, the Unix community did things 20 years before the rest of the world. Look at the following Perl program fragment (in fact, a complete program, though not particularly useful):

    # Run the dir command and collect its output, one line per list element.
    @list = `dir c:\\ \/s`;
    foreach $n (@list) {
        # Print each line that contains the string BAT.
        if ($n =~ /BAT.*/) { print "$n" }
    };

What this program does is to examine the output from the directory listing, process it line by line, and print all lines containing the string BAT. Note the regular expression /BAT.*/. The section between the backward quotes (`…`) represents an instruction to the operating system to spawn a new process (dir) and send the output to standard output, where it is picked up by the assignment to @list. The rest of the program then operates on this output.

This technology has been around for many years, and the same technique is used in embedded languages such as SQL. They generally have to be passed through a pre-processor.

The same operation can be carried out from the command line:

    C:\Mks1\BIN>dir C:\ | awk /BAT/

The effect is the same, although the interpretation of the regular expression by the language is subtly different. Note that both examples could be wrapped up in a shell script, say findbat, which would behave in the same way regardless of the output.

Important note

The regular expression syntax describes a metalanguage for specifying strings which are well-formed regular expressions. It does not of itself specify the meaning of operators and delimiters, such as / above.

The GUI model ((D)COM and CORBA)

The GUI (Windows) model is based on the linking of objects.
It is appropriate for use when the relation between the program unit being linked and the main program is not simply a question of streaming data into and out of the object. There are two standards in use – (D)COM and CORBA – roughly on Windows and Unix platforms respectively. COM is the Component Object Model (with D for distributed, i.e. across more than one system); CORBA is the Common Object Request Broker Architecture. These different implementations are very similar in intent.
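The difference between the two linking models can be sketched in a few lines. What follows is a rough illustration in plain Python, not COM or CORBA code: the FileList class, its method names and the sample file names are all invented for the example, and the "filter" is a throwaway Python child process standing in for a tool such as awk.

```python
import subprocess
import sys

listing_text = "AUTOEXEC.BAT\nCONFIG.SYS\nSETUP.BAT\n"

# Filter model: the linked unit is a separate program. Data is streamed into
# its standard input and the result is read back from its standard output.
FILTER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'BAT' in line:\n"
    "        sys.stdout.write(line)\n"
)
result = subprocess.run(
    [sys.executable, "-c", FILTER_SOURCE],
    input=listing_text,
    capture_output=True,
    text=True,
    check=True,
)
filtered = result.stdout.splitlines()


# Object model: the linked unit is an object that keeps its own state and is
# driven by a series of method calls rather than by a single stream of data.
class FileList:
    def __init__(self):
        self._names = []

    def add(self, name):
        self._names.append(name)

    def count(self):
        return len(self._names)

    def matching(self, text):
        return [n for n in self._names if text in n]


files = FileList()
for name in listing_text.splitlines():
    files.add(name)
found = files.matching("BAT")

print(filtered)
print(found, files.count())
```

Both halves find the same names, but the relationship is different: the filter sees one stream and is then finished, whereas the object stays available to be queried or updated repeatedly. Standards such as (D)COM and CORBA exist to make the second kind of relationship work across language, process and machine boundaries.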