Real-world languages

Daniel C C Hamm
UniS
CS380 – Week 10 Notes
Introduction
    Phase one
    Phase two
Categories of language
    Database languages
    Unix languages
        Shell scripts
        Scripting languages
    Computer scientists' languages
    AI languages
    “Amateur” languages
    Web languages
Linking of program units and “objects”
    The Filter model
        Important note
    The GUI model
Introduction
We have been looking at the main families of languages, and the main programming paradigms. However,
much of the work done in “real life” is outside these paradigms. We can divide the development of
these languages into two phases: that predominating in the early to mid 90s, and the model which
began to emerge in the late 90s and is developing rapidly in the first couple of years of the new
millennium.
Phase one
While C++ was the predominant language in the 90s amongst professional programmers, even this was
true only up to a point. The following categories of usage have always been outside the mainstream:

Database users – tended to use their own tools, encapsulated within so-called 4GLs
(4th Generation Languages [1]), that came with database systems.

Unix administrators and “power users” – the Unix shell is a powerful programming language
in its own right, and is largely a matrix in which other “filters” are incorporated.

Computer scientists, who obviously developed their own languages, amongst which advanced
object-oriented languages played an increasing part.

Researchers into AI and other “non-procedural” areas – as well as developing their own
tools, they tended to use declarative (non-procedural) languages such as Lisp and Prolog.

Amateur programmers – not a pejorative term. They tend to be users who are subject
specialists for whom the programming language is a quick and easy tool. Basic was the
predominant language.

Web developers – an unknown breed at the beginning of the decade, but increasing in
importance.

[1] The generations are as follows:
1. Direct machine code (numbers had to be typed or, more likely, entered by toggle switches on a machine console).
2. Assembly language – a symbolic language where instructions had a 1 to 1 correspondence with the equivalent
machine code instructions.
3. Traditional procedural languages, in which a high level instruction had a direct mapping onto one or more
machine code instructions. Of course, complex control structures did not necessarily map in quite as
straightforward a fashion.
Moreover, the object model introduces new confusion – at the lowest level of code there is still a direct
relationship between program code and machine code, but a programmer does not necessarily have access to the
source code of an object, and anyway this style of programming does not lend itself to “drilling down” through
the object hierarchy unless one is interested in the way things work.
4. Languages that are a combination of:
   a database definition language such as SQL;
   traditional procedural code;
   user-interface code, loosely based on the Windows event-driven model.
© DCCH/UniS/533581878/ 08-Mar-16 rev 13
Phase two
The situation towards the end of the 90s and into the 00s has changed. Basic in particular has
developed to become a more-or-less fully functional language. The web is rapidly becoming the
predominant model for new applications, and represents a fundamental paradigm shift.

Web-based programming tools are developing rapidly, and will soon be classed within the
mainstream languages.

Java, still an experimental tool with a debatable future 1-2 years ago, is now in the greatest
demand (demand for C++ programmers has diminished by ~20% over the past years, whereas
competent Java programmers can command £800 per day).

Microsoft has developed a new suite of .NET tools, with a new language C# (which bears an
uncanny resemblance to Java), and at last a fully object-oriented Visual Basic (which has
developed into the “workhorse” language amongst all but dedicated professional programmers
and sometimes amongst them too).

Web-based languages (based on HTML but moving rapidly beyond) are another fast-growing
area. An important development is XML – like HTML a descendant of SGML, but giving structure
to data by use of data definition tags. Thus, it is possible to extract and manipulate data by name,
rather than by knowledge of its position within a document.

Allied to these are scripting languages, which are not restricted to web development, but play
an important part in “gluing” together applications.

Java, while nominally web-based, can also be used for local applications; Corel, always a
pioneer, decided to rewrite its Office application suite in Java – prematurely as it turned out, but
it makes a point.
We shall examine these developments by language category.
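The point made above about XML, that data can be extracted by name rather than by position, can be sketched in a few lines. This is only an illustration using Python's standard library; the document and its tag names (order, customer, item, quantity) are invented for the purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented for illustration.
DOCUMENT = """<order>
  <customer>A. N. Other</customer>
  <item>
    <description>Widget</description>
    <quantity>3</quantity>
  </item>
</order>"""


def quantity_ordered(xml_text):
    """Return the ordered quantity as an int, located by tag name."""
    root = ET.fromstring(xml_text)
    # find() follows a path of tag names, so it does not matter where
    # <quantity> happens to sit in the text of the document.
    return int(root.find("item/quantity").text)


print(quantity_ordered(DOCUMENT))
```

If the elements are reordered, the function returns the same answer, which is exactly what a position-based extraction could not guarantee.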
Categories of language
Database languages
The predominant database language was, and is, SQL. Specifically it is designed for specifying and
manipulating relational databases (still the predominant model), and is described in PLPP (p385) as
“intermediate between the relational algebra and calculus”. The objects (in the loose sense) on which
SQL operates are database relations. See also under Linking of program units and “objects” below.
Note that such languages are often embedded – that is, a traditional language (such as C++) is enhanced
to include database operations. Often, the code would be fed to a pre-processor, which would translate
the code into straight C++ with added library calls. [2]
The alternative to embedded SQL is the 4GL, as described above under Phase one.
[2] Out of interest, such an approach was used with the original version of C++, which was translated by a pre-processor
into C. It is said that this accounts for both the strengths and shortcomings of the language.
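To give a feel for what the output of such a pre-processor amounts to, here is a rough sketch of SQL carried inside a host language through library calls. It uses Python and its standard sqlite3 module rather than C++, and the staff table and its contents are invented for illustration; it is not embedded SQL proper, only the "host language plus library calls" form that embedded SQL is translated into.

```python
import sqlite3


def staff_in_department(connection, department):
    """Return the names of staff in one department, in alphabetical order."""
    # The SQL travels as a string; the ? placeholder is filled in by the library.
    cursor = connection.execute(
        "SELECT name FROM staff WHERE department = ? ORDER BY name",
        (department,),
    )
    return [row[0] for row in cursor]


# An in-memory database, so the sketch leaves no file behind.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE staff (name TEXT, department TEXT)")
connection.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ada", "Computing"), ("Grace", "Computing"), ("Isaac", "Physics")],
)

print(staff_in_department(connection, "Computing"))
```

The host language supplies the control flow and data structures, while SQL says what is wanted from the relations.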
Unix languages
There is a raft of languages, originally developed for Unix but now generally available, which are
primarily designed for text manipulation. The Unix shell script languages (based on various shells or
command processors: e.g. the Korn and C shells) are the starting point.
Shell scripts
Shell scripts rely on:

a basic set of operators supplied by the shell (not the operating system, as is sometimes confusingly
stated, but obviously dependent on features within the operating system) – piping and redirection
being the most common;

a fairly basic syntax (allowing looping, parameter substitution etc.);

the existence of a wide range of tools or “filters” which take standard input and produce standard
output (many supplied by the operating system, but user-definable also);

the pervasiveness of certain constructs throughout the system – one of the most important of
which is the regular expression.
Shell scripts are “fairly unique” to Unix, in the sense that they do not transport well. There are scripting
languages on other systems; REXX on OS/2 being a lamented exemplar (although it has migrated onto NT
and other systems). The notorious COMMAND.COM in DOS and Win9X is so primitive as to be
virtually useless, although it has its devotees. CMD.EXE, under WinNT, employs virtually the
same syntax but works properly (e.g. scripts can be chained together by pipes or redirection, which they
cannot be under Win9X).
Scripting languages
While the Unix shell script is unique to Unix, the scripting languages it has spawned (sed, awk, PERL)
have migrated successfully onto other platforms, most notably PERL on Windows NT.
Computer scientists' languages
It goes without saying that computer scientists develop their own languages to illustrate or explore
various concepts.
AI languages
AI languages (languages for handling expert systems, natural language translation, neural networks,
etc.) tend to be based on the declarative programming model, which encompasses Lisp, Prolog etc.
Logic Programming (as in Prolog) is especially apposite for inference engines.
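To make the idea of an inference engine a little more concrete, the following is a minimal sketch of one in Python. It is a toy: the facts and rules are invented, and it chains forwards from known facts to a fixed point, whereas Prolog works backwards from a goal. The declarative flavour is the same, though: the rules say what follows from what, not in which order to compute it.

```python
def forward_chain(facts, rules):
    """Apply the rules until no new fact appears; return every fact that holds."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base: (premises, conclusion) pairs.
RULES = [
    (("has_feathers",), "is_bird"),
    (("is_bird", "cannot_fly", "swims"), "is_penguin"),
]

print(sorted(forward_chain({"has_feathers", "cannot_fly", "swims"}, RULES)))
```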
“Amateur” languages
If there is a single language which has traditionally been used by non-professional programmers, it is
Basic. Basic has evolved enormously since Dartmouth College in the 60s; it is now impossible to talk
about the language without mentioning the implementation. By far the most widely used is Microsoft
Basic, in its Visual Basic form. Even though it has been available for many years, it is only now (with
the next .NET release) that it is fully OO, fully compiled, and almost on a par with C++ for all except
systems work. It is often a choice with subject experts who wish to implement their understanding
without becoming mired in a “proper” programming language. In recent years, its relative slowness
and appetite for memory have become increasingly unimportant as computer capacity has increased.
Web languages
Unsurprisingly, languages specifically for programming the Web have developed enormously in recent
years.
HTML
This was the original Web document markup language [3], and therefore not a programming language as such.
[3] A markup language is generally used for marking text with non-textual features – layout, colour, fonts etc.
Scripting languages
These have always been around (see Unix languages above and The Filter model (scripting languages)
below). Some existing languages, notably PERL, have been found ideal for developing CGI scripts.
Others, such as VBScript and JScript (Microsoft) and JavaScript (the world), have been developed
specifically for the Web. Since they can be embedded directly within HTML, in effect they turn the
latter into more of a conventional programming language.
PERL (Practical Extraction and Report Language) is more than just a scripting language; it was first
developed in 1987 for monitoring large software projects and generating reports. It is now primarily used
for server-side Web programming, and is continually evolving to meet the demands of this environment.
Linking of program units and “objects”
Incorporating other programs or program units has long been recognised as a useful
way of proceeding. There are two models – the filter model and the object model (my own terms),
associated respectively with Unix (70s onwards) and GUI-based environments (exemplified by, but not
restricted to, Windows; mid-90s onwards).
The Filter model (scripting languages)
As always, the Unix community did things 20 years before the rest of the world. Look at the following
Perl program fragment (in fact, a complete program, though not particularly useful):
    @list = `dir c:\\ \/s`;      # run dir and capture its output, one line per element
    foreach $n (@list) {
        if ($n =~ /BAT.*/) {     # does this line contain the string BAT?
            print "$n";
        }
    }
What this program does is to examine the program output from the directory listing, process it line by
line, and print all lines containing the string BAT. Note the existence of the regular expression string
/BAT.*/
The section between the backward quotes (`…`) represents an instruction to the operating system to
spawn a new process (dir) and send the output to standard output where it is picked up by the
assignment to @list. The rest of the program then operates on this output. This technology has been
around for many years, and the same technique is used in embedded languages such as SQL. They
generally have to be passed through a pre-processor.
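For comparison, the same spawn-and-capture technique can be sketched in Python, with subprocess.run standing in for the backquotes. So that the sketch runs on any system, the child process here is not dir but a second Python process that prints three invented file names; that substitution is an assumption made purely for illustration.

```python
import re
import subprocess
import sys

# Stand-in for "dir": a child process that prints three made-up file names.
CHILD = [
    sys.executable,
    "-c",
    "print('AUTOEXEC.BAT'); print('CONFIG.SYS'); print('SETUP.BAT')",
]


def lines_matching(command, pattern):
    """Run command, capture its standard output, and return the matching lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # The parent now works on the child's output, one line at a time.
    return [line for line in result.stdout.splitlines() if re.search(pattern, line)]


print(lines_matching(CHILD, r"BAT.*"))
```

As in the Perl fragment, the parent neither knows nor cares how the child produced its output; it sees only a stream of lines.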
The same operation can be carried out from the command line, piping the output of dir into awk:
C:\Mks1\BIN>dir C:\ | awk /BAT/
The effect is the same, although the interpretation of the regular expression by the language is subtly
different.
Note that both examples could be wrapped up in a shell script, say findbat, which would behave in
the same way regardless of which implementation lay behind it.
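A filter of this kind is simple to write. The following is a sketch of what a findbat might look like in Python; the function name follows the text, but the implementation and the sample data are assumptions. It reads lines from one stream and writes the matching ones to another, which is all a filter does.

```python
import io
import re


def findbat(source, sink):
    """Copy to sink every line of source that contains the string BAT."""
    pattern = re.compile(r"BAT")
    for line in source:
        if pattern.search(line):
            sink.write(line)


# Any line-oriented streams will do; used as a real filter, these would be
# sys.stdin and sys.stdout.
listing = io.StringIO("AUTOEXEC.BAT\nCONFIG.SYS\nSETUP.BAT\n")
result = io.StringIO()
findbat(listing, result)
print(result.getvalue(), end="")
```

Because the filter deals only in streams, it can be placed anywhere in a pipeline without knowing what comes before or after it.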
Important note
The regular expression syntax describes a metalanguage for specifying strings which are
well-formed regular expressions. It does not of itself specify the meaning of operators and
delimiters, such as / above.
The GUI model ((D)COM and CORBA)
The GUI (Windows) model is based on the linking of objects. It is appropriate for use when the relation
between the program unit being linked and the main program is not simply a question of streaming data
into and out of the object. There are two standards – (D)COM and CORBA – which are in use, roughly
on Windows and Unix platforms respectively. COM is the Component Object Model (with D for
distributed, i.e. across more than one system); CORBA is the Common Object Request Broker
Architecture. These different implementations are very similar in intent.