C is an imperative (procedural) systems implementation language. It was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support. C was therefore useful for many applications that had formerly been coded in assembly language.
Despite its low-level capabilities, the language was designed to encourage machine-independent programming. A standards-compliant and portably written C program can be compiled for a very wide variety of computer platforms and operating systems with little or no change to its source code. The language has become available on a very wide range of platforms, from embedded microcontrollers to supercomputers.
Minimalism
C's design is tied to its intended use as a portable systems implementation language. It provides simple, direct access to any addressable object (for example, memory-mapped device control registers), and its source-code expressions can be translated in a straightforward manner to primitive machine operations in the executable code. Some early C compilers were comfortably implemented (as a few distinct passes communicating via intermediate files) on PDP-11 processors having only 16 address bits. C compilers for several common 8-bit platforms have been implemented as well.
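The sketch below illustrates the direct access to addressable objects mentioned above. It sets and clears bits in a device control register through a `volatile` pointer. The register, its name `CTRL_REG`, and the bit layout are hypothetical. On a real microcontroller the address would come from the device's datasheet, so here an ordinary variable (`simulated_reg`) stands in for the hardware and the code can run on any host.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped control register. On real
 * hardware the pointer below would instead hold a fixed address from the
 * datasheet, e.g. (volatile uint32_t *)0x40021000 (illustrative value only). */
static volatile uint32_t simulated_reg;

/* volatile tells the compiler that every read and write must actually be
 * performed, because hardware may observe or change the register. */
static volatile uint32_t *const CTRL_REG = &simulated_reg;

/* Set one bit (0..31) with a read-modify-write cycle. */
void reg_set_bit(unsigned bit)
{
    *CTRL_REG |= (uint32_t)1u << bit;
}

/* Clear one bit (0..31), leaving the other bits unchanged. */
void reg_clear_bit(unsigned bit)
{
    *CTRL_REG &= ~((uint32_t)1u << bit);
}

/* Read the current register contents. */
uint32_t reg_read(void)
{
    return *CTRL_REG;
}
```

Each C expression here compiles to little more than a load, a bitwise operation, and a store, which is the "straightforward translation" the paragraph describes.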
Characteristics
Like most imperative languages in the ALGOL tradition, C has facilities for structured programming and allows lexical variable scope and recursion, while a static type system prevents many unintended operations. In C, all executable code is contained within functions. Function parameters are always passed by value. Pass-by-reference is simulated in C by explicitly passing pointer values. Heterogeneous aggregate data types (struct) allow related data elements to be combined and manipulated as a unit. C program source text is free-format, using the semicolon as a statement terminator (not a delimiter).
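The minimal sketch below contrasts pass-by-value with simulated pass-by-reference, and shows a `struct` being copied and returned as a unit. The function and type names (`swap_by_value`, `swap_via_pointers`, `struct pair`, `swapped`) are illustrative only.

```c
/* Receives copies of the caller's values: the swap is invisible outside. */
void swap_by_value(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

/* Receives pointers (themselves passed by value) and swaps the objects
 * they point to, which simulates pass-by-reference. */
void swap_via_pointers(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* A heterogeneous aggregate handled as a unit: the whole struct is copied
 * into the parameter, modified locally, and returned by value. */
struct pair {
    int first;
    int second;
};

struct pair swapped(struct pair p)
{
    int tmp = p.first;
    p.first = p.second;
    p.second = tmp;
    return p;
}
```

After `swap_by_value(x, y)` the caller's `x` and `y` are unchanged. After `swap_via_pointers(&x, &y)` they are exchanged.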
C also exhibits the following more specific characteristics:
- lack of nested function definitions
- variables may be hidden in nested blocks
- partially weak typing; for instance, characters can be used as integers
- low-level access to computer memory by converting machine addresses to typed pointers
- function and data pointers supporting ad hoc run-time polymorphism
- array indexing as a secondary notion, defined in terms of pointer arithmetic
- a preprocessor for macro definition, source code file inclusion, and conditional compilation
- complex functionality such as I/O, string manipulation, and mathematical functions consistently delegated to library routines
- a relatively small set of reserved keywords
- a lexical structure that resembles B more than ALGOL, for example:
  - { ... } rather than either ALGOL 60's begin ... end or ALGOL 68's ( ... )
  - the equal sign is used for assignment (copying), as in Fortran, rather than ALGOL's ":="
  - two consecutive equal signs (==) test for equality (compare .EQ. in Fortran, or the equal sign in BASIC and ALGOL)
  - && and || in place of ALGOL's "∧" (AND) and "∨" (OR); these are semantically distinct from the bitwise operators & and | because they never evaluate the right operand if the result can be determined from the left alone (short-circuit evaluation). However, the C of Unix Version 6 and 7 did use ALGOL's /\ and \/ as ASCII operators, but for determining the infimum and supremum respectively.[1]
  - a large number of compound operators, such as += and ++ (equivalent to ALGOL 68's +:= and +:=1 operators)
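Several items in the list above can be observed in a few lines of C: characters used as integers, indexing defined through pointer arithmetic, short-circuit `&&` versus bitwise `&`, and compound operators. The sketch below is illustrative. The helper names, and the call counter used to detect whether the right operand was evaluated, are made up for this example.

```c
#include <stddef.h>

/* Characters are small integers: this maps '0'..'9' to 0..9. The C
 * standard guarantees that the decimal digit characters are contiguous. */
int digit_value(char c)
{
    return c - '0';
}

/* a[i] is defined as *(a + i), so both expressions name the same element. */
int same_element(const int *a, size_t i)
{
    return a[i] == *(a + i);
}

/* Counts its own calls so we can observe whether an operand was evaluated. */
static int calls;

int count_and_return(int value)
{
    calls++;
    return value;
}

/* && stops as soon as the left operand is 0 (false). */
int short_circuit_calls(void)
{
    calls = 0;
    (void)(0 && count_and_return(1));
    return calls;               /* right operand skipped: 0 */
}

/* & always evaluates both operands. */
int bitwise_calls(void)
{
    calls = 0;
    (void)(0 & count_and_return(1));
    return calls;               /* right operand evaluated: 1 */
}

/* Compound operators: n += 5 means n = n + 5, and ++n adds 1. */
int compound(int n)
{
    n += 5;
    ++n;
    return n;
}
```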
Absent features
The relatively low-level nature of the language affords the programmer close control over what the computer does, while allowing special tailoring and aggressive optimization for a particular platform. This allows the code to run efficiently on very limited hardware, such as embedded systems.
C does not have some features that are available in some other programming languages:
- No direct assignment of arrays or strings (copying can be done via standard functions; assignment of objects having struct or union type is supported)
- No automatic garbage collection
- No requirement for bounds checking of arrays
- No operations on whole arrays
- No syntax for ranges, such as the A..B notation used in several languages
- Prior to C99, no separate Boolean type (zero/nonzero is used instead)[6]
- No formal closures or functions as parameters (only function and variable pointers)
- No generators or coroutines; intra-thread control flow consists of nested function calls, except for the use of the longjmp or setcontext library functions
- No exception handling; standard library functions signify error conditions with the global errno variable and/or special return values
- Only rudimentary support for modular programming
- No compile-time polymorphism in the form of function or operator overloading
- Only rudimentary support for generic programming
- Very limited support for object-oriented programming with regard to polymorphism and inheritance
- Limited support for encapsulation
- No native support for multithreading and networking
- No standard libraries for computer graphics and several other application programming needs
A number of these features are available as extensions in some compilers, or can be supplied by third-party libraries, or can be simulated by adopting certain coding disciplines.
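Two of the absent features above have standard workarounds. Arrays are copied with a library function instead of `=`, and errors are reported through `errno` and return values instead of exceptions. The sketch below shows both. The names `copy_ints` and `parse_long`, and the convention of returning 0 or -1, are assumptions made for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Arrays cannot be assigned with =, so the bytes are copied with memcpy.
 * Both arrays are assumed to hold at least n elements. */
void copy_ints(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Error reporting without exceptions: strtol sets errno to ERANGE when the
 * value does not fit in a long, and reports where parsing stopped via endp.
 * Returns 0 on success, -1 on failure. */
int parse_long(const char *text, long *out)
{
    char *endp;
    long value;

    errno = 0;                      /* strtol never clears errno itself */
    value = strtol(text, &endp, 10);
    if (endp == text || *endp != '\0')
        return -1;                  /* no digits, or trailing characters */
    if (errno == ERANGE)
        return -1;                  /* out of range for long */
    *out = value;
    return 0;
}
```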
Undefined behavior
Many operations in C that have undefined behavior are not required to be diagnosed at compile time. In C, "undefined behavior" means that the standard imposes no requirements on what happens, and the C implementation does not have to document what will happen. A famous, although misleading, expression in the newsgroups comp.std.c and comp.lang.c is that the program could cause "demons to fly out of your nose".[7] In practice, undefined behavior sometimes shows up as a hard-to-track-down bug that may corrupt the contents of memory. At other times, a particular compiler generates reasonable, well-behaved actions that differ completely from those produced by a different C compiler. Some behavior was left undefined so that compilers for a wide variety of instruction set architectures could generate more efficient executable code for well-defined behavior, which was deemed important for C's primary role as a systems implementation language. C therefore makes it the programmer's responsibility to avoid undefined behavior, possibly with the help of tools that find parts of a program whose behavior is undefined. Examples of undefined behavior are:
- accessing outside the bounds of an array
- overflowing a signed integer
- reaching the end of a non-void function without finding a return statement, when the return value is used
- reading the value of a variable before initializing it
These operations are all programming errors that could occur using many programming languages; C draws criticism because its standard explicitly identifies numerous cases of undefined behavior, including some where the behavior could have been made well defined, and does not specify any run-time error handling mechanism.
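Because the standard provides no run-time check, avoiding undefined behavior usually means testing before the operation rather than after it. The sketch below guards against two items from the list: signed overflow and out-of-bounds access. The names `checked_add` and `get_or_default` are illustrative, not standard library functions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *result only if the sum is representable in an int.
 * The check happens before adding, using comparisons that cannot
 * themselves overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;               /* a + b would overflow */
    *result = a + b;
    return true;
}

/* Returns arr[i] if i is in range, otherwise fallback, instead of reading
 * outside the array. */
int get_or_default(const int *arr, size_t len, size_t i, int fallback)
{
    return i < len ? arr[i] : fallback;
}
```

Checking `a + b > INT_MAX` after the addition would not work, because the overflow has already happened by then.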
Invoking fflush() on a stream opened for input is an example of a different kind of undefined behavior, not necessarily a programming error but a case for which some conforming implementations may provide well-defined, useful semantics (in this example, presumably discarding input through the next new-line) as an allowed extension. Use of such nonstandard extensions generally limits software portability.
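A portable way to get the "discard input through the next new-line" effect, without relying on an implementation's fflush() extension, is to read and drop characters explicitly. The sketch below does this. The function name `discard_line` is hypothetical. It takes a `FILE *` so it can be called as `discard_line(stdin)` or used with any other input stream.

```c
#include <stdio.h>

/* Reads and discards characters up to and including the next '\n'.
 * Returns '\n' if a newline was consumed, or EOF if end of file or a read
 * error came first. */
int discard_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;                           /* keep discarding */
    return c;
}
```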
History
Early developments
The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B", which according to Ken Thompson was a stripped-down version of the BCPL programming language.
The origin of C is closely tied to the development of the Unix operating system, originally implemented in assembly language on a PDP-7 by Ritchie and Thompson, incorporating several ideas from colleagues. Eventually they decided to port the operating system to a PDP-11. B's lack of functionality to take advantage of some of the PDP-11's features, notably byte addressability, led to the development of an early version of the C programming language.
The original PDP-11 version of the Unix system was developed in assembly language. By 1973, with the addition of struct types, the C language had become powerful enough that most of the Unix kernel was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly. (Earlier instances include the Multics system (written in PL/I), and MCP (Master Control Program) for the Burroughs B5000 written in ALGOL in 1961.)
K&R C
In 1978, Brian Kernighan and Dennis Ritchie published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as K&R C. The second edition of the book covers the later ANSI C standard.
K&R introduced several language features:
- standard I/O library
- long int data type
- unsigned int data type
- compound assignment operators of the form =op (such as =-) were changed to the form op= (such as -=) to remove the semantic ambiguity created by such constructs as i=-10, which had been interpreted as i =- 10 (decrement i by 10) instead of the possibly intended i = -10
Even after the publication of the 1989 C standard, for many years K&R C was still considered the "lowest common denominator" to which C programmers restricted themselves when maximum portability was desired, since many older compilers were still in use, and because carefully written K&R C code can be legal Standard C as well.
In early versions of C, only functions that returned a non-integer value needed to be declared if used before the function definition; a function used without any previous declaration was assumed to return an integer, if its value was used.
For example:
long int SomeFunction();
/* int OtherFunction(); */

/* int */ CallingFunction()
{
    long int test1;
    register /* int */ test2;

    test1 = SomeFunction();
    if (test1 > 0)
        test2 = 0;
    else
        test2 = OtherFunction();
    return test2;
}
All the above commented-out int declarations could be omitted in K&R C.
Since K&R function declarations did not include any information about function arguments, function parameter type checks were not performed, although some compilers would issue a warning message if a local function was called with the wrong number of arguments, or if multiple calls to an external function used different numbers or types of arguments. Separate tools such as Unix's lint utility were developed that (among other things) could check for consistency of function use across multiple source files.
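For comparison, the sketch below shows the earlier K&R example written in Standard C style with function prototypes. Every return type is explicit, and the compiler can check each call against the declared parameter list. The bodies of `SomeFunction` and `OtherFunction` are made up so the example is complete, since the original fragment only declares them. The `register` storage class is dropped because it does not affect the result.

```c
/* Hypothetical definitions; the original example only declared these. */
long int SomeFunction(void)
{
    return 42L;
}

int OtherFunction(void)
{
    return 7;
}

/* With prototypes, a call such as SomeFunction(1) is diagnosed by the
 * compiler, because the prototype says the function takes no arguments. */
int CallingFunction(void)
{
    long int test1;
    int test2;

    test1 = SomeFunction();
    if (test1 > 0)
        test2 = 0;
    else
        test2 = OtherFunction();
    return test2;
}
```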
In the years following the publication of K&R C, several unofficial features were added to the language, supported by compilers from AT&T and some other vendors. These included:
- void functions
- functions returning struct or union types (rather than pointers)
- assignment for struct data types
- enumerated types
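The sketch below exercises these four features in code that is also valid Standard C, since all of them were later adopted by the ANSI standard. The type and function names (`enum color`, `struct point`, `make_point`, `move_right`, `shifted_copy`, `color_name`) are illustrative only.

```c
/* Enumerated type: named integer constants (RED = 0, GREEN = 1, BLUE = 2). */
enum color { RED, GREEN, BLUE };

const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";
}

struct point {
    int x;
    int y;
};

/* A function returning a struct by value rather than a pointer. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* A void function: called only for its side effect. */
void move_right(struct point *p, int dx)
{
    p->x += dx;
}

/* Struct assignment copies every member at once. */
struct point shifted_copy(struct point original, int dx)
{
    struct point copy;
    copy = original;
    move_right(&copy, dx);
    return copy;
}
```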
The large number of extensions and lack of agreement on a standard library, together with the language popularity and the fact that not even the Unix compilers precisely implemented the K&R specification, led to the necessity of standardization.
ANSI C and ISO C
During the late 1970s and 1980s, versions of C were implemented for a wide variety of mainframe computers, minicomputers, and microcomputers, including the IBM PC, as its popularity began to increase significantly.
In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. In 1989, the standard was ratified as ANSI X3.159-1989 "Programming Language C." This version of the language is often referred to as ANSI C, Standard C, or sometimes C89.
In 1990, the ANSI C standard (with formatting changes) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990, which is sometimes called C90. Therefore, the terms "C89" and "C90" refer to the same programming language.
ANSI, like other national standards bodies, no longer develops the C standard independently, but defers to the ISO C standard. National adoption of updates to the international standard typically occurs within a year of ISO publication.
One of the aims of the C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. The standards committee also included several additional features such as function prototypes (borrowed from C++), void pointers, support for international character sets and locales, and preprocessor enhancements. The syntax for parameter declarations was also augmented to include the style used in C++, although the K&R interface continued to be permitted, for compatibility with existing source code.
C89 is supported by current C compilers, and most C code being written nowadays is based on it. Any program written only in Standard C and without any hardware-dependent assumptions will run correctly on any platform with a conforming C implementation, within its resource limits. Without such precautions, programs may compile only on a certain platform or with a particular compiler, due, for example, to the use of non-standard libraries, such as GUI libraries, or to a reliance on compiler- or platform-specific attributes such as the exact size of data types and byte endianness.
In cases where code must be compilable by either standard-conforming or K&R C-based compilers, the __STDC__ macro can be used to split the code into Standard and K&R sections to take advantage of features available only in Standard C.
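The sketch below shows the pattern described above: a full prototype and definition when `__STDC__` is defined, and K&R-style forms otherwise. The function `add` is a made-up example. Note that `#ifdef` only tests whether the macro is defined; some older compilers define `__STDC__` as 0 in non-strict modes, so projects may prefer `#if __STDC__` depending on which compilers they target.

```c
#ifdef __STDC__
/* Standard C: a prototype lets the compiler check argument types. */
int add(int a, int b);
#else
/* K&R C: a declaration cannot list parameter types. */
int add();
#endif

#ifdef __STDC__
int add(int a, int b)
#else
int add(a, b)
    int a;
    int b;
#endif
{
    return a + b;
}
```

A Standard C compiler sees only the prototype branches. The K&R branches are removed by the preprocessor, so they are never compiled there.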
C99
After the ANSI/ISO standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve, largely during its own standardization effort. In 1995 Normative Amendment 1 to the 1990 C standard was published, to correct some details and to add more extensive support for international character sets. The C standard was further revised in the late 1990s, leading to the publication of ISO/IEC 9899:1999 in 1999, which is commonly referred to as "C99." It has since been amended three times by Technical Corrigenda. The international C standard is maintained by the working group ISO/IEC JTC1/SC22/WG14.
C99 introduced several new features, including inline functions, several new data types (including long long int and a complex type to represent complex numbers), variable-length arrays, support for variadic macros (macros of variable arity) and support for one-line comments beginning with //, as in BCPL or C++. Many of these had already been implemented as extensions in several C compilers.
C99 is for the most part backward compatible with C90, but is stricter in some ways; in particular, a declaration that lacks a type specifier no longer has int implicitly assumed. A standard macro __STDC_VERSION__ is defined with value 199901L to indicate that C99 support is available. GCC, Sun Studio and other C compilers now support many or all of the new features of C99.
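The sketch below uses several of the C99 additions named above: a `//` comment, a variadic macro, an inline function, `long long int`, a variable-length array, and a declaration inside a `for` statement. It also checks `__STDC_VERSION__`. The names `FORMAT_INTO`, `square`, `sum_of_squares`, and `describe` are illustrative. Variable-length arrays are mandatory in C99 but were made optional in later revisions, so this assumes a compiler that supports them.

```c
#include <stdio.h>

#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "This example needs a C99 (or later) compiler"
#endif

// One-line comments like this one are legal in C99.

/* Variadic macro: __VA_ARGS__ stands for the remaining arguments.
 * snprintf itself is also new in C99. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* "static inline" gives internal linkage, which sidesteps C99's rules for
 * external inline definitions. */
static inline long long square(long long x)
{
    return x * x;
}

/* Variable-length array whose size is known only at run time (n > 0). */
long long sum_of_squares(int n)
{
    long long values[n];
    long long total = 0;

    for (int i = 0; i < n; i++)     /* declaration in for: also C99 */
        values[i] = square(i + 1);
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Writes a short description into buf; returns snprintf's result. */
int describe(char *buf, size_t size, int n)
{
    return FORMAT_INTO(buf, size, "sum of squares 1..%d = %lld",
                       n, sum_of_squares(n));
}
```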
C1X
In 2007, work began in anticipation of another revision of the C standard, informally called "C1X". The C standards committee has adopted guidelines to limit the adoption of new features that have not been tested by existing implementations.
Uses
C's primary use is for "system programming", including implementing operating systems and embedded system applications, due to a combination of desirable characteristics such as code portability and efficiency, ability to access specific hardware addresses, ability to "pun" types to match externally imposed data access requirements, and low runtime demand on system resources.
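The sketch below shows type punning and a related technique for externally imposed data layouts. `float_bits` reinterprets a `float`'s bytes as an integer using `memcpy`, which is well defined, whereas dereferencing a `(uint32_t *)` cast of a `float`'s address would violate the strict-aliasing rules. `load_le32` reads a little-endian field, such as one from a file or network header, independently of the host's byte order. Both names are illustrative. The test value assumes `float` is a 32-bit IEEE 754 type, as on most current platforms.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit unsigned integer.
 * Assumes sizeof(float) == sizeof(uint32_t) (IEEE 754 single precision). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Read a 32-bit little-endian field from a byte buffer. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```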
One consequence of C's wide acceptance and efficiency is that compilers, libraries, and interpreters of other programming languages are often implemented in C.
C is sometimes used as an intermediate language by implementations of other languages. This approach may be used for portability or convenience; by using C as an intermediate language, it is not necessary to develop machine-specific code generators. Some compilers which use C this way are BitC, Gambit, the Glasgow Haskell Compiler, Squeak, and Vala.
Unfortunately, C was designed as a programming language, not as a compiler target language, and is thus less than ideal for use as an intermediate language. This has led to development of C-based intermediate languages such as C--.
C has also been widely used to implement end-user applications, but as applications became larger, much of that development shifted to other languages.