Saturday, January 16, 2010

Syntax

Unlike languages such as FORTRAN 77, C source code is free-form which allows arbitrary use of whitespace to format code, rather than column-based or text-line-based restrictions. Comments may appear either between the delimiters /* and */, or (in C99) following // until the end of the line.

Each source file contains declarations and function definitions. Function definitions, in turn, contain declarations and statements. Declarations either define new types using keywords such as struct, union, and enum, or assign types to and perhaps reserve storage for new variables, usually by writing the type followed by the variable name. Keywords such as char and int specify built-in types. Sections of code are enclosed in braces ({ and }, sometimes called "curly brackets") to limit the scope of declarations and to act as a single statement for control structures.

As an imperative language, C uses statements to specify actions. The most common statement is an expression statement, consisting of an expression to be evaluated, followed by a semicolon; as a side effect of the evaluation, functions may be called and variables may be assigned new values. To modify the normal sequential execution of statements, C provides several control-flow statements identified by reserved keywords. Structured programming is supported by if(-else) conditional execution and by do-while, while, and for iterative execution (looping). The for statement has separate initialization, testing, and reinitialization expressions, any or all of which can be omitted. break and continue can be used to leave the innermost enclosing loop statement or skip to its reinitialization. There is also a non-structured goto statement which branches directly to the designated label within the function. switch selects a case to be executed based on the value of an integer expression.

Expressions can use a variety of built-in operators (see below) and may contain function calls. The order in which arguments to functions and operands to most operators are evaluated is unspecified. The evaluations may even be interleaved. However, all side effects (including storage to variables) will occur before the next "sequence point"; sequence points include the end of each expression statement, and the entry to and return from each function call. Sequence points also occur during evaluation of expressions containing certain operators(&&, ||, ?: and the comma operator). This permits a high degree of object code optimization by the compiler, but requires C programmers to take more care to obtain reliable results than is needed for other programming languages.

Although mimicked by many languages because of its widespread familiarity, C's syntax has often been criticized. For example, Kernighan and Ritchie say in the Introduction of The C Programming Language, "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better."

Some specific problems worth noting are:

  • Not checking number and types of arguments when the function declaration has an empty parameter list. (This provides backward compatibility with K&R C, which lacked prototypes.)
  • Some questionable choices of operator precedence, as mentioned by Kernighan and Ritchie above, such as == binding more tightly than & and | in expressions like x & 1 == 0.
  • The use of the = operator, used in mathematics for equality, to indicate assignment, following the precedent of Fortran, PL/I, and BASIC, but unlike ALGOL and its derivatives. Ritchie made this syntax design decision consciously, based primarily on the argument that assignment occurs more often than comparison.
  • Similarity of the assignment and equality operators (= and ==), making it easy to accidentally substitute one for the other. C's weak type system permits each to be used in the context of the other without a compilation error (although some compilers produce warnings). For example, the conditional expression in if (a=b) is only true if a is not zero after the assignment.[8]
  • A lack of infix operators for complex objects, particularly for string operations, making programs which rely heavily on these operations (implemented as functions instead) somewhat difficult to read.
  • A declaration syntax that some find unintuitive, particularly for function pointers. (Ritchie's idea was to declare identifiers in contexts resembling their use: "declaration reflects use".)

Operators

C supports a rich set of operators, which are symbols used within an expression to specify the manipulations to be performed while evaluating that expression. C has operators for:

  • arithmetic (+, -, *, /, %)
  • equality testing (==, !=)
  • order relations (<, <=, >, >=)
  • boolean logic (!, &&, ||)
  • bitwise logic (~, &, |, ^)
  • bitwise shifts (<<, >>)
  • assignment (=, +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=)
  • increment and decrement (++, --)
  • reference and dereference (&, *, [ ])
  • conditional evaluation (? :)
  • member selection (., ->)
  • type conversion (( ))
  • object size (sizeof)
  • function argument collection (( ))
  • sequencing (,)
  • subexpression grouping (( ))

C has a formal grammar, specified by the C standard.

"Hello, world" example

The "hello, world" example which appeared in the first edition of K&R has become the model for an introductory program in most programming textbooks, regardless of programming language. The program prints "hello, world" to the standard output, which is usually a terminal or screen display.

The original version was:

main()
{
printf("hello, world\n");
}

A standard-conforming "hello, world" program is:[9]

#include 

int main(void)
{
printf("hello, world\n");
return 0;
}

The first line of the program contains a preprocessing directive, indicated by #include. This causes the preprocessor — the first tool to examine source code as it is compiled — to substitute the line with the entire text of the stdio.h standard header, which contains declarations for standard input and output functions such as printf. The angle brackets surrounding stdio.h indicate that stdio.h is located using a search strategy that prefers standard headers to other headers having the same name. Double quotes may also be used to include local or project-specific header files.

The next line indicates that a function named main is being defined. The main function serves a special purpose in C programs: The run-time environment calls the main function to begin program execution. The type specifier int indicates that the return value, the value that is returned to the invoker (in this case the run-time environment) as a result of evaluating the main function, is an integer. The keyword void as a parameter list indicates that the main function takes no arguments.[10]

The opening curly brace indicates the beginning of the definition of the main function.

The next line calls (diverts execution to) a function named printf, which was declared in stdio.h and is supplied from a system library. In this call, the printf function is passed (provided with) a single argument, the address of the first character in the string literal "hello, world\n". The string literal is an unnamed array with elements of type char, set up automatically by the compiler with a final 0-valued character to mark the end of the array (printf needs to know this). The \n is an escape sequence that C translates to a newline character, which on output signifies the end of the current line. The return value of the printf function is of type int, but it is silently discarded since it is not used. (A more careful program might test the return value to determine whether or not the printf function succeeded.) The semicolon ; terminates the statement.

The return statement terminates the execution of the main function and causes it to return the integer value 0, which is interpreted by the run-time system as an exit code indicating successful execution.

The closing curly brace indicates the end of the code for the main function.

No comments:

Post a Comment