Wednesday, December 19, 2007

STRINGS IN C, A BRIEF REVIEW

A C-style string is defined as an array of characters that terminates in the null character. For example, in a C program the following would declare a string variable str with a storage allocation of 6 characters, the last character being reserved for the terminating null character:

char str[6];

If we wish to also initialize a string variable at the time it is declared, we can do so by

char str[5 + 1] = "hello";

or by

char str[] = "hello"; /* (A) */

where we have omitted the length of the array str. The double-quoted string of characters on the right-hand side, "hello", is called a string literal.[2] A string literal is a string constant, very much like the number 5 is an integer constant. Since a string literal is stored as an array of chars, the compiler represents it in most expressions by the memory address of the first character, in the above case the address of the character h. More precisely, in C++ a string literal is an array of const char that converts to const char* in such expressions; in C its elements have type char, although modifying them is still not permitted.

We can also use a character pointer directly to represent a string, as in

char* str = "hello"; /* (B) */

which causes the address of the first character, h, of the string literal "hello" to be stored in the pointer variable str. Note that the declaration in (B) gives you direct access to the block of memory, that is read-only, in which the string literal is stored. On the other hand, the declaration in (A) copies the string literal from wherever it is stored into the designated array.

While we may declare a string variable to be an array of characters, as in the definition in line (A) above, or to be a character pointer, as in the definition in line (B), the two versions are not always interchangeable. In the array version, the individual characters can be modified, as would be the case with an array in general. However, with the pointer version, the individual characters of the string must not be changed because the string literal is constant and is typically stored in a read-only section of the memory. A statement such as the one shown in line (B) is legal in C, and was accepted as a deprecated conversion by older C++ compilers, because the compiler allows a string literal to be assigned to a pointer of type char*. So whereas the pointer str in line (B) is of type char*, it is pointing to a block of read-only memory in which the string literal itself is stored.[3] For another difference between the string definitions in lines (A) and (B), the identifier str in the array version is the name of an array, so it cannot appear on the left-hand side of an assignment. On the other hand, in the pointer version in line (B), str is a pointer variable that, during program execution, could be given any value of type char*.
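
The difference between the two definitions can be sketched in a few lines. The helper names below are hypothetical, and the pointer is declared as const char* (an assumption that goes beyond line (B)) so that the compiler itself rejects any attempt to write through it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: modifies a private array copy of the literal.
std::string modifyArrayCopy() {
    char str[] = "hello";        // as in (A): the literal is copied into the array
    assert( sizeof str == 6 );   // five characters plus the terminating null
    str[0] = 'j';                // legal: the array belongs to this function
    return std::string( str );
}

// Hypothetical helper: re-points a pointer without touching the literal.
std::string repointPointer() {
    const char* str = "hello";   // as in (B), but const-qualified
    // str[0] = 'j';             // would not compile: the characters are const
    str = "world";               // legal: only the pointer changes
    return std::string( str );
}
```

The first function changes a character in its own copy; the second can only make the pointer refer to a different literal.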

We will now review briefly the frequently used functions in C that are provided by the string.h header file for performing operations on strings. These include strcmp whose prototype is given by

int strcmp( const char* arg1, const char* arg2 );

for comparing two strings that are supplied to it as arg1 and arg2. It returns a value less than, equal to, or greater than 0 depending on whether arg1 is less than, equal to, or greater than arg2. Typically, ASCII character sets are used and strings are compared using the ASCII integer codes associated with the characters. For example, the following inequality is true for the one-character strings shown

strcmp( "A", "a" ) < 0

because the ASCII code for the character A is 65, whereas the ASCII code for a is 97, making the string literal "A" less than the string literal "a". Given this character by character comparison on the basis of ASCII codes, longer strings are compared using lexicographic ordering, an ordering that is akin to how words are arranged in a dictionary. For example, in lexicographic ordering, the string "abs" will occur before the string "absent", so the former is less than the latter. However, the string "Zebra" will occur before the string "debra", as the former is less than the latter because the ASCII codes for all uppercase letters, A through Z, occupy the range 65 through 90, whereas the codes for lowercase letters, a through z, occupy the range 97 through 122.
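
Since only the sign of the value returned by strcmp is specified, it can help to reduce the result to -1, 0, or +1 before using it. The helper compareSign below is a hypothetical name introduced for illustration, and the orderings mentioned in its comments assume an ASCII character set.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reduces strcmp's result to -1, 0, or +1.
int compareSign( const char* a, const char* b ) {
    assert( a != nullptr && b != nullptr );
    int result = std::strcmp( a, b );
    return ( result > 0 ) - ( result < 0 );
}
```

With ASCII codes, compareSign( "abs", "absent" ) and compareSign( "Zebra", "debra" ) are both -1, in line with the orderings described above.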

Another frequently used string function from the string.h header file is the strlen function for ascertaining the length of a string. This function has the following prototype:

size_t strlen( const char* arg );

where the return type, size_t, defined in the header file stddef.h, is usually either unsigned int or unsigned long int. For practically all cases, we can simply think of the value returned by strlen as an integer. To illustrate,

strlen( "hello" )

returns 5. Note that the integer count returned by strlen does not include the terminating null character.

Another very useful C function for dealing with strings is

char* strcpy( char* arg1, const char* arg2 );

which copies the characters from the string arg2 into the memory locations pointed to by arg1. For illustration, we could say

char str1[6];
char* str2 = "hello";
strcpy( str1, str2 );

or, using the C memory allocation function malloc(),

char* str1 = (char*) malloc( 6 );
char* str2 = "hello";
strcpy( str1, str2 );

In both cases above, the string hello will be copied into the memory locations pointed to by the character pointer str1. The function strcpy() returns the pointer that is its first argument. However, in most programming, the value returned by strcpy() is ignored. The returned value can be useful in nested calls to this function [45, p. 252].
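
Because strcpy does not know how large the destination is, the caller has to check. The following is a minimal sketch of such a check; copyIfFits and its dstSize parameter are hypothetical names, not part of string.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst only if dst, which holds
// dstSize bytes, has room for the characters and the terminating null.
bool copyIfFits( char* dst, std::size_t dstSize, const char* src ) {
    assert( dst != nullptr && src != nullptr );
    if ( std::strlen( src ) + 1 > dstSize )   // +1 for the null character
        return false;                         // dst is left untouched
    std::strcpy( dst, src );
    return true;
}
```

For a six-character array the string "hello" fits exactly; for a five-character array the function declines to copy.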

When one wants to join two strings together, the following function from the string.h header comes in handy:

char* strcat( char* arg1, const char* arg2 );

This function appends the string pointed to by arg2 to the string pointed to by arg1. For example,

char str1[8];
strcpy( str1, "hi" );
strcat( str1, "there" );

will cause the string hithere to be stored at the memory locations pointed to by str1. As with the strcpy() function, the string concatenation function returns the pointer that is its first argument. As before, the returned value is usually ignored.

[2]The initialization syntax shown at (A) copies over the string literal stored in a read-only section of the memory into the array. Therefore, effectively, the declaration shown at (A) is equivalent to

char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' } ;


[3]Some C and C++ compilers do allow a string literal to be modified through a pointer to which the string literal is assigned. For example, the following will work with some compilers:

char* str = "hello";
*str = 'j';

But modifying a string literal through a pointer in this manner could result in non-portable code. If you must modify a string literal, it is best to first copy it into an array that is stored at a location different from where the string literal itself is stored, as in

char str[] = "hello";
str[0] = 'j';

String literals being represented by const char* allows for code optimization, such as achieved by storing only one copy of each literal.

SOME COMMON SHORTCOMINGS OF C-STYLE STRINGS
C-style strings can be painful to use, especially after you have seen the more modern representations of strings in other languages. For starters, when invoking some of the most commonly used string library functions in C, such as strcpy(), strcat(), and so on, you have to ensure that sufficient memory is allocated for the output string. This requirement, seemingly natural to those who do most of their programming in C, appears onerous after you have experienced the convenience of the modern string types.

Consider this sample code for the string type from the C++ Standard Library:

string str1 = "hi";
string str2 = "there";
string str3;
str3 = str1 + str2;

We are joining the strings str1 and str2 together and copying the resulting string into the string object str3. Using the operator + for joining two strings together seems very natural. More particularly, note that we do not worry about whether or not we have allocated sufficient memory for the new longer string. The system automatically ensures that the string object str3 has sufficient memory available to it for storing the new string, regardless of its length.

Now compare the above code fragment with the following fragment that tries to do the same thing but with C-style strings using commonly used functions for string processing in C:

char* str1 = "hi";
char* str2 = "there";
char* str3 = (char*) malloc( strlen( str1 ) + strlen( str2 ) + 1 );
strcpy( str3, str1 );
strcat( str3, str2 );

The syntax here is definitely more tortured. A visual examination of the code, if too hasty, can be confusing with regard to the purpose of the code. You have to remind yourself about the roles of the functions strcpy and strcat to comprehend what's going on. You also have to remember to allocate memory for str3—forgetting to do so is not as uncommon as one might like to think. What's worse, for proper memory allocation for str3 you have to remember to add 1 for the null terminator to the byte count obtained by adding the values returned by strlen for the strings str1 and str2. (Just imagine the disastrous consequences if you should forget!)
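
One way to keep the bookkeeping in a single place is to wrap it in a function. The sketch below is only an illustration: joinCStrings is a hypothetical name, it uses memcpy rather than strcpy and strcat, and it assumes the caller will release the result with free().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a newly allocated concatenation of a and b,
// or a null pointer if the allocation fails. The caller must free() it.
char* joinCStrings( const char* a, const char* b ) {
    assert( a != nullptr && b != nullptr );
    std::size_t lenA = std::strlen( a );
    std::size_t lenB = std::strlen( b );
    // +1 for the terminating null character
    char* result = static_cast<char*>( std::malloc( lenA + lenB + 1 ) );
    if ( result == nullptr )
        return nullptr;
    std::memcpy( result, a, lenA );
    std::memcpy( result + lenA, b, lenB + 1 );   // also copies b's null
    return result;
}
```

Even with such a helper, the responsibility for freeing the memory stays with the caller, which is one more thing the C++ string type takes care of automatically.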

For another example of the low-level tedium involved and the potential for introducing bugs when using C-style strings, consider the following function:

void strip( char* q ) {
    char* p = q + strlen( q ) - 1;      //(A)
    while ( p >= q && *p == ' ' )       //(B)
        *p-- = '\0';                    //(C)
}

which could be used to strip off blank space at the trailing end of a string. So in a call such as

char* str = (char*) malloc( 11 );
strcpy( str, "hello     " );
strip( str );

the function strip would erase the five blank space characters after "hello" in the string str. Going back to the definition of strip, in line (A) we first set the local pointer p to point to the last character in the string. In line (B), we dereference this pointer to make sure that a blank space is stored there and that we have not yet traversed all the way back to the beginning of the string. If both these conditions are satisfied, in line (C) we dereference the pointer again, setting its value equal to the null character, and subsequently decrement the pointer.[4] If someone were to write in a hurry the implementation code for strip, it is not inconceivable that they'd write it in the following form:

void strip( char* q ) {
    char* p = q + strlen( q ) - 1;
    while ( *p == ' ' )                 //(D)
        *p-- = '\0';
}

where in line (D) we have forgotten to make sure that the local pointer p does not get decremented to a value before the start of the argument string. While this program would compile fine and would probably also give correct results much of the time, it could also exhibit unpredictable behavior. In programs such as this, one could also potentially forget to dereference a string pointer, resulting in programs that would compile all right but crash when run.
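
One way to sidestep the pointer arithmetic altogether is to track the length of the string instead of a pointer into it. The following is a minimal sketch of that idea, not the function discussed above; the name stripTrailingBlanks is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant of strip() that works with a length count, so
// nothing before the start of the string is ever examined.
void stripTrailingBlanks( char* q ) {
    assert( q != nullptr );
    std::size_t len = std::strlen( q );
    while ( len > 0 && q[ len - 1 ] == ' ' )
        q[ --len ] = '\0';      // overwrite the last blank with a null
}
```

Because the loop tests len > 0 before indexing, a string that is empty or consists only of blanks is handled without reading outside the array.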

[4]Recall from C programming that the postfix decrement operator, ‘--’, has a higher precedence than the indirection operator ‘*’. So the expression *p-- in line (C) is parsed as *(p--). But because the decrement operator is postfix, the expression p-- evaluates to the current value of p. Therefore, what gets dereferenced is the location that p points to before the decrement. It is only after the evaluation of the expression that p is decremented by the postfix decrement operator.


C++ STRINGS
The C++ Standard Library provides a string type that avoids the pitfalls of C-style strings.[5] Since many aspects of this type cannot be fully explained until we discuss concepts such as operator overloading, our goal in this section is limited to familiarizing the reader with some rudimentary aspects of this type to the extent that we can use it in some of the examples in this and later chapters. To use the C++ string type, you must include its associated header file:

#include <string>

4.3.1 Constructing a C++ String Object
To declare a string with initialization, we can say

string str( "hi there");

which is a call to the constructor of the string class with "hi there" as its const char* argument. An alternative way to initialize a string is

string str = "hi there";

We can also use the following syntax:

string str = string( "hi there" );

We can think of the right-hand side here as constructing an anonymous string object that is then assigned to the variable str through what's known as the copy constructor for the string class.[6] We also have the option of invoking the new operator to obtain a pointer to a string object:

string* p = new string( "hi there" );

An empty string can be declared in the following manner[7]

string str;

or as

string str = "";

These declarations create an object of class string whose name is str.[8] A principal feature of this object is that it stores inside it the string of characters specified by the initialization syntax. The stored string may or may not be null terminated. If in a particular implementation of C++, the string is not null terminated, that does not create a problem because also stored in the object is the exact length of the string. So there is never any question about how many memory locations would need to be accessed in order to read an entire string.

While the string constructor invocations illustrated above show us how to convert a const char* string into a string object, what about the opposite? How does one convert a C++ string object back into a C-style null-terminated string? This is done by invoking the c_str() member function for the string class:

string str( "hello" );
const char* c_string = str.c_str();
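
As a small illustration of the round trip, the pointer returned by c_str() can be handed to any C function that expects a null-terminated string. The helper cLength below is a hypothetical name; it measures a string object with strlen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: measures a string object with the C function strlen.
std::size_t cLength( const std::string& str ) {
    const char* c_string = str.c_str();        // null-terminated characters
    assert( c_string[ str.size() ] == '\0' );  // the terminator follows the last character
    return std::strlen( c_string );            // counts up to the first null
}
```

For ordinary text the result agrees with size(). If the string object itself contains a null character, strlen stops there while size() still reports the full length. The pointer should be used before the string object is next modified.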

4.3.2 Accessing Individual Characters
The individual characters of a C++ string can be accessed for reading and writing by using either the subscript operator ‘[]’ or the member function at(). The former is not range checked, while the latter is. This means that if you try to access a character position that does not really exist, what you get with the subscript operator is unpredictable (formally, the behavior is undefined). On the other hand, if you try to access a nonexistent position with the at() function, it throws an exception of type out_of_range, and the program aborts if that exception is not caught. This is illustrated by the following program where we have commented out the line in which we invoke the at() function with an argument that is clearly outside the range valid for the "hello" string. If you uncomment this line, the program will abort at run time when the flow of control reaches that line. On the other hand, when we try to reach the same index with the subscript operator, we may see some garbage character displayed on the screen.

--------------------------------------------------------------------------------
// StringCharIndexing.

#include <string>
using namespace std;

int main()
{
    string str( "hello" );
    char ch = str[0];            // ch initialized to 'h'
    str[0] = 'j';                // str now equals "jello"
    ch = str.at( 0 );            // ch's value is now 'j'
    str.at(0) = 'h';             // str again equals "hello"
    ch = str[ 1000 ];            // garbage value for ch
    // ch = str.at( 1000 );      // program aborts if uncommented
    return 0;
}
--------------------------------------------------------------------------------
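
The range check performed by at() reports a bad index by throwing an exception of type std::out_of_range, so a program that wants to recover can catch it. The following sketch shows one way of doing so; charAtOr and its fallback parameter are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at index i, or the fallback
// character when i does not designate a character of the string.
char charAtOr( const std::string& str, std::string::size_type i, char fallback ) {
    try {
        return str.at( i );                // range-checked access
    } catch ( const std::out_of_range& ) {
        assert( i >= str.size() );         // at() throws only for such indices
        return fallback;
    }
}
```

With this helper, charAtOr( "hello", 1000, '?' ) yields '?' instead of terminating the program.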

4.3.3 String Comparison
Two strings can be compared for equality (or inequality) on the basis of the ASCII codes associated with the characters using the binary operators ‘==’, ‘!=’, ‘>’, ‘>=’, ‘<’, and ‘<=’. Two strings are equal if and only if they are composed of identical character sequences. A string is less than another string if the former occurs earlier in a lexicographic ordering of the strings on the basis of the ASCII codes associated with the characters.

While the operators listed above are all binary, in the sense that they return either true or false, sometimes it is more useful to employ a 3-valued comparison function, compare(), that is defined for the string class. Given two string objects str1 and str2, the invocation

str1.compare( str2 );

returns one of three possible values:

a positive value if str1 is greater than str2

0 if str1 is equal to str2

a negative value if str1 is less than str2

For example,

string str1( "abc" );
string str2( "abc123" );
if ( str1.compare( str2 ) == 0 ) // test returns false
.....
if ( str1.compare( str2 ) < 0 ) // test returns true
.....
if ( str1.compare( str2 ) > 0 ) // test returns false
....

It is also possible to invoke compare with additional arguments that designate which portion of each string takes part in the comparison. In the following example, "hello" is the string that invokes compare on the argument string "ellolotion" in line (A). The first two arguments to compare in line (A), 1 and 4, designate the index at which to start the character comparisons in the string "hello" and the number of characters to use from it. This means that the string comparison will begin at the letter ‘e’ of "hello". The last two arguments, 0 and 4, designate the starting index and the number of characters from the string "ellolotion" that will be used for string comparison.

string str1("hello");
string str2("ellolotion");
if ( str1.compare( 1, 4, str2, 0, 4 ) == 0 ) //(A)
    cout << "\nThe substring starting at index 1 "
            "of 'hello' is the same as the first "
            "four chars of 'ellolotion'."
         << endl; //(B)
else
    cout << "The compare test failed" << endl;

For the example code shown, the comparison test in line (A) returns true and the message in the statement leading up to line (B) is printed out.

In the five-argument version of compare shown in line (A) above, the position and count arguments are all of type string::size_type,[9] which for all practical purposes can be considered to be an unsigned int. There is also a three-argument version, compare( pos, n, str ), in which the first two arguments play the same role as in the example shown. Now the comparison is with the entire argument string. We should also mention that the compare function works just the same if its string argument is a C-style const char* string.[10]

A 3–valued string comparison function, such as the compare function, is what you'd need for certain kinds of string sorting functions. Let's say we wish to sort an array of string literals as shown below:

string wordList[] = {"hello", "halo", "jello", "yellow", //(C)
"mellow", "Hello", "JELLO", "Yello",
"MELLOW"};

Although later the reader will be introduced to the sorting functions designed expressly for C++, we can sort this array by using the venerated qsort function defined originally in the stdlib.h header file of the C standard library, and available in C++ through the header file cstdlib. The function qsort, frequently an implementation of quick-sort, is capable of sorting an array of any data type as long as you are able to specify a comparison function for the elements of the array.[11] The prototype of qsort is

void qsort( void* base, //(D)
size_t nmemb,
size_t size,
int (* compar) ( const void*, const void* ) );

where base is a pointer to the first element of the array to be sorted, nmemb the number of elements to be sorted,[12] size the size of each element in bytes,[13] and, finally, compar a pointer to a user-defined function for comparing any two elements of the array. The user-defined comparison function that will be bound to the parameter compar must return an int and must take exactly two arguments, both of type const void*. Furthermore, for qsort() to work correctly, the int returned by the comparison function must be positive when the entity pointed to by the first argument is greater than the entity pointed to by the second argument; must be negative when the opposite is the case; and must be zero when the two entities are equal.

Here is a possible comparison function for the fourth argument of qsort for sorting the elements of the array wordList of line (C) above:[14]

int compareStrings( const void* arg1, const void* arg2 ) { //(E)
    return ( *( static_cast<const string*>( arg1 ) ) ).compare(
                *( static_cast<const string*>( arg2 ) ) );
}

In terms of the return type and the parameter structure, this comparison function corresponds exactly to what is specified for the fourth argument of qsort() in line (D). The actual comparison is carried out by invoking the compare function of the string class.

Shown below is a simple program that pulls together the code fragments shown above into a complete program:

--------------------------------------------------------------------------------
//Qsort.cc
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;

int compareStrings( const void* arg1, const void* arg2 );
int checkUpperCase( string buffer );
int main()
{
    string wordList[] = {"hello", "halo", "jello", "yellow",
                         "mellow", "Hello", "JELLO", "Yello",
                         "MELLOW"};
    cout << sizeof( wordList ) << endl;   // 36 if sizeof(string) is 4;
                                          // implementation dependent

    int sizeArray = sizeof( wordList ) / sizeof( wordList[ 0 ] );
    cout << sizeArray << endl;            // 9

    qsort( wordList, sizeArray, sizeof(string), compareStrings );
    int j = 0;
    while ( j < sizeArray )
        cout << wordList[j++] << " ";
    //Hello JELLO MELLOW Yello halo hello jello mellow yellow
    cout << endl;
    return 0;
}

int compareStrings( const void* arg1, const void* arg2 ) {
    return ( *( static_cast<const string*>( arg1 ) ) ).compare(
                *( static_cast<const string*>( arg2 ) ) );
}
--------------------------------------------------------------------------------
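
For comparison, here is a minimal sketch of the same sort done with std::sort from the algorithm header, which is a different technique from the qsort call shown above. One reason to prefer it for string objects is that qsort moves array elements as raw bytes, something the C++ standard sanctions only for simple, C-like element types. The function name sortedWords is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the words sorted with std::sort, which
// compares string objects with operator< when no comparator is given.
std::vector<std::string> sortedWords( std::vector<std::string> words ) {
    std::sort( words.begin(), words.end() );
    assert( std::is_sorted( words.begin(), words.end() ) );
    return words;
}
```

Applied to the nine words of wordList, this produces the same ordering as the program above, with the capitalized words first.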

4.3.4 Joining Strings Together
Through the overloading of the ‘+’ operator, the string class makes it very easy to join strings together without having to worry whether or not you allocated sufficient memory for the result string.[15] For example, we can say

string str1( "hello" );
string str2( "there" );
string str3 = str1 + " " + str2; // "hello there"
str2 += str1; // "therehello"

which would result in the object str3 storing the string "hello there" and the object str2 storing the string "therehello". The operator ‘+’ works the same if the second operand is of type const char* or just char as long as the first operand is an object of type string.[16] So while the following will not work

string s = "hello" + " there"; // Wrong

the following does:

string s = string( "hello" ) + " there";

It is also possible to use the append member function for joining two strings, or one string with a part of another string, as the following example illustrates:

string string1( "hello" );
string string2( " the world at large" );
string string3 = string1;

string3.append( string2 ); //(A)
cout << string3; // "hello the world at large"

string1.append( string2, 4, 6 ); //(B)
cout << string1; // "hello world"

In the one-argument invocation of append in line (A), the entire argument string is appended to the invoking string. In the three-argument version of append, shown in line (B), a substring from the argument string is appended to the invoking string. The substring begins at the index specified by the second argument, with the third argument specifying its length. The second and the third arguments in the three-argument version are both of type string::size_type, which as mentioned before can be taken to be the same as int for the purpose of program design.

There is also a two-argument version of append in which the second argument is the same as the second argument of the three-argument version. In this case, the entire argument string starting at the specified index is appended to the invoking string.

As is true of all string class member functions, the argument string can also be a C-style const char* string.
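
The index arithmetic in the three-argument append is easy to get wrong by one, so it can be worth checking on a small case. The sketch below wraps the call in a hypothetical helper, appendPart, that leaves its inputs unchanged and returns the result.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends count characters of source, starting at
// index pos, to a copy of base and returns the result.
std::string appendPart( std::string base, const std::string& source,
                        std::string::size_type pos,
                        std::string::size_type count ) {
    assert( pos <= source.size() );   // append throws out_of_range otherwise
    base.append( source, pos, count );
    return base;
}
```

In the string " the world at large" the blank before "world" sits at index 4, so appendPart( "hello", " the world at large", 4, 6 ) gives "hello world". A count that runs past the end of the source is cut off at the end.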

4.3.5 Searching for Substrings and Characters
A frequent problem in string processing is that we want to know if a given string has particular substrings or particular characters in it. Consider, for example, the problem of isolating words in a text file. Of the many different ways of solving this problem, one would be to read the file one line at a time and to then look for whitespace characters in each line. If not excessively large, we could even read the entire file as a single string and then look for whitespace characters (which include line-feeds and carriage returns) to break the string into individual words.

The C++ string library provides a number of functions for searching for substrings and individual characters in a string. These functions are named find, rfind, find_first_of, find_last_of, find_first_not_of, and find_last_not_of. In all there are 24 functions with these six names, the various versions of the functions catering to different types of arguments. In this section, we will explain how one can invoke find and find_first_of on string type objects with string or char type arguments. (Their usage on const char* type arguments is parallel to the usage on string arguments.) The functions rfind do the same thing as find, except that they start the search from the end of a string towards its beginning. The functions find_last_of again do the same thing as find_first_of, except that they start their search at the end of a string toward its beginning.

Here is an example that illustrates how one can invoke find to search for a substring in a string:

string::size_type pos = 0;
string quote( "Some cause happiness wherever they go,"
" others whenever they go - Oscar Wilde" );
if ( ( pos = quote.find( "happiness" ) ) != string::npos ) //(A)
cout << "The quote contains the word 'happiness'" << endl;

The function find returns the index of the character in the invoking string where it scores a match with the argument string. This index, although officially of type string::size_type, can be taken to be an int for all practical purposes. If no match is found, find returns the symbolic constant string::npos, a static data member of the string class that is also of type string::size_type. The actual value of npos is such that no actual character index in any valid string would ever correspond to it. In the above program fragment, note how we compare the value returned by find with the symbolic constant npos to establish the presence or the absence of the substring.
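
The comparison against npos is often wrapped in a small predicate. The following is a minimal sketch; containsWord is a hypothetical name, and despite the name it tests for any substring, not only whole words.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if word occurs anywhere in text.
bool containsWord( const std::string& text, const std::string& word ) {
    // npos is the largest value of size_type, so it is never a valid index
    assert( std::string::npos == static_cast<std::string::size_type>( -1 ) );
    return text.find( word ) != std::string::npos;
}
```

Testing the returned index with something like pos >= 0 would not work here, since size_type is unsigned and the test would always succeed.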

The following program shows a simple demonstration of the use of find. It also shows how replace, another member function of the string class, can be used together with find to search for each occurrence of a substring in a string and, when found, how the substring can be replaced with another string. The program produces the output

4
32
one armadillo is like any other armadillo

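where the numbers 4 and 32 are the position indices at which find locates the substring "hello" as the program works through the string "one hello is like any other hello". The second index is 32 rather than 28 because by then the first replacement has lengthened the string by four characters. Here is the program:[17]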

--------------------------------------------------------------------------------
//StringFind.cc

#include <cassert>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str( "one hello is like any other hello" );
    string searchString( "hello" );
    string replaceString( "armadillo" );

    assert( searchString != replaceString );

    string::size_type pos = 0;
    while ( (pos = str.find(searchString, pos)) != string::npos ) {
        cout << pos << endl;              // prints 4, then 32
        str.replace( pos, searchString.size(), replaceString );
        pos++;
    }
    cout << str << endl; //one armadillo is like any other armadillo
    return 0;
}
--------------------------------------------------------------------------------

Note the use of the 2-argument version of find in the above program. The second argument tells find where to begin the search for the substring. When you are searching for a character or a substring with find, after you have obtained the first match, you need to increment the index represented by pos so that the search can continue on for the next occurrence. If you don't do that, find will keep on returning the same index ad infinitum.

The above example code also illustrates the use of the 3-argument replace. This function can take up to five arguments. The two additional arguments, both of type string::size_type, specify the position in the argument string and the number of characters to be taken starting at that position for the purpose of replacement.
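
The find-and-replace loop can be packaged as a function. The sketch below is a variation rather than the program shown above: it advances pos by the length of the replacement instead of by one, on the assumption that text just inserted should not be searched again. The name replaceAll is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of str in which every occurrence
// of searchString has been replaced by replaceString.
std::string replaceAll( std::string str, const std::string& searchString,
                        const std::string& replaceString ) {
    assert( !searchString.empty() );     // an empty search string matches everywhere
    std::string::size_type pos = 0;
    while ( ( pos = str.find( searchString, pos ) ) != std::string::npos ) {
        str.replace( pos, searchString.size(), replaceString );
        pos += replaceString.size();     // skip over the text just inserted
    }
    return str;
}
```

Skipping the inserted text matters when the replacement itself contains the search string, as in replacing "a" by "aa"; advancing by one in that case would never terminate.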

Shown below is an example of how one can use the string library function find_first_of to locate and count some of the more frequently used punctuation marks in a string. We place all the punctuation marks we are looking for in a string called marks, with the original string stored in quote. We invoke find_first_of on quote and supply it with marks as its first argument, the second argument consisting of the position index in quote where we want the search to begin. Note how we increment pos after each hit. If we did not do so, the function find_first_of would keep on returning the same location where it found the first punctuation mark. For the example shown, the program returns a count of five.

string quote( "Ah, Why, ye Gods, should two and two "
"make four? - Alexander Pope" );
string marks( ",.?:;-" );
string::size_type pos = 0;
int count = 0;
while ( ( pos = quote.find_first_of( marks, pos ) )
            != string::npos ) {
    ++pos;
    ++count;
}
cout << count << endl; // 5
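
The same pair of functions can be used for the word-isolation problem mentioned at the start of this section. The following is a minimal sketch under the assumption that words are separated by blanks, tabs, line-feeds, and carriage returns; splitWords is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: breaks text into whitespace-separated words.
std::vector<std::string> splitWords( const std::string& text ) {
    const std::string whitespace( " \t\n\r" );
    std::vector<std::string> words;
    // skip leading whitespace to find the start of the first word
    std::string::size_type start = text.find_first_not_of( whitespace );
    while ( start != std::string::npos ) {
        std::string::size_type end = text.find_first_of( whitespace, start );
        if ( end == std::string::npos ) {        // last word runs to the end
            words.push_back( text.substr( start ) );
            break;
        }
        assert( end > start );                   // a word has at least one character
        words.push_back( text.substr( start, end - start ) );
        start = text.find_first_not_of( whitespace, end );
    }
    return words;
}
```

Here find_first_not_of locates the start of each word and find_first_of locates the whitespace character that ends it.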

4.3.6 Extracting Substrings
The string library offers the function substr for extracting a substring from a source string on which the function is invoked. This function can be invoked with one argument, of type size_type, that designates the index of the character that marks the start of the substring desired from the source string. The extracted substring will extend all the way to the end of the source string. This use is illustrated by the following code fragment. Here the string returned by substr will start at the position indexed 44 and go to the end of the quote. As a result, the output produced by line (A) is "Fiction has to make sense. - Tom Clancy".

string quote( "The difference between reality and fiction? "
"Fiction has to make sense. - Tom Clancy" );
string str = quote.substr( 44 );
cout << str << endl; //(A)

There is also a two-argument version of substr in which the first argument works the same as in the example shown above. The second argument, also of type size_type, now designates the number of characters to be extracted from the source string. If the number of characters requested exceeds the number remaining in the source string, the extracted substring will stop at the end of the source string. The following code fragment, which will output the word Fiction, illustrates this usage.

string quote( "The difference between reality and fiction? "
              "Fiction has to make sense. - Tom Clancy" );
string str = quote.substr( 44, 7 );
cout << str << endl; // Fiction

It is also possible to invoke the substr function with no arguments, in which case it simply returns a copy of the string object on which it is invoked.

Substrings can also be extracted by invoking the string constructor with a string argument and with additional optional arguments to specify the starting index for substring extraction and the number of characters to be extracted from the first argument string. In the invocations of the string constructor below that construct the objects str_1 and str_2, the first yields the substring "Fiction has to make sense. - Tom Clancy", and the second just the word "Fiction".

string quote( "The difference between reality and fiction? "
              "Fiction has to make sense. - Tom Clancy" );
string str_1( quote, 44 );
string str_2( quote, 44, 7 );
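
While a count that is too large is simply cut off, a starting index beyond the end of the source string is an error: substr then throws an exception of type std::out_of_range. The sketch below guards against that case; safeSubstr is a hypothetical name, and returning an empty string is just one possible policy.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: like substr, but returns an empty string instead
// of throwing when pos lies beyond the end of source.
std::string safeSubstr( const std::string& source,
                        std::string::size_type pos,
                        std::string::size_type count ) {
    if ( pos > source.size() )
        return std::string();
    std::string result = source.substr( pos, count );
    assert( result.size() <= source.size() - pos );   // count was cut off at the end
    return result;
}
```

For the quote used above, safeSubstr( quote, 44, 7 ) yields "Fiction", and a starting index of 1000 yields the empty string.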

4.3.7 Erasing and Inserting Substrings
The string class member function erase can be used to erase a certain number of characters in the string on which the function is invoked. The function can be invoked with zero arguments, with one argument, and with two arguments. When invoked with no arguments, the function erases the string stored in the invoking object and replaces it with the empty string "". When invoked with one argument, which must be of type string::size_type, the string stored in the invoking object is erased from the position indexed by that argument to the end. When invoked with two arguments, both of type string::size_type, the second argument designates the number of characters to be erased starting at the position specified by the first argument.

The following code fragment illustrates the two-argument erase. It also illustrates the insert member function which can be used to insert a new substring into a string object. The function insert can be invoked with either two arguments, or three arguments, or four arguments. When invoked with two arguments, the first argument, of type string::size_type, designates the index of the position at which the new insertion is to begin, and the second argument the string to be inserted. In the three-argument version, the additional argument specifies a position in the argument string that designates the start of the substring to be inserted; the substring continues to the end. In the four-argument invocation, the last argument specifies the number of characters to be taken from the argument string for the purpose of insertion.

The example below shows two-argument and four-argument versions of insert.

string::size_type pos = 0;
string quote = "Some cause happiness wherever they go, "
               "others whenever they go - Oscar Wilde";
if ( ( pos = quote.find( "happiness" ) ) != string::npos ) {
    quote.erase( pos, 9 );
    quote.insert( pos, "excitement" );
}
cout << quote << endl;                                        //(A)
quote.erase( pos, 10 );
cout << quote << endl;                                        //(B)
quote.insert( pos, "infinite happiness in the air", 9, 9 );
cout << quote << endl;                                        //(C)

The code produces the following output:

FROM LINE (A):
Some cause excitement wherever they go, others whenever they go - Oscar Wilde

FROM LINE (B):
Some cause  wherever they go, others whenever they go - Oscar Wilde

FROM LINE (C):
Some cause happiness wherever they go, others whenever they go - Oscar Wilde

4.3.8 Size and Capacity
The size() (or length(), which does the same thing) member function when invoked on a string object will ordinarily return the number of characters in the string stored in the object. This will also ordinarily be the amount of memory allocated to a string object for the storage of the characters of the string.

string str( "0123456789" );
cout << str.size() << endl; // returns 10

When you extend the length of a string by using, say, the ‘+=’ operator, the size of the allocated memory is automatically increased to accommodate the longer length. But if a string is going to be extended in bits and pieces frequently, you can reduce the background memory-allocation work by preallocating additional memory for the string through the resize() member function. If we refer to the total amount of memory currently available to a string for the storage of its characters as the string object's capacity, we can use resize to endow a string with any desired capacity. In the code fragment shown below, we initially create a string object of size 10 characters. At this moment the capacity of the string object is also 10. But then we increase the capacity to 20 characters by invoking resize, although the number of actual characters in the string is still 10. Strictly speaking, size() itself reports 20 after the call, because resize pads the string with ten null characters; it is strlen applied to the C-style string returned by c_str() that still reports 10. The standard library also provides the member functions reserve() and capacity() for working with the allocated storage without changing what size() reports.

--------------------------------------------------------------------------------
//StringSize.cc

#include <iostream>
#include <string>
#include <cstring>      // for strlen
using namespace std;

int main()
{
    string str = "0123456789";

    cout << "The current capacity of the string is:"
         << str.size() << endl;                          // 10
    str.resize( 20 );

    cout << "The new capacity of the string is:"
         << str.size() << endl;                          // 20

    cout << "The actual length of the string is: "       // 10
         << strlen( str.c_str() ) << endl;

    cout << "The string object after resizing "
         << "to 20 a 10 character string: "
         << str << endl;                                 // "0123456789"
    str += "hello";
    cout << str << endl;                                 // "0123456789hello"

    return 0;
}
--------------------------------------------------------------------------------

This code shows a one-argument version of resize. When supplied with an optional second argument, which must be of type char, the designated character is used to initialize the spaces not occupied by the characters in the string, the default being the null character.

While on the subject of size, we also want to clarify the relationship between the size of a string object and the size of the string held by a string object. The size of a string object can be ascertained by invoking sizeof( string ), which for g++ returns 4 for all strings (but could return 8 on some systems). Before we go into why sizeof( string ) returns the same number for all strings on any given system, let's quickly review the nature of sizeof.

Remember from C that, despite its appearance, sizeof is not a function, but an operator. It is not a function in the sense that it does not evaluate its argument; it only looks at the type of its argument. To illustrate the nature of this operator, all of the following invocations of sizeof[18]

int x = 4;
int y = 5;
sizeof(x);
sizeof(x + y);
sizeof x;
sizeof( int );
// sizeof int;      would not compile: a type name must be parenthesized

return on the author's machine the same value, which is 4 for the 4 bytes that it takes to store an int.[19] So if we say

string s1 = "hello";
string s2 = "hello there";

and then invoke the sizeof operator by

sizeof( s1 ); // returns 4 for g++
sizeof( s2 ); // returns 4 for g++

we'd get exactly the same answer in both cases, the number 4 (or 8 for some compilers). Compare this with the following case of applying sizeof to the string literals directly:

sizeof( "hello" ); // returns 6
sizeof( "hello there" ); // returns 12

We get 6 for the string literal "hello" because it is NOT stored as a string object and because its internal representation is a null-terminated array of characters. Similarly for the string literal "hello there".

The constant value of 4 returned by sizeof( string ) is easy to understand if we think of the string class as having been provided with a single non-static data member of type char* for holding a character pointer to a null-terminated array of characters.

class string {
char* ptr;
// static data members if needed
public:
// string functions
};

Then the memory occupied by a string object would be what's needed by its sole nonstatic data member shown—4 bytes for the pointer. On the other hand, if a compiler returned 8 bytes for sizeof ( string ), that's because the string class used by that compiler comes with an additional data member—of possibly an unsigned integer type—for holding the size of the string pointed to by the first data member. In this case, it would not be absolutely necessary for the char* string to be null terminated since the second data member would tell us directly how many characters belonged to the string.

Note that if we applied the sizeof operator to any pointer type, we'd get 4 for the four bytes to hold a memory address. For example,

sizeof ( string* ) -> 4
sizeof ( int* ) -> 4
sizeof ( char* ) -> 4

We have brought the above statements together in the following program:

--------------------------------------------------------------------------------
//StringSizeOf.cc
#include <iostream>
#include <string>
using namespace std;

int main()
{
    cout << sizeof( "hello" ) << endl;                   // 6
    cout << sizeof( "hello there" ) << endl;             // 12
    string str1 = "hello";
    string str2 = "hello there";

    cout << sizeof( str1 ) << endl;                      // 4
    cout << sizeof( str2 ) << endl;                      // 4

    // const added: a string literal should not be modified through a pointer
    const char* s1 = "hello";
    const char* s2 = "hello there";

    cout << sizeof( s1 ) << endl;                        // 4
    cout << sizeof( s2 ) << endl;                        // 4

    char c_arr[] = "how are you?";
    cout << sizeof( c_arr ) << endl;                     // 13

    return 0;
}
--------------------------------------------------------------------------------

Before ending this subsection, we should remind the reader that sizeof can sometimes show seemingly unexpected behavior. Consider the role of sizeof in the following program that attempts to find the size of the array in a called function by invoking sizeof:

--------------------------------------------------------------------------------
//ArraySizeOf.cc

#include <iostream>
using namespace std;

int sum( int [], int );

int main()
{
    int data[100] = {2, 3};
    int m = sizeof( data ) / sizeof( data[0] );          // (A)
    cout << sum( data, 100 ) << endl;
    return 0;
}

int sum( int a[], int arr_size ) {
    //the following value of n is not very useful
    int n = sizeof( a ) / sizeof( a[0] );                // (B)

    int result = 0;
    int* p = a;
    while ( p - a < arr_size ) result += *p++;
    return result;
}
--------------------------------------------------------------------------------

While at (A) the number m will be set to 100, at (B) the number n will be set to 1. The reason for this is that when an array name is a function parameter, it is treated strictly as a pointer. So the numerator on the right-hand side at (B) is synonymous with sizeof( int* ) which yields 4.

4.3.9 Some Other String Functions
The string library offers a function swap that can be used to swap the actual strings stored inside two string objects. In the following code fragment, after the execution of the third statement, the object str1 will store the string "lemonade", whereas the object str2 will store the string "lemon".

string str1 = "lemon";
string str2 = "lemonade";
str1.swap( str2 );

A different effect is achieved by the assign function. After the execution of the third statement below, both the objects str1 and str2 will contain the string "lemonade":

string str1 = "lemon";
string str2 = "lemonade";
str1.assign( str2 );

[5]Actually, the built-in string type in C++ is the template class basic_string. The C++ string class is a typedef alias for basic_string<char>, which is the basic_string template with char as its template parameter. The concept of a template class, introduced briefly in Chapter 3, is presented more fully in Chapter 13.

[6]Copy constructors are discussed in Chapter 11.

[7]Depending on how the string type is implemented, a C++ string may not include a null terminator at the end. In that case, an empty C++ string can be truly empty, as opposed to a "" string in C which consists of the null terminator.

[8]We could also have said: "This declaration creates an object of type string." For nonprimitive types, the characterizations type and class are used interchangeably in object-oriented programming.

[9]On the basis of the notation explained in Section 3.16.1 of Chapter 3, the syntax string::size_type refers to the inner type size_type defined for the string class.

[10]This is actually true of all string member functions. They work the same for both string and const char* arguments.

[11]In Chapter 5, we discuss the notion of stable sorting for class type objects and point out that qsort may not be the best sorting function to invoke in some cases.

[12]We can think of size_t as an unsigned integer.

[13]For the example array shown, each element of the array is a string object that is initialized by the corresponding string literal on the right hand side of the declaration for wordList. So we can use sizeof(string) for the third argument of qsort.

[14]Typical C syntax for the same function would be

int compareStrings( const void* arg1, const void* arg2 ) {
return (*(const string*) arg1).compare(*(const string*) arg2);
}

The difference between the C way of writing this function and the C++ syntax shown in line (E) is with regard to casting. What is done by the cast operator (const string*) in the C version here is accomplished by static_cast<const string*>() in the C++ definition in line (E). The static_cast and other C++ cast operators are presented in Chapters 6 and 16.

[15]Obviously, there has to be sufficient free memory available to the memory allocator used by the string class for this to be the case. If the memory needed is not available, the memory allocator will throw an exception.

[16]As we will explain in Chapter 12, for class type operands the compiler translates the expression

str1 + str2;
into
str1.operator+( str2 );

where the function operator+ contains the overload definition for the ‘+’ operator. That makes str1 the operand on which the function operator+ is invoked and str2 the argument operand. We may loosely refer to str1 as the invoking operand.

[17]Note the use of the assert function in this program. The test stated in the argument to this function must evaluate to true for the thread of execution to proceed beyond the point of this function call.

[18]Although the parentheses are not really needed in sizeof(x), in the sense that we could also have said sizeof x, because of operator precedence the compiler would understand sizeof (x + y) and sizeof x + y differently. Since the operator sizeof is a unary operator and since unary operators have higher precedence than binary operators, sizeof x + y; would be interpreted as sizeof (x) + y.

[19]To be precise, the sizeof operator in C++ returns the size of a type-name in terms of the size of a char. However, in most implementations, the size of a char is 1 for the 1 byte that it takes to hold a character in C++. Also as a point of difference between C++ and C, in C sizeof ( 'x' ) returns 4, whereas sizeof ( char ) returns 1. On the other hand, in C++, both sizeof ( 'x' ) and sizeof ( char ) return 1. The reason for the discrepancy between the two sizeof values for C is that a char argument to the operator is read as an int, as is often the case with char arguments in C. Despite this discrepancy in C, the following idiom in C

int size;
char arr[3] = {'*', 'y', 'z'};
size = sizeof ( arr ) / sizeof( arr[0] );

does exactly what the programmer wants it to do (the value of size is set to 3, the number of elements in the array) because the sizeof operator looks only at the type of arr[0] in the denominator. In other words, even though sizeof( 'x' ) returns 4 in C, sizeof( arr[0] ) will always return 1.
STRINGS IN JAVA
Java provides two classes, String and StringBuffer, for representing strings and for string processing. An object of type String cannot be modified after it is created.[20] It can be deleted by the garbage collector if there are no variables holding references to it, but it cannot be changed. For this reason, string objects of type String are called immutable. If you want to carry out an in-place modification of a string, the string needs to be an object of type StringBuffer.

As in C++, a string literal in Java is double-quoted. String literals in Java are objects of type String. As in C++, two string literals consisting of the same sequence of characters are one and the same object in the memory. That is, there is only one String object stored for each string literal even when that literal is mentioned at different places in a program, in different classes, or even in different packages of a Java program.

That a string literal consisting of a given sequence of characters is stored only once in the memory is made clear by the following program. Lines (A) and (B) of the program define two different String variables, strX and strY, in two different classes; both strX and strY are initialized with string literals consisting of the same sequence of characters. Nonetheless, a comparison of the two with the ‘==’ operator in line (D) tests true. Recall, the operator ‘==’ returns true only when its two operands are one and the same object in the memory.

Line (C) of the program illustrates the following string-valued constant expression on the right-hand-side of the assignment operator

"hell" + "o"

In such cases, the Java compiler creates a new string literal by joining the two string literals "hell" and "o". Being still a literal, the resulting literal is not stored separately in the memory if it was previously seen by the compiler. So in our case, the variable strZ in line (C) will point to the same location in the memory as the variables strX in line (A) and strY in line (B). This is borne out by the fact that the ‘==’ comparison in line (E) tests true.

While joining two string literals together results in a constant expression that is resolved at compile time, the assignment to the variable s3 in the following three instructions can only be made at run time. Therefore, the string hello constructed on the right-hand side in the third statement below will have a separate existence as a String object in the memory even if a string literal consisting of the same sequence of characters was created previously by the program. That should explain why the comparison in line (F) of the program tests false.

String s1 = "hel";
String s2 = "lo";
String s3 = s1 + s2;

However, Java provides a mechanism through the method intern() defined for the String class that allows a string created at run-time to be added to the pool of string literals (if it was not in the pool already). If the above three instructions are replaced with

String s1 = "hel";
String s2 = "lo";
String s3 = (s1 + s2).intern();

Java will compare the character sequence in the string object returned by s1 + s2 with the string literals already in store. If a match is found, intern() returns a reference to that literal. If a match is not found, then the string returned by s1 + s2 is added to the pool of string literals and a reference to the new literal is returned. That should explain why the ‘==’ comparison in line (G) of the program tests true. The reference returned by (s1 + s2).intern() will point to the same string literal as the data member strX of class X.

Here is the program:

--------------------------------------------------------------------------------
//StringLiteralUniqueness.java

class X { public static String strX = "hello"; } //(A)

class Y { public static String strY = "hello"; } //(B)

class Z { public static String strZ = "hell" + "o"; } //(C)

class Test {
public static void main( String[] args ) {

// output: true
System.out.println( X.strX == Y.strY ); //(D)

// output: true
System.out.println( X.strX == Z.strZ ); //(E)

String s1 = "hel";
String s2 = "lo";

// output: false
System.out.println( X.strX == (s1 + s2 ) ); //(F)

// output: true
System.out.println( X.strX == (s1 + s2).intern() ); //(G)
}
}
--------------------------------------------------------------------------------

4.4.1 Constructing String and StringBuffer Objects
String objects are commonly constructed using the following syntax

String str = "hello there";

or

String str = new String( "hello there" );

For constructing a StringBuffer object, the first declaration does not work because of type incompatibilities caused by the fact that the right hand side would be a String object and the left hand side a StringBuffer object.

StringBuffer strbuf = "hello there"; //WRONG

StringBuffer objects are commonly constructed using the following syntax

StringBuffer strbuf = new StringBuffer( "hello there" );

An empty String object, meaning a String object with no characters stored in it, can be created by

String s0 = "";

or by

String s0 = new String();

To create an empty StringBuffer object, use either

StringBuffer sb0 = new StringBuffer( "" );

or

StringBuffer sb0 = new StringBuffer();

When a String object is created with a nonempty initialization, the amount of memory allocated to the object for the storage of the characters equals exactly what's needed for the characters. On the other hand, when a new StringBuffer object is created, the amount of memory allocated to the object for actual representation of the string is often 16 characters larger than what is needed. This is to reduce the memory allocation overhead for modifications to a string that add a small number of characters to the string at a time. The number of characters that a StringBuffer object can accommodate without additional memory allocation is called its capacity. The number of characters stored in a String or a StringBuffer object can be ascertained by invoking the method length() and the capacity of a StringBuffer object by invoking the method capacity():

String str = "hello there";
System.out.println( str.length() ); // 11
StringBuffer strbuf = new StringBuffer( "hello there" );
System.out.println( strbuf.length() ); // 11
System.out.println( strbuf.capacity() ); // 27

One is, of course, not limited to the capacity that comes with the default initialization of a StringBuffer object, usually 16 over what is needed for the initialization string. If we invoke the StringBuffer constructor with an int argument, it constructs a string buffer with no characters in it, but with a capacity as specified by the argument. So the following invocation

StringBuffer strbuf = new StringBuffer( 1024 );

would create a string buffer of capacity 1024. Characters may then be inserted into the buffer by using, say, the append function that we will discuss later in this section.
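
The effect of the int-argument constructor can be checked with a short sketch such as the following. The class name CapacitySketch and the helper fill are made up for illustration; only the StringBuffer calls come from the Java library.

```java
//CapacitySketch.java  (illustrative sketch; names are hypothetical)

class CapacitySketch {

    // Builds a buffer with the given capacity, appends the text,
    // and returns { length, capacity } of the resulting buffer.
    static int[] fill( int initialCapacity, String text ) {
        StringBuffer strbuf = new StringBuffer( initialCapacity );
        strbuf.append( text );
        return new int[] { strbuf.length(), strbuf.capacity() };
    }

    public static void main( String[] args ) {
        StringBuffer empty = new StringBuffer( 1024 );
        System.out.println( empty.length() );       // 0
        System.out.println( empty.capacity() );     // 1024

        int[] r = fill( 1024, "hello" );
        System.out.println( r[0] );                 // 5
        System.out.println( r[1] );                 // 1024
    }
}
```

Since five characters fit easily inside the preallocated space, the capacity is unchanged by the append.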

While we have shown all the different possible constructor invocations for the StringBuffer class, the String class allows for many more, all with different types of arguments. In the rest of this section, we will show a few more of the String constructors. One of the String constructors takes a char array argument to construct a String object from an array of characters, as in the following example:[21]

char[] charArr = { 'h', 'e', 'l', 'l', 'o' };
String str4 = new String( charArr );

A String object can also be constructed from an array of bytes, as in

byte[] byteArr = { 'h', 'e', 'l', 'l', 'o' };
String str5 = new String( byteArr ); // "hello"

Each byte of the byte array byteArr will be set to the ASCII encoding of the corresponding character in the initializer. When constructing a String from the byte array, the Java Virtual Machine translates the bytes into characters using the platform's default encoding, which in most cases would be the ASCII encoding. Subsequently, the String object is constructed from the default encodings for the characters.

If the default encoding will not do the job for constructing a String from a byte array, it is possible to specify the encoding to be used.[22] In the following example, the byte array is specified so that each pair of bytes starting from the beginning corresponds to a Unicode representation of the character shown by the second byte of the pair. For example, the 16-bit pattern obtained by joining together one-byte ASCII based representations of '\0' and 'h' is the Unicode in its big-endian representation for the character 'h'. As a result, the string formed by the constructor is again "hello".

byte[] byteArr2 = { '\0', 'h', '\0', 'e', '\0', 'l',
                    '\0', 'l', '\0', 'o' };
String str6 = new String( byteArr2, "UTF-16BE" ); // "hello"

If we wanted to specify the byte order in the little-endian representation, we'd need to use the "UTF-16LE" encoding, as shown below:

byte[] byteArr3 = { 'h', '\0', 'e', '\0', 'l', '\0',
'l', '\0', 'o', '\0' };
String str7 = new String( byteArr3, "UTF-16LE" ); // "hello"

The last two invocations of the String constructor throw the UnsupportedEncodingException if the specified encoding is not supported by a JVM. The topic of exceptions and how to deal with them will be discussed in Chapter 10.
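
Until exceptions are taken up properly, the following sketch shows one way the checked UnsupportedEncodingException might be handled around these constructor calls. The class DecodeSketch, the helper decode, and the choice to return null on failure are all assumptions made for illustration; the encoding name "NO-SUCH-ENCODING" is deliberately bogus.

```java
//DecodeSketch.java  (illustrative sketch; names are hypothetical)

import java.io.UnsupportedEncodingException;

class DecodeSketch {

    // Decodes the bytes using the named encoding. Returns null when
    // the JVM does not support that encoding.
    static String decode( byte[] bytes, String encodingName ) {
        try {
            return new String( bytes, encodingName );
        } catch ( UnsupportedEncodingException e ) {
            return null;
        }
    }

    public static void main( String[] args ) {
        byte[] bigEndian    = { 0, 'h', 0, 'e', 0, 'l', 0, 'l', 0, 'o' };
        byte[] littleEndian = { 'h', 0, 'e', 0, 'l', 0, 'l', 0, 'o', 0 };

        System.out.println( decode( bigEndian, "UTF-16BE" ) );          // hello
        System.out.println( decode( littleEndian, "UTF-16LE" ) );       // hello
        System.out.println( decode( bigEndian, "NO-SUCH-ENCODING" ) );  // null
    }
}
```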

4.4.2 Accessing Individual Characters
The individual characters of a Java string can be accessed by invoking the charAt method with an int argument:

String str = "hello";
char ch = str.charAt( 1 ); // 'e'

StringBuffer strbuf = new StringBuffer( "hello" );
ch = strbuf.charAt( 1 ); // 'e'

Since the strings created through the StringBuffer class are mutable, it is possible to write into each character position in such a string, as the following example illustrates:

StringBuffer strbuf = new StringBuffer( "hello" );
strbuf.setCharAt( 0, 'j' );

which would convert "hello" into "jello".

Indexing for accessing the individual characters of a string is always range checked in Java. If you try to access an index that is outside the valid limits for a string, the JVM will throw an exception of type StringIndexOutOfBoundsException:

String str = "hello";
char ch = str.charAt( 100 ); // ERROR

StringBuffer strbuf = new StringBuffer( "hello" );
ch = strbuf.charAt( 100 ); // ERROR

For a StringBuffer string, it is a range violation if you try to access an index that is outside the length of the string even if the index is inside the capacity.

StringBuffer strbuf = new StringBuffer( "hello" );
System.out.println( strbuf.capacity() ); // 21
ch = strbuf.charAt( 20 ); // ERROR
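
A program that cannot rule out a bad index can trap the range violation, as in the sketch below. The class RangeCheckSketch, the helper charOrDefault, and the use of '?' as a fallback are hypothetical. The catch clause names IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException, so it covers either type.

```java
//RangeCheckSketch.java  (illustrative sketch; names are hypothetical)

class RangeCheckSketch {

    // Returns the character at the given index, or '?' when the index
    // falls outside 0 .. length()-1, regardless of the buffer's capacity.
    static char charOrDefault( StringBuffer strbuf, int index ) {
        try {
            return strbuf.charAt( index );
        } catch ( IndexOutOfBoundsException e ) {
            return '?';
        }
    }

    public static void main( String[] args ) {
        StringBuffer strbuf = new StringBuffer( "hello" );
        System.out.println( charOrDefault( strbuf, 1 ) );     // e
        System.out.println( charOrDefault( strbuf, 20 ) );    // ?
    }
}
```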

For a StringBuffer string, you can delete a character by invoking deleteCharAt:

StringBuffer strbuf = new StringBuffer( "hello" );
strbuf.deleteCharAt( 0 );
System.out.println( strbuf.length() ); // 4, was 5
System.out.println( strbuf.capacity() ); // 21, was 21

By deleting a character, the deleteCharAt method shrinks the length of the string by one, but note that the capacity of the string buffer remains unaltered.

4.4.3 String Comparison
Java strings are compared using the equals and compareTo methods, and the ‘==’ operator. The method equals returns a true/false answer, whereas the method compareTo returns an integer that tells us whether the String on which the method is invoked is less than, equal to, or greater than the argument String. For example, in the following program fragment

String str1 = "stint";
String str2 = "stink";
System.out.println( str1.equals( str2 ) ); // false

String str3 = "stint";
String str4 = "stink";
System.out.println( str3.compareTo( str4 ) > 0 ); // true

the first print statement outputs false because the strings pointed to by str1 and str2 are composed of different character sequences. The second print statement outputs true because the string str3 is indeed "greater" than the string str4. We'll have more to say on the compareTo method later in this subsection when we talk about sorting arrays of strings.

With regard to the ‘==’ operator, as we have already mentioned, the operator can only be used for testing whether two different String variables are pointing to the same String object. Suppose we have the following statements in a program

String s1 = new String("Hello");
String s2 = s1;

then s1 == s2 would evaluate to true because both s1 and s2 will be holding references to the same string object, meaning an object that resides at the same place in the memory. On the other hand, if we say

String s1 = new String("hello");
String s2 = new String("hello");

then s1 == s2 will evaluate to false because we now have two distinct String objects at two different places in the memory even though the contents of both objects are identical in value, since they are both formed from the same string literal.

As was mentioned earlier in Chapter 3, both equals and ‘==’ are defined for the Object class, the root class in the Java hierarchy of classes, and the system-supplied definitions for both are the same for Object: comparison on the basis of equality of reference. So, as defined for Object, both these predicates tell us whether the two references point to exactly the same object in the memory. However, while equals can be overridden, ‘==’ cannot because it is an operator. The method equals has already been overridden for us in the String class. So it carries out its comparisons on the basis of equality of content for String type strings. But since, in general, operators cannot be overridden in Java, the operator ‘==’ retains its meaning as defined in the Object class.

A word of caution about comparing objects of type StringBuffer: While the system provides us with an overridden definition for the equals method for the String class, it does not do so for the StringBuffer class. In other words, while for the String class you can use the equals method to test for the equality of content, you cannot do so for the StringBuffer class, as borne out by the following code:

String s1 = new String( "Hello" );
String s2 = new String( "Hello" );
System.out.println( ( s1.equals( s2 ) ) + "" ); // true

StringBuffer s3 = new StringBuffer( "Hello" );
StringBuffer s4 = new StringBuffer( "Hello" );
System.out.println( ( s3.equals( s4 )) + "" ); // false

If you must compare two StringBuffer objects for equality of content, you can do so by first constructing String objects out of them via the toString method, as in

StringBuffer sb = new StringBuffer( "Hello" );
if ( sb.toString().equals( "jello" ) )
....
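
The same idea can be wrapped in a small helper when both operands are StringBuffer objects. In this sketch the class BufferCompareSketch and the method sameContent are hypothetical names chosen for illustration.

```java
//BufferCompareSketch.java  (illustrative sketch; names are hypothetical)

class BufferCompareSketch {

    // Compares two buffers by content by converting both to String first.
    static boolean sameContent( StringBuffer a, StringBuffer b ) {
        return a.toString().equals( b.toString() );
    }

    public static void main( String[] args ) {
        StringBuffer s3 = new StringBuffer( "Hello" );
        StringBuffer s4 = new StringBuffer( "Hello" );

        System.out.println( s3.equals( s4 ) );          // false, reference comparison
        System.out.println( sameContent( s3, s4 ) );    // true, content comparison
    }
}
```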

We will now revisit the compareTo method for the String class. The String class implements the Comparable interface by providing an implementation for the compareTo method. The compareTo method as provided for the String class compares two strings lexicographically using the Unicode values associated with the characters in the string.[23] Because the String class comes equipped with a compareTo method, we say that String objects possess a natural ordering, which implies that we are allowed to sort an array of Strings by invoking, say, java.util.Arrays.sort without having to explicitly supply a comparison function to the sort method. This is in accord with our Chapter 3 discussion on comparing objects in Java. The following example illustrates invoking java.util.Arrays.sort for sorting an array of strings.

If we do not want the array of strings to be sorted according to the compareTo comparison function, we can invoke a two-argument version of java.util.Arrays.sort and supply for its second argument an object of type Comparator that has an implementation for a method called compare that tells the sort function how to carry out comparisons.[24] If all you want to do is to carry out a case-insensitive comparison, you can use the Comparator object CASE_INSENSITIVE_ORDER that comes as a static data member of the String class. In the code example shown below, the second sort is a case-insensitive sort. The java.util.Arrays.sort is based on the merge-sort algorithm.

--------------------------------------------------------------------------------
//StringSort.java

import java.util.*;

class StringSort {
    public static void main( String[] args ) {
        String[] strArr = { "apples", "bananas", "Apricots", "Berries",
                            "oranges", "Oranges", "APPLES", "peaches" };
        String[] strArr2 = strArr;
        System.out.println("Case sensitive sort with Arrays.sort:" );
        Arrays.sort( strArr );
        for (int i=0; i < strArr.length; i++)
            System.out.println( strArr[i] );
        System.out.println("\nCase insensitive sort:" );
        Arrays.sort( strArr2, String.CASE_INSENSITIVE_ORDER );
        for (int i=0; i < strArr2.length; i++)
            System.out.println( strArr2[i] );
    }
}
--------------------------------------------------------------------------------

The output of this program is

Case sensitive sort with Arrays.sort:
APPLES
Apricots
Berries
Oranges
apples
bananas
oranges
peaches
Case insensitive sort:
APPLES
apples
Apricots
bananas
Berries
Oranges
oranges
peaches
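
For orderings other than case-insensitive ones, a programmer-supplied Comparator can be passed as the second argument to java.util.Arrays.sort. The sketch below is only one possible ordering, chosen for illustration: shorter strings first, with ties broken by compareTo. The class LengthOrderSketch, the nested class ByLength, and the helper sortedByLength are hypothetical names.

```java
//LengthOrderSketch.java  (illustrative sketch; names are hypothetical)

import java.util.Arrays;
import java.util.Comparator;

class LengthOrderSketch {

    // Orders strings by length; equal-length strings fall back
    // on the natural ordering supplied by compareTo.
    static class ByLength implements Comparator<String> {
        public int compare( String a, String b ) {
            if ( a.length() != b.length() )
                return a.length() < b.length() ? -1 : 1;
            return a.compareTo( b );
        }
    }

    // Sorts a copy so that the caller's array is left untouched.
    static String[] sortedByLength( String[] words ) {
        String[] copy = words.clone();
        Arrays.sort( copy, new ByLength() );
        return copy;
    }

    public static void main( String[] args ) {
        String[] words = { "peaches", "Oranges", "apples", "Berries", "fig" };
        String[] sorted = sortedByLength( words );
        for ( int i = 0; i < sorted.length; i++ )
            System.out.println( sorted[i] );
        // fig, apples, Berries, Oranges, peaches (one per line)
    }
}
```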

4.4.4 Joining Strings Together
In general, Java does not overload its operators. But there is one exception to that general rule, the operator ‘+’ for just the String type (and not even for the StringBuffer type). The overload definition for this operator will cause the object str3 in the following code fragment to store the string "hello there".

String str1 = "hello";
String str2 = " there";
String str3 = str1 + str2;

Strings of type StringBuffer can be joined by invoking the append method, as in

StringBuffer strbuf = new StringBuffer( "hello" );
StringBuffer strbuf2 = new StringBuffer( " there" );
strbuf.append( strbuf2 );
System.out.println( strbuf ); // "hello there"
String str = "!";
strbuf.append( str );
System.out.println( strbuf ); // "hello there!"

The capacity of a string buffer is automatically increased if it runs out of space as additional characters are added to the string already there.

In addition to invoking the append method with either the String or the StringBuffer arguments, you can also invoke it with some of the other types that Java supports, as illustrated by:

StringBuffer strbuf = new StringBuffer( "hello" );
int x = 123;
strbuf.append( x );
System.out.println( strbuf ); // "hello123"
double d = 9.87;
strbuf.append( d );
System.out.println( strbuf ); // "hello1239.87"

As you can see, append first converts its argument to a string representation and then appends the new string to the one already in the buffer. This permits append to be invoked for any object, even a programmer-defined object, as long as it is possible to convert the object into its string representation. As we saw in Chapter 3, when a class is supplied with an override definition for the toString method, the system can automatically create string representations of the objects made from the class.
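
As a small illustration of appending a programmer-defined object, consider the sketch below. The class Point and its toString format are invented for this example; append simply uses whatever string representation toString supplies.

```java
//AppendObjectSketch.java  (illustrative sketch; names are hypothetical)

class AppendObjectSketch {

    // A made-up programmer-defined class with an override for toString.
    static class Point {
        private final int x, y;
        Point( int x, int y ) { this.x = x; this.y = y; }
        public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Appends the point to a buffer; append( Object ) relies on
    // the string representation returned by the object's toString.
    static String describe( Point p ) {
        StringBuffer strbuf = new StringBuffer( "point: " );
        strbuf.append( p );
        return strbuf.toString();
    }

    public static void main( String[] args ) {
        System.out.println( describe( new Point( 3, 4 ) ) );   // point: (3, 4)
    }
}
```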

Going back to the joining of String type strings, an immutable string class is inefficient for serial concatenation of substrings, as in

String s = "hello" + " there" + " how" + " are" + " you";

The string concatenations on the right are equivalent to

String s = ((("hello" + " there") + " how") + " are") + " you";

If the Java compiler had available to it only the immutable String class for string processing, each parenthesized concatenation on the right would demand that a new String object be created. Therefore, this example would entail creation of five String objects, of which only one would be used. And then there would be further work entailed in the garbage collection of the eventually unused String objects. Fortunately, the Java compiler does not really use the String class for the operations on the right. Instead, it uses the mutable StringBuffer class and the append method of that class to carry out the concatenations shown above. The final result is then converted back to a String.
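
The same strategy can be written out by hand when pieces are joined in a loop. The sketch below is not the compiler's actual translation, which varies with the compiler version; it only shows the append-then-convert pattern described above. The class ConcatSketch and the method join are hypothetical names.

```java
//ConcatSketch.java  (illustrative sketch; names are hypothetical)

class ConcatSketch {

    // Joins the pieces using one mutable buffer, then converts the
    // final result to a String exactly once.
    static String join( String[] pieces ) {
        StringBuffer strbuf = new StringBuffer();
        for ( int i = 0; i < pieces.length; i++ )
            strbuf.append( pieces[i] );
        return strbuf.toString();
    }

    public static void main( String[] args ) {
        String[] pieces = { "hello", " there", " how", " are", " you" };
        System.out.println( join( pieces ) );    // hello there how are you
    }
}
```

Only the String returned by toString is created as a result of the joining; no intermediate String objects are left for the garbage collector.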

4.4.5 Searching and Replacing
One can search for individual characters and substrings in a String type string by invoking the indexOf method:

String str = "hello there";
int n = str.indexOf( "the" ); // 6

By supplying indexOf with a second int argument, it is also possible to specify the index of the starting position for the search. This can be used to search for all occurrences of a character or a substring, as the following code fragment illustrates:

String mystr = new String( "one hello is like any other hello" );
String search = "hello";
int pos = 0;
while ( true ) {
    pos = mystr.indexOf( search, pos );
    if ( pos == -1 ) break;
    System.out.println( "hello found at: " + pos ); // 4 and 28
    pos++;
}

To parallel our C++ program StringFind.cc, we show next a program that searches for all occurrences of a substring and, when successful, replaces the substring by another string. Since a String is immutable, we'll have to use a StringBuffer for representing the original string. But since there are no search functions defined for the StringBuffer class, we have to somehow combine the mutability of a StringBuffer with the searching capability of a String. The following program illustrates this to convert "one hello is like any other hello" into "one armadillo is like any other armadillo".

--------------------------------------------------------------------------------
//StringFind.java
class StringFind {
    public static void main( String[] args ) {
        StringBuffer strbuf = new StringBuffer(
                          "one hello is like any other hello" );
        String searchString = "hello";
        String replacementString = "armadillo";
        int pos = 0;
        while ( ( pos = (new String(strbuf)).indexOf(
                                 searchString, pos ) ) != -1 ) {
            strbuf.replace( pos, pos +
                     searchString.length(), replacementString );
            pos++;
        }
        System.out.println( strbuf );
    }
}
--------------------------------------------------------------------------------

There is also the method lastIndexOf that searches for the rightmost occurrence of a character or a substring:

String str = "hello there";
int n = str.lastIndexOf( "he" ); // 7

The methods endsWith and startsWith can be invoked to check for suffixes and prefixes in strings:

String str = "hello there";
if (str.startsWith( "he" ) ) // true
....
if ( str.endsWith( "re" ) ) // true
....
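Like indexOf, the lastIndexOf method also accepts a second int argument, in this case the index from which the backward search should start. The following sketch uses it to visit all occurrences of a substring from right to left; the class name BackwardSearch and the method positions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BackwardSearch {

    // Returns the positions of all occurrences of search in str,
    // rightmost occurrence first.
    static List<Integer> positions( String str, String search ) {
        List<Integer> found = new ArrayList<Integer>();
        int pos = str.lastIndexOf( search );
        while ( pos != -1 ) {
            found.add( pos );
            // Continue the search strictly to the left of this match.
            pos = str.lastIndexOf( search, pos - 1 );
        }
        return found;
    }

    public static void main( String[] args ) {
        String mystr = "one hello is like any other hello";
        System.out.println( positions( mystr, "hello" ) );   // [28, 4]
    }
}
```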

4.4.6 Erasing and Inserting Substrings
The following example shows how we can search for a substring, erase it, and then insert in its place another substring. What erase did for C++ is now done by delete with two int arguments for the beginning index and the ending index of the character sequence to be deleted. Insertion of a substring is carried out with the insert method whose first argument, of type int, specifies the index where the new substring is to be spliced in.

--------------------------------------------------------------------------------
// StringInsert.java
class StringInsert {
    public static void main( String[] args ) {
        int pos = 0;
        StringBuffer quote = new StringBuffer(
                  "Some cause happiness wherever they go,"
                  + " others whenever they go - Oscar Wilde" );
        String search = "happiness";
        if ( ( pos = ( new String(quote) ).indexOf( search ) ) != -1 ) {
            quote.delete( pos, pos + search.length() );
            quote.insert( pos, "excitement" );
        }
        System.out.println( quote );
    }
}
--------------------------------------------------------------------------------
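A delete followed by an insert at the same position has the same effect as a single call to the replace method of StringBuffer. The sketch below places the two side by side; the class name SpliceDemo and its method names are hypothetical.

```java
class SpliceDemo {

    // Erase the first occurrence of search, then splice in repl.
    static String viaDeleteInsert( String text, String search, String repl ) {
        StringBuffer strbuf = new StringBuffer( text );
        int pos = text.indexOf( search );
        if ( pos != -1 ) {
            strbuf.delete( pos, pos + search.length() );
            strbuf.insert( pos, repl );
        }
        return strbuf.toString();
    }

    // The same edit carried out in one step with replace.
    static String viaReplace( String text, String search, String repl ) {
        StringBuffer strbuf = new StringBuffer( text );
        int pos = text.indexOf( search );
        if ( pos != -1 )
            strbuf.replace( pos, pos + search.length(), repl );
        return strbuf.toString();
    }

    public static void main( String[] args ) {
        String text = "Some cause happiness wherever they go";
        System.out.println( viaDeleteInsert( text, "happiness", "excitement" ) );
        System.out.println( viaReplace( text, "happiness", "excitement" ) );
    }
}
```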

4.4.7 Extracting Substrings
Both String and StringBuffer support substring extraction by invoking the substring method with either one int argument or two int arguments. When only one argument is supplied to substring, that is the beginning index for the substring to be extracted. The substring extracted will include all of the characters from the beginning index till the end. When two arguments are supplied, the second argument stands for the ending index of the desired substring; the character at the ending index itself is not included. In all cases, for both String and StringBuffer, the returned object is a new String. For illustration:

String str = "0123456789abc";
System.out.println( str.substring( 5 ) ); // "56789abc"
System.out.println( str.substring( 5, 9 ) ); // "5678"
StringBuffer stb = new StringBuffer( "0123456789abc" );
System.out.println( stb.substring( 5 ) ); // "56789abc"
System.out.println( stb.substring( 5, 9 ) ); // "5678"
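A convenient way to remember how the two-argument form behaves is that the character at the beginning index is included and the character at the ending index is not, so the length of the result equals the difference of the two indices. A small sketch, with the hypothetical class name SubstringBounds:

```java
class SubstringBounds {

    // Thin wrapper around String.substring, for illustration only.
    static String slice( String s, int begin, int end ) {
        return s.substring( begin, end );
    }

    public static void main( String[] args ) {
        String str = "0123456789abc";
        System.out.println( slice( str, 5, 9 ) );            // "5678"
        System.out.println( slice( str, 5, 10 ) );           // "56789"
        System.out.println( slice( str, 5, 9 ).length() );   // 4, that is, 9 - 5
    }
}
```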

[20] Operations on String type objects sometimes have the appearance that you might be changing an object of type String, but that is never the case. In all such operations, a new String object is usually formed. For example, in the following statements the string literal "jello" in line (A) did not get changed into "hello" in line (B). The string literals "jello" and "hello" occupy separate places in the memory. Initially, s1 holds a reference to the former literal and then to the latter literal. After s1 changes its reference to "hello", the string literal "jello" will eventually be garbage collected if no other variable is holding a reference to it. The statement in line (C) results in the creation of a new String object whose reference is held by the variable s2.

String s1 = "jello"; //(A)
s1 = "hello"; //(B)
String s2 = s1 + " there"; //(C)

By the same token, in lines (D) and (E) below, the object s2 is a new string object, as opposed to being an extension of the object s1:

String s1 = "hello"; //(D)
String s2 = s1.concat( "there" ); //(E)

The invocation of the concat method in line (E) returns a new string that is a concatenation of the string on which the method is invoked and the argument string.
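That the original string is left untouched can be checked directly. In the sketch below, the class name ImmutableDemo and the method run are hypothetical; the String methods used are concat and toUpperCase.

```java
class ImmutableDemo {

    // Returns the original string along with two strings derived from it.
    static String[] run() {
        String s1 = "hello";
        String s2 = s1.concat( " there" );   // a new String object
        String s3 = s1.toUpperCase();        // another new String object
        return new String[] { s1, s2, s3 };
    }

    public static void main( String[] args ) {
        String[] result = run();
        System.out.println( result[0] );     // hello  (unchanged)
        System.out.println( result[1] );     // hello there
        System.out.println( result[2] );     // HELLO
    }
}
```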

[21] The reader may wish to read the rest of this subsection after we discuss the different primitive types in Java in Chapter 6.

[22] Java supports the following character encodings that we will discuss further in Chapter 6:

US-ASCII (this is the seven-bit ASCII)

ISO-8859-1 (ISO-Latin-1)

UTF-8 (8-bit Unicode Transformation Format)

UTF-16BE (16-bit Unicode in big-endian byte order)

UTF-16LE (16-bit Unicode in little-endian byte order)

UTF-16 (16-bit Unicode in which the byte order is specified by a mandatory initial byte-order mark)
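The practical difference between these encodings shows up in the number of bytes needed to store the same string. The sketch below assumes Java 7 or later for the StandardCharsets constants; the class name EncodingSizes and the method byteCount are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {

    // Number of bytes needed to store s in the given encoding.
    static int byteCount( String s, Charset cs ) {
        return s.getBytes( cs ).length;
    }

    public static void main( String[] args ) {
        String s = "h\u00e9llo";   // "hello" with an accented e
        System.out.println( byteCount( s, StandardCharsets.ISO_8859_1 ) ); // 5
        System.out.println( byteCount( s, StandardCharsets.UTF_8 ) );      // 6
        System.out.println( byteCount( s, StandardCharsets.UTF_16BE ) );   // 10
        System.out.println( byteCount( s, StandardCharsets.UTF_16LE ) );   // 10
        // Two extra bytes for the initial byte-order mark:
        System.out.println( byteCount( s, StandardCharsets.UTF_16 ) );     // 12
    }
}
```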

The Notion of a Class and Some Other Key Ideas

DEFINING A CLASS IN C++
Here is a simple example of a C++ class:

class User {
string name;
int age;
};

We have defined the class User with two variables, name and age. As mentioned before, these are usually referred to as members, data members, or fields. The former is of type string and the latter of type int. Note that a C++ class needs a semicolon at the end to terminate the definition.

The reader who is not already familiar with the system-supplied classes in C++ is probably perplexed by the type string. An introduction to the C++ string type will be provided in the next chapter. All we want to say here is that while in C a string of characters is represented by a null-terminated array of char's, as in

char str[] = "hello";

or by a pointer to type char, as in

char* str = "hello";

where the right hand side is a string literal, it is more common in C++ to use the string type. Of course, one also has the option of using the C-style strings in C++, but these don't come with the same protections as the string type, as we will see in the next chapter.

A class defines a new type. It may be system supplied, or can be programmer defined. With the User class defined as above, we are allowed to declare variables of type User. For example, we could declare the name u to be a variable of type User by

User u;

just as you might declare the name i to be a variable of type int by

int i;

or the name ch to be a variable of type char by

char ch;

One is immediately faced with the following question for a user-defined class: How does one initialize variables of such types? How would one initialize the variable u for a User whose name is "Zaphod" and whose age is 119? What we really want to do is to create a specific object of type User whose name member is set to Zaphod and whose age member is set to 119. An object is constructed by instantiating a class with the help of a class constructor. A constructor sets aside a part of the memory for the object that one wants to create and sets the various members of the object according to the arguments supplied to the constructor. If the arguments for some or all of the data members are not supplied, the constructor may use default values.

Here is a more useful definition of the User class with a constructor included:

class User {
string name;
int age;
public:
User( string str, int yy ) { name = str; age = yy; }
};

The use of the keyword public as shown places the constructor in the public section of the class. (The data members name and age are implicitly in the private section of the class.) This, as further explained in Section 3.11, allows the rest of your program to create objects of type User. Also note that there is no return type specified for the constructor. That's because a constructor is really not a function; its job is to appropriate the memory needed and build an object therein.

Now that we have available to us a constructor for the class User, we can create objects of this type by invoking one of the following forms in C++

User u( "Zaphod", 119 ); //(A)

User* p = new User( "Zaphod", 119 ); //(B)

The constructor invocation in line (A), in the form of a declaration/initialization for the variable u, allocates memory for the new object on the stack. This memory gets freed up automatically when the variable u goes out of scope. In the constructor invocation in line (B), the operator new allocates fresh memory for the object on the heap and then returns a pointer to this memory. This memory can only be freed up under program control by explicit invocation of the delete operator, as in line (C) below:

User* p = new User( "Zaphod", 119 );
delete p; //(C)

When invoked on a pointer to a class-type object, the delete operator invokes the class's destructor. We will have more to say about destructors in Section 3.8 of this chapter and in Chapter 11.

Since we would also want to see the objects we create, let's include in the class definition a print function:

class User {
    string name;
    int age;
public:
    User( string str, int yy ) { name = str; age = yy; }

    void print() {                                              //(D)
        cout << "name: " << name << " age: " << age << endl;    //(E)
    }
};

In Section 2.1 of the previous chapter, we briefly discussed the purpose served by the insertion operator ‘<<’ and by the output stream object cout used in line (E).

A member function, such as print() in line (D) above, is invoked on a specific object. The syntax of this invocation depends on whether the function is invoked directly on an object or on a pointer to an object, as shown in the following two examples:


User u( "Zaphod", 119 );
u.print(); // name: Zaphod age: 119 //(F)

User* p = new User( "Zaphod", 119 );
p->print(); // name: Zaphod age: 119 //(G)

where ‘.’ in line (F) and ‘->’ in line (G) are known as the member access operators.

Here is a working C++ program that uses the class definition provided above:

--------------------------------------------------------------------------------
//User1.cc

#include <iostream>
#include <string>
using namespace std;

class User {
    string name;
    int age;
public:
    User( string str, int yy ) { name = str; age = yy; }

    void print() {
        cout << "name: " << name << " age: " << age << endl;
    }
};

int main()
{
    User u( "Zaphod", 119 );
    u.print();
    return 0;
}
--------------------------------------------------------------------------------

C++ allows the implementation code for a class to reside outside the definition of the class itself. Shown below is the same program as above, but with the constructor and the print function definition outside the class definition:

--------------------------------------------------------------------------------
//User2.cc

#include <iostream>
#include <string>
using namespace std;

class User {
    string name;
    int age;
public:
    User( string str, int yy );                    //(H)
    void print();                                  //(I)
};

User::User( string str, int yy ) {                 //(J)
    name = str; age = yy;
}

void User::print() {                               //(K)
    cout << "name: " << name << " age: " << age << endl;
}

int main()
{
    User u( "Zaphod", 119 );
    u.print();
    return 0;
}
--------------------------------------------------------------------------------

So whereas the constructor is declared in line (H) inside the class definition, the implementation code for the constructor is provided in line (J) outside the class. Same for the print function; the declaration as a member function is in line (I) and the definition at line (K). Note the use of the scope operator '::' in lines (J) and (K) to help the compiler figure out that the definitions being provided are for the class User. This usage of the scope operator is as a class scope operator.[3] As we will explain in Chapter 11, one has no choice but to resort to the class scope operator in the manner shown and provide definitions external to a class when classes are interleaved in a C++ program. This is necessitated by the fact that, unlike what happens in Java compilation, a C++ compiler does not possess a look-ahead capability.

There is yet another variation on how a class is defined in C++. This variation concerns how the data members of a class instance are initialized by a constructor. The program shown below is identical to the program User1.cc, except that in line (L) below the constructor now uses the member initialization syntax for the initialization of the data members of an object. Strictly speaking, as will be explained in Chapter 7, it is necessary for only the const and the reference data members of a class to be initialized in this manner. But it is common to see code in which this sort of initialization is carried out for other types of data members also.


--------------------------------------------------------------------------------
//User3.cc

#include <iostream>
#include <string>
using namespace std;

class User {
    string name;
    int age;
public:
    User( string str, int yy ) : name( str ), age( yy ) {}      //(L)
    void print() {
        cout << "name: " << name << " age: " << age << endl;
    }
};

int main()
{
    User u( "Zaphod", 119 );
    u.print();
    return 0;
}
--------------------------------------------------------------------------------

DEFINING A CLASS IN JAVA
Paralleling our class definitions for C++ in the previous section, here is a simple example of a Java class:

class User {
private String name;
private int age;
}

As for C++, we define the class User with two data members, name and age. As mentioned before, these are usually referred to as members, data members, or fields. The string type in Java is named String—the data type we use for the member name above.[4] Note that, unlike C++, a class definition in Java does not need a terminating semicolon.

In C++, leaving the access control modifier unmentioned meant that the data members were in the private section of the class. To achieve the same effect in Java, the modifier private must be made explicit in the manner shown.

As was the case with the C++ example of the previous section, in order to create objects from a class, the class needs a constructor. Here is a more useful definition of the User class in Java with a constructor included:

class User {
private String name;
private int age;

public User(String str, int yy) {name = str; age = yy;} //(A)
}

The access control modifier public in line (A) serves the same purpose as it did for C++—it allows the rest of your program to create objects of type User. As before, no return type is specified for the constructor.

Now that we have available to us a constructor for the class User, we can create objects of this type by invoking the following form

User u = new User( "Zaphod", 119 );

The invocation on the right creates a new object of type User and then returns a reference to the newly created object. Subsequently, the assignment operation causes the variable u to hold this reference.
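Because a variable holds only a reference, assigning one variable to another copies the reference and not the object. The sketch below illustrates this with a hypothetical class AgedUser, a stand-in for User that adds two accessor methods (setAge and getAge) purely so that the effect can be observed.

```java
// Hypothetical stand-in for User, with accessors added for illustration.
class AgedUser {
    private String name;
    private int age;

    public AgedUser( String str, int yy ) { name = str; age = yy; }
    public void setAge( int yy ) { age = yy; }
    public int getAge() { return age; }
}

class ReferenceDemo {
    static int run() {
        AgedUser u = new AgedUser( "Zaphod", 119 );
        AgedUser v = u;        // v now refers to the same object as u
        v.setAge( 120 );       // the change is visible through u as well
        return u.getAge();
    }

    public static void main( String[] args ) {
        System.out.println( run() );      // 120
    }
}
```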

We will now include in the class definition a print function:

class User {
private String name;
private int age;

public User( String str, int yy ) { name = str; age = yy; }

public void print() { //(B)
System.out.println( "name: " + name + " age: " + age );
}
}

In the previous chapter, we briefly alluded to the Java method System.out.println for displaying information on your terminal.

As was the case with C++, a member function such as print() in line (B) above is invoked on an object, as in the following example:

User u = new User( "Zaphod", 119 );
u.print(); // name: Zaphod age: 119 //(C)

where ‘.’ in line (C) is the member access operator. This is the only operator available for member access in Java.

Here is a working Java program that does the same thing as the C++ programs of the previous section:

--------------------------------------------------------------------------------
//User.java

class User {                                                  //(D)
    private String name;
    private int age;

    public User( String str, int yy ) { name = str; age = yy; }
    public void print() {
        System.out.println( "name: " + name + " age: " + age );
    }
}

class Test {                                                  //(E)
    public static void main( String[] args ) {
        User u = new User( "Zaphod", 23 );
        u.print();
    }
}
--------------------------------------------------------------------------------

Note that we now have two classes, in lines (D) and (E), defined in the same file called User.java. In keeping with our explanation in the last chapter, we compile this file by using the invocation

javac User.java

The compilation will deposit the bytecode for the classes User and Test in the files

User.class

and

Test.class

Of the two classes, only Test is executable since it contains main. We execute the class by

java Test

OO Programming

What is object-oriented programming?

Although the answer to this question will reveal itself as you work your way through this book, at this juncture it might be useful to draw parallels between object-oriented programming (OO) and the world around us. You are unlikely to dispute the assertion that during the last half century the following facts about societies have become amply clear: Societies function best when centralized control is kept to a minimum; when the intelligence needed for the smooth functioning of a society is as distributed as possible; when each person is sufficiently smart to know for himself or herself how to make sense of the various norms and mores of the society for the common good; and when the higher-level organizational structures, often organized in the form of hierarchies, facilitate the propagation of society-nurturing messages up and down the hierarchies.

Large object-oriented programs are no different. The idea is to think of large software (sometimes consisting of millions of lines of code) as consisting of a society of objects: objects that possess sufficient intelligence to interpret messages received from other objects and to then respond with appropriate behavior; objects that inherit properties and behaviors from higher-level objects and permit lower-level objects to inherit properties and behaviors from them; and so on. Just as decentralization of human organizations makes it easier to extend and maintain the various societal structures (because the intelligence needed for such maintenance and extension resides locally in the structures), a decentralized organization of software allows it to be extended and maintained more easily. If as a programmer you are not happy with the objects supplied to you by a software vendor, in most cases you'd be able to extend those objects with relative ease and customize them to your particular needs. And if any problems developed in one of the components of a large decentralized organization of objects, your troubleshooting would be easier because of its localized nature—this would apply as much to a society of people as it would to a society of software objects.

A discourse concerning societies is made more efficient if we group together all those objects that share common characteristics. We could then refer to such groups as classes. For example, all people engaged in the delivery of healthcare have to have certain common professional attributes. We could say that these common attributes define the class health-care professional. All medical doctors—the class medical doctor being a subclass of the class health-care professional—must possess the attributes of all health-care professionals; they must also possess additional attributes by way of specialized education and training.

This analogy carries over directly to software design based on objects. All objects that possess the same attributes and exhibit the same behaviors are grouped into a single class. In fact, we first define a class and then create individual objects by a process known as instantiating a class. All objects that possess the attributes and behaviors of a previously defined class, possessing at the same time additional more-specialized attributes and behaviors, are represented as a subclass of the previously defined class.
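In Java, the health-care analogy might be sketched as follows. The class names, fields, and methods here are all hypothetical; they merely mirror the attributes mentioned in the discussion above.

```java
// The more general class: attributes common to all health-care professionals.
class HealthCareProfessional {
    private String name;

    public HealthCareProfessional( String nam ) { name = nam; }

    public String describe() { return name + " delivers health care"; }
}

// The subclass inherits describe() and adds a more specialized attribute.
class MedicalDoctor extends HealthCareProfessional {
    private String specialty;

    public MedicalDoctor( String nam, String spec ) {
        super( nam );
        specialty = spec;
    }

    public String getSpecialty() { return specialty; }
}

class SocietyDemo {
    public static void main( String[] args ) {
        // Instantiating the subclass creates an individual object.
        MedicalDoctor doc = new MedicalDoctor( "Trillian", "cardiology" );
        System.out.println( doc.describe() );       // inherited behavior
        System.out.println( doc.getSpecialty() );   // specialized attribute
    }
}
```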

What good does OO do?

Over the years, object-oriented programming has become the preferred style of programming for graphical user interfaces (GUI), so much so that even when using languages that do not directly support object orientation (such as C), programmers create software structures that simulate OO for GUI programming. Probably the most famous example of this is the GNOME/GTK+ toolkit for GUI design; it's all in C, yet it is "very OO" in its programming style and structuring. It is mentioned here for purposes of comparative presentation; the main focus is, of course, on C++ and Java. OO is also making strong inroads into database and network programming.

How do I master it?

It takes a three-pronged strategy to master the OO paradigm for solving actual problems involving large and complex systems. You must, of course, learn the syntax specific to the languages. Clearly, without a working level familiarity with all the major constructs of a language, you may not be able to bring to bear the most effective tools on a problem. This, however, does not mean that you must memorize all of the syntax. For example, it would be impossible to commit to memory all of the different Java classes and the attributes and the functions associated with each of the classes. Fortunately, it is not necessary to do so in this age of web-based documentation. A standard approach to Java programming is to display in one window the extremely well-organized on-line Java documentation while you are constructing your own program in another window.

In addition to the syntax, you must master for each language the concepts of encapsulation, inheritance, and polymorphism, as these three concepts form the cornerstones of a truly OO language. How each concept works varies in subtle ways from language to language. For example, C++ permits multiple inheritance which gives a programmer certain freedoms, but with an increased risk of writing buggy code. On the other hand, Java forbids multiple inheritance in the sense permitted by C++, but allows for a class to inherit from any number of interfaces. Similarly, the access modifiers that allow you to encapsulate information in a class with different levels of access work slightly differently in C++ and Java. Additionally, Java has the concept of a package that has a bearing on access control, a concept that does not exist in C++. Polymorphism allows a subclass type to be treated like a superclass type. Although it works essentially the same in all major OO languages, the manner in which it is invoked can place important constraints on programming. In C++, for example, polymorphism can only be invoked through pointers or references, a fact that can have a large bearing on how you might refer to an object in a program.
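A minimal Java sketch of two of these ideas, a class that extends one superclass while implementing an additional interface, and polymorphic invocation through a superclass-type reference, is shown below. All the names (Greeter, Ranked, Sailor, Captain, PolyDemo) are hypothetical.

```java
interface Greeter { String greet(); }
interface Ranked  { int rank(); }

class Sailor implements Greeter {
    public String greet() { return "hello"; }
}

// Only one superclass is allowed, but any number of interfaces.
class Captain extends Sailor implements Ranked {
    public String greet() { return "ahoy"; }    // overrides Sailor's version
    public int rank() { return 3; }
}

class PolyDemo {

    // Each element is treated as a Greeter; the method that actually
    // runs is chosen by the true type of the object at run time.
    static String greetAll( Greeter[] crew ) {
        StringBuffer out = new StringBuffer();
        for ( int i = 0; i < crew.length; i++ ) {
            if ( i > 0 ) out.append( ' ' );
            out.append( crew[i].greet() );
        }
        return out.toString();
    }

    public static void main( String[] args ) {
        Greeter[] crew = { new Sailor(), new Captain() };
        System.out.println( greetAll( crew ) );      // hello ahoy
    }
}
```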

The last prong of the three-pronged strategy deals with learning OO design. As with all design activity, there is a certain mystique associated with it. This is not surprising, because it would be impossible to enunciate the design principles that would span all possible problems, those already solved and those yet to be solved. Much of learning how to design an OO solution to a large and complex problem is a matter of experience, aided perhaps by examining good OO code written by other people. Nonetheless, the accumulated wisdom over the years now dictates the following approach to the development of expertise in OO design: (1) mastering a "meta" language, such as the Unified Modeling Language (UML), that allows you to express your design visually at a conceptual level; and (2) learning the design patterns, these being template solutions to a host of subproblems likely to be encountered during the evolution of an OO program. Regarding design patterns, there is no specific chapter devoted to them, although the example code presented includes the implementation of some of the patterns. For a reader wanting to pursue more deeply both UML and the topic of design patterns, there are excellent books available on both [7, 13, 20, 21].

SIMPLE PROGRAMS

SUMMING AN ARRAY OF INTEGERS

Let's say you want to add 10 integers that are stored in an array. A C program for doing this would look like



/* AddArray1.c */

#include <stdio.h>

int addArray( int [], int );

main()
{
    int data[] = {4,3,2,1,0,5,6,7,8,9};                   /* (A) */
    int size = sizeof(data) / sizeof(data[0]);            /* (B) */
    printf( "sum is %d\n", addArray( data, size ) );      /* (C) */
    return 0;
}

int addArray( int a[], int n ) {                          /* (D) */
    int sum = 0;
    int i;
    for ( i = 0; i < n; i++ )
        sum += a[i];                                      /* (E) */
    return sum;
}


Line (A) of main declares an integer array data and initializes it as shown. Line (B) figures out the size of the array. The function addArray is called in line (C) to sum up all the integers in the array.

If there is anything noteworthy about this program at all, it lies in the fact that an array name in C (and also in C++) is treated like a pointer in some contexts. Whereas data is an array name when supplied as an argument to the operator sizeof in line (B), it is a pointer to the first element of the array when supplied as an argument to the function addArray in line (C).

Contrast this with the fact that the array name a in the called function addArray in line (D) is merely a pointer, in the sense that sizeof(a) computed anywhere inside the function addArray will return 4 for the four bytes it takes to store a memory address on many modern machines. On the other hand, sizeof ( data ) in line (B) will return 40 for the 40 bytes that it takes to store the 10 integers of the array data, assuming that your machine allocates 4 bytes for an int.[1]

So when main calls addArray in line (C), the memory address that is the value of data when treated as a pointer is assigned to the parameter a in line (D); the array itself is not copied. Subsequently, the function addArray visits each element of the array in line (E) through the memory address assigned to a and adds the element to the sum.

A more explicitly pointer version of the addArray function is shown in the following program that does the same thing:



/* AddArray2.c */

#include <stdio.h>

int addArray( int*, int );

main()
{
    int data[] = {4,3,2,1,0,5,6,7,8,9};
    int size = sizeof(data)/sizeof(data[0]);
    printf( "Pointer Version: sum is %d\n", addArray( data, size ) );
    return 0;
}

int addArray( int* a, int n ) {
    int sum = 0;
    int i;
    for ( i = 0; i < n; i++ )
        sum += *a++;
    return sum;
}


The two programs shown above are essentially identical because, as mentioned already, declaring a function parameter to be an array (the first program) is the same as declaring it to be a pointer (the second program).

Now let's consider a C++ program for doing the same thing:




//AddArray.cc

#include <iostream>                                       //(A)
using namespace std;                                      //(B)

int addArray( int*, int );

int main()
{
    int data[] = {4,3,2,1,0,5,6,7,8,9};
    int size = sizeof(data)/sizeof(data[0]);
    cout << "C++ version: sum is "                        //(C)
         << addArray( data, size ) << endl;
    return 0;
}

int addArray( int* a, int n ) {
    int sum = 0;
    int i;
    for ( i = 0; i < n; i++ )
        sum += *a++;
    return sum;
}


This program shows this book's first use of an "object" in C++. The object is cout in line (C). This is an output stream object whose name is usually pronounced "c-out" as an abbreviation for "console out." This object knows how to send information to the standard output stream, which would generally be directed to the window of the terminal screen in which you are running your program. All objects in OO programming belong to some object class. The output stream object cout belongs to the class basic_ostream that is defined in the library header file iostream included in the program in line (A).

The header iostream is one of the many header files that constitute the C++ Standard Library.[2] This library is a culmination of the effort of the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI) for the standardization of the C++ language. A significant portion of the C++ Standard Library includes what is informally referred to as the Standard Template Library (STL). STL consists of container classes for holding collections of objects and classes that play supporting roles for using the container classes. The Standard Library also includes the header file string that we will be using very frequently in this book for representing and processing C++ strings. Other header files in the C++ Standard Library contain classes for memory management (new and memory); representing exceptions (exception and stdexcept); representing complex numbers (complex); run-time type identification (typeinfo); and so on.

Although this point will become clearer after we have presented the idea of a namespace in Chapter 3, the directive

   using namespace std;

in line (B) of the program takes account of the fact that all the identifiers (meaning the names of classes, functions, objects, etc.) used in the C++ Standard Library are defined within a special namespace known as the standard namespace and designated std. If we did not invoke this directive, we would need to call the output stream object by using the syntax std::cout.

The symbol ‘<<’ in line (C) is called the output operator or the insertion operator. This operator, defined originally as the left bitwise shift operator, has been overloaded in C++ for inserting data into output stream objects when used in the manner shown here.[3] The ‘<<’ operator does formatted insertions into an output stream object. What that means is that if the operator is asked to insert an int into an output stream object, it will translate the four bytes of the int into its printable character representation and then insert the character bytes into the output stream object.

You can comment code in C++ the way you do it in C, that is, by using the delimiters /* ... */. You can also comment individual lines, or the trailing part of a line, by //. The compiler will not see on that line any characters past //.

Note that, as indicated by the commented out statement at the beginning of the program, Unix-like platforms require the name of a file containing the C++ source code to end in the suffix .cc. One can also use the suffix .C or the suffix .cpp. To compile this program, you'd say

   g++ filename

The compiler will deposit an executable file called a.out or a.exe in your directory. This assumes that you are using the GNU C++ compiler. This compiler comes prepackaged with Unix and Linux distributions, although, if needed, you could download the latest version from the Free Software Foundation (http://www.gnu.org). If you are using a PC and you do not have access to a pre-loaded C++ compiler, you can download the GNU compiler (and other very useful Unix-emulation utilities for Windows) from the site http://sourceware.cygnus.com/cygwin/. For Solaris platforms, you should also be able to use the CC compiler via the invocation

   CC filename

where, again, the name of the file must end in either ’.C’ or ’.cc’ or ’.cpp’. As with g++, the compiler will deposit an executable file called a.out or a.exe in your directory.

Another point of difference, this one a matter of style, between the C programs we showed at the beginning of this section and the C++ program above lies in the header of main. The main in both C and C++ programs returns a status code, which is 0 if the program terminates normally and a nonzero integer to indicate abnormal termination. By tradition, the return type of main in C programs is left unmentioned, it being int by default. On the other hand, C++ requires a program to explicitly mention the return type int of main.

Now let's see how one would write a Java program for doing the same thing:



//AddArray.java
public class AddArray {                                     //(A)
    public static void main( String[] args )                //(B)
    {
        int[] data = { 0, 1, 2, 3, 4, 5, 9, 8, 7, 6 };      //(C)
        System.out.println( "The sum is: "                  //(D)
                            + addArray(data) );
    }
    public static int addArray( int[] a ) {                 //(E)
        int sum = 0;
        for ( int i=0; i < a.length; i++ )
            sum += a[i];
        return sum;
    }
}


As shown in the commented out line at the beginning of the program, this source code resides in a file called

   AddArray.java

The program begins with a class declaration:

    public class AddArray {
....
....

In Java, functions can exist only inside classes. So even though using a class for the simple task for which we are writing the program seems rather excessive, there is no choice.

Note that the name of the file before the suffix .java is the same as the class name, AddArray. Ordinarily, this is necessary only if a class is declared to be public. A file containing Java classes is allowed to have no more than one public class. If no classes in a file are public, the file can be given any name, but, of course, it must end in the suffix .java. To compile this file, you invoke the Java compiler by

   javac AddArray.java

The compiler outputs what's known as the bytecode for the class and, in this case, deposits it in a file called

   AddArray.class

This bytecode is machine-independent, unlike the executables for C or C++ programs, and can be run by another program called the Java Virtual Machine (JVM). A JVM will execute the program either in the interpreted mode or by first converting the bytecode into a machine-dependent executable using a second round of what is known as just-in-time (JIT) compilation and then executing the binaries thus obtained. The latter is the default mode and results in a tenfold increase in execution speed over the interpreted mode. For the bytecode file named AddArray.class, a Java Virtual Machine is invoked by

   java AddArray

Before you can compile and run a Java program, you may have to tell the system how to find the classes you created with your program. The default is your current directory. But if you wanted to compile a class that was stored in some other directory, you have to tell both javac and java tools how to locate the class. The preferred way to do this is by using the -classpath option when invoking the javac compiler or the java application launcher. The -classpath option is also needed even if you are trying to compile a Java program in the directory in which it resides if your program uses other Java classes, your own or written by a third party, that reside in other directories.[4]

If your program uses third-party classes that are stored in directories directory_1 and directory_2,[5] you would invoke javac and java with the following syntax on Unix and Linux platforms:

   javac -classpath .:directory_1:directory_2 sourceCode.java

   java -classpath .:directory_1:directory_2 className

and on Windows platforms by[6]

   javac -classpath .;directory_1;directory_2 sourceCode.java

   java -classpath .;directory_1;directory_2 className

where the symbol ’.’ is used to designate the current directory where presumably the main application resides. Note that the delimiter between the directories for Unix and Linux platforms is the character ’:’ and for the Windows platform the character ’;’. The third-party classes, or, for that matter, even your previously programmed classes may come packaged in the form of an archive called the JAR archive.[7] If that's the case, you'd need to specify the pathname to such archives in your classpath specification, as for example in

   javac -classpath .:/path_to_archive/archive.jar your_program.java

   java -classpath .:/path_to_archive/archive.jar your_class_name

If the classpath strings become too long, you can create shell files containing the above invocations on Unix and Linux platforms. On Windows platforms, the same is accomplished by using batch files. On Unix and Linux platforms, it is also possible to set up aliases for the compiler and the application launcher that include the classpath string.

On Unix and Linux platforms and on some of the older Windows platforms, instead of using the -classpath option as shown above, it is also possible to set the CLASSPATH environment variable. For example, if you are using either the csh or the tcsh shell, you can define a classpath by, say, including the following in a .cshrc file,

   setenv CLASSPATH .:directory_1:directory_2:....

which would create the same classpath setting as our earlier examples. If, on the other hand, you are using either sh, ksh, or bash, you can achieve the same effect by including the following strings in your .profile file:

   CLASSPATH=.:directory_1:directory_2:....
   export CLASSPATH

If desired, you can "unset" the value of the environment variable by invoking unsetenv CLASSPATH in csh and tcsh and by invoking unset CLASSPATH in sh, ksh, and bash.

Even if you use the CLASSPATH environment variable, you may still have to use the -classpath option as shown previously to customize the classpath for a particular application. The classpath as set by the -classpath option overrides the classpath as set by the environment variable. Note again the importance of including the character ’.’ in the CLASSPATH environment variable since, as was the case with the -classpath option, setting the environment variable overrides the default.
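As a rough illustration of the delimiter difference noted above, the following sketch prints the classpath that a running JVM actually ended up with, one entry per line. The class name ShowClasspath and the helper entries() are hypothetical names chosen for this example. The system property java.class.path holds the classpath in effect, and File.pathSeparator is ':' on Unix and Linux and ';' on Windows.

```java
//ShowClasspath.java  (hypothetical example, not one of the programs above)

import java.io.File;
import java.util.regex.Pattern;

class ShowClasspath {

    // Splits a classpath string into its entries, using the
    // delimiter that is appropriate for the current platform.
    static String[] entries( String classpath ) {
        return classpath.split( Pattern.quote( File.pathSeparator ) );
    }

    public static void main( String[] args ) {
        // The classpath the JVM is using, whether it came from the
        // -classpath option, the CLASSPATH variable, or the default.
        String cp = System.getProperty( "java.class.path" );
        System.out.println( "Delimiter on this platform: "
                            + File.pathSeparator );
        for ( String entry : entries( cp ) )
            System.out.println( entry );
    }
}
```

Running this once with the -classpath option and once with only the CLASSPATH environment variable set should make it easy to see which setting took effect.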

Getting to the program itself, the code that is inside the class definition is very much like the C or the C++ code we showed earlier. We have the function main() in line (B) and the method[8] addArray() inside the class definition in line (E). In Java, any class can include main(). When a class includes main(), the class becomes executable as an application. Since main() does not return anything in Java, its return type is declared as void. The significance of the labels public and static in the header for main will be explained in Chapter 3. In the body of main, we declare the identifier data as an array of ints and initialize it at the same time, very much like we did for C and C++.

The invocation System.out.println( ) in line (D) is a call to the println() method that is defined for the output stream object out. More precisely, out is a static field of type PrintStream defined in the class System; PrintStream is a subclass of OutputStream. System is a class that comes with the java.lang package.[9] This package is imported automatically by the Java compiler. One could also use the method print() via the invocation System.out.print( ) if it is not necessary to end the output with a newline. Both println() and print() are defined for the PrintStream class. The argument to these methods must either be a string or a type that Java would know how to convert into a string for display. In our example, the second part of the argument, of type int, gets converted into a string automatically.
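The difference between print() and println(), and the automatic conversion of an int into a string, can be sketched as follows. The class name PrintDemo and the helper describe() are hypothetical names used only for this illustration.

```java
//PrintDemo.java  (hypothetical example)

import java.io.PrintStream;

class PrintDemo {

    // The + operator converts the int operand into a String
    // before concatenating it with the string on the left.
    static String describe( int sum ) {
        return "The sum is: " + sum;
    }

    public static void main( String[] args ) {
        // Three calls that together produce one line of output:
        System.out.print( "The sum is: " );    // no newline added
        System.out.print( 45 );                // int converted for display
        System.out.println();                  // ends the line

        // The same line produced by a single println() call:
        System.out.println( describe( 45 ) );  // prints: The sum is: 45

        // System.out can be held in a variable of type PrintStream.
        PrintStream out = System.out;
        out.println( "done" );
    }
}
```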

The rest of the program consists of the method addArray() in line (E), which is very much like the C++ function of the same name in the earlier program, except for the manner in which the size of the array is determined inside the function. For both C and C++, the size of the array had to be passed explicitly to the function. But in Java, that is not necessary. Arrays in Java are objects that have data members[10] associated with them. The data member that is associated with an array object is length. When we access this data member through an expression such as a.length, as in the for loop of addArray(), we obtain the number of elements in the array a.
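To see that every array object carries its own length, consider the following minimal sketch. The class name LengthDemo and the method mean() are hypothetical, and returning 0.0 for an empty array is merely a choice made for this example.

```java
//LengthDemo.java  (hypothetical example)

class LengthDemo {

    // Mean of the elements of an array. The element count is
    // obtained from the array itself, so no size is passed in.
    static double mean( int[] a ) {
        if ( a.length == 0 ) return 0.0;   // choice made for this sketch
        int sum = 0;
        for ( int i = 0; i < a.length; i++ )
            sum += a[i];
        return (double) sum / a.length;
    }

    public static void main( String[] args ) {
        int[] data  = { 0, 1, 2, 3, 4, 5, 9, 8, 7, 6 };
        int[] small = { 4, 6 };
        System.out.println( data.length );     // prints 10
        System.out.println( small.length );    // prints 2
        System.out.println( mean( data ) );    // prints 4.5
        System.out.println( mean( small ) );   // prints 5.0
    }
}
```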

Also note from the above examples that C++ and Java have exactly the same way of commenting code. You can either use the C-style comment delimiters /* ... */ or //. However, the latter can only be used for comments on a single line, because the compiler will not see any characters past //.

With regard to comments in Java programs, a special tool called javadoc can automatically generate documentation for your program using text that is delimited by /** and */. This tool generates HTML files that can be viewed with a browser.
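A documentation comment of the kind javadoc looks for might be written as in the following sketch. The class name AddArrayDoc is hypothetical; @param and @return are standard javadoc tags.

```java
//AddArrayDoc.java  (hypothetical example)

/**
 * A small utility class for summing arrays of ints.
 */
class AddArrayDoc {

    /**
     * Returns the sum of the elements of an array.
     *
     * @param a the array whose elements are to be added
     * @return the sum of all the elements of a; 0 for an empty array
     */
    static int addArray( int[] a ) {
        int sum = 0;
        for ( int i = 0; i < a.length; i++ )
            sum += a[i];
        return sum;
    }

    /* An ordinary C-style comment such as this one is ignored by javadoc. */
    public static void main( String[] args ) {
        int[] data = { 0, 1, 2, 3, 4, 5, 9, 8, 7, 6 };
        System.out.println( "The sum is: " + addArray( data ) );  // prints: The sum is: 45
    }
}
```

By default javadoc documents only public and protected classes and members, so for a sketch like this one, whose declarations are not public, you would either declare them public or invoke the tool with its -package option:

   javadoc -package AddArrayDoc.java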

[1]It is important to bear in mind that while an array name can "decay" into a pointer, it is not a pointer. The extent to which an array name decays into a pointer depends, among other things, on whether an array name is the name of a parameter of a callable function. So, whereas the array name data in main will act like a pointer in some contexts only, the array name a in the function addArray of the program AddArray1.c will act like a pointer in practically all contexts.

[2]The C++ Standard Library consists of these header files: algorithm, bitset, complex, deque, exception, fstream, functional, iomanip, ios, iosfwd, iostream, istream, iterator, limits, list, locale, map, memory, new, numeric, ostream, queue, set, sstream, stack, stdexcept, streambuf, string, typeinfo, utility, valarray, vector. Of these, the following are informally referred to as the Standard Template Library (STL): algorithm, bitset, deque, functional, iterator, list, map, queue, set, stack, valarray and vector.

[3]Operator overloading, discussed in detail in Chapter 12, allows the same operator to be used in different ways. The operands determine which meaning of such an operator applies in a given context.

[4]You do not need to specify the classpath for the classes that come with the Java platform. Both the compiler and the application launcher can locate those automatically.

[5]By directory name here is meant the pathname to the directory.

[6]If you are using a Cygnus emulation of Unix on Windows, you may need to place the classpath string between double quotes.

[7]A JAR file in Java is an archive file, just like a Unix tar (tape archive) file. Jar files are created and manipulated by using the Java jar tool. To create a JAR archive of all your classes, including the sources, in your current directory, you'd say

   jar cvf archiveName.jar *.class *.java

To list the contents of a JAR file, you'd say

   jar tvf archiveName.jar

and to unpack a jar archive, you'd say

   jar xvf archiveName.jar

If you don't want to unpack the entire archive, but would like to extract a single class, you'd say

   jar xf archiveName.jar className.java

or

   jar xf archiveName.jar className.class

as the case may be.

TERMINAL I/O

Let's now compare simple C, C++, and Java programs for eliciting information from a user and then printing something out in response on the terminal.

Here is a C program that asks the user to type in a sequence of integers, all in one line. The integers are allowed to be anywhere in a line, not necessarily starting at the beginning, and the entry of the data is considered completed when the user presses ‘Enter’ on the keyboard. The program sums up all the integers and types out the sum.

Therefore, if a user types in

    ####3#######56##20#1#####19########

where # stands for a space, the program should print out 99. The following C program does the job.



/* TermIO.c */

#include <stdio.h>

int main()
{
    int i;
    int sum = 0;
    int ch;          /* int rather than char, so that EOF can be held */

    printf("Enter a sequence of integers: ");
    while ( scanf( "%d", &i ) == 1 ) {                /* (A) */
        sum += i;                                     /* (B) */
        while ( ( ch = getchar() ) == ' ' )           /* (C) */
            ;
        if ( ch == '\n' ) break;
        ungetc( ch, stdin );                          /* (D) */
    }
    printf( "The sum of the integers is: %d\n", sum );
    return 0;
}


The integers are read in by the scanf() function call in line (A) and the summing of the numbers done in line (B). Lines (C) through (D) take care of the following property of scanf: Most conversion specifiers for this function skip over the whitespace characters—meaning the tabs, the space, the newline character, and so on—before the beginning of an input item (and not after). Therefore, after consuming an integer, scanf will simply wait for the next integer, ignoring any blank spaces and the end of the data entry line. This creates a problem at the end of the data line when the user hits "Enter"—scanf will simply gobble up the newline character and wait for the next integer. The statements in lines (C) through (D) peek ahead, while consuming blank spaces, and look for the newline character in case the user has hit "Enter" on the keyboard. If the character found after all the blank spaces have been consumed does not turn out to be a newline character, we put it back in the input stream in line (D).

Here is a C++ program that does the same thing:



//TermIO.cc

#include <iostream>
using namespace std;

int main()
{
    int sum = 0;
    cout << "Enter a sequence of integers: ";
    int i;
    while ( cin >> i ) {                              //(A)
        sum += i;
        while ( cin.peek() == ' ' ) cin.get();        //(B)
        if ( cin.peek() == '\n' ) break;              //(C)
    }
    cout << "Sum of the numbers is: " << sum << endl;
    return 0;
}


This program uses the input stream object cin whose name is usually pronounced "c-in" for "console-in." This object is of type istream and it knows how to read data from a user's terminal. The expression in line (A)

   cin >> i;

causes the input operator ’>>’, which is also known as the extraction operator, to extract one int at a time from the input stream object cin. As the user makes keystrokes, the corresponding characters are entered into the operating system's keyboard buffer and then, when the user hits the "Enter" key on the keyboard, the operating system transfers the contents of the keyboard buffer into the cin stream's internal buffer. The operator ’>>’ then extracts the needed information from this buffer. Clearly, the program will block if the user did not provide the necessary keystrokes.[11] The operator ’>>’, originally defined to be the right bitwise shift operator, has been overloaded in C++ for extracting information from input stream objects when used in the manner shown here. Because the operator has been overloaded for all the built-in data types, it can be used to extract an int, a float, a double, a string, and so on, from an input stream object.

To understand the controlling expression in line (A) of the while loop,

   while ( cin >> i )

note that the expression

   cin >> i

returns the input stream object itself, meaning cin. However, the returned cin will evaluate to false when either the end-of-file is encountered or when the extraction operator runs into an illegal value. As an example of the latter case, if the user typed a floating point number such as 3.14 where an int was expected, the extraction operator would read the 3 and then, on the next extraction, run into the decimal point, place the input stream object in an error state, and cause cin to evaluate to false.

Lines (B) and (C) deal with the fact that the default behavior of the extraction operator ’>>’ skips over the white space characters, which includes blank space, tabs, newlines, and so on. So the controlling expression in line (A) will not by itself stop the while loop when the user hits "Enter" after entering the desired number of integers in a line. We invoke[12] peek() on the object cin in lines (B) and (C) to ascertain the character immediately after the most recently consumed integer. If it's a blank space, we consume it in line (B) by invoking get() on the object cin; and do the same to all the successive blank spaces until there are no more blank spaces left. If the next character is a newline, it would be trapped in line (C) and the while loop of line (A) exited. Otherwise, we continue reading the data in the next iteration of the loop.

This program also demonstrates that, unlike in C, C++ allows you to declare a variable anywhere in a program. We declared the variable i after the cout statement asking the user to enter data.[13] This feature improves the readability of large C++ programs as one can declare variables just before they are actually needed.

We will now show an equivalent Java program:



//TermIO.java

import java.io.*;

class TermIO {

    static boolean newline;                                    //(A)

    public static void main( String[] args ) {
        int sum = 0;
        System.out.println( "Enter a sequence of integers:" );
        while ( newline == false ) {
            String str = readString();                         //(B)
            if ( str != null ) {
                int i = Integer.parseInt( str );               //(C)
                sum += i;
            }
        }

        System.out.println( "Sum of the numbers is: " + sum );
    }
    static String readString() {                               //(D)
        String word = "";
        try {
            int ch;
            while ( ( ch = System.in.read() ) == ' ' )         //(E)
                ;
            if ( ch == '\n' ) {                                //(F)
                newline = true;                                //(G)
                return null;                                   //(H)
            }
            word += (char) ch;                                 //(I)
            while ( ( ch = System.in.read() ) != ' '
                                     && ch != '\n' )           //(J)
                word += (char) ch;                             //(K)
            if ( ch == '\n' ) newline = true;                  //(L)
        } catch( IOException e ) {}
        return word;                                           //(M)
    }
}


Since Java does not provide a function that can directly read an integer value into an int variable, the logic of the program is slightly more complex than that of the C and the C++ programs shown earlier. To make sense of this program, recall that the program is supposed to extract the integer values from the numbers entered by a user in a single line and the user is allowed to place any number of spaces before the first integer, between the integers, and after the last integer. In other words, we want our program to be able to extract integer numbers from the following sort of a line entered by a user:

    ####3#######56##20#1#####19########

where the symbol # stands for a space. To explain the working of the program:

The program reads each integer value entered by the user as a string, which is of type String, a Java class. This is done by invoking the method readString() in line (B). Therefore, for the data entry line shown above, the first string read will correspond to the number 3, the second to the number 56, and so on.

If the string read in the previous step is not null, we invoke the method parseInt of the Integer class in line (C) to convert the string into its integer number value, assuming that the user did not try to fool the system by typing nondigit characters.[14]
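What happens when that assumption fails can be sketched as follows: Integer.parseInt throws a NumberFormatException for a string that is not a valid integer, and a program may catch it. The class name ParseDemo, the helper parseOr(), and the idea of substituting a fallback value are all hypothetical choices made for this illustration; they are not part of TermIO.java.

```java
//ParseDemo.java  (hypothetical example)

class ParseDemo {

    // Returns the int value of str, or the given fallback value
    // if str does not represent a valid integer.
    static int parseOr( String str, int fallback ) {
        try {
            return Integer.parseInt( str );
        } catch ( NumberFormatException e ) {
            return fallback;
        }
    }

    public static void main( String[] args ) {
        System.out.println( parseOr( "56", -1 ) );    // prints 56
        System.out.println( parseOr( "-7", -1 ) );    // prints -7
        System.out.println( parseOr( "5x6", -1 ) );   // prints -1
    }
}
```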

With regard to the readString() method, we strip off all the blank spaces before a string in the while loop in line (E). Each character is read from the user's terminal in this loop by the read() method of the java.io.InputStream class. The standard input stream System.in is an object of type java.io.InputStream. We invoke the method read() inside a try-catch block since it throws an exception of type IOException that must either be caught or rethrown.

If the last character read in the while loop in line (E) is the newline character, the test in line (F) causes readString to terminate with null for the returned value in line (H). We also set the newline variable to true in line (G) to tell main that the data entry has come to an end. If the last character read in the while loop in line (E) is not the newline character, we then start a new word with this character in line (I).

The while loop that starts in line (J) keeps on adding fresh characters to the word started in line (I) as long as a new character is neither a space nor a newline. If the latest character read in the while loop of line (J) is a newline, we set our flag newline to true in line (L).