frequently asked questions on lrdev

contents: C/C++ / Java / assembly / patterns / Unix / math / wine / miscellaneous / ask

C/C++ programming languages

What is ISO/IEC 9899:1999? ISO/IEC 9899:1990? Where can I find ISO/IEC 9899?
The ISO/IEC 9899:1999 standard defines the C programming language. Older versions of this standard are known as ISO/IEC 9899:1990, ANSI/ISO 9899-1990, ANSI-C or K&R2 [The C Programming Language, Brian W. Kernighan, Dennis M. Ritchie, Prentice Hall, 2nd Ed. 1988, ISBN 0-13-110362-8]. ISO standards are not free. They can be bought [online, as PDF, too] at ISO. [ ISO/IEC 9899:1999 order sheet / entry point for C programming language information / C programming language coding guidelines with background information ]
What is ISO/IEC 14882:2003? Where can I find it?
This ISO standard defines the C++ programming language [The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 3rd Ed. 1997, ISBN 0-201-88954-4]. See question above.
What's the C programming language definition? C programming language reference? What's the C++ programming language definition? C++ programming language reference?
See the two questions above [for books and ISO standards].
Do you feature C coding guidelines? C coding standards? Do's and don't's in C? C coding styles?
Yes, see C programming language coding guidelines.
Do you feature C language program samples? Did you implement projects using the C language?
Yes, please see the lrdev software page for some C programming language (among other programming languages) samples. C++ samples / projects are also featured there.
Do you feature C programming language program tutorials? Free courses in C programming?
No. But there's a short overview of the C programming language, with some footnotes.
What are the C programming language keywords?
auto break case char const continue default do double else enum extern float for goto if int long register return short signed sizeof static struct switch typedef union unsigned void volatile while [BK/DR,1988,p.192,A2.4] ; [9899:1999] additionally defines inline restrict and the [partially optional] types _Bool _Complex _Imaginary [ questions on: auto (stack alignment), const (orthogonality to volatile), int (string to int), sizeof (orthogonality), struct, typedef, union, volatile (orthogonality to const), / keywords vs. reserved words ]
What are the C++ programming language keywords?
and and_eq asm auto bitand bitor bool break case catch char class compl const const_cast continue default delete do double dynamic_cast else enum explicit export extern false float for friend goto if inline int long mutable namespace new not not_eq operator or or_eq private protected public register reinterpret_cast return short signed sizeof static static_cast struct switch template this throw true try typedef typeid typename union unsigned using virtual void volatile wchar_t while xor xor_eq [BS,1997,p.794,A.2] [ questions on: auto (stack alignment), const (orthogonality to volatile), int (string to int), operator (precedences), sizeof (orthogonality), struct, typedef, typeid, union, virtual, volatile (orthogonality to const), and/and_eq/bitand/bitor/compl/not/not_eq/or/or_eq/xor/xor_eq, ]
What is the difference between programming language keywords and reserved words? Reserved words in C language?
Reserved words cannot be used as identifiers, whereas keywords signify special semantics (e.g. flow control [the Lisp programming language accordingly calls constructs with those keywords special forms]). In a reasonable programming language (such as C), [all] the keywords are reserved words, since that makes code easier to read [and less error-prone, and easier to parse]. Fortran's keywords are not reserved words [which makes parsing more difficult here and introduces a new source of errors]; Java's reserved word goto is not strictly a keyword [since it is only reserved but not used in that language; however it is listed as a keyword {to allow compilers issue more informative diagnostics and minimize Java/C++ porting problems}]. The reserved words in the C programming language are [as mentioned] its keywords; additionally some identifiers must not / should not be used, e.g. such beginning with the underscore character [reserved for the C system/implementation/library]. [ C keywords / C++ keywords / Java keywords / Turtle keywords / coding guidelines about identifier names / coding guidelines about name space / scope and identifier naming ]
What are digraphs? Trigraphs? What is bitand?
Digraphs (<% %> <: :> %: meaning {}[]#) and trigraphs (??= ??( ??< ??/ ??) ??> ??' ??! ??- ??? meaning #[{\]}^|~?) support restricted character sets (that are obviously missing one or more of the characters #[\]^{|}~). C++ supports keywords (and and_eq bitand bitor compl not or or_eq xor xor_eq not_eq meaning && &= & | ~ ! || |= ^ ^= != {more consistently [but less conveniently] and_eq would be bitand_eq, or_eq bitor_eq, xor bitxor, and xor_eq bitxor_eq}) that can replace !&^|~. [Found in iso646.h.] However, one cannot do without the special characters "%'()*+,-./:;<=>?_ {unless with obfuscating practices}. Some of the ASCII characters ($@`) are not used in C/C++ programming languages. [ digraph-hello-world.cc / trigraph-hello-world.c ]
What are the C/C++ operator precedences? What are operator precedences?
The C++ operator precedences can be found here. Since [in C/C++] all expressions establish a [r-]value (including e.g. assignments), expressions can be [deeply] nested and it turned out useful to define many levels of reasonable precedences [to mostly suppress the need for parentheses in expressions]. The operators have an associativity as well (e.g. the assignments: right to left). In C++ (as opposed to C), the ternary operator seems to have its precedence switched with the assignments' (a typo?) [you might want to use parentheses here, to be portable and robust]. The bit operators have an unexpected low precedence for historic reason [predecessor to the logical operators]. Precedences typically do not define an order of evaluation, as they do not establish sequence points [a useful exception here are e.g. the logical operators]. [ operators in the C programming language overview ]
What kinds of operators does the C language define?
One can distinguish the following kinds of operators: arithmetic operators, bit operators, logical operators, relational operators, assignment operators, data selection operators, referencing/dereferencing operators, function call operator, explicit conversion operator, increment/decrement operators, size of an expression/type operator, ternary operator and sequence operator (see this overview). [ operator precedences ]
What is the atoi syntax? What is the ... syntax?
atoi's and other's syntax and synopsis can usually be found by means of 'man atoi' (Unix manual command [see question on unix man too]) or what's appropriate on your system. [Unfortunately atoi does not have a dedicated error return value (0 is also a valid non-error return value); for that reason, the better strtol, which features an (optional) additional reference parameter and a parameter denominating the input base, should be used.] [ integer conversion question / how to do it in C++ ]
How is qsort used? How to program qsort compare-functions?
extern void qsort(void* base, size_t count, size_t width, int (*compare)(void const* a, void const* b)); is called with an array of data to sort [base], the number of elements to sort [count], the size of a single element in bytes [width; an array element may be a large structure or just a pointer], a compare function [compare], some of which might be expressed with sizeof. A typical qsort compare function [helper function] looks like [T being a data type, e.g. struct stat]: static int cmp_func(void const* a, void const* b) { T const* aa = (T const*)a; T const* bb = (T const*)b; if (aa->x != bb->x) { return (aa->x > bb->x ? 1 : -1); } if (aa->y != bb->y) { return (aa->y > bb->y ? 1 : -1); } ... return 0; }; all data accesses are const, the helper function can be static [i.e. not globally visible], the parameters are passed as generic [void] pointers and are cast to the target type [while remaining const], the most significant data element is compared first, then, if it's equivalent, the next one, etc; zero return means 'equivalent'. If there is an additional level of indirection (as with an array of pointers to data), T const* aa = *(T const**)a;, etc is chosen. Note that since typical qsort implementations are not stable/preserving [i.e. an existing order in groups of elements of equivalent sort key may/will change], the compare functions often compare up to a unique key [e.g. when sorting directory entries by time or size: sort by that and, if equal, by {unique} name]. [ what is quicksort / qsort's drawbacks / sort-up-to-a-unique-key pattern ]
What are qsort's drawbacks?
One drawback of quicksort implementations is that they are often/usually not stable. A drawback in C's qsort's interface is the missing user data (instance data for the compare functions), that is usually passed to callback functions.
What is strtok? Strtok nesting?
strtok is a simple string tokenizer [used for lexical analysis / parsing]. For several reasons it's often not the appropriate tool: strtok is not reentrant (therefore can't be nested: a second tokenizer instance would overwrite first's {global!} state) [see question on reentrancy] (although some platforms [additionally] provide a reentrant version of this function {e.g. [Solaris'] strtok_r}), implements only limited patterns and the buffer to be parsed is modified. Typically it's more convenient [for a parser] to return the whole [parsing-] state to the caller and use an end-index instead of [nul] string termination [lrctpz.zip:ctpzut.cc:SetParser::enumerate (C++ sample) implements such a parser pattern, with the help of a (simple) StringRange helper class]. Java's nio's Buffer, with its 'viewed buffer' [layering] abstraction, implements this useful pattern and multiple indices ['capacity', 'limit', 'position', 'mark'] too.
What is the meaning of reentrant? What are the problems with reentrancy?
A function may [should!] be reentrant, which means that it may be invoked again (re-entered), while it is still referenced by one of the program-flow contexts (stack frame, thread context). Synchronous reentrancy pitfalls include static state [i.e. static data (static variables), or global data], which is overwritten by multiple function invocations (either explicit, implicit [i.e. indirect; often difficult to tell, see errno problem], or through recursion). Asynchronous reentrancy pitfall situations include multithreading and signals [or interrupts, e.g. in interrupt-based embedded systems]. It doesn't usually matter if the underlying hardware is a single-processor system or a multi-processor system; however the chances of asynchronous problems occurring are typically higher on multi-processor systems [due to their finer-granular concurrency {not just interruptions by task-switches, only every microsecond or so}]. Libraries may contain non-reentrant functions, that either require to keep state [libc: e.g. strtok], or just are implemented impure [which may be considered a bug]. [ reentrant lock ]
What is it about scope?
Scope is important. In C there's application, file, function and block scope that can be applied on functions, variables, types and macros. Using optimal scope is a means for [good] encapsulation [an OO-paradigm], which makes code frameworks less complex to maintain. Scope tends to influence [identifier] naming: the smaller the scope, the less strict naming rules are. C++ additionally knows of class member access modifiers [private, protected, public, friend] that control scope in a class relationship manner. [ coding guidelines' scope chapter / Java's access modifiers ]
What is a typedef?
C's typedefs introduce type aliases [used as portability types]. They are typically not or only less used in C++ [the more powerful class construct is used there, with user defined cast operators, and possibly polymorphism (both of which can be used to express tight relationship)].
What is a typeid?
C++'s typeid operator returns a class type_info reference, which, as the name suggests, carries data type information. type_info provides a printable name [as string] and comparison operators [to check for type equality and make type information data sortable]. type_info is part of C++'s run time type information (RTTI) support [as dynamic_cast also is]. RTTI support instantiates several helper data structures per class [which are usually linked into the classes' virtual tables] and sometimes requires extra compiler options.
What is the meaning of C++'s 'virtual' keyword? Virtual table layout? Virtual inheritance?
'virtual' means 'indirection'; either in method invocation or in composite object access. These indirections are needed for C++ to implement polymorphism and diamond-shaped inheritance. See this for background information and some assembly dumps. [Additional keywords: vftable, vbtable, vmtable.]
Is the virtual table layout compiler specific?
Yes, even compiler option specific. However, table layout will be similar with other compilers or other compiler settings, since there is one quite optimal way (concerning code size and speed) how to do it. [ virtual / virtual tables ]
What are asserts? Assertions?
Asserts are a compact means to make sure software conditions are met, often to assure program-internal logic. Asserts are more often used in C than C++ [since the latter offers an exception handling system that allows compact error handling as well {i.e. exceptions need not be handled at every function-call level, and uncaught exceptions fall through to the top (and have similar consequences there as the assert macro)}]. Often asserts are used to assure preconditions and postconditions, to assure interface contracts. Data [object] invariants are also candidates for asserts. [ coding guidelines' assertions chapter / sample that asserts some internal logic / question about code robustness ]
How do I check for integer overflows?
C doesn't provide any mechanisms to catch integer operation overflows. Usually the problem domain (the used number range) should be well known and input data sufficiently checked to avoid integer overflows. Input can be manually checked before or after operations, as e.g. done in strtol implementations. Most processors detect overflow conditions and on machine-level provide opcodes to trap/interrupt (and such to [just] branch) on these conditions [e.g. sparc: tvs (trap on overflow set) and e.g. the {esoteric} taddcctv (tagged add and modify integer condition codes and trap on overflow); x86: into (interrupt on overflow)]. The carry flag and the sign flag aren't directly accessible from the C language either.
How do I check for buffer overflows?
Make sure buffer size information is handed through to each function using the buffer. Make sure it's used there. E.g.: Don't use 'gets' (use 'fgets' instead). Do manual reviews that ensure these properties. [Note that Java has its virtual machine checking for buffer overflows {which makes Java a more robust language}.] [ coding guidelines' buffer sizes chapter / stack growing down leading to more severe buffer overflows ]
Can we overwrite functions in C? Can we overload functions in C?
Function overwrite, i.e. re-implementing functions with different signatures (parameter number and/or types), is not supported in C. Overloading can be done manually with structs and function members [see e.g. OpenSSL source code].
How do I convert strings to integers? How do I convert hexadecimal to binary? Int to hex?
C [compiler] supports only decimal, hexadecimal and octal numbers as number literals (in the notations 10, 0x10 and 010). [As number literals typically shouldn't be used in code anyway (see here), this restriction isn't too severe.] A C programmer typically isn't concerned with a machine-internal number representation [two's complement, bcd, endianess, etc] either (lower level libraries are). The conversion from [external] string representation to internal number representation should be accomplished by strtol (which features an input base parameter and an optional scan-end-position reference parameter [which can be used for error checking and further scanning]; possible overflow is checked too [since ints, longs, etc can only hold a limited number range]). [Don't use scanf and atoi, for different reasons; see 'atoi' question too.] Conversion from internal representation to a string ('printing') is usually done with sprintf (better, if available: snprintf). [s][n]printf unfortunately offers only the d, x and o (decimal, hexadecimal and octal) format specifiers, i.e. not all representable bases are supported [unlike e.g. with Lisp, that offers the optional :base and :radix parameters and the ~nR print-format; input can be specified by #nr... {e.g. #3r...}, #x..., #o... or {binary!} #b...]. Numbers' string representations usually allow bases from 2 to 36 (inclusive) [using 10 digits and 26 latin letters]. [ coding guidelines' chapter on sign extension / how to convert to integer in C++ ]
How do I convert strings to integers in C++?
As C++ is [mostly] a superset of C, the C conversion methods can be applied too. However for [generalized] scanning and printing C++ offers the type safe istream/ostream operator>>/operator<< mechanisms. E.g. to convert from string, use istrstream s(input); int i; s >> i; if (s.tellg() == 0) ... [scan position advancing used to check for parse errors]. Note that the ios class, from which istream and ostream [and istrstream, etc] derive, does not implement an input/output base, but only some special bases [dec, hex, oct, implemented as flags]. [ C conversion methods; additional background information ]
Why do I get this ominous ffffffe4 instead of e4 when printing a hexadecimal representation of a character? Why does it read ZÿFFFFFCrich instead of Zürich in mails I receive?
There is a preliminary explicit cast missing in the data type conversion from char to a larger numerical type [e.g. to an int; see here]. Often chars are signed and widening will extend that sign, so [non-ASCII] characters above 0x7f will result in negative ints [with leading 1-bits {ffffff prefix}]; e.g. a MIME-quoted-printable encoding '=FF' in this case may be interpreted as 'ÿ' [y-umlaut; in iso-latin-1]; an URL-encoding '%FF' may lead to the same.
How do I calculate an integer's binary logarithm?
The binary logarithm {lb(1) == 0, lb(2) == 1, lb(4) == 2, ...} can either be calculated iteratively (by dividing by two until zero is reached; or comparing to powers of two {1<<0, 1<<1, 1<<2, ...}) [runtime O(n), where n is the bit size of an integer {e.g. 32}], or by a binary search [through an array of the powers of two] [runtime O(lb(n)), {e.g. lb(32) == 5}]. Taking advantage of the usual two's complement integer representation, one may scan for the highest bit [runtime O(n)], or apply bit patterns [that implement a specialized binary search {a binary binary search?;-)}] [e.g. on 32-bit platforms bt(0xffff0000u); bt(0xff00ff00u); bt(0xf0f0f0f0u); bt(0xccccccccu); bt(0xaaaaaaaau); where bt(m) is i <<= 1; if ((p = n & m) != 0) i |= 1, n = p;] [runtime O(lb(n))]. Note that many processors implement opcodes to do that, such as bit-scan, highest-populated-bit, etc; that however are not necessarily faster than the solution above. [ ffff0000 and similar bit patterns ]
How to reverse bit patterns?
See bitreverse.c for implementation(s), comments and x86 assembly.
Are there guidelines concerning stack usage? Stack versus heap?
Unlike Java, C and C++ are able to instantiate a lot more data [objects] on stack. Typically data that has function-body lifetime should be realized on the stack; data that has dynamic extent (i.e. that is returned to calling functions [not by-value] and referenced there) must be allocated on heap. The heap can be used too (the Java way) if there exist [severe] stack size restrictions. Static data should not be applied [in that or other cases], since that makes a function non-reentrant [see question on reentrancy]. alloca shouldn't be used either since it is not so portable/robust and tends to irritate debuggers. [ stack based programming languages / stack alignment / pascal style stack clean-up / stack corruption / stack growing up or down / stacks in recursion / stack size in quicksort implementations / stack overflow with recursion ]
What about stack alignment? C auto variables alignment (stack data alignment) vs. data segment variables alignment (global data alignment)?
Often the stack pointer is sufficiently aligned for larger primitive types (double, long long, long double, etc) [this is of course especially the case for CISC machines, which often don't really require data alignment {however run faster with it}], so no additional steps need to be taken besides subtracting the required number of bytes. If not: the stack frame may be aligned by rounding it down [e.g. {x86} esp &= ~0xf] (in the case of stack growing down); this typically requires saving its old contents [e.g. in {x86} ebp]. Global data is usually rather generously (i.e. not less) aligned, compared to stack data. [This issue is handled by compilers, not the C programmer, however worth some consideration.]
Can a goto statement corrupt the stack?
No. However it allows to skip over variable initializations which may make a problem look like stack corruption. longjmp can leave corrupted stacks [if used inappropriately]. An overflow of a local buffer is the usual reason for stack corruption.
Can the stack leak memory?
No. Neither does stack usage fragment memory. However, excessive recursion [with a stack overflow] can be considered 'stack-leaking'.
How to write a function that shows whether the stack is growing up or down?
First: You don't want to know; it usually doesn't matter. Second: Step in a debugger; the register display will indicate stack growth on push/pop, call/return and similar operations. Third: Call a function with references (pointers) at auto variables from differently nested stack frames and compare the pointer addresses; use the usual precautions [use function's input and output, so nothing gets optimized away, encapsulate in distinct compilation units, so the optimizer doesn't have the overall picture {since memory locations of unrelated data objects are not required to fulfill certain semantics}]. Sample: stackgrowth.tar.gz. A stack growing down may lead to [from the security point of view] more severe buffer overflow problems [since it can overwrite return addresses and so easily manipulate program flow].
What is byte order?
There are different ways how to map a multi-byte primitive to memory, disk or network. The most usual mappings (out of all [possible] permutations [e.g. totally 24 with 32-bit/4-byte integers]) are little endian byte order (encoding in ascending significance, least-significant byte is encoded first: 0 1 2 3) and big endian byte order (encoding in descending significance, most-significant byte is encoded first: 3 2 1 0). Note that most of the other permutations are not seen at all; however one-bytes 'groups' can be handled differently from two-bytes groups, etc. (1 0 3 2 and 2 3 0 1 in the 32bit case, sometimes called 'mixed endian'), this is however rarely seen. Some CPUs (e.g. MIPS) can serve different endianesses (to enable compatibility to different operating systems), however usually set only at boot time. [ network byte order / byte order conversion / byte order indicator ]
What is network-byte-order?
It's easier and less error prone to normalize data, than to figure out if talking to a similar or a different system. Unix quite early defined a 'standard' encoding order, how data is to be written to file and especially to network (since Unix was networked, interchanging data). Network byte order is big endian byte order [for historic, incidental reasons; there is no reason to prefer one or the other, (by the way: not even for human representations)]. Note that there are different views on 'standard' encoding: Windows systems rely on little endian (for historic reasons too, since their first implementations were on Intel x86 processors); CORBA e.g. chooses to leave the choice of endianess up to the user [and includes an endianess flag in its encoding; allowing for different encodings (to allow a tiny bit of optimization); it's not a good idea to do so: it decreases the robustness of the data protocol]. [ convert from/to network byte order / document normalization / Java implicitly uses network byte order ]
How to convert to network byte order? How to convert from network byte order?
Data conversions into network byte order are done manually by using the htonl function family ['host to network byte order'; htons, htonl, htonll, ntohs, ntohl, ntohll]. hton/ntoh operations are cheap and happen to be no-ops [identity operations] for one class of systems [the big endian systems]. With the little endian systems, the ntoh functions are identical to the corresponding hton function, reversing the order of the bytes. [ network byte order / "[htonl] statement has no effect" warning ]
How do I find out whether I'm on a big-endian or little-endian system? Byte order indicator?
[Again:] You usually don't want / don't need to know. If writing binary data to disk or sending it over a network, normalize it first, to the well-defined network byte order, (by using the htonl function family), or use ASCII encoding (i.e. use a more human readable and system independent representation). Use the snippet printf("%x\n", *(int const*)"abcdefghijklmnop"); to generate an endianess-code, e.g. 64636261 for little endian, 32 bit [ASCII encoding]. [ network byte order / document normalization ]
Nested loop guidelines? Block nesting? Nested if-statements?
Often loops are iterations over data structures; the loop body [code] imposes specific actions on each data member, and can be put into a [local] function [that is given an appropriate name; 'if it can be made a {conceptual} abstraction, put it into a function']. This approach leads to smaller functions and avoids loop nesting. Loop iterators can be implemented generically [have abstract functions iterating over data structures, calling application-supplied specific call-back functions for each data member {internal iterators}]. Less nesting depth means less state (and less complexity; each loop context means additional state in the form of indices and other iteration variables); small and less complex functions are readable and maintainable. I would not declare nesting-depth metrics, but rather an optimal range of number of lines [or function-points] per function [which would lead to less [or no] nesting too, since loop frames and bodies count as LOCs too]. [Speed-] optimized algorithms sometimes use nested loops representing different orders of state change [and may use gotos {that only allow to jump inside a function} to change such states in fine-granular ways {often [the better abstractions than goto?] break and continue are sufficient}]. [Break-out of nested loops in C/C++ requires gotos too, unlike with Java's multi-break.] I prefer early returns to block nesting; often 'normal' code runs down from top to bottom of a function and special cases and error cases can be handled in if-blocks with early-returns. If-statements in the form of if (...) {...} else if (...) {...} else if (...) {...} ... else {...} [tail nesting] are conceptually not considered nested and are not indented [K&R2]. [Tail-nested ternary statements can be formatted in a similar way, see e.g. ipow sample.] Complicated nested if/else blocks usually are better handled with a revised better data model. [ code nesting / strtok nesting / multiple returns / iterator pattern ]
malloc vs. calloc?
p = calloc(n, m); is semantically equivalent to p = malloc(n * m); if (p != NULL) memset(p, 0, n * m);, typically meaning: allocate an array of n data objects of size m bytes each [m typically expressed in terms of sizeof], with implicit initialization to 0. In C/C++ 0 has a well defined and useful meaning for the primitive types {and hence in C for the aggregated types; C++'s object composition includes additional stuff, like pointers to meta data [needed for polymorphism], which must not be 0}. n*m is the over-all size. It is a good practice to initialize data [buffers], especially dynamic memory, before use, to make code more deterministic, and robust (which is especially of help at bug finding time ['searching for bugs that show non-deterministic symptoms is an unpleasant task at best']). In symmetry with initialization of stack based data, I prefer the malloc/memset approach. Dynamically allocated memory is returned by free [however, the heap {the area which serves malloc et al.} typically doesn't shrink {use mmap based allocation for that}]. [ determinism / program robustness / stack vs. heap ]
union and struct?
union and struct are syntactically similar, struct defining a sequence of data fields [with possible padding {emerging from alignment}], union overlapping data fields [with different types; all starting at offset zero].
How do I convert C++ single line comments to C comments?
A simple answer would be: apply the regular expression substitute pattern 's,//\(.*\),/*\1 */,', but that doesn't consider // in 'problematic' lexical contexts, such as in a string literal! The better answer is to use a parser [more precise: a lexical analyzer] such as ccomment ['Care has been taken of: strings, character literals, constructs like '//.../*...*/' and performance.'].
How to have variable arguments macros?
Macros with variable arguments are unfortunately not supported in [K&R2] C [macro syntax is too simple to support such things {e.g. compared to Lisp's macro system}]. However, as macros implement by call-by-name, a parameter list can be specified as one macro argument [this is however not very flexible]. The newer [9899:1999] C standard allows variable arguments macros [providing the __VA_ARGS__ meta macro], that however supports only simple things [such as forwarding the argument list as a whole; i.e. isn't very flexible either; parsing the variable argument list e.g. is actually not possible]. C++'s inline code and template code [possibly together with function overwriting] are acceptably flexible.
Are there macro debug guidelines?
Additionally to macro design guidelines such as don't use macros with side effects [e.g. with macro parameters being referenced not exactly once] and have the macro argument references fully parenthesized [to suppress problems with operator precedences], some rules can be formulated on how to approach macro design safely and make macro debugging easier: If the macro expression is other than trivial, implement things as a function [first]. If using C++ [instead of C], consider inlined functions [or templates]. Debugging macros in the sense of stepping through them usually requires them to be transferred into the form of functions [too; which e.g. provides single code locations to set breakpoints]. [ assert debug macro ]
What about gets? What are gets alternatives?
Don't use gets. Use fgets instead. Why: possible buffer overflows that cause program robustness problems and severe security risks.
What is IO.H? IO.H vs. io.h? Where can I find io.h?
io.h contains [low-level] file handling and input/output function declarations for the win32 platforms, such as access or write. struct _finddata_t [for directory traversal] and the FAT file attributes [read-only, hidden, system,, archive] can be found there as well. Some of the io.h API maps to the Posix API [on Unix found in unistd.h], other parts are additions [such as chsize]. io.h and IO.H are synonyms since win32 filesystems, such as FAT [FAT16, FAT32] and NTFS, are case-insensitive. Header files to standard libraries are, as C [and C++] development kits, not included in standard win32 operating system installations; they come with those development kits.
What is pascal style?
'Pascal style' names the order of placing local (static) functions before their references in C source files [and hence avoiding function prototypes]. Pascal style is not feasible with indirect recursion. 'Pascal calling style' names the convention of letting the called C function clean up the stack [instead of the calling function; this saves some opcodes]. Pascal calling style is not feasible with variable arguments. [ coding guidelines' code order chapter ]
What are the hidden variables in the C language?
C and Posix do not really define many global variables; functionality based on interfacing through global variables is inflexible in implementation, breaks encapsulation and poses synchronization problems. Even errno is stated as 'integer expression' (not 'integer variable') in K&R2 [and typically implemented by means of a function returning a pointer {to thread-local data}]. Sometimes, environ, _argv or __argv, etc, are found, but are proprietary and not portable. Hidden can be considered whatever is external [accessible] in libc and not defined in published header files [but of course should not be used!]. [ question on reentrancy ]
What does type safe mean? Why do modern languages not use implicit declarations?
Type safe means that, if possible, data types used in assignments and function calls, etc, are checked for correctness at compile time. This typically saves the programmer a lot of debugging effort. The C programming language is a quite strongly typed language; this means that [data] types, variables and functions must [should!] be declared before their use [see my C overview]. This kind of type safeness is also called static type checking, as opposed to dynamic type checking that is additionally supported by languages such as Java, C++ [optionally, depending on type of cast and compiler flags] and Lisp/CLOS, requiring run time type information. C's generic references, that are often declared with void*, can't be checked; object-oriented programming languages use the safer polymorphism and inheritance [hierarchies] for that. Older languages such as basic mostly didn't apply static type safeness, leading to [possibly] late run-time errors. Modern counter-samples to typesafeness are: C's unsafe/unchecked casts, C/C++'s use of integers as booleans, Java's polymorphic vector/hashtable/etc.
What means 'robust' in C/C++? Robust code?
One key issue with code robustness is robustness against possible [user] inputs: a program should not crash or otherwise misbehave when fed with certain [possibly unexpected] inputs. The classic sample are buffer overflows, which often establish a severe security risk. Another issue is robustness against code changes: logical dependencies and function side-effects should be minimized, to keep the risk of introducing code errors small. Checking code constraints often (see assertions question) is a counter-strategy here. Other problems may arise from scalability: algorithms should be chosen and implemented in a way that resource consumption doesn't grow into extremes with unexpected or untested input values. Handle interface ambiguities conservatively: don't rely on the behavior of one or several specific implementations; rely on the interface definition (i.e. manual page, etc) only, and test the interface. C: error return values must be checked consequently (C++ offers more convenient formalisms with its exception system), even test for such things as running out of disk-space (and indirect consequences of such problems). A wish for code robustness is a typical reason for code reviews. Robust code minimizes code maintenance costs. Lean implementations are often more robust. [Premature] optimization is often less robust.
What is portability?
Portability is a code property of being usable [compilable, linkable, and runnable] on a variety of computer systems. Portability is often achieved by being conservative about 'features' usage [i.e. check whether a function is specified {by a standard} and not just 'present', don't assume specific sizes of data and data alignment, don't deduce function semantics from tests {but from manual pages}]. Often portability considerations are done together with robustness considerations [indeed: robustness against change of environment]. Portability [in e.g. C] is mostly an issue of library [API] portability and not so much an issue of programming language portability. On Unix APIs are now more unified than earlier. Java does quite a good job specifying APIs [with the exception of useful things like StringTokenizer and tiny details like Thread.getName missing in J2ME/MIDP/CLDC]. You might expect code of good quality to be portable. System specific code is often separated from the core code [or from most code] to keep that portable. [ portability types / typedefs / minimizing Java porting problems by adding C/C++ keywords / unportable 'hidden' variables / robustness / C++ to Java / C to Java / Java to C++ ]
How to port C++ code to Java?
Porting C++ code to Java is probably easier than porting Java code to C++ [leaving away performance concerns in the first step]. Although C++ is a richer language compared to Java, offering quite some additional features, problems arising seem to be acceptably easy to cope with. Performance considerations may partially be outweighed by faster APIs (e.g. nio new I/O API) [possibly using native code] and faster hardware [moore's law]. Some C++ language features deserve detailed consideration: function pointers [and member function pointers, which are e.g. used for callbacks; can usually be replaced by defining and implementing interfaces], constness fine-policies [are not supported by keywords in Java; need to be coped by cloning at appropriate locations], templates [replaced by less generic classes or polymorphism], multiple inheritance (not in the sense of {pure} interfaces) [needs redesign], friend access modifier [usually replaced by package access]. Minor syntactical details include: cast overloading [tighter control over type conversions; explicit conversion methods needed in Java], operator overloading [semi obscure, but allows short code {where appropriate}; not supported in Java; needs to be replaced by method calls], typedef [for type name aliases; use inheritance or the proxy pattern], reference-type function parameters [e.g. int&; holder needed or slight redesign]. C++ things typically not needed include: default parameter values [overwriting is sufficient], static function variables [not useful with multithreading or reentrancy], macros [unsafe because of side effects], conditional code compiling [separable to platform specific jars]. [ C to Java / Java to C++ / differences between Java and C++ ]
How to port C code to Java?
C and Java have different design approaches; Java is object-oriented and uses quite dynamic data handling; Java is often centered around [class] framework development; C is usually used in a non-OO manner and may use static data; C is more often oriented towards solving a single concrete problem, often in a fast manner, running on lean environments. Porting C to Java often involves a (complete) redesign. [ C++ to Java / Java to C++ ]
How to port Java code to C++?
Porting Java code to C++ is probably more difficult than porting C++ code to Java because of C++ language's difficulty. Porting requires major memory management changes [away from Java's garbage collection paradigm's implicit cleanup; some C++ environments provide heuristic garbage collectors, autopointers e.g. help to make resource allocation exception-safe]. [ C++ to Java / C to Java / differences between Java and C++ ]
What is orthogonality?
Orthogonal means 'unrelated' and 'not influencing each other'. [The term comes from vector geometry, where an orthogonal vector starts a new dimension; the scalar product of two orthogonal vectors is zero.] Orthogonality in software development means: things that are not related conceptually should not be related in the system; new features shouldn't break or change existing ones. In orthogonal systems, a change at one end doesn't/shouldn't affect the other end [no ripple effect]. Orthogonal software components often have no or minimized side effects, are often small, uncoupled, pieces and often freely combinable. Orthogonal design avoids 'special cases' and restrictions. This makes systems more intuitive to use [{at least} for the systematically thinking]. ['Special cases' may be a sign of missing design.] [ programming language orthogonality ]
What is programming language orthogonality? Orthogonal language? Orthogonal C?
Programming language orthogonality means the language is designed so that all [or most] [possible] expressions have a meaning [and therefore are 'defined' {in the sense of language specification}]. Orthogonality may make a programming language more intuitive to use. C is a sample of an orthogonal programming language. C allows to chain and nest expressions: i = j = 0; f(g(), i++, ++j); i = f(k = 2, h(g()), i);; sometimes it may be convenient to have such language constructs. [That however doesn't mean you have to or should use them. Readability and debugging convenience will decrease when using nested or chained expressions; e.g. source code debugging doesn't show intermediate data, sub-expressions can't be reassigned a value, {at least not easily, without showing/using [cpu] registers}, etc]. Unless necessary C doesn't lock in [certain] expressions [e.g. assignments] to be a special statement case [what possibly would make the language less usable and the grammar more complex]. C's operators are defined for different data types [e.g. math vs. pointer arithmetics]; sizeof is defined for expressions [including variables] and types. [Counter-intuitively to some] C's const and volatile keywords are orthogonal [i.e. don't exclude each other]: const means 'can't be modified by the code' [relevant for putting data in {shared} read-only segments and for contracts on data access policies]; volatile means 'is subject to asynchronous change' [relevant to optimizations done by compilers]; [9899:1999] names the sample of a hardware clock [mapped to an unwritable memory address], declared using both modifiers. Another sample of orthogonality are Lisp's sequence operators that do not only work on lists but also on arrays, strings, etc [the concept of sequences here is somewhat orthogonal to the concept of lists, however this may looked on as polymorphism; some consider Lisp not to be an orthogonal language since often there's not only one single construct implementing a concept, e.g. if/else statement doable by if, when, unless, cond, and, or, ... {question: compact orthogonal set or compact orthogonal working set?}]. [ orthogonality / C programming language overview / C programming language coding guidelines ]
What is polymorphism?
Polymorphism is a core concept in object-oriented programming: a method call on an object is dispatched at run-time by the concrete type of this object. It enables the use of pure abstract types (generic types, in Java: interfaces), and makes the following concepts more useful: inheritance, class hierarchies, partial implementations (abstract classes) and multiple inheritance. Not polymorphic [i.e. not object-oriented] are e.g. the language constructs dynamic_cast (C++), instanceof (Java), switch (C/C++/Java) and typeid (C++). C++ requires the virtual modifier to make things {methods and classes} polymorphic [since C++ too supports {a bit faster} non-polymorphic code behavior], Java doesn't.
Do you have samples of syntax coloring?
No; the sample (C++) that should give you a hint what syntax coloring looks like, has just colorized comments, not colorized keywords, identifiers, etc. [ diff coloring ]

Java programming language

What are the Java programming language specifications?
The most recent [2005] Java spec is [The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Addison-Wesley, 3rd Ed. 2005, ISBN 0-321-24678-0]. Older versions are [The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Addison-Wesley, 2nd Ed. 2000, ISBN 0-201-31008-2] and [The Java Language Specification, James Gosling, Bill Joy, Guy Steele, Addison-Wesley, 1996, ISBN 0-201-63451-1]. The Java language specifications can be found in electronic form at java.sun.com. [ Java keywords ]
What are the Java programming language keywords?
[JG/BJ/GS/GB,2005] defines abstract assert boolean break byte case catch char class const continue default do double else enum extends final finally float for goto if implements import instanceof int interface long native new package private protected public return short static strictfp super switch synchronized this throw throws transient try void volatile while The words false null true [technically] are considered literals. strictfp was introduced with [JG/BJ/GS/GB,2000], assert enum were introduced with [JG/BJ/GS/GB,2005]. [ Java programming language specifications / abstract / final / private / protected / public / static classes / synchronized features / use of volatile keyword ]
What are the differences between Java and C++? Java versus C++; Java vs. C++?
Java usually shows a better maintainability (less pitfalls and less complicated language). A listing of [technical] language topics can be found here.
Why is the const keyword not supported?
Assignment finality / constantness (C++: T* const t = ...) is supported by the final keyword (Java due to its flow analysis also supports blank final fields and variables). Constantness / immutability of referenced objects / data (C++: T const* t;) isn't explicitly supported; it is usually contractual (e.g. String), sometimes found in the type's name (e.g. BigInteger vs. MutableBigInteger). 'Constness' is considered rather logical than physical with more abstract (object oriented) programming languages (C++ as opposed to C); C++ needs the additional mutable keyword in such contexts. Physical constantness (constant, usually non-OO, data put into read-only sections) isn't used with Java's heap object data.
Why is goto not supported?
Java imposes more restrictions on variable initialization (which helps making code more robust); the need for code flow analysis (to check initialization or single initialization [of blank final variables]) isn't compatible with [general] goto statements (i.e. some gotos wouldn't be valid or would make the code flow check infeasible). However, labeled continues and breaks are permitted (since no valid initializations are skipped when e.g. jumping out of a for loop).
Why is multiple inheritance not supported in Java?
Not supporting multiple class inheritance makes the programming language easier to use and implement, fewer pitfalls occur (e.g. with diamond-shaped inheritance and its precedence problem), fewer obscure features are present (e.g. [private] class-mix-in). The more important multiple interface ('data-less classes') implementation is however supported in Java. [ Java's design enhancements / about C++'s virtual multiple inheritance complexity / question on C++'s 'virtual' modifier / multilevel inheritance ]
What about multilevel inheritance (vs. multiple inheritance)? Multilayer inheritance?
Deep class hierarchies are not necessarily cool (either). [Class hierarchy layouts are subject to discussion, even more so with increasing deepness.] Multilevel hierarchies are often used to slice responsibilities (in order to encapsulate/modularize). [ multiple inheritance ]
Unsigned numbers in Java?
You don't need them; see here. [One additional bit of 32 e.g. isn't much; type-safeness isn't gained either.]
Java byteorder?
The [underlying] byteorder is not visible in Java; it's hidden by the VM. You wouldn't need it anyhow.
What kinds of classes does Java support?
A Java class is either a top-level class, a member class or a local class. The latter two are called nested classes; member classes are either inner classes (with a reference to the enclosing object) or static classes. Classes can be abstract, final or neither. They can be super (base) classes or sub (derived) classes, [only explicitly] neither, or both. Classes can have public or package access; member classes can additionally have protected or private access. Anonymous classes are local, inner and [implicitly] final. A non-final class can be used [externally, if allowed by access modifiers] as a super class; classes are implicitly [direct or indirect] sub classes of Object (i.e. Java's type hierarchy is single-rooted); Object is known to be a super class (e.g. of classes defined in the java.lang package [and of course all additional classes]). Nested classes were introduced in [JG/BJ/GS/GB,2000].
What are Java's access modifiers?
private, default/package/friendly, protected, and public; each being a subset of the following, i.e. the access modifiers are not orthogonal (protected access includes package access). Private fields and methods are only accessible by the implementing class itself, default/package/friendly (no keyword) by all code in the package, protected additionally by derived classes' code, and public generally. Having protected implying default makes the language simpler. In a good object oriented approach private would be all data fields and helper methods, protected only helper constructors and some helper methods needed by derived classes, and public the API (that possibly implements a public interface).
Why does Java not need header files?
Unlike C and C++ with their libraries (and archives) imposing least overhead [in size], Java includes the signatures, return types and access modifiers in the [compiled] byte code (.class files and .jar files). The information is needed too for version and security checks at runtime.
What does the synchronized keyword provide, what not? What is thread starvation?
Java's synchronized keyword does provide guarding critical sections and memory synchronization. It [usually] doesn't trigger task switches, which fact may lead to possible thread starvation: one thread doesn't get to run; neither helps a higher thread priority nor yield; only a kind of wait/notify paradigm or Doug Lea's fair locks [now integrated into Java 1.5 [API]] will work; these are however collaborative constructs (and constitute some {possibly unwanted} level of dependency). A JVM's monitors could be implemented for fair scheduling, but [typically] aren't, for performance reasons. Note that a multi-processor system may defuse thread starvation problems a bit [buy-decent-hardware [for [Java] servers] pattern]. [ flush by synchronized / double check locking pattern / reentrancy / reentrancy problems on multi-processor systems / thread starvation sample / thread deadlock sample ]
Is specific synchronization on an object needed to flush out its VM-local working memory copy?
No, any synchronization should do. Often objects contain dedicated sync-objects for different synchronization/notification purposes, which are supposed to synchronize the parent [referrer] object's memory too. [ memory synchronization by synchronized ]
Shall I use Java's volatile keyword?
No, use explicit synchronization at the appropriate places [possibly using dedicated synchronization objects, to avoid deadlocks]. volatile involves [potentially] unnecessary overhead and may indicate a degree of cluelessness. It seems to be broken just like the double check pattern. [ synchronized keyword / memory synchronization object ]
What are the differences between enumeration and iterator?
Both of the two Java interfaces implement the iterator pattern [EG/RH/RJ/JV,1994], more specifically that of an external iterator [external: the data structure advancing is controlled by the client, which offers more flexibility]. A java.util.Iterator is a semi semi robust iterator; it additionally includes a remove method and [better] takes care of the underlying data structure's modification status [one 'semi' here arises from the fact that only remove functions are considered but not insert functions, the other from the fact that such operations are only allowed via the iterator's methods but not via the normal container methods; asynchronous {multithreading} issues are not considered {at all}, since they question the validity of returned elements which are being processed; omitting insertions makes the iterator implementation simpler and leaner {and more robust in a different sense} and avoids questions about iterator-after-insertion semantics; removal during iteration is frequent but insertion rare, samples being elements deregistering themselves]. [ race conditions with iterators ]
Can race conditions arise from concurrent inserts and removes on collections?
Yes. E.g. Java 1.4's AbstractCollection implements an unsynchronized method remove(Object) with the help of an Iterator, which may throw a ConcurrentModificationException, e.g. in the case of the concrete class ArrayList (which indirectly inherits from AbstractCollection, and whose Javadoc informs on this behavior). Safer is the similar Vector, which implements a synchronized remove(Object). Another alternative is the use of a wrapper collection returned by Collections.synchronizedCollection(...). Yet better is the use of a more appropriate data structure, e.g. a Hashtable when remove(Object) operations are frequent [Hashtable is synchronized and doesn't iterate]. [ read manual pages to obtain robust code ]
Sequence points in Java?
Java doesn't have the concept of sequence points; the order of evaluation of subexpressions, arguments of method calls, etc, is strictly defined. This leaves less pitfalls to the programmer, but also less room for expression optimization to the compiler. [ sequence points in C ]
How would you implement multiple return values in Java?
As Java doesn't provide reference parameters, multi-value returns could be implemented by returning an array of Objects [not compile-time type-safe] or by using holder types [as in CORBA implementations]. [ multiple return statements / porting C++ code to Java ]
Should I use multiple return statements in a function?
Yes, it makes function code less nested. The use of exceptions prevents the single exit point paradigm anyhow. [ return multiple values in Java / (avoid) block nesting (by early returns) ]
What is the Java Turtle? Where can I find a Java Turtle implementation? How to implement a turtle geometry?
The Java Graphics Turtle [known too from the Logo programming language] establishes an alternative point-of-view on low-level graphics programming. A Turtle's graphics-state includes a position (usually two-dimensional), a heading (direction) and such things as pen-up/down state, color, etc. A Turtle's primitives [idioms] are directives such as forward, left, etc. Samples of these Java turtle graphics can be found here. Insights into turtle geometry implementation may be won from com/lrdev/turtle/TurtleCore.java [part of the com.lrdev.turtle package; includes wrapping, clipping and scaling].
What are the Turtle keywords? Turtle commands? Turtle primitives?
back drawstate forward heading home left nowrap pendown penup right setheading setx setxy sety towards wrap xcor ycor / bk fd lt pd pu rt seth
What is the Hilbert Curve?
The Hilbert Curve is a fractal pattern [a recursive pattern, that has an order {depth}; it is e.g. described in the header comment of hilbert.java or here]. It is a good candidate for a simple sample of an animated applet [see Java page]. hilbert.java uses a Java Turtle too, but a much simpler one [that e.g. can only change directions in ninety degree steps]. [ recursion / Moore curve (a curve similar to the Hilbert curve, but closed) ]
Where can I find the BigRational Java class? BigRational: what is it?
BigRational.java can be found here. The big rational number type lets you do the arithmetic operations addition, subtraction, multiplication and division without overflow and without loss of precision. [The data type typically grows {concerning memory use and calculation overhead} with its size and precision.] Note that logarithms, roots, etc cannot be handled precisely with a big rational number type, since their results typically don't fall into the rational number domain. [ about the BigRational testcases / about BCDs / question on integer overflows / rounding ]
Do you cover J2ME / MIDP / CLDC questions?
Not yet. For the time being, see e.g. this JavaME-f.a.q.. [ J2ME portability / Sun about wireless communication ]

assembly code

Do you feature SPARC assembly samples? SPARC sample code?
See a sample of analyzing SPARC assembly code for a small sample of SPARC assembly code. That paper of course has a focus on analyzing such code.
What's the SPARC instruction set? SPARC assembly opcodes? SPARC assembly instructions?
The SPARC instruction set is defined in the according architecture manuals, e.g. for SPARC version 8: [The SPARC Architecture Manual, Version 8, David L. Weaver, Tom Germond, SPARC International, Prentice Hall, 1992, ISBN 0-13-825001-4]. An AnswerBook2: SPARC Assembly Language Reference Manual may be found online. SPARC International offers some documents. Processor specific definitions may be found e.g. at SUN, e.g. for an UltraSPARC-IIi processor. sparcv8 code runs on e.g. older SUNs too; sparcv9 code typically requires e.g. a SUN Ultra [the way to address this issue better is to use platform specific [dynamic] libraries].
What are SPARC synthetic instructions?
Some seemingly fundamental SPARC CPU operations are implemented by means of other opcodes, often using the zero/sink register %g0. These are called synthetic instructions; the list is the following [including the opcode(s) they are translated to]: bclr [andn], bset [or], btog [xor], btst [andcc], call [jmpl,call], clr [or,st], clrb [stb], clrh [sth], cmp [subcc], dec [sub], deccc [subcc], inc [add], inccc [addcc], jmp [jmpl], mov [or,rd,wr], neg [sub], not [xnor], restore [restore], ret [jmpl], retl [jmpl], save [save], set [sethi;or], tst [orcc]. See too the sources mentioned in the above question.
How to call printf from SPARC assembly?
First: you don't want to do that: usually, assembly code is called by high-level-language code, not the other way around; assembly code is used for highly platform-adapted functions, the adaptations are e.g. done for performance reasons, and in that context it doesn't make sense to call less optimized code [such as printf]. Second: see e.g. here [sample compiled from high level language]: ...; .s:; .ascii "hello, world\n\000"; ...; sethi %hi(.s),%o0; add %o0,%lo(.s),%o0; call printf; nop; ...; Third: the operations involved in calling a library function [such as printf] are very specific not only to the platform [CPU], but also to the chosen high level language and the compiler settings (see e.g. __fastcall pragma, pascal stack cleanup mode, volatile registers across function calls, or polymorphic C++ method calls). [ convert [integer] to string [via s[n]printf] ]
Do you feature x86 assembly samples?
No. But there's a dump of a (highly optimized) small x86 application, with analytical comments. x86 patterns such as shorter bytecode and pseudo-three-operands opcode are discussed. [ sample bitreverse.c x86 assembly / C++ relevant assembly dump snippets / non-recursive gcd assembly implementation ]
Which x86 CPU registers are typically considered non-volatile across C function calls?
That depends on compiler settings, compiler and (for external calls) library policies. Typically esp and ebp [makes things less tricky if they are non-volatile], as well as esi, edi and ebx are considered non-volatile [i.e. more x86 registers are considered non-volatile than volatile]. See e.g. x86 sample dump with comments.
What is __set_app_type? _except_handler3?
__set_app_type, _except_handler3, etc, as e.g. present as references in this win32 x86 dump, can be found in the system specific C runtime libraries and sources (sources are often available as part of the development system). These functions are part of the C runtime code and are implementation specific internals (hence e.g. beginning with __ or _ [reserved identifiers in C]). [The 'application type' that is set in the function in question (e.g. console or windowing system) is e.g. used to route diagnostic output.]
What is bcd?
Binary coded decimals (bcd) is a number format that encodes decimal digits (0 to 9; 3.3220 bits of information) in 4 bits (one nibble, half a byte) each. Pocket calculators use[d] to calculate in (such) a digital format to achieve a fast display stage [on {ancient} slow processors!] (to give the calculator a better interactive response, e.g. while rolling registers) and allow things such as the hours-minutes-seconds (hms) format or the start-end-step increment/decrement format, which would not be feasible with binary floating point formats. Digital formats avoid the kind of representation and rounding errors that are unexpected by typical users [who tend to think decimally {only}]. The x86 opcode daa (decimal adjust after addition) implements two-step bcd operations (e.g. adc/daa pairs) and does typically two digits (one byte only) per round, since one carry flag is required per digit [x86 implements 'the' carry flag and an additional 'auxiliary' {'nibble'} carry flag]. [x86 processors also support an unpacked bcd format (with which the more significant nibble is ignored on inputs, and so can be fed e.g. with ASCII digitals {\x30..\x39} directly).] Precise base-10 representations can be implemented too by scaled big-numbers (big-decimals) and big-rationals. [ Java BigRational ]
What is 7efefeff? 0x7efefeff? 81010101? 0x81010101?
7efefeff is a bit pattern used in optimized strlen implementations. 0x7efefeff is actually -0x81010101 [in two's complement representation], that, on addition to a four-byte (32 bit) value, triggers carry-outs when encountering nul bytes [more explanations see source code comments of strlen.c]. These carries are used to faster check for string termination in chunks of 4 bytes (on 32 bit systems). A similar algorithm can be used in memchr implementations, by prior xor-ing the chunks with a pattern of the repeated search-character. Other patterns are 81010100 (0x81010100) [and on 64 bit systems 7efefefefefefeff (0x7efefefefefefeff), 8101010101010101 (0x8101010101010101), 8101010101010100 (0x8101010101010100)]. 7efefeff and other values mentioned are often found in register dumps of crashed processes [or even in operating system panic dumps {and panic messages}]: if illegal string pointers are given as arguments to strlen or similar functions (strchr, memchr, etc) [usually due to a software bug], a process will dump on the first memory load, probably after initializing registers with the above values; the values may also be remnants of earlier function calls [such functions are called frequently and unused registers get not cleared]. [ more comments on an optimized C strlen C implementation / ffff0000 patterns ]
What is ffff0000? aaaaaaaa?
The set of patterns ffff0000 (0xffff0000), 0000ffff (ffff, 0x0000ffff, 0xffff), ff00ff00 (0xff00ff00), 00ff00ff (ff00ff, 0x00ff00ff, 0xff00ff), f0f0f0f0 (0xf0f0f0f0), 0f0f0f0f (f0f0f0f, 0x0f0f0f0f, 0xf0f0f0f), cccccccc (0xcccccccc), 33333333 (0x33333333), aaaaaaaa (0xaaaaaaaa) and 55555555 (0x55555555) consist of bit groups 10101010..., 11001100..., 11110000..., ..., and are used in binary integer logarithm calculation, bit reversing and bit population count [by recursive pairwise addition]; all have O(lb(bitsize)) runtime. [Hexadecimal] patterns of repeating 3 or c [i.e. #333333, #cccccc] are also found in numeric representations of colors [more precise: of different gray levels] belonging to an uniform 8-bit depth color palette [see e.g. 'the browser safe palette' {instructions on how to chose an image color palette to best fit a known application's color palette, to suppress disturbing dithering effects}]. [ patterns used in binary logarithm calculations / 7efe... pattern / recursion ]
What's the meaning of bitscan?
The term bitscan usually describes the operation of finding the first or the last occupied bit in an integer / register. Bitscan can be found as opcode or implemented as function [see question on binary logarithm].

patterns

What does "pattern" mean?
In software design, which this here is about, pattern usually refers to what was popularized by [Design Patterns: Elements of Reusable Object-Oriented Software, Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Addison-Wesley, 1994, ISBN 0-201-63361-2]. Doug Lea [Concurrent Programming in Java: Design Principles and Patterns, Doug Lea, Addison-Wesley, 2nd Ed. 2000, ISBN 0-201-31009-0] formulates the term pattern this way: a pattern encapsulates a successful and common design form, usually an object structure (also known as a micro-architecture) consisting of one or more interfaces, classes, and/or objects that obey certain static and dynamic constraints and relationships; patterns are an ideal vehicle for characterizing designs and techniques that need not be implemented in exactly the same way across different contexts, and thus cannot be usefully encapsulated as reusable components. [ pattern catalogs / bit patterns: bit patterns used in binary integer logarithm calculations [e.g. 0xf0f0f0f0u], bit patterns used in reversing bits [same as for binary logarithm], bit patterns used in optimized strlen implementations [e.g. 0x7efefeffu], / fractal graphics patterns: Hilbert Curve ]
What is the double check pattern? What is double check locking? Is double check locking broken?
Don't use double check locking. The double-check-pattern assumes that [simple] read operations are system-wide atomic [and memory is sufficiently synchronized] and applies a first unguarded [unmonitored] data lookup as a speed optimization. The double check pattern is mainly used in data initialization [e.g. with singletons; done once only, hence a [potential] big lock overhead]. Instead of double check locking, find faster lock implementations. [[DL,2000] writes about Java locking performance [about]: the overheads associated with concurrency constructs decrease as JVMs improve; the overhead cost of a single uncontended synchronized method call with a no-op body is on the order of a few unsynchronized no-op calls.] In Java, singleton instantiation can be done in static initializers [and class variables], which are guarded against concurrency problems by the class loader (i.e. an external monitor is applied). [ a warning: DoubleCheckedLockingIsBroken / Java's volatile keyword is broken ]
What is the difference between recursive locking and double check locking?
A recursive lock may internally use two locks [two lock data structures], whereas double check locking looks up [lock-] data twice. The recursive lock [reentrant lock] by definition doesn't block (deadlock) if acquired more than once by the same thread. Recursive locks can be implemented by means of [typically two] non-recursive locks. [Recursive locks are default with the synchronized modifier in Java, i.e. one synchronized method can safely call another synchronized method of the same class.] [ recursive lock C++ sample implementation ]
What is the 'let the user decide' pattern?
The let-the-user-decide-pattern is a user interface pattern, which intent it is to make programs [or libraries, etc] more useful by allowing configurations that don't make [optimal] sense at the first sight. The typical approach is to not [yet] applying constraints that seem straight-forward.
What is the 'sort up to a unique key' pattern?
When sorting data, don't generate non-deterministic data output [order], especially not if the recipient is an [end] user; it is disturbing if an order cannot be deduced or (worse) if it [slightly] changes from instance to instance. Don't rely on data fields being 'sufficiently' unique, unless they are specified [and enforced] to be unique [or at least try your best]. Don't rely on a initial order being meaningful. Don't rely on initial sub-orders being preserved [unstable/non-preserving sort algorithms such as qsort may/will change initial sub-orders].
Do you maintain a pattern catalog?
No. You may find one following the following links. A good source too is [The Pattern Almanac 2000, Linda Rising, Addison-Wesley, 2000, ISBN 0-201-61567-3]. [ essential pattern concepts and terminology / pattern wiki ]
Which patterns did you invent?
I formulated such things as the 'buy-decent-hardware' pattern, 'let the user decide' pattern, the 'sort up to a unique key' pattern and the 'use two indices' parse pattern. There are lots of implicit patterns, some of which can be found in these publications and [here], e.g. analyze-running-system [don't restart and lose state; use truss, pstack, etc], dont-use-switch, find-and-use-abstractions [find and use the right abstractions, be sufficiently abstract], know-your-language, read-specifications [be informed, be educated], test-on-multiprocessor-systems [test threading issues on multiprocessor systems {too}], use-normalization [normalize file contents before saving], use-small-scope, etc.
How do I implement an 'undo' operation?
Use the Command pattern [EG/RH/RJ/JV,1994]. Java's StringBufferStream implements an undo mechanism [mark; as a stream extension; see javacppdiffs].
Is there a sample of a ring buffer?
sqfig.zip:sqenc.c:mergeLineFilter (C sample) implements a ring buffer [a.k.a. circular buffer; this sample is used in a parser too]. The ring buffer features a begin pointer (current read position) and an end pointer (current write position), which are updated in the respective operations, while considering buffer wraps and increasing buffer size if necessary.
What is file normalization? What is document normalization?
Documents are often laid out ignoring stuff such as spacing [which includes line breaks {line wraps}] in their input files. Normalized [document] sources allow better comparisons [including diff and version control tools] and are sometimes better [human] readable [as unfortunately e.g. seen in RTF document format usages]. [ RTF file normalizator / diff output readability / normalized sorted data (sort-up-to-a-unique-key pattern) ]
How to make diff output more readable?
Key patterns to making diff output more readable are normalization (either implicit {such as with the -b option} or by filter {such as rtfn}), context (e.g. by -u option) and [possibly] outlining. Tools such as xemacs/emacs' ediff [coloring] package are convenient by providing context and fine-diff.
Do you have samples of testing patterns? Do you use unit testing?
Yes. com.lrdev.bn.BigRational.test() implements some 338 (!) testcases (for one module!) [only testing the public interfaces, leaving away function aliases]. BigRational does not use the JUnit unit testing framework, for code simplicity. [ BigRational page ]
What is recursion; recursive? What is indirect recursion? Recursion pattern?
A recursive function calls itself in the course of evaluating its return value, typically with modified arguments [that are usually getting 'smaller' in some kind, to let the recursion come to an end]. A sample is a recursive gcd implementation: unsigned int gcd(unsigned int a, unsigned int b) { return (b == 0 ? a : gcd(b, a % b)); }. The mathematical methods deduction and [transformed] induction can be implemented by recursion. True recursion involves more than one recursive call [such as the two calls in qsort], otherwise it's easy to transform it into an iteration [provided proper data structures {e.g. a stack}, actually any recursion can be transformed into an iteration]; the gcd sample above shows an even simpler tail-end recursion. Indirect recursion involves a flow-graph that eventually lets a function be called again [e.g. a calls b calls c calls a again]. Deep recursion requires some stack space being available, however truly recursive algorithms often work with recursion depth log2(n) [and hence small stack requirements]. [ typical recursion applications / recursive gcd implementation / bit count by recursive pairwise addition / Hilbert curve recursive [fractal] drawing pattern / recursion to iteration / recursive locking / stack overflow with recursion ]
How to avoid stack overflow with recursion?
(1) Use tail-end recursion: that allows you to work on linear structures without additional stack requirements. Doing explicit tail-end recursion can be seen as converting the recursion into an iteration [see sample code]. (2) On non-linear structures [e.g. balanced trees] take the shorter route first [and the rest as tail-end recursion]: that will cut the recursion depth to less than log2 of the input size. Such a small recursion depth will likely not produce an overflow. Sample: reasonable quick sort implementations. It might not be so easy to recognize the shortest path [as it is with quick sort; consider the problem of finding a way out of a labyrinth solved by backtracking: at each step you don't know in advance which direction will conceal the longest path]. (3) If one can figure out a reasonable upper limit for log2 of the problem size, that can be used to limit the search depth: if that limit is reached, give up and try (all) other branches first; performance will suffer and maintaining an ordered list of branches-yet-to-take is required.
What are typical applications of recursion?
Of course as any recursion can be transformed into an iteration, any iteration can be transformed into a recursion. A typical case of such a [weak?, iterative?] recursion is the Logo programming language's approach to use recursion for list processing [using its empty? predicate and butfirst list operator]. Stronger recursion is used in divide-and-conquer algorithms such as the quick-sort sorting algorithm, in tree-traversing such as in Unix' find tool [that scans hierarchical directory structures], backtracking, permutation enumeration and many algorithms that work on graphs. Geometrical applications include fractal curve drawing and tiling. [ about recursion / recursive gcd / Hilbert fractal curve / polyomino tiling / Logo Foundation ]
How to transform a recursion to an iteration?
The refactoring pattern of transforming recursive functions to non-recursive, iteration using, functions typically introduces state information, i.e. one ore more additional variables, updated in each iteration step [a previous recursion step]. The transformation of a simple, tail-end, recursion such as the gcd sample only involves updating the function parameter variables [no additional state; the sample's additional variable c is only required as temporary holder in a swap operation and can be discarded at the bottom of the loop]. The more complex sample of e.g. integer power calculation: int ipow(int i, unsigned int n) { return (n == 0 ? 1 : (n % 2 == 0 ? ipow(i * i, n / 2) : i * ipow(i, n - 1))); } that can be transformed to int ipow(int i, unsigned int n) { int j = 1; while (n > 0) { if (n % 2 == 0) { i *= i; n /= 2; } else { j *= i; n--; } } return j; } requires a new state j because of the i*ipow() multiplication. j*ipow(i,n) can be seen as an invariant [and 'result'] for all steps. [Note that the sample codes do not assert !(n==0&&i==0).] [ recursion / recursion applications ]
What is quicksort? What is quicksort's performance?
Quicksort is an efficient sorting algorithm that in the average case runs with O(n*log2(n)). Quicksort chooses a pivot element [often pseudo-randomly to avoid worst cases {O(n2)}; the median would be a good choice {is however hard to find}], which determines how to partition the input array; larger elements are moved on one side, smaller {and equal} ones on the other side [in-place; no additional sort-space is needed]; the sub-arrays then are recursively sorted, eventually up to pairs of elements, which establishes the final sort step. When recursively descending, the smaller partition is chosen first and the rest is done by optimized tail recursion to limit stack size use drastically [recursion depth is maximally the binary logarithm of the number of elements in the input array]. Quick-sort implements the divide-and-conquer paradigm. [ C programming language's qsort / recursion / recursion applications / recursion to iteration ]
What are the typical questions on sorting algorithms?
The typical questions address runtime efficiency (as a function of input size, pre-ordering, key distribution, etc), worst-case runtime behavior, likelihood of that worst-case(s), average case, in-place data moves (as opposed to scratch-space requirements), stability (keeping pre-ordering on otherwise equal elements), overhead (computational effort per iteration), data access patterns (e.g. important when sorting data on a single tape). Samples: quick sort is efficient, has a bad worst-case behavior, which can however be made very unlikely, moves data in-place and is typically unstable; insertion sort is inefficient, has however a low overhead, moves data in-place and is stable. [ quick sort algorithm / comments on C's qsort / sort-up-to-a-unique-key pattern ]
What is lean software? What are lean interfaces? Lean software development?
Lean software design may lead to less complex design and to implementations that are more robust [and show less security-flaws]. Lean software often has a smaller footprint and runs on smaller devices too [and might run faster]. Lean interfaces are often easier to use and their use may be more robust; lean APIs and lean applications may have a steeper learning curve and therefore [depending on the usage pattern, e.g. frequency of use] cause a better overall productivity. Layering [and facades and bridges] may be an approach to lean software design [e.g. Turtle code layered above TurtleCore code; another sample are multi-level layered implementation hierarchies and filter classes, e.g. Java's input/output streams]. C is a quite lean language that can be used to produce extremely lean executables [therefore it does not even include run-time size information or run-time type information with variables, arrays, etc {applications have to take care of this information explicitly; C's buffer overflow problem partly arises from this fact}]. C++ is a fat language that can however be used to produce extremely lean executables [just like C, but requires thorough language insights!]. [Additional terms: lean methods, lean classes, compact software, fat interfaces, monolith vs. framework, compact command set vs. compact working set.] Lean software development means either development of lean software (discussed above) or lean development of [normal] software, which is rather about [development] processes than about design. Interesting approaches of such a lean development are: find defects early [late finding is expensive; might lead to: constant redesign / constant refactoring], 'delayed commitment' [early developing in the wrong direction is expensive], iterative development [small steps; proofs of concepts, proofs of functionalities, proofs of needs]. [ C / C overview / orthogonal design / robust code ]

Unix operating system

What is the Unix Man? What is man man?
Unix Man is a well known song seen from the Unix-point-of-view. Of course Unix man is the term for the Unix [online] manual [pages] [help] system too. Just type 'man man', or 'man man | more' to start. Several web servers offer gateways to these man pages. [Scope, options {and details} of commands, system calls and library functions vary a bit from Unix to Unix; be careful to recognize proprietary stuff, to stay portable.]
Would you implement a method man?
Documentation, to which category manual text counts, is considered different from implementation code, so I wouldn't usually implement man methods. Having manual text in code [not in code-comments or javadoc {and similar}] bloats up library size and is usually not needed during run-time. Lisp and Mathematica usually include short manual texts in the code [as first string in a defun-body or ::usage string]. [ unix man (unix manual) / man man ]
Sparse files: what are they good for? How to identify sparse files?
Sparse file characteristics are just a [file system] abstraction. Disk blocks are only written (allocated/occupied) if not zero (empty); Unix' lseek system call allows to skip bytes (that are hence not allocated). Find free sparse conversion code there too. [GNU tools tend to handle sparse file characteristics as well.] Sparse file identification is done by ls -l (real size [in bytes]) vs. ls -s (allocated size [in blocks]) on the command layer or by comparing stat system call's size information to the number of blocks, on the system call (C language) layer.
What is automount?
Automounting is a convenient way to automatically mount networked file systems [triggered by processes accessing directories, subdirectories or files residing on such file systems]. As a sample, the 'physical' file system /export/home/x/ gets automounted to /home/x/ [/home/ designated to trigger automounts, as e.g. specified in the according NIS+ tables], thus allowing users to log on any system in a networked environment and finding their expected home directory. If not used, automounted systems are eventually unmounted, allowing least invasive maintenance on the NFS servers.
How do I interpret Unix process trace information? How to view truss output? truss patterns? truss usage? Why are process tracing strategies important?
You might have a look at these Unix process tracing patterns. Some truss usage is given too, along with [small] output samples and truss output viewing strategies. Truss allows fast and easy trace-data acquiration with a quite deep level of detail.
Is Err#25 ENOTTY a severe Unix system call return value?
No. Some system call errors don't arise from unexpected behavior, but from a [simple] test, e.g. isatty [check input/output for interactivity, e.g. to check whether buffering is appropriate] eventually issuing a ioctl(...,TCGETA) call, returning with an errno set to ENOTTY. Judging the severities of system call errors and exceptions is often not an easy thing to do in intermediate software layers such as middleware or library code [apparent if there trying to figure out on what log level such a condition should be traced]. [ how to trace a process / Unix process tracing patterns on system call errors ]
Can the stat system call be sleeping?
Yes, consider accessing a file on an auto-mounted nfs file system and the nfs server doesn't respond or doesn't respond fast enough. The same may apply for a slow file-system carrying device such as a slow usb stick, or a device that requires the user to key-in a pin before information can be accessed. [ how to trace a process /