Penn State Earth System Modeling Team Developer Guidelines
Prepared by Yuning Shi (Janurary 2021)
- Penn State Earth System Modeling Team Developer Guidelines
- Introduction
- Code Management
- C Style Guide
- Naming
- Formatting
- Line Length
- Spaces vs. Tabs
- Function Declarations and Definitions
- Variable Declarations
- Function Calls
- Braced Initializer List Format
- Conditionals
- Loops and Switch Statements
- Pointer and Reference Expressions
- Boolean Expressions
- Return Values
- Preprocessor Directives
- Structure Format
- Horizontal Whitespace
- Vertical Whitespace
Introduction
The goal of this guide is to describe the best practices for Penn State Earth System Modeling Team code management, and the style that should be followed when writing C code. Code management is essential for successful collaboration, and following a style guide could significantly improve the readability and maintainability of the code. The formatting section is mainly copied from the Google C++ Style Guide, with a few changes.
Code Management
All code development is coordinated via PSUmodeling GitHub repositories.
Developers are recommended to fork the home repo to their accounts, clone the forked repository, make changes (preferably using a new branch other than the main), and push changes to the forked repository. When your branch is ready to be merged, make a pull request on GitHub. The home repository owner will review your code, and accept or reject your changes, depending on the quality of the changes. If you are making substantial changes, you can request to push directly to the home repo. However, no more than one developer should push to the home repo at the same time. If major changes are directly made to the home repo, a development branch should be used and the master branch of the home repo should always be stable.
Pulls from the Home Repo
After cloning your forked repository, an “upstream” target should be added in order to pull the latest updates from the home repo. Using GitHub command line tool is highly recommended.
$ git remote add upstream git@github.com:PSUmodeling/repo.git
Pull frequently to merge the latest updates from the home repo.
Commits
Commit and push often. Committing more means you will have more checkpoints, and you will thank yourself later. Commit messages should be concise and descriptive. Describe what features have been added, what bugs have been fixed, etc. Refer to the reported issues when possible. For example, “Increase MAXLYR to fix #7” Note that the reference in the commit message (#7 in this case) will automatically link to the issues on GitHub.
Tests
Before making a pull request, the code must be tested. While sometimes it is difficult to test on all operational systems, it should at least be tested on the system that you develop on.
- Test if the code compiles after the changes.
- Test if the code runs, using the sample input files.
- Test if the code produces reasonable results.
- Check if the code follows the coding style, as describes below.
Before merging the changes, the home repo owner should also test the code and review the coding style. The home repo owner has the right to review, comment on, and reject the pull requests.
Issues
All bug reports, feature requests, and code discussion should be made using the GitHub “Issues” feature.
Sample Input Files and User’s Manual
Each project should provide sample input files and a user’s manual. If any changes to the code require changes in input/output format, the developer should also modify the sample input files and the user’s manual accordingly.
Releases
All releases should include release notes, with a version number, and changes made since last release.
A version format of X.Y.Z (Major.Minor.Patch) should be used.
- Major version zero (0) should be used for pre-release initial development.
- Version 1.0.0 should be the first public version.
- Z should be incremented for bug fixes.
- Y should be incremented when minor new features and functionalities are added.
- X should be incremented when major new features are added.
- Any version that requires the changes in I/O format should trigger the changes in X.
The sections below present a style guide for C code, which should be followed by Penn State Earth System Modeling Team developers.
C Style Guide
Naming
General Naming Rules
Names for files, functions, constants, or variables should be descriptive and meaningful; avoid abbreviation as much as possible. Follow the guidelines below:
- Choose names with meanings that are precise and use them consistently throughout the program.
- Follow a uniform scheme when abbreviating names.
- Avoid abbreviations that form letter combinations that may suggest unintended meanings. For example, the name
inch
is a misleading abbreviation forinput character
. The namein_char
would be better. - Assign names that are unique (with respect to the number of unique characters permitted on your system).
- Use longer names to improve readability and clarity. However, if names are too long, the program may be more difficult to understand and it may be difficult to express the structure of the program using proper indentation.
- Names more than four characters in length should differ by at least two characters. For example,
systst
andsysstst
are easily confused.
File Names
Filenames should be all lowercase and can include underscores (_).
C files should end in .c
and header files should end in .h
, e.g., cycles.c
and cycles_const.h
.
Variable Names
The names of variables (including function parameters) and data members are all lowercase, with underscores between words. For example,
double radiation_growth;
double transpiration_growth;
Constant Names
The names of constants use upper-case words separated by underscores. All such variables with static storage duration (i.e., statics and globals) should be named this way. For example,
const double RESIDUE_MAX_WATER_CONC = 3.3;
Function Names
Ordinarily, functions should start with a capital letter and have a capital letter for each new word; do not use underscores. For example,
void CropGrowth(...)
{
...
}
Type Names
Type names (i.e., created with typedef
) should follow the naming standards for global variables.
Macro Names
Macros should be named with all capitals and underscores. For example,
#define PI 3.1415927
Exceptions to Naming Rules
If you are naming something that is analogous to an existing C entity then you can follow the existing naming convention scheme.
Formatting
Coding style and formatting are pretty arbitrary, but a project is much easier to follow if everyone uses the same style. Individuals may not agree with every aspect of the formatting rules, and some of the rules may take some getting used to, but it is important that all project contributors follow the style rules so that they can all read and understand everyone’s code easily.
Line Length
Each line of text in your code should be at most 120 characters long.
Comment lines can be longer than 120 characters if it is not feasible to split them without harming readability, ease of cut and paste or auto-linking, e.g., if a line contains an example command or a literal URL longer than 120 characters.
An #include
statement with a long path may exceed 80 columns.
You needn’t be concerned about header guards that exceed the maximum length.
Spaces vs. Tabs
Use only spaces (i.e., soft tabs), and indent four (4) spaces at a time. Do not use tabs in your code. Change the tab type of your editor to soft tabs, with a tab length of four spaces.
Function Declarations and Definitions
Return type on the same line as function name, parameters on the same line if they fit. Wrap parameter lists which do not fit on a single line as you would wrap arguments in a function call.
Functions look like this:
ReturnType FunctionName (Type par_name1, Type par_name2)
{
DoSomething ();
...
}
If you have too much text to fit on one line:
ReturnType ReallyLongFunctionName (Type par_name1, Type par_name2,
Type par_name3)
{
DoSomething ();
...
}
or if you cannot fit even the first parameter:
ReturnType ReallyReallyReallyLongFunctionName(
Type par_name1, Type par_name2, Type par_name3)
{
DoSomething ();
...
}
Some points to note:
- Choose good parameter names.
- Parameter names may be omitted only if the parameter is unused and its purpose is obvious.
- If you cannot fit the return type and the function name on a single line, break between them.
- If you break after the return type of a function declaration or definition, do not indent.
- The open parenthesis is always on the same line as the function name.
- There is never a space between the function name and the open parenthesis.
- There is never a space between the parentheses and the parameters.
- The open curly brace is always on the start of the next line.
- The close curly brace is always on the last line by itself.
- Default indentation is 4 spaces.
- Wrapped parameters have a 4 space indent.
Variable Declarations
Follow these guidelines for variable declarations:
- Align variable declarations so that the first letter of each variable name is in the 17th column, less the indent at the beginning of the line.
- Declare each internal variable on a separate line followed by an explanatory comment, unless variables are self-explanatory and related.
- Loop indices can all be listed on the same line with one comment.
ReturnType FunctionName()
{
Type var_name1; /* Explanatory comment */
StructType var_name2; /* Explanatory comment */
Type *var_name3; /* Explanatory comment */
int i, j, k; /* Loop indices */
Type year, month, day; /* Self-exlanatory and related */
}
Function Calls
Function calls have the following format:
result = DoSomething(argument1, argument2, argument3);
If the arguments do not all fit on one line, they should be broken up onto multiple lines, with a four space indent. Do not add spaces after the open paren or before the close paren:
result = DoSomething(averyveryveryverylongargument1,
argument2, argument3);
Arguments may optionally all be placed on subsequent lines with a four space indent if that improves readability:
if (...)
{
...
...
if (...)
{
result = DoSomething(
argument1, argument2,
argument3, argument4);
}
}
Braced Initializer List Format
Format a braced initializer list exactly like you would format a function call in its place.
If the braced list follows a name (e.g., a type or variable name), format as if the {}
were the parentheses of a function call with that name.
If there is no name, assume a zero-length name.
/* Examples of braced init list on a single line. */
SomeFunction(
{"assume a zero-length name before {"},
some_other_function_parameter);
SomeType variable{
some, other, values,
{"assume a zero-length name before {"},
SomeOtherType{
"Very long string requiring the surrounding breaks.",
some, other values},
SomeOtherType{"Slightly shorter string",
some, other, values}};
SomeType variable{
"This is too long to fit all in one line"};
MyType m = {
superlongvariablename1,
superlongvariablename2,
{short, interior, list},
{interiorwrappinglist,
interiorwrappinglist2}};
Conditionals
Prefer no spaces inside parentheses.
The if
and else
keywords belong on separate lines.
Do not use spaces between the parentheses and the condition.
if (condition)
{
...
}
else if (...)
{
...
}
else
{
...
}
You must have a space between the if
and the open parenthesis.
if
and else
must always always have accompanying braces.
The open and close curly braces are always on the start of lines by themselves.
When using ? :
conditional operators, always put a space around conditional operators, and put parenthesis around the conditions.
z = (a > b) ? a : b;
Loops and Switch Statements
Switch statements do not need braces. Annotate non-trivial fall-through between cases. Braces must be used even for single-statement loops. Empty loop bodies should use empty braces or continue.
case
blocks in switch
statements do not need curly braces.
If not conditional on an enumerated value, switch statements should always have a default case (in the case of an enumerated value, the compiler will warn you if any values are not handled). If the default case should never execute, simply abort or assert:
switch (var)
{
case 0:
...
break;
case 1:
...
break;
default:
abort();
}
Braces must be used even for single-statement loops.
for (int i = 0; i < kSomeNumber; i++)
{
printf("I love you\n");
}
Empty loop bodies should use an empty pair of braces or continue, but not a single semicolon.
while (condition)
{
/* Repeat test until it returns false. */
}
for (i = 0; i < kSomeNumber; i++)
{}
Pointer and Reference Expressions
No spaces around period or arrow. Pointer operators do not have trailing spaces. The following are examples of correctly-formatted pointer and reference expressions:
x = *p;
p = &x;
x = r.y;
x = r->y;
Note that:
- There are no spaces around the period or arrow when accessing a member.
- Pointer operators have no space after the * or &.
- When declaring a pointer variable or argument, you may place the asterisk adjacent to either the type or to the variable name:
/* These are fine, space preceding. */
char *c;
char* c;
It is allowed (if unusual) to declare multiple variables in the same declaration, but it is disallowed if any of those have pointer or reference decorations. Such declarations are easily misread.
/* Fine if helpful for readability. */
int x, y;
int x, *y; /* Disallowed - no & or * in multiple declaration */
char * c; /* Bad - spaces on both sides of * */
Boolean Expressions
When you have a boolean expression that is longer than the standard line length, the logical operators are always at the end of the lines:
if (this_one_thing > this_other_thing &&
a_third_thing == a_fourth_thing &&
yet_another && last_one)
{
...
}
Note that when the code wraps in this example, both of the &&
logical AND operators are at the end of the line.
Feel free to insert extra parentheses judiciously because they can be very helpful in increasing readability when used appropriately.
Also note that you should always use the punctuation operators, such as &&
and ||
, rather than the word operators.
Return Values
Do not needlessly surround the return expression with parentheses.
Use parentheses in return expr;
only where you would use them in x = expr;
.
Always use a space after return
.
return result; /* No parentheses in the simple case. */
/* Parentheses OK to make a complex expression more readable. */
return (some_long_condition && another_condition);
Preprocessor Directives
The hash mark that starts a preprocessor directive should always be at the beginning of the line.
Even when preprocessor directives are within the body of indented code, the directives should start at the beginning of the line.
if (lopsided_score)
{
#if DISASTER_PENDING
DropEverything();
# if NOTIFY
NotifyClient();
# endif
#endif
BackToNormal();
}
Structure Format
The basic format for a structure definition is:
typedef struct my_struct
{
Type var_name1;
Type *var_name2;
};
Horizontal Whitespace
Never put trailing whitespace at the end of a line.
General
Adding trailing whitespace can cause extra work for others editing the same file, when they merge, as can removing existing trailing whitespace. So: Don’t introduce trailing whitespace. Remove it if you’re already changing that line, or do it in a separate clean-up operation (preferably when no-one else is working on the file).
Loops and Conditionals
while (test)
{
/* There is no space inside parentheses. */
}
for (i = 0; i < 5; i++ )
{
/* For loops always have a space after the semicolon. */
...
}
switch (i)
{
case 1:
/* No space before colon in a switch case. */
...
}
Operators
/* Assignment operators always have spaces around them. */
x = 0;
/* Other binary operators always have spaces around them.
* Parentheses should have no internal padding. */
v = w * x + y / z;
v = w * (x + z);
/* No spaces separating unary operators and their arguments. */
x = -5;
++x;
if (x && !y)
{
}
When you split a long line with operators, the operators are always at the end of the lines. The lines should be split at where it is meaningful. For example,
x = a * b +
c * d;
is preferred to
x = a * b + c *
d;
Vertical Whitespace
Minimize use of vertical whitespace.
This is more a principle than a rule: don’t use blank lines when you don’t have to. In particular, don’t put more than one blank lines between functions, resist starting functions with a blank line, don’t end functions with a blank line, and be discriminating with your use of blank lines inside functions.
The basic principle is: The more code that fits on one screen, the easier it is to follow and understand the control flow of the program. Of course, readability can suffer from code being too dense as well as too spread out, so use your judgement. But in general, minimize use of vertical whitespace.
Some rules of thumb to help when blank lines may be useful:
- Blank lines at the beginning or end of a function very rarely help readability.
- Blank lines inside a chain of if-else blocks may well help readability.