Beigesoft™. Type-safe programming complex high-level applications in C.

This is about how to make complex high-level programs such as a dictionary in C in optimal way. Using C in this way to create such applications seems to be more optimal than using OOP C++ language.

This is also about programming "static" programs, i.e. that can be changed only during software update. Such complex high-performance programs are suitable for working on a weak device such as a mobile phone. For financial-like applications, Java with WEB-interface is the choice, because we can change and dynamically reload any service (e.g. a renderer JSP) on a never-stop enterprise application, and there is cross-platform WEB-app server A-Jetty. For LFSC static applications like a media-player or a dictionary the C is the choice. C is actually the most used language in a regular Linux distribution (kernel itself, glibc, system-d, dbus, glib, gtk, gimp, Java, Perl, etc. are written with C).

* In opposite to high-level applications, low level ones (e.g. a device driver) should use the pure C style, that is without type-safe wrappers, REF-counter libs...

We should consider these facts for maximum type-safe approach in a C-program:

  • there are libs and a client program that uses these libs

  • a lib consists of data models and methods dedicated to work with these data models

  • sometime direct access to lib's data models (e.g. shape->x1=1.23) is a wrong approach, e.g. when argument maybe wrong

  • by using different kind of methods and data models we can achieve any high-level programming approach (e.g. polymorphism) in type-safe way

  • checking object's (data-model) type during runtime (it do all OOP languages, GLIB...) isn't an optimal approach

  • by using different kind of type-safe approaches we can guarantee that the whole program is type-safe in COMPILE TIME, so checking object's type during runtime is redundant

  • it's all about finding gold mean between pure C-style (highest performance, but sometimes brutal and type-unsafe) and OOP-like (less error-able, more readable) approach

Using type-safe and OOP-like approaches in C on a real program.

Here is used https://github.com/demidenko05/bsdict source code for this. All approaches are made by using macros and methods-wrappers.

1. Using type-safe wrappers.

Type-unsafe methods should be wrapped by type-safe ones to expose to clients (to high-level libs or applications). Standard lib also should be wrapped, it makes more easy life for clients, e.g. bsdict/bslib/BsFioWrap.c:

void

bsfwrite_bool (bool *pData, FILE *pFile)

{

int cnt = 1;

int wcr = fwrite(pData, sizeof (bool), cnt, pFile);

if ( wcr != cnt )

{

if ( errno == 0 ) { errno = BSE_WRITE_FILE; }

BSLOG_ERR

}

}

Here BSLOG_ERR is macro that report error into LOG file in verbose way, bsdict/bslib/BsLog.h:

#define BSLOG_ERR bslog_log(BSLERROR, "%s:%s:%d\n", __FILE__, __func__, __LINE__);

2. Inheritance.

Use similar to these macros and methods to avoid code duplicating, or better, to reuse an existing code, bsdict/bslib/BsDataSet.c:

//Data set lib Types:

...

#define BSDATASET(pSetType) BS_IDX_T bsize; BS_IDX_T size; pSetType **vals;


typedef struct {

BSDATASET(void)

} BsDataSetTus;


typedef void BsVoid_Method(void);


typedef struct {

BSDATASET(BsVoid_Method)

} BsVoidMeths;


void

bsdatasettus_remove_shrink (BsDataSetTus *pSet, BS_IDX_T pIdx, Bs_Destruct *pObjDestr)

{

if ( pIdx < BS_IDX_0 || pIdx >= pSet->size )

{

errno = BSE_ARR_OUT_OF_BOUNDS;

BSLOG_ERR

return;

}

if ( pObjDestr != NULL )

{

pObjDestr (pSet->vals[pIdx]);

}

for (BS_IDX_T l = pIdx + BS_IDX_1; l < pSet->size; l++ )

{

pSet->vals[l - BS_IDX_1] = pSet->vals[l];

}

pSet->vals[pSet->size - BS_IDX_1] = NULL;

pSet->size--;

}


//Just type-safe wrapper:

void

bsvoidmeths_remove_shrink (BsVoidMeths *pSet, BS_IDX_T pIdx)

{

bsdatasettus_remove_shrink ((BsDataSetTus*) pSet, pIdx, NULL);

}

As you can see, there are basic type-unsafe data models (macros) and methods. And there are type-safe data models that extend basic ones. And there are type-safe method-wrappers that wrap and often extend type-unsafe ones.

3. Polymorphism, encapsulation.

Use a struct that encapsulates generic data and methods to expose completely abstract interface for high level libs and clients. For example bsdict/dict/BsDicObj.h:

/**

* <p>Generic destructor.</p>

* @param pDiIx - object or NULL

* @return always NULL

**/

typedef BsDiIxBs* BsDiIx_Destroy (BsDiIxBs *pDiIx);


/**

* <p>Find all matched words in given dictionary and IDX.</p>

* @param pDiIx - DIC with IDX

* @param pFdWrds - collection to add found records

* @param pSbwrd - sub-word to match

* @set errno if error.

**/

typedef void BsDiIxFind_Mtch (BsDiIxBs *pDiIx, BsDiFdWds *pFdWrds, char *pSbwrd);


/**

* <p>Read word's description with substituted DIC's tags by HTML ones

* from dictionary with search content any type.</p>

* @param pDiIx - DIC with IDX

* @param pFdWrd - found word with data to search content

* @return full description as BsHypStrs

* @set errno if error.

**/

typedef BsHypStrs *BsDiIx_Read (BsDiIxBs *pDiIx, BsDiFdWd *pFdWrd);


/**

* <p>Generic, type-safe assembly of text/audio/both/... dictionary

* with cached IDX head and methods (OOP like object).

* This is interface for high level GUI.

* This exposes abstractions (data and methods) that GUI needs.</p>

* @member nme - file name plus state - e.g. indexing..., it will be hided in GUI with opened diIx-head->nme

* @member pth - file path either from bsdict.conf or that user chose

* @member opSt - opening shared data

* @member pref - user preferences

* @member diIx - text/audio/both/... dictionary with cached IDX head

* @method diix_destroy - destroyer

* @method diixfind_mtch - finder of matched words

* @method diix_read - reader of content of found word

**/

typedef struct {

BsString *nme;

BsString *pth;

BsDiIxOst *opSt;

BsDiPref *pref;

BsDiIxBs *diIx;

BsDiIx_Destroy *diix_destroy;

BsDiIxFind_Mtch *diixfind_mtch;

BsDiIx_Read *diix_read;

} BsDicObj;

In BsDict application BsDicObj is the only OOP-object like assembly. So, this is not about using OOP for OOP itself sake. This is an extremely abstract interface that can be implemented with variety of things (here audio/text dicts). And clients never care and aware about implementation. This assembly is actually type-safe. It's made with this way:

  1. It's initialized with "Path" and "Name" from a chosen dictionary file or from the settings.

  2. BsDicObj's method void bsdicobj_open (BsDicObj* pDiObj); try to open this object, i.e. it initializes this object completely including type-safe methods.

Example of polymorphic invocation, BsDict.c:

...

for ( int i = 0; i < wdics->size; i++ )

{

errno = 0;

if ( wdics->vals[i]->opSt->stt == EBSDS_OPENED )

{

BS_DO_E_CONT (wdics->vals[i]->diixfind_mtch (wdics->vals[i]->diIx, sDicsWrds, sLastCstr))

}

}

...

Clients should have no type-unsafe code at all. By using these approaches, only libs can have type-unsafe code, but they must provide type-safe interface. As a result, clients have totally type-safe and nice code (without casting).

Other approaches for optimal programming in C.

We should also consider these approaches and facts for optimal programming:

  • the simpler the better

  • in most cases compiler's warnings are actually errors

  • use the most used in the universe part-assembly approach, an assembly can be a part of another assembly

  • part(of lib - single h and c file) is a minimum composite, there should not be cross(mutual)-dependencies between parts in a whole lib

  • of course, every lib should be tested

  • the more straight-full meaning (straightforward) the better, e.g. use "if (object_pointer != NULL)" instead of "if (object_pointer)", also use "if (is_ok)" only when "is_ok" may have TRUE(1) or FALSE(0) values, etc. Also names(aliases) of data(variables) and methods should be straightly-meanings, e.g. triangle_free(p_trngl) means just free memory and nulling pointers, but object_destroy(p_obj) means free memory and additional things like closing files (this helps to understand code more quickly without looking comments).

  • use only the best alternative, e.g. operator "? :" in Java code style considered to be hard to read, and "if" is preferred, so here "if" is also preferred to "switch"

  • source code should be well commented

  • a lot of methods should be public only for testing purpose

  • GDB should be used when "seg-fault" or a test fails, and adding new sub-tests, validators and logging seems to be excessive, but in practice looking into a code twice is often more efficient.

  • REF-libs and destructors are the best alternative, a garbage collector is a less optimal choice.

  • destructing dynamic vars depends of their scope:

    1. application scoped (even MT shared) vars are destructed when app exits, along with join main thread to non-stopped other ones.

    2. request scoped (1-st level invoking method's scoped) single-thread vars are destructed after request(sub-work) is done, they are the most used vars, they are similar to automatic vars

    3. on demand scoped (created and destroyed on events) multi-referenced vars are destructed by a REF-lib (e.g. KREF/GLIB/C++ extends RefCounted...).

    4. on demand scoped single-referenced vars are destructed usually on "destroy" event. To simplify coding such things, a REF-lib maybe the best alternative.

Rules above obey to decrease possible errors, ability to fix them more easy and faster, so as a result, programming a complex application with C isn't complicated than do it with a high level language, e.g. Java.

Also for BS LFSC standard, an application's (statically upgraded, non-enterprise) interface should be GTK2 based and adapted for mobile view.

Code style.

This is about how to make code's words more distinct (actually more faster recognized/perceived by brain). There are facts:

  • In practice short aliases (variables, functions, etc. names) look better in code and well recognized (e.g. use "strlen" instead of "string_get_length", human brain is naturally able to remember "spatial compressed/coded" graphic things such as hieroglyphs).

  • Spacing make code more readable. This is not only about inside line spaces, it's also about spaces between lines.

  • Vertical spaces is useful not only because of using upper-case macro. It's needed because of small monospace-fonts line spacing (actually glyph's height).

  • The more spaces inside line, the more longer line, i.e. it become less readable.

  • Distinct rules about naming words of different types and spacing improve readability, e.g. lower prefix "p" in method's arguments, "if ( i = 1 )", "a = s_method1 (45);"

  • type name "DataSet" is definitely better than "struct data_set". Non-struct data types (hardly ever used) should be prefixed.

  • type name "Compare_Tus*" is better than "int (*compare)(void*,void*)", because of "Tus" means type-unsafe "void*", and "int" is the standard of comparing result.

So, code should be spaced by spaces themselves and with shorting names. Naming and spacing should be distinct(different) depending of their subject.

Code style should be like this:

  • Horizontal off-start space (indentation) is 2 spaces, and big ('vary to avoid making long line) for split line's 2-nd parts (except method's "return type - name" line)

  • Vertical spacing is made of line with only brace (open/close block), empty line, and sometime with huge horizontal indentation (spacing) of 2-nd part[s] of a split line (in case of short next line)

  • Vertical spacing made of empty line is optional

  • Vertical spacing is optional for short methods

  • Types names start with upper cases.

  • methods and their types names are highlighted with "_", static ones should be prefixed with "s_"

  • variables never use "_", but use upper-cases letters after start, e.g. sTimer - static timer, pX1 - argument, bufSz - automatic var, .bufSize or better .bsize - member in an object.

  • Macro's names use upper-case letters with "_".

  • Enum values are same as macro names, except them should be prefixed with "E"

  • Debug lines can be long

Also autotools seems to be useless. Just add a new file (and its test) into Makefile during making a huge application step by step (file by file). The Make allows enough flexible behavior, e.g.:

libs=alsa sdl2

ifeq ($(shell pkg-config libass && echo $$?),0)

libs += libass

endif

//or from file created according user's preferences:

libs = $(shell cat libs.conf)


Handling and reporting errors.

Here errors mean "program's errors". Errors types:

  • program's checkable errors, e.g. a NULL pointer when it's not allowed, an index outside of an array, a wrong pointer outside the dedicated to it segment, etc.

  • program's non-checkable errors, e.g. a bad pointer inside the dedicated to it segment, a string non-initialized with the zero terminator and passed to "strlen", etc.

  • out of order errors, e.g. out of memory, network connection is broken, disk full, etc.

Even though there is no "often changed business logic", errors happen in a regular C applications. The best approach to handle and report errors should be like this:

  • any method that can cause an error must set thread scoped variable "errno" and log error

  • any subsequent-back method (that invokes errorable method) should check errno, and handle it - i.e. it should log to track the whole backtrace

  • a file is the best alternative to report any error

So, example is (see the data-set lib above):

//Error propagation generic macro:

#define BS_DO_E_OUT(p_invoked) p_invoked;\

if (errno != 0) { BSLOG_ERR goto out; }

...


//CLIENT's sub-lib: error propagation:

static int sCount = 0;


static void

s_void_meth1 (void)

{

sCount++;

}


static void

s_test1 ()

{

//arr is a request (method) scoped var. It will return error on hardly ever possible ENOMEM:

BSVOIDMETHS_NEW_E_RET (arr, 2L) //size=2


//increase=2 when full-filled:

BS_DO_E_OUT (bsvoidmeths_add_inc (arr, &s_void_meth1, 2))


BS_DO_E_OUT (bsvoidmeths_add_inc (arr, &s_void_meth1, 2))


//it will increase size or hardly ever possible ENOMEM:

BS_DO_E_OUT (bsvoidmeths_add_inc (arr, &s_void_meth1, 2))


bsvoidmeths_invoke_all (arr); //sCount must be 3


BS_DO_ERR (bsvoidmeths_remove_shrink (arr, 4L)) //ERROR! 4 > (size-1)=3


out:

bsvoidmeths_free(arr);

}

//Main client:

int

main (int argc, char *argv[])

{

...

s_test1 (); //must be BSE_ARR_OUT_OF_BOUNDS

if ( errno != 0 )

{

//report to user and stay working

//or finishing (free memory and closing files) and exiting:

...

}

//normal steps:

...

//back-trace report will be:

02/08/20 13:59:06.367 thread#140432137471744 ERROR: Out of bounds

BsDataSet.c:bsdatasettus_remove_shrink:498

02/08/20 13:59:06.367 thread#140432137471744 ERROR: Out of bounds

tst_BsDataSet.c:s_test1:67

But this approach can't catch real happen error e.g. "Segmentation fault". GLIBC utility "catchsegv" allows to track these errors by hand. But GDB is the best alternative, just use "backtrace" or "bt". To automatically report backtrace, use same named GLIBC method (execinfo.h), "open" file descriptor (fcntl.h) and "backtrace_symbols_fd" (see "debug" folder in GLIBC source). Alternatively, to record into own log file, you can use only "backtrace" method to get addresses, and it's seems enough (i.e. printing only method's addresses). Or make new method based on "backtrace_symbols/-fd" or to avoid using malloc or FD. So, after interception "SIGABRT" or similar errors, free resources and close files, then print stack-trace. To obtain the line in source code by address from report use command:

addr2line -e [program_file] [address1] [address2]

//e.g.:

addr2line -e tst_1 0x401da1 0x404d11

GLIBC's backtrace (actually GCC's _Unwind_Backtrace) can't work when cause is "wrong method's address", and so does "catchsegv" (it can print only registers dump). GDB can do it. So, if you got empty report, then see "bad pointers" to methods (use GDB). Anyway segmentation fault cause wrong method's pointer happen very seldom (used in code).

Using brutal "C-signals" plus printing backtrace (even only with methods addresses) may seem to be the best alternative to handle exactly program's errors. Because of program's errors should happen hardly ever, and never on production build (in theory, but not in practice). That is use "raise(SIGABRT)" plus well-logging or just "assert([condition])" to abort and report into stderr this condition.

In other hand, GLIB uses such excessive error checking and propagation without termination (checking for type, if parameter is NULL). This allows to handle and report error more accurately, to keep program working, so user can use other error-less functionality. So program's errors are treated as "part of functionality is out of order". Also, using "C-signals" requires additional coding to pass all destructors. In case of excessive error handling using macro can decrease coding (see example above).

Standard C-lib also treat part of exactly program's errors as non-fatal. E.g. "stdio.seek" will not abort if you point outside of a file, it just return error code. But "strlen" on NULL pointer will cause segmentation fault, instead of checking and returning error (or setting thread-local "errno" code).

So, additional (sometime may seem excessive) work like:

  • Making methods to operate objects with additional checking

  • Making type-safe wrappers

  • Making type-safe libs with macro-wrappers to allow clients to be tiny and totally type-safe at compile time

  • Testing

  • Logging (including debug)

  • Excessive error checking and handling

are considered as useful here for Beigesoft standard. They obey to decrease errors, to detect and fix them more faster.

*Such excessive things are called "fault tolerance" in "reliability engineering".

To avoid excessive double checking(validation) use well known phases:

  1. Object creating/initializing with validation

  2. Object further modification with validation

  3. Object using (without modification and double validation)

Although GLIB/GTK use excessive double checking, they are really fast.

Using excessive checking in destructors.

In this case, code becames more simple and readable, and IDE can generate automatically complex constructors and part of methods.

Constructor standard style example:

BsLogFiles*

bslogfiles_new (int pSize)

{

if ( pSize < 1 ) {

errno = BSE_ARR_WPSIZE;

fprintf (stderr, "%s %s\n", __func__, bserror_to_str(errno));

return NULL;

}

BsLogFiles *obj = malloc (sizeof (BsLogFiles));

if ( obj == NULL )

{

if ( errno == 0 ) { errno = ENOMEM; }

perror (__func__);

return NULL;

}

obj->files = malloc (pSize * sizeof (BsLogFile*));

if (obj->files == NULL)

{ goto err1; }

obj->size = pSize;

int i;

for ( i = 0; i < obj->size; i++ )

{

obj->files[i] = NULL;

}

for ( i = 0; i < obj->size; i++ )

{

obj->files[i] = malloc(sizeof(BsLogFile));

if (obj->files[i] == NULL)

{ goto err2; }

obj->files[i]->file = NULL;

obj->files[i]->path = NULL;

}

return obj;

err2:

for ( i = 0; i < obj->size; i++ )

{

if ( obj->files[i] != NULL )

{ free (obj->files[i]); }

}

free (obj->files);

err1:

free (obj);

if ( errno == 0 ) { errno = ENOMEM; }

perror (__func__);

return NULL;

}

Constructor/destructor with excessive checking example:

BsLogFiles*

bslogfiles_new (int pSize)

{

if ( pSize < 1 )

{

errno = BSE_ARR_WPSIZE;

fprintf (stderr, "%s %s\n", __func__, bserror_to_str(errno));

return NULL;

}

BsLogFiles *obj = malloc (sizeof (BsLogFiles));

if ( obj == NULL )

{

if ( errno == 0 ) { errno = ENOMEM; }

perror (__func__);

return NULL;

}

obj->files = malloc (pSize * sizeof (BsLogFile*));

if ( obj->files == NULL )

{

obj = bslogfiles_free (obj);

goto out;

}

obj->size = pSize;

int i;

for ( i = 0; i < obj->size; i++ )

{

obj->files[i] = NULL;

}

for ( i = 0; i < obj->size; i++ )

{

obj->files[i] = malloc (sizeof (BsLogFile));

if ( obj->files[i] == NULL )

{

obj = bslogfiles_free (obj);

break;

}

obj->files[i]->file = NULL;

obj->files[i]->path = NULL;

}

out:

if ( obj == NULL )

{

if ( errno == 0 ) { errno = ENOMEM; }

perror (__func__);

}

return obj;

}

//Destructor with excessive checking that always return NULL:

BsLogFiles*

bslogfiles_free (BsLogFiles *pLogFls)

{

if ( pLogFls != NULL ) {

if ( pLogFls->files != NULL ) {

for ( int i = 0; i < pLogFls->size; i++ ) {

if ( pLogFls->files[i]->file != NULL ) {

fprintf (pLogFls->files[i]->file, "BS-LOG try to close...\n");

fclose (pLogFls->files[i]->file);

}

if ( pLogFls->files[i]->path != NULL ) {

free (pLogFls->files[i]->path);

}

free (pLogFls->files[i]);

}

free (pLogFls->files);

}

free (pLogFls);

}

return NULL;

}

Method using constructors/destructors with excessive checking example (this part can be generated by IDE):

void

method1 (void)

{

OBJECTTYPE0_NEW_ERR_RET (ot0)


ObjectType1 ot1 = NULL;

ObjectType2 ot2 = NULL;

ObjectType3 ot3 = NULL;


INVOKE_ERR_OUT (ot1 = objecttype1_new())


INVOKE_ERR_OUT (ot2 = objecttype2_new())


INVOKE_ERR_OUT (ot3 = objecttype3_new())

...

out:

objecttype0_free (ot0);

objecttype1_free (ot1);

objecttype2_free (ot2);

objecttype3_free (ot3);

}

In this case destructor is more reusable and code looks more readable, less error-able, patternable, simple and similar for many use-cases.

Participate to develop additional/optimal C programs for BS LFSC (Linux From Source Code) that is planned to be portable - desktop/tablet/mobile.