What is an array in the C programming language?

An array is a collection of items of the same data type that you access through a common name. This is a formal definition, and it does not tell us much. Because an array decays into a pointer to its first element when it is passed to a function, most books and lectures treat arrays as if they were regular pointers to the beginning of the elements. In this article, I’ll try to explain arrays in C from the compiler’s point of view, so you’ll be able to use them more effectively.
 
Before going into details, we need to understand ELF, the Executable and Linkable Format. When we build code into an ELF file, the executable code and the variables go into different sections.

 

You can see the details of the ELF layout in Fig.1. As the figure shows, different parts of the code go into different sections of the ELF file. If you are using a native compiler that runs on an OS (such as GNU/Linux), this layout is defined by the default linker script. A linker script is a directive file that tells the linker where to place specific code and data blocks in the ELF file. You can print your default linker script by appending -Wl,--verbose to the compiler command, as depicted below:

$ cc -o array array.c -Wall -g -Wl,--verbose

On the other hand, if you are working on a bare-metal microcontroller, you have to provide the linker script yourself to use the internal flash memory properly. The linker script deserves a dedicated article of its own, so let’s defer that discussion to a future article.

Fig.1: ELF layout.
/* Test code 1 */
#include <stdio.h>

char *string_pointer = "test string";

int main(int argc, char *argv[])
{
    printf("sizeof(string_pointer) = %zu\n", sizeof(string_pointer));
    printf("%s\n", string_pointer);
    string_pointer[0] = 'T';
    printf("%s\n", string_pointer);
    return 0;
}
/* Test code 2 */
#include <stdio.h>

char string_array[] = "test string";

int main(int argc, char *argv[])
{
    printf("sizeof(string_array) = %zu\n", sizeof(string_array));
    printf("%s\n", string_array);
    string_array[0] = 'T';
    printf("%s\n", string_array);
    return 0;
}
bolat@pc ~ $ cc -o array array.c -Wall -g
bolat@pc ~ $ ./array 
sizeof(string_pointer) = 8
test string
Segmentation fault
bolat@pc ~ $ cc -o array array.c -Wall -g
bolat@pc ~/blogbolat $ ./array 
sizeof(string_array) = 12
test string
Test string

In Test code 1, a pointer is initialized with a string literal (line 4), and when we try to write to its first character (line 10), we get a “Segmentation fault”.

In contrast, in Test code 2 an array is initialized with the same string literal, and we are able to change the first character of the string to uppercase (line 10).

Why did we get different results? Arrays decay into pointers anyway, right? Not quite. Let’s look at the code from the compiler’s point of view.

As you can observe in Fig.2, a pointer to char and a character array have been defined as global variables, both initialized with “test string”. Because these variables are global and initialized, their initial values are stored in the ELF file itself and are already in place before main runs: on GNU/Linux the program loader maps the .data section into memory, while on a bare-metal target the startup code copies .data from flash into RAM. The C runtime startup code is also responsible for calling the main function and handling its return value, while loading shared libraries is the job of the dynamic linker. I am not going into the details of the C runtime here; you can refer to the GCC documentation for more information.

One important detail: for arrays with static storage duration (declared globally or with the static keyword), the compiler reserves the necessary memory in the .bss section, or in the .data section if the array is initialized with nonzero values.

As you remember from pointers in C, a pointer has two properties: its type and its address value. In this example, I used “test string” as the initial value. A string literal is an array: “test string” has type char[12] (11 characters plus the terminating '\0'), so its address, &"test string", has type char (*)[12]. Suppose this address is 0xAABB; this is an arbitrary number that I made up for the sake of the example, since the actual address is assigned by the compiler and linker at build time. When you initialize a pointer (global or static) with the literal, only that address is stored: the pointer variable (in this case, char *string_pointer) holds 0xAABB, while the characters themselves stay in the .rodata (read-only data) section. So when you attempt to write to the first character of the string, you actually attempt to write into .rodata, which is mapped read-only, and that is why we got a “Segmentation fault” in Test code 1. Strictly speaking, modifying a string literal is undefined behavior in C; the crash is what a typical GNU/Linux toolchain produces.
On the other hand, if we initialize a character array, the compiler first calculates the size of the initializer “test string”, which is 12 bytes, reserves 12 bytes for the array in the .data section, and places the characters of the string directly in that storage. No pointer to .rodata is involved: the name string_array denotes the writable bytes themselves. Therefore writing to the first character of the array is a valid operation, and we do not get a “Segmentation fault”. (For an automatic array declared inside a function, the compiler instead emits code that copies the initializer into the array each time the function is entered.)

Fig.2: Initialization comparison for the pointer and the array.

So far, I have tried to explain the main difference between a pointer and an array. Now, let’s take a look at how to take advantage of this difference.

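#include <stdio.h>
#include <stdint.h>

enum COMMAND { WRITE = 0, READ = 1, DONE = 2, };

/* Parse template: every field is naturally aligned, so common ABIs
 * insert no padding and the offsets match the packet layout. */
struct packet {
    uint8_t addr[4];
    uint16_t command;
    uint16_t length;
    uint16_t checksum;
    uint8_t payload[];  /* C99 flexible array member: rest of the packet */
};

int main(int argc, char *argv[])
{
    /* uint8_t (not char) keeps values such as 192 and 0xDE unchanged;
     * _Alignas gives the buffer the alignment the uint16_t fields need. */
    _Alignas(struct packet) uint8_t data[] = {
        192, 168, 1, 1,                       /* target address */
        0, WRITE,                             /* command type */
        0, 14,                                /* length of the data */
        0x01, 0x71,                           /* checksum of the packet */
        0xDE, 0xAD, 0xBE, 0xEF, 0xAA, 0x55 }; /* payload */

    struct packet *parse;

    /* Overlay the template on the raw bytes. The 16-bit fields are read
     * in host byte order; strictly, this type punning is not sanctioned
     * by ISO C's aliasing rules, and memcpy is the portable alternative. */
    parse = (struct packet *)data;

    printf("addr[0] : %d\naddr[1] : %d\naddr[2] : %d\naddr[3] : %d\n command : %d\n length : %d\n checksum : %d\n",
        parse->addr[0], parse->addr[1], parse->addr[2], parse->addr[3], parse->command, parse->length, parse->checksum);
    int i;
    for (i = 0; i < 6; ++i) {
        printf("  payload[%d] = %d\n", i, parse->payload[i]);
    }

    return 0;
}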

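In this example, a struct is used as a parse template and laid over a byte array to extract the necessary fields from a sample data packet. This is a more human-readable way to implement a parser than using pointer arithmetic to find the correct offsets in the packet data, and it may help you implement a simple protocol parser for your firmware applications. Keep in mind that the overlay relies on the struct having no padding between its fields, and that the 16-bit fields are read in the host’s byte order: on a little-endian machine such as x86-64, the length bytes 0, 14 are read as 0x0E00 = 3584 rather than 14, as the output below shows.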

bolat@pc ~ $ cc -o parse parse.c -Wall -g
bolat@pc ~ $ ./parse 
addr[0] : 192
addr[1] : 168
addr[2] : 1
addr[3] : 1
 command : 0
 length : 3584
 checksum : 28929
  payload[0] = 222
  payload[1] = 173
  payload[2] = 190
  payload[3] = 239
  payload[4] = 170
  payload[5] = 85

The next example shows how to return an array from a function by wrapping it in a struct: C does not allow a function to return an array type, but a struct, together with any array inside it, is copied by value. This example code is a little messy and not meant for production, but it is useful for understanding what happens under the hood when we deal with arrays in C.

#include <stdio.h>

#define ARRAY_SIZE (12)

struct array_return {
    int array[ARRAY_SIZE];
};

struct array_return foo(void)
{
    struct array_return ret;
    int i;
    for (i = 0; i < ARRAY_SIZE; ++i)
        ret.array[i] = i;
    return ret;
}

int main(int argc, char *argv[])
{
    struct array_return ret_val;
    ret_val = foo();
    int i;
    for (i = 0; i < ARRAY_SIZE; ++i) {
        printf("data[%d] : %d\n", i,
            ret_val.array[i]);
    }
    return 0;
}
bolat@pc ~ $ ./return-array 
data[0] : 0
data[1] : 1
data[2] : 2
data[3] : 3
data[4] : 4
data[5] : 5
data[6] : 6
data[7] : 7
data[8] : 8
data[9] : 9
data[10] : 10
data[11] : 11

Although arrays behave like regular pointers in many expressions, they behave quite differently with the sizeof and & operators, when they are initialized from a string literal, and when they are wrapped in structs. In programming, it is always best to catch problems as early as possible, and I believe knowing these differences helps you not only avoid bugs but also use data structures efficiently.