Correct, portable way to interpret buffer as a struct

croyd

The context of my problem is in network programming. Say I want to send messages over the network between two programs. For simplicity, let's say messages look like this, and byte-order is not a concern. I want to find a correct, portable, and efficient way to define these messages as C structures. I know of four approaches to this: explicit casting, casting through a union, copying, and marshaling.

#include <stdint.h>  /* uint16_t */

struct message {
    uint16_t logical_id;
    uint16_t command;
};
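Several of the approaches below only work if the struct's in-memory layout matches the 4 bytes on the wire. Assuming a C11 compiler, a minimal sketch is to have the build check that there is no padding and that each member sits at the expected offset:

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint16_t */

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Compile-time checks (C11 _Static_assert): the build fails if this
 * platform inserts padding or reorders nothing as expected, so the
 * in-memory layout is known to match the assumed 4-byte wire layout. */
_Static_assert(sizeof(struct message) == 4, "unexpected padding in struct message");
_Static_assert(offsetof(struct message, logical_id) == 0, "logical_id must be at byte 0");
_Static_assert(offsetof(struct message, command) == 2, "command must be at byte 2");
```

This does not make the cast-based approaches legal; it only turns a silent layout mismatch into a compile error.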

Explicit Casting:

void send_message(struct message *msg) {
    uint8_t *bytes = (uint8_t *) msg;
    /* call to write/send/sendto here */
}

void receive_message(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}

My understanding is that send_message does not violate aliasing rules, because a byte/char pointer may alias any type. However, the converse is not true, and so receive_message violates aliasing rules and thus has undefined behavior.
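The send side is fine because it only reads the struct through a character type. A small sketch of that direction, doing by hand what write()/send() do internally: it uses unsigned char rather than uint8_t, since uint8_t is usually unsigned char but the standard does not require it to be a character type. message_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Copies the object representation of *msg into out, reading it one
 * byte at a time through an unsigned char pointer. Character-type access
 * is always allowed, so this is well-defined for any object type.
 * Returns the number of bytes written. */
size_t message_bytes(const struct message *msg, unsigned char *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *) msg;
    assert(cap >= sizeof *msg);
    for (size_t i = 0; i < sizeof *msg; i++)
        out[i] = p[i];
    return sizeof *msg;
}
```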

Casting Through a Union:

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

void receive_message_union(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    union message_u *msgu = (union message_u *) bytes;  /* C requires an explicit cast here */
    /* And now use the message */
    if (msgu->m.command == SELF_DESTRUCT) {
        /* ... */
    }
}

However, this seems to violate the idea that a union only contains one of its members at any given time. Additionally, this seems like it could lead to alignment issues if the source buffer isn't aligned on a word/half-word boundary.
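One way to sidestep the alignment worry, assuming you control the receive buffer: declare the union object itself and let the receive call fill its bytes member. The object then has the union's type and alignment, and C99 (as amended by TC3) and C11 describe reading a member other than the one last written as reinterpreting those bytes. fake_recv and command_from_wire are hypothetical names; fake_recv stands in for recv() so the sketch runs without a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

union message_u {
    struct message m;
    uint8_t bytes[sizeof(struct message)];
};

/* Hypothetical stand-in for recv(): fills dst with len bytes from src. */
static void fake_recv(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* The union object is the receive buffer, so it is correctly aligned
 * for struct message. Reading u.m after writing u.bytes is type
 * punning through a union. */
uint16_t command_from_wire(const uint8_t *wire, size_t len)
{
    union message_u u;
    assert(len >= sizeof u.bytes);
    fake_recv(u.bytes, wire, sizeof u.bytes);  /* real code: recv(fd, u.bytes, ...) */
    return u.m.command;
}
```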

Copying:

void receive_message_copy(uint8_t *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message msg;
    memcpy(&msg, bytes, sizeof msg);  /* memcpy is declared in <string.h> */
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

This seems guaranteed to produce the correct result, but of course I would greatly prefer to not have to copy the data.
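If the goal is to avoid the extra copy, one option (a sketch assuming POSIX and that the wire layout matches the struct) is to have the read call write straight into a struct message object, so there is no intermediate byte buffer and no pointer cast at all. read_message is a hypothetical name. On a stream socket a single read() may return fewer bytes than requested, which this sketch simply reports as an error; on a datagram socket one recv() returns one whole datagram.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <unistd.h>   /* read, write, pipe, ssize_t (POSIX) */

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Reads one message directly into *msg: the kernel writes into the
 * struct's own storage, so nothing needs to be copied out afterwards.
 * Returns 0 on success, -1 on error or a short read. */
int read_message(int fd, struct message *msg)
{
    ssize_t n = read(fd, msg, sizeof *msg);
    return n == (ssize_t) sizeof *msg ? 0 : -1;
}
```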

Marshaling:

void send_message_marshal(struct message *msg) {
    uint8_t bytes[4];
    bytes[0] = msg->logical_id >> 8;    /* Big-endian */
    bytes[1] = msg->logical_id & 0xff;
    bytes[2] = msg->command >> 8;
    bytes[3] = msg->command & 0xff;
    /* call to write/send/sendto here */
}

void receive_message_marshal(uint8_t *bytes, size_t len) {
    /* No longer relying on the size of the struct being meaningful */
    assert(len >= 4);
    struct message msg;
    msg.logical_id = (bytes[0] << 8) | bytes[1];    /* Big-endian */
    msg.command = (bytes[2] << 8) | bytes[3];
    /* And now use the message */
    if (msg.command == SELF_DESTRUCT) {
        /* ... */
    }
}

This still copies, but it is now decoupled from the struct's in-memory representation. The cost is that we must be explicit about the position and size of each member, and endianness becomes a much more visible issue.
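One way to keep the explicit positions manageable is to name the offsets once and funnel every field through two small helpers. This is a sketch under the assumption of a big-endian, unpadded wire format; the WIRE_* constants, put_u16_be, get_u16_be, encode_message and decode_message are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Assumed wire layout: big-endian, no padding. */
enum {
    WIRE_LOGICAL_ID_OFF = 0,
    WIRE_COMMAND_OFF    = 2,
    WIRE_MESSAGE_SIZE   = 4
};

static void put_u16_be(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) (v & 0xff);
}

static uint16_t get_u16_be(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

void encode_message(const struct message *msg, uint8_t out[WIRE_MESSAGE_SIZE])
{
    put_u16_be(out + WIRE_LOGICAL_ID_OFF, msg->logical_id);
    put_u16_be(out + WIRE_COMMAND_OFF, msg->command);
}

/* Returns 0 on success, -1 if the buffer is too short. */
int decode_message(const uint8_t *in, size_t len, struct message *msg)
{
    if (len < WIRE_MESSAGE_SIZE)
        return -1;
    msg->logical_id = get_u16_be(in + WIRE_LOGICAL_ID_OFF);
    msg->command    = get_u16_be(in + WIRE_COMMAND_OFF);
    return 0;
}
```

Because only shifts and masks touch the values, the wire bytes come out the same on little- and big-endian hosts.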

Related info:

What is the strict aliasing rule?

Aliasing array with pointer-to-struct without violating the standard

When is char* safe for strict pointer aliasing?

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Real World Example

I've been looking for examples of networking code to see how this situation is handled elsewhere. lwIP (the lightweight IP stack) has a few similar cases. Its udp.c file contains the following code:

/**
 * Process an incoming UDP datagram.
 *
 * Given an incoming UDP datagram (as a chain of pbufs) this function
 * finds a corresponding UDP PCB and hands over the pbuf to the pcbs
 * recv function. If no pcb is found or the datagram is incorrect, the
 * pbuf is freed.
 *
 * @param p pbuf to be demultiplexed to a UDP PCB (p->payload pointing to the UDP header)
 * @param inp network interface on which the datagram was received.
 *
 */
void
udp_input(struct pbuf *p, struct netif *inp)
{
  struct udp_hdr *udphdr;

  /* ... */

  udphdr = (struct udp_hdr *)p->payload;

  /* ... */
}

where struct udp_hdr is a packed representation of a UDP header and p->payload is of type void *. Going on my understanding and this answer, I originally concluded that this breaks strict aliasing and thus has undefined behavior. [Edit: it does not break strict aliasing; see my answer below.]
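For comparison, a header like this can also be decoded from an untyped payload pointer without ever casting it to a struct pointer, which avoids both the alignment and the effective-type questions. This sketch is not lwIP's code: struct udp_header_view, load_be16 and parse_udp_header are hypothetical names, and the layout is the standard UDP header (four 16-bit big-endian fields, per RFC 768).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side view of a UDP header; not lwIP's struct udp_hdr. */
struct udp_header_view {
    uint16_t src_port;
    uint16_t dest_port;
    uint16_t length;
    uint16_t checksum;
};

static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Decodes from a void * payload (like p->payload) using only
 * character-type reads, so neither alignment nor effective type matters.
 * Returns 0 on success, -1 if fewer than 8 bytes are available. */
int parse_udp_header(const void *payload, size_t len, struct udp_header_view *h)
{
    const unsigned char *p = payload;
    if (len < 8)
        return -1;
    h->src_port  = load_be16(p + 0);
    h->dest_port = load_be16(p + 2);
    h->length    = load_be16(p + 4);
    h->checksum  = load_be16(p + 6);
    return 0;
}
```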

croyd

I guess this is what I've been trying to avoid, but I finally went and took a look at the C99 standard myself. Here's what I've found:
§6.3.2.2 void

1 The (nonexistent) value of a void expression (an expression that has type void) shall not be used in any way, and implicit or explicit conversions (except to void) shall not be applied to such an expression. If an expression of any other type is evaluated as a void expression, its value or designator is discarded. (A void expression is evaluated for its side effects.)

§6.3.2.3 Pointers

1 A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
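A tiny illustration of this paragraph: converting a struct pointer to void * and back yields a pointer that compares equal to the original, and in C neither direction needs a cast. Note that ¶1 only covers converting the pointer; whether reading through the result is allowed is governed separately by §6.5. round_trip is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Converts msg to void * and back again (C99 6.3.2.3p1). Both
 * conversions below are implicit; C needs no cast between void * and
 * other object pointer types. */
struct message *round_trip(struct message *msg)
{
    void *erased = msg;
    struct message *back = erased;
    return back;
}
```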

And §3.14

1 object
region of data storage in the execution environment, the contents of which can represent values

§6.5 ¶7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
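To make the list concrete, here is a sketch of two accesses it permits: reading any object's bytes through unsigned char (the character-type entry), and reading an int through an unsigned int lvalue (the corresponding signed/unsigned entry). byte_sum and as_unsigned are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes of any object by reading it through unsigned char,
 * which the "character type" entry allows for every effective type. */
unsigned byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}

/* Reads an int through an unsigned int lvalue, allowed by the entry for
 * the unsigned type corresponding to the effective type. */
unsigned as_unsigned(const int *x)
{
    return *(const unsigned *) x;
}
```

By contrast, reading that same int through a float lvalue matches no entry in the list and would be undefined.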

§6.5 ¶6

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
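A sketch of the memcpy case from this paragraph: malloc'd storage has no declared type, so copying a struct message into it with memcpy gives it the effective type struct message, and malloc's result is suitably aligned for any object type. Reading it back through a struct message * is then a matching-type access. What effective type results when the bytes arrive via recv() or a driver is not spelled out here, and this sketch does not settle that case. command_via_heap is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Copies *src into untyped heap storage, then reads it back through a
 * struct message pointer. Returns 0 if allocation fails (sketch only;
 * real code would report the failure separately). */
uint16_t command_via_heap(const struct message *src)
{
    void *storage = malloc(sizeof *src);
    if (storage == NULL)
        return 0;
    memcpy(storage, src, sizeof *src);     /* effective type becomes struct message */
    const struct message *msg = storage;   /* implicit void * conversion */
    uint16_t cmd = msg->command;
    free(storage);
    return cmd;
}
```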

§J.2 Undefined Behavior

— An attempt is made to use the value of a void expression, or an implicit or explicit conversion (except to void) is applied to a void expression (6.3.2.2).

Conclusion

It is OK (well-defined) to convert to and from a void *, but it is not OK to use a value of type void in C99. Therefore the "real world example" is not undefined behavior, and the explicit casting method can be used with the following modification, as long as alignment, padding, and byte order are taken care of:

void receive_message(void *bytes, size_t len) {
    assert(len >= sizeof(struct message));
    struct message *msg = (struct message *) bytes;
    /* And now use the message */
    if (msg->command == SELF_DESTRUCT) {
        /* ... */
    }
}
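The alignment caveat can at least be checked at run time before the cast. A minimal sketch, assuming C11 (for _Alignof) and a platform that provides uintptr_t; converting a pointer to an integer is implementation-defined, though it behaves as expected on mainstream platforms. view_message is a hypothetical name, and the check addresses only size and alignment, not the effective-type question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message {
    uint16_t logical_id;
    uint16_t command;
};

/* Returns bytes viewed as a struct message, or NULL if the buffer is
 * too short or not aligned for struct message. */
const struct message *view_message(const void *bytes, size_t len)
{
    if (len < sizeof(struct message))
        return NULL;
    if ((uintptr_t) bytes % _Alignof(struct message) != 0)
        return NULL;
    return (const struct message *) bytes;
}
```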
