Hacker News

uecker 2 days ago

Two things stick out as un-idiomatic for C. First, the casts before malloc are unnecessary; you need them in C++ but not in C. Second, names beginning with an underscore are reserved, and underscore + capital letter is specifically problematic.
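A minimal sketch of the malloc point, assuming a hypothetical `node` type and `node_new` function (neither is taken from the project). In C, `void *` converts implicitly to any object pointer type, so the cast adds nothing, and writing `sizeof *n` keeps the size tied to the variable rather than to a repeated type name:

```c
#include <stdlib.h>

/* Hypothetical type for illustration; not taken from the project. */
struct node {
    int value;
    struct node *next;
};

/* Allocates one node. malloc returns void *, which C converts implicitly
 * to struct node *, so no cast is written. */
struct node *node_new(int value)
{
    struct node *n = malloc(sizeof *n);  /* size follows the type of *n */
    if (n == NULL)
        return NULL;                     /* allocation failed */
    n->value = value;
    n->next = NULL;
    return n;
}
```

The caller owns the result and releases it with `free`.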

The rest looks fairly nice but there are a couple of things I would do differently: I would drop the tests for NULL, use signed integers for indices and dimensions, use a flexible array member to integrate the data into the vector type directly, and omit the capacity field (as long as benchmarking does not show it is really needed). I would also use variably modified types for bounds checking, and with C23 the include guards become largely unnecessary.
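A rough sketch of what some of these suggestions could look like together, assuming a hypothetical `vec` type (the names are illustrative and not the project's API). It uses a signed length, a flexible array member, and a pointer to a variably modified array type so the bound is part of the type. It assumes a compiler that supports variably modified types, such as GCC or Clang:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vector: signed length, data stored in the same allocation
 * as the header through a flexible array member (which must come last). */
struct vec {
    ptrdiff_t len;
    double data[];
};

/* Allocates a zero-filled vector of len elements, or returns NULL. */
struct vec *vec_new(ptrdiff_t len)
{
    if (len < 0)
        return NULL;
    struct vec *v = malloc(sizeof *v + (size_t)len * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->len = len;
    for (ptrdiff_t i = 0; i < len; i++)
        v->data[i] = 0.0;
    return v;
}

/* Sums the elements through a pointer to an array of n doubles. The type
 * of *a carries the bound n, which gives bounds-checking tools something
 * to check each (*a)[i] against. */
double vec_sum(struct vec *v)
{
    ptrdiff_t n = v->len;
    if (n <= 0)
        return 0.0;                    /* an array type needs a size > 0 */
    double (*a)[n] = (void *)v->data;  /* variably modified type */
    double sum = 0.0;
    for (ptrdiff_t i = 0; i < n; i++)
        sum += (*a)[i];
    return sum;
}
```

With a single allocation, one `free(v)` releases both the header and the data.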

(edit: minor edit for clarity)

oraziorillo 2 days ago

I re-read the second part more calmly once I got back home. There were at least a couple of things I wasn't familiar with (and therefore didn't even consider), specifically FAM (flexible array members) and variably modified types. Thanks for the pointer. I'm realizing that writing this post and reading the comments is teaching me more about C than coding the project itself, that's crazy!

I can answer about the include guards, though. I consciously added them for portability, following the same general approach that led me to handle the big-endian to little-endian conversion explicitly in examples/mnist/idx.c: even if that safeguard is not strictly necessary on most modern systems, I love the idea that this project is potentially buildable and runnable in most environments.
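For readers unfamiliar with the byte-order issue: one common way to decode a big-endian 32-bit field portably is to assemble it from individual bytes. This is only a sketch under that assumption; `read_be32` is a hypothetical helper and not necessarily how examples/mnist/idx.c does it:

```c
#include <stdint.h>

/* Hypothetical helper: decodes a 32-bit big-endian value from four bytes.
 * It behaves the same on little- and big-endian hosts because it shifts
 * values and never depends on how uint32_t is laid out in memory. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}
```

For example, the four bytes `00 00 08 03` decode to 2051 regardless of the host's byte order.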

valleyer 2 days ago

Names beginning with double underbar (or single underbar + capital letter) are reserved. Single underbar + lowercase is not. C23 §6.4.2.1.

uecker 2 days ago

They are also reserved as identifiers with file scope, just not for "any use". In any case, the program used underbar + capital letter.

valleyer 2 days ago

Ah, I hadn't noticed _SimpleSetNode.

poly2it 2 days ago

This leaves out part of the clause.

  All identifiers that begin with an underscore are reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

Single underscore followed by non-uppercase is allowed, but not in file scope. This means that you can use them in structs and as local variables, but never as globals.
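A small sketch of how these rules play out in practice. Apart from `_SimpleSetNode`, which is mentioned in the thread, all names here are hypothetical:

```c
/* Reserved for any use: a double underscore, or underscore + capital.
 *     struct _SimpleSetNode;       int __count;
 * Reserved at file scope in the ordinary and tag name spaces: any other
 * leading underscore.
 *     static int _helper(void);    struct _node;
 */

/* Allowed: underscore + lowercase as a struct member, parameter, or local. */
struct counter {
    int _hits;                       /* member name, not file scope */
};

/* A file-private helper can use static plus an ordinary name instead of a
 * leading underscore. */
static int clamp_step(int step)
{
    return step < 0 ? 0 : step;
}

int counter_bump(struct counter *c, int _step)   /* parameter: allowed */
{
    int _old = c->_hits;                         /* local: allowed */
    c->_hits = _old + clamp_step(_step);
    return c->_hits;
}
```

The `static` keyword already gives the helper internal linkage, so the underscore is not needed to keep it private to the file.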

valleyer 2 days ago

You're right, and I guess I've been breaking that rule for a while. What's the purpose there? The double-underbar and underbar-capital rules seem to be allowing for non-conflicting introduction of keywords. Is the single-underbar rule to protect standard library headers or something?

uecker 2 days ago

Reserved identifiers in general are not only for future use of the standard, but also for internal use of implementations and for extensions. The single-underbar rule is to give the programmer some part of the _... namespace which is otherwise reserved for the implementation, i.e. for use in macros, variables, or struct members. The intention is for use in hidden names in some API.

oraziorillo 2 days ago

Thank you for pointing this out, and to those adding clarifications to this statement as well. I've fixed the issue by renaming the reserved identifiers.

oraziorillo 2 days ago

[deprecated post-edit] I guess I used function names beginning with underscore as it didn’t occur to me that it might be un-idiomatic. The intention was to make clear to myself that those functions are private and meant to be used only in that file. [\deprecated post-edit]

About the second paragraph, first of all, thank you for the suggestions. Can I ask you to elaborate a little on the reasons for your proposals? For instance, even if redundant in some cases, I thought to myself it couldn’t be a bad thing to check for null pointers (though I could improve the error handling itself).

uecker 2 days ago

In C, you would typically rely much more on tooling to find bugs (but there are different styles and opinions). Checking for null is not bad, but does not usually add anything. If you de-reference a null pointer, you get a segmentation fault (which is safe) and a debugger will give a nice backtrace. So why catch this by writing additional code if the right tool will give you this automatically? A sanitizer could also add such tests automatically.

For a similar reason, it makes sense to use signed integers. A signed overflow sanitizer will find the overflow bugs or safely terminate the program. Finding unsigned wraparound bugs is much harder.
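A minimal sketch of the difference, using hypothetical helper names. Stepping back from index 0 wraps silently with an unsigned type, because wraparound is well defined and therefore not something an undefined-behavior sanitizer reports. With a signed type the same step gives -1, which a range check catches, and a true overflow is undefined behavior that GCC or Clang can trap with `-fsanitize=signed-integer-overflow`:

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned: at i == 0 the result wraps around to SIZE_MAX. This is well
 * defined in C, so it looks like a valid (huge) index. */
size_t prev_index_unsigned(size_t i)
{
    return i - 1;
}

/* Signed: at i == 0 the result is -1, an obviously invalid index. */
ptrdiff_t prev_index_signed(ptrdiff_t i)
{
    return i - 1;
}

/* Range check for a signed index; returns 1 if i is within [0, len). */
int index_ok(ptrdiff_t i, ptrdiff_t len)
{
    return 0 <= i && i < len;
}
```

The unsigned result only shows up as a bug later, when it is used to index memory, whereas the signed result fails the range check immediately.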

oraziorillo 2 days ago

I see. I agree that checking for null in particular really comes down to style and opinion, and I don't yet have one I can call mine. Thanks for the explanation!

srean 2 days ago

I have a different view about checking for NULL.

I would suggest you keep checking for NULLs. It's a good habit that keeps you watching over details and cognizant of edge cases. There are tools of course but they are neither standard nor as ubiquitous as C.

Sloppiness becomes a habit and creeps into other aspects.

oraziorillo 2 days ago

My experience on this project was that the checks proved to be useful way before I could run the program for the first time.

The first version of the code had relatively few null checks. Later I went through and added them consistently at function boundaries, and that process ended up revealing a surprising number of bugs and bad assumptions.

Sanitizers and static analyzers could have surely caught many of these issues later, but adding the checks was a useful way to reason about the code while writing it. It felt less defensive and more proactively preventive.
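As an illustration of checks at a function boundary, here is a sketch with a hypothetical `buffer` type and status codes (not the project's actual API). Writing out the checks forces a decision about what each argument is allowed to be:

```c
#include <stddef.h>

/* Hypothetical status codes and type, for illustration only. */
enum status {
    STATUS_OK = 0,
    STATUS_NULL_ARG = -1,
    STATUS_OUT_OF_RANGE = -2
};

struct buffer {
    double *data;
    size_t len;
};

/* Copies element i into *out. Every pointer is validated on entry, so a
 * bad call is reported to the caller instead of crashing. */
enum status buffer_get(const struct buffer *b, size_t i, double *out)
{
    if (b == NULL || b->data == NULL || out == NULL)
        return STATUS_NULL_ARG;
    if (i >= b->len)
        return STATUS_OUT_OF_RANGE;
    *out = b->data[i];
    return STATUS_OK;
}
```

The cost is that every caller now has a status code to handle or ignore.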

srean 2 days ago

Yeah you can have a lighter work desk that way. You bring in the heavy tools later.

uecker a day ago

Interestingly, I would see it just the other way round. It is easier for me to reason about the code if it is not cluttered with null checks, and I would see a null sanitizer as a very light-weight tool. I may put in an assert once in a while if such an assumption needs to be made very explicit.
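A sketch of that style, with a hypothetical `mean` function: the preconditions are stated once with `assert` rather than handled as run-time error paths, and the body stays free of checks. The asserts compile out when `NDEBUG` is defined, and a build with GCC's or Clang's `-fsanitize=null` can catch null dereferences without any source changes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. The asserts document what the function assumes;
 * they are not an error-handling path. */
double mean(ptrdiff_t n, const double *xs)
{
    assert(xs != NULL);
    assert(n > 0);

    double sum = 0.0;
    for (ptrdiff_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}
```

A caller that violates the assumptions aborts at the assert in a debug build, with the failing expression and line number in the message.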