> For that, you needed CGI scripts, which meant learning Perl or C. I tried learning C to write CGI scripts. It was too hard. Hundreds of lines just to grab a query parameter from a URL. The barrier to dynamic content was brutal.
That's folk wisdom, but is it actually true? "Hundreds of lines just to grab a query parameter from a URL."
#include <stdlib.h>   /* getenv */
#include <string.h>   /* strstr, strchr, strlen, strdup, strndup */
/*@null@*/
/*@only@*/
char *
get_param (const char * param)
{
const char * query = getenv ("QUERY_STRING");
if (NULL == query) return NULL;
char * begin = strstr (query, param);
if ((NULL == begin) || (begin[strlen (param)] != '=')) return NULL;
begin += strlen (param) + 1;
char * end = strchr (begin, '&');
if (NULL == end) return strdup (begin);
return strndup (begin, end-begin);
}
In practice you would probably parse all parameters at once and maybe use a library. I recently wrote a survey website in pure C. I considered Python first, but due to having written an HTML generation library earlier, it was quite a cakewalk in C. I also used the CGI library of my OS, which, granted, was some of the worst code I have ever refactored, but afterwards it was quite nice. Also, SQLite is awesome. In the end I statically linked it, so I got a single binary I can upload anywhere. I don't even need to set up a database file; the program does that itself. It could also be tested without a webserver, because the CGI library supports passing variables over stdin. My program then outputs the webpage on stdout.
So my conclusion is: CRUD websites in C are easy and actually a breeze. Maybe that also has my previous conclusion as a prerequisite: HTML represents a tree and string interpolation is the wrong tool to generate a tree description.
Good showcase. Your code will match the first parameter that has <param> as a suffix, not necessarily <param> exactly (username=blag&name=blub will return blag). It also doesn't handle any percent-encoding.
Further, when retrieving multiple parameters, you have a Shlemiel-the-painter algorithm.
https://www.joelonsoftware.com/2001/12/11/back-to-basics/
Thanks, he's a good author; I like reading him too. Honestly, not parsing the whole query string at once feels kind of dumb. To quote myself:
> In practice you would probably parse all parameters at once and maybe use a library.
> Your code will match the first parameter that has <param> as a suffix, no necessarily <param> exactly
Depending on your requirements, that might be a feature.
> It also doesn't handle any percent encoding.
This does literal matches, so yes, you would need to pass the param already percent-encoded. This is a trade-off I made, not for that case, but for similar issues. I don't like non-ASCII in my source code, so I would want to encode it in some way anyway.
But you are right, you shouldn't put this into a generic library. Whether it suffices for your project depends on your requirements.
This exact mindset is why so much software is irreparably broken and riddled with CVEs.
Written standard be damned; I’ll just bang out something that vaguely looks like it handles the main cases I can remember off the top of my head. What could go wrong?
Most commenters seem to miss that this is throwaway code for HN, with a maximum allocated time of five minutes. I wouldn't commit it like this. The final code did cope with percent-encoding, even though the project didn't take any user-generated values at all. And I did read the RFCs, which honestly most developers I meet don't care to do. I also made sure the percent-decoding function did not rely on ASCII ordering (it only relies on A-Z being contiguous), because of portability (EBCDIC), and because I have some professional honor.
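For illustration, a percent-decoder along those lines might look like this. This is a sketch of the idea, not the commenter's actual code; it only relies on '0'-'9', 'A'-'F', and 'a'-'f' being contiguous digit/letter runs, which holds in EBCDIC as well as ASCII:

```c
#include <stdlib.h>
#include <string.h>

/* Decode one hex digit; returns -1 on invalid input. */
static int hex_val (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return 10 + (c - 'A');
    if (c >= 'a' && c <= 'f') return 10 + (c - 'a');
    return -1;
}

/* Returns a freshly allocated decoded copy of s, or NULL on
 * malformed input or allocation failure. '+' becomes a space. */
char *
percent_decode (const char * s)
{
    char * out = malloc (strlen (s) + 1);  /* decoding never grows the string */
    if (NULL == out) return NULL;
    char * w = out;
    while (*s) {
        if ('%' == *s) {
            int hi = hex_val (s[1]);
            int lo = (hi >= 0) ? hex_val (s[2]) : -1;
            if (lo < 0) { free (out); return NULL; }
            *w++ = (char) (hi * 16 + lo);
            s += 3;
        } else if ('+' == *s) {
            *w++ = ' ';
            s++;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
    return out;
}
```

The caller owns the returned buffer and must free it; malformed sequences are rejected outright rather than passed through.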
I get that, but your initial comment implied you were about to showcase a counter to "Hundreds of lines just to grab a query parameter from a URL", but instead you showed "Poorly and incompletely parsing a single parameter can be done in less than 100 lines".
You said you allocated 5 minutes max to this snippet; well, in PHP this would be 5 seconds and 1 line. And it would be a proper solution.
And in C the code looks like this, which is also a proper solution; I did not measure the time it took me to write it.
If you allow for GCC extensions, it looks like this:

That would fail on a user supplying multiple values where you don't expect them.
> If multiple fields are used (i.e. a variable that may contain several values) the value returned contains all these values concatenated together with a newline character as separator.
In GP’s defense, there is no standard behavior in the spec for handling repeated GET query parameters. Therefore any implementation-defined behavior is reasonable, including: keeping only the first, keeping only the last, keeping one at random, allowing access to all of them, concatenating them all with a separator, discarding the entire thing, etc.
Why? The actual implementation of cgiGetValue I am talking about does exactly that:
> concatenated together with a newline character
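A sketch of what that concatenation behavior could look like (hypothetical; I don't know the actual cgiGetValue internals, and this does no percent-decoding): collect every value for the name and join them with '\n'.

```c
#include <stdlib.h>
#include <string.h>

/* Collect every value of `name` in `query` (e.g. "a=1&b=2&a=3"),
 * joined with '\n' as the separator ("1\n3" for name "a").
 * Matches the name exactly, not as a suffix.
 * Returns NULL if the name does not occur, or on allocation failure. */
char *
get_all (const char * query, const char * name)
{
    size_t nlen = strlen (name);
    char * out = malloc (strlen (query) + 1);  /* result cannot be longer */
    if (NULL == out) return NULL;
    size_t used = 0;
    int matched = 0;
    const char * p = query;
    while (*p) {
        const char * end = strchr (p, '&');
        size_t plen = end ? (size_t) (end - p) : strlen (p);
        /* exact match of the name, followed by '=' */
        if (plen > nlen && '=' == p[nlen] && 0 == memcmp (p, name, nlen)) {
            if (matched) out[used++] = '\n';
            memcpy (out + used, p + nlen + 1, plen - nlen - 1);
            used += plen - nlen - 1;
            matched = 1;
        }
        p += plen + (end ? 1 : 0);
    }
    if (!matched) { free (out); return NULL; }
    out[used] = '\0';
    return out;
}
```

Since the newlines replace the '&' separators, the output can never exceed the input length, so a single allocation up front suffices.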
Ampersands are ASCII, but also need to be encoded to be in a parameter value.
Yeah, but you can totally choose to not allow that in your software.
That's true. Your argument about how short parameter extraction can be gets a little weaker, though, if you only solve it for the easy cases. Code can be shorter if it solves a simplified version of the problem statement.
> That's folk wisdom, but is it actually true? "Hundreds of lines just to grab a query parameter from a URL."
No, because...
> In practice you would probably parse all parameters at once and maybe use a library.
In the 90s I wrote CGI applications in C; a single function, on startup, parsed the request params into an array (today I'd use a hashmap, but I was very young then and didn't know any better) of `struct {char *name; char *value}`. It was paired with a `get(const char *name)` function that returned the `const char *` value for the specified name.
TBH, a lot of the "common folk wisdom" about C has more "common" in it than "wisdom". I wonder what a C library would look like today, for handling HTTP requests.
Maybe a hashmap for request params, a union for the `body` depending on content-type, a tree library for JSON parsing/generation, an arena allocator per request, a thread pool, etc.
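A rough sketch of that parse-once design (the identifiers and fixed-size array are my invention, not the original 90s code): tokenize QUERY_STRING once, in place, into name/value pairs, then do lookups against the array.

```c
#include <stdlib.h>
#include <string.h>

/* One request parameter; no percent-decoding in this sketch. */
struct param { char *name; char *value; };

static struct param params[64];
static size_t nparams;

/* Tokenize a writable copy of the query string ("a=1&b=2") once,
 * on startup. Destroys the copy in place by writing '\0' over
 * every '&' and '='. */
void
parse_query (char * q)
{
    nparams = 0;
    while (q && *q && nparams < 64) {
        char * next = strchr (q, '&');
        if (next) *next++ = '\0';
        char * eq = strchr (q, '=');
        if (eq) {
            *eq = '\0';
            params[nparams].name  = q;
            params[nparams].value = eq + 1;
            nparams++;
        }
        q = next;
    }
}

/* Linear lookup; returns NULL when the name is absent. */
const char *
get (const char * name)
{
    for (size_t i = 0; i < nparams; i++)
        if (0 == strcmp (params[i].name, name))
            return params[i].value;
    return NULL;
}
```

In a real CGI program you would call `parse_query(strdup(getenv("QUERY_STRING")))` once at startup, and every later `get()` is a cheap lookup instead of a rescan of the whole query string.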
Just FYI, the *'s got swallowed by the HN formatting and made the stuff in between italic.
You can bypass the formatting by prepending 2 spaces:

  > Just FYI the *'s got swallowed
Ironically enough your *'s as well.
> I wonder what a C library would look like today, for handling HTTP requests.
Isn't that CURL?
>> I wonder what a C library would look like today, for handling HTTP requests.
> Isn't that CURL?
Curl sends requests, it doesn't handle incoming requests.
> HTML represents a tree and string interpolation is the wrong tool to generate a tree description.
Yet 30 years later it feels like string interpolation is the most common tool. It probably isn't, but still surprisingly common.
Which is really sad. This is the actual reason why I preferred C over Python[*] for that project: so I could use my own library for HTML generation, which does exactly that. It also ameliorates the `goto cleanup;` pattern, since now you can just tell the library to throw subtrees away. And the best thing is that you can MOVE and COPY them, which means you can generate code once, then fill it with the data, and still modify it later. This means you can also refer to earlier generated values to generate something else, without needing to store everything twice or reparse your own output.
[*] I mean yeah, I could have written a wrapper, but that would have taken far more time.
The thing is, the browser needs the tree, but the server doesn't really need the whole tree.
Building the tree on the server is usually wasted work. There aren't a lot of tree-oriented output-as-you-make-it libraries.
My point is that treating it as the tree it is, is the only way to really make it impossible to produce invalid HTML. You could also actually validate not just the syntax, but also the semantics.
> There aren't a lot of tree-oriented output-as-you-make-it libraries.
That was actually the point of my library, although I must admit I haven't implemented streaming the HTML output before the whole tree has been composed. It isn't actually that complicated; what I would need to implement is making part of the tree immutable, so that the HTML for it can already be generated.
There was a system with dependent types that ruled out invalid HTML at compile time, even dynamically generated HTML (rather than a runtime error, you would get a compile error if your code did something wrong):
https://github.com/urweb/urweb
http://www.impredicative.com/ur/
Needless to say, it wasn't very practical. But there was one commercial site written in it: https://github.com/bazqux/bazqux-urweb (the site still exists, but I'm not sure if it's still written in Ur/Web).
It's still written in Ur/Web. And the type-safety of Ur/Web is the reason I started writing it -- I couldn't imagine myself using untyped JavaScript.
Ur/Web is not very practical for reasons other than type safety: the lack of libraries and slow compilation when the project gets big. The language itself is good, though.
Nowadays, I would probably choose OCaml. It doesn't have Ur/Web's high-level features, but it's typed and compiles quickly.
Whoa, that's cool!
What do you think about TypeScript? I mean, it's unsound, but it sounds like an OK compromise.
Didn't use TypeScript, so can't say much. I guess some types are better than no types, and it's easier to use for JavaScript developers.
Personally, I'd prefer a fully typed language.