> When implementing logging, it's important to distinguish between an error from the perspective of an individual operation and an error from the perspective of the overall program or system. Individual operations may well experience errors that are not error level log events for the overall program. You could say that an operation error is anything that prevents an operation from completing successfully, while a program level error is something that prevents the program as a whole from working right.
This is a nontrivial problem when using properly modularized code and libraries that perform logging. They can’t tell whether their operational error is also a program-level error, which can depend on usage context, but they still want to log the operational error themselves, in order to provide the details that aren’t accessible to higher-level code. This lower-level logging has to choose some status.
Should only “top-level” code ever log an error? That can make it difficult to identify the low-level root causes of a top-level failure. It also can hamper modularization, because it means you can’t repackage one program’s high-level code as a library for use by other programs, without somehow factoring out the logging code again.
This is why it’s almost always wrong for library functions to log anything, even on “errors”. Pass the status up through return values or exceptions. As a library author you have no clue as to how an application might use it. Multithreading, retry loops, and expected failures will turn what’s a significant event in one context into something not even worthy of a debug log in another. No rule without exceptions, of course: one valid case is truly slow operations where progress reports are expected. Modern tracing telemetry with sampling can be another solution for the paranoid.
Depending on the language and logging framework, debug/trace logging can be acceptable in a library. But you have to be extra careful to make sure that it's ultimately a no-op.
A common problem in Java is that someone will drop in a log that looks something like this: `log.trace("Doing " + foo + " to " + bar);`
The problem is that, especially in a hot loop, that throwaway string concatenation can ultimately be a performance problem, particularly if `foo` or `bar` have expensive `toString` methods.
The proper way to do something like this in Java is either of the following.
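Something like this, assuming the SLF4J API:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class Example {
    private static final Logger log = LoggerFactory.getLogger(Example.class);

    void work(Object foo, Object bar) {
        // 1) Parameterized form: the framework checks the level before
        //    formatting, so the toString calls only happen if TRACE is on.
        log.trace("Doing {} to {}", foo, bar);

        // 2) Guarded form: nothing inside the block, including the
        //    concatenation, is evaluated unless TRACE is enabled.
        if (log.isTraceEnabled()) {
            log.trace("Doing " + foo + " to " + bar);
        }
    }
}
```

(SLF4J's one- and two-argument `trace` overloads also avoid the varargs array allocation.)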
Ideally a logging library should at least not make it easy to make that kind of mistake.
This isn't really something the logging library can do. If the language provides a string interpolation mechanism then that mechanism is what the programmers will reach for first. And the library cannot know that interpolation happened because the language creates the final string before passing it in.
If you want the built-in interpolation to become a no-op in the face of runtime log disabling, then the logging library has to be a built-in too.
I feel like there's a parallel with SQL where you want to discourage manual interpolation. Taking inspiration from it may help: you may not fully solve it but there are some API ideas and patterns.
A logging framework may have the equivalent of prepared statements. You may also nudge usage where the raw string API is `log.traceRaw(String rawMessage)` while the parametrized one has the nicer naming `log.trace(Template t, param1, param2)`.
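A minimal sketch of that shape, with hypothetical names (`Template`, `traceRaw`), not any real framework:

```java
// Hypothetical API: templates are built once, like prepared statements,
// and the raw-string entry point deliberately gets the uglier name.
final class Template {
    private final String pattern; // e.g. "Doing {} to {}"

    Template(String pattern) { this.pattern = pattern; }

    String render(Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int i = pattern.indexOf("{}", from);
            if (i < 0) break;
            sb.append(pattern, from, i).append(arg);
            from = i + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }
}

interface Log {
    boolean traceEnabled();
    void emit(String message);

    // The nicely named, parameterized call: renders only if TRACE is on.
    default void trace(Template t, Object... args) {
        if (traceEnabled()) emit(t.render(args));
    }

    // The discouraged raw-string escape hatch.
    default void traceRaw(String rawMessage) {
        if (traceEnabled()) emit(rawMessage);
    }
}
```

Usage would be `log.trace(new Template("Doing {} to {}"), foo, bar);`, ideally with the template hoisted into a constant.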
You can have 0 parameters and the template is a string...
The point of my message is that you should avoid the `log(string)` signature. Even if it's appealing, it's an easy perf trap.
There are many ideas if you look at SQL libs. In my example I used a different type, but there are other solutions. Be creative.
And none of those solve the issue.
You pass "foo" to Template. The Template will be instantiated before log ever sees it. You conveniently left out where the Foo string is computed from something that actually need computation.
Like both:
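Presumably examples along these lines, with `expensive()` standing in for anything that actually does work:

```java
// Both evaluate expensive() before the logger can check the level:

// 1) The concatenation happens while constructing the Template...
logger.log(new Template("result: " + expensive()));

// 2) ...and an argument is computed before it is passed, even in the
//    parameterized form.
logger.log(RESULT_TEMPLATE, expensive());
```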
You just complicated everything to get the same class of error. Heck, even the existing good way of doing it, which is less complicated than your way, still isn't safe from it.
All your examples have the same issue, both with plain string concatenation and with more expensive calls. You can only get around an unknowing or lazy programmer if the compiler is smart enough to skip these entirely (JIT or not; a JIT would need to see that these calls never amount to anything and decide to skip them after a while, which isn't deterministically useful of course).

Yeah, it's hard to prevent a sufficiently motivated dev from shooting themselves in the foot; but these still help.
> You conveniently left out where the Foo string is computed from something that actually needs computation.
I left it out because the comment I was replying to was pointing out that some logs don't have params.
For the approach using a `Template` class, the expectation would be that the docs call out why this class exists in the first place, namely to enable lazy computation. Doing string concatenation inside a template constructor should raise a few eyebrows when writing or reviewing code.
I wrote `logger.log(new Template("foo"))` in my previous comment for brevity as it's merely an internet comment and not a real framework. In real code I would not even use stringy logs but structured data attached to a unique code. But since this thread discusses performance of stringy logs, I would expect log templates to be defined as statics/constants that don't contain any runtime value. You could also integrate them with metadata such as log levels, schemas, translations, codes, etc.
Regarding the args themselves, you're right that they can also be expensive to compute in the first place. You may then design the args to be passed as callbacks, which would allow deferring the param computation.
A possible example would be:
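A minimal sketch, assuming a hypothetical logger whose params arrive as `Supplier`s, so they are only evaluated when the level is enabled:

```java
import java.util.function.Supplier;

final class LazyLogger {
    volatile boolean traceEnabled;

    // Nothing is computed unless TRACE is actually on.
    void trace(String pattern, Supplier<?>... args) {
        if (!traceEnabled) return;
        Object[] values = new Object[args.length];
        for (int i = 0; i < args.length; i++) {
            values[i] = args[i].get();
        }
        System.out.println(String.format(pattern, values));
    }
}

// Usage: expensiveToString() only runs when tracing is enabled.
// logger.trace("state: %s after %s", () -> foo.expensiveToString(), () -> bar);
```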
This is still not perfect, as you may need to compute some data before the log "just in case" you need it for the log. For example, you may want to record the current time, then do the operation. If the operation times out, you use the time recorded before the op to compute how long it ran. If you did not time out and don't log, then getting the current system time is "wasted".

All I'm saying is that `logger.log(str)` is not the only possible API, and that splitting the definition of the log from the actual "emit" is a good pattern.
Unless log() is a macro of some sort that expands to if(logEnabled){internalLog(string)} - which a good optimizer will see through and not expand the string when logging is disabled.
Ideally, but realistically, I have never heard of any major programming language that allows you to express "this function only accepts static constant string literal".
Python has LiteralString for this exact purpose. It's only on the type checker level, but type checking should be part of most modern Python workflows anyway. I've seen DB libraries use this a lot for SQL parameters.
https://typing.python.org/en/latest/spec/literal.html#litera...
Beyond LiteralString there are now also t-strings, introduced in Python 3.14, which ease how one writes templated strings without losing out on security. Java has something similar with the StringTemplate class, previewed in Java 21.
We have this in c++ at Google. It's like securitytypes::StringLiteral. I don't know how it works under the hood, but it indeed only allows string literals.
Even PHP has that these days via static analysis https://phpstan.org/writing-php-code/phpdoc-types#other-adva...
C++20 offers `consteval` to make this clear, but you can do some simple macro wizardry in C++11 to do this:
(The re-evaluation of x doesn't matter if it compiles.) You can also use a user-defined literal, which has a different ergonomic problem.

In Rust, this can almost be expressed as `arg: &'static str` to accept a reference to a string whose lifetime never ends. I say “almost” because this allows both string literals and references to static (but dynamically generated) strings.
For Rust’s macros, a literal can be expressed as `$arg:lit`. This does allow other literals as well, such as int or float literals, but typically the generated code would only work for a string literal.
Not the language, but the linter can do it. IntelliJ inspections warn you if you do it: https://www.jetbrains.com/help/inspectopedia/StringConcatena...
it does seem like something a good static analysis tool should be able to catch though
This is not true. Any modern Java compiler will generate identical bytecode for both. Try it yourself and see! As a programmer you do not need to worry about such details, this is what the compiler is for. Choose whatever style feels best for you.
> Any modern Java compiler will generate identical bytecode for both. Try it yourself and see!
You may be misunderstanding something here.
If you follow the varargs-style recommendation, then concatenation occurs in the log class.
If you follow the guard-style recommendation, then the interpolated expressions will not be evaluated unless the log level matches.
In the naive approach, concatenation always occurs and all expressions which are part of the interpolation will be evaluated no matter the log level.
Could it be that you were thinking about StringBuffer vs. concatenation, an entirely unrelated problem?
Still, I quite like the Windows log approach, which (if logged) stores the template as just its id, along with the values, saving lots of storage as well, e.g. 123, foo, bar. You can concatenate in the reader.
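Roughly this idea, sketched with made-up types (the real Windows/ETW format is binary and more involved):

```java
import java.util.List;
import java.util.Map;

// The writer persists only the template id plus the raw values...
record LogRecord(int templateId, List<Object> values) {}

class LogReader {
    // ...and the reader owns the template table, paying the formatting
    // cost only when a human actually looks at the record.
    private static final Map<Integer, String> TEMPLATES =
            Map.of(123, "Doing %s to %s");

    static String render(LogRecord r) {
        return String.format(TEMPLATES.get(r.templateId()), r.values().toArray());
    }
}
// Writer side: persist(new LogRecord(123, List.of("foo", "bar")))
```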
So, it costs perf every time it’s read, instead of when it’s written (once). And of course has a lot of overhead to store metadata. Bad design. As usual.
Most logs are probably never read, but they nevertheless should be written (fast) for the unexpected situations when you will later need them. And logging has to be fast, with minimal performance overhead.
> The problem is, especially in a hot loop ... The proper way to do something like this in java is either log.trace(..., ...) or if (log.traceEnabled()) log.trace(...)
The former still creates strings, for the garbage collector to mop up even when log.traceEnabled() is false, no?
Also, even if the former or latter is implemented as:
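i.e. presumably something like:

```java
void trace(String message) {
    if (traceEnabled) {
        emit(message); // the check happens after the caller already built the string
    }
}
```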
Most optimising JIT compilers will hoist the if-condition when tracing is disabled, anyway.

How about wrapping the log.trace param in a lambda and monkeypatching log.trace to take a function that returns a string, and of course pushing the conditional into the monkeypatched func?
Then you still have the overhead of the log.trace function call and the lambda construction (which is not cheap because it has closure over the params being logged and is passed as a param to a function call, so probably gets allocated on the heap)
>Then you still have the overhead of the log.trace function call
That's not an overhead at all, and even if it were, it's not comparable to string concatenation.

Regarding the overhead of the lambda and copying params: it depends on the language, but usually strings are passed by reference and pass-by-value args are just one word long, so we are talking one cycle per variable and 8 bytes of memory, which were already paid anyway.
That said, logging functions that just take a list of vars are even better, like python's print()
> def printtrace(*args):
>     if trace:
>         print(*args)
>
> printtrace("var x and y", x, y)
Python gets a lot of flak for being a slow language, but you get so much expressiveness that you can invest in optimization after paying a flat cycle cost.
That’s what most languages, including Java do.
The problem the OP is pointing out is that some programmers are incompetent and do string concatenation anyway. A mistake which if anything is even easier in Python thanks to string interpolation.
That is why the popular `tracing` crate in Rust uses macros for logging instead of functions. If the log level is too low, it doesn't evaluate the body of the macro
Does that mean the log level is a compilation parameter? Ideally, log levels shouldn't even be startup parameters; they should be changeable on the fly, at least for any server-side code. Having to restart is bad enough; having to recompile to get debug logs would be an extraordinary nightmare (not only do you need to get your customers to reproduce the issue with debug logs, you actually have to ship them new binaries, which likely implies export controls and security validations, etc.).
I don't know how Rust does it, but my internal C++ framework has a global static array so that we can look up the current log level quickly, and change it at runtime as needed. It is very valuable to turn on specific debug logs at times, when someone has a problem and we want to know what some code is doing.
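The same trick in Java might look like this sketch (names made up; the point is a lookup cheap enough for every log call but writable at runtime):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

final class LogLevels {
    static final int NETWORK = 0, STORAGE = 1, UI = 2; // one slot per subsystem

    private static final AtomicIntegerArray levels = new AtomicIntegerArray(3);

    static boolean enabled(int subsystem, int level) {
        return level >= levels.get(subsystem);
    }

    // Called from an admin endpoint or console to turn debug logs on live.
    static void set(int subsystem, int level) {
        levels.set(subsystem, level);
    }
}
```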
I know this is standard practice, but I personally think it's more professional to attach a gdb-like debugger to a process instead of depending on coded log statements.
A very common thing that will happen in professional environments is that you ship software to your customers, and they will occasionally complain that in certain situations (often ones they don't fully understand) the software misbehaves. You can't attach a debugger to your customer's setup that had a problem over the weekend and got restarted: the only solution to debug such issues is to have had programmed logs set up ahead of time.
In my professional life, somewhere over 99% of time, the code suffering the error has either been:
1. Production code running somewhere on a cluster.
2. Released code running somewhere on an end-user's machine.
3. Released production code running somewhere on an end-user's cluster.
And errors happen at weird times, like 3am on a Sunday morning on someone else's cluster. So I'd just as soon not have to wake up, figure out all the paperwork to get access to some other company's cluster, and then figure out how to attach a debugger. Especially when the error is some non-reproducible corner case in a distributed algorithm that happens once every few months, and the failing process is long gone. Just no.
It is so much easier to ask the user to turn up logging and send me the logs. Nine times out of ten, this will fix the problems. The tenth time, I add more logs and ask the user to keep an eye open.
I think I get the idea: gdb is too powerful. For contexts where the operator is distinct from the manufacturer, the debug/logging tool needs to be weaker and not ad hoc, so that it can be audited and doesn't exfiltrate user data.
It's not so much about power, but about the ad-hoc nature of attaching a debugger. If you're not there to catch and treat the error as it happens, a debugger is not useful in the slightest: by the time you can attach it, the error, or the context where it happened, are long gone. Not to mention, even if you can attach a debugger, it's most often not acceptable to pause the execution of the entire process for you to debug the error.
Especially since a lot of the time an exception being raised is not the actual bug: the bug happened many functions before. By logging key aspects of the state of the program, even in non-error cases, when an error happens, you have a much better chance of piecing together how you got to the error state in the first place.
The idea in Java is to let the JIT optimise away the logging code.
This is more flexible as it still allows runtime configuration of the logging level.
The OP is simply pointing out that some programmers are incompetent and call the trace function incorrectly.
I very much appreciate libraries that provide optional logging. Tracing error causes in network protocol calls can be pretty near impossible without throwing a library/package/crate/whatever into TRACE mode.
Of course they shouldn't just be dumping text to stdout/stderr, but as long as the library logging is optional (or only logs when the library has reached some kind of unrecoverable state with instructions to file a bug report), logging is often the right call.
It's easier to have logs and turn them off at compile time/runtime than to not have logs and need them once deployed.
What you are proposing sounds like a nightmare to debug. The high level perspective of the operation is of course valuable for determining if an investigation is necessary, but the low level perspective in the library code is almost always where the relevant details are hiding. Not logging these details means you are in the dark about anything your abstractions are hiding from higher level code (which is usually a lot)
Those details don't belong in the error log level, that's what info or trace is for.
They were replying to a person who says “it’s almost always wrong for library functions to log anything”. Not just errors.
If it’s not your code how is a log useful vs returning an error?
Even relatively complex operations, like say converting this document into a PDF, basically have only two useful states: either it worked or something specific failed, at which point just tell me that thing.
Now independent software like web servers or databases can have useful logs because they have completely independent interfaces with the outside world. But I call libraries; they don’t call me.
That’s a very simple operation. Try “take these 100 user generated pdfs and translate all of them”. Oh, “cannot parse unexpected character 0x001?” Cool beans, I wish I knew more.
That’s ok, I’ll just check the log. 50MB of ‘This is my happy place.’ followed by a one-liner “cannot parse unexpected character 0x001?”
Any library can do a bad job here, that doesn’t come down to logging vs error messages.
The unspoken assumption you are making is that anyone who would disagree with your philosophy on this is incompetent.
Being incorrect doesn’t imply general incompetence.
Your statement that logging would contain zero useful information indicates an assumption of incompetence.
No, I’m only saying a useless error code and a useless log are both possible. Either could be useful or they could both be useless because the creator was actively malicious etc. Thus, the possibility of a useless error code doesn’t inherently mean a log would improve things.
Really the only thing we can definitively say is that when both approaches are executed well, it’s harder to use log entries in your code. If something returns an error, that’s tied to a specific call from a specific bit of code, whereas a log entry could in theory be from anything.
Trace can become so voluminous that it is switched on only on an as-needed basis, which can be too late for rare events. Also, the trace level, being more of a debugging tool, tends to be less scrutinized for exposing sensitive data, making it unsuitable for continuous operation or use in live production.
Simple: include those relevant details in the exceptions instead of hiding them.
It’s not that simple. First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order.
Second, it can lose information about at what exact time and in what exact order things happened. For example, cleanup operations during stack unwinding can also produce log messages, and then it’s not clear anymore that the original error happened before those.
Even when you include a timestamp at each level, that’s often not sufficient to establish a unique ordering, unless you add some sort of unique counter.
It gets even more complicated when exceptions are escalated across thread boundaries.
> First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order
Personally I don't mind it... the whole "$outer: $inner" convention naturally lends itself to messages that still parse in my brain and actually include the details in a pretty natural way. Something like:
"Error starting up: Could not connect to database: Could not read database configuration: Could not open config file: Permission denied"
Tells me the config file for the database has broken permissions. Because the permission-denied error caused a failure opening the config file, which caused a failure reading the database configuration, which caused a failure connecting to the database, which caused an error starting up. It's deterministic in that for "$outer: $inner", $inner always caused $outer.
Maybe it's just experience though, in a sense that it takes a lot of time and familiarity for someone to actually prefer the above. Non-technical people probably hate such messages and I don't necessarily blame them.
Sometimes you don’t have all the relevant details in scope at the point of error. For instance some recoverable thing might have happened first which exercises a backup path with slightly different data. This is not exception worthy and execution continues. Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck. Logging can tell you the steps that led up to that, which is useful. Of course you need a way to deal with verbose logs effectively, but such systems aren’t exactly rare these days.
> Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck.
Then catch the exception on the backup path and wrap it in a custom exception that conveys to the handler the fact that you were on the backup path. Then throw the new exception.
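A sketch of that, with made-up exception and method names:

```java
Result process(Input data) {
    try {
        return primaryPath(data);
    } catch (RecoverableException first) {
        try {
            return backupPath(data); // slightly different data, as described
        } catch (BackendException second) {
            // Preserve how we got here, not just where we got stuck:
            throw new BackupPathException(
                    "backup path (taken because: " + first.getMessage() + ") failed",
                    second);
        }
    }
}
```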
At the extreme end: If my Javascript frontend is being told about a database configuration error happening in the backend when a call with specific parameters is made - that is a SERIOUS security problem.
Errors are massaged for the reader: a database access library will know that a DNS error occurred and that this (the first step for debugging) is why it cannot connect to the specified datastore. The service-layer caller does not need to know that there was a DNS error; it just needs to know that the specified datastore is uncontactable (and then it can move on to the appropriate resilience strategy: retry that same datastore, fall back to a different datastore, or tell the API that it cannot complete the call at all).
The caller can then decide what to do (typically say "Well, I tried, but nothing's happening, have yourself a merry 500").
It makes no sense for the Service level to know the details of why the database access layer could not connect, no more than it makes any sense for the database access layer to know why there is a DNS configuration error - the database access just needs to log the reasons (for humans to investigate), and tell the caller (the service layer) that it could not do the task it was asked to do.
If the service layer is told that the database access layer encountered a DNS problem, what is it going to do?
Nothing, the best it can do is log (tell the humans monitoring it) that a DB access call (to a specific DB service layer) failed, and try something else, which is a generic strategy, one that applies to a host of errors that the database call could return.
That’s how we get errors like “file not found”, without a file name. A pain for mankind.
Not all problems cause exceptions.
That's a matter of good taste, but there's nothing preventing you from throwing exceptions on every issue and requiring consumers to handle them
Imagine you have a caching library that handles DB fallback. A cache that should be there but goes missing is arguably an issue.
Should it throw an exception for that to let you know, or should it gracefully fall back so your service stays alive? The middle ground is leaving a log and chugging along; your proposition throws that out of the window.
You can log your IO and as long as your functions are idempotent that should be enough info to replicate.
Assuming everything is idempotent is a tall order.
There are a lot of libraries that have non-idempotent actions. There are a lot of inputs that can be problematic to log, too.
Say like opening a file?
I guess in those cases standard practice is for the lib to return a detailed error, yeah.
As far as traces go, trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?
From my experience working on B2B applications, I am happy that everything is generally spammed to the logs because there would simply be no other reasonable way to diagnose many problems.
It is very, very common that the code that you have written isn't even the code that executes. It gets modified by enterprise anti-virus or "endpoint security". All too often I see "File.Open" calls return true, telling the caller it has access, but actually what's happened is the AV has intercepted the call, blocked it improperly, and returned a 0-byte file that exists (even though there is actually a larger file there) instead of saying the file cannot be opened.
I will never, in a million years, be granted access to attach a debugger to such a client computer. In fact, they will not even initially disclose that they are using anti virus. They will just say the machine is set up per company policy and that your software doesn't work, fix it. The assumption is always that your software is to blame and they give you nearly nothing, except for the logs.
The only way I ever get this solved in a reasonable amount of time is by looking at verbose logs, determining that the scenario they have described is impossible, explaining which series of log messages is not able to occur, yet occurred on their system, and ask them to investigate further. Usually this ends up being closed with a resolution like "Checked SuperProtectPro360 logs and found it was writing infernal error logs at the same time as using the software. Adjusted the monitoring settings and problem is now resolved."
I don’t really understand what you mean about opening files. Is this just an example of an idempotent action or is there some specific significance here?
Either way logging the input (file name) is notably not sufficient for debugging if the file can change between invocations. The action can be idempotent and still be affected by other changes in the system.
> trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?
If my program is broken I need it fixed regardless of why it’s broken. The specific example here of a file changing is likely to manifest as flakiness that’s impossible to diagnose without detailed logs from within the library.
I was just trying to think of an example of a non-idempotent function, as in one that depends on an external IO device.
I will say that error handling and logging in general is one of my weak points, but I made a comment about my approach so far being dbg/pdb based: attaching a debugger and creating breakpoints and prints ad hoc rather than writing them in code. I'm sure there are reasons why it isn't used as much and logging in code is so much more common, but I have faith that it's a path worth specializing in.
Back to the file reading example, for a non-idempotent function. Considering we are using an encapsulating approach, we have to split ourselves into 3 roles. We can be the IO library writer, we can be the calling code writer, and we can be an admin responsible for the whole product. I think a common trap engineers fall into is trying to keep all of the "global" context (or as much as they can handle) at all times.
In this case we of course wouldn't be writing the non-idempotent library, so that's not a hat we wear; we do not much care about the innards of the function and its state. Rather, we have a well-defined set of errors that are part of the interface of the function (EINVAL, EACCES, EEXIST).
In this sense we respect the encapsulation boundaries and are provided the necessary information by the library. If we ever need to dive into the actual library code, then first, the encapsulation is broken and we are dealing with a leaky abstraction; second, we just dive into the library code (or the filesystem admin logs themselves).
It's not precisely the type of responsibility that can be handled at design time and in code anyway; when we code, we are wearing the calling-module programmer hat. We cannot think of everything that the sysadmin might need at the time of experiencing an error; we have to trust that they will be sufficiently armed with other tools to gather the necessary information. And thank god for that! Checking /proc/fs, looking at crash dumps, and attaching to processes with a debugger will yield far better info than relying on whatever print statements you somehow added to your program.
Anyways at least that's my take on the specific example of glibc-like implementations of POSIX file operations like open(). I'm sure the implications may change for other non-idempotent functions, but at some point, talking about specifics is a bit more productive than talking in the abstract.
The issue with relying on gdb is that you generally cannot do this in production. You can’t practically attach a debugger to a production instance of a service for both performance and privacy reasons, and the same generally applies to desktop and mobile applications being run by your customers. Gdb is mostly for local debugging and the truth is that “printf debugging” is how it often works for production. (Plus exception traces, crash dumps, etc. But there is a lot of debugging based on logging.) Interactive debugging is so much more efficient for local development but capable preexisting logging is so much more efficient for debugging production issues.
I generally agree that I would not expect a core library to do a bunch of logging, at least not onto your application logs. This stuff generally is very stable with a clean interface and well defined error reporting.
But there’s a whole world of libraries that are not as clean, not as stable, and not as well defined. Most libraries in my experience are nowhere near as clean as standard IO libraries. They often do very complex stuff to simplify for the calling application and have weakly defined error behavior. The more complexity a library contains, the more it likely has this issue. Arguably that is leaky abstraction but it’s also the reality of a lot of software and I’m not even sure that’s a bad thing. A good library that leaks in unexpected conditions might be just fine for many real world purposes.
It's coming together more clearly now.
I guess my experience is more from the role of a startup or even in-house software. So we both develop and operate the software. But in scenarios where you ship the software and it's operated by someone else, it makes sense to have more auditable and restricted logging instead of all-too-powerful ad-hoc debugging.
I think an example where libraries could sensibly log error is if you have a condition which is recoverable but may cause a significant slowdown, including a potential DoS issue, and the application owner can remediate.
You don't want to throw, because destroying someone's production isn't worth it. You don't want to silently continue in that state, because realistically there's no way for the application owner to understand what is happening and why.
We call those warnings, and it's very common to downgrade errors to warnings by wrapping an exception and printing the trace as you would an exception.
Logging warnings is cowardly; you just push the decision to the log consumer to decide if the error should be acted on.
Warnings are just errors that no one wants to deal with.
Warnings are for where you expect someplace else to know/log whether it really is an error, because it might also be normal. You might log why a file IO operation failed: if the caller recovers somehow, it isn't an error, but if they can't, they log an error, and when investigating, the warning gives the detail you need to figure it out.
Who proactively investigates warnings?
Statistics are sometimes run and the most common ones investigated (normally to shut up the noise).

Mostly, though, when you are on a known problem, warnings are a useful filter to find where in the logs the problem might have started; then you use that timestamp to find info logs in the same area.
Warning logs are usually polluted with stuff nobody wants to fix but tries to wash their hands of with a log. Like deprecated calls, or error logs that got demoted because they didn't matter in practice.
Anything that has a measurable impact on production should be logged above that, except if your system ignores log levels in the first place, but that's another can of worms.
In such scenarios it makes sense to give clients an opportunity to react to such conditions programmatically, so just logging is the wrong choice; and if there’s a callback to the client, the client can decide whether to log it and how.
It's a nice idea but I've literally never seen it done, so I would be interested if you have examples of major libraries that do this. Abstractly it doesn't really seem to work to me in place of simple logs.
One test case here is that your library has existed for a decade and was fast, but Java removed a method that let you make it fast, but you can still run slow without that API. The Java runtime has a flag that the end user can enable to turn it back on as a stopgap. How do you expect this to work in your model? You expect to have an onUnnecessarilySlow() callback already set up, which all of your users have hooked up but which is never invoked for a decade, and then once it actually happens you start calling it and expect it to do something at all sane in those systems?
The second example is all of the scenarios where you're some transitively used library for many users; that makes any callback strategy immediately not work if the person who needs to know about the situation and could take action is the application owner rather than the people writing the library code which called you. It would require every library to offer these same callbacks and transitively propagate things, which would only work if it were such a firm idiomatic pattern in some language ecosystem, and I don't believe that it is in any language ecosystem.
> library has existed for a decade
>but Java removed a method that let you make it fast, but you can still run slow without that API
I’d like to see an example of that, because this is an extremely hypothetical scenario. I don’t think any library is so advanced as to anticipate such scenarios and write something to log. And of course Java specifically has a longer cycle of deprecation and removal. :)
As for your second example, let’s say library A is smart and can detect certain issues. Library B, depending on it, is at a higher abstraction level, so it has enough business context to react to them. I don’t think it’s necessary to propagate the problem and leak implementation details in this scenario.
Protobuf is the example I had in mind. It uses sun.misc.Unsafe which is being removed in upcoming Java releases, but it has a slow fallback path. It logs a warning when it runs if it can tell it's only using the fallback path but the fast path is still available if the application owner set a flag to turn it back on if they want to:
https://github.com/protocolbuffers/protobuf/issues/20760
Java Protobuf also logs a warning now if it can tell you are using gencode old enough that it's covered by a DoS CVE. They actually did a release that broke compatibility of the CVE-covered gencode, but restored it and print a warning in a newer release.
What’s stopping you from using the replacements provided in VarHandle and MemorySegment? Just wanting to support the 10 year old JDK 8?
There's a lot here, to be honest these things always come back to investment cost and ROI compared to everything else that could be worked on.
Java 8 is still really popular, probably the most popular single version. The context is not just servers but also Android, where Java 8 is the highest safe target; it's not clear what decade we'll be in when VarHandle would be safe to use there at all.
VarHandle was Java 9 but MemorySegment was Java 17. And the rest of FFM is only in 25 which is fully bleeding edge.
Protobuf may realistically try to move off of sun.misc.Unsafe without the performance regressions, in a way that avoids adopting MemorySegment so as to dodge the versioning problem, but it takes significant and careful engineering time.
That said, it's always possible to have a waterfall of preferred implementations based on what's supported; it's just always an implementation/verification cost.
I’ve written code that followed this model, but it almost always just maps to logging anyway, and the rest of the time it’s narrow options presented in the callback. e.g. Retry vs wait vs abort.
It’s very rarely realistic that a client would code up meaningful paths for every possible failure mode in a library. These callbacks are usually reserved for expected conditions.
> almost always just maps to logging anyway
Yes, that’s the point. You log it until you encounter it for the first time; then you know more and can do something meaningful. E.g. let’s say you build an API client and the library offers a callback for HTTP 429. You don’t expect it to happen, so you just log the errors in a generic handler in client code, but then after some business logic change you hit 429 for the first time. If the library offers you control over what is going to happen next, you may decide how exactly you will retry and what happens to your state in between the attempts. If the library just logs and starts a retry cycle, you may get a performance hit that will be harder to fix.
Defining a callback for every situation where a library might encounter an unexpected condition and pointing them all at the logs seems like a massive waste of time.
I would much prefer a library have sane defaults, reasonable logging, and a way for me to plug in callbacks where needed. Writing On429 and a hundred other functions that just point to Logger.Log is not a good use of time.
This sub-thread in my understanding is about a special case (a non-error mode that client may want to avoid, in which case explicit callback makes sense), not about all possible unexpected errors. I’m not suggesting hooks as the best approach. And of course “on429” is the last thing I would think about when designing this. There are better ways.
If the statement is just that sometimes it’s appropriate to have callbacks, absolutely. A library that only logs in places where it really needs a callback is poorly designed.
I still don’t want to have to provide a 429 callback just to log, though. The library should log by default if the callback isn’t registered.
It doesn’t have to provide a specific callback. This can be a starting point, which you later replace with something smarter (Java):
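With made-up names (`onRateLimited`, the event type, the scheduler), the progression might look like:

```java
// Starting point: a generic handler in client code that just logs.
client.onRateLimited(e ->
        log.warn("HTTP 429 from upstream, retry-after={}s", e.retryAfterSeconds()));

// Eventually, once you have actually hit 429s, the same hook carries policy:
client.onRateLimited(e -> {
    metrics.increment("upstream.rate_limited");
    scheduler.pauseIngestion(e.retryAfterSeconds()); // you decide how to retry
});
```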
Or something more interesting, e.g. with more details of the retry strategy.

This seems like such an obvious answer to the problem: your program isn't truly modularized if logging is global. If an error is unexpected it should bubble all the way up, but if it's expected and dealt with, the error message should be suppressed or its type changed to a warning.
I’ve worked on systems with “modularized” logging. It’s never been pleasant, because investigations involve stitching together a bunch of different log sources to understand what actually happened. A global log dump with attribution (module/component/file/line) is far easier to work with.
On paper, USDT probes are the best way for libraries (and binaries) to provide information for debugging because they can be used programmatically and have no performance overhead until they are measured but unfortunately they are not widely used.
Wonder if someone used effect handlers for error logging. Sounds like a natural and modular way of handling this problem.
It may be unwise to log errors at low layers, but logging informational and debug messages is useful (at least when the caller enables them).
Conflicting goals for the predominant libraries are what cause this. Log4J2 has a rewrite appender that solves the problem. But if you want zero-copy etc., I don’t think there’s such a solution.
You need a tuple: (context, level)
The application owner should be able to adjust the contexts up or down. This is the point of ownership and where responsibility over which logs matter is handled.
A library author might have ideas and provide useful suggestions, but it's ultimately the application owner who decides. Some libraries have huge blast radius and their `error` might be your `error` too. In other contexts, it could just be a warning. Library authors should make a reasonable guess about who their customer is and try to provide semantic, granular, and controllable failure behavior.
As an example, Rust's logging ecosystem provides nice facilities for fine-grained tamping down of errors by crate (library) or module name. Other languages and logging libraries let you do this as well.
That capability just isn't adopted everywhere.
Python's built-in logging is the same if used correctly, where the library gets a logger based on its module name (this part isn't enforced) and the application can add a handler to that logger to route the logs differently if needed.
Libraries should not log on levels above DEBUG, period. If there’s something worthy for reporting on higher levels, pass this information to client code, either as an event, or as an exception or error code.
From a code modularization point of view, there shouldn’t really be much of a difference between programs and libraries. A program is just a library with a different calling convention. I like to structure programs such that their actual functionality could be reused as a library in another program.
This is difficult to reconcile with libraries only logging on a debug level.
I see your point, but disagree on a practical level. Libraries are being used while you’re in “developer” mode, while programs are used in “user” mode (trying awkwardly to differentiate between _being_ a developer and currently developing code around that library).
Usually a program is being used by the user to accomplish something, and if logging is meaningful, then it's either in a CLI context or a server context. In both cases, errors are more often being seen by people/users than by code. Therefore printing them to logs makes sense.
While a lib is being used by a program. So it has a better way to communicate problems to the caller (exceptions, error values, choose the poison of your language). But I almost never want a library to start logging shit, because it’s almost guaranteed not to follow the same conventions as I do elsewhere in my program. Return me the error and let me handle it.
It’s analogous to how Go has an implicit rule that a library should never let a panic escape the library. Internally, fine. But at the package boundary, you should catch panics and return them as errors. You don’t know if the caller wants the app to die because of an error in your lib!
The main difference is that a library is not aware of the context in which the code executes, so it cannot decide whether the problem is expected, recoverable, or severe.
And the program doesn’t know if the user is expecting failure, either. The library case is not actually much different.
It’s very reasonable that a logging framework should allow higher levels to adjust how logging at lower levels is recorded. But saying that libraries should only log debug is not. It’s very legitimate for a library to log “this looks like a problem to me”.
The same is true for programs that are being invoked. The program only knows relative to its own purpose, and the same is again true for libraries. I don’t see the difference, other than, as already mentioned, the mechanism of program vs. library invocation.
Consider a Smalltalk-like system, or something like TCL, that doesn’t distinguish between programs and libraries regarding invocation mechanism. How would you handle logging in that case?
Okay, but…most programs are written in Python or Rust or something, where invoking library functions is a lot safer, more ergonomic, more performant, and more common than spawning a subprocess and executing a program in it. Like you can’t really ignore the human expectations and conventions that are brought to bear when your code is run (the accommodation of which is arguably most of the purpose of programming languages).
When you publish a library, people are going to use it more liberally and in a wider range of contexts (which are therefore harder to predict, including whether a given violation requires human intervention)
The purpose of a program and of a library is different, and the intent of the authors of the code is usually clear enough to make the distinction in context. Small composable programs aren’t an interesting case here; they shouldn’t be verbose enough anyway to justify multiple logging levels (it’s probably just set to on/off using a command line argument).
The mechanism of invocation is important. Most programs allow you to set the logging verbosity at invocation. Libraries may provide an interface to do so but their entry points tend to be more numerous.
I have a logging level I call "log lots" where it will log the first time with probability 1, but as it hits the same line more often, it will log with lower and lower probability, bottoming out around 1/20000. Sort of a "log with probability proportional to the unlikeliness of the event". So if I get e.g. sporadic failures to some back end, I will see them all, but if it goes down hard I will see it is still down but also be able to read other log msgs.
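A sketch of that scheme: a per-call-site counter, with the logging probability decaying as 1/n down to a floor of 1/20000:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

final class LogLots {
    private final AtomicLong hits = new AtomicLong(); // one per log site

    // First hit always logs (p = 1); the n-th hit logs with p ~ 1/n,
    // bottoming out at 1/20000 so a hard-down backend stays visible
    // without drowning out everything else.
    boolean shouldLog() {
        long n = hits.incrementAndGet();
        double p = Math.max(1.0 / n, 1.0 / 20_000.0);
        return ThreadLocalRandom.current().nextDouble() < p;
    }
}
```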
Why? What's wrong with logging it and passing the log object to the caller? The caller can still modify the log entry however it pleases.
Practicality. It is excessive for client code to calibrate library logging levels. It’s ok to do it in logging configuration, but having an entry there for every library is also excessive. It is reasonable to expect that dev/staging may have the base level at DEBUG and production will have the base level at INFO, so that a library following the convention will not require extra effort to prevent log spam in production. Yes, we have an entire logging industry around aggregation of terabytes of logs, with associated costs, but do you really need that? In other words, are we developers too lazy to adopt a sane logging policy, which actually requires minimum effort, and will we just burn the company’s money for nothing?
TLDR: I agree.
A library might also be used in multiple places, maybe deep in a dependency stack, so the execution context (the high-level stack) matters more than which library hit a failure.
So handling failures should stay in the hands of the developer calling the library and this should be a major constraint for API design.
Eh, as with anything there are always exceptions. I generally agree with WARN and ERROR, though I can imagine a few situations where it might be appropriate for a library to log at those levels. Especially for a warning, like a library might emit "WARN Foo not available; falling back to Bar" on initialization, or something like that. And I think a library is fine logging at INFO (and DEBUG) as much as it wants.
Ultimately, though, it's important to be using a featureful logging framework (all the better if there's a "standard" one for your language or framework), so the end user can enable/disable different levels for different modules (including for your library).
> Should only “top-level” code ever log an error? That can make it difficult to identify the low-level root causes of a top-level failure.
Some languages (e.g. Java) include a stack trace when reporting an error, which is extremely useful when logging the error. It shows at exactly which point in the code the error was generated, and what the full call stack was to get there.
It's a real shame that "modern" languages or "low level" languages (e.g. Go, Rust) don't include this out of the box, it makes troubleshooting errors in production much more difficult, for exactly the reason you mention.
C++ with Boost has let you grab a stacktrace anywhere in the application for years. But in April 2024 Boost 1.85 added a big new feature: stacktrace from arbitrary exception ( https://www.boost.org/releases/1.85.0/ ), which shows the call stack at the time of the throw. We added it to our codebase, and suddenly errors where exceptions were thrown became orders of magnitude easier to debug.
C++23 added std::stacktrace, but until it includes stacktrace from arbitrary exception, we're sticking with Boost.
The idiomatic practice in Go for libraries is to wrap returned errors and errors can be unwrapped with stdlib tooling. This is more useful to handle errors at runtime than digging into a stack trace.
The point in the code is not the same information as knowing the time, or knowing the order with respect to operations performed during stack unwinding. Stacktraces are very useful, but they don’t replace lower-level logging.
Log4j has had the ability to filter log levels by subject matter for twenty years. In Java you end up having to use that a lot, for this reason.
Logging in rust also does that, you can set logging levels for individual modules deep within your dependency tree.
Oh that library that gives you a write() wrapper in exchange for RCE vulns
Log4j is basically a design pattern. If you don’t like the library, Slf4j/logback are based on the same principles.
Libraries should not log, instead they should allow registering hooks which get called with errors and debug info.
I think this is useful for libraries in a language like C, where there is no standardized logging framework, so there's no way for the application to control what the library logs. But in a language (Java, Rust, etc.) where there are standard, widely-used logging frameworks that give people fine-grained control over what gets logged, libraries should just use those frameworks.
(Even in C, though... errors should be surfaced as return values from functions causing the error, not just logged somewhere. Debug info, sure, have a registerable callback for that.)
Log4J-style logging is effectively a hook system. But it is too easy to use it badly, logging at too high a level and delegating level-fixing to the end user.
They can log if the platform permits, i.e. when you can set TRACE and DEBUG to no-op, but of course it should be done reasonably. Having hooks is often overkill compared to this.
It doesn't seem to work this way in practice, not least because most libraries will be transitive deps of the application owner.
I think creating the hooks is very close to just not doing anything here, if no one is going to use the hooks anyway then you might as well not have them.
Libraries should log in a way that is convenient to the developer rather than a way that is ideologically consistent. Oftentimes, that means logging as we know it.
I've been thinking about this all day. I think the best approach is probably twofold:
1) Thrown errors should track the original error to retain its context. In JavaScript errors have a `cause` option which is perfect for this. You can use the `cause` to hold a deep stack trace even if the error has been handled and wrapped in a different error type that may have different semantics in the application.
2) For logging that does not stop program execution, I think this is a great case for dependency injection. If a library allows its consumer to provide a logger, the application has complete control over how and when the library logs, and can even change it at runtime. If you have a disagreement with a library, for example it logs errors that you want to treat as warnings, your injected logger can handle that.
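A sketch of point 2, shown in Java (the JavaScript version would inject an object of the same shape); all names here are illustrative:

```java
// The library logs only through an interface the application supplies.
interface LibLogger {
    void warn(String message);
    void error(String message, Throwable cause);
}

final class PdfConverter {
    private final LibLogger log;

    PdfConverter(LibLogger log) { this.log = log; }

    void convert(String path) {
        // On a recoverable issue, the injected logger decides how loud this is:
        log.warn("font not embedded, substituting default: " + path);
    }
}

// Application side: demote the library's errors to warnings if you disagree.
// new PdfConverter(new LibLogger() {
//     public void warn(String m) { appLog.info(m); }
//     public void error(String m, Throwable c) { appLog.warn(m + ": " + c); }
// });
```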