Assuming everything is idempotent is a tall order.
There are a lot of libraries that have non-idempotent actions. There are a lot of inputs that can be problematic to log, too.
Say like opening a file?
I guess in those cases the standard practice is for the lib to return a detailed error, yeah.
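Roughly what I mean by a detailed error, as a made-up sketch in C (the struct and names are invented for illustration, not any real library's API):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    /* Hypothetical error record: instead of logging internally, the
       library hands back errno plus some context and lets the caller
       decide what, if anything, to report. */
    struct io_error {
        int err;             /* saved errno, 0 on success */
        const char *op;      /* which operation failed    */
        char path[256];      /* the input involved        */
    };

    int lib_open(const char *path, struct io_error *e)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            e->err = errno;
            e->op = "open";
            snprintf(e->path, sizeof e->path, "%s", path);
            return -1;
        }
        e->err = 0;
        return fd;
    }

The caller can then format e.op, e.path and strerror(e.err) into its own logs however it likes.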
As far as traces go, trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?
From my experience working on B2B applications, I am happy that everything is generally spammed to the logs because there would simply be no other reasonable way to diagnose many problems.
It is very, very common that the code you have written isn't even the code that executes. It gets modified by enterprise antivirus or "endpoint security". All too often I see a "File.Open" call report success, as if the caller has access, when what has actually happened is that the AV intercepted the call, blocked it improperly, and presented an existing but zero-byte file (even though there is actually a larger file there) instead of reporting that the file cannot be opened.
I will never, in a million years, be granted access to attach a debugger to such a client computer. In fact, they will not even initially disclose that they are using antivirus. They will just say the machine is set up per company policy and that your software doesn't work, so fix it. The assumption is always that your software is to blame, and they give you nearly nothing except the logs.
The only way I ever get this solved in a reasonable amount of time is by looking at verbose logs, determining that the scenario they have described is impossible, explaining which sequence of log messages could not have occurred yet did occur on their system, and asking them to investigate further. Usually this ends up being closed with a resolution like "Checked SuperProtectPro360 logs and found it was writing internal error logs at the same time the software was in use. Adjusted the monitoring settings and problem is now resolved."
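For what it's worth, the logging that catches this doesn't have to be clever. Even something like the following sketch (the log format and checks are made up, not our product's code) is enough to make "stat says the file is large but the open handle reads as empty" show up in a ticket later:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    /* Illustrative only: log both what the OS claims about the file and
       what the opened handle actually exposes, so interference between
       the two is visible after the fact. */
    FILE *open_logged(const char *path, FILE *log)
    {
        struct stat st;
        int have_stat = (stat(path, &st) == 0);

        FILE *f = fopen(path, "rb");
        if (!f) {
            fprintf(log, "open %s failed: %s\n", path, strerror(errno));
            return NULL;
        }

        fseek(f, 0, SEEK_END);
        long readable = ftell(f);
        fseek(f, 0, SEEK_SET);

        fprintf(log, "open %s ok: stat_size=%lld readable=%ld\n",
                path, have_stat ? (long long)st.st_size : -1LL, readable);
        return f;
    }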
I don’t really understand what you mean about opening files. Is this just an example of an idempotent action or is there some specific significance here?
Either way, logging the input (the file name) is notably not sufficient for debugging if the file can change between invocations. The action can be idempotent and still be affected by other changes in the system.
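To make that concrete, logging the name alone tells you nothing about external changes; you would need at least a snapshot of the file's attributes for a change between invocations to even be visible. A rough sketch (log format made up):

    #include <stdio.h>
    #include <sys/stat.h>

    /* Sketch: record a few attributes of the input file so that
       "same name, different content" can be spotted across runs. */
    void log_input_snapshot(FILE *log, const char *path)
    {
        struct stat st;
        if (stat(path, &st) != 0) {
            fprintf(log, "input %s: stat failed\n", path);
            return;
        }
        fprintf(log, "input %s: size=%lld mtime=%lld\n",
                path, (long long)st.st_size, (long long)st.st_mtime);
    }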
> trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?
If my program is broken I need it fixed regardless of why it’s broken. The specific example here of a file changing is likely to manifest as flakiness that’s impossible to diagnose without detailed logs from within the library.
I was just trying to think of an example of a non-idempotent function. As in, one that depends on an external IO device.
I will say that error handling and logging in general is one of my weak points, but I made a comment about my approach so far being gdb/pdb based: attaching a debugger and creating breakpoints and prints ad hoc rather than writing them into the code. I'm sure there are reasons why that isn't used as much and why logging in code is so much more common, but I have faith that it's a path worth specializing in.
Back to the file-reading example as a non-idempotent function. Since we are taking an encapsulating approach, we have to split ourselves into three roles: the IO library writer, the calling-code writer, and the admin responsible for the whole product. I think a common trap engineers fall into is trying to keep all of the "global" context (or as much as they can handle) in mind at all times.
In this case we wouldn't be writing the non-idempotent library, so that's not a hat we wear. We don't particularly care about the innards of the function and its state; rather, we have a well-defined set of errors that are part of the function's interface (EINVAL, EACCES, EEXIST).
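Wearing the calling-code hat, that documented error set is all we program against. A rough sketch (the specific recovery choices are only for illustration):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /* React to open(2)'s documented errors without caring about the
       implementation behind them. What to do per error is policy; the
       choices below are just an example. */
    int open_config(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd >= 0)
            return fd;

        switch (errno) {
        case EACCES:  /* permission problem: report it and give up    */
            fprintf(stderr, "no permission to read %s\n", path);
            return -1;
        case ENOENT:  /* missing file: caller falls back to defaults  */
            fprintf(stderr, "%s not found, using defaults\n", path);
            return -1;
        default:      /* anything else: surface errno as-is           */
            fprintf(stderr, "open %s: %s\n", path, strerror(errno));
            return -1;
        }
    }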
In this sense we respect the encapsulation boundaries, and the library provides us the information we need. If we ever need to dive into the actual library code, then first, the encapsulation is broken and we are dealing with a leaky abstraction; and second, we just dive into the library code (or into the filesystem admin logs themselves).
It's not precisely the type of responsibility that can be handled at design time and in code anyway; when we code, we are wearing the calling-module programmer hat. We cannot think of everything the sysadmin might need at the time they hit an error; we have to trust that they will be sufficiently armed with other tools to gather the information they need. And thank god for that! Checking /proc/fs, looking at crash dumps, and attaching to processes with gdb will yield far better info than relying on whatever print statements you somehow added to your program.
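As one small Linux-only illustration of that kind of ground truth (example code only, not anything from a real product): reading /proc/self/fd tells you exactly which files the process really has open, with no print statements in the program at all.

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    /* List the file descriptors the kernel says this process has open.
       Entries in /proc/self/fd are symlinks to the underlying files. */
    void dump_open_fds(void)
    {
        DIR *d = opendir("/proc/self/fd");
        if (!d)
            return;

        struct dirent *ent;
        while ((ent = readdir(d)) != NULL) {
            char link[PATH_MAX];
            char target[PATH_MAX];
            snprintf(link, sizeof link, "/proc/self/fd/%s", ent->d_name);
            ssize_t n = readlink(link, target, sizeof target - 1);
            if (n > 0) {            /* "." and ".." are not symlinks */
                target[n] = '\0';
                printf("fd %s -> %s\n", ent->d_name, target);
            }
        }
        closedir(d);
    }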
Anyways, at least that's my take on the specific example of glibc-like implementations of POSIX file operations like open(). I'm sure the implications may change for other non-idempotent functions, but at some point, talking about specifics is a bit more productive than talking in the abstract.
The issue with relying on gdb is that you generally cannot do this in production. You can’t practically attach a debugger to a production instance of a service for both performance and privacy reasons, and the same generally applies to desktop and mobile applications being run by your customers. Gdb is mostly for local debugging and the truth is that “printf debugging” is how it often works for production. (Plus exception traces, crash dumps, etc. But there is a lot of debugging based on logging.) Interactive debugging is so much more efficient for local development but capable preexisting logging is so much more efficient for debugging production issues.
I generally agree that I would not expect a core library to do a bunch of logging, at least not into your application logs. This stuff is generally very stable, with a clean interface and well-defined error reporting.
But there’s a whole world of libraries that are not as clean, not as stable, and not as well defined. Most libraries in my experience are nowhere near as clean as standard IO libraries. They often do very complex stuff to simplify things for the calling application, and they have weakly defined error behavior. The more complexity a library contains, the more likely it is to have this issue. Arguably that is a leaky abstraction, but it’s also the reality of a lot of software, and I’m not even sure that’s a bad thing. A good library that leaks in unexpected conditions might be just fine for many real-world purposes.
It's coming together more clearly now.
I guess my experience is more from the role of a startup or even in-house software. So we both develop and operate the software. But in scenarios where you ship the software and it's operated by someone else, it makes sense to have more auditable and restricted logging instead of all-too-powerful ad-hoc debugging.