> Effect survives controlling for activity level.
How did you control for activity level? Do you have similar BPM plots for the different situations (sauna + exercise, sauna + no exercise, no sauna + exercise, no sauna + no exercise) for a visual representation?
> minimum nighttime HR drops ~3 bpm (~5%)
What wearables were used? These devices don't usually have enough precision to reliably detect ~3 bpm changes. Also, the measurements are sensitive to skin condition, blood flow, and temperature changes. How do you know the difference doesn't come from different sensor behavior after sauna?
> What wearables were used? These devices don't usually have enough precision to reliably detect ~3 bpm changes.
For large sample averages this doesn't really matter.
It does, especially if the error bars from multiple measurements suggest higher precision than would be expected from the device.
I don't understand what you mean by that.
The precision (inverse of variance) of the estimate of the mean increases in direct proportion to the number of samples (given some assumptions that very likely hold here). If you have a measurement standard deviation of, say, 10 bpm, then with 100 measurements your mean estimate has a standard deviation of 10/sqrt(100) = 1 bpm.
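A quick simulation of that arithmetic (the 60 bpm true mean here is a made-up number; the 10 bpm noise SD and 100 samples are from the example above):

```python
import random
import statistics

random.seed(0)
TRUE_HR = 60.0    # hypothetical true mean nighttime HR
NOISE_SD = 10.0   # per-measurement noise SD from the example
N = 100           # independent measurements per experiment

# Repeat the whole experiment many times and look at how much
# the sample mean itself varies from run to run.
means = [
    statistics.mean(random.gauss(TRUE_HR, NOISE_SD) for _ in range(N))
    for _ in range(5_000)
]
print(round(statistics.stdev(means), 2))  # close to 10 / sqrt(100) = 1.0
```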
> of estimate of mean
But you can't really assume that the estimate of the mean represents the real value. For example, if the sensor is equally likely to show 80 or 81 BPM when the real heart rate is 80.7, the mean estimator will be biased.
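That hypothetical sensor is easy to simulate; no amount of averaging moves the estimate off 80.5:

```python
import random
import statistics

random.seed(0)
true_hr = 80.7

# Hypothetical sensor from the example: equally likely to read 80 or 81
# regardless of where the true value sits between them.
readings = [random.choice([80, 81]) for _ in range(100_000)]

est = statistics.mean(readings)
print(round(est, 1))  # 80.5: averaging never recovers the 0.2 bpm bias
```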
> with 100 measurements
Also, wearables aren't taking 100 measurements of the BPM at a given point in time. I think the highest frequency they usually offer is a 1-second measurement interval. So they don't really have a lot of measurements for each point in time.
> mean estimate standard deviation
That's the standard deviation of the mean of the values. Doesn't imply that the standard deviation of the values themselves will go to zero.
> I don't understand what you mean by that.
That as a rule of thumb, you should not assume that repeating measurements will give you more precision than what the tool can offer. E.g., trying to measure down to millimeters with a ruler that has only 1 cm marks will not really work well.
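The ruler case is the degenerate one: with no noise at all, every repetition gives the identical rounded reading, so there is nothing for averaging to work with (the 123.4 mm object length is just a made-up illustration):

```python
import statistics

true_length_mm = 123.4  # hypothetical object length in mm

# A noiseless ruler with only 1 cm (10 mm) marks: every reading is the
# same rounded value, so repeating the measurement adds nothing.
readings = [round(true_length_mm / 10) * 10 for _ in range(1_000)]

print(statistics.mean(readings))   # 120: stuck 3.4 mm off, forever
print(statistics.stdev(readings))  # 0.0: no variance for averaging to exploit
```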
> But you can't really assume that the estimate of the mean represents the real value. For example, if the sensor is equally likely to show 80 or 81 BPM when the real heart rate is 80.7, the mean estimator will be biased.
Bias is different from precision. If both conditions have the same bias, their difference is still unbiased.
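A sketch of why a shared bias cancels (the 2 bpm bias and the 57 vs 60 bpm true means are hypothetical numbers, chosen to match the ~3 bpm effect being discussed):

```python
import random
import statistics

random.seed(0)
BIAS = 2.0        # hypothetical constant sensor bias, same in both conditions
NOISE_SD = 10.0
N = 100_000

# True means: 57 bpm after sauna vs 60 bpm control (a real -3 bpm effect),
# both measured through the same biased sensor.
sauna = [random.gauss(57.0 + BIAS, NOISE_SD) for _ in range(N)]
control = [random.gauss(60.0 + BIAS, NOISE_SD) for _ in range(N)]

diff = statistics.mean(sauna) - statistics.mean(control)
print(round(diff, 1))  # close to -3.0: the shared bias cancels in the difference
```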
> Also, wearables aren't taking 100 measurements of the BPM at a given point in time. I think the highest frequency they usually offer is a 1-second measurement interval. So they don't really have a lot of measurements for each point in time.
I did not mean taking multiple measurements in succession. Those are likely to have correlated noise, meaning the assumptions do not hold. But between participants, measurement noise is very unlikely to be correlated.
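A toy illustration of the distinction (the 3 bpm per-person offset and 1 bpm read noise are made-up values):

```python
import random
import statistics

random.seed(0)
TRUE_HR = 80.0

# Within one person, successive readings can share a persistent sensor
# offset (correlated noise); averaging them never removes that offset.
offset = 3.0  # hypothetical per-person sensor offset, in bpm
within = [TRUE_HR + offset + random.gauss(0, 1.0) for _ in range(1_000)]
print(round(statistics.mean(within) - TRUE_HR, 1))  # stuck near 3.0

# Across participants, the offsets are independent and average out.
across = [TRUE_HR + random.gauss(0, 3.0) + random.gauss(0, 1.0)
          for _ in range(100_000)]
print(round(statistics.mean(across) - TRUE_HR, 1))  # close to 0.0
```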
> That as a rule of thumb, you should not assume that repeating measurements will give you more precision than what the tool can offer. E.g., trying to measure down to millimeters with a ruler that has only 1 cm marks will not really work well.
If you quantize so much that you have no variance in the measurements, then sure. But watches typically have 1 bpm quantization, which is fine at the scale of variation in HR.
If you have independent measurement error, and the quantization is fine enough that readings still vary, you very much can assume repeating measurements will give you more precision than the tool's resolution. This is how e.g. particle physics (and many, many other fields of science) is done.
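This is the classic dithering effect; a sketch with assumed numbers (80.7 bpm true value, 2 bpm noise, 1 bpm quantization):

```python
import random
import statistics

random.seed(0)
true_hr = 80.7
NOISE_SD = 2.0  # hypothetical per-reading noise, larger than the 1 bpm step

# Each reading: true value plus independent noise, then quantized to whole bpm.
readings = [round(random.gauss(true_hr, NOISE_SD)) for _ in range(100_000)]

est = statistics.mean(readings)
print(round(est, 1))  # ~80.7: noise dithers readings across the 1 bpm
                      # steps, so averaging recovers sub-step precision
```

Contrast this with the pathological 80-or-81 sensor above: here the rounding tracks the true value, so independent noise spreads readings over several quantization steps and the mean converges below the 1 bpm resolution.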