Bug 407952 - Race condition when capturing guide frames, with possible fix
Summary: Race condition when capturing guide frames, with possible fix
Status: RESOLVED FIXED
Alias: None
Product: kstars
Classification: Applications
Component: general (show other bugs)
Version: git
Platform: Microsoft Windows Microsoft Windows
: NOR normal
Target Milestone: ---
Assignee: Jasem Mutlaq
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-26 01:02 UTC by Kevin Ross
Modified: 2019-05-27 23:33 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Ross 2019-05-26 01:02:37 UTC
When capturing guide frames, I would occasionally get the error in the INDI panel that the requested value (0) was out of range. Also occasionally, I would get a really, really short exposure, which would result in a guide frame with just noise.

Digging in, and putting a lot of debug logging in place, I think the problem comes from when the INDI server sends back status messages about the current exposure remaining. The problem seems to be made worse because the server and client are communicating over WiFi, which slows down the responses from the server to the client.

So, I believe what is happening:

    Guide module requests the CCD to capture a new frame.
    CCDChip::capture gets the exposure INumberVectorProperty from the device, and updates the value to the new exposure value.
    (I think this is where things go wrong) The client receives a notification from the server that the exposure value has changed, overwriting the INumberVectorProperty above
    CCDChip::capture sends the wrong value to sendNewNumber()

I verified that # 3 is happening, by modifying CCDCHip::capture so that it prints the value of the INumberVectorProperty right after it is set, and loops a few times with a small delay, and prints the values again. During this time, the value will sometimes change all on its own to 0, leading me to believe that # 3 above is the culprit.

My fix is to modify CCDChip::capture to clone the INumberVectorProperty into a local variable and make changes there. Since I don't know the proper way to clone an INumberVectorProperty, my method is probably not ideal.

But here's my version of CCDChip::capture:

bool CCDChip::capture(double exposure)
{
    INumberVectorProperty *expProp = nullptr;

    switch (type)
    {
        case PRIMARY_CCD:
            expProp = baseDevice->getNumber("CCD_EXPOSURE");
            break;

        case GUIDE_CCD:
            expProp = baseDevice->getNumber("GUIDER_EXPOSURE");
            break;
    }

    if (expProp == nullptr)
        return false;

    // clone the INumberVectorProperty, to avoid modifications to the same
    // property from two threads
    INumber n;
    strcpy(n.name, expProp->np[0].name);
    n.value = exposure;

    auto newExpProp = std::make_unique<INumberVectorProperty>();
    strcpy(newExpProp->device, expProp->device);
    strcpy(newExpProp->name, expProp->name);
    newExpProp->np = &n;
    newExpProp->nnp = 1;
	
    clientManager->sendNewNumber(newExpProp.get());

    return true;
}
Comment 1 Jasem Mutlaq 2019-05-26 05:26:40 UTC
Git commit 06adc813c4eaa97b0d5691e32ac257fa90f0497d by Jasem Mutlaq.
Committed on 26/05/2019 at 05:20.
Pushed by mutlaqja into branch 'master'.

Ok, there is no make_unique in C++11 so I just used the standard unique_ptr.

Please test this. I think this is just a workaround. We need a way to synchronize access to these variables in multi-threaded fashion
perhaps using std::unique_lock<> or something similar.

M  +38   -2    kstars/indi/indiccd.cpp

https://commits.kde.org/kstars/06adc813c4eaa97b0d5691e32ac257fa90f0497d
Comment 2 Kevin Ross 2019-05-27 23:33:21 UTC
I'm sure it is just a workaround, but I ran it all last night without issue. So, it works for me. :)