Bug 407952

Summary: Race condition when capturing guide frames, with possible fix
Product: [Applications] kstars Reporter: Kevin Ross <kevin>
Component: generalAssignee: Jasem Mutlaq <mutlaqja>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: git   
Target Milestone: ---   
Platform: Microsoft Windows   
OS: Microsoft Windows   
Latest Commit: Version Fixed In:

Description Kevin Ross 2019-05-26 01:02:37 UTC
When capturing guide frames, I would occasionally get the error in the INDI panel that the requested value (0) was out of range. Also occasionally, I would get a really, really short exposure, which would result in a guide frame with just noise.

Digging in, and putting a lot of debug logging in place, I think the problem comes from when the INDI server sends back status messages about the current exposure remaining. The problem seems to be made worse because the server and client are communicating over WiFi, which slows down the responses from the server to the client.

So, I believe what is happening:

    Guide module requests the CCD to capture a new frame.
    CCDChip::capture gets the exposure INumberVectorProperty from the device, and updates the value to the new exposure value.
    (I think this is where things go wrong) The client receives a notification from the server that the exposure value has changed, overwriting the INumberVectorProperty above
    CCDChip::capture sends the wrong value to sendNewNumber()

I verified that # 3 is happening, by modifying CCDCHip::capture so that it prints the value of the INumberVectorProperty right after it is set, and loops a few times with a small delay, and prints the values again. During this time, the value will sometimes change all on its own to 0, leading me to believe that # 3 above is the culprit.

My fix is to modify CCDChip::capture to clone the INumberVectorProperty into a local variable and make changes there. Since I don't know the proper way to clone an INumberVectorProperty, my method is probably not ideal.

But here's my version of CCDChip::capture:

bool CCDChip::capture(double exposure)
{
    INumberVectorProperty *expProp = nullptr;

    switch (type)
    {
        case PRIMARY_CCD:
            expProp = baseDevice->getNumber("CCD_EXPOSURE");
            break;

        case GUIDE_CCD:
            expProp = baseDevice->getNumber("GUIDER_EXPOSURE");
            break;
    }

    if (expProp == nullptr)
        return false;

    // clone the INumberVectorProperty, to avoid modifications to the same
    // property from two threads
    INumber n;
    strcpy(n.name, expProp->np[0].name);
    n.value = exposure;

    auto newExpProp = std::make_unique<INumberVectorProperty>();
    strcpy(newExpProp->device, expProp->device);
    strcpy(newExpProp->name, expProp->name);
    newExpProp->np = &n;
    newExpProp->nnp = 1;
	
    clientManager->sendNewNumber(newExpProp.get());

    return true;
}
Comment 1 Jasem Mutlaq 2019-05-26 05:26:40 UTC
Git commit 06adc813c4eaa97b0d5691e32ac257fa90f0497d by Jasem Mutlaq.
Committed on 26/05/2019 at 05:20.
Pushed by mutlaqja into branch 'master'.

Ok, there is no make_unique in C++11 so I just used the standard unique_ptr.

Please test this. I think this is just a workaround. We need a way to synchronize access to these variables in multi-threaded fashion
perhaps using std::unique_lock<> or something similar.

M  +38   -2    kstars/indi/indiccd.cpp

https://commits.kde.org/kstars/06adc813c4eaa97b0d5691e32ac257fa90f0497d
Comment 2 Kevin Ross 2019-05-27 23:33:21 UTC
I'm sure it is just a workaround, but I ran it all last night without issue. So, it works for me. :)