Conversation

carterturn
Contributor

No description provided.

@carterturn
Contributor Author

A result from testing in the lab: we should either remove timing mode or implement it fully. At the moment, the labscript device just sends an instruction table without timings (regardless of whether timing mode is 1 or 0), which fails. I think I am leaning towards removing it, but I could go either way.

On a related note, we should check that the number of bytes we want to send matches the number of bytes the Pico is ready for.
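
For concreteness, a minimal sketch of the check I mean, assuming the firmware replies with a line like "ready for N bytes" before the binary table is sent (the serial port, command string, and reply format here are hypothetical, not the actual firmware protocol):

import numpy as np
import serial

conn = serial.Serial('/dev/ttyACM0', timeout=1)   # hypothetical port
table = np.zeros(4, dtype=np.uint32)              # stand-in instruction table
payload = table.tobytes()

conn.write(b'setb 0 %d\r\n' % len(table))         # hypothetical command
reply = conn.readline().decode().strip()          # e.g. 'ready for 16 bytes'
ready = int(reply.split()[2])
if ready != len(payload):
    # Mismatch: bail out rather than lock up waiting for bytes
    raise RuntimeError(f'Pico ready for {ready} bytes, have {len(payload)}')
conn.write(payload)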

…n binary mode.

At the moment, if it fails it will simply send all zeros and throw an error.
I think this is the best option, as it prevents the device from getting locked up waiting for bytes (if we restart the blacs tab).

@carterturn
Contributor Author

I added the "ready for ... bytes" check and an option to use an external reference clock for the labscript device. These should be tested before merging.

@Json-To-String

Talked a tad with David and we settled on not supporting internal timing at all, since the PrawnBlaster clockline is much better for less work. I'll remove the timing-mode code and try to add any remaining docs.

Contributor

@dihm dihm left a comment

Mostly minor tweaks, with two more significant things:

  1. I am now noticing that we basically have a firmware limitation that there must be at least one dynamic output to use this device. I get why, and it isn't totally unreasonable, but I do think it would be worth allowing all of the outputs to be StaticDDS.
  2. The BLACS worker transition functions feel a bit rough. They don't quite respect the conventional boundaries (which are admittedly poorly documented), and I think transition_to_buffered has a number of edge-case issues that need to be tested and fixed. I would appreciate testing confirmation that everything works as expected if you do the following things (a rough sketch of the boundaries I mean follows this list):
  • Program no outputs
  • Only use StaticDDS
  • Only use DDS
  • That the BLACS tab updates to the final values of all outputs correctly
  • That aborts leave the device in a functional state
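
For reference, here is the rough shape I have in mind, assuming the standard BLACS worker signature; the 'dds_data' table name and the programming steps are placeholders rather than the actual worker code:

import labscript_utils.h5_lock  # noqa: F401, must precede h5py in labscript code
import h5py

def transition_to_buffered(self, device_name, h5file, initial_values, fresh):
    with h5py.File(h5file, 'r') as f:
        group = f['devices'][device_name]
        dds_data = group['dds_data'][:] if 'dds_data' in group else None

    if dds_data is None or len(dds_data) == 0:
        # Edge case: nothing programmed. Don't arm the device; just
        # report the current values so the BLACS tab still updates.
        return initial_values

    # ... program the instruction table and arm for hardware triggers ...

    # Return the final value of every output so the tab lands on the
    # last-programmed state after the shot.
    return {name: dds_data[name][-1] for name in dds_data.dtype.names}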

@carterturn
Contributor Author

I found a rather insidious bug: if two instructions are spaced by less than the reprogramming time (10us for 4 channels), the trigger for the second instruction arrives before the DDS Sweeper has finished reprogramming the AD9959. I think the result of this is a partial update of the AD9959 parameters.

For an example of this, we were trying to use the DDS Sweeper to pulse on an AOM for 4us, 8us, and 12us. In the first two cases, the second instruction doesn't run, and the AOM never turns off.
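
One way to catch this at compile time would be a guard in generate_code along these lines (MIN_REPROGRAM_TIME and the times array are illustrative, not the actual variables):

import numpy as np

MIN_REPROGRAM_TIME = 10e-6  # assumed reprogramming time for 4 channels

# Illustrative trigger times; the 4 us gap here would trip the check
instruction_times = [0.0, 4e-6, 20e-6]
times = np.asarray(instruction_times)
gaps = np.diff(times)
if np.any(gaps < MIN_REPROGRAM_TIME):
    bad = times[1:][gaps < MIN_REPROGRAM_TIME]
    raise ValueError(
        f'Instructions at t={bad} arrive before the AD9959 can finish '
        f'reprogramming ({MIN_REPROGRAM_TIME * 1e6:.0f} us minimum spacing)'
    )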

Contributor

@dihm dihm left a comment

I like how y'all solved these outstanding issues. I have a couple more minor things to address.

Contributor

@dihm dihm left a comment

(Hopefully) one last patch that I just took a stab at instead of pushing it off on you both.

dihm
dihm previously approved these changes May 29, 2025
Contributor

@dihm dihm left a comment

I believe I have run out of things to say on this one. Going to approve. Will wait to merge until extended testing in lab is reported.

@carterturn
Contributor Author

Mostly working so far, but I did manage to get it to crash while running an unusual sequence earlier today. I want to investigate that a bit further. It seems to have mostly affected a device on a remote BLACS, so it might have been a network issue.

@carterturn
Contributor Author

I was unable to recreate the issue, so it may have been an issue with our lab's network. I am satisfied.

@carterturn
Contributor Author

Found and fixed a new issue with program_manual.

@dihm
Contributor

dihm commented Jun 10, 2025

Good catch @carterturn! There are always some lingering things that continue to validate my policy of very slow PR merges.

@carterturn
Contributor Author

Perhaps we should keep delaying this one. There was a bug introduced in connection table compilation that led to the order of devices in the h5 file being dependent on the order of devices in the connection table.
I think the best way to handle this is to ensure that the labscript_device always orders the devices the way the blacs_worker expects, though there is also an argument for having the blacs_worker check the device order.
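
As a sketch of the first option, generate_code could sort the channels deterministically, and the worker could assert the same order (connection strings of the form 'channel <n>' are assumed here for illustration; self.child_devices is the labscript API):

# In the labscript_device: fix the order once, independent of the
# connection table order.
channels = sorted(self.child_devices, key=lambda dev: dev.connection)

# In the blacs_worker: optionally make the same assumption explicit
# instead of silently trusting it.
for i, dev in enumerate(channels):
    assert dev.connection == f'channel {i}', f'unexpected order: {dev.connection}'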

@carterturn
Contributor Author

For a while I was getting some issues with missed triggers, but I am now fairly certain this was due to a wiring problem in my lab: after I adjusted some wiring, I have not seen any missed triggers for a few weeks.
I am now satisfied that this does not have integration bugs.

@Json-To-String

Excellent! I was actually in the process of rewiring a sweeper back into my test setup to double-check. We recently found we may have a need for the ramps, so after this PR gets merged I can try to tackle that too.

@carterturn
Contributor Author

I found at least one issue, which I think was leading to the spurious triggering problems (though I am not exactly sure how). In the blacs_worker, the cache is extended to match the DDS data length at the beginning of smart reprogramming. The length of the resized cache is then compared to the length of the DDS data to determine if new rows need to be added. This comparison always returns that they match because the cache was already resized.
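
A stripped-down illustration of the logic error (simplified names, not the actual worker code):

import numpy as np

cache = np.zeros(3)  # previously programmed rows
data = np.zeros(5)   # the new table is longer

# The cache is extended *before* the length comparison...
new_cache = np.empty(len(data))
new_cache[:len(cache)] = cache
cache = new_cache

# ...so this check can never flag the appended rows, and the np.empty()
# tail may even hold stale values that happen to match the new data.
if len(cache) != len(data):
    print('programming new rows')  # never reached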

@dihm
Contributor

dihm commented Sep 21, 2025

@carterturn Thank you for tracking this down. We just got one of these set up in our lab and promptly ran into missed trigger warnings. We assumed it was because of a 25 ft BNC cable between a PulseBlaster and the DDS Sweeper trigger input (and are working on a breakout board with a buffer on that input to test). But I suspect what you are describing is more likely. We can help test out more edge cases in the smart-caching. I suspect there are many dangers lurking there, and reliably tracking them down in an active experiment is painful.

@carterturn
Contributor Author

There was an issue I was able to reliably reproduce that went away after making that change.
To reproduce it, what I did was

  1. Program a sequence with N DDS updates
  2. Program a sequence with M updates, where M < N
  3. Program a sequence with P updates, where M < P < N and the (M+1)-th entry is the same as in the sequence with N updates.

Now it seems like there is a second (probably similar) issue that is still happening, but I have yet to reliably reproduce it.

@Json-To-String

Okay, I think I've been able to set up a test using the same N, M, P steps, but when testing the code from before the last commit, I'm struggling to reproduce the same bug.

I kept the tables relatively small:

# ... labscript setup stuff above ...

# Table N (only one table is active per compiled shot; the others are
# commented out between runs)
amplitudes = [1.0, 0.9, 0.8, 0.7, 0.6]

# Table M, M < N
amplitudes = [0.1, 0.2, 0.3]

# Table P, M < P < N; the P[M+1] instruction is the same as N[M+1]
amplitudes = [0.5, 0.4, 0.5, 0.7]


t = 0
t_step = 2000 / (125e6)

start()

# each gets a common t = 0 entry
chann0.setfreq(value = 125*MHZ, t=t)
chann0.setamp(value = 0.75, t=t)

for amp in amplitudes:
    t += t_step
    chann0.setamp(value = amp, t=t)

stop(t)

In a minimal test setup (a notebook grabbing the AD9959DDSSweeperInterface and AD9959DDSSweeperWorker classes, with the refresh threshold forced to 90% to put us in the seti line-by-line reprogramming path), I'm wondering if I've missed something (potentially something subtle with the t=0 entry). However, a labmate shared a pair of shotfiles that do in fact give an error that I suspect is missed triggers, since the sweeper returns:

labscript.utils.LabscriptError: Buffered operation did not end with status RUNNING. Is triggering working?

His first (working) file has 106 dds_data entries, and the second file has 107, which breaks.

@Json-To-String

Json-To-String commented Sep 24, 2025

I think we've found it: this patch should fix a really nasty and evasive bug that we think arises from np.empty creating an array with arbitrary values; in our testing it even grabbed values from previous tables (numpy may hand back uncleared memory for performance). This means that the smart cache would be extended with old data and, in line-by-line programming, reprograms would be missed. Here's the patch so you can try it in parallel:

diff --git a/labscript_devices/AD9959DDSSweeper/blacs_workers.py b/labscript_devices/AD9959DDSSweeper/blacs_workers.py
index 2cc373a..bfdcbe0 100644
--- a/labscript_devices/AD9959DDSSweeper/blacs_workers.py
+++ b/labscript_devices/AD9959DDSSweeper/blacs_workers.py
@@ -414,18 +414,21 @@ class AD9959DDSSweeperWorker(Worker):
             else:
                 self.intf.set_channels(len(dyn_chans))
                 self.logger.debug('Comparing changed instructions')
-                cache = self.smart_cache['dds_data']
+                cache = self.smart_cache['dds_data'].copy() # avoid overwriting smart_cache entries too early
                 n_cache = len(cache)
 
+                # Boolean mask of changed rows
+                changed_mask = np.zeros(len(dds_data), dtype=bool)
+                
                 # Extend cache if necessary
                 if len(dds_data) > n_cache:
                     new_cache = np.empty(len(dds_data), dtype=dds_data.dtype)
                     new_cache[:n_cache] = cache
+                    changed_mask[n_cache:] = True # ensure new rows are programmed
                     self.smart_cache['dds_data'] = new_cache
                     cache = new_cache
 
-                # Boolean mask of each rows
-                changed_mask = np.zeros(len(dds_data), dtype=bool)
+                # check line-by-line for rows that need to be programmed
                 for name in dds_data.dtype.names:
 
                     # need to check field-by-field, both vals and dtypes
@@ -439,10 +442,6 @@ class AD9959DDSSweeperWorker(Worker):
                         changed_mask |= cache[:len(dds_data)][name] != dds_data[name]
 
                 changed_indices = np.where(changed_mask)[0]
-                # Handle potential row count difference
-                if n_cache != len(dds_data):
-                    self.logger.debug(f"Length mismatch: cache has {n_cache}, dds_data has {len(dds_data)}")
-                    changed_indices = np.union1d(changed_indices, np.arange(len(dds_data), n_cache))
                 self.logger.debug(f"Changed rows: {changed_indices}")
 
                 # Iterate only over changed rows
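
For anyone who wants to see the hazard directly, here is a quick demonstration (the buffer reuse is not guaranteed, but it happens often in practice):

import numpy as np

old = np.arange(1024, dtype=np.int64)   # stand-in for a previous table
del old                                 # buffer goes back to the allocator

fresh = np.empty(1024, dtype=np.int64)  # may reuse the freed buffer
print(fresh[:5])  # contents are arbitrary and can match the old table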

@carterturn
Contributor Author

carterturn commented Sep 29, 2025

I am a bit confused as to why the length-checking method did not catch everything, but updating the changed_mask explicitly is probably safer anyway. I have applied the patch and will see if anything happens.
I also observed that np.empty was (frequently!) finding blocks of memory that were previously used as the cache. It is somewhat unfortunate that one still has to worry a bit about memory management in Python.
