Data Acquisition¶
“Keep the DAQ Running”
Data acquisition (DAQ) is the process by which the D-Eggs are configured and read out, and the resulting information is saved. The DAQ runs on the icehap-daq PC; analyses run on Grappa. As a shifter you need access to both PCs, and for this you will need a Grappa account.
If there are no errors, the DAQ will run entirely on its own.
Running the DAQ¶
Attach to the ‘master’ screen session on icehap-daq.
Under most circumstances you should be able to just press the up arrow and then Enter to re-run the previous launch command.
If this is a fresh run, launch it with python3 fat_master.py configs/fat_config.ini.
If the DAQ has crashed at any point, also pass the recover flag.
Common Errors and Debugging¶
If any errors occur during the FAT, they should be automatically sent to #chiba_daq (more on checking Slack later).
STF Errors¶
Issues with connecting to the internet for STF occasionally come up. You might see something like this:
File "/home/icehap-daq/stf/stf/core.py", line 249, in check_system_clock
raise exceptions.STFRefuseToRun(msg)
stf.util.exceptions.STFRefuseToRun: System time is inaccurate. Cannot proceed
In this case, please re-launch the master FAT script. This error is unlikely to happen twice in a row.
Timeouts¶
Sometimes the D-Egg trigger stream fails to work correctly, causing the DAQ to crash.
If the error is TIMEOUT, try to relaunch the DAQ.
Held Process¶
A recent issue is the DAQ going on hold. This appears to be caused by a socket error in iceboot. Currently there is no fix for this issue.
The socket error does not throw a warning into Slack. If the DAQ has not updated its status for more than 1.5 hours, manually verify that the DAQ is running.
Manual verification is done by connecting to the ‘master’ screen session and checking if it is updating. If it is not updating:
Kill the current process with ‘control+c’
Wait 5 seconds
If it does not fully exit, use ‘control+c’ again.
Repeat until the terminal prompt appears.
Log this incident using error_logging.py in /home/icehap-daq/software/error_logging.
Copy the stack trace (error message) and post it into #degg_fat along with which script was running at the time.
Then restart the DAQ.
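The 1.5 hour rule above can be sketched as a small check on a file's modification time. This is only a minimal sketch: it assumes the DAQ regularly writes to some status or log file whose modification time reflects the last update, which this page does not confirm. The path passed in is a placeholder, and the function names are hypothetical.

```python
import os
import time

# 1.5 hours, the threshold given in the text above
STALE_AFTER_SECONDS = 1.5 * 60 * 60


def seconds_since_update(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stale(path, threshold=STALE_AFTER_SECONDS, now=None):
    """True if `path` has not been modified for longer than `threshold` seconds."""
    return seconds_since_update(path, now) > threshold
```

If `is_stale(...)` returns True for the file you chose to watch, treat that as a prompt to attach to the ‘master’ screen session and check by eye. It is not a replacement for the manual verification.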
SSH Error¶
Sometimes the connection to the FAT CAT database is disrupted. This appears to be a stochastic process. The error looks like this:
File "/misc/disk20/fat/software/venvs/degg-fat/lib/python3.10/site-packages/paramiko/transport.py", line 2271, in _check_banner
buf = self.packetizer.readline(timeout)
File "/misc/disk20/fat/software/venvs/degg-fat/lib/python3.10/site-packages/paramiko/packet.py", line 380, in readline
buf += self._read_timeout(timeout)
File "/misc/disk20/fat/software/venvs/degg-fat/lib/python3.10/site-packages/paramiko/packet.py", line 609, in _read_timeout
raise EOFError()
EOFError
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
The solution in this case is to re-run the analysis.
It can happen for any of the analyses, so please check the logfile in /disk20/fat/data/logs/.
This will most likely be the most recent logfile.
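Finding the most recent logfile and checking it for the banner error can be sketched as follows. This assumes the logfiles are plain text and that the paramiko traceback is written into them. Both are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path

# Text of the final line of the traceback shown above
SSH_BANNER_ERROR = "Error reading SSH protocol banner"


def newest_file(log_dir):
    """Return the most recently modified regular file in `log_dir`, or None."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def has_ssh_banner_error(log_path):
    """True if the logfile contains the SSH banner error text."""
    text = Path(log_path).read_text(errors="replace")
    return SSH_BANNER_ERROR in text
```

For example, `newest_file("/disk20/fat/data/logs/")` gives the candidate logfile, and `has_ssh_banner_error(...)` on that file tells you whether this is the error that stopped the analysis.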
Cold Boot¶
The cold boot procedure currently has errors with D-Eggs becoming readable after the wire pair voltage is disabled (wp_off.py).
This will likely also throw an error into Slack.
Again, log this incident using error_logging.py in /home/icehap-daq/software/error_logging and report it to Slack.
In this case, enter the ‘setup’ screen session, and kill the current process.
Then re-run setup.py, answering “yes” to the on-screen prompts.
Make sure all 16 D-Eggs are visible in domnet before proceeding to flashing the FPGAs.
If not all modules are visible, kill the script and try again.
Repeat up to 5 times.
If all modules are eventually visible, keep the process running and exit the screen session.
If not all modules are visible following 5 attempts, contact an expert immediately.
We now need to manually modify the configuration file to mark the cold boot as passed. Otherwise the master script will attempt to re-try the cold boot.
Open /home/icehap-daq/data/json/run/run_00XYZ.json.
Go to the most recent master script task list (e.g. start from the bottom).
‘coldboot’ should have a 0 next to its name, signalling to the master script that it crashed.
Change the 0 to 1, indicating a pass.
Save the file and exit.
Re-attach to the ‘master’ screen session and re-launch the DAQ.
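The manual edit of the run file can be sketched in Python. The layout of the run JSON is not described on this page, so this sketch assumes only that ‘coldboot’ appears as a key with value 0 or 1, and that the last occurrence in file order belongs to the most recent task list. The function names are hypothetical, and the file is re-written with `json.dump`, which may change its formatting.

```python
import json
import shutil


def find_coldboot_entries(node, found=None):
    """Collect every dict that has a 'coldboot' key, in document order."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if "coldboot" in node:
            found.append(node)
        for value in node.values():
            find_coldboot_entries(value, found)
    elif isinstance(node, list):
        for item in node:
            find_coldboot_entries(item, found)
    return found


def mark_coldboot_passed(run_file):
    """Set the last 'coldboot' entry from 0 to 1; return True if changed."""
    # Keep a backup copy of the original file before touching it
    shutil.copy2(run_file, str(run_file) + ".bak")
    with open(run_file) as f:
        data = json.load(f)
    entries = find_coldboot_entries(data)
    if not entries or entries[-1]["coldboot"] != 0:
        return False  # nothing to change, file is left as it was
    entries[-1]["coldboot"] = 1
    with open(run_file, "w") as f:
        json.dump(data, f, indent=4)
    return True
```

Check the result by eye before re-launching the DAQ. If the real file layout differs from the assumption above, edit the file by hand as described.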