Water Background Subtraction (Overwrite)
This module, overwrite_10_2_23.py, adapts previous background subtraction programs for specific crystallography data analysis. It processes .stream files with “high” and “low” keV values, overwriting the “low” keV background with the “high” keV stream for improved peak accuracy.
The rational behind this is to take two stream files, one with “high” keV values and one with “low” keV values, and analyze their respective .stream files. The program will then take the background from the “low” keV stream file and overwrite the background in the “high” keV stream file.
The program will then “overwrite” the “low” keV stream file, as the “high” keV stream, with the new more accurate peak values.
Refer to Water Background Subtraction Project (Stream) for foundational coding structures.
GitHub Repositories:
Imports
The script primarily extends h5_stream_background_subtraction_10_2_23.py to streamline the code, alongside crucial libraries:
import os
import shutil
import numpy as np
import h5py as h5
import h5_stream_background_subtraction_10_2_23 as streampy
Duplicate Function
Creates a copy of the stream files before overwriting, using shutil.copyfile().
- duplicate_before_overwrite(filename)
- Parameters
filename (str) – The file to duplicate.
- Returns
The duplicated file’s path.
- Return type
str
def duplicate_before_overwrite(filename): # taking filename and adding copy extension to it. base_file, extension = filename.rsplit('.',1) new_base_file = f'{base_file}_copy' new_filename = f'{new_base_file}.{extension}' duplicate = shutil.copyfile(filename, new_filename) return duplicate
Debugging Functions
compare_high_low() is used to compare the two stream file contents to ensure that they’re loaded and read properly.
retrieve() is used to retrieve the columns of data to retrieve, and append to a list.
- compare_high_low(high_data, low_data, *columns)
Compare the high and low data and return the compared data.
- Parameters
high_data (dict) – The high keV file to compare.
low_data (dict) – The low keV file to compare.
- Option columns
The columns desired to be compared.
- Returns
The columns desired to be compared.
- Return type
dict
def compare_high_low(high_data, low_data, *columns): """ Compare the high and low data and return the compared data. """ compared_data = {} for col in columns: if col in high_data and col in low_data: print(f'High: {high_data[col]} \n') print(f'Low: {low_data[col]} \n') print() compared_data[col] = (high_data[col], low_data[col]) retrieve(list(high_data), list(low_data), *columns) return compared_data
This function directly appends certain columns in data_columns for ease of use and debugging purposes.
- retrieve(data_columns, *args)
Retrieve the columns of data to retrieve.
- Parameters
data_columns (dict) – The columns of data to retrieve.
args (list) – Takes the desired columns to be retrieved, appends to list.
- Returns
The columns of data to retrieve.
- Return type
list
def retrieve(data_columns, *args): result = [] try: # taking in data_columns and selecting the desired columns to retrieve result = [data_columns[col] for col in args if col in data_columns] except Exception as e: pass return result
Overwrite Function
This function executes the overwriting procedure of the “high” keV stream file with the “low” keV stream file.
- overwrite_low_in_high(filename, overwrite_data)
Overwrite the low data in the high stream file with the given overwrite data.
- Parameters
filename (str) – The name of the file to overwrite.
overwrite_data (dict) – A dictionary containing the data to overwrite.
- Returns
None
Intensity Finder Function
This function simply finds the intensity of the peaks in the image, and returns a list of the intensities. If the x,y coordinates are out of bounds, the function will simply ignore the peak.
- intensity_finder(x_coords, y_coords, image_name)
Retrieve the intensity values for every x,y coordinate in the image.
- Parameters
x_coords (list) – The x coordinates of the peaks.
y_coords (list) – The y coordinates of the peaks.
image_name (str) – The name of the image to find the intensity of the peaks.
- Returns
The intensity of the image.
- Return type
list
def intensity_finder(x_coords, y_coords, image_name): """ Retrieve the intensity values for every x,y coordinate in the image. """ with h5.File(image_name, "r") as f: intensities = f['/entry/data/data'][()] intensities = np.array(intensities) found_intensities = [] for x, y in zip(x_coords, y_coords): if x < intensities.shape[0] and y < intensities.shape[1]: found_intensities.append(intensities[int(x), int(y)]) return found_intensities
Populate Intensity Array Function
Populates the intensity array to recreate the array of a loaded image with the stream data.
- populate_intensity_array(data_columns, image_name)
Populate the intensity array with the intensity values for each x,y coordinate.
- Parameters
data_columns (dict) – The columns of data to populate.
image_name (str) – The name of the image to populate the intensity array.
- Returns
The populated intensity array.
- Return type
np.array
def populate_intensity_array(data_columns, image_name): """ Populate the intensity array with the intensity values for each x,y coordinate. """ # reads the h5 image with h5.File(image_name, "r") as f: intensities = f['/entry/data/data'][()] intensities = np.array(intensities) # generates a new array of zeros with the same shape as the image new_intensities = np.zeros((intensities.shape[0], intensities.shape[1])) # for each x,y coordinate in the data_columns, set the value in the new array to the intensity value # populate the intensity array with corresponding (fs,ss) coordinates for i in range(len(data_columns['fs'])): x = int(data_columns['fs'][i]) y = int(data_columns['ss'][i]) if x < intensities.shape[0] and y < intensities.shape[1]: new_intensities[x][y] = intensities[x][y] return new_intensities
Main Function
Orchestrates the overall process, from loading files to processing data and coordinates.
The main function of the program, which executes the program.
The function performs the following steps:
File Loading:
Displays the current working directory.
Setup Paths:
Initializes src_path to the current working directory.
Creates stream_dir` and image_dir paths by joining src_path with respective directory names.
Initialize Variables:
Initializes intensities_array to None.
Initializes high_stream_name and low_stream_name to the respective stream file names.
Load and Compare Stream Data:
Loads data from the high and low stream files using load_stream.
Compares high and low data using compare_high_low.
Overwrite Data:
Overwrites data in the high stream file with data from the low stream.
Image Processing:
Sets up image_name and image_path for processing.
Finds intensities using intensity_finder with high data stream coordinates and image path.
Populates the intensities_array with intensity data using populate_intensity_array().
Threshold Processing and Coordinate Extraction:
Initializes a PeakThresholdProcessor with a very low threshold.
Prints the original threshold value.
Retrieves coordinates above the threshold using get_coordinates_above_threshold.
Coordinate Menu Processing:
Initializes another PeakThresholdProcessor with a higher threshold value.
Iterates through a list of radii, processing coordinates with different threshold values and radii.
Sets completed to True after processing
To execute the program, ensure it’s correctly set up in your environment and call it from the command line or another script.
- main()
def main(): print("Current working directory:", os.getcwd()) src_path = os.getcwd() stream_dir = os.path.join(src_path, "high_low_stream") image_dir = os.path.join(src_path, "images") intensities_array = None high_stream_name = 'test_high.stream' low_stream_name = 'test_low.stream' high_stream_path = "high_low_stream/test_high.stream" low_stream_path = "high_low_stream/test_low.stream" if not os.path.exists(high_stream_path): print(f"File {high_stream_path} does not exist.") return elif os.path.exists(low_stream_path) and os.path.exists(high_stream_path): print(f"Files {low_stream_path} and {high_stream_path} exist.") if not os.path.exists(high_stream_path): print(f"File {high_stream_path} does not exist.") return elif os.path.exists(low_stream_path) and os.path.exists(high_stream_path): print(f"Files {low_stream_path} and {high_stream_path} exist.") # compare_high_low(high_data, low_data) high_data = load_stream(high_stream_path) low_data = load_stream(low_stream_path) compare_high_low(high_data, low_data) # Took low data from low_stream and put in high_stream file. overwrite_data = low_data overwrite_low_in_high(high_stream_path, overwrite_data) # compare any columns in data_columns # compare_high_low(high_data, low_data, "h") # now high_stream has data from low_stream image_name = '9_18_23_high_intensity_3e8keV-1_test.h5' image_path = os.path.join(image_dir, image_name) # retrieved from stream coordinate menu intensities = intensity_finder(high_data['fs'], high_data['ss'], image_path) # populate_inteneity_array is not correctly working intensities_array = populate_intensity_array(high_data, image_path) print("Number of non-zero values in intensity array\t", np.count_nonzero(intensities_array)) # for debugging # intensities_array = np.array(intensities_array) # print(intensities_array) # compare_high_low(high_data, low_data, "I") threshold_stream = streampy.PeakThresholdProcessor(intensities_array, threshold_value=1e-5) # very low! print("Original threshold value: ", threshold_stream.threshold_value, "\n") coordinate_list_stream = threshold_stream.get_coordinates_above_threshold() completed = False radius = [1,2,3,4] threshold = streampy.PeakThresholdProcessor(intensities_array, threshold_value=9000) for r in [1, 2, 3, 4]: print(f"Threshold value for radius {r}: {threshold.threshold_value}") streampy.coordinate_menu(intensities_array, threshold_value=threshold.threshold_value, coordinates=coordinate_list_stream, radius=r) print(f"Completed coordinate menu for radius {r}") completed = True if __name__ == '__main__': main()