Coordinate Optimization¶
Upon initialization of a Cell
object, the parameters of the coordinate system are initialized
with on initial guesses based on the binary image of the cell. This is only a rough estimation aimed to provide a starting
point for further optimization. In ColiCoords
, this fitting and optimization is handled by symfit
, a python
package that provides a symbolic and intuitive API for using minimizers from scipy
or other minimization solutions.
The shorthand approach for optimizing the coordinate system is:
cell.optimize()
By calling optimize()
the coordinate system is optimized for the current Cell
object by
using the default settings. This means the optimization is performed on the binary image using the Powell
minimizer
algoritm. Although the optimization is fast, different data sources and minimizers can yield more accurate results.
Input data classes and cell functions¶
ColiCoords
can optimize the coordinate system of the cell based on all compatible data classes, either binary,
brightfield, fluorescence, or STORM data. All optimization methods are implemented by calculating some property by applying
the current coordinate system on the respective data element, and then comparing this property to the measured data by
calculating the chi-squared.
For example, optimimzation based on the brightfield image can be done as follows:
cell.optimize('brightfield')
Where it is assumed that the brightfield data element is named ‘brightfield’. The appropriate function that is used for
the optimization is chosen automatically based on the data class and can be supplied optionally by the cell_function
keyword argument. Note that optimizaion by brightfield cannot be used to determine the value for the cell’s radius parameter,
for this the function measure_r()
has to be used.
In the case of brightfield optimization, a NumericalCellModel
is generated which is used
by symfit
to perform the minimization. When the default cell_function (CellImageFunction
)
is called it first calculates the radial distribution of the brightfield image, and this radial distribution is then used
to reconstruct the brightfield image. The resulting image is an estimation of the measured brightfield image and by
iterative bootstrapping of this process the optimal parameters can be found. This particular optimization strategy can be
use for any roughly isotropic image - ie a cell image that looks identical in all directions radially outward - and is thus
independent of brightfield image type and focal plane and can also be applied to selected fluorescence images.
The most accurate approach of optimization is by optimizing on a STORM dataset of a membrane marker. Here, the default
cell_function used (CellSTORMMembraneFunction
) calculates for every localization the distance
r to the midline of the cell. This is compared to the current radius parameter of the coordinate system to give the
chi-squared. This fitting is a special case since the dependent data (y-data) also depends on the optimization parameter
r. To allow a variable dependent data for fitting, the class RadialData
is used, which
mimics a ndarray
, however whose value depends on the current value of the r parameter.
Minimizers and bounds¶
Optimization can be done by any symfit
compatible minimizer. All minimizers are imported via colicoords.minimizers
.
More details on the different minimizers can be found in the symfit
or scipy
docs.
The default minimizer, Powell
is fast but does not always converge to the global minimum. To increase the probability to
find the global minimum, the minimizer DifferentialEvolution
is used. This minimizer searches the parameter space
defined by bounds on Parameter
objects defined in the model scan for candidate solutions.
Multiprocessing and high-performance computing¶
The optimization process can take up to tens of seconds per cell, especially if a global minimizer is used. Although the
process only needs to take place once, the optimization process of several thousands of cells can take too much time to
be conveniently executed on normal desktop PCs. ColiCoords
therefore supports multiprocessing so that the user can
take advantage of parallel high-performance computing. To perform optimization in parallel:
cells.optimize_mp()
Where cells is a CellList
object. The cells to be divided is equally distributed among the
spawned processes, which is by default equal to the number of physical cores present on the host machine.
Models and advanced usage¶
The default model used is NumericalCellModel
. Contrary to typical symfit
workflow,
the Parameter
objects are defined and initialized by the model itself, and then used to make up the
model. To adjust parameter values and bound manually, the user must directly interact with a CellFit
object instead of calling optimize()
.
from colicoords import CellFit
fit = CellFit(cell)
print(fit.model.params) # [a0, a1, a2, r, xl, xr]
# Set the minimum bound of the `a0` parameter to 5.
fit.model.params[0].min = 5
# Se the value of the `r`parameter to 8.
fit.model.params[3].value = 8
The fitting can then be executed by calling fit.execute()
as usual.
Custom minimization functions¶
The minimization function cell_function is a subclass of CellMinimizeFunctionBase
by default.
This when this object is used it is initialized by CellFit
with the instance of the cell object and the name of the
target data element. These attributes are then accessible in the custom __call__
method of the function object.
The __call__
function must take the coordinate parameters with their values as keyword arguments and should return
the calculated data which is compared to the target data element to calculate the chi-squared. Alternatively, the target_data
property can be used, as is done for CellSTORMMembraneFunction
to specify a different target.
Alternatively, any custom callable can be given as cell_function, as long as it complies with the above requirements.