Here’s some stuff that’s in progress. These changes will show up
first in the development (`dev`) branch, which you are welcome
to install and test.
If you would like to contribute to AutoSpectral, these are key areas I have identified where it has shortcomings or needs improvement. Suggestions for strategies, or better, code suggestions, would be appreciated.
- Parallelization improvements.
  - The parallelization of functions like `define.flow.control` was problematic on Windows due to memory usage with `future` and the tendency of Windows to usurp threads that R was using.
  - A new parallel backend using `parLapply` on Windows or `mclapply` on Mac/Linux is being implemented in the `dev` branch. This should gracefully fall back to sequential `lapply` processing if it runs into any problems when being set up.
  - There is currently a problem with the new backend in `get.spectral.variants` where in some cases it does not assess all fluorophores. Presumably there are some items missing from the exports to the clusters, but I don’t know.
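
A backend of this shape can be sketched as follows. This is a minimal sketch, not the code in the `dev` branch: `parallel_lapply_sketch`, `n_cores`, `exports` and `export_env` are hypothetical names. It also shows why exports matter on only one of the two paths: `parLapply` workers are fresh R sessions, so objects the worker function refers to have to be sent with `clusterExport`, whereas `mclapply` forks the current session.

```r
library(parallel)

# Hypothetical helper (not part of AutoSpectral): apply `fun` to each element
# of `x` in parallel, falling back to sequential lapply() on any failure.
parallel_lapply_sketch <- function(x, fun, n_cores = 2L,
                                   exports = character(),
                                   export_env = globalenv()) {
  run_sequential <- function() lapply(x, fun)

  run_parallel <- function() {
    if (.Platform$OS.type == "windows") {
      # PSOCK workers are fresh R sessions: export what `fun` needs.
      cl <- makeCluster(n_cores)
      on.exit(stopCluster(cl), add = TRUE)
      if (length(exports) > 0L) {
        clusterExport(cl, varlist = exports, envir = export_env)
      }
      parLapply(cl, x, fun)
    } else {
      # Forked workers share the parent's memory, so no exports are needed.
      res <- mclapply(x, fun, mc.cores = n_cores)
      # mclapply() returns "try-error" objects instead of stopping, so
      # convert worker failures into an error that triggers the fallback.
      failed <- vapply(res, inherits, logical(1), what = "try-error")
      if (any(failed)) stop("at least one forked worker failed")
      res
    }
  }

  if (n_cores < 2L || length(x) < 2L) {
    return(run_sequential())
  }
  tryCatch(
    run_parallel(),
    error = function(e) {
      warning("Parallel run failed, using lapply(): ", conditionMessage(e))
      run_sequential()
    }
  )
}

squares <- parallel_lapply_sketch(1:4, function(i) i^2, n_cores = 2L)
```

Comparing the length of the parallel result against the sequential result for the same input is a cheap way to check whether every element was actually processed.
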
- Code clean-up for `define.flow.control`.
  - This was the first part of the code I wrote, since it is required for everything downstream. It is, consequently, probably the worst.
  - I’ve made a little progress on this, but any changes here have ramifications throughout, so it is slow going.
  - Currently, we read in all of the FCS data into active memory in R, gating as we go to reduce data size. It might be better to restructure this to simply generate gates and retain indices of gated events in each file. FCS files would need to be read in on the fly for `clean.controls` and `get.fluorophore.spectra`. This would require a faster FCS reader, such as the `data.table`-based approach in Nathan Laniewski’s flowState.
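
The "gate once, keep indices, re-read on demand" idea can be sketched like this. Everything here is illustrative: `demo_files` stands in for FCS files on disk, and `read_events`, `scatter_gate`, `build_gate_index` and `read_gated` are hypothetical names, not AutoSpectral or flowState functions.

```r
# Stand-in for FCS files on disk: in-memory event matrices keyed by file name.
demo_files <- list(
  "control_A.fcs" = cbind(FSC = c(10, 120, 300, 80, 500),
                          SSC = c(5, 60, 90, 400, 70)),
  "control_B.fcs" = cbind(FSC = c(150, 90, 220),
                          SSC = c(30, 20, 250))
)

# Hypothetical reader; a real one would parse the FCS file at `path`.
read_events <- function(path) demo_files[[path]]

# Hypothetical gate: a rectangle on scatter, returns one logical per event.
scatter_gate <- function(events) {
  events[, "FSC"] > 100 & events[, "SSC"] < 100
}

# Pass 1: read each file once and keep only the indices of gated events.
build_gate_index <- function(paths, reader, gate) {
  index <- lapply(paths, function(p) which(gate(reader(p))))
  names(index) <- paths
  index
}

# Later passes: re-read a file on demand and subset with the stored indices.
read_gated <- function(path, gate_index, reader) {
  reader(path)[gate_index[[path]], , drop = FALSE]
}

gate_index <- build_gate_index(names(demo_files), read_events, scatter_gate)
gated_a <- read_gated("control_A.fcs", gate_index, read_events)
```

Only the integer indices stay in memory between steps, which is why the cost shifts from RAM to the speed of the reader.
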
- More flexibility in negative/unstained samples.
- Allow multiple controls per fluorophore.
- An alternative to the automated gating.
  - This is now in progress.
  - This will be based on the “Cellular Landmarks” approach from Nathan Laniewski’s flowState.
  - Initial definition of the gate will be based on the location of the brightest positive events in controls where the marker corresponds to known lymphocyte or monocyte populations. The code should fall back to the current gating approach when no known markers are provided.
- Cell-specific weighting for per-cell unmixing.
  - This is problematic for the ID7000, probably due to the PMT noise.
  - Benefits on other systems are surprisingly minimal, so this may not be implemented.
- Speed up per-cell AF extraction.
  - This currently operates via a `for` loop, unmixing each AF signature sequentially. This is the best I have found. It is okay with fast BLAS.
  - Parallelization of this using `future` was not faster and fails with large data sets due to memory overrun.
  - I think it is unlikely that parallelization of this will help since computation of the unmixing will be faster than communication of the large expression matrix to different cores.
  - Approaches using `lapply` and `vapply` were slower and required more memory.
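
For readers unfamiliar with the loop in question, a sequential pass over AF signatures might look like the sketch below. It is not the AutoSpectral implementation: the function names are hypothetical, the unmixing is plain ordinary least squares, and choosing the AF signature with the smallest squared residual per cell is an assumption made purely for illustration.

```r
# Ordinary least squares unmixing.
# raw: cells x detectors; spectra: signatures x detectors.
unmix_ols <- function(raw, spectra) {
  raw %*% t(spectra) %*% solve(spectra %*% t(spectra))
}

# Hypothetical sketch: try each AF signature in turn and record, per cell,
# the one that leaves the smallest squared residual (assumed criterion).
assign_af_sketch <- function(raw, fluor_spectra, af_spectra) {
  best_error <- rep(Inf, nrow(raw))
  best_af <- integer(nrow(raw))
  for (i in seq_len(nrow(af_spectra))) {
    spectra <- rbind(fluor_spectra, af_spectra[i, , drop = FALSE])
    abundances <- unmix_ols(raw, spectra)
    error <- rowSums((raw - abundances %*% spectra)^2)
    improved <- error < best_error
    best_error[improved] <- error[improved]
    best_af[improved] <- i
  }
  best_af
}

# Toy data: two fluorophores, two AF signatures, four detectors.
fluor_spectra <- rbind(F1 = c(1, 0.5, 0, 0), F2 = c(0, 0.2, 1, 0.1))
af_spectra <- rbind(AF1 = c(0.1, 0.1, 0.1, 1), AF2 = c(0.5, 1, 0.5, 0.2))
raw <- rbind(
  100 * fluor_spectra["F1", ] + 50 * af_spectra["AF1", ],
  20 * fluor_spectra["F2", ] + 80 * af_spectra["AF2", ]
)
best_af <- assign_af_sketch(raw, fluor_spectra, af_spectra)
```

Each iteration is a handful of matrix products on the full expression matrix, so the heavy lifting is done by BLAS and only two per-cell vectors persist across iterations.
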
- Speed up per-cell fluorophore optimization.
  - Some progress on this has been implemented in the `dev` branch. Should be ~4x faster now.
  - The primary improvement comes from generating fewer spectral variants to search through. My testing shows that a lower number is, if anything, better, presumably due to slightly higher quality of the spectra being generated while still covering the range of variation. This will help in R as well as Rcpp.
  - Secondary improvement comes from changing the solving strategy in C++ so that we aren’t recalculating the unmixing matrix (right-hand solve) every time.
  - Tertiary improvement comes from changing the C++ compiler flags.
  - The pure R version will now benefit from parallelization, particularly on Mac/Linux.
  - A lot more work is needed on this. I need a smarter strategy.
- Fix the issue causing discontinuities.
  - Some progress on this in the `dev` branch.
  - One improvement is better decontamination of AF from the spectral variants, which was causing cells to be pushed further away from the threshold for optimization.
  - A second is scaling the changes to cellular spectra based on the abundance of the fluorophore in question. This means cells close to the threshold will not change much, so the optimization will apply to higher expression cells, which is what is needed for reducing spillover spread and unmixing errors.
  - A third is the reduction in low-level noise introduced into the spectral variants due to fluctuations in electronic noise or autofluorescence. Any signal in the variant spectrum in a channel where the optimized single spectrum has less than 5% of the peak detector signal is now regularized towards the optimized spectrum. This focuses the variation on the “peaks”, which is what causes the unmixing errors and spread.
  - The primary remaining problem is, I think, with the crude approximation employed in the “fast” approach. The best solution would be to speed up the “slow” method to the point where the “fast” approach can be deprecated. Alternatively, some degree of smoothing based on kNN or regularization may be required.
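
The 5% rule for low-signal channels can be illustrated with a small sketch. The function name and the `shrink` parameter are hypothetical; how strongly the `dev` branch actually pulls a variant towards the optimized spectrum is not stated here, so a simple linear blend is assumed.

```r
# Hypothetical sketch: in channels where the optimized (reference) spectrum is
# below `threshold` of its peak, pull the variant towards the reference.
# shrink = 1 replaces the variant value outright; shrink = 0 leaves it alone.
regularize_variant_sketch <- function(variant, reference,
                                      threshold = 0.05, shrink = 0.5) {
  low_signal <- reference < threshold * max(reference)
  variant[low_signal] <- (1 - shrink) * variant[low_signal] +
    shrink * reference[low_signal]
  variant
}

reference <- c(1.00, 0.50, 0.04, 0.01)  # optimized single spectrum
variant   <- c(0.90, 0.60, 0.10, 0.03)  # one spectral variant
regularized <- regularize_variant_sketch(variant, reference)
```

The first two channels are left untouched because they carry the peaks; only the last two, where the reference is under 5% of its maximum, move towards the reference.
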
- Better correction of unmixing errors.
  - There are several strategies I’m investigating for handling cases where the multi-colour samples contain obvious errors outside the range of variation seen in the single-colour controls. This is not stuff I’m going to put online, so get in touch if you’d like to work on this.
- Integration of Poisson IRLS.
  - The Poisson IRLS in `AutoSpectralRcpp` is now working considerably better and faster. This may allow for integration of the Poisson optimization after identification of cellular autofluorescence and fluorophore signatures.
  - For this to be practical, per-cell fluorophore optimization needs to be faster.
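
For context, a generic textbook form of Poisson IRLS unmixing for a single cell is sketched below. It is not the `AutoSpectralRcpp` code: `poisson_irls_sketch`, `n_iter` and `eps` are hypothetical names, and an identity link (signal = abundances times spectra, variance equal to the mean) is assumed.

```r
# Hypothetical sketch of iteratively reweighted least squares for a Poisson
# model with identity link. y: detector signal for one cell;
# spectra: signatures x detectors.
poisson_irls_sketch <- function(y, spectra, n_iter = 20L, eps = 1e-6) {
  X <- t(spectra)                               # detectors x signatures
  a <- solve(crossprod(X), crossprod(X, y))     # OLS starting point
  for (k in seq_len(n_iter)) {
    mu <- pmax(as.vector(X %*% a), eps)         # fitted signal, kept positive
    w <- 1 / mu                                 # Poisson: variance = mean
    a <- solve(crossprod(X, w * X), crossprod(X, w * y))
  }
  as.vector(a)
}

spectra <- rbind(F1 = c(1, 0.5, 0, 0), F2 = c(0, 0.2, 1, 0.1))
noisy_signal <- c(104, 57, 52, 3)               # made-up counts for one cell
abundances <- poisson_irls_sketch(noisy_signal, spectra)
```

Each iteration is a weighted least squares solve, so the per-cell cost is roughly `n_iter` times that of a single unmixing.
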
To install the dev branch:
devtools::install_github("DrCytometer/AutoSpectral@dev")
To replace this with the (hopefully) stable version, run this:
remove.packages("AutoSpectral")
devtools::install_github("DrCytometer/AutoSpectral")