Machine Learning Perspectives in Compression, Distributed Computing, and Brain Imaging

dc.contributor.advisor: Khisti, Ashish
dc.contributor.author: Ebrahimi, MohammadReza
dc.contributor.department: Electrical and Computer Engineering
dc.date: 2024-11
dc.date.accepted: 2024-11
dc.date.accessioned: 2024-11-13T19:34:16Z
dc.date.available: 2024-11-13T19:34:16Z
dc.date.convocation: 2024-11
dc.date.issued: 2024-11
dc.description.abstract: This thesis explores three critical dimensions in machine learning: modeling, training, and theory. Each dimension, represented by studies in brain imaging, distributed computing, and compression, addresses unique challenges with the goal of advancing machine learning methodologies and applications.
First, within the domain of data modeling, we introduce Shared Gaussian Process Factor Analysis (S-GPFA), a novel probabilistic model for analyzing multi-subject fMRI datasets. S-GPFA addresses the challenge of modeling individual variability while uncovering shared temporal dynamics and spatial organization of brain activity. By incorporating Gaussian Process priors and emphasizing the temporal dimension of data, S-GPFA offers a more accurate and interpretable representation of brain activity compared to traditional static methods. The application of S-GPFA to a large fMRI dataset demonstrates its ability to identify group-specific dynamical characteristics and brain regions with meaningful functional variability, providing valuable insights into socioemotional cognitive capacity and potential avenues for studying psychiatric disorders.
Second, focusing on the training aspect, we address the problem of straggler mitigation in distributed training of machine learning models. We present two innovative coding schemes, Selective Reattempt Sequential Gradient Coding (SR-SGC) and Multiplexed Sequential Gradient Coding (M-SGC), that leverage coding across both the spatial and temporal dimensions to achieve straggler resilience while reducing computational load. These schemes exploit the temporal diversity of straggler behavior, adapting to varying worker speeds and minimizing delays. Experiments on a large-scale AWS Lambda cluster demonstrate the effectiveness of the proposed schemes in reducing runtime and improving training performance under real-world conditions.
Third, from a theoretical perspective, we investigate the foundations of data coupling and compression through the lens of information theory. We introduce the Minimum Entropy Coupling with Bottleneck (MEC-B) framework for lossy compression under logarithmic loss. This framework extends the classical Minimum Entropy Coupling (MEC) by incorporating rate limits, enabling a more controlled and flexible approach to compression. We explore the Entropy-Bounded Information Maximization (EBIM) formulation for compression and propose a novel search algorithm for identifying deterministic mappings with guaranteed performance bounds. Additionally, we characterize the optimal solution in the neighbourhood of deterministic mappings, providing valuable theoretical insights into the problem structure. Through these studies, this thesis contributes to machine learning methodologies and applications across diverse domains, ranging from brain imaging and distributed computing to information theory and data compression.
dc.description.degree: Ph.D.
dc.identifier.uri: http://hdl.handle.net/1807/141339
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Brain Imaging
dc.subject: Compression
dc.subject: Distributed Training
dc.subject: Information Theory
dc.subject: Machine Learning
dc.subject: Minimum Entropy Coupling
dc.subject.classification: 0984
dc.title: Machine Learning Perspectives in Compression, Distributed Computing, and Brain Imaging
dc.type: Thesis

Files

Original bundle

Name: Ebrahimi_MohammadReza_202411_PhD_thesis.pdf
Size: 7.58 MB
Format: Adobe Portable Document Format

License bundle

Name: CC_BY.rdf
Size: 908 B
Format: RDF serialized in XML

Name: TSpace_LAC_SGS_license_MOA2015.txt
Size: 2.45 KB
Format: Plain Text

Name: TSpace_LAC_SGS_license_MOA2015.pdf
Size: 69.65 KB
Format: Adobe Portable Document Format