Bachelor/Master Theses, Semester Projects, and DAS DS Capstone Projects

If you are interested in one of the following topics, please send an email to Prof. Bölcskei and include your complete transcripts. Please note that we cannot respond to requests that are sent directly to PhD students in the group or that do not contain your transcripts.

These projects serve to illustrate the general nature of projects we offer. You are most welcome to inquire directly with Prof. Bölcskei about tailored research projects. Likewise, please contact Prof. Bölcskei in case you are interested in a bachelor thesis project.

A list of ongoing and completed theses is also available on our website.

List of Semester Projects (SP)

List of Master Projects (MA)



Acoustic sensing and trajectory estimation of objects flying at supersonic speed (with industry) (SP)

In shooting sports, hunting, and law-enforcement applications, measuring the speed and trajectory of a projectile with high precision and reliability constitutes an important technical challenge. For supersonic projectiles these quantities are estimated from signals acquired by microphones placed at different locations. Recently, more powerful microprocessors have made it possible to employ more sophisticated algorithms.

The goal of this project is to investigate new techniques for the task at hand, such as linearization of non-linear systems of equations, least-squares fitting, and neural-network-driven machine learning. Existing hardware and algorithms provide a starting point for the project, which will be carried out in collaboration with the industry partner SIUS (located in Effretikon, Zurich). SIUS offers close supervision and the possibility to use its hardware and test laboratory.
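As a toy illustration of the least-squares approach (my own sketch, not the SIUS algorithm: the microphone layout, the speed of sound, and the point-source simplification of the Mach-cone geometry are all assumptions), the location and emission time of an acoustic event can be estimated from arrival times with scipy.optimize.least_squares:

# Toy sketch: least-squares estimation of a point-like acoustic event
# (a strong simplification of the Mach-cone geometry of a supersonic
# projectile) from arrival times measured at several microphones.
# All coordinates and constants are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # speed of sound in air [m/s], assumed constant

mics = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # microphone positions [m]

def arrival_times(event):
    """Predicted arrival times at each microphone for event (x, y, t0)."""
    x, y, t0 = event
    return t0 + np.linalg.norm(mics - np.array([x, y]), axis=1) / C

def residuals(event, measured):
    return arrival_times(event) - measured

# Synthetic measurement: event at (0.3, 0.7), emitted at t0 = 0.
true_event = np.array([0.3, 0.7, 0.0])
measured = arrival_times(true_event) + 1e-6 * np.random.randn(len(mics))

fit = least_squares(residuals, x0=[0.5, 0.5, 0.0], args=(measured,))
print("estimated (x, y, t0):", fit.x)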

About the industry partner: SIUS is the world’s leading manufacturer of electronic scoring systems in shooting sports. The company specializes in high-speed, high-precision measurement equipment for determining projectile position and trajectory and has been equipping the most important international competitions, including the Olympic Games, for decades.

Type of project: 20% literature research, 20% theory, 50% implementation/programming, 10% experiments
Prerequisites: Solid mathematical background, knowledge of SciPy, MATLAB, or a similar toolset, ideally familiarity with (deep) neural networks
Supervisor: Michael Lerjen, Steven Müllener
Professor: Helmut Bölcskei

References:
[1] SIUS Homepage [Link to Document]



Building probability models from scattering networks (SP)

Scattering networks [1,2] offer a powerful approach to feature extraction and summary statistics for stochastic processes, finding successful applications in various practical machine learning tasks, such as image and sound signal classification. More recently, they have also been used to build generative models from a limited amount of data. Specifically, [3] leverages scattering networks to estimate the probability measure of a stationary stochastic process given a single realization.
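As a minimal illustration (my own sketch, not the construction of [1]–[3]; the Morlet-like filters and the choice of scales are ad hoc assumptions), first-order scattering-type summary statistics of a single realization can be computed as follows:

# Minimal sketch: first-order "scattering-type" summary statistics of a
# 1-D signal, i.e., averages of |x * psi_j| over a small dyadic filter bank.
import numpy as np

def morlet(n, scale):
    """Crude Morlet-like band-pass filter at a given dyadic scale (assumption)."""
    t = np.arange(-n // 2, n // 2)
    return np.exp(-0.5 * (t / scale) ** 2) * np.cos(2 * np.pi * t / scale)

def first_order_scattering(x, scales=(2, 4, 8, 16)):
    """Return one summary statistic per scale: mean of |x * psi_j|."""
    feats = []
    for s in scales:
        psi = morlet(len(x), s)
        conv = np.abs(np.convolve(x, psi, mode="same"))  # modulus non-linearity
        feats.append(conv.mean())                        # low-pass averaging
    return np.array(feats)

x = np.random.randn(1024)  # single realization of a stationary process
print(first_order_scattering(x))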

The goal of this project is to study how well the probability measure obtained via the scattering network method [3] approximates the true probability measure. Our focus is on understanding the impact of the network design, particularly the choice of the filters, on the approximation error.

Type of project: 100% theory or 60% theory and 40% programming
Prerequisites: Strong mathematical background
Supervisor: Konstantin Häberle
Professor: Helmut Bölcskei

References:
[1] S. Mallat, “Group invariant scattering,” Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012. [Link to Document]

[2] T. Wiatowski and H. Bölcskei, “A mathematical theory of deep convolutional neural networks for feature extraction,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1845–1866, 2018. [Link to Document]

[3] J. Bruna and S. Mallat, “Multiscale sparse microcanonical models,” Mathematical Statistics and Learning, vol. 1, no. 3, pp. 257–315, 2018. [Link to Document]



Learning cellular automaton transition rules with transformers (MA)

A cellular automaton (CA) is a discrete dynamical system consisting of a regular lattice in one or more dimensions with cell values taken from a finite set. The cells change their states at synchronous discrete time steps based on a transition rule [1]. Despite the simplicity of the CA model, it can exhibit complex global behavior. With suitably chosen transition rules, cellular automata can simulate a plethora of dynamical behaviors [2, 3]. The inverse problem of deducing the transition rule from a given global behavior is extremely difficult [4]. In this project, you will investigate the possibility of training transformers [5] to learn CA transition rules.
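For concreteness, the following toy sketch (illustrative only; rule 110 and the lattice size are arbitrary choices) generates the kind of (configuration, successor) pairs from which a transformer would have to infer the transition rule of an elementary, i.e., one-dimensional binary radius-1, CA:

# One synchronous update of an elementary cellular automaton under a given
# rule number in Wolfram's encoding; used here to generate training data.
import numpy as np

def ca_step(state, rule=110):
    """Apply one update of an elementary CA with periodic boundary conditions."""
    rule_bits = [(rule >> i) & 1 for i in range(8)]  # lookup table for neighborhoods 0..7
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    neighborhood = 4 * left + 2 * state + right      # encode (l, c, r) as an integer 0..7
    return np.array([rule_bits[n] for n in neighborhood], dtype=np.int8)

state = np.random.randint(0, 2, size=32, dtype=np.int8)
trajectory = [state]
for _ in range(16):                                  # consecutive configurations: (input, output) pairs
    trajectory.append(ca_step(trajectory[-1]))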

Type of project: 30% theory, 70% implementation
Prerequisites: Good programming skills, knowledge of machine learning
Supervisor: Yani Zhang
Professor: Helmut Bölcskei

References:
[1] J. Kari, “Theory of cellular automata: A survey,” Theoretical Computer Science, vol. 334, no. 1–3, pp. 3–33, 2005. [Link to Document]

[2] T. Toffoli and N. Margolus, “Cellular automata machines: A new environment for modeling,” MIT Press, 1987. [Link to Document]

[3] A. Adamatzky, “Game of life cellular automata,” vol. 1, Springer, 2010. [Link to Document]

[4] N. Ganguly, B. K. Sikdar, A. Deutsch, G. Canright, and P. P. Chaudhuri, “A survey on cellular automata,” 2003. [Link to Document]

[5] A. Vaswani, et al., “Attention is all you need,” Advances in Neural Information Processing Systems 30, 2017. [Link to Document]



Minimal depth to represent continuous piecewise linear (CPWL) functions by ReLU networks (MA/SP)

A core problem in machine learning is to analyze the expressive power of neural networks. So-called universal approximation theorems [1, 2, 3] show that even with a single hidden layer, one can approximate essentially any (continuous/measurable) function to within arbitrary precision. Nevertheless, it can be advantageous to employ deeper networks, as this can reduce the overall network size in terms of the number of nodes [3, 4]. To get a better quantitative understanding of this phenomenon, it is important to characterize the classes of functions that are exactly representable by neural networks of a certain architecture. Specifically, we will consider the class of continuous piecewise linear (CPWL) functions and ReLU neural networks.

A conjecture regarding this question was recently formulated in [5]. It states that the minimum number of hidden layers needed to represent any 𝑑-dimensional CPWL function is exactly ⌈log₂(𝑑+1)⌉. In this project, you will first write a literature review based on [4, 5, 6, 7] and then attempt to make progress on the conjecture formulated in [5].
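As a warm-up illustrating why logarithmic depth arises (a sketch of a standard construction, not the conjecture itself, which concerns all CPWL functions), the maximum of d numbers is a CPWL function that a ReLU network can compute through a binary tree of pairwise maxima, using the identity max(a, b) = a + relu(b - a):

# Computing max(x1, ..., xd) with a tree of pairwise maxima; each level of
# the tree corresponds to one hidden ReLU layer, giving depth ceil(log2(d)).
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tree_max(x):
    """Repeated pairwise maxima, one 'layer' per tree level."""
    layer = list(x)
    depth = 0
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            a, b = layer[i], layer[i + 1]
            nxt.append(a + relu(b - a))   # = max(a, b), realizable with ReLU units
        if len(layer) % 2 == 1:
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], depth

x = np.random.randn(8)
m, depth = tree_max(x)
print(m, np.max(x), depth)   # depth == ceil(log2(8)) == 3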

Type of project: 100% theory, or 80% theory and 20% programming
Prerequisites: Strong mathematical background
Supervisor: Yang Pan
Professor: Helmut Bölcskei

References:
[1] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989. [Link to Document]

[2] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991. [Link to Document]

[3] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei, "Deep neural network approximation theory," IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2581–2623, May 2021. [Link to Document]

[4] R. Eldan and O. Shamir, "The power of depth for feedforward neural networks," Conference on Learning Theory, pp. 907–940, 2016. [Link to Document]

[5] C. Hertrich, A. Basu, M. Di Summa, and M. Skutella, "Towards lower bounds on the depth of ReLU neural networks," Advances in Neural Information Processing Systems, vol. 34, pp. 3336–3348, 2021. [Link to Document]

[6] C. Haase, C. Hertrich, and G. Loho, "Lower bounds on the depth of integral ReLU neural networks via lattice polytopes," arXiv preprint arXiv:2302.12553, 2023. [Link to Document]

[7] P. Petersen, M. Raslan, and F. Voigtlaender, "Topological properties of the set of functions generated by neural networks of fixed size," Foundations of Computational Mathematics, vol. 21, pp. 375–444, 2021. [Link to Document]



Finite-precision neural networks (MA)

Deep feedforward neural networks with quantized real-valued weights can approximate a wide class of functions in a Kolmogorov-optimal manner [1, 2]. However, these results do not fully explain the success of neural networks in practical applications, where, besides the network weights, the signals in all layers of the network have to be stored in computer memory and hence are of finite precision as well.

The first step of the project is to generalize the theory developed in [1, 2] to neural networks in which both the weights and the signals in all layers are of finite precision, and to establish fundamental limits on function approximation through such networks. Specifically, the new theory should be able to answer the question of how a given overall bit budget for operating the neural network should be distributed across the weights and signals in the network so as to minimize the end-to-end approximation error for a given task. The second major goal of the project is to identify function classes for which approximation through finite-precision neural networks achieves the fundamental limits identified in the first part.
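The following toy experiment (my own sketch; the network size, the uniform quantizer, and the bit budget are illustrative assumptions, not part of the theory in [1, 2]) shows the kind of trade-off in question: a fixed total bit budget is split between weight precision and signal precision, and the resulting end-to-end error is measured:

# Split a total bit budget between weight precision and activation (signal)
# precision in a small random ReLU network and observe the output error.
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, bits, v_max=4.0):
    """Uniform quantizer on [-v_max, v_max] with 2**bits levels."""
    step = 2 * v_max / 2 ** bits
    return np.clip(np.round(v / step) * step, -v_max, v_max)

W = [rng.standard_normal((16, 1)), rng.standard_normal((16, 16)), rng.standard_normal((1, 16))]

def forward(x, w_bits=None, a_bits=None):
    a = x
    for i, Wi in enumerate(W):
        Wq = quantize(Wi, w_bits) if w_bits else Wi   # finite-precision weights
        a = Wq @ a
        if i < len(W) - 1:
            a = np.maximum(a, 0.0)
        if a_bits:
            a = quantize(a, a_bits)                   # finite-precision signal after every layer
    return a

x = rng.standard_normal((1, 256))
ref = forward(x)                                      # full-precision reference
budget = 16                                           # total bits per (weight, signal) pair
for w_bits in range(2, budget - 1):
    a_bits = budget - w_bits
    err = np.max(np.abs(forward(x, w_bits, a_bits) - ref))
    print(f"w_bits={w_bits:2d} a_bits={a_bits:2d} max error={err:.3e}")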

The project is carried out in collaboration with Dr. Van Minh Nguyen in the form of an internship at Huawei Labs in Paris.

Type of project: 70% theory, 30% simulation
Prerequisites: Strong mathematical background and good programming skills
Supervisor: Weigutian Ou
Professor: Helmut Bölcskei

References:
[1] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen, "Optimal approximation with sparsely connected deep neural networks," SIAM Journal on Mathematics of Data Science, vol. 1, no. 1, pp. 8–45, 2019. [Link to Document]

[2] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei, "Deep neural network approximation theory," IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2581–2623, May 2021. [Link to Document]



General signal denoising (MA/SP)

Denoising signals contaminated by Gaussian noise is a long-standing problem in statistics and signal processing. Most results in the literature assume that the true signal lies in a specific subset of Euclidean space and design and analyze denoising algorithms accordingly. Examples include one-dimensional intervals [1], hyperrectangles [2], 𝓁p-balls [3], and unions of manifolds corresponding to low-rank matrices [4].
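As a concrete instance of the setting (a sketch of the classical sparse case, not of any result in [1]–[4]), the following snippet denoises a sparse vector observed in Gaussian noise by soft thresholding at the universal threshold σ√(2 log n):

# Soft-thresholding denoiser for a sparse signal in Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 1024, 20, 0.5

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k) * 5.0  # low-dimensional structure: sparsity
y = x + sigma * rng.standard_normal(n)                             # Gaussian observation model

tau = sigma * np.sqrt(2 * np.log(n))                               # universal threshold
x_hat = np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)              # soft thresholding

print("noisy MSE:   ", np.mean((y - x) ** 2))
print("denoised MSE:", np.mean((x_hat - x) ** 2))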

This project seeks to explore the signal denoising problem under more general assumptions on the data structure, while still imposing low dimensionality in a suitable sense. The first goal of the project is to identify the best possible performance achievable by any denoising algorithm, and the second is to design algorithms that provably attain this performance.

Type of project: 80% theory, 20% simulation
Prerequisites: Strong mathematical background
Supervisor: Weigutian Ou
Professor: Helmut Bölcskei

References:
[1] P. J. Bickel, "Minimax estimation of the mean of a normal distribution when the parameter space is restricted," The Annals of Statistics, 9(6): 1301–1309, 1981. [Link to Document]

[2] D. L. Donoho, R. C. Liu, and B. MacGibbon, "Minimax risk over hyperrectangles, and implications," The Annals of Statistics, 18(3): 1416–1437, 1990. [Link to Document]

[3] D. L. Donoho and I. M. Johnstone, "Minimax risk over 𝓁p-balls for 𝓁q-error," Probability Theory and Related Fields, 99(2):277–303, 1994. [Link to Document]

[4] D. L. Donoho and M. Gavish, "Minimax risk of matrix denoising by singular value thresholding," The Annals of Statistics, 42(6): 2413–2440, 2014. [Link to Document]



The "logic" behind transformers (MA)

The transformer [1], the neural network architecture underlying software such as ChatGPT, can simulate Turing machines [2]. Specifically, this is done by simulating a recurrent neural network (RNN) with a transformer and then using the fact that RNNs can, in turn, simulate Turing machines [3]. This simulation process does not, however, yield a clear picture of which parts of the transformer architecture correspond to the different constituents of the Turing machine.
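For reference, the constituents in question are the machine's state, tape, head position, and transition function; the toy sketch below (an arbitrary unary incrementer, chosen only to fix terminology, not taken from [2] or [3]) makes them explicit:

# Minimal sketch of a Turing machine: state, tape, head, transition function.
def run_tm(tape, transitions, state="q0", head=0, max_steps=100):
    """transitions: (state, symbol) -> (new_state, new_symbol, move in {-1, 0, +1})."""
    tape = dict(enumerate(tape))                     # sparse tape, blank symbol "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        state, new_symbol, move = transitions[(state, symbol)]
        tape[head] = new_symbol                      # write
        head += move                                 # move head
    return "".join(tape[i] for i in sorted(tape))

# Toy rule set: scan right over 1s, write a 1 at the first blank, halt.
increment = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("halt", "1", 0),
}
print(run_tm("111", increment))   # -> "1111"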

In this project, you will familiarize yourself with the literature on links between Turing machines and neural networks. You will then establish direct connections between the transformer architecture and Turing machines, with the goal of better understanding how transformers simulate Turing machines.

Type of project: 100% theory
Prerequisites: Knowledge of neural network theory and theoretical computer science, appetite for theory in general
Supervisor: Valentin Abadie
Professor: Helmut Bölcskei

References:
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, 2017. [Link to Document]

[2] S. Bhattamishra, A. Patel, and N. Goyal, "On the computational power of transformers and its implications in sequence modeling," arXiv preprint arXiv:2006.09286, 2020. [Link to Document]

[3] H. T. Siegelmann and E. D. Sontag, "On the computational power of neural nets," Journal of Computer and System Sciences, vol. 50(1), pp. 132–150, 1995. [Link to Document]



Generating singular distributions through neural networks (MA)

There is a large body of work on neural networks as function approximators, either in the context of regression or classification. In recent years another use for neural networks has emerged, namely for generating high-dimensional probability distributions. It was shown in [1] that the set of probability distributions supported on the 𝑑-dimensional unit cube can be approximated arbitrarily well by push-forwards of the uniform distribution on the unit interval through a set of neural networks of cardinality depending on 𝑑.
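As a hands-on illustration of push-forwards of the uniform distribution producing singular measures (my own example, not the neural-network construction of [1]), the Cantor distribution, which has Hausdorff dimension log 2 / log 3, is obtained by reinterpreting the binary digits of a uniform random variable as ternary digits in {0, 2}:

# Sampling from (a finite-digit approximation of) the Cantor distribution
# as a push-forward of the uniform distribution on [0, 1).
import numpy as np

rng = np.random.default_rng(0)

def cantor_pushforward(u, digits=30):
    """Push uniform samples u in [0, 1) forward to approximate Cantor samples."""
    x = np.zeros_like(u)
    frac = u.copy()
    for i in range(1, digits + 1):
        frac = frac * 2.0
        bit = np.floor(frac)            # i-th binary digit of u
        frac -= bit
        x += 2.0 * bit / 3.0 ** i       # becomes the i-th ternary digit (0 or 2)
    return x

samples = cantor_pushforward(rng.random(100_000))
# Mass concentrates on the middle-thirds Cantor set: no samples in (1/3, 2/3).
print(np.mean((samples > 1 / 3) & (samples < 2 / 3)))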

The goal of this project is to derive results on the generation of singular, e.g., fractal, distributions through a set of neural networks of cardinality depending on the Hausdorff dimension of the probability distribution.

Type of project: 100% theory
Prerequisites: Strong mathematical background, measure theory
Supervisor: Erwin Riegler
Professor: Helmut Bölcskei

References:
[1] D. Perekrestenko, L. Eberhard, and H. Bölcskei, "High-dimensional distribution generation through deep neural networks," Partial Differential Equations and Applications, Springer, vol. 2, no. 64, pp. 1–44, Sep. 2021. [Link to Document]

[2] Y. Yang, L. Zhen, and Y. Wang, "On the capacity of deep generative networks for approximating distributions," Neural Networks, vol. 145, no. C, pp. 144–154, Jan. 2022. [Link to Document]

[3] K. Falconer, "Fractal geometry: Mathematical foundations and applications," Wiley, 2nd ed., 2003. [Link to Document]



Why can't transformers learn maths? (MA/SP)

The transformer [1], the neural network architecture underlying software such as ChatGPT, can provably simulate Turing machines [2]. However, even state-of-the-art architectures appear unable to perform simple mathematical operations, such as addition and multiplication. Recently, techniques for improving the performance of transformers on such tasks by changing the way the data is represented were reported in [3]. Moreover, investigations into which kinds of heuristics transformers actually learn have been carried out in [4].

In this project, you will study which part of the learning process hinders the realization of simple mathematical operations. Specifically, you will implement the transformer architecture that simulates Turing machines in [2] and investigate whether this architecture can be reached through a learning algorithm.
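As a starting point for the data pipeline (illustrative only; the exact input/output encodings studied in [3] should be looked up and compared), the sketch below generates addition examples with plain and digit-reversed targets, the latter being a commonly studied way to ease carry propagation for autoregressive models:

# Generate (prompt, target) pairs for the addition task, optionally with
# the target digits reversed (least-significant digit first).
import random

def make_example(max_digits=3, reverse_target=False):
    a = random.randint(0, 10 ** max_digits - 1)
    b = random.randint(0, 10 ** max_digits - 1)
    prompt = f"{a}+{b}="
    target = str(a + b)
    if reverse_target:
        target = target[::-1]          # least-significant digit first
    return prompt, target

random.seed(0)
for _ in range(3):
    print(make_example(reverse_target=True))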

Type of project: 100% simulation
Prerequisites: Knowledge of neural network theory and good programming skills
Supervisor: Valentin Abadie
Professor: Helmut Bölcskei

References:
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, 2017. [Link to Document]

[2] S. Bhattamishra, A. Patel, and N. Goyal, "On the computational power of transformers and its implications in sequence modeling," arXiv preprint arXiv:2006.09286, 2020. [Link to Document]

[3] R. Nogueira, Z. Jiang, and J. Lin, "Investigating the limitations of transformers with simple arithmetic tasks," arXiv preprint arXiv:2102.13019, 2021. [Link to Document]

[4] F. Charton, "Can transformers learn the greatest common divisor?," arXiv preprint arXiv:2308.15594, 2023. [Link to Document]