The Code Path: a seeker's guide
1 Introduction
This document (The Code Path) is meant as a guide for seekers of deep Computer Science and Engineering wisdom. If you are an intelligent entity with an inner desire to explore the infinite depths of the computing world to gain clear understanding and mastery of complex programmable systems, this is for you.
Throughout this document, you will be introduced to different Computer Science and Engineering concepts/skills through a set of project recommendations that will help you explore and grasp the basics, as well as develop sharp skills across multiple fields of the CS&E discipline.
Keep in mind that this is only one instance of The Code Path. Many other projects could be added, and the ones listed can be expanded far beyond what is described or proposed here.
Enjoy!
2 Words of encouragement
The Path is long and winding, so:
- Keep your heart joyful and humble.
- Keep your mind focused and your breath deep.
- Invest your time wisely.
- Be slow to anger and judgement, prefer kindness.
- Always RTFM!
- Walk the Path in silence, complaining doesn't solve problems.
- You have nothing to prove, you are just having fun and exploring.
- "The difference between the novice and the master is that the master has failed more times than the novice has tried.", Koro-sensei.
- Take good care of your body. Sitting for hours in front of computer screens isn't a healthy habit. Sleep well, eat well, exercise, and breathe deeply. Health matters more than skills.
3 First steps
Install a Linux distro and master its inner workings. I recommend Arch Linux for the adventurous, and Linux Mint for the gentle beings. You can also try playing around with other distributions in a virtualization environment (QEMU, VirtualBox, Xen…).
Put some money aside and, when you can, acquire a microcontroller board (Arduino, Teensy, …), an SBC (Raspberry Pi, Pine64, …), and an electronics kit (with a breadboard, LEDs, resistors, capacitors, tactile switches, …) so you can practice designing and implementing basic circuits, as well as writing low-level code (firmware, interrupt handlers, drivers, …).
4 The Code Path levels
The levels presented here are based on my own experience and are totally subjective. That said, some projects may seem misplaced with regard to their difficulty level; they are not. Hopefully, the more you dig, the more you'll realize that I was right and that this document is the product of thoughtful and deep contemplation. In the end, whether I am right or wrong about the difficulty level of a project does not really matter. Rather, the key questions you should ask yourself are: have you done the work before allowing yourself to form an opinion, and how deep did you go?
4.1 Level 0 projects: novice
- Write a C program that returns the sum of the values of an integer array.
- Write a C program that returns the sum of the odd indexed values and the sum of the even indexed values.
- Write a C program that reads a string and that returns the longest word and its length.
- Write a C program that reads a text file and returns the frequency of occurrence of its ASCII characters.
- Write a C program that returns whether a given number is prime or not.
- Write a C program that returns an approximation of Phi using the Fibonacci series.
- Write a C program that reads a set of integer values from a file and inserts them into a binary tree. The program must generate a dot formatted file representing the binary tree.
- Write a C program that looks up a value within the binary tree described above and that generates a dot formatted file with the node representing the given value colored. If the value is not found, the program must generate an uncolored binary tree.
- Write a C program that reads a parenthesized expression and checks whether the parentheses are placed properly.
- Write a C program that builds the Abstract Syntax Tree of a mathematical expression.
- Write a C program that generates an image representing the Mandelbrot set in the ppm or png format.
4.2 Level 1 projects: intermediate
- Implement Huffman compression in C. The program must take a file as input and generate a compressed or decompressed version.
- Extend the Huffman compression program to handle directories and file permissions.
- Implement MD5, SHA1, SHA256, SHA512, BLAKE2, scrypt, yescrypt, SipHash, and Argon2 hashing algorithms in C. The program must take a file as input and produce the correct hash. The implementations must be constant-time and secure against timing attacks.
- Implement, in C and using OpenSSL or libsodium, a secure (with peer authentication and full encryption) peer-to-peer client/server application for chatting and file transfers. The program must use a blockchain to ensure the authenticity and coherence of the logs and messages.
- Implement an image processing library in C with blurring, color conversion, dithering, edge detection, …
- Implement a Linear Algebra library in C with vector-vector, vector-matrix, and matrix-matrix operations (BLAS1, BLAS2, and BLAS3 levels).
- Implement a Convolutional Neural Network library in C for object recognition in images and videos (see YOLO).
- Implement a Neural Network in C that can play Snake.
- Implement a 2D side-scroller game in C using SDL or SFML.
- Implement a 2D asteroids game in C using SDL or SFML.
- Implement fluid-dynamics and gas-diffusion simulations in C using SDL or SFML.
- Analyze and optimize the performance of all the previous codes for x86_64 and/or aarch64 architectures. You must perform hand-made performance analysis using clock_gettime or RDTSC, as well as profilers such as Linux perf, MAQAO, Intel VTune, or LIKWID. You must produce a report presenting a performance comparison between the different versions with plots and stable measurements.
- Implement the following ciphers with AEAD in C: Blowfish, Twofish, Threefish, AES, ChaCha20, Salsa20, Serpent, and Ascon. The program must take a file and an encryption key as inputs and produce a correctly encrypted or decrypted file. The implementations must be secure against known attacks.
- Implement secure and functional versions of RSA and Ed25519.
- Implement a steganography tool that can hide any file inside png and ppm images. The tool must allow for the encryption and compression of the file before insertion.
4.3 Level 2 projects: advanced
- Design a scripting language and implement an interpreter.
- Extend the scripting language to handle cryptographic primitives, vectors, matrices, and linear algebra operations.
- Design a performance benchmarking tool for x86_64 and aarch64 architectures with frequency evaluation per CPU core, support for NUMA domains, cache levels, …
- Design and implement a secure backup and archiving tool for Linux using a client/server architecture with a central node and the possibility of configuring a secondary/fallback node. The server app must run as a daemon that receives data from the clients. The clients back up files and directories periodically (the period is configurable per file: seconds, minutes, hours, days, …) and can be configured to send data either compressed or uncompressed.
- Design and implement LLVM and GCC instrumentation plugins that retrieve the memory trace of target loops. After the target program is executed, it must produce a file containing the trace of all the memory accesses occurring within one or multiple specified loops.
- Design and implement a performance oriented cache simulator for loops that uses the previously generated traces. The simulator must be configurable and able to mimic the behavior of an x86_64 and/or aarch64 cache hierarchy, as well as custom cache models.
- Implement fully functional, secure, and optimized versions of RSA and Ed25519.
4.4 Level 3 projects: initiate
- Implement a dictionary based parallel (CPU, GPU, …) and distributed hash cracking tool.
- Create your own Linux distro based on busybox.
- Create an OpenBSD distro.
- Design an Operating System for x86_64 and/or aarch64 that only runs a set of benchmarks to evaluate the raw performance of a target machine.
- Implement HTTPS and SFTP servers in C.
- Design and implement a secure password manager for Linux.
- Design a minable cryptocurrency and implement the tools to build an infrastructure.
- Design and implement an EDR for Linux using eBPF.
- Design and implement a virus scanner for Linux binaries.
4.5 Level 4 projects: monk
- Design a procedural programming language and implement a compiler that generates x86_64 and/or aarch64 assembly.
- Design a CPU architecture (ISA) and implement a simulator/VM along with binary analysis tools: assembler, disassembler, hex editor, debugger, profiler, and patcher. The disassembler must generate reassemblable code.
- Upgrade the compiler to generate your CPU architecture assembly code.
- Design and implement a fully functional Operating System using your programming language for your target CPU.
- Implement the CPU architecture on an FPGA or any microcontroller (ATmega328, ARM Cortex-M4, …) and port your Operating System as well as your development tools on it.
- Design and implement a SCADA system (hardware and software) managing a pump, a motor, some sensors, or LEDs.
- Design and implement an encrypted and obfuscated Linux malware that targets the SCADA system. The aim of the malware is to influence the field components (pump, motor, …) while feeding the monitoring process wrong metrics. See Stuxnet.
- Design and implement a performance profiling tool for Linux that combines static and dynamic analyses with hardware counters to build a performance profile of a target application.
4.6 Level 5: guru
- Find 0-day vulnerabilities in any software or hardware product.
- Design an obfuscated and covert botnet infrastructure with fault tolerance, internal routing for data exfiltration, spyware and ransomware capabilities, …
- Design an electronic USB device and implement a Linux kernel driver. For example, a true random number generator that shows up as a device in /dev, a handheld password manager, or an encrypted drive.
4.7 Level 6: master
- Design and implement an SBC based on an x86, ARM, or RISC-V chip. You can also design and tape out your own CPU chip and build a board in China for around $25,000 for 1,000 45 nm chips.
- Design and implement, or port, an OS on your SBC.
5 References
5.1 Books
- Introduction to Algorithms, T. H. Cormen & co
- The C Programming Language, B. Kernighan & D. Ritchie
- Handbook of Mathematics, I. N. Bronshtein & co
- Operating Systems, A. Tanenbaum
- The C++ Programming Language, B. Stroustrup
- Compilers: Principles, Techniques, and Tools, A. Aho & co
- Concrete Mathematics, D. Knuth
- The Art of Unix Programming, E. S. Raymond
- Hacker’s Delight, H. S. Warren
- Graph Theory, K. Ruohonen
- Bash Guide for Beginners, M. Garrels
- Think Python: How to Think Like a Computer Scientist, A. Downey
- The R Book, M. J. Crawley
- Handbook of Applied Cryptography, A. J. Menezes & co
- The Art of Computer Systems Performance Analysis, R. Jain
- Measuring Computer Performance, D. Lilja
- Hacking the Xbox (https://bunniefoo.com/nostarch/HackingTheXbox_Free.pdf)
- Computer Architecture: A Quantitative Approach, J. Hennessy & D. Patterson
- Practical Electronics for Inventors, P. Scherz & S. Monk
- Real-world Cryptography, D. Wong
- Algorithmic Cryptanalysis, A. Joux
5.2 Articles
- What every computer scientist should know about floating-point arithmetic, D. Goldberg
- What every programmer should know about memory, U. Drepper
- Parallel Algorithms, G. E. Blelloch & co
- PThreads Primer: A Guide to Multi-threaded Programming, B. Lewis & co