Reference Material:
Livestreams
Day 0: Getting the Python environment set up and going over some basics to prepare for Day 1.
Stream on X
Set Up Python Environments with MLX
- Install Conda - https://docs.conda.io/en/latest/miniconda.html
- Choose your IDE - after going through this stream I want to use PyCharm from JetBrains, unless I can figure out how to configure this with Cursor.sh
- Make a project in your workspace, e.g.
~/workspace/GPTfromScratch
- From the terminal window in the IDE, run:
conda create -n GPTfromScratch python=3.10
conda activate GPTfromScratch
- Install MLX
conda install -c conda-forge mlx
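- Verify the install - a minimal sanity check, assuming MLX installed correctly on an Apple Silicon Mac (the array values here are arbitrary):
import mlx.core as mx

# Multiply two small arrays and sum the result; printing forces evaluation.
# If this shows an array containing 32, MLX is working.
a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])
print(mx.sum(a * b))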
Preparing Data - LLM Tokenization Explained
Breaking down the following statement for a 5th grader.
The first step to training an LLM is collecting a large corpus of text data and then tokenizing it. Tokenization is the process of mapping text to integers, which can be fed into the LLM.
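A minimal sketch of that mapping, assuming the simplest possible scheme (character-level tokenization on a made-up toy corpus; real LLMs usually use subword tokenizers such as BPE, but the idea is the same):
# Build a vocabulary of the unique characters in a tiny corpus
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> character

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

tokens = encode("hello")
print(tokens)          # [3, 2, 4, 4, 5]
print(decode(tokens))  # hello
The training data is just long lists of these integers; the model never sees raw text.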
To Do
Figure out how to manage Python environments using https://mise.jdx.dev