Week 5: Optimization!

Imposter Syndrome has hit me particularly hard this week. I feel defeated and stupid whenever I look away from the code I'm writing for a few seconds only to come back completely lost. However, this has also fueled my urge to invest more time and energy into learning than ever before. Being ignorant is frustrating, but not being able to understand one's own ignorance is even worse.

Machine Learning:

As I continue to work on my small Scikit-learn project, I've realized that many of the techniques and implementations I've been taught have already been packaged up by library developers, often into a single line of code. For example, an advanced optimization technique, the Adam algorithm, can be set up in TensorFlow simply as:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

Explanation
- optimizer is just the variable name; tf is TensorFlow, keras is its high-level API, optimizers is the module of built-in optimizers, and Adam is the optimizer class being created. The value 0.001 is the initial learning rate; from there, Adam automatically adapts the effective step size as training goes on, in effect giving different parameters different learning rates.
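
To see where that one line fits, here's a minimal sketch of plugging the optimizer into a toy Keras model (the model and its shapes are made up for illustration, not my project code):

# Minimal sketch: where the Adam optimizer object actually gets used (toy model).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),                  # 10 made-up input features
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10) would then train using Adam's adaptive steps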

I've also learned about convolutional layers and neural network design, where not all units in a hidden layer process the same input. Although I'm not yet sure when this technique is best used, it's an interesting concept nonetheless, and I expect to see this design pattern frequently in advanced neural networks. As shown in the picture below, 100 initial inputs go into the first layer of the neural network, which has 9 units, but each unit in layer 2 takes in a different vector of the activations output by layer 1.
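
To make the idea concrete, here's a minimal sketch of a convolutional layer in Keras (the layer sizes and shapes are made up, not the exact network from the picture), where each unit only sees a small window of the previous layer's output rather than all of it:

# Minimal sketch: a 1D convolutional layer whose units each look at a small
# window of the input instead of every value (shapes are made up for illustration).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 1)),                          # 100 input values
    tf.keras.layers.Conv1D(filters=9, kernel_size=5, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()   # shows how each layer's output feeds the next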

This is also connected to the concept and implementation of cross-validation. Essentially, you split the dataset into a larger training portion and a smaller test portion (a 70-30 or 80-20 split is common) so that the model is trained on one part and evaluated on data it has never seen. That keeps the evaluation unbiased and exposes overfitting, leading to a more generalized model that is more reliable for making actual predictions.
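
In scikit-learn both the split and the evaluation are only a couple of lines; here's a minimal sketch using generated placeholder data rather than my project's dataset:

# Minimal sketch of a train/test split and k-fold cross-validation in scikit-learn.
# X and y here are generated placeholders, not my project's actual data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 80-20 split: train on the larger portion, evaluate on data the model never saw
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation: every example is used for both training and validation
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cross-validation accuracy:", scores.mean())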

On a more complicated subject, I've also learned the math behind backpropagation: finding the symbolic derivatives that determine how the weight w and the bias b affect the cost J(w,b). All the topics I covered this week are methods for optimizing models.
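
As a toy illustration of that idea (a single-feature linear model rather than a full network, and my own example rather than the course's), the derivatives of the squared-error cost J(w,b) with respect to w and b can be computed and used for gradient descent like this:

# Toy illustration: how w and b affect the cost J(w,b) for f(x) = w*x + b,
# using the squared-error cost J(w,b) = (1/(2m)) * sum((f(x_i) - y_i)^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])          # true relationship is y = 2x
w, b = 0.5, 0.0
alpha = 0.1                            # learning rate
m = len(x)

for step in range(1000):
    error = (w * x + b) - y            # f(x_i) - y_i for every example
    dj_dw = (error * x).mean()         # dJ/dw = (1/m) * sum((f(x_i) - y_i) * x_i)
    dj_db = error.mean()               # dJ/db = (1/m) * sum(f(x_i) - y_i)
    w -= alpha * dj_dw                 # step both parameters downhill
    b -= alpha * dj_db

print(w, b)                            # ends up close to w = 2, b = 0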

Additionally, I've run some experiments on my Scikit-learn project to optimize its predictions. I haven't achieved success yet, but I have plenty of ideas for what to try next. Feel free to take a look if the topic interests you. It's also perfect timing that the ML course introduced me to methods and advice for evaluating learning algorithms and modifying them, or the data preprocessing, to improve model performance, just as I'm struggling with this project.

Methods I have in mind (a quick sketch of ideas 3 and 4 follows the list):

  1. use more training examples

  2. cross-validation technique

  3. feature engineering (polynomial features like x1x2, x1**2, x2**2)

  4. adjust regularization penalty (lambda)
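
As a sketch of ideas 3 and 4 (with made-up data standing in for my project's features), polynomial feature engineering and an adjustable regularization penalty can be combined in a scikit-learn pipeline; Ridge's alpha plays the role of lambda here:

# Sketch of ideas 3 and 4: polynomial features plus a tunable regularization
# penalty. The data is generated for illustration, not my project's dataset.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                            # two made-up features x1, x2
y = X[:, 0] * X[:, 1] + X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

for alpha in (0.01, 0.1, 1.0, 10.0):                     # try several penalty strengths
    model = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),  # adds x1x2, x1**2, x2**2
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"alpha={alpha}: cross-validated R^2 = {score:.3f}")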

Google Data Analytics:

I'm accelerating my learning in course one because I've realized it's the general introductory course of the program, which explains why I find the content lackluster and shallow. As of this moment, I'm at week 4 of course one. There's nothing substantial enough to discuss yet, but I expect to start course two this coming week.

CS50:

I've revisited concepts such as arrays, singly-linked and doubly-linked lists, and hash tables. I'm becoming more comfortable with pointers and the dereference symbol *, though it can still be confusing to decide when a pointer is or isn't the right choice in code. I'm currently working on the speller problem set, which involves writing a hash function and an implementation of a hash table. A little showcase here:

bool load(const char *dictionary)
{
    //1. open dictionary file
    FILE *file = fopen(dictionary, "r");
    if (file == NULL)
    {
        printf("Could not load %s.\n", dictionary);
        return false;
    }
    //2. read strings from file one at a time, +1 for the \0
    char lexis[LENGTH + 1];
    // iterate through the file
    while (fscanf(file, "%s", lexis) != EOF)
    {
        //3. create a new node for each word
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            // close the file before bailing out so we don't leak the handle
            fclose(file);
            return false;
        }
        strcpy(n->word, lexis);
        //4. hash word to obtain a hash value
        unsigned int hashcode = hash(lexis);
        //5. insert node into hash table at that location
        n->next = table[hashcode];
        //point the table entry at n so n becomes the new head of that list
        table[hashcode] = n;
    }
    fclose(file);
    return true;
}

A hash table is essentially an array in which each element is the head of a linked list. The picture above shows a simplified hash table called table containing 5 elements with indexes 0 through 4; in this example, a list has already been started at index 3. Let's continue by looking at how the table grows as we add more nodes to that list.

When a new node n is created, its next pointer is still unassigned, so the first step is to point it at the current head of the list, in this case the node holding lexis (left picture). After that, n has to become the new head of the list, so table[3] is updated to point at n (right picture). The procedure repeats itself as more nodes are added to the list, so the entire process can be iterated in a loop; in my particular case, a while loop reads every single word of the dictionary file until the end of the file (EOF) is reached.
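
For completeness, the node struct and the table array that all of this relies on look roughly like the following, based on the CS50 speller template (N is the number of buckets and LENGTH is the maximum word length, both defined elsewhere):

// The building blocks behind the table, as laid out in the CS50 speller template:
// each bucket is a pointer to the head of a singly-linked list of nodes.
typedef struct node
{
    char word[LENGTH + 1];   // the word itself, +1 for the terminating \0
    struct node *next;       // pointer to the next node in this bucket's list
}
node;

// the hash table itself: N buckets, each starting out as an empty (NULL) list
node *table[N];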

Prospects of my study:

As I mentioned in the monthly update, I've started brainstorming my first big project, which will involve the use of a pre-trained GPT model. I'm going to allocate some time this week to map it out and outline the necessary tools and skills. Hopefully, in 3 to 6 months this can become a reality.

Meanwhile, I'm starting my new job tomorrow, so there's a lot of uncertainty in terms of scheduling. My goal is to maintain the 40-55 productive hours per week that I've only just started to adapt to.

And lastly, I want to talk about how I view this endeavor of breaking into the tech industry. As far as I'm concerned, web development is often the easiest entry point, but my interests lie in machine learning and game development. Picking that easier path would mean picking up tools like JavaScript and React. I want to build my portfolio carefully, as a niche and specialized skill stack, which is why I want to focus on Python, C and C++, with a potential foray into back-end development using Rust or Go. I'm also considering learning Vim as an alternative to VSCode and exploring courses on testing, scripting, AWS and agile management after completing my current courses.