Week 4: C & Advanced Learning Algorithms

This week, I've taken my self-discipline and dedication to new heights, clocking in 49 hours of study (30% of which was spent on Japanese) on top of a regular 40-hour work week. Surprisingly, I feel more energized than ever. I'll continue the hard work, and as mentioned in last week's update, the current goal is 40-55 hours a week. Oh, and I managed to secure a new job (another blue-collar job), so a big congratulations to me! But let's dive into what I've accomplished this week, beginning with the least impressive item on my list.

Google Data Analytics:

I anticipated more from the course, given that it's taught by Google. It could be because I'm still in the early stages of the program (week 2 of course 1 of 8), but so far I haven't encountered any truly insightful or practically useful content, apart from wordy semantic distinctions. For example: "Data analysis is not the same as data analytics; the former uses data to address existing problems, while the latter seeks to understand the unknown through modeling." This approach doesn't quite align with my preference for practical learning. That being said, I still have hope that this will prove to be a valuable study.

Here's a simplified workflow for machine learning scientists or developers, whichever you prefer to call us. What's intriguing is that each component can be optimized, and often you'll find that one aspect is more crucial than another. For instance, I could spend countless nights perfecting the model itself, making it so intricate and accurate that it delivers 99.9% successful predictions, yet it would still produce poor results if fed low-quality data. Therefore, my goal with the Google Data Analytics program is to develop a solid understanding of data's value and become skilled at asking the right questions to optimize the input.

Advanced Learning Algorithms

This week, I delved deeper into the activation functions employed across the layers of a neural network. The example below shows a neural network implementation in TensorFlow. You can disregard the rest; pay attention to the three Dense() calls in the middle. Each one specifies an activation function, which is the function applied by every unit within that particular layer.
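To give a sense of what that looks like, here's a minimal sketch; the layer sizes (25, 15, 1) are placeholders of mine, not the exact ones from the course.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(units=25, activation='relu'),     # layer 1: every unit in this layer uses ReLU
    Dense(units=15, activation='relu'),     # layer 2: every unit in this layer uses ReLU
    Dense(units=1, activation='sigmoid'),   # layer 3 (output): sigmoid for a binary yes/no answer
])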

The final layer, the 3rd layer of this neural network, is often referred to as the output layer. Typically, sticking with the linear activation throughout the network is discouraged, as it essentially collapses the entire neural network into a vectorized linear regression model. This is somewhat ironic, considering we call this algorithm a neural network, inspired by the human nervous system, yet a minor alteration reverts it to basic mathematics. Does this imply that our brains have always revolved around math and logic? Certainly not, but it does highlight the misalignment between the neural network's inspiration and its actual functionality. While not necessarily mirroring how the brain works, neural networks remain incredibly useful and deliver robust performance. I'm sure there's more to this; at the moment, the three main activation functions I've been introduced to are the linear function, the sigmoid function and the ReLU function.
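For reference, here's roughly what those three functions compute, sketched in plain NumPy:

import numpy as np

def linear(z):    # a = z; no transformation at all
    return z

def sigmoid(z):   # squashes z into (0, 1), handy for binary classification
    return 1 / (1 + np.exp(-z))

def relu(z):      # keeps positive values, zeroes out the negatives
    return np.maximum(0, z)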

I've also explored softmax regression as an alternative output-layer activation for solving multi-class classification problems. For instance, when given a set of handwritten digits from 0 to 9, the machine can determine which class (or number) each digit belongs to. This also clarifies why the output layer in the first image of this section used a linear activation. It's more numerically accurate to fold the activation directly into the loss function in the TensorFlow implementation, rather than computing and storing the output of layer 2 in a variable (for example, a2, where a stands for activation) as an intermediate term. This approach helps minimize roundoff errors.
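As a rough sketch of that idea (the hidden-layer sizes are again placeholders of mine), the output layer stays linear and the softmax is folded into the loss via from_logits=True:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(10, activation='linear'),  # 10 raw logits, one per digit 0-9; no softmax here
])
# from_logits=True tells the loss to apply softmax internally, which skips the
# intermediate a2 term and the roundoff error that comes with it
model.compile(loss=SparseCategoricalCrossentropy(from_logits=True), optimizer='adam')

# after training, probabilities are recovered by applying softmax to the logits,
# e.g. tf.nn.softmax(model(X_new))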

Another interesting thing I've learned is that there's a different type of classification problem called multi-label classification. The difference is that each sample, instead of being assigned to exactly one class (class 1, class 2, class 3, and so on), is given a set of "labels"; the labels can overlap within a single sample, so they aren't mutually exclusive even though they are distinct. For example:

Each object captured by a car's camera becomes an input (x), and through this multi-label classification model, the input is assigned a probability for each label, such as car, bus, or pedestrian. Depending on the parameters, an input can belong to multiple classes or labels simultaneously, as opposed to being assigned a single fixed class as in multi-class classification.
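A minimal sketch of the idea (everything except the car/bus/pedestrian labels is assumed by me): the output layer uses one sigmoid unit per label instead of a single softmax over mutually exclusive classes.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(3, activation='sigmoid'),  # 3 independent probabilities: car, bus, pedestrian
])
# binary cross-entropy treats each label as its own yes/no question, so one
# input image can come out "yes" for several labels at the same time
model.compile(loss=BinaryCrossentropy(), optimizer='adam')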

Sklearn on Kaggle:

Building on last week's initial effort, I've plotted the prediction versus the actual output y:
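Roughly how that plot can be produced, assuming model is the fitted scikit-learn estimator from last week and X and y are the features and the actual scores:

import matplotlib.pyplot as plt

y_pred = model.predict(X)

plt.scatter(y, y_pred)
# reference line where the prediction equals the actual value, i.e. a perfect fit
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red')
plt.xlabel('Actual score')
plt.ylabel('Predicted score')
plt.title('Predicted vs actual happiness score')
plt.show()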

Ideally, the dots would fall on a straight line where every prediction equals its actual value y[j], which would be a perfect-fit model. But it's unrealistic, and impractical anyway, to create one, because it would most likely be overfitting. Shown below is each input feature x plotted against y, to see whether there's any nonlinearity in the relationship.

# by DataFrame convention, X.shape is (m, n), where m = the number of samples and n = the number of input features
n = X.shape[1] #which should be 6, but I prefer not to hard code it:)
n_cols = 2
n_rows = round(n / n_cols)

# this is just to make an easy-to-read visualization
plt.figure(figsize=(10, 12))

for index, feature in enumerate(X.columns):
    plt.subplot(n_rows, n_cols, index + 1)
    plt.scatter(X[feature], y)
    plt.xlabel(feature)
    plt.ylabel('Score')
    plt.title(f'Score vs {feature}')

# tight_layout() automates optimal spacing for my subplot
plt.tight_layout()
plt.show()

Now, without digging too deep into reasoning, just by observation I notice a couple of interesting phenomena.

  1. Social support seems to have a strong correlation with the happiness score, alongside GDP per capita and life expectancy.

  2. Freedom to make choices is all over the place: it seems to be a driving factor among the countries with higher happiness scores, but is much less obvious in the lower-scoring countries.

  3. Generosity and perceptions of corruption don't seem to form a linear relationship with the happiness score. Generosity in particular stays in the low-to-moderate range and is spread quite evenly across the board.

  4. Perceptions of corruption are fairly low for the vast majority of countries with happiness scores ranging from 3 to 7. A small cluster of countries score higher in perceptions of corruption and have particularly high happiness scores (above 7).

My work here is far from over. I'll keep working on this project until I feel I've sufficiently explored scikit-learn. Additionally, I have an intriguing and larger-scale Neural Network project in mind that I'd like to develop using TensorFlow.

CS50:

I've gotten to the point where I feel more skilled coding in C than in Python, which is something I had never really given much thought to before. But after weeks of course assignments in C, and in particular recover.c from week 4, I've finally started to realize the power of a lower-level language.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// given that this is a FAT filesystem, each block is 512 bytes
typedef uint8_t BYTE;
const int BLOCK_SIZE = 512;
int isJPEG(BYTE buffer[]);

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./recover IMAGE\n");
        return 1;
    }
    //If image cannot be opened for reading, inform the user and main should return 1.
    FILE *f = fopen(argv[1], "r");
    if (f == NULL)
    {
        printf("Could not be opened for reading.\n");
        return 1;
    }

    // buffer stores the 512 bytes of data you are currently looping through
    BYTE buffer[BLOCK_SIZE];
    // counter keeps track of file number, should not exceed 50.
    int counter = 0;
    // allocate memory for new file name
    char *filename = (char *)malloc(8 * sizeof(char));
    FILE *img;

    // now we loop through the memory card 
    while (fread(buffer, BLOCK_SIZE, 1, f) == 1)
    {
        // found a JPEG header
        if (isJPEG(buffer) == 1)
        {
            // when the found JPEG is not the first one, close the previous one
            if (counter != 0)
            {
                fclose(img);
            }
            // naming convention for the new file, then increment counter
            sprintf(filename, "%03i.jpg", counter++);
            // open and write into new file
            img = fopen(filename, "w");
            fwrite(buffer, BLOCK_SIZE, 1, img);
        }
        else if (counter > 0)
        {
            // still inside the current JPEG, so keep writing its blocks
            fwrite(buffer, BLOCK_SIZE, 1, img);
        }
    }
    if (counter > 0)
    {
        fclose(img);
    }
    fclose(f);
    free(filename);
}

// verify header of block to see if it is a JPEG file
int isJPEG(BYTE buffer[])
{
    if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0)
    {
        return 1;
    }
    return 0;
}

If programming feels like searching for a parking lot on Google Maps, this exercise made me feel as though I not only found the lot, but also pinpointed the exact parking spaces, using and freeing them with precision. Valgrind is incredibly reliable, pinpointing memory leaks and identifying issues with ease. All of this happens at a micro level, in territory that's typically inaccessible through the abstractions provided by high-level languages like Python and JavaScript. Although I'd still prefer the cleanliness and simplicity of Python, I've gained a newfound appreciation for C's effectiveness in memory optimization and allocation. It reminds me of a podcast where Elon Musk talked about the memory optimization his team did for Tesla's Autopilot:

Functional Programming:

I was listening to another podcast and stumbled upon a programming paradigm called functional programming. Immediately, I thought, "Oh, it's something different from OOP." I think of them as water versus rock (there's a small side-by-side sketch after the list below).

  • Functional programming emphasizes immutability, purity, and the use of functions as first-class citizens. Just as water takes the shape of its container and flows, functional programming tends to be flexible and adaptable and can be seen as more fluid and transparent, making it easier to reason about and understand the flow of data.

  • Object-oriented programming is based on the concept of objects, which are instances of classes that encapsulate data and behavior. Just as rocks are solid and form the foundation for many structures, OOP promotes modularity and reusability through inheritance and polymorphism, allowing for a more organized and hierarchical approach to software design.
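To make the comparison a bit more concrete, here's a toy contrast in Python (the scoring example is entirely made up):

# functional flavour: a pure function with no mutation; the input list is never changed
def boosted_total(scores):
    return sum(score * 1.1 for score in scores)

# object-oriented flavour: state lives inside the object and is modified through its methods
class ScoreBook:
    def __init__(self):
        self.scores = []

    def add(self, score):
        self.scores.append(score * 1.1)

    def total(self):
        return sum(self.scores)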

In my opinion, they simply represent different programming philosophies. I aim to analyze them critically to avoid potential pitfalls in the future. Functional programming may lead to complex representation and management as a project expands, because its structure is less apparent. Conversely, OOP addresses this issue through inheritance, but it can become rigid and challenging to maintain or modify. Ultimately, a codebase may look like a murky mess to the untrained eye as developers adopt different approaches to solve different tasks, but I believe a skilled programmer can identify the solid foundation and the fluid elements within the chaos, uncovering the core principles that guide any project. Or maybe in five years I'll come back, look at what I just said, and call my old self a naive idealist.

That's all for this week! The next blog post will probably be my first monthly review – I can hardly believe it's been a month since I began documenting my learning journey. All I can say is, this has been the best decision I've ever made.