Getting Started

A non-technical tour of data compression, entropy, encryption, Kolmogorov complexity, and why they matter

A zoomed-in view of random pixels. Image by the author.

Introduction

This article will look at information theory with the goal of developing useful intuition for some of its core concepts and their applications. It will stay away from technical details, but references for further reading are linked at the bottom. To begin, let’s consider a simple question:

Can any file be compressed?

The answer to this question is profound. Rather than just stating the answer, let’s see if we can reason our way to the solution.

  1. Suppose any file can be compressed, resulting in a smaller file.
  2. Therefore, the compressed output could itself always be compressed again.
  3. If that’s true, then any…
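Two quick checks make the question concrete. Assuming a "file" is just a string of bits, the first counts how many files exist at each length. The second runs Python's built-in `zlib` (DEFLATE) compressor on a repetitive input and on random bytes. Both are illustrative sketches, and the inputs are arbitrary.

```python
import os
import zlib


def count_bitstrings(n: int) -> int:
    """Number of distinct files that are exactly n bits long."""
    return 2 ** n


def count_shorter_bitstrings(n: int) -> int:
    """Number of distinct files strictly shorter than n bits (lengths 0 .. n-1)."""
    return sum(2 ** k for k in range(n))  # geometric series, equal to 2**n - 1


# Pigeonhole check: for every length there are fewer shorter files than n-bit files.
for n in range(1, 9):
    print(n, count_bitstrings(n), count_shorter_bitstrings(n))

# Empirical check with DEFLATE: structured data shrinks a lot, random data does not.
structured = b"the quick brown fox " * 500   # 10,000 bytes with obvious repetition
random_data = os.urandom(10_000)             # 10,000 bytes with no structure to exploit

print("structured:", len(structured), "->", len(zlib.compress(structured, 9)))
print("random:    ", len(random_data), "->", len(zlib.compress(random_data, 9)))
```

There are 2ⁿ files of exactly n bits but only 2ⁿ − 1 files that are shorter. So a lossless compressor cannot map every n-bit file to a distinct shorter one.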


A Reflection on Mixed Identity

Tokyo, circa 1999 (Family photograph)

For me, the sentence “I’m Asian and American” is true in the simplest sense: I am from Asia and I am also from America. Despite being Asian and American, I am not Asian-American.

The phrase “Asian-American” may bring to mind the image of families from across Asia journeying to the United States in search of opportunity, navigating life in a new country, building careers to save for their children’s futures, rising through the ranks to pursue the American Dream, and proudly taking on the mantle of American identity. …


A survey of methods, from non-ML techniques to genetic algorithms to deep reinforcement learning

Photo by Museums Victoria on Unsplash

0. Introduction

You’ve probably played, or at least seen, the game of Snake before. The player controls a snake by pressing the arrow keys, and the snake has to maneuver around the screen, eating apples. With each apple eaten, the snake’s tail grows by one unit. The goal is to eat as many apples as possible without running into a wall or the snake’s ever-increasing tail.
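The rules above fit in a few lines of code. This is a hypothetical model written for illustration, not the article's code. The function name `step`, the coordinate system, and the direction names are all assumptions. The snake is a deque of grid cells with the head first, and apple respawning is left out.

```python
from collections import deque

# Hypothetical direction names mapped to (dx, dy) offsets; y grows downward.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(snake, direction, apple, width, height):
    """Advance the game by one tick, mutating `snake` (a deque of (x, y), head first).

    Returns (alive, ate_apple).
    """
    dx, dy = DIRECTIONS[direction]
    head_x, head_y = snake[0]
    new_head = (head_x + dx, head_y + dy)

    # Running into a wall ends the game.
    if not (0 <= new_head[0] < width and 0 <= new_head[1] < height):
        return False, False

    ate = new_head == apple
    # Without an apple the tail moves forward this tick, so its cell is free to enter.
    body = list(snake) if ate else list(snake)[:-1]
    if new_head in body:
        return False, False  # ran into its own tail

    snake.appendleft(new_head)
    if not ate:
        snake.pop()  # length unchanged; skipping this when eating grows the snake by one
    return True, ate


snake = deque([(2, 2), (1, 2), (0, 2)])  # head at (2, 2)
alive, ate = step(snake, "right", apple=(3, 2), width=10, height=10)
print(alive, ate, len(snake))  # True True 4
```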

Building an AI agent to play Snake is a classic programming challenge, with many videos on YouTube showing various attempts using a wide range of techniques. In this article, I review the pros and cons of various…
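A breadth-first search (BFS) agent is one example from the non-ML end of that range. At each tick it finds a shortest path from the head to the apple and takes the first move. This is an illustrative sketch, not necessarily one of the techniques the article reviews. The helper name `bfs_first_move` and the grid representation are hypothetical, and the body is treated as a fixed obstacle even though the tail moves.

```python
from collections import deque

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def bfs_first_move(snake, apple, width, height):
    """Return the first move of a shortest head-to-apple path, or None if unreachable.

    `snake` is a sequence of (x, y) cells with the head first.
    """
    head = snake[0]
    blocked = set(snake)          # simplification: ignores that the tail will move
    frontier = deque([head])
    first_move = {head: None}     # also serves as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == apple:
            return first_move[cell]
        for name, (dx, dy) in MOVES.items():
            nxt = (cell[0] + dx, cell[1] + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in first_move):
                # Remember which move out of the head started this path.
                first_move[nxt] = name if cell == head else first_move[cell]
                frontier.append(nxt)
    return None


print(bfs_first_move([(2, 2), (1, 2), (0, 2)], apple=(5, 2), width=10, height=10))  # right
```

This greedy approach only looks one apple ahead. Once the body gets long, it can steer the snake into a region with no way out.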


Hands-on Tutorials

Catching Cheaters Using Language Models (With Code, Explanations, and Visualizations)

Photo by Scott Graham on Unsplash

Having been a teacher for a few years, I understand the frustration of educators who work hard to impart knowledge and meaningful feedback to their students, only for a small number of cheaters to circumvent the system by passing off plagiarized work as their own. Thanks to the Internet, cheating is now easier and more tempting than ever. Fortunately for teachers, there are ways to detect cheating automatically. Of course, there are services that do this for a fee, but I wanted to create a plagiarism detection system in Python to see how these systems might work under the hood…
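The preview does not show the article's language-model approach, so this sketch swaps in a simpler baseline that is common in plagiarism detection. It collects the sets of word trigrams (three-word sequences) in each document and scores the overlap with Jaccard similarity. The function names and sample sentences are hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Lowercase the text, treat runs of letters, digits, and apostrophes as words,
    and return the set of consecutive n-word tuples."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(doc_a, doc_b, n=3):
    """Shared n-grams divided by all distinct n-grams (0.0 if neither has any)."""
    grams_a, grams_b = word_ngrams(doc_a, n), word_ngrams(doc_b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


original = "Sets in Python are unordered collections of unique elements."
suspect = "Sets in Python are unordered collections that hold unique items."
unrelated = "The snake grows by one unit every time it eats an apple."

print(jaccard_similarity(original, suspect))    # 4 shared trigrams / 11 distinct, about 0.36
print(jaccard_similarity(original, unrelated))  # 0.0
```

Lightly reworded copying still shares long runs of identical word sequences, which is what the score picks up. Any cutoff for flagging a pair of submissions is a judgment call.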


Getting Started

A guide to the elegant and helpful uses of built-in Python sets

Photo by Alexandre Valdivia on Unsplash

In my first five years of coding in Python, I had almost never used the built-in set data structure. Recently I have gained an appreciation for sets, thanks to a project that required lots of text processing and checking if certain words occur in certain texts. I wish I had explored them earlier! Let’s start with a quick overview of what sets are.

There are two key differences between a Python list and a set:

  • Unlike lists, sets are unordered. You cannot access the elements of a set by index. …
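This short illustration uses an arbitrary sample sentence. Building a set drops duplicate words, and membership tests are hash-based (O(1) on average). Set operators answer "which of these words occur in the text?" directly. Indexing raises a `TypeError` because sets have no order.

```python
text = "the cat sat on the mat and the cat slept"
words = text.split()

unique_words = set(words)             # duplicates are dropped automatically
print(len(words), len(unique_words))  # 10 7

# Membership uses hashing, so it does not scan the whole collection like a list would.
print("mat" in unique_words)          # True

keywords = {"cat", "dog", "mat"}
print(keywords & unique_words)        # intersection: cat and mat (print order may vary)
print(keywords - unique_words)        # difference: {'dog'}

# Sets are unordered, so there is no element at position 0.
try:
    unique_words[0]
except TypeError as err:
    print("TypeError:", err)          # 'set' object is not subscriptable
```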


Use the YouTube Data API to Analyze Political Discourse in YouTube Comments and Identify Bots

As you might expect in an election year, social media these days is awash in political discourse — some of it reasonable and productive, much of it incendiary. Plenty of articles have been written on the polarizing “echo chamber” effects of social media, as well as targeted disinformation campaigns and fake news. With the goal of investigating political discourse on YouTube, I will explain how to use the free YouTube Data API to gather YouTube comments into an interesting dataset. We will then use some data science tools such as Pandas and Plotly to visualize this dataset and look for…
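This is a rough sketch of the data-gathering step. It calls the YouTube Data API v3 `commentThreads` endpoint directly with the standard library. It assumes you have created an API key in the Google Cloud console. The video ID is a placeholder, the page limit is arbitrary, and the article itself may use a client library such as google-api-python-client instead. Each request costs API quota, so the number of pages fetched is capped.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_comment_page(video_id, api_key, page_token=None):
    """Request one page (up to 100 threads) of top-level comments for a video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def parse_comments(response):
    """Flatten a commentThreads response into plain dicts, one per top-level comment."""
    rows = []
    for item in response.get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append({
            "author": top.get("authorDisplayName"),
            "text": top.get("textDisplay"),
            "likes": top.get("likeCount", 0),
            "published": top.get("publishedAt"),
        })
    return rows


def fetch_all_comments(video_id, api_key, max_pages=5):
    """Follow nextPageToken until there are no more pages or max_pages is reached."""
    rows, token = [], None
    for _ in range(max_pages):
        page = fetch_comment_page(video_id, api_key, token)
        rows.extend(parse_comments(page))
        token = page.get("nextPageToken")
        if not token:
            break
    return rows


# Example (needs network access and a real key):
# rows = fetch_all_comments("VIDEO_ID", "YOUR_API_KEY")
```

The resulting rows can be passed straight to `pandas.DataFrame(rows)` for analysis and plotting with tools like Plotly.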


Basic Overview of the Universal Approximation Theorem with PyTorch Code and Visuals

In this article, I will explain the Universal Approximation Theorem and showcase two quick examples with PyTorch code to demonstrate neural networks learning to approximate functions. Feel free to skip straight to the code and visualizations if you already know the basics of how a neural network works!

When many people hear the word “function,” they think only of high school algebra and relations like f(x)=x². Although I have nothing against high school algebra (I taught it for two years!), …
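Before training anything, it helps to see why one hidden layer can be enough. The theorem says that, with a suitable activation such as ReLU, a single hidden layer can approximate any continuous function on a closed interval as closely as desired, given enough hidden units. This NumPy sketch builds such a network by hand for f(x) = x² on [−1, 1], rather than learning the weights with PyTorch as the article does. Each ReLU unit switches on at one knot, and the output weights make the network linearly interpolate f between knots. Function names are illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def build_relu_approximation(f, a, b, n_units):
    """Construct (not train) a one-hidden-layer ReLU network that linearly
    interpolates f at n_units + 1 evenly spaced knots on [a, b]."""
    knots = np.linspace(a, b, n_units + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)                   # slope of each linear piece
    # Output weight of unit i is the change in slope at knot i.
    out_weights = np.concatenate([[slopes[0]], np.diff(slopes)])
    hidden_bias = -knots[:-1]                                    # unit i turns on at knot i

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] + hidden_bias)               # shape (..., n_units)
        return values[0] + hidden @ out_weights
    return network


xs = np.linspace(-1.0, 1.0, 2001)
for n in (4, 16, 64):
    net = build_relu_approximation(lambda x: x ** 2, -1.0, 1.0, n)
    print(n, "hidden units, max error:", np.max(np.abs(net(xs) - xs ** 2)))
```

With knot spacing h, linear interpolation of x² has a worst-case error of h²/4. Quadrupling the number of hidden units therefore cuts the error by roughly a factor of 16.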


Comparing Text Generation with a Markov Model and an RNN

Photo by Sara Kurfeß on Unsplash

Let’s use machine learning to generate fake tweets that emulate Donald Trump’s language and style! Many introductory machine learning tutorials focus on classification tasks such as MNIST digit recognition, so I hope you enjoy this article about generative models.

We will first use a relatively simple approach for this task, called a Markov Model, and we will pretty much code it from scratch (the only library I used was NumPy). Then we will take a look at a more advanced Recurrent Neural Network implementation. …
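A first-order, word-level Markov model can be sketched in a few lines. It records which words follow each word in the training text, then generates by repeatedly sampling a successor of the current word. This sketch uses Python's standard `random` module rather than NumPy. The training text is a hypothetical stand-in, not the tweet dataset the article uses.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the training text."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        # Duplicates are kept, so frequent successors are sampled more often.
        chain[current].append(nxt)
    return chain


def generate(chain, start, max_words=20, seed=None):
    """Random-walk the chain from `start` for at most max_words words."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


corpus = "the snake eats the apple and the snake grows longer until the snake hits the wall"
chain = build_chain(corpus)
print(generate(chain, "the", max_words=12, seed=42))
```

Each generated word depends only on the one before it, which is why Markov text often reads locally plausible but globally incoherent.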


Video Tutorial

How I used New York’s wealth of publicly available data along with Plotly and Pandas to develop my data visualization skills

A video accompanies the article and walks through the code for the first example.

Introduction

Like many others, I have been working from home for several weeks due to the coronavirus. Inspired in part by Terence Shin’s great article on data science projects to try during quarantine, I decided to spend some of my free time improving my data science skills. I am a high school computer science teacher, and adding data science skills to the portfolio of content that I can teach my students is important to me, given the rising prominence of data science in today’s world. One thing…


A deep zoom into the Mandelbrot set. Unless otherwise noted, all images are produced by the author and are available under the Creative Commons license (CC BY-SA).

When Antonie van Leeuwenhoek first placed swabs of saliva under a microscope in the 1670s, he discovered a previously unknown world of microbes that lived all around us — on our bodies, in the water we drink, in the food we eat. Imagine the shock of discovering an entire realm of complexity and activity on a scale invisible to the human eye.

This is essentially what I felt when I was introduced to the Mandelbrot set, a mathematical object known for its self-similar intricacy and beauty. Videos on YouTube show hour-long zooms into deeper and deeper regions of the Mandelbrot…
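The Mandelbrot set is the set of complex numbers c for which the iteration z → z² + c, starting from z = 0, stays bounded. Once |z| exceeds 2, the values are guaranteed to diverge, which gives the standard escape-time test sketched below. The iteration cap and the viewing window in `render_ascii` are arbitrary choices. Points that survive the cap are only presumed to be members of the set.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterate z -> z**2 + c from z = 0 and return the step at which |z| exceeds 2,
    or max_iter if it never does (c is then treated as inside the set)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter


def render_ascii(width=60, height=24, max_iter=50):
    """Draw the region [-2.1, 0.6] x [-1.2, 1.2]: '#' inside the set, '.' outside."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.1 + 2.7 * i / (width - 1)
            row += "#" if escape_time(complex(x, y), max_iter) == max_iter else "."
        rows.append(row)
    return "\n".join(rows)


print(render_ascii())
```

Deep zooms like the ones described above evaluate the same test over a much smaller window, with far higher iteration caps and extended-precision arithmetic.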
