Markov Chains

Markov chains are a way of creating a process (model) which can produce an output based on provided input. The way it does this is by creating a state machine and giving a chance (weight) to each potential transition within the state machine. But what does that mean?

State machines of the Finite Variety

What is a state machine? Well, imagine you want to define the behavior of a dog. You break the dogs current behavior into three “states”: eating, sleeping, playing. You can then establish paths between each state which are traversed based on a criteria. Say the dog can go from eating to playing if the dog is full, the dog can go from playing to sleeping if the dog is tired, and the dog can go from sleeping to eating if they wake up (is awake). This is a very basic state machine, particularly a finite state machine since there are a finite amount of states, transitions and combinations thereof.

dog state

But there’s a problem with the state machine above; how do we illustrate the times when the dog is still eating, sleeping, or playing? Simple! We put a loop on each state with the necessary criteria. We can make the criteria is eating for Eating, is playing for Playing, and is sleeping for Sleeping. This way, at each state there is an option to remain at that state until another criteria is met.

dog state with loops

You could add more states like instead of just sleeping, you could have a resting state or you can add more transitions such as one for is hungry which goes between Playing and Eating. Once you get a basic grasp of state machines, you can really start to see the power they contain.

If you want to know more about state machines you can read about them here, here, and the wiki here.

Does this type of modeling and thought interest you more? Then you might be interested in computer theory!

Computers are just thought experiments we manages to convince rocks to perform.

State Machine + Probabilities = Baby AI

So now you ahve a basic understanding (or advanced, I don’t know you) of state machines. How do you add probability into them? You put it on the transitions! Now instead of criteria defining when to transition from one state to another, it’s all up to chance! These are types of state machines are referred to as probabilistic finite state machines. Let’s use the dog state machine from earlier but change the transition criteria to have probabilities.

  • is awake has a 30% (0.3) chance.
  • is sleeping has a 50% (0.5) chance.
  • is eating has a 40% (0.4) chance.
  • is full has a 20% (0.2) chance (he likes to eat).
  • is playing has a 70% (0.70) chance.
  • is tired has a 40% (0.4) chance.

The new state machine could look something like this.

probability dog

Now each transition will only be taken a percentage chance of the time. We can alter the chances of the transitions based on different types of dogs or the age of the dog. With this you can create machine biases which is where a certain type of transition is taken more often than any other transition or a certain state is maintained for longer (like sleeping if it’s an old dog). You can find some great interactive diagrams demonstrating this here.

How is this a “baby AI” you ask? Well, in very hand wavy fashion I will say this: AI is nothing more than a very complex probabalistic state machine.

I said it.

Markov Chains!

Markov chains are, more or less, probabalistic finite state machines which are “trained” on something. The most common type of Markov chain is a text analysis chain which takes a piece of text, breaks it down into smaller pieces, then creates a state machine based on how those pieces were ordered. The probability on the transistions can come from multiple different sources but a common method is to use frequency. So if you see can comes after We more often than pee, then you can reflect this by having a higher probability between We to can than pee to can. Where the training comes in is when you are creating the state machine. With text analysis, you’d automate the discovery of the probabilities between piecese of strings by looking at existing strings, breaking them down, and getting the values automagically. Well, with code.

Text analysis is but one type of Markov chain, there’s a world of different types of chains which can be used for all kinds of things from determining what page a result in a search engine ends up on to analyzing images.

I highly suggest reading more about Markov chains here.

Go, Twitter, and Markov Chains

Using Markov chains you can attempt to simulate what someone or a group of people could say based on what they’ve said previously. The way you’d go about this is by creating a text analysis chain which takes chat samples and processes them into a model. This training material could be a static file of chat history or a dynamic collection (think a “chatbot” for Discord, Slack, etc), in this case we’ll work from a semi-static source: tweets.

Twitter provides a wonderful source of semi-static training material. I say “semi-static” since the tweets themselves are static but the source, be it a user or tag, is always changing; this means we would want to take a snapshot of a selection of tweets. With Go this can be done quite trivially using a lovely module named twitter-scraper. This module makes gathering data from Twitter easy as pie. To demonstrate this, let’s look at some code.

scraper := twitter.New()
fmt.Println(fmt.Sprintf("%s@%s", Magenta("Training on "), user))
for tweet := range scraper.GetTweets(context.Background(), user, sample_size) {
	go Train(tweet.Text, chain)

Here we created a new twitter-scraper object then using the username provided (user), we get sample_size amount of tweets from their profile in parallel. Then we take the text from the tweet and feed it into the Train command.

This can handle up to ~3200 tweet retrievals at a time and using the goroutines method of retrieval makes it extremely fast.

Now we have data to work with.


gomarkov is another fantastic module which allows us to use Markov chains fairly easily compared to writing our own from scratch. It supports training and generation but even more useful it supports saving to JSON natively which will be useful later on. Let’s look at that Train function. Assume we’ve declared our gomarkov.Chain earlier to pass along to Train.

func Train(tweet string, c *gomarkov.Chain) {
	defer wg.Done()
	scanner := bufio.NewScanner(strings.NewReader(tweet))
	for scanner.Scan() {
		c.Add(strings.Split(scanner.Text(), " "))

Train takes a string (in our case a tweet) and a reference to our Markov chain object. After deferring the wait group completion, we create a new bufio scanner to scan through the tweet text line-by-ling. Looping over the scanner, we add the text to the chain until the end of the tweet text. As discussed earlier, this processes the string into words and starts assigning them weights on the transitions.

Now we’ve gotten our tweets and trained the chain on all of them, what’s next? Why generation of course!

func GenerateResponse(chain *gomarkov.Chain) string {
	tokens := []string{gomarkov.StartToken}
	for tokens[len(tokens)-1] != gomarkov.EndToken {
		next, _ := chain.Generate(tokens[(len(tokens) - 1):])
		tokens = append(tokens, next)
	return strings.Join(tokens[1:len(tokens)-1], " ")

GenerateResponse is fairly simple. First it creates a slice of strings initialized with gomarkov.StartToken as the first string then we loop until the last string in the tokens slice is the gomarkov.EndToken appending strings to the slice it goes. That is to say, it goes through the chain we trained earlier starting at a random point then using the weights on the transitions to determine which string to append next. We return the joined slice, omitting the first and last strings since they’re gomarkov.StartToken and gomarkov.EndToken. This returned string will be our generated response which will be similar to the training material, in our case a tweet.

Now we do a simple loop like this:

for i := 0; i < 5; i++ {
	fmt.Println("- " + GenerateResponse(chain))

To get an output like this:


Static image so you can read the results:


It’s not going to be perfect but from what the author can tell, it’s pretty convincing with tweets.


As an extra step we’ll save the model results for future use. Luckily the gomarkov.Chain type supports native marshalling to JSON.

func Save(name string, c *gomarkov.Chain) {
	jsonObj, _ := json.Marshal(c)
	err := ioutil.WriteFile(name, jsonObj, 0644)
	if err != nil {

This saving function is as simple as it gets. It takes a gomarkov.Chain reference and string name then it marshals the chain to a JSON object. Last step is writing the JSON to a file, then you’re done. Simple as.


As a wrap up I’ll use the main function of my provided code to summarize it all.

func main() {
	chain := gomarkov.NewChain(1)
	scraper := twitter.New()
	fmt.Println(fmt.Sprintf("%s@%s", Magenta("Training on "), user))
	for tweet := range scraper.GetTweets(context.Background(), user, sample_size) {
		go Train(tweet.Text, chain)
	fmt.Println(fmt.Sprintf("%s%s", Magenta("Saving to "), user+".brain"))
	Save(user+".brain", chain)
	fmt.Println(fmt.Sprintf("%s", Green("Samples: ")))
	for i := 0; i < 5; i++ {
		fmt.Println("- " + GenerateResponse(chain))

We create a new Markov chain and Twitter scraper then we scrape sample_size amount of tweets from the user Twitter account, adding them to the chain each time for training. Once all the scraping and training is complete, we save the file then print out five generated responses from the chain as a sampling of the model.

Simple as it gets. You really didn’t need to know anything about Markov chains to write this but it’s always helpful to understand a tool when you’re using it than blindly use it.

n joy