Markov Chains and Twitter Tweets
Mentions⌗
- skelebrainmaker - program used in this post.
- gomarkov - good markov chains library.
- twitter-scraper - Twitter scraping made easy? Blasphemy!
- aurora - pretty terminal colors!
Markov Chains⌗
Markov chains are a way of creating a process (model) which can produce an output based on provided input. The way it does this is by creating a state machine and giving a chance (weight) to each potential transition within the state machine. But what does that mean?
State machines of the Finite Variety⌗
What is a state machine? Well, imagine you want to define the behavior of a dog. You break the dog's current behavior into three "states": eating, sleeping, and playing. You can then establish paths between the states which are traversed based on a criterion. Say the dog can go from eating to playing if the dog `is full`, from playing to sleeping if the dog `is tired`, and from sleeping to eating when it wakes up (`is awake`). This is a very basic state machine, specifically a finite state machine, since there is a finite number of states, transitions, and combinations thereof.
But there’s a problem with the state machine above: how do we represent the times when the dog is still eating, sleeping, or playing? Simple! We put a loop on each state with the necessary criterion. We can make the criterion `is eating` for Eating, `is playing` for Playing, and `is sleeping` for Sleeping. This way, each state has the option to remain in that state until another criterion is met.
You could add more states (instead of just sleeping, you could have a resting state) or more transitions, such as an `is hungry` transition between Playing and Eating. Once you get a basic grasp of state machines, you can really start to see the power they contain.
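The dog machine above can be sketched in Go (the language used later in this post). The `Next` function and the string criteria here are my own stand-ins for the diagram's labels, not code from the post:

```go
package main

import "fmt"

// State is one of the dog's three behaviors from the example.
type State string

const (
	Eating   State = "eating"
	Sleeping State = "sleeping"
	Playing  State = "playing"
)

// Next applies the transition rules described above, including the
// self-loops: if no criterion fires, the dog stays where it is.
func Next(s State, criterion string) State {
	switch {
	case s == Eating && criterion == "is full":
		return Playing
	case s == Playing && criterion == "is tired":
		return Sleeping
	case s == Sleeping && criterion == "is awake":
		return Eating
	default:
		return s // self-loop: still eating/sleeping/playing
	}
}

func main() {
	s := Sleeping
	for _, c := range []string{"is sleeping", "is awake", "is full", "is tired"} {
		s = Next(s, c)
		fmt.Println(c, "->", s)
	}
}
```

Feeding it a sequence of criteria walks the dog through sleeping, eating, playing, and back to sleeping.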
If you want to know more about state machines you can read about them here, here, and the wiki here.
Does this type of modeling and thought interest you more? Then you might be interested in computer theory!
Computers are just thought experiments we managed to convince rocks to perform.
State Machine + Probabilities = Baby AI⌗
So now you have a basic understanding (or advanced, I don’t know you) of state machines. How do you add probability to them?
You put it on the transitions! Now, instead of criteria defining when to transition from one state to another, it’s all up to chance!
These types of state machines are referred to as probabilistic finite state machines. Let’s take the dog state machine from earlier and change the transition criteria to probabilities.
- `is awake` has a 30% (0.3) chance.
- `is sleeping` has a 50% (0.5) chance.
- `is eating` has a 40% (0.4) chance.
- `is full` has a 20% (0.2) chance (he likes to eat).
- `is playing` has a 70% (0.7) chance.
- `is tired` has a 40% (0.4) chance.
The new state machine could look something like this.
Now each transition is only taken a percentage of the time. We can alter the chances of the transitions based on different types of dogs or the age of the dog. With this you can create machine biases, where a certain transition is taken more often than any other, or a certain state is maintained for longer (like sleeping, if it’s an old dog). You can find some great interactive diagrams demonstrating this here.
How is this a “baby AI” you ask? Well, in very hand-wavy fashion I will say this: AI is nothing more than a very complex probabilistic state machine.
I said it.
Markov Chains!⌗
Markov chains are, more or less, probabilistic finite state machines which are “trained” on something. The most common type of Markov chain is a text analysis chain, which takes a piece of text, breaks it down into smaller pieces, then creates a state machine based on how those pieces were ordered. The probability on the transitions can come from multiple sources, but a common method is to use frequency. So if you see that `can` comes after `We` more often than after `pee`, you can reflect this by putting a higher probability on the `We` to `can` transition than on the `pee` to `can` transition. Where the training comes in is when you are creating the state machine. With text analysis, you’d automate the discovery of the probabilities between pieces of strings by looking at existing strings, breaking them down, and getting the values automagically. Well, with code.
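A minimal sketch of that frequency counting, using nothing beyond the standard library (the sentences and the `countPairs` helper are illustrative, not the post's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// countPairs counts, for each word, how often each following word
// appears. These counts are the raw transition frequencies described
// above; dividing by each row's total would give the probabilities.
func countPairs(sentences []string) map[string]map[string]int {
	counts := map[string]map[string]int{}
	for _, s := range sentences {
		words := strings.Fields(s)
		for i := 0; i+1 < len(words); i++ {
			if counts[words[i]] == nil {
				counts[words[i]] = map[string]int{}
			}
			counts[words[i]][words[i+1]]++
		}
	}
	return counts
}

func main() {
	counts := countPairs([]string{
		"We can build chains",
		"We can train chains",
		"Dogs pee outside",
	})
	// "can" follows "We" in two sentences, so that transition
	// ends up with the heaviest weight.
	fmt.Println(counts["We"]["can"]) // 2
}
```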
Text analysis is but one type of Markov chain; there’s a world of different chains which can be used for all kinds of things, from determining which page a result in a search engine ends up on to analyzing images.
I highly suggest reading more about Markov chains here.
Go, Twitter, and Markov Chains⌗
Using Markov chains, you can attempt to simulate what someone or a group of people could say based on what they’ve said previously. The way you’d go about this is by creating a text analysis chain which takes chat samples and processes them into a model. This training material could be a static file of chat history or a dynamic collection (think a “chatbot” for Discord, Slack, etc.); in this case, we’ll work from a semi-static source: tweets.
Twitter provides a wonderful source of semi-static training material. I say “semi-static” since the tweets themselves are static but the source, be it a user or tag, is always changing; this means we would want to take a snapshot of a selection of tweets. With Go this can be done quite trivially using a lovely module named twitter-scraper. This module makes gathering data from Twitter easy as pie. To demonstrate this, let’s look at some code.
```go
scraper := twitter.New()
fmt.Printf("%s@%s\n", Magenta("Training on "), user)
for tweet := range scraper.GetTweets(context.Background(), user, sample_size) {
	wg.Add(1)
	go Train(tweet.Text, chain)
}
wg.Wait()
```
Here we create a new twitter-scraper object, then, using the provided username (`user`), we pull `sample_size` tweets from their profile. Each tweet’s text is handed to the `Train` function in its own goroutine. The scraper can retrieve up to ~3,200 tweets per account, and training the chain concurrently keeps things fast.
Now we have data to work with.
gomarkov⌗
gomarkov is another fantastic module which allows us to use Markov chains fairly easily compared to writing our
own from scratch. It supports training and generation but even more useful it supports saving to JSON natively which will be useful later on. Let’s look
at that Train
function. Assume we’ve declared our gomarkov.Chain
earlier to pass along to Train
.
```go
func Train(tweet string, c *gomarkov.Chain) {
	defer wg.Done()
	// Scan the tweet line by line, adding each line's words to the chain.
	scanner := bufio.NewScanner(strings.NewReader(tweet))
	for scanner.Scan() {
		c.Add(strings.Split(scanner.Text(), " "))
	}
}
```
`Train` takes a string (in our case a tweet) and a reference to our Markov chain object. After deferring the wait group completion, we create a new bufio scanner to scan through the tweet text line by line. Looping over the scanner, we add the text to the chain until the end of the tweet text. As discussed earlier, this processes the string into words and starts assigning weights to the transitions.
Now we’ve gotten our tweets and trained the chain on all of them, what’s next? Why, generation of course!
```go
func GenerateResponse(chain *gomarkov.Chain) string {
	// Walk the chain from its start token until the end token appears.
	tokens := []string{gomarkov.StartToken}
	for tokens[len(tokens)-1] != gomarkov.EndToken {
		next, _ := chain.Generate(tokens[(len(tokens) - 1):])
		tokens = append(tokens, next)
	}
	// Drop the start/end tokens before joining into a sentence.
	return strings.Join(tokens[1:len(tokens)-1], " ")
}
```
`GenerateResponse` is fairly simple. First it creates a slice of strings initialized with `gomarkov.StartToken` as the first element, then it loops until the last string in the slice is `gomarkov.EndToken`, appending strings as it goes. That is to say, it walks the chain we trained earlier, starting from the start token and using the weights on the transitions to determine which string to append next. We return the joined slice, omitting the first and last strings since they’re `gomarkov.StartToken` and `gomarkov.EndToken`. This returned string is our generated response, which will resemble the training material; in our case, a tweet.
Now we do a simple loop like this:
```go
for i := 0; i < 5; i++ {
	fmt.Println("- " + GenerateResponse(chain))
}
```
The output (shown as images in the original post) isn’t going to be perfect, but from what I can tell, it’s pretty convincing for tweets.
Saving⌗
As an extra step, we’ll save the model for future use. Luckily, the `gomarkov.Chain` type natively supports marshalling to JSON.
```go
func Save(name string, c *gomarkov.Chain) {
	jsonObj, err := json.Marshal(c)
	if err != nil {
		fmt.Println(err)
		return
	}
	// os.WriteFile supersedes the deprecated ioutil.WriteFile.
	if err := os.WriteFile(name, jsonObj, 0644); err != nil {
		fmt.Println(err)
	}
}
```
This saving function is as simple as it gets: it takes a `gomarkov.Chain` reference and a file name, marshals the chain to a JSON object, and writes the JSON to a file. Then you’re done. Simple as.
Conclusion⌗
As a wrap-up, I’ll use the main function of my provided code to summarize it all.
```go
func main() {
	chain := gomarkov.NewChain(1)
	scraper := twitter.New()
	fmt.Printf("%s@%s\n", Magenta("Training on "), user)
	for tweet := range scraper.GetTweets(context.Background(), user, sample_size) {
		wg.Add(1)
		go Train(tweet.Text, chain)
	}
	wg.Wait()
	fmt.Printf("%s%s\n", Magenta("Saving to "), user+".brain")
	Save(user+".brain", chain)
	fmt.Println(Green("Samples: "))
	for i := 0; i < 5; i++ {
		fmt.Println("- " + GenerateResponse(chain))
	}
}
```
We create a new Markov chain and Twitter scraper, then scrape `sample_size` tweets from the `user` Twitter account, adding each one to the chain for training. Once all the scraping and training is complete, we save the chain to a file, then print five generated responses as a sampling of the model.
Simple as it gets. You didn’t really need to know anything about Markov chains to write this, but it’s always better to understand a tool than to use it blindly.
n joy