So I had the idea to generate fake Twitter pages from a user's old tweets. I wrote a bunch of Perl scripts to do just that, and it worked out well.
The process essentially consists of four steps:
- Acquire a user's tweets.
- Train a fourth-order Markov model on those tweets.
- Generate new tweets by having the Markov model spew out new chains.
- Generate a Twitter page from those tweets.
Except for the last step, I’ve written scripts to do this. (Writing one for the last step wouldn’t be hard, just boring, and it’s easy enough to do by hand.)
Step 1: Obtaining tweets.
Essentially, it’s a loop that fetches the tweets via Twitter's API, page by page. Simple enough. The only real problem is Twitter's rate limiting, so grabbing more than one user per hour does not work.
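#!/usr/bin/perl
use strict;
use warnings;

my $max_page   = 200;
my $start_page = 0;
my $user       = "username";

# Fetch each page of the timeline and append the raw JSON to tweets.json.
for ( my $page = $start_page; $page < $max_page; $page++ ) {
    # Quote the URL so the shell does not interpret the "?".
    my $cmd = "wget 'http://twitter.com/statuses/user_timeline/$user.json?page=$page' -O - >> tweets.json";
    system($cmd);
}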
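One way to avoid appending raw JSON arrays back to back, and to stop once the timeline runs out, is to decode each page as it arrives. The following is a minimal sketch rather than the script used here: `collect_tweets`, the `$fetch_page` callback, and the delay parameter are hypothetical names, and the fetcher is stubbed out with canned data so the example runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Collect tweets page by page. $fetch_page is a code ref that takes a page
# number and returns the raw JSON text for that page (a hypothetical hook;
# in practice it could wrap wget or an HTTP client).
sub collect_tweets {
    my ( $fetch_page, $start_page, $max_page, $delay ) = @_;
    my @tweets;
    for my $page ( $start_page .. $max_page - 1 ) {
        my $json = $fetch_page->($page);
        last unless defined $json;            # request failed
        my $batch = eval { decode_json($json) };
        last unless ref $batch eq 'ARRAY';    # error object or malformed reply
        last unless @$batch;                  # empty page: no more tweets
        push @tweets, @$batch;
        sleep $delay if $delay;               # be gentle with the rate limit
    }
    return \@tweets;
}

# Stub fetcher standing in for the real HTTP request (assumed data).
my %fake_pages = (
    0 => '[{"text":"first tweet"},{"text":"second tweet"}]',
    1 => '[{"text":"third tweet"}]',
    2 => '[]',
);
my $tweets = collect_tweets( sub { $fake_pages{ $_[0] } }, 0, 200, 0 );
print encode_json($tweets), "\n";
```

Because the result is a single array, it can be written out as one valid JSON document, so no pattern-based merging of the arrays is needed afterwards.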
Steps 2 and 3: Training and generating.
This script takes the JSON output by the script from step 1 as input and prints generated fake tweets, one per line.
Since I was too lazy to implement a Markov chain myself, I used a library off CPAN to do the heavy lifting.
#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use Algorithm::MarkovChain;

# Parse JSON. The input is several JSON arrays concatenated back to back,
# so merge them into one array before decoding.
my $tweetsJson = join( "", <> );
$tweetsJson =~ s/\]\[/,/g;
# decode_json expects raw UTF-8 bytes, so the input is not decoded beforehand.
my $tweets = decode_json( $tweetsJson );

# Train
my $user = Algorithm::MarkovChain->new();
foreach my $tweet (@{$tweets}) {
    my @symbs = ( "START", split( " ", $tweet->{text} ), "END" );
    $user->seed(
        symbols => \@symbs,
        longest => 4
    );
}

# Generate 20 tweets
binmode STDOUT, ":utf8";
for ( my $i = 0; $i < 20; $i++ ) {
    my @generated = ("START");
    my $l = 1;
    # Keep extending the chain until it reaches the END marker.
    while ( $generated[-1] ne "END" ) {
        @generated = $user->spew(
            length   => $l,
            complete => \@generated
        );
        $l++;
    }
    # Strip the START and END markers.
    @generated = @generated[ 1 .. $#generated - 1 ];
    print join( " ", @generated ) . "\n";
}
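To give a feel for what the library is doing, here is a hand-rolled word-level Markov chain in plain Perl. It is only a sketch and not how Algorithm::MarkovChain is implemented: the names `train`, `generate`, and `%followers` are made up for illustration, the training sentences are placeholder data, and the strategy of always preferring the longest known context is an assumption chosen for simplicity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ORDER = 4;    # longest context in words, mirroring "longest => 4"
my %followers;    # "w1 w2 ..." context => list of words seen after it

# Record, for every context of 1..$ORDER words, which word followed it.
sub train {
    my (@words) = @_;
    my @symbols = ( "START", @words, "END" );
    for my $i ( 1 .. $#symbols ) {
        for my $n ( 1 .. $ORDER ) {
            last if $i - $n < 0;
            my $context = join " ", @symbols[ $i - $n .. $i - 1 ];
            push @{ $followers{$context} }, $symbols[$i];
        }
    }
}

# Walk the chain from START, preferring the longest known context,
# until END is drawn or $max_words words have been produced.
sub generate {
    my ($max_words) = @_;
    my @out = ("START");
    while ( @out <= $max_words ) {
        my $word;
        for my $n ( reverse 1 .. $ORDER ) {
            next if $n > @out;
            my $context = join " ", @out[ -$n .. -1 ];
            my $choices = $followers{$context} or next;
            $word = $choices->[ rand @$choices ];
            last;
        }
        last if !defined $word || $word eq "END";
        push @out, $word;
    }
    shift @out;    # drop the START marker
    return join " ", @out;
}

# Placeholder training data standing in for real tweets.
train( split " ", $_ ) for (
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
);
print generate(30), "\n" for 1 .. 5;
```

With so little training text, a four-word context almost always has a single continuation, so the output mostly reproduces the input sentences. The mixing that makes the fake tweets interesting only shows up with a few hundred real tweets.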
Step 4: Generating a fake Twitter page.
This consists of two parts: turning the tweets into Twitter-style HTML, and adding what comes before and after the tweets in a Twitter HTML page. For the former, I again wrote a small script, which mostly just concatenates a lot of text. I put it here if you want it (save it as “mktwpage.pl”).
The second part I’ve done by hand so far, assisted by my browser’s “Save page” feature. Too lazy to automate.
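Both parts could be sketched roughly as follows. This is not the mktwpage.pl script mentioned above, just an illustration of the idea: the function names, the CSS class names, and the header and footer fragments are all placeholders, and a real page would take the surrounding markup from a saved Twitter page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Wrap one generated tweet in a list item. The class names are
# placeholders, not Twitter's real markup.
sub tweet_to_html {
    my ($tweet) = @_;
    return '<li class="status"><span class="entry-content">'
        . escape_html($tweet)
        . "</span></li>\n";
}

# Sandwich the tweets between the header and footer of a saved page.
sub build_page {
    my ( $header, $footer, @tweets ) = @_;
    return $header . join( "", map { tweet_to_html($_) } @tweets ) . $footer;
}

# Example tweets and minimal header/footer fragments (assumed data).
my @tweets = ( "I <3 Perl & coffee", "markov chains all the way down" );
print build_page( "<html><body><ol>\n", "</ol></body></html>\n", @tweets );
```

Escaping matters here because generated tweets can contain characters such as `<` or `&` that would otherwise break the page.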
And there you have it: autogenerated fake Twitter pages. Halfway convincing, too. Go generate your own!