So I had the idea to generate fake Twitter pages from a user’s old tweets. I wrote a bunch of Perl scripts to do just that, and it worked out well.
The process essentially consists of four steps:
- Acquire a user’s tweets.
- Train a fourth-order Markov model on those tweets.
- Generate new tweets by having the Markov model spew out new chains.
- Generate a Twitter page from those tweets.
I’ve written scripts for everything except the last step. (Writing one for that wouldn’t be hard, just boring, and it’s easy enough to do by hand.)
Step 1: Obtaining tweets.
Essentially, it’s a loop that fetches the tweets via Twitter’s API, page by page. Simple enough. The only real problem is Twitter’s rate limiting, which means grabbing more than one user per hour does not work.
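#!/usr/bin/perl
use strict;
use warnings;

my $max_page   = 200;
my $start_page = 0;
my $user       = "username";

# Fetch the timeline page by page and append each JSON array to tweets.json.
for ( my $page = $start_page; $page < $max_page; $page++ ) {
    # Quote the URL so the shell does not treat the "?" as a glob character.
    my $cmd = "wget 'http://twitter.com/statuses/user_timeline/$user.json?page=$page' -O - >> tweets.json";
    system($cmd);
}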
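Because the loop appends one JSON array per page, tweets.json ends up holding several arrays back to back rather than one valid JSON document. One possible way to read such a file is the incremental parser in the core JSON::PP module; the sketch below is an illustration only, and the helper name merge_pages is made up for it. It assumes the text has already been decoded to characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: turn a string holding several JSON arrays written
# back to back (as the download loop produces) into one flat list of tweets.
sub merge_pages {
    my ($text) = @_;
    my $json = JSON::PP->new;
    my @tweets;

    $json->incr_parse($text);    # void context: only feed the buffer
    while ( my $page = $json->incr_parse ) {    # scalar context: one array per call
        push @tweets, @$page;    # an empty page simply adds nothing
    }
    return \@tweets;
}

my $tweets = merge_pages('[{"text":"first"}][{"text":"second"}][]');
print scalar(@$tweets), " tweets\n";
```

Unlike a textual substitution, this approach never touches the contents of the tweets themselves, and empty trailing pages do no harm.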
Steps 2 and 3: Training and generating.
This script takes the JSON written by the script in step 1 as input and outputs generated, fake tweets, one per line.
Since I was too lazy to implement a Markov chain myself, I used a library off CPAN to do the heavy lifting.
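#!/usr/bin/perl
use strict;
use warnings;
use JSON;
use Encode;
use Algorithm::MarkovChain;

# Parse JSON
my @tweetsJsonA = <>;
my $tweetsJson  = decode_utf8( join( "", @tweetsJsonA ) );
# Step 1 appends one array per page; join them into a single array.
$tweetsJson =~ s/\]\[/,/g;
# The text is already decoded to characters, so use decode() here;
# decode_json() expects raw UTF-8 bytes instead.
my $tweets = JSON->new->decode($tweetsJson);

# Train
my $user = Algorithm::MarkovChain::->new();
foreach my $tweet ( @{$tweets} ) {
    # START and END mark the tweet boundaries for the model.
    my @symbs = ( "START", split( " ", $tweet->{text} ), "END" );
    $user->seed(
        symbols => \@symbs,
        longest => 4
    );
}

# Generate 20 tweets
binmode STDOUT, ":utf8";
for ( my $i = 0; $i < 20; $i++ ) {
    my @generated = ("START");
    my $l         = 1;
    # Ask for ever longer chains until one ends in the END marker.
    while ( $generated[-1] ne "END" ) {
        @generated = $user->spew(
            length   => $l,
            complete => \@generated
        );
        $l++;
    }
    # Strip the START and END markers before printing.
    @generated = @generated[ 1 .. ( @generated - 2 ) ];
    print join( " ", @generated ) . "\n";
}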
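For anyone curious what a word-level Markov chain boils down to, here is a minimal hand-rolled sketch. It is not how Algorithm::MarkovChain works internally (I have not checked, and the library keeps chains of several lengths); it simply assumes a fixed window of four preceding words as the state, and the names train, generate and $ORDER are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ORDER = 4;    # how many preceding words form the state

# Record, for every run of $ORDER words, which words followed it.
sub train {
    my ( $model, $text ) = @_;
    # Pad the front with START markers so the first real word has a full state.
    my @words = ( ("START") x $ORDER, split( " ", $text ), "END" );
    for my $i ( $ORDER .. $#words ) {
        my $state = join " ", @words[ $i - $ORDER .. $i - 1 ];
        push @{ $model->{$state} }, $words[$i];
    }
}

# Walk the model from the start state until END is drawn.
sub generate {
    my ($model) = @_;
    my @out = ("START") x $ORDER;
    while (1) {
        my $state   = join " ", @out[ -$ORDER .. -1 ];
        my $choices = $model->{$state} or last;    # untrained model: stop
        my $next    = $choices->[ rand @$choices ];    # pick a follower at random
        last if $next eq "END";
        push @out, $next;
    }
    return join " ", @out[ $ORDER .. $#out ];    # drop the START padding
}

my %model;
train( \%model, $_ ) for "the cat sat on the mat today", "the cat sat on the sofa";
print generate( \%model ), "\n";
```

With a window of four words, a small corpus mostly reproduces the original tweets; new combinations only appear where two tweets share a run of four words, which is why a large pile of tweets helps.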
Step 4: Generating a fake Twitter page.
This consists of two parts: turning the tweets into Twitter-style HTML, and adding what comes before and after the tweets in a Twitter HTML page. For the former I wrote another small script, which mostly just concatenates a lot of text. I’ve put it up in case you want it (save it as “mktwpage.pl”).
The second part I’ve done by hand so far, assisted by my browser’s “Save page” feature. Too lazy to automate.
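The first part can be sketched roughly as follows. This is not mktwpage.pl itself, just an illustration of the "concatenate text a lot" idea; the function names and the class names in the markup are invented and would have to be matched to whatever a saved Twitter page actually uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Wrap each generated tweet in a list item; the class names are made up.
sub tweets_to_html {
    my @tweets = @_;
    my $html   = qq{<ol class="timeline">\n};
    for my $tweet (@tweets) {
        $html .= qq{  <li class="status"><span class="entry-content">}
            . escape_html($tweet)
            . qq{</span></li>\n};
    }
    return $html . "</ol>\n";
}

print tweets_to_html( "just setting up my <fake> twttr", "cats & dogs" );
```

The resulting fragment can then be pasted between the header and footer of a saved page by hand, as described above.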
And there you have it: autogenerated fake Twitter pages. Halfway convincing, too. Go generate your own!
Just a reminder: the German national elections are today.
So let’s go: quickly reread the programmes of the various parties, so that tomorrow’s decision can be based on arguments and not “Eh, that name sounds cool”. For me it will probably be the Pirate Party this year, which I also want to heartily recommend to you once more. Some would accuse me of throwing my vote away on a “one-topic party”, but in our multi-party system, where nothing works without a coalition anyway, I consider a vote for the Pirates, who are closest to my opinion on the topics that matter most to me right now, entirely justifiable.
And before people start again with “No, I don’t vote because I hate the SYSTEM!”: not voting is cowardly, especially now that we have so many parties, smaller ones included, on the ballot. If nothing else, then at least go and give your protest a voice by casting an invalid vote. If you refuse to vote, you’ve also forfeited your right to complain afterwards. ;)
Fail-Wahl verhindern (prevent the Fail-Wahl): go vote on September 27th!
(This joke really only works in German, because the German word for “election”, “Wahl”, sounds just like the German word for “whale”, “Wal”, so avoiding a “Fail-Wahl” makes sense. But go vote tomorrow, anyway.)
— CC-BY-NC license