Signed up: 8 years ago (3/13/07)
Last signed in: 1 month ago
517 Friends
Radius55 Site Admin
23-year-old male from Austin, TX
Join RVB:TX!!!

RFR Policy: If I know you, I'll accept it. If you're active on the site (Karma 10+), I'll probably accept it. Otherwise, don't be surprised or offended if I deny the request.

I'm one of the newer Admins on the Site. If you need help with anything, don't hesitate to ask. But...

What I can and can't help with + FAQ
Latest Post
What Does RT Look Like?

An interesting question. I'm not talking about the layout or coding or anything visual. I'm talking in terms of relationships.

Many of you may have seen the Internet Map. It's an interesting application of network theory and web scraping. Basically, it takes a huge percentage of the major websites on the planet and groups them by a number of metrics.

Well, I don't have access to an internet trunk line and I'm a bit impatient. But it's still possible to do something similar on a small scale. So when I started a job that required me to learn Python and saw some of @Desayjin's Ask an Economist journals, I decided to write my own site scraper for RT.

How does a scraper work, you might ask. Well, hold on. I was just getting to that. A scraper is a program that visits a number of webpages based on certain criteria and "scrapes" information off of them. I built mine off of the urllib2 module in Python. It starts with one or more seeds, in this case users with lots of friends, and gathers their data. Then it records a list of their friends and adds all of those friends to a queue. Each user on the queue is processed in turn by scraping their page. If the scrape finds they have more friends than a certain threshold (in my case, 500), it adds their data to a list and then puts all their friends on the queue. If not, they're ignored.

You have to ignore a bunch of people, otherwise you'll be running the program for a week. As it is, I processed about 75,000 users in 18 hours and ended up with a list of 319 users with over 500 friends. It's not necessarily complete, but it would have been difficult for a user with that many friends to be missed. The data takes up about 7 MB, which doesn't sound like much until you realize that an average ebook is less than half a megabyte.
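For anyone who wants to see the crawl loop described above in code, here is a minimal sketch. It is not the original scraper: the names `crawl` and `fetch_friends` and the toy friend lists are made up for illustration, and the page-fetching step (which the original did with urllib2) is replaced by a plain dictionary lookup so the example runs without touching the network.

```python
from collections import deque


def crawl(seeds, fetch_friends, threshold=500):
    """Breadth-first crawl that only expands users with more than `threshold` friends.

    `fetch_friends` is a hypothetical stand-in for the real page scrape:
    it takes a username and returns that user's friend list.
    """
    queue = deque(seeds)   # users waiting to be scraped
    seen = set(seeds)      # users already queued, so nobody is scraped twice
    kept = {}              # users over the threshold, mapped to their friends
    processed = 0          # how many pages were scraped in total

    while queue:
        user = queue.popleft()
        friends = fetch_friends(user)
        processed += 1

        # Users at or below the threshold are ignored and not expanded.
        if len(friends) <= threshold:
            continue

        kept[user] = friends
        for friend in friends:
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)

    return kept, processed


# Toy data standing in for scraped friend lists (purely illustrative).
toy = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice", "carol", "erin"],
    "carol": ["alice"],
    "dave": ["alice", "bob"],
    "erin": ["bob"],
}

kept, processed = crawl(["alice"], lambda user: toy.get(user, []), threshold=2)
print(sorted(kept), processed)
```

The `seen` set is what keeps the queue from growing forever when popular users all friend each other, and the threshold check is what keeps the run time down, since a user who is ignored never adds their friends to the queue.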

But plaintext is difficult to visualize, and they say a picture is worth a thousand words. So I used the module NetworkX to start manipulating the data. The results look a ...
6 days ago  |  Comments (22)
The Goods
Name Austin
Occupation Engineering Student
Birthday March 3rd, 1992