Archive for the ‘Systems Administration’ Category

Quick and dirty way to find broken links on your website

Thursday, July 23rd, 2009

Recently I had to check a group of sites for broken links (404 errors).

I found this to be very useful:

wget --recursive --spider http://levycarneiro.com

This command follows every link it can reach from the URL without saving the pages (that is what the spider option is for), and ends with a report like this:

Found 13 broken links.

http://levycarneiro.com/levy@levycarneiro.com referred by:

http://levycarneiro.com/

http://levycarneiro.com/images/posts/Multiple_models_one_form_NewProject.jpg referred by:

http://levycarneiro.com/category/ruby-on-rails/

http://levycarneiro.com/tag/twitter/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/twitter/

http://levycarneiro.com/tag/rails/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/rails/

http://levycarneiro.com/tag/portfolio/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/portfolio/

http://levycarneiro.com/2009/04/transito-nao-an-experiment-with-twitter-traffic-jams-and-ruby-on-rails/levy@levycarneiro.com referred by:

http://levycarneiro.com/2009/04/transito-nao-an-experiment-with-twitter-traffic-jams-and-ruby-on-rails/

http://levycarneiro.com/tag/traffic/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/traffic/

http://levycarneiro.com/tag/projects/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/projects/

http://levycarneiro.com/category/projects/levy@levycarneiro.com referred by:

http://levycarneiro.com/category/projects/

http://levycarneiro.com/tag/ruby/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/ruby/

http://levycarneiro.com/category/ruby-on-rails/levy@levycarneiro.com referred by:

http://levycarneiro.com/category/ruby-on-rails/

http://levycarneiro.com/category/twitter/levy@levycarneiro.com referred by:

http://levycarneiro.com/category/twitter/

http://levycarneiro.com/tag/ruby-on-rails/levy@levycarneiro.com referred by:

http://levycarneiro.com/tag/ruby-on-rails/

I’ve got some work to do then :)
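With a longer report it helps to see which pages carry the bad links, since that is where the fixes go. Here is a minimal Python sketch that parses a report in the shape shown above and groups the broken URLs by referring page. The function names are my own, and it assumes every broken URL line ends with "referred by:" and is followed by one or more referrer lines; output from other wget versions may differ.

```python
from collections import defaultdict
from typing import Dict, List

MARKER = " referred by:"


def parse_spider_report(report: str) -> Dict[str, List[str]]:
    """Map each broken URL to the list of pages that link to it."""
    broken: Dict[str, List[str]] = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if line.endswith(MARKER):
            # A new broken URL starts here.
            current = line[: -len(MARKER)]
            broken.setdefault(current, [])
        elif current is not None:
            # Any other line after a broken URL is one of its referrers.
            broken[current].append(line)
        # Lines before the first broken URL (the "Found N" summary) are skipped.
    return broken


def group_by_referrer(broken: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: page to fix -> broken URLs found on it."""
    pages = defaultdict(list)
    for url, referrers in broken.items():
        for referrer in referrers:
            pages[referrer].append(url)
    return dict(pages)


# Three entries copied from the report above.
sample = """Found 13 broken links.

http://levycarneiro.com/levy@levycarneiro.com referred by:

http://levycarneiro.com/

http://levycarneiro.com/images/posts/Multiple_models_one_form_NewProject.jpg referred by:

http://levycarneiro.com/category/ruby-on-rails/

http://levycarneiro.com/category/ruby-on-rails/levy@levycarneiro.com referred by:

http://levycarneiro.com/category/ruby-on-rails/
"""

broken = parse_spider_report(sample)
pages = group_by_referrer(broken)
for page, urls in pages.items():
    print(page)
    for url in urls:
        print("    broken:", url)
```

Grouping this way also makes repeated patterns easier to spot. In the report above, most of the broken URLs end with the same levy@levycarneiro.com suffix.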

Hidden Costs of Scaling Up vs. Scaling Out

Thursday, June 25th, 2009

The well-known dilemma: scale vertically (buy a bigger machine) or scale horizontally (add more machines)?

Here’s an interesting point of view from an article at Coding Horror:

It’s fair to conclude that scaling out is only frictionless when you use open source software. Otherwise, you’re in a bit of a conundrum: scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.

All in all, I still prefer scaling out, because you avoid having a single point of failure.

Boo Box web servers layout and application scaling tips

Saturday, May 30th, 2009

[Image: infra-boo-box2 (diagram of the Boo Box web server infrastructure)]

Boo Box, the ad network, released a layout of their web servers’ infrastructure. It seems the beast is growing fast and this diagram shows how they’re coping with the challenge.

Here are some things I found very interesting:

  • Separate servers for reading and writing (MySQL). This way you can optimize each server for a specific purpose (read or write), since reads and writes no longer compete with each other for disk or memory;
  • Serving static files from a different domain to speed things up is well known, but serving them straight from RAM is new to me. However, some people disagree with caching files in memory beyond what the OS already does. The other good thing is that Nginx is a very fast web server, and it’s replacing Apache in many scenarios;
  • Using a queue server to handle time-consuming tasks is paramount for horizontal scaling. Anything that takes more than a few milliseconds (or does some sort of processing) should be run asynchronously.
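To make the queue point concrete, here is a minimal Python sketch of the pattern. It is not Boo Box's setup: an in-process queue.Queue and a single thread stand in for a real queue server and its workers, and slow_task, handle_request and the payloads are hypothetical names invented for the illustration.

```python
import queue
import threading
import time


def slow_task(payload):
    """Hypothetical time-consuming job (stands in for real processing)."""
    time.sleep(0.05)
    return payload.upper()


def worker(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.append(slow_task(job))
        jobs.task_done()


def handle_request(jobs, payload):
    """The web-facing side: enqueue the work and answer right away."""
    jobs.put(payload)
    return "accepted"


jobs = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
thread.start()

# The requests return immediately; the slow work happens in the background.
responses = [handle_request(jobs, p) for p in ("resize image", "send email")]
print(responses)

jobs.join()      # wait until the worker has finished every queued job
jobs.put(None)   # tell the worker to stop
thread.join()
print(results)
```

With a real queue server the workers run on separate machines, so you can add more of them as the backlog grows without touching the web servers.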

It’s very nice of them to share this layout. Thanks guys!