Archive for March, 2010

I was having problems building libpangocairo. I had the latest cairo, I had freetype installed, I had fontconfig installed, but every time I ran configure on pango it would tell me that yes, I had cairo, but it was being disabled because there were no backends to use. After trudging through the pango configure script, I finally figured out that my fontconfig was not a high enough version to even test for FreeType, so there were no cairo font backends, so pangocairo didn’t get built, and the only error was “no cairo backends” (when clearly there was one!). Anyway, if you’ve googled the other posts and not found a solution, update your fontconfig package.
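
A quick way to see which versions configure will pick up (assuming the libraries installed their .pc files in the usual place) is pkg-config:

pkg-config --modversion fontconfig
pkg-config --modversion freetype2
pkg-config --modversion cairo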

Not building tags and branches into the base system of svn seems like a bad idea. Shouldn’t these be the easiest things to do? I just want to mark the current files as a release I made, but apparently svn projects should be laid out a particular way.
This is a sure sign that the tool is hindering rather than helping: you have to organize your files to suit the tool. It’s not unreasonable to make small changes in your project to make your life easier, but if the top level has to be rearranged, then you know you are in for some pain.

suggest:

svn tag “My tag” .

fancy that, easy to use…
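
For the record, the conventional way with the trunk/tags layout is a repository-side copy, something like this (assuming a repository at svn.example.com):

svn copy http://svn.example.com/repo/trunk \
        http://svn.example.com/repo/tags/release-1.0 \
        -m "Tagging release 1.0"

Not terrible, but it’s hardly a one-liner.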

Then there are properties for ignoring files. Come on guys (gals?), this is crazy. Imagine you wanted an easy-to-use source code control system. You’d have something like:

svn ignore [files|dirs]+

But no, ignore files are handled through properties.
So fine, you need to do:

svn propset svn:ignore [files|dirs]* [target]

You can feel the pain, but even worse, when you run two in a row it just keeps the last one (which makes sense because it’s “propset”, but please).

Like the dumbarse who put ?a=b&amp;c=d in the W3 standard for links (&amp;, not &), not having a standard command for ignoring files is just crazy.

Here is my “fix”. File “svnignore”:

#!/bin/sh
# Usage: svnignore [file|dir|pattern]...
# With no arguments, just print the current svn:ignore property.

if [ $# -eq 0 ]; then
        svn propget svn:ignore .
        exit 0
fi

FILE="/tmp/svnignore.$$"

# Start from the existing ignore list, append each new pattern,
# then de-duplicate and drop blank lines before setting it back.
svn propget svn:ignore . > $FILE
for pattern in "$@"; do
        echo "$pattern" >> $FILE
done
sort -u $FILE | grep -v "^$" > $FILE.2
svn propset svn:ignore -F $FILE.2 .

# Show the result, then clean up the temp files.
svn propget svn:ignore .
/bin/rm -f $FILE $FILE.2

exit 0

Example usage:

# ignore all files ending in .cgi (note the quotes)
svnignore "*.cgi"

# just ignore the file mylog.out
svnignore mylog.out

#ignore the files file1.txt, blah.out and the subdirectory "mydir"
svnignore file1.txt blah.out mydir

ahh .. now I can easily ignore all the stuff I don’t want. Perhaps an alias for “svn propedit svn:ignore .” would also be handy…
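
Something like this in your ~/.bashrc would do for the edit case (the alias name is just my choice):

alias svnpe='svn propedit svn:ignore .'
# propedit needs an editor configured, e.g.:
export SVN_EDITOR=vi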

Did I mention a nice interface for tags would also be good?

I’ve always known a few things that I should do to make my website go faster (turn on expires headers, use multiple domains, geoip, etc.), but nothing made it as clear as http://www.webpagetest.org/, which gives a pictorial view of what really happens when you load a page in IE (it uses an IE plugin to measure when things really happen). Anyway, to cut a long story short, I got the page load times for my site down to 2-3 seconds instead of 6-8 seconds.

Step 1. Turn on Keep-Alives.

Surprisingly, this is off by default in Apache; in httpd.conf set:

KeepAlive On
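
If you want to tune it a little further, the related directives live in the same file (the values here are only examples, not recommendations):

KeepAliveTimeout 5
MaxKeepAliveRequests 100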

Step 2. Turn on compression.

You’ll need mod_deflate, but this is included by default.

In your VirtualHost config (assuming you are using that)

<VirtualHost *:80>
...
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript
</VirtualHost>
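
A quick way to check that compression is actually being applied (example.com here stands in for your own site):

curl -s -D - -o /dev/null -H "Accept-Encoding: gzip" http://example.com/ | grep -i content-encoding

If the response includes Content-Encoding: gzip, mod_deflate is doing its job.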

Step 3. Add Expires headers.

For me, I very rarely change my images, but I do occasionally change my javascript. So I set the expires for my images to 1 month, and js/css to 1 day. What I probably should do is include a version number in my css and js filenames, and also have them expire in 1 month.

<VirtualHost *:80>
...
    ExpiresActive on
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/jpg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/ico "access plus 1 month"
    ExpiresByType text/javascript "access plus 1 day"
    ExpiresByType text/css "access plus 1 day"
</VirtualHost>
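
ExpiresActive/ExpiresByType come from mod_expires, so make sure it is being loaded (it usually is in the stock CentOS httpd.conf). A quick check:

httpd -M 2>&1 | grep expires
grep -i expires_module /etc/httpd/conf/httpd.conf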

Step 4. Move your images/css off to a different domain.

IE 7 (and other browsers) will only make 2 connections to any given domain name at a time (more recent browsers will make more). Perhaps the easiest approach is to make one domain for your images and one for your css and/or javascript. If you use cookies on your domain and your images are on the same domain, then the cookies will be sent with every image request, so the user is slowed down just making the request for the image. Many (most) users have slower upload than download, and your cookies might be as large as 1k, and that can have a fairly big impact on how fast your site is viewed.

You should trade off the number of domains with the cost of a dns lookup. In the US you can expect a DNS lookup to take 20-120ms. In Australia, it’s more likely to be around 200ms for a US site.

You can just make simple aliases with apache using something like:

<VirtualHost *:80>
   ServerName ftimg.com
   ServerAlias i1.ftimg.com
   ServerAlias i2.ftimg.com
   ServerAlias i3.ftimg.com
   ServerAlias i4.ftimg.com
...
</VirtualHost>
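
The aliases still need DNS records pointing at the same box, of course. A quick sanity check (ftimg.com being the example domain above):

for i in 1 2 3 4; do host i$i.ftimg.com; done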

If you run a larger site with lots of images, you probably generate your site (well, you should). A great way to spread your images across multiple domains, and still have them cached across different pages, is to hash the filename (or the contents – slower) to pick the hostname to use, i.e. “i” + ((hash(filename) % 4) + 1) + “.ftimg.com”. I use java and jsp (and C), so I used the following snippets (watch out, the intermediate hash value will overflow for long filenames).

<%!
        public int fileHash(String uri) {
                int len = uri.length();
                int ret = 0;
                for (int i=len;--i>=0;)
                        ret = ret * 7 + uri.charAt(i);
                ret = ((ret >> 4) & 0x03)+1;
                //System.out.println("fileHash("+uri+")="+ret);
                return ret;

        }
%>

and in java:

        public static int fileHash(String uri) {
                int len = uri.length();
                int ret = 0;
                for (int i=len;--i>=0;)
                        ret = ret * 7 + uri.charAt(i);
                return ((ret >> 4) & 0x03)+1;
        }

Step 5. Insert small js and css files directly into the page.

A lot of time is spent just connecting to a site (and very little time downloading for small files), so save the extra connection and include small css and js items right on the page. Also, if they are specific to that page, just include them. The advantage to having them separate is that on subsequent pages they will already be loaded, but if they are small or never used again, it’s pointless. Include css files like this:

<style type="text/css">
div.page {
    width:99%;
    margin-left:auto;
    margin-right:auto;
}
...
</style>

And for Javascript:

<script type="text/javascript"><!--
window.location=...blahblah;
...
//-->
</script>
(these days, you can probably not bother with the <!-- and //-->)

Step 6. Put large repeated javascript and css into a separate file.

Conversely to Step 5, if you have javascript that is repeated across pages, you should externalize it.

Step 7. Merge javascript files together. Merge css files together.

Most of the time you can do this by just making one file with all the contents in it. I’m currently tracking down a problem where two javascript files didn’t work nicely when placed in the same file.
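
In the simple case it really is just concatenation (the filenames here are placeholders):

cat jquery.js menu.js tracking.js > combined.js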

Step 8. Use a CDN

If your site is large enough, use a CDN (google it). I’m using MaxCDN. Hard to beat $10 for 1TB, and even normally it’s $100/TB. They don’t have a presence in some countries where I do, so I hack up the IP address using my vdns so that I usually use them, but for some countries, I point to my own servers (that is a much longer story). I’ve only just started using them, but so far so good.

Step 9. Make sure webpages/images are not heavy

Sometimes you can use javascript to generate the html to reduce the size of the page. Less is more, so if you have too much stuff on your page, consider trying to simplify. It doesn’t work for all sites, but it works for most.

For images, consider that you really don’t need that 24bit png, and an 8bit one would do. What I have been doing for the larger images is loading the image in the GIMP (of course), compressing to say 256, 128, or 64 colours, and seeing if I can notice a difference compared to the original image when I am zoomed in at 200%. In GIMP: right click -> Image -> Indexed. Select the number of colours, and then click Ok. Then use Ctrl-Z, Ctrl-Y, Ctrl-Z, etc. to see if you notice any difference.

Step 10. Put your ads in an iframe.

Actually, this should be like step 1, since even though you think ads run on fast servers, some analysis will quickly show you that 1. They are heavy, and 2. they are slow (even google). If your ad provider says they don’t support it, get someone else. It makes a huge difference in the loading time on your page. I’ve seen many say they don’t support it, but I tried it, and it worked just fine.

Step 11. Minimize your js (and css)

Remove unnecessary comments, and use a minify tool to compress your javascript. There are a number of tools around to reduce the size; check google for some (I haven’t minified my code yet).
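
One tool that does the job is YUI Compressor; a sketch (the jar name depends on the version you download):

java -jar yuicompressor-2.4.2.jar site.js -o site.min.js
java -jar yuicompressor-2.4.2.jar site.css -o site.min.css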

Step 12. Turn off e-tags

It’s just a waste of space. Google for more info.

<VirtualHost *:80>
...
    FileETag none
</VirtualHost>

I got my content loaded in 1.7 seconds rather than 1.8 seconds without the etags 🙂 [of course this was not tested very well]. Certainly in this case less is more!

Other notes

  • Don’t use redirects – these will cause another page hit and more round trips. Instead you should configure apache to just load the page you want.
  • Put your css files at the top, so that rendering can start. Similarly, if you can, put javascripts at the bottom.
  • Have a small favicon.ico. Better to have one than not (otherwise you get a 404, which costs time), and don’t forget to make sure it has an expires header.
  • Compress multiple images into one. If you are using css, you can put all your images in the one image and just select the bits that you want. You can play around with the way you arrange the images, but generally having the images across the page will make them compress better.
  • Include width and height tags. This means when the site renders it won’t be jumping around, and the user can just click on what they want. And of course the width/height should match the actual image. It annoys me to see large images scaled down to a small one, which sucks up bandwidth, slows down the image view, and generally is not a good user experience.
  • Consider what browsers your users are using. Browsers like ie6/7 (and earlier versions of firefox) only allow two connections per domain, but newer browsers will allow more connections, and so the balance tips away from multiple servers because of the extra cost for DNS lookups. HTTP Pipelining seems to be a great feature to me, and I’m surprised that it’s not turned on by default for more browsers. It’s supported in most non-ie browsers, but usually disabled by default (except opera, which uses some heuristics to turn it on/off). Poor support from proxies seems to be the reason it is not more widely adopted.

Results

Although I haven’t finished optimizing my site, the changes I made have dramatically improved the page load time (and therefore the user experience). Note that in these graphs I already have ads in an iframe before and after, and I already have the images running off a different web server.

Before the optimizations:

After the optimizations: You can see that around 1.8 seconds all of my files have finished loading, and it’s only ad files that are still loading, and this has little impact to the user, since the ads are in an iframe.

Other considerations

You should also consider your own situation when deciding how to optimize your site. For example, if you are a scammer site that is just there for the google keywords, gets no repeat traffic and very low page views per user, then just whack everything onto the one page; you might as well get it all downloaded straight away, and there is no point in waiting for the secondary connections.

If, on the other hand, you are facebook and get 100 page views per user per visit, and they come back regularly, then you might consider having a lot of content written out by a javascript file that never changes, so that you don’t need to reload that content each time. A good example for my site is the menu system. I currently have a large div with all the contents of the menus, but if I changed that to a javascript file that wrote out the contents, then I would only need to load that javascript once while the user goes through all the pages (and I get higher than average page views per visit).

One thing that I always found annoying was that in Parallels, I would hit Control-Option to get back to the Mac, but the Parallels screen would stay there (in my case CentOS 64bit). I generally have good separation of tasks between the Mac and CentOS, so when I switch, I want all of the Parallels windows to go away. It turns out the solution is really easy: Apple Spaces. This post describes how I set it up.

1. Select Apps -> Utilities

2. Select Spaces

3. The first time, a message appears saying that Spaces is not set up. Click “Set Up Spaces”.

4. Next check “Enable Spaces”, check “Show Spaces in menu bar”.  Next, I really just wanted two screens so that I can easily toggle between them, so I hit – on  Rows (Whoops, I highlighted the wrong one 🙂 ). Then click the “+” to add an application, and choose Other…

5. Click on Parallels Desktop, and click “Add”.

Next I clicked just to the right of Parallels Desktop, and changed the space to Space 2. And I wanted to use the “alt” key (the option key), to do the switching, so I changed “To switch between spaces:” and “To switch directly to a space” to use the alt key (see picture).

So now when I want to switch between CentOS and mac, I use the arrow keys “Alt-Right” and “Alt-Left”.

So much better now.

yum install wireshark

First check the arp packets: type tshark -q -z “io,stat,1,arp”, wait 10 seconds, and hit Ctrl-C. (Wireshark replaces ethereal; tshark is its command-line version.)

[root@au1 ~]# tshark -q -z "io,stat,1,arp"
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
781 packets captured

===================================================================
IO Statistics
Interval: 1.000 secs
Column #0: arp
|   Column #0
Time            |frames|  bytes
000.000-001.000       1        60
001.000-002.000       2       120
002.000-003.000       2       120
003.000-004.000       1        60
004.000-005.000       0         0
005.000-006.000       2       120
===================================================================

Now run the same thing looking at everything else:

[root@au1 ~]# tshark -q -z "io,stat,1,not arp"
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
624 packets captured

===================================================================
IO Statistics
Interval: 1.000 secs
Column #0: not arp
|   Column #0
Time            |frames|  bytes
000.000-001.000     112     10080
001.000-002.000      97      8694
002.000-003.000     111      9960
003.000-004.000     131     11754
004.000-005.000     114     10188
005.000-006.000      50      4500
===================================================================

and you can see that ARP flooding is not the problem right now. Recently we had a faulty device on the network that was absolutely hammering it (I was getting 800k/sec of ARP, and 1.2M/sec of total traffic).

The usual way to determine if this is the case is to check the bandwidth on your managed switch, or the stats from your firewall; if you are unlucky you won’t have this information. In our case the network was unusable and we had an unmanaged switch, so the quick way to find the problem machine is to unplug the machines one at a time until the problem goes away (or unplug them all and add them back in slowly). YMMV.
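
If tshark is handy, a rough way to see who is doing the ARPing is to count the source addresses. This is only a sketch; the column that awk grabs may differ between tshark versions:

tshark -n -c 500 arp 2>/dev/null | awk '{print $3}' | sort | uniq -c | sort -rn | head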

You can enable Safari’s debug menu with a single command. First open a Terminal:

Apps -> Utilities -> Terminal

Then type:

defaults write com.apple.Safari IncludeDebugMenu 1

and then restart Safari. You will then get an extra Debug menu, which has “Show Error Console” and many other options.
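
To turn it back off later:

defaults write com.apple.Safari IncludeDebugMenu 0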


Setting up a DHCP server is fairly straightforward:

yum install dhcp
chkconfig dhcpd on
# create /etc/dhcpd.conf from the samples (see below)
/etc/rc.d/init.d/dhcpd start
# dhcp requests come in on UDP port 67 (bootps)
/sbin/iptables -A INPUT -i eth0 -p udp --dport 67 -j ACCEPT

The example below allocates in the range 192.168.0.129 – 192.168.0.254

Open up the ports on your machine (be careful to only open them on the internal side if the machine is dual facing).
You could use system-config-securitylevel and add UDP port 67 (bootps). If you are having problems getting an IP address, check that it’s eth0 (and not, say, eth1), and if that fails, briefly turn off the firewall while testing.

/etc/dhcpd.conf: you’ll need to edit this to put in the address of your gateway (router), and the ip addresses of your dns servers, and add in any fixed ip address computers at the bottom.

ddns-update-style none;
ddns-updates off;
option T150 code 150 = string;
deny client-updates;
one-lease-per-client false;
allow bootp;
#
# DHCP Server Configuration file.
#   see /usr/share/doc/dhcp*/dhcpd.conf.sample
#

ddns-update-style interim;
ignore client-updates;

subnet 192.168.0.0 netmask 255.255.255.0 {

# --- default gateway
option routers 192.168.0.1;
option subnet-mask 255.255.255.0;

option nis-domain "mydomain.example.com";
option domain-name "mydomain.example.com";

#enter IP addresses of your dns servers (from /etc/resolv.conf)
option domain-name-servers changeme.dns.server.ip, xx.yy.zz.aa;

#option time-offset -18000; # Eastern Standard Time
# option ntp-servers 192.168.1.1;
# option netbios-name-servers 192.168.1.1;
# --- Selects point-to-point node (default is hybrid). Don't change this unless
# --- you understand Netbios very well
# option netbios-node-type 2;

range dynamic-bootp 192.168.0.129 192.168.0.254;
default-lease-time 21600;
max-lease-time 43200;

host MYMACHINE {
fixed-address 192.168.0.66;
hardware ethernet 00:26:44:72:e9:15;
}
}
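
After editing the config, restart dhcpd and watch the lease file to confirm clients are picking up addresses (the lease file path is the CentOS default):

/etc/rc.d/init.d/dhcpd restart
tail -f /var/lib/dhcpd/dhcpd.leases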

Lucky me, I need to upgrade yet another machine, and it’s my email server. LVM (of which I am not a big fan) takes all of the drive by default, so now I want to resize the volume to be much smaller and then install the new operating system on the same drive (in some of the free space).

Quick Answer:
1. boot rescue cd, don’t mount drives
2. run:

lvm vgchange -a y
e2fsck -f /dev/VolGroup00/LogVol00
resize2fs -f /dev/VolGroup00/LogVol00 100G
lvm lvreduce -L100G /dev/VolGroup00/LogVol00

More Detail:
First I looked to see how much space I’m using.

[root@au1 c]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 297673144  49140752 233167480  18% /
/dev/sda1               101086     61896     33971  65% /boot
tmpfs                  3109444         0   3109444   0% /dev/shm

So around 50G; kind of a shame because I normally allocate 50G for the installed OS. No worries, I decide to go with 100G. (It’s a 250G drive I think) – hmm… TODO: check it’s not a 300G or 500G drive. Damn, I think it’s 500G…

Insert rescue CD into drive (eg the normal install disk (CentOS 5.4 64bit)).
At the prompt type: linux rescue
and then “Skip” the mounting of the drives.

lvm vgchange -a y
e2fsck -f /dev/VolGroup00/LogVol00
resize2fs -p -f /dev/VolGroup00/LogVol00 100G
lvm lvreduce -L100G /dev/VolGroup00/LogVol00

e2fsck and resize2fs will probably take a long time. For me e2fsck was probably around 5 minutes. resize2fs is certainly longer than that and I won’t know how long as I forgot to add in the -p and I’m about to head out for lunch (Happy Birthday Paul!).

If the Logical volume can be unmounted, then you can do these things without the rescue cd.

after reboot:
[root@au1 c]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00 101573920  49124416  48255200  51% /
/dev/sda1               101086     61896     33971  65% /boot
tmpfs                  3109444         0   3109444   0% /dev/shm
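
To confirm how much space was actually freed up in the volume group for the new install (assuming the stock VolGroup00 name used above):

vgs VolGroup00
vgdisplay VolGroup00 | grep -i free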

I recently upgraded a machine to 64bit CentOS, and now the drives are running crazy slow. hdparm -t /dev/hda showed results like this:

Timing buffered disk reads:   14 MB in  3.18 seconds =   4.40 MB/sec
Timing buffered disk reads:   12 MB in  3.23 seconds =   3.72 MB/sec

And that was on a striped drive! (very slow, should be ~100MB/sec for 1 drive, ~180MB/sec for 2 drive stripe).

I had a similar problem with an SSD, and thought it odd that the drives appeared as /dev/hd?? instead of /dev/sd??. The solution is to stop the kernel probing the old IDE interfaces, which you do by adding ide0=noprobe ide1=noprobe to the kernel params. So now my entry in /etc/grub.conf looks like:

title CentOS (2.6.18-164.11.1.el5) No Probe
root (hd0,0)
kernel /boot/vmlinuz-2.6.18-164.11.1.el5 ro root=LABEL=/ ide0=noprobe ide1=noprobe
initrd /boot/initrd-2.6.18-164.11.1.el5.img

When making such a change, check your /etc/fstab to make sure that it’s not going to load /dev/hd?? since now the drives will change to /dev/sd??. And possibly they might get renumbered (probably thanks to the CD drive). So mine went from /dev/hda -> /dev/sda and /dev/hdc -> /dev/sdb

An earlier post shows that I set up a raid using /dev/hda3 and /dev/hdc1, so I recreated the stripe with:

mdadm --stop /dev/md0
mdadm -A /dev/md0 /dev/sda3 /dev/sdb1
#add entry back into /etc/fstab
mount /dev/md0
echo 'DEVICE /dev/hda3 /dev/hdc1 /dev/sda3 /dev/sdb1' > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
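
Before rebooting, it’s also worth checking that the array came back clean:

cat /proc/mdstat
mdadm --detail /dev/md0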

reboot to test (consider commenting out /dev/md0 in /etc/fstab first).

So after all is fixed up, hdparm shows much, much better results.

hdparm -t /dev/sda

Timing buffered disk reads:  320 MB in  3.01 seconds = 106.16 MB/sec
Timing buffered disk reads:  320 MB in  3.00 seconds = 106.50 MB/sec

hdparm -t /dev/md0

Timing buffered disk reads:  572 MB in  3.00 seconds = 190.41 MB/sec
Timing buffered disk reads:  592 MB in  3.01 seconds = 196.91 MB/sec

Certainly that is acceptable!