We had a situation where our image thumbnail memcached cluster somehow got empty thumbnails. The thumbnails are generated on the fly by image proxy servers and the thumbnail is stored into memcached. For some reason some of the thumbnails were truncated.
As I didn’t have time to start debugging the real issue, I quickly wrote this oneliner which detects corrupted thumbnails when the thumbnail is fetched from the memcached and issues a delete operation to it. This will keep the situation under control until I can start the actual debugging. We could also have restarted the entire memcached cluster, but it would result in big preformance penalty for several hours. Fortunately all corrupted thumbnails are just one byte long, so detecting them was simple enough to do with an oneliner:
tcpdump -i lo -A -v -s 1400 src port 11213 |grep VALUE | perl -ne 'if (/VALUE (cach[^ ]+) [-]?\d+ (.+)/) { if ($2 == 1) { `echo "delete $1 noreply\n" | nc localhost 11213`; print "deleted $1\n"; } }'
Here’s how this works:
- tcpdump prints all packets in ascii (-A) which come from port 11213 (src port 11213), our memcached node, from interface loopback (-i lo)
- the grep passes only those lines which contains the response header which has the following form: “VALUE <key> <flags> <length>“
- for each line (-n) the perl executes the following script (-e ‘<script>’) which first uses regexp to catch the key “(cach[^ ]+)” and then the length.
- It then checks if the length is 1 if ($2 == 1) and on success it executes a shell command which sends a “delete <key> noreply” message to the memcached server using netcat (nc). This command will erase the corrupted value from memcached server.
- Last it prints a debug message