Post date: Feb 27, 2015 8:52:9 PM
Well I've had an interesting week. Over the weekend, the FortiGate 100D firewall appliance that I use to connect my home network to the Internet up and failed on me. It's had some issues in the past, but nothing a reboot would not fix.
After a quick power-cycle, the thing would not boot. The status light would flash, and then the unit would reboot. So I pulled out my trusty Cisco rollover console cable (thankfully compatible with the FortiGate) and took a look. I tried upgrading to the latest firmware (5.2) using TFTP option G in the boot menu (I was due anyway), but after rebooting I was greeted with the following:
FortiGate-100D (11:09-01.11.2012)
Ver:04000009
Serial number:FG100D3G12800173
RAM activation
CPU(00:000106ca bfebfbff): MP initialization
CPU(01:000106ca bfebfbff): MP initialization
CPU(02:000106ca bfebfbff): MP initialization
CPU(03:000106ca bfebfbff): MP initialization
Total RAM: 2048MB
Enabling cache...Done.
Scanning PCI bus...Done.
Allocating PCI resources...Done.
Enabling PCI resources...Done.
Zeroing IRQ settings...Done.
Verifying PIRQ tables...Done.
Boot up, boot device capacity: 15272MB.
Press any key to display configuration menu...
......
Reading boot image 1503011 bytes.
Initializing firewall...
System is starting...
Starting system maintenance...
Scanning /dev/sda1... (100%)
Scanning /dev/sda3... (100%)
Formatting shared data partition ... done!
Using default data disk. plat=100 ver=5.
Cannot mount shared data partition.
Using default data disk. plat=100 ver=5.
read dict size eEXT2-fs error (device sd(8,1)): read_inode_bitmap: Cannot read inode bitmap - block_group = 17, inode_bitmap = 139266
rror!
geoip_reaEXT2-fs error (device sd(8,1)): read_inode_bitmap: Cannot read inode bitmap - block_group = 0, inode_bitmap = 4
d_country_map, OEXT2-fs error (device sd(8,1)): read_inode_bitmap: Cannot read inode bitmap - block_group = 0, inode_bitmap = 4
It seemed pretty clear that my boot flash was bad. Apparently three years of writing and deleting firewall logs was a bit more than the onboard flash could handle. Interestingly enough, if I installed and ran the 5.0 version of the FortiGate firmware, the device would boot and pass traffic. I could even configure it and apply firewall policies. Unfortunately, the configs would not survive a reboot, and that is no way to run a firewall (especially one with only a single power supply). On with the troubleshooting...
Next, I downloaded the HQIP (Hardware Quick Inspection P???) firmware image from the vendor (https://fortinet.com), loaded it over TFTP and figured out how to run the diagnostics (why is that so poorly documented?): log in as admin with no password and run "diagnose hqip start". I didn't bother wiring the network ports as instructed (looks like an interesting way to do a full test though) and only paid attention to the important bits:
==> Boot Device Test
Testing write and read to hard disk(/dev/sda): Size 14GB
hdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(191): Xhdtest.c(279):
Total 8 block(s) read/write error.
<== Boot Device Test - FAIL
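If you capture the console session to a file, a few lines of Python can pull the verdict out of the HQIP output. This is only a sketch: the function name is made up for illustration, and the PASS pattern is an assumption, since the only result line shown above is a FAIL.

```python
import re

def summarize_hqip(log_text):
    """Return ({test name: status}, bad block count) from captured HQIP output.

    Assumes result lines look like "<== Boot Device Test - FAIL", as in the
    excerpt above. The "PASS" variant is a guess at the success wording.
    """
    results = {}
    for name, status in re.findall(r"<== (.+?) - (PASS|FAIL)", log_text):
        results[name] = status

    # The disk test reports its error count on a separate summary line.
    match = re.search(r"Total (\d+) block\(s\) read/write error", log_text)
    bad_blocks = int(match.group(1)) if match else 0
    return results, bad_blocks


if __name__ == "__main__":
    sample = (
        "==> Boot Device Test\n"
        "Testing write and read to hard disk(/dev/sda): Size 14GB\n"
        "Total 8 block(s) read/write error.\n"
        "<== Boot Device Test - FAIL\n"
    )
    print(summarize_hqip(sample))
```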
Eight bad blocks seems like it should be recoverable somehow, but apparently not. I couldn't find a way to mark the bad blocks and carry on, despite finding and running the device formatting option in the HQIP firmware (at 4 hours+). Apparently older FortiGate devices had some debug/recovery options to check for bad blocks, but the functionality has been removed as the newer devices use USB-based flash drives that are supposed to handle bad blocks at the device level. Maybe FortiNet should reconsider this decision. But wait, did you say USB flash drive? Why yes, I did - more on that later, but first, a little tangent...
At this point, I had done my research and it was time to engage vendor support. But I had a problem. I hadn't actually bought this device; it had been provided to me by the vendor. I was working as an external consultant on a large data center migration with FortiGate data center firewalls. The Enterprise Account Manager had decided that I was being especially helpful and provided me with some free hardware and licensing. Everything came with a year of support, but that had long since expired. I decided to try my luck and open a ticket online anyway.
I was initially very hopeful when I received an email that my ticket had been logged and dispatched to support. But reality came back and slapped me in the face when, shortly afterwards, I received an email telling me to go buy some support or bugger off. "Well, I guess I'm on my own", I thought.
With all ideas of vendor supplied support out the window, I decided to explore a little deeper. I found these sites that pointed me to the very useful fnsysctl command:
Ken Felix's Security Blog entry on killing FortiProcesses
The FortiNet sysctl CLI command (fnsysctl) provides a hook into the underlying Linux OS. It allows you to run some select commands that can be eminently helpful in diagnosing problems. Some of the helpful commands:
fnsysctl ls -al
fnsysctl cat /proc/mounts
fnsysctl cat /proc/partitions
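To compare what the kernel sees before and after plugging in a USB stick, the output of "fnsysctl cat /proc/partitions" can be pasted into a small script. The sketch below assumes the standard Linux /proc/partitions layout (major, minor, size in 1 KiB blocks, name). The function names are made up for illustration, and the trailing-digit test for partitions only suits sd-style names like the ones seen on this box.

```python
def parse_partitions(text):
    """Parse /proc/partitions text into a list of dicts, sizes in MB (1024 KiB)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and blank lines: data rows are 4 fields, first numeric.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        entries.append({
            "name": name,
            "major": int(major),
            "minor": int(minor),
            "size_mb": int(blocks) // 1024,  # /proc/partitions counts 1 KiB blocks
        })
    return entries


def whole_disks(entries):
    """Keep whole devices (sda, sdb) and drop partitions (sda1, sdb3).

    Heuristic for sd-style names only: a name ending in a digit is a partition.
    """
    return [e for e in entries if not e["name"][-1].isdigit()]


if __name__ == "__main__":
    # Hypothetical sample; the sda size is chosen to match the 15272MB boot banner.
    sample = """major minor  #blocks  name

   8        0   15638528 sda
   8        1     262144 sda1
   8       16    3915776 sdb
"""
    for disk in whole_disks(parse_partitions(sample)):
        print(disk["name"], disk["size_mb"], "MB")
```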
A little digging shows that the FortiGate 100D presents a USB storage device to the Linux kernel. Since the device has a couple of USB ports on the front, I should theoretically be able to replace the existing device with an off-the-shelf USB stick.
Sure enough, after popping in a couple of 4GB sticks I had around and rebooting, I noticed that my boot device changed from /dev/sda to /dev/sdb. Not only were my USB drives being detected in a manner virtually identical to the boot flash, but one of them was taking priority (being detected before the onboard flash). Victory seemed near...
At this point, I was looking to put together some sort of permanent solution. The only flash drives I had were a couple of 4GB models and a nice 32GB encrypted drive that I didn't want to dedicate to this purpose. I decided on using an old 80GB laptop hard drive and a USB adapter that I had on hand, hopefully giving me lots of room to store logs and a medium that was better suited to the write/delete cycles that I had been using the onboard storage for.
I also decided that I had to somehow get rid of the existing flash drive to prevent any chance that the BIOS would prefer the internal flash and boot the bad drive. Without any chance of future vendor support, I decided to ignore the "voided warranty" FUD and crack the case. Here's what I found.
The layout is pretty basic - power supply on top, 2GB SO-DIMM (DDR3-800) below it, Atom CPUs under the large heatsinks, Ethernet switch modules on the lower right. No simple USB flash drive that could be easily pulled out. But wait, what's that on the upper right?
Looks like a flash chip. With some jumpers beside it. Jackpot! This flash chip is indeed the 16GB flash boot device. There is space on the board for another module, and I would guess the current model of the FortiGate 100D ships with it populated (the specs now list 32GB onboard storage).
The JUSB2 header beside the chip looks like it may be a standard USB connector with Vp, D-, D+ and Gnd pins. When I opened the box, the middle pins (almost certainly D- and D+) were jumpered. When I booted the box with the jumpers removed, the bad flash drive magically disappeared. Problem solved.
If I were a little more ambitious, I could probably put a standard USB cable header on the JUSB2 jumpers and secure some sort of USB storage device (flash / SSD / hard drive) to the inside of the box (look at all that room!). As this is the primary firewall for my home network and my wife goes a little bit nuts if the Internet isn't available (need more CeiLiNG CaT ViDeoS!!), I decided to put things back together and use the 80GB laptop drive hanging off the external USB ports. It wouldn't be pretty, but it would be the fastest way to re-open the Intertubes.
Once re-assembled, I booted from TFTP using the console cable once again and formatted the boot device (now the 80GB laptop drive). The boot device is partitioned into two 256MB boot partitions (default and backup) and a third partition with the remaining space (now 79GB).
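The partition arithmetic is simple enough to check by hand, but here it is as a quick Python sketch. It assumes the layout is exactly two 256MB boot partitions with everything else going to the data partition, and that the laptop drive is a round 80,000MB. Both are simplifications, and the function name is made up.

```python
BOOT_PARTITION_MB = 256      # size of each boot partition, per the layout above
BOOT_PARTITION_COUNT = 2     # default and backup

def data_partition_mb(disk_mb):
    """Space left for the third (data) partition, ignoring any filesystem overhead."""
    return disk_mb - BOOT_PARTITION_COUNT * BOOT_PARTITION_MB


if __name__ == "__main__":
    # Assumed round figure for the 80GB laptop drive.
    print(data_partition_mb(80_000))   # 79488 MB, i.e. roughly 79GB
    # Onboard flash capacity as reported in the boot banner.
    print(data_partition_mb(15_272))   # 14760 MB
```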
The FortiGate 100D doesn't seem to know how to handle having 79GB of data space. It shows up in the GUI as having 0MB storage space available, but seems to have no problem setting up a 32GB web cache partition (which basically just remounts the data partition under a different part of the filesystem).
So far, the configs are backing up properly and things seem to be stable. I don't seem to be able to store firewall logs on the device, but it's quite possible that this option has been removed from the most recent firmware version. It seems like a good way for the vendor to stop the flow of RMA requests for bad blocks on the flash drive.
Unfortunately, I don't have a support contract for IPS/Antivirus signatures, so I can't tell if these databases would be stored on the disk properly or not. For now, my network's up, my dual ISP connections are being load balanced (something that prevented me from just replacing the FortiGate with an ASA I have lying around) and my inbound NAT access is working again.
Now back to my day job...