ClearFoundation Tracker - ClearOS
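ID: 0000025
Project: ClearOS
Category: app-multiwan - Multi-WAN
View Status: public
Date Submitted: 2010-02-08 11:53
Last Update: 2013-02-02 12:22
Reporter: Vejlefjordskolen
Assigned To: dsokoloski
Priority: normal
Severity: tweak
Reproducibility: unable to reproduce
Status: closed
Resolution: fixed
Version: 5.1
Fixed in Version: 5.2
Summary: 0000025: Syswatch periodically reports interfaces as down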
The syswatch service seems to have trouble sending pings: it randomly reports interfaces as down, removes them, and restarts the firewall, at which point those interfaces are brought down even though they were working fine.

This happens many times a day; a quick look at the syswatch log shows about one hundred mentions of "down" for each external interface every day. This is a problem because it breaks active connections through whichever interface is shut down.
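To put a number on that "about one hundred mentions per interface" observation, a tally like the one below could be run over the log. This is a minimal sketch, not part of syswatch: the exact format of /var/log/syswatch is not shown in this report, so it only assumes that a relevant line names an interface (ethN or pppN) and contains the word "down". The function name count_down_mentions is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: the real syswatch log format is not documented here; we only
# assume a relevant line contains an interface name and the word "down".
IFACE_RE = re.compile(r"\b(eth\d+|ppp\d+)\b")
DOWN_RE = re.compile(r"\bdown\b", re.IGNORECASE)  # "shutdown" does not match


def count_down_mentions(lines: Iterable[str]) -> Counter:
    """Count, per interface name, the log lines that mention 'down'."""
    counts: Counter = Counter()
    for line in lines:
        if not DOWN_RE.search(line):
            continue
        # Count each interface at most once per line, even if repeated.
        for iface in set(IFACE_RE.findall(line)):
            counts[iface] += 1
    return counts
```

Usage would be along the lines of `with open("/var/log/syswatch") as log: print(count_down_mentions(log))`; to get a per-day figure, filter the lines by date before passing them in.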

We have four external interfaces, and the problem began when three of the connections were changed from DHCP to static configurations. The fourth external interface has always been static and never had any problems before the other interfaces were changed, but it is now affected as well.
We've had this issue for a while and at first thought it was an ISP problem. They checked everything and did find a few minor issues, but even after those were corrected, we're still experiencing the problem.

After our ISP assured us that there was no problem on their end, I logged on with SSH and monitored the syswatch log in real time (tail -f /var/log/syswatch). When syswatch began reporting errors on an interface (e.g. eth3), I manually pinged the same server that syswatch couldn't reach from that interface (e.g. ping 69.90.141.72 -I eth3), and the manual ping worked.

This test showed that the server was indeed able to ping from the interface while syswatch somehow could not, which led me to believe that the problem is in the syswatch software itself.

I've tried changing different parameters in /etc/syswatch, but nothing has helped so far. I hope you can find a fix for this problem soon :)
No tags attached.
Issue History
Date Modified    | Username         | Field            | Change
2010-02-08 11:53 | Vejlefjordskolen | New Issue        |
2010-02-08 17:57 | user2            | Note Added       | 0000026
2010-02-08 17:57 | user2            | Reproducibility  | have not tried => unable to reproduce
2010-02-08 17:57 | user2            | Status           | new => feedback
2010-02-08 23:51 | Vejlefjordskolen | Note Added       | 0000027
2010-02-09 11:56 | user2            | Note Added       | 0000028
2010-02-09 14:06 | user2            | Severity         | major => tweak
2010-02-16 05:40 | Vejlefjordskolen | Note Added       | 0000032
2010-02-16 06:53 | Vejlefjordskolen | Note Added       | 0000033
2010-02-17 10:37 | user2            | Note Added       | 0000034
2010-02-17 10:40 | user2            | Status           | feedback => assigned
2010-02-17 10:40 | user2            | Assigned To      | => dsokoloski
2010-02-17 11:00 | dsokoloski       | Resolution       | open => fixed
2010-02-17 11:00 | dsokoloski       | Fixed in Version | => 5.2
2010-02-17 11:00 | dsokoloski       | Note Added       | 0000035
2010-08-26 08:38 | user2            | Status           | assigned => confirmed
2013-02-02 09:00 | user2            | Status           | confirmed => resolved
2013-02-02 12:22 | user2            | Status           | resolved => closed

Notes
(0000026)
user2   
2010-02-08 17:57   
Hi there. Every time we have seen this reported, the root cause was:

- A network loop
- A problem with the ISP
- Some other network issue

Using the "ping" command (even with the -I flag) does not guarantee that the network packet goes out the correct interface. The -I flag merely sets the source address. In fact, I don't know a good way to force a ping packet down a specific network interface using the command line (maybe netcat?). Next time it happens, use the tcpdump command to see what's really happening with network traffic:

tcpdump -i eth3 icmp

Don't be surprised if you see a ping test go out on eth3 and then come back on another interface like eth2. I have personally seen this a handful of times. Strange behavior that shouldn't work in my mind... but it does!
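This check can be made less manual by capturing on every external interface at once (one `tcpdump -n -i ethX icmp` per interface, saved to text) and pairing echo requests with replies by their ICMP id and sequence number. The sketch below is hypothetical helper code, not a ClearOS tool; it assumes tcpdump's usual one-line text form such as "ICMP echo request, id 77, seq 1, length 64" and reports every echo whose reply was seen on a different interface from its request.

```python
import re
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: lines come from `tcpdump -n -i <iface> icmp` text output, e.g.
# "IP a.b.c.d > e.f.g.h: ICMP echo request, id 77, seq 1, length 64".
ECHO_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def find_asymmetric_pings(
    captures: Dict[str, Iterable[str]],
) -> List[Tuple[int, int, str, str]]:
    """Return (id, seq, request_iface, reply_iface) for mismatched echoes."""
    requests: Dict[Tuple[int, int], str] = {}
    replies: Dict[Tuple[int, int], str] = {}
    for iface, lines in captures.items():
        for line in lines:
            m = ECHO_RE.search(line)
            if m is None:
                continue  # not an echo request/reply line
            key = (int(m.group(2)), int(m.group(3)))
            # Remember the first interface each packet type was seen on.
            if m.group(1) == "request":
                requests.setdefault(key, iface)
            else:
                replies.setdefault(key, iface)
    mismatches: List[Tuple[int, int, str, str]] = []
    for key in sorted(requests):
        reply_iface = replies.get(key)
        if reply_iface is not None and reply_iface != requests[key]:
            mismatches.append((key[0], key[1], requests[key], reply_iface))
    return mismatches
```

An empty result means every answered ping came back on the interface it left from; a result like (77, 1, "eth3", "eth0") is exactly the out-on-eth3, back-on-eth0 pattern described above.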
(0000027)
Vejlefjordskolen   
2010-02-08 23:51   
You're absolutely right! In my test they went out eth3 and came back on eth0.

How can this be fixed? Is it our ISP that needs to fix their network configuration?

Thanks for the quick response. I didn't want to believe that it was ClearOS, as we have had quite a few problems with bad quality connections at our location, but our ISP insisted that there was no problem, and my (inadequate) test showed that the pings were being replied to, so I thought there was only our system left to blame.
(0000028)
user2   
2010-02-09 11:56   
Fundamentally, it is an ISP issue. Pragmatically, it is a ClearOS issue since we shouldn't count on the ability of ISPs to fix the problem. Let me see if there's a workaround for ClearOS. More to come!
(0000032)
Vejlefjordskolen   
2010-02-16 05:40   
Having looked at the ARP traffic (tcpdump -i eth0 arp), I've noticed that the server sometimes responds with the wrong MAC address for an IP. When a who-has came in on the interface that actually holds the requested IP, the server responded with the correct MAC; but when a who-has came in for the IP address of one of the other NICs, the server responded with the MAC address of the receiving NIC, not the MAC of the NIC that actually has that address.

I believe that this may have some relevance, as we seem to be poisoning the ARP cache of our ISP. Is this still an ISP issue?
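A capture like that can be checked mechanically against the MAC address each NIC really has (for example as listed by `ip link` or `ifconfig`). This is a sketch under stated assumptions: the input is text from `tcpdump -n -i eth0 arp`, an ARP reply line looks roughly like "ARP, Reply 192.0.2.10 is-at 00:11:22:33:44:55", and the IP-to-MAC table is filled in by hand. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: tcpdump text output; the MAC may or may not be zero-padded.
REPLY_RE = re.compile(
    r"reply\s+(\d{1,3}(?:\.\d{1,3}){3})\s+is-at\s+"
    r"((?:[0-9a-f]{1,2}:){5}[0-9a-f]{1,2})",
    re.IGNORECASE,
)


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and zero-pad each octet."""
    return ":".join(octet.zfill(2) for octet in mac.lower().split(":"))


def find_wrong_arp_replies(
    lines: Iterable[str], expected: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (ip, announced_mac, expected_mac) for replies with the wrong MAC.

    `expected` maps each local IP to the MAC of the NIC that owns it.
    """
    wanted = {ip: normalize_mac(mac) for ip, mac in expected.items()}
    problems: List[Tuple[str, str, str]] = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m is None:
            continue  # requests and unrelated lines are ignored
        ip, mac = m.group(1), normalize_mac(m.group(2))
        want = wanted.get(ip)
        if want is not None and mac != want:
            problems.append((ip, mac, want))
    return problems
```

Each tuple returned is one reply in which the server claimed another NIC's address with the wrong MAC, which is the cache-poisoning pattern suspected here.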
(0000033)
Vejlefjordskolen   
2010-02-16 06:53   
OK, it seems I have fixed the issue. The problem was that the arp_filter option of the external NICs was set to 0. This meant that every external interface answered with its own MAC whenever our ISP broadcast an ARP request for one of our IP addresses.

I found a description at the following URL:
http://www.linuxinsight.com/proc_sys_net_ipv4_conf_eth0_arp_filter.html

It specifically mentions that setting this option to 0 can cause problems in load-balancing setups.

This should be fixable on your end :)
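The setting discussed here lives at /proc/sys/net/ipv4/conf/<interface>/arp_filter (the same knob sysctl exposes as net.ipv4.conf.<interface>.arp_filter). Below is a minimal sketch of how one might audit and enable it from a script; it is not the change committed for this issue, whose contents are not shown here. The function names are hypothetical, the conf directory is a parameter so the code can be tried against a scratch directory, and writing to the real /proc requires root. Values written this way do not survive a reboot; making them permanent would normally go through /etc/sysctl.conf.

```python
from pathlib import Path
from typing import Dict, Iterable

# Real location on Linux; overridable for testing against a scratch directory.
CONF_ROOT = Path("/proc/sys/net/ipv4/conf")


def read_arp_filter(root: Path = CONF_ROOT) -> Dict[str, int]:
    """Map each interface directory under root to its arp_filter value."""
    values: Dict[str, int] = {}
    for entry in sorted(root.iterdir()):
        knob = entry / "arp_filter"
        if knob.is_file():
            values[entry.name] = int(knob.read_text().strip())
    return values


def enable_arp_filter(interfaces: Iterable[str], root: Path = CONF_ROOT) -> None:
    """Write 1 to arp_filter for each named interface (needs root on /proc)."""
    for name in interfaces:
        (root / name / "arp_filter").write_text("1\n")
```

Running read_arp_filter() first shows which interfaces still report 0; enable_arp_filter() can then be called with just the external interface names.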
(0000034)
user2   
2010-02-17 10:37   
Nice detective work Vejlefjordskolen!
(0000035)
dsokoloski   
2010-02-17 11:00   
Committed revision 2560.