ClearOS Bug Tracker


View Issue Details
ID: 0000025
Project: ClearOS
Category: app-multiwan - Multi-WAN
View Status: public
Date Submitted: 2010-02-08 11:53
Last Update: 2013-02-02 12:22
Reporter: Vejlefjordskolen
Assigned To: dsokoloski
Priority: normal
Severity: tweak
Reproducibility: unable to reproduce
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 5.1
Target Version:
Fixed in Version: 5.2
Summary: 0000025: Syswatch periodically reports interfaces as down
Description:
The syswatch service appears to have trouble sending pings: it randomly reports interfaces as down, removes them, and restarts the firewall, at which point the interfaces are brought down even though they were working fine.

This happens many times a day; a quick look at the syswatch log shows about one hundred mentions of "down" for each external interface every day. This is a problem because it breaks active connections through the interface that is shut down.

We have four external interfaces, and the problem began when three of the connections were changed from DHCP to static configurations. The fourth external interface was always static and never had any problems before the other interfaces were changed to static, but the problem now affects this interface as well.
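The per-interface count of "down" mentions described above can be reproduced with a small shell helper. This is only a rough sketch: it assumes that each relevant log line contains both the interface name and the word "down", which may not match the real syswatch log format, and count_down is a hypothetical helper name, not part of ClearOS.

```shell
#!/bin/sh
# count_down IFACE: read log lines on stdin and print how many of them
# mention both IFACE and the word "down".
# Assumption: the log puts the interface name and "down" on the same line.
count_down() {
    grep -w -- "$1" | grep -c -w -- "down"
}

# Example use against the log file named in this report:
#   count_down eth3 < /var/log/syswatch
```

Running it once per external interface gives a quick picture of whether all interfaces are affected equally.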
Additional Information:
We've been having this issue for a while and first thought it was an ISP problem. They checked everything and found a few minor issues, but even after those were corrected we are still experiencing the problem.

After our ISP assured us that there was no problem on their end, I logged on over SSH and monitored the syswatch log in real time (tail -f /var/log/syswatch). When syswatch began reporting errors on an interface (e.g. eth3), I manually pinged the server that syswatch could not reach, from that same interface (e.g. ping 69.90.141.72 -I eth3), and the manual ping worked.

This test showed that the server was indeed able to ping from the interface while syswatch was somehow having problems, which leads me to believe that the problem is in the syswatch software itself.

I've tried changing different parameters in /etc/syswatch, but nothing has helped so far. I hope you can find a fix for this problem soon :)
Tags: No tags attached.
Attached Files

Relationships

Notes
(0000026)
user2
2010-02-08 17:57

Hi there. Every time we have seen this reported, the root cause was:

- A network loop
- A problem with the ISP
- Some other network issue

Using the "ping" command (even with the -I flag) does not guarantee that the network packet goes out the correct interface. The -I flag merely sets the source address. In fact, I don't know a good way to force a ping packet down a specific network interface using the command line (maybe netcat?). Next time it happens, use the tcpdump command to see what's really happening with network traffic:

tcpdump -i eth3 icmp

Don't be surprised if you see a ping test go out on eth3 and then come back on another interface like eth2. I have personally seen this a handful of times. Strange behavior that shouldn't work in my mind... but it does!
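
To see a reply come back on a different interface, it helps to capture on every external interface at the same time. The following is a minimal sketch of one way to do that, not a ClearOS tool: icmp_capture_cmd and watch_icmp are hypothetical helper names, and the interface names in the usage comment are assumptions to be adjusted to the actual setup. The captures need root.

```shell
#!/bin/sh
# Print the tcpdump command that watches ICMP traffic to or from one host
# on one interface (-l: line-buffered output, -n: no name resolution).
icmp_capture_cmd() {
    printf 'tcpdump -l -n -i %s icmp and host %s\n' "$1" "$2"
}

# watch_icmp TARGET IFACE...: run one capture per interface in the
# background and prefix every output line with the interface it was seen on.
# Stop with Ctrl-C. Must be run as root.
watch_icmp() {
    target=$1
    shift
    for iface in "$@"; do
        $(icmp_capture_cmd "$iface" "$target") 2>/dev/null \
            | sed "s/^/[$iface] /" &
    done
    wait
}

# Example (interface names are assumptions):
#   watch_icmp 69.90.141.72 eth0 eth1 eth2 eth3
```

An echo request prefixed with one interface followed by an echo reply prefixed with another would indicate the asymmetric path described above.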
(0000027)
Vejlefjordskolen (reporter)
2010-02-08 23:51

You're absolutely right! In my test they went out eth3 and came back on eth0.

How can this be fixed? Is it our ISP that needs to fix their network configuration?

Thanks for the quick response. I didn't want to believe that it was ClearOS, as we have had quite a few problems with bad quality connections at our location, but our ISP insisted that there was no problem, and my (inadequate) test showed that the pings were being replied to, so I thought there was only our system left to blame.
(0000028)
user2
2010-02-09 11:56

Fundamentally, it is an ISP issue. Pragmatically, it is a ClearOS issue since we shouldn't count on the ability of ISPs to fix the problem. Let me see if there's a workaround for ClearOS. More to come!
(0000032)
Vejlefjordskolen (reporter)
2010-02-16 05:40

Having looked at the ARP traffic (tcpdump -i eth0 arp), I've noticed that the server sometimes answers ARP requests with the wrong MAC address for an IP. When a who-has arrived on the interface that actually holds the requested IP, the server replied with the correct MAC. But when a who-has arrived for the IP address of one of the other NICs, the server replied with the MAC of the NIC that received the request, not the MAC of the NIC that actually has that address.

I believe that this may have some relevance, as we seem to be poisoning the ARP cache of our ISP. Is this still an ISP issue?
(0000033)
Vejlefjordskolen (reporter)
2010-02-16 06:53

OK, it seems I have fixed the issue... The problem was that the arp_filter option of the external NICs was set to 0. This meant that all external interfaces answered with their own MAC whenever our ISP broadcast an ARP request for a specific IP address.

I found a description at the following URL:
http://www.linuxinsight.com/proc_sys_net_ipv4_conf_eth0_arp_filter.html

It specifically mentions that leaving this option set to 0 can cause problems in load-balancing setups.

This should be fixable by you :)
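
For anyone wanting to check the same setting, the sketch below prints the current arp_filter value of each interface and the sysctl lines that would enable it. It is only an illustration under stated assumptions: the interface list is hypothetical, the helper names are made up for this example, and it is not necessarily what was changed in ClearOS itself. The kernel documentation also notes that arp_filter=1 relies on source-based routing being in place.

```shell
#!/bin/sh
# Hypothetical list of external interfaces; adjust to the actual setup.
EXTERNAL_IFACES="eth0 eth1 eth2 eth3"

# Print the current arp_filter value of each interface (no root needed).
show_arp_filter() {
    for iface in "$@"; do
        f="/proc/sys/net/ipv4/conf/$iface/arp_filter"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$iface" "$(cat "$f")"
        else
            printf '%s: not present\n' "$iface"
        fi
    done
}

# Print one sysctl.conf-style line per interface that enables arp_filter.
arp_filter_lines() {
    for iface in "$@"; do
        printf 'net.ipv4.conf.%s.arp_filter = 1\n' "$iface"
    done
}

show_arp_filter $EXTERNAL_IFACES
arp_filter_lines $EXTERNAL_IFACES

# To apply at runtime (as root), one interface at a time:
#   sysctl -w net.ipv4.conf.eth0.arp_filter=1
```

The printed lines can typically be appended to /etc/sysctl.conf so the setting survives a reboot.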
(0000034)
user2
2010-02-17 10:37

Nice detective work Vejlefjordskolen!
(0000035)
dsokoloski (developer)
2010-02-17 11:00

Committed revision 2560.

Issue History
Date Modified | Username | Field | Change
2010-02-08 11:53 | Vejlefjordskolen | New Issue |
2010-02-08 17:57 | user2 | Note Added: 0000026 |
2010-02-08 17:57 | user2 | Reproducibility | have not tried => unable to reproduce
2010-02-08 17:57 | user2 | Status | new => feedback
2010-02-08 23:51 | Vejlefjordskolen | Note Added: 0000027 |
2010-02-09 11:56 | user2 | Note Added: 0000028 |
2010-02-09 14:06 | user2 | Severity | major => tweak
2010-02-16 05:40 | Vejlefjordskolen | Note Added: 0000032 |
2010-02-16 06:53 | Vejlefjordskolen | Note Added: 0000033 |
2010-02-17 10:37 | user2 | Note Added: 0000034 |
2010-02-17 10:40 | user2 | Status | feedback => assigned
2010-02-17 10:40 | user2 | Assigned To | => dsokoloski
2010-02-17 11:00 | dsokoloski | Resolution | open => fixed
2010-02-17 11:00 | dsokoloski | Fixed in Version | => 5.2
2010-02-17 11:00 | dsokoloski | Note Added: 0000035 |
2010-08-26 08:38 | user2 | Status | assigned => confirmed
2013-02-02 09:00 | user2 | Status | confirmed => resolved
2013-02-02 12:22 | user2 | Status | resolved => closed