Connectivity challenges

Hard to know what's going on when devices don't connect

One of the challenges with Z-Wave devices is that when things don't work it can be difficult to know what the cause is. There also appears to be a dearth of clear, authoritative and accessible documentation. There are a range of accessible articles online, but these are frequently limited in scope. Some of them suggest particular approaches to setting up a Z-Wave network and resolving its issues, but give few details about how to perform the steps or why they are needed. It's a pity the companies and organisations that produce the Z-Wave standard and equipment haven't done more to provide suitable documentation and support for consumers. For my part, I've been using a small number of Z-Wave devices over the last few years and have experienced numerous issues where devices stopped communicating. Many of the issues have been intermittent, and with limited diagnostic information available, problem solving has come down to trial and error. This page is a running commentary I wrote as I tried to better understand what was going wrong, in the hope of solving my problem. I also hope it may assist others who are trying to resolve Z-Wave issues.

Solving a Z-Wave switch that has lost connectivity

A recent issue I experienced was that an Aeotec Z-Wave Smart Switch 6 was no longer responding to commands and was appearing as dead. My Z-Wave network is controlled via an Aeotec Z-Stick Plus (Gen 5) plugged into a Raspberry Pi. I run Home Assistant (HA), using the Z-Wave JS integration and the Z-Wave JS to MQTT add-on. I am far from being an expert with any of these technologies. I've included a screen image of the "Z-Wave JS to MQTT" web interface below. As can be seen in the image, the status of the ID=2 Smart Switch 6 indicates that it's dead (red sad face symbol). Its last active date was three days before the image was captured. I had an automation that was supposed to turn the switch on and off once a day, and that hadn't happened for three days.

Why is this occurring

As indicated above, discovering the exact cause of a Z-Wave fault is not easy. I had only recently discovered another fault with my Z-Wave network, where a faulty Smart Switch 6 appeared to be repeating or corrupting Z-Wave traffic and causing other devices to fail. (While I was not absolutely certain this was what was happening, removing the faulty switch at the time allowed other apparently dead devices to start operating.) After I removed that faulty device, everything appeared to work for a short period before the issue I describe on this page presented itself. I really want my Z-Wave equipment to work, so this page is about trying to track down the cause.

Problem solving

With limited diagnostic methods at my disposal (that I could access and understand), I initially tried to discover whether the problem was the device itself or communication with this particular device. Via the HA Z-Wave JS to MQTT web interface it's possible to ping devices and send binary on and off messages. I tried pinging the dead switch and repeatedly turning it on and off, but it remained dead. My next strategy was to swap the dead switch with a working Smart Switch 6 (ID=6) and see whether each device worked in the other's location. When I did this, the ID=2 switch worked in its new location and the ID=6 switch became dead in its new location (where ID=2 had been). Based on this, it appeared that the issue was with the location and not with the switch itself. However, as described further below, this substitution test was possibly not valid, because routing effects could also explain the result. It also didn't explain why the location had apparently been fine days earlier but was now problematic.

What's wrong with certain locations

Although I don't have a large number of Z-Wave devices in my house, they are all placed relatively close to one another. If I could run a tape measure through the walls and measure line of sight, there is likely less than 10 to 12 m between the controller and any of the Z-Wave devices. I have at times moved my range extender to try to fix other issues I've experienced in the past, but I don't think my current issues are distance related. Over the years I've read and heard of other reasons which may cause location-specific connection issues. I'm never sure whether some of these sources are authoritative. However, for the sake of trying to resolve this issue, I will discuss advice from a few of them here. Oz Smart Things (https://www.ozsmartthings.com.au/blogs/news/7-top-zwave-tips) has some interesting advice about ensuring there are enough devices within a network to form an appropriate mesh. The advice also indicates that distance may not be the cause of dead nodes; rather, it could be reflection from metal objects.

The way Z-Wave creates and manages a mesh network is a somewhat opaque topic. It may be that a comprehensive understanding of how a Z-Wave mesh works can be gained from the appropriate documentation, but such details do not appear to be either easily or freely available. Accessing training on Z-Wave at https://z-wavealliance.org/z-wave-alliance-training-education/ appears to require membership of the Z-Wave Alliance, which has a cost associated with it, and without having accessed any of these resources I am not clear whether they would help an individual diagnose their Z-Wave issues. The Z-Wave Alliance membership options indicate that the least costly membership is for installers/resellers, not consumers. The Z-Wave Alliance website has some fairly general information about Z-Wave and also refers to the more general Z-Wave website https://www.z-wave.com/. None of the resources on either of these sites helped me, as most of the information is either trivial or highly technical and intended for developers. Given the number of people who appear to experience issues with Z-Wave, and the somewhat flaky way it appears to work, it's a pity more helpful information is not freely and easily available.

Drzwave has a website which contains some excellent explanations of the technical details of Z-Wave, and this article: https://drzwave.blog/2021/07/13/z-wave-mesh-priority-routes-explained/ provides some insight into how a Z-Wave mesh works. Some aspects of Z-Wave routing seem fairly straightforward (i.e. traffic can be routed through up to four hops, and several alternate routes can be stored). However, other aspects are less clear. In particular, how these routes are created and managed is unclear. For example, Wikipedia (https://en.wikipedia.org/wiki/Z-Wave) states the following: "The controller learns the signal strength between the devices during the inclusion process, thus the architecture expects the devices to be in their intended final location before they are added to the system. Typically, the controller has a small internal battery backup, allowing it to be unplugged temporarily and taken to the location of a new device for pairing. The controller is then returned to its normal location and reconnected." These details suggest that routes learned when adding devices are significant, but the article appears to contradict itself by suggesting the controller can be moved during the inclusion process. Fortunately, as indicated above, I am using a setup based on HA and Z-Wave JS, and although it uses a USB hub, inclusion (and exclusion) can be performed from within HA and there is no need to move (or even touch) the USB hub.
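To get my own head around the idea of a mesh with a hop limit and several stored alternate routes, I found it helpful to sketch a toy model. The Python below is a minimal sketch under my own assumptions, not the actual Z-Wave routing algorithm: the topology in `NEIGHBOURS` is invented (node 1 stands in for the controller, and the other IDs are borrowed from this page), "up to four hops" is read as at most four repeaters between source and destination (`MAX_REPEATERS`), and the number of stored routes (`MAX_ROUTES`) is a guess. It does a breadth-first search and keeps the first few loop-free routes it finds, shortest first.

```python
from collections import deque

# Invented topology: which nodes can hear each other directly.
# Node 1 stands in for the controller; the links are made up.
NEIGHBOURS = {
    1: {4, 5},
    2: {4, 6},
    4: {1, 2, 5, 6},
    5: {1, 4},
    6: {2, 4},
}

MAX_REPEATERS = 4  # assumption: "up to four hops" read as at most four repeaters
MAX_ROUTES = 3     # assumption: how many alternate routes are kept per destination


def find_routes(neighbours, source, dest,
                max_repeaters=MAX_REPEATERS, max_routes=MAX_ROUTES):
    """Breadth-first search for loop-free routes, returned shortest first."""
    routes = []
    queue = deque([[source]])
    while queue and len(routes) < max_routes:
        path = queue.popleft()
        if path[-1] == dest:
            routes.append(path)
            continue
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in path:
                continue  # never revisit a node, so routes stay loop-free
            new_path = path + [nxt]
            # Repeaters are every node except the source and the destination.
            repeaters = len(new_path) - 2 if nxt == dest else len(new_path) - 1
            if repeaters <= max_repeaters:
                queue.append(new_path)
    return routes


print(find_routes(NEIGHBOURS, 1, 2))
```

For this made-up topology, `find_routes(NEIGHBOURS, 1, 2)` returns `[[1, 4, 2], [1, 4, 6, 2], [1, 5, 4, 2]]`: one route through the range extender (ID=4) and two longer fallbacks.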

Healing

Some advice about setting up a network suggests that when you build a Z-Wave network you should add devices in a strategic order to ensure appropriate routing (https://www.clarecontrols.com/dealer-news/6-z-wave-tips-and-tricks). This article, https://sensative.com/article/how-to-create-a-stable-z-wave-network/, also indicates devices should be in position when they are added. This advice supports the concept that routes are created during inclusion and are to some degree fixed. However, again this advice is somewhat difficult to follow, because, as mentioned above, physical distance is not the only factor that may affect successful communication. Materials such as metal in nearby or intervening walls may also absorb or reflect transmissions. That said, it does appear that a process of healing can update routes. Again, the advice around this appears contradictory. Advice here: https://drzwave.blog/2017/01/20/seven-habits-of-highly-effective-z-wave-networks-for-consumers/ states that "once a Z-Wave network is built, it has to be “healed” so every node can use all the other nodes in the network to route messages", but also states that "Z-Wave is able to self-heal automatically". I may be misunderstanding the advice related to routing and healing, but it does appear that healing (one way or another) fixes routes and creates optimal routes regardless of the order in which devices were added and where they were located at the time.
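The difference between routes fixed at inclusion and routes refreshed by a heal can be illustrated with another toy model. In this minimal sketch (my own simplification, not how Z-Wave JS or the Z-Wave protocol actually performs a heal), a route is stored when a node is added, based on the neighbour table at that time. If a radio link later disappears, the stored route silently stops working until a heal, modelled here by the hypothetical `heal` function that simply recomputes the route from the current neighbour table, finds a new one. Both neighbour tables are invented.

```python
from collections import deque


def shortest_route(neighbours, source, dest, max_repeaters=4):
    """Return one shortest route from source to dest, or None if there isn't one."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == dest:
            # BFS finds the shortest route first; reject it if it needs too many repeaters.
            return path if len(path) - 2 <= max_repeaters else None
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def route_works(route, neighbours):
    """A stored route only works if every hop is still a live link."""
    return all(b in neighbours.get(a, ()) for a, b in zip(route, route[1:]))


def heal(neighbours, source, dest):
    """Hypothetical model of a heal: recompute the route from current neighbours."""
    return shortest_route(neighbours, source, dest)


# Neighbour table when node 2 was included (invented): it could hear the controller.
at_inclusion = {1: {2, 4}, 2: {1, 4}, 4: {1, 2}}
stored_route = shortest_route(at_inclusion, 1, 2)

# Later the direct 1 <-> 2 link disappears (e.g. interference or a moved device).
today = {1: {4}, 2: {4}, 4: {1, 2}}

print("stored:", stored_route, "works today:", route_works(stored_route, today))
healed = heal(today, 1, 2)
print("healed:", healed, "works today:", route_works(healed, today))
```

In this model the route stored at inclusion (`[1, 2]`) no longer works, while the healed route goes via node 4. Whether a real heal behaves like this is exactly what I'm unsure about.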

Dead devices

Another possible cause of communication issues that Drzwave suggests is a dead device. The advice here: https://drzwave.blog/2017/01/20/seven-habits-of-highly-effective-z-wave-networks-for-consumers/ states: "If it (dead device) remains in the controller then the controller will try to route thru this dead node on occasion resulting in delays in delivering messages". The article does suggest "running a Heal on the network will remove the node from the routing tables but it will remain in the controllers routing tables". Again, it is very unclear which routing table (if not the controller's) the route including the dead node is removed from. It would appear to be reasonable advice to remove from the controller any nodes that are dead and not coming back.
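My reading of the Drzwave advice is that a dead node left in the routing tables costs time, because the controller may try routes through it before falling back to a working one. The sketch below models that under loudly stated assumptions: node 3 is a hypothetical dead node, the stored routes are invented, and `ATTEMPT_TIMEOUT_MS` and `SUCCESS_MS` are made-up costs rather than real Z-Wave timings. `deliver` tries the stored routes in order, and `prune_dead` is a hypothetical stand-in for removing the dead node.

```python
# Invented costs in milliseconds; real Z-Wave timings are different.
ATTEMPT_TIMEOUT_MS = 100  # time lost waiting for a failed route attempt
SUCCESS_MS = 10           # time for a delivery that works


def deliver(routes, alive):
    """Try stored routes in order; return (route used or None, elapsed ms)."""
    elapsed = 0
    for route in routes:
        if all(node in alive for node in route):
            return route, elapsed + SUCCESS_MS
        elapsed += ATTEMPT_TIMEOUT_MS
    return None, elapsed


def prune_dead(routes, dead_node):
    """Hypothetical clean-up: drop every stored route that passes through dead_node."""
    return [route for route in routes if dead_node not in route]


# Node 3 is a hypothetical dead node that was never removed from the controller.
stored = [[1, 3, 2], [1, 3, 4, 2], [1, 4, 2]]
alive = {1, 2, 4}

print(deliver(stored, alive))                 # succeeds only on the third route
print(deliver(prune_dead(stored, 3), alive))  # succeeds on the first attempt
```

With node 3 still in the stored routes, delivery only succeeds on the third route after 210 ms of (invented) delay; after pruning, it succeeds in 10 ms.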

How to heal using Home Assistant

Within Home Assistant, under "Devices & Services" and then the Devices tab, it's possible to select a device and choose "Heal Device". I'm not sure exactly what this does. So far, using it has either not run or, more recently, indicated: "A Z-Wave network heal is already in progress. Please wait for it to finish before healing an individual device", and at other times it reports: "xxx could not be healed. Additional information may be available in the logs." (where xxx is the device I have attempted to heal).

Device location

Advice about where to set up Z-Wave devices generally suggests placing the controller in a central location that avoids interference from other objects. This article suggests corners can be problematic for devices (https://dome.zendesk.com/hc/en-us/articles/115005584647-How-can-I-improve-my-Z-Wave-network-coverage-). The initial problem I started with was that node ID=2 was dead. It was located on the right-hand side of a bedside table, alongside some other power supplies and cables, and near a corner of the house. As indicated above, part of the troubleshooting process I followed was to swap the dead device with another, similar device that was working, to see whether the issue was to do with the location or with the device itself. As it turned out, neither the originally dead device nor the substituted device worked in the original location, but both worked when I moved the Z-Wave devices to a slightly different location (a bit further from the corner and slightly away from the other power supplies). As shown in the photo, the new location was just on the other side of a bedside chest of drawers. However, even though this test suggested the problem was to do with the location rather than the device itself, the result may have been coincidental, as routing and intermittent behaviour may account for why a device worked in one location and not in the other.

Location of USB hub

In addition to advice on the location of devices, there is also advice about where the hub should be placed. The recommendation is that the hub is placed in a location which is fairly central and has good connectivity to nearby devices. I understand this is likely to be best so that communication can occur directly between hub and device without needing to be routed. However, I find this a little frustrating, because I don't want to have to locate my Raspberry Pi in a particular spot just to work around Z-Wave's awkward way of routing traffic. As it turns out, my Pi and USB hub are fairly central in the house, but are surrounded by a lot of cables and communications equipment. The advice in the Z-Wave JS integration documentation (https://www.home-assistant.io/integrations/zwave_js) is: "All users are encouraged to use a USB extension cable to prevent such interference. Please try such a cable before opening an issue or requesting support on Discord. It will nearly always be the first troubleshooting step that we ask you to take anyway." I added a short USB extension cable, but so far this has not made any difference.

Understanding the routes

At the time of writing, I have tried numerous ways to solve my communication issues. I have moved the USB hub a short distance from the Pi it's plugged into, I have moved the Z-Wave switch a short distance to separate it from other equipment, I have repeatedly tried to heal and re-interview devices, and I have upgraded and restarted the Pi software and integrations several times. Through all this, and over several days, two of my Z-Wave switches (ID=2 and ID=6) have worked intermittently, whereas another switch and my range extender (ID=5 and ID=4 respectively) have not appeared to fail at all. On the topic of healing a network, the details here: https://linkdhome.com/articles/What-is-z-wave explain the importance of having devices in the location where they will be used. The article explains how a neighbour table is produced based on the signal strength of 'nearby' devices, and describes how this table is created during 'inclusion' but can be refreshed by performing a 'heal'.
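The idea of a neighbour table built from signal strength can be sketched in a few lines. This is only an illustration under my own assumptions: the readings in `READINGS` are invented, and the `THRESHOLD_DBM` cut-off for treating two nodes as neighbours is a made-up number, not a value from any Z-Wave specification. The linkdhome article describes the table as based on signal strength; modelling that as a simple threshold is my simplification. Rebuilding the table from fresh readings is the part that, as I understand it, a heal is meant to do.

```python
# Invented signal-strength readings in dBm between pairs of nodes.
READINGS = {
    (1, 4): -62,
    (1, 5): -70,
    (1, 2): -91,
    (2, 4): -74,
    (2, 6): -80,
    (4, 5): -66,
    (4, 6): -78,
}

THRESHOLD_DBM = -85  # made-up cut-off: weaker links are not treated as neighbours


def build_neighbour_table(readings, threshold=THRESHOLD_DBM):
    """Map each node to the set of nodes it can hear at or above the threshold."""
    table = {}
    for (a, b), rssi in readings.items():
        table.setdefault(a, set())
        table.setdefault(b, set())
        if rssi >= threshold:
            table[a].add(b)
            table[b].add(a)
    return table


table = build_neighbour_table(READINGS)
print(sorted(table[1]))  # the controller's neighbours

# A later (invented) measurement shows a stronger 1 <-> 2 link; rebuild the table.
refreshed = build_neighbour_table({**READINGS, (1, 2): -80})
print(sorted(refreshed[1]))
```

In the original readings the link between the controller (node 1) and node 2 is below the cut-off, so they are not neighbours; when the table is rebuilt from the improved reading, they become direct neighbours.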

The failures may simply be because I don't have equipment in the right locations to allow the Z-Wave signal to be repeated and routed to all of my devices. The frustrating aspects are that when I move devices slightly (or significantly) it can appear to solve communication issues, but the issues may then recur. Also, the switch that rarely appears to fail doesn't seem to be in a better location than the switches that do fail.

Having an interface which just reports that a device is "dead" really doesn't help with diagnosing a communication fault. What I really want to know is which routes were tried before the system reported that the device was dead. I'd like to be able to see the routes the controller and device have, and it would be even better if I could update or control them in some manner. Perhaps healing would do this, and perhaps the system self-heals, but I can see limited evidence that this occurs in HA. Z-Wave JS to MQTT provides a network graph which can be viewed in HA by selecting the appropriate options, as shown in the following image. However, its usefulness is debated in this thread: https://community.home-assistant.io/t/70-zwave-device-network-stuck-healing-for-over-24hrs/332425/4. One comment suggests that the network graph is "entirely useless", but another suggests it shows the "preferred route" and possibly not the actual route taken by traffic. It's also not very clearly presented, and it's unclear to me from the graph whether the lines represent traffic being routed via the FishTankLightSwitch and/or Range Extender or whether the lines just cross these device symbols. It's not possible to move individual devices in the graph; if it were, that might help confirm whether the connecting lines pass behind a device symbol or actually connect to it.
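To make the kind of diagnostic I'm wishing for more concrete, here is a sketch of a delivery report that records every route attempted before a node is declared dead. Everything here is hypothetical: `Attempt`, `DeliveryReport` and `send_with_report` are names I made up, they are not part of Z-Wave JS or HA, and the link states and routes are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Attempt:
    route: List[int]
    ok: bool


@dataclass
class DeliveryReport:
    node: int
    attempts: List[Attempt] = field(default_factory=list)

    @property
    def dead(self) -> bool:
        # Only report dead if at least one route was tried and none worked.
        return bool(self.attempts) and not any(a.ok for a in self.attempts)


def send_with_report(node: int, routes: List[List[int]],
                     link_up: Callable[[int, int], bool]) -> DeliveryReport:
    """Try each stored route in order, recording the outcome of every attempt."""
    report = DeliveryReport(node)
    for route in routes:
        ok = all(link_up(a, b) for a, b in zip(route, route[1:]))
        report.attempts.append(Attempt(route, ok))
        if ok:
            break  # stop at the first route that gets through
    return report


# Invented link state: nothing can currently reach node 2.
live_links = {frozenset(pair) for pair in [(1, 4), (1, 5), (4, 5)]}


def link_up(a: int, b: int) -> bool:
    return frozenset((a, b)) in live_links


routes_to_2 = [[1, 2], [1, 4, 2], [1, 5, 4, 2]]
report = send_with_report(2, routes_to_2, link_up)
for attempt in report.attempts:
    print(" -> ".join(map(str, attempt.route)), "ok" if attempt.ok else "failed")
print("dead:", report.dead)
```

Instead of a bare "dead", this prints each of the three routes tried and that each one failed, which is the sort of information that would tell me where in the mesh the problem lies.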

After many hours of playing around with my Z-Wave setup, I realise I need to dig deeper than I have so far to try to resolve the connection issues.