We propose a new type of attack, EvilScreen, against the multi-channel communication between a smart TV and its remote controls. Unlike existing attacks that need to install a malicious app on the TV or exploit the TV OS, EvilScreen only reuses the communications of remote controls to hijack the victim's TV, which makes the attack more difficult to detect and prevent.
We exploit two new features introduced by modern smart TVs.
Multi-channel Communication.
A distinct feature of smart TVs is the use of multiple wireless communication channels, i.e., Consumer Infrared (IR), Bluetooth Low Energy (BLE), and Wi-Fi.
IR- and BLE-based communications are commonly employed by traditional TV remote controls (i.e., physical remote controls), and most smart TVs still support them for user-experience and compatibility reasons. When a user presses a button on the remote control, the remote control obtains the key code of the button and sends it to the smart TV as an IR/BLE signal. The IR/BLE receiver on the TV then decodes the signal into instructions that the TV OS can understand. However, both IR and BLE are short-range channels, so data can only be transmitted over a short distance. Worse still, they lack strong authentication mechanisms: IR communication does not authenticate the involved devices at all, and BLE pairing is known to suffer from man-in-the-middle, brute-force, and method confusion attacks. Therefore, transmitting sensitive data via IR and BLE channels is risky.
In addition to IR and BLE, modern smart TVs also support companion apps on smartphones (called virtual remote controls) that control the smart TV and access TV resources via Wi-Fi. After binding a companion app with a smart TV, a user can send various commands through the app over the network; the commands are then received, parsed, and executed by the TV OS. Compared with IR and BLE, Wi-Fi data transmission adopts well-designed security specifications (e.g., WPA2 and WPA3) and is therefore suitable for data with strong protection requirements.
Advanced User Interactions.
With different types of remote controls, the interactions between users and TV devices differ significantly. With a physical IR/BLE remote control, a user can only execute basic operations (e.g., power on/off, volume up/down, channel switching) due to the limited number of buttons and resources on the device. In contrast, a companion app supports more advanced user interactions: in addition to the basic functions, it can send a single high-level command to the smart TV instead of moving the on-screen cursor step by step to perform an operation. Modern smart TVs typically support three advanced operations:
TV App Operations: Most smart TVs allow a user to open, install, or uninstall a TV app via the companion app. For example, a user can select an app displayed in the companion app, which then sends a TV app installation request to the smart TV. After verifying the request, the smart TV downloads the app file and installs it.
Screencast: Smart TVs usually support screencast operations. That is, a user can display content (e.g., media files, documents) stored on the smartphone on the smart TV screen, where it can be watched by multiple people at the same time.
Screenshot: Most smart TVs also implement a screenshot function, through which a user can monitor the TV screen in real time. There are typically two ways to implement TV monitoring. The first takes a screenshot of what the TV is displaying and then transmits the picture to the companion app. The second is real-time streaming, in which the smart TV transmits the current screen content as streaming video data that is displayed on the smartphone.
A notable attack surface of smart TVs is their TV-Accessory wireless communication. To protect smart TVs against attacks based on remote wireless signals, the following measures are often employed.
Network Isolation. When a user first sets up a smart TV, the user typically configures it to connect to a Wi-Fi network. At this stage, the smart TV often relies on the user to send the Wi-Fi credentials (i.e., the SSID and password of the Wi-Fi network) via its remote controls. Once connected, the smart TV is protected by WLAN isolation: only authenticated devices are allowed to join the WLAN and access TV resources.
TV-Accessory Binding. The smart TV and its accessories (a physical remote control or a smartphone with a companion app) in the same WLAN need to complete a binding process before further remote interactions.
Remote Interaction Validation. The remote user is not allowed to use the resources of the smart TV arbitrarily. The TV OS introduces new interfaces to handle the different remote operations sent by the user and to perform permission checks. Permissions are granted to accessories after the binding phase, and when a resource interface is invoked by an accessory, the TV OS determines whether the accessory has been granted the specific permission required to access the resource.
The EvilScreen attack is a new type of wireless communication attack that exploits a class of multi-channel remote control mimicry vulnerabilities. Abstractly, an EvilScreen vulnerability exploits three weaknesses in the architecture or implementation of smart TV systems to circumvent the three protections presented above.
The first weakness is that the victim smart TV supports various remote controls using different wireless signals, and there exists an implicit authentication dependency between two or more remote controls.
The second weakness is that the smart TV fails to provide enough displayed information for users to distinguish malicious remote controls from benign ones.
The third weakness is that the smart TV provides a series of interfaces for remote controls to access sensitive data on the smart TV and execute high-privilege operations, while the access control of those interfaces is ill-designed.
By leveraging these three weaknesses together, a successful EvilScreen attack allows the attacker to remotely monitor the screen of the victim smart TV and/or hijack it.
Our attacker model is as follows.
The smart TV OS is securely implemented, so the attacker cannot hijack or compromise the smart TV by exploiting the OS. In addition, we assume that no malicious apps have been previously installed on the victim TV.
To analyze the smart TV and find potential security flaws, we assume that the attacker can purchase any of the products in advance. We also assume that, through his own analysis, the attacker is able to gather the necessary information, such as device configurations and companion app implementations, and to extract specific side-channel information (e.g., certain MAC address ranges [31], signal characteristics [32]) to identify the brand and model of the victim TV.
The communication channels between the accessory and the smart TV are exposed to the attacker, who can sniff network traffic and capture network packets. However, we assume that the attacker is not able to circumvent existing communication channel protections. For instance, data transmitted through the Wi-Fi channel is protected, i.e., the attacker is unable to crack the Wi-Fi credentials by launching brute-force attacks.
In real-world scenarios, smart TVs are bought for different purposes and are hence installed in various locations. Depending on the location, the attacker model may also differ slightly. In the following, we discuss two main real-world attack scenarios: publicly accessed smart TVs and personal smart TVs.
1. Publicly Accessed Smart TVs
A large number of smart TVs are placed in public areas, such as shopping malls and gyms, for commercial or entertainment purposes (e.g., advertising). In this scenario, the attacker can easily approach the publicly placed smart TV. The attacker can therefore not only monitor (and exploit) the remote communication between the TV and its accessories, but also actively send malicious signals to the TV. However, if the attacker significantly changes the content on the screen, the attack is easily discovered. Thus, attacks against publicly accessed smart TVs must be stealthy.
2. Personal Smart TVs
As personal smart TVs are generally placed in private spaces, such as homes or hotel rooms, the attacker cannot easily get close to them. The attacker would instead have to place a malicious device in the private space, or compromise a vulnerable device inside it, and use that device as a relay to launch the follow-up attacks, as shown in Figure 1.
With such a relay device, an attacker can remotely communicate and interact with the smart TV (e.g., send commands and receive private TV data) without being physically present in the private space.
As with publicly accessed smart TVs, attacks on personal smart TVs must also be stealthy. The attacker should launch the attack during periods when the TV is seldom used (e.g., late at night), without the user noticing anything, and then wait for the victim to use the smart TV. This is feasible because when the user sends a “power off” command, most smart TVs do not actually power off but only turn off the screen and keep the OS running. Hence, an attacker can send a “power on” command to wake up the TV and perform the follow-up attacks.
Figure 1. Attack Scenarios for Personal Smart TVs
Relay device:
Such a relay device should satisfy the following conditions:
it can constantly sniff the victim network and actively send various types of signals (i.e., IR, BLE and Wi-Fi) to the smart TV;
it can be fully controlled by the attacker, and can send captured data to, or receive data from, the attacker's remote server.
Previous works [1, 2, 3, 4, 12, 13, 14] have shown that these requirements for a relay device are achievable, so the attacker does not need to enter the private space. For example, the attacker could gain full control of the relay device through a supply-chain attack [11] before it is placed in the victim's house, or compromise vulnerable IoT devices already in the house by exploiting unauthenticated services [15] or cracking weak passwords [16].
A generalized EvilScreen attacking process consists of three consecutive steps, as shown in Figure 2:
❶ Network isolation bypassing. The goal of the EvilScreen attack is to hijack the TV screen and perform more complicated malicious activities through the companion app over a Wi-Fi channel. Hence, an attacker (or a relay device) must first connect to the same Wi-Fi network as the victim's smart TV, because most TV manufacturers restrict the communication between a smart TV and its companion app to the same Wi-Fi network.
❷ Malicious remote control binding. After the first step, the attacker can communicate with the victim smart TV. To further control the smart TV, the attacker must construct a malicious TV-Accessory binding between the victim smart TV and the companion app on the attacker's smartphone (or a simulated one).
❸ Remote interface abusing. If the binding succeeds, the smart TV can then receive, parse, and execute commands sent from the companion app. However, to perform sensitive operations and hijack the smart TV screen, the attacker still has to bypass the permission check.
To launch an EvilScreen attack successfully, these three steps must be executed in sequence, because (i) the smart TV and the companion app can communicate with each other only after they are connected to the same Wi-Fi network; and (ii) the companion app can control the smart TV only after it has successfully bound with it. Only then can the attacker perform the third step to execute sensitive operations. Furthermore, before performing the three attack steps, a pre-analysis is necessary to determine whether a victim smart TV is vulnerable to the EvilScreen attack. Finally, because stealthiness is required, an EvilScreen attack should ultimately be launched via the Wi-Fi channel (using companion apps); otherwise, the attacker can only perform complex TV operations by moving the cursor on the screen step by step, which is easily noticed.
Figure 2. A typical flow of EvilScreen attack: it conducts a consecutive three-step attacking process to circumvent common protection measures
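The ordering constraint argued above can be captured in a small model. The sketch below is purely illustrative: the step names follow Figure 2, `StepTracker` is a hypothetical helper rather than part of any tool used in this work, and the pre-analysis phase is not modeled. A step is accepted only if its predecessor has completed, mirroring reasons (i) and (ii).

```python
from enum import IntEnum


class Step(IntEnum):
    """The three consecutive EvilScreen steps, numbered in execution order."""
    NETWORK_ISOLATION_BYPASSING = 1
    MALICIOUS_REMOTE_CONTROL_BINDING = 2
    REMOTE_INTERFACE_ABUSING = 3


class StepTracker:
    """Hypothetical helper: records completed steps and rejects out-of-order ones."""

    def __init__(self) -> None:
        self.last_completed = 0  # 0 means no step has been completed yet

    def complete(self, step: Step) -> None:
        expected = self.last_completed + 1
        if step != expected:
            # e.g., binding (step 2) is impossible before sharing the Wi-Fi (step 1)
            raise RuntimeError(f"expected step {expected}, got {step.name}")
        self.last_completed = int(step)
```

Calling `complete(Step.MALICIOUS_REMOTE_CONTROL_BINDING)` on a fresh tracker raises an error, whereas completing the steps in order 1, 2, 3 succeeds.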
Weakness
Although the Wi-Fi authentication schemes used by smart TVs are securely designed and implemented, they suffer from a major issue of authentication dependency: the Wi-Fi credentials are transmitted via other channels. More specifically, we found that Wi-Fi provisioning is commonly performed via IR/BLE communications, both of which are vulnerable. As a result, Wi-Fi provisioning for smart TVs is vulnerable to passive and active attacks (e.g., man-in-the-middle and impersonation attacks).
To bypass network isolation, the attacker can either retrieve the Wi-Fi credentials to connect a malicious remote control to the protected Wi-Fi network, or force the smart TV to reconnect to a new, malicious Wi-Fi network. To understand how to carry out such attacks, we analyze IR/BLE communications to exploit the authentication dependency.
Analysis
To detect authentication dependency, we capture IR/BLE wireless signals and analyze the authentication scheme used by each type of signal. Beforehand, we check the user guide and technical manual of each smart TV to confirm which types of wireless channels are supported. We then perform Wi-Fi provisioning. If IR or BLE is supported, we check whether it is used to distribute Wi-Fi credentials and, if so, how securely the credentials are distributed.
To analyze IR signals, we leverage existing public databases [37], [38] and mobile apps with IR remote control functions (e.g., Universal Remote Control [39]) to integrate the existing encoding and decoding methods. An IR receiver is then used to capture the IR signals sent by remote controls during network provisioning, and we decode them using the integrated decoding methods. If Wi-Fi credentials (i.e., the Wi-Fi SSID and password) are identified in the IR signals, we consider the smart TV to suffer from authentication dependency. We further verify whether any authentication mechanism is embedded in the IR channel. In particular, we build an IR simulator on top of an IR emitter [40], [41] and use it to send simulated IR signals in different encoding schemes. If the smart TV accepts these IR signals and executes the corresponding operations, we consider the Wi-Fi credential distribution to be authentication dependent on a vulnerable IR channel that cannot protect against active impersonation attacks.
To analyze BLE signals, we use a TI CC1352 Development Board [42] to sniff BLE packets. We first examine the Secure Connections (SC) bit in the BLE pairing packets. If the SC bit is 0 in either pairing packet, the devices are bound via Legacy Connection (i.e., LE Legacy Pairing), which is vulnerable to brute-force attacks [23]. When such a Legacy Connection is identified, we further use Crackle [43] to decrypt the BLE packets, learn the data formats, and check whether any credentials are included. As with the IR channel analysis, if the BLE packets contain Wi-Fi credentials, we consider the credential distribution to be authentication dependent on an insecure BLE channel, which is vulnerable to passive man-in-the-middle (MITM) and brute-force attacks. Additionally, we determine which BLE pairing mode is used by checking the MITM bit and the IO capabilities (IOCaps) [44]. If the MITM bit is 0 in both exchanged pairing feature sets, the mode is Just Works, in which no authentication mechanism is applied. In this case, we also regard the Wi-Fi provisioning procedure as authentication dependent on a BLE channel vulnerable to active impersonation attacks.
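The IR encodings used by the tested TVs are not named here. Purely as an assumption, the sketch below decodes the widely used NEC consumer-IR format from a capture given as alternating mark/space durations in microseconds, which is the kind of decoding step a receiver-side analysis performs before looking for credential characters in the key codes. The 25% timing tolerance is an arbitrary choice, and extended-NEC frames (16-bit address without an inverted copy) are rejected by the integrity check.

```python
from typing import List, Optional, Tuple

# Nominal NEC timings in microseconds; the 25% tolerance is an arbitrary choice.
LEADER_MARK_US = 9000
LEADER_SPACE_US = 4500
BIT_MARK_US = 562
ZERO_SPACE_US = 562
ONE_SPACE_US = 1687
TOLERANCE = 0.25


def near(measured: int, nominal: int) -> bool:
    """True if a measured duration is within TOLERANCE of the nominal one."""
    return abs(measured - nominal) <= nominal * TOLERANCE


def decode_nec(durations: List[int]) -> Optional[Tuple[int, int]]:
    """Decode alternating mark/space durations (starting with a mark) into
    (address, command), or return None if this is not a valid NEC frame."""
    # Leader mark + space, 32 bits of mark + space, and a trailing stop mark.
    if len(durations) < 2 + 32 * 2 + 1:
        return None
    if not (near(durations[0], LEADER_MARK_US) and near(durations[1], LEADER_SPACE_US)):
        return None
    value = 0
    for i in range(32):
        mark, space = durations[2 + 2 * i], durations[3 + 2 * i]
        if not near(mark, BIT_MARK_US):
            return None
        if near(space, ONE_SPACE_US):
            value |= 1 << i  # NEC sends the least-significant bit first
        elif not near(space, ZERO_SPACE_US):
            return None
    address, address_inv = value & 0xFF, (value >> 8) & 0xFF
    command, command_inv = (value >> 16) & 0xFF, (value >> 24) & 0xFF
    # Standard NEC repeats each byte in inverted form as an integrity check.
    if (address ^ address_inv) != 0xFF or (command ^ command_inv) != 0xFF:
        return None
    return address, command
```

Note that the frame carries no authentication field: any emitter producing these timings is accepted, which is why an IR simulator can impersonate a physical remote control.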
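The BLE checks described above can be sketched as follows, assuming the sniffer exports the raw 7-byte SMP Pairing Request (code 0x01) and Pairing Response (code 0x02) PDUs. The field layout and the AuthReq bit positions (MITM is bit 2, SC is bit 3) follow the Bluetooth Core Specification's Security Manager Protocol; the helper names are hypothetical. The association-model logic is deliberately partial: it applies the OOB rule, the "neither side set MITM" rule, and the rule that NoInputNoOutput always yields Just Works, and leaves the remaining IO-capability combinations to the specification's mapping table.

```python
from dataclasses import dataclass
from typing import Tuple

PAIRING_REQUEST, PAIRING_RESPONSE = 0x01, 0x02  # SMP command codes
AUTHREQ_MITM = 0x04  # AuthReq bit 2
AUTHREQ_SC = 0x08    # AuthReq bit 3 (Secure Connections)
IO_NO_INPUT_NO_OUTPUT = 0x03


@dataclass
class PairingFeatures:
    io_capability: int
    oob_present: bool
    mitm: bool
    secure_connections: bool


def parse_pairing_pdu(pdu: bytes) -> PairingFeatures:
    """Parse a 7-byte SMP Pairing Request/Response:
    code, IO capability, OOB flag, AuthReq, max key size, init/resp key distribution."""
    if len(pdu) != 7 or pdu[0] not in (PAIRING_REQUEST, PAIRING_RESPONSE):
        raise ValueError("not an SMP Pairing Request/Response PDU")
    auth_req = pdu[3]
    return PairingFeatures(
        io_capability=pdu[1],
        oob_present=pdu[2] == 0x01,
        mitm=bool(auth_req & AUTHREQ_MITM),
        secure_connections=bool(auth_req & AUTHREQ_SC),
    )


def classify_pairing(req: PairingFeatures, rsp: PairingFeatures) -> Tuple[str, str]:
    """Return (pairing type, association model) for one Request/Response exchange."""
    sc = req.secure_connections and rsp.secure_connections
    pairing = "LE Secure Connections" if sc else "LE Legacy Pairing"
    # OOB is used if either side has OOB data (SC) or both sides do (legacy).
    oob = (req.oob_present or rsp.oob_present) if sc else (req.oob_present and rsp.oob_present)
    if oob:
        return pairing, "Out of Band"
    if not (req.mitm or rsp.mitm):
        return pairing, "Just Works"  # neither side requested MITM protection
    if IO_NO_INPUT_NO_OUTPUT in (req.io_capability, rsp.io_capability):
        return pairing, "Just Works"
    return pairing, "determined by the IO capability mapping table"
```

A result of ("LE Legacy Pairing", "Just Works") corresponds to the worst case discussed above: the link is exposed to both passive decryption and active impersonation.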
Attack
Through the pre-analysis, we can determine whether a smart TV suffers from authentication dependency and learn the encoding schemes and protocol formats used in its IR and BLE communications. We can then launch either passive or active attacks to compromise the smart TV. When Wi-Fi credentials are identified in vulnerable IR communications or in BLE communications using Legacy Connection, we use the extracted credentials to connect a malicious device to the same Wi-Fi network. Alternatively, if the Wi-Fi provisioning procedure is authentication dependent on a vulnerable IR or BLE channel that lacks any authentication mechanism against active impersonation attacks, we impersonate a legitimate user to compromise the smart TV: by connecting a malicious remote control to the TV, we reuse the encoding methods or protocol message formats to force the TV to reconnect to a new, malicious Wi-Fi network.
Weakness
For efficiency and usability reasons, manufacturers usually deploy lightweight, easy-to-understand, but insecure binding mechanisms. Binding a smart TV with a remote control generally requires manual attestation by the user. However, most smart TVs only display a device name or a binding token on the screen. This information is insufficient for users to tell whether the request was sent from a legitimate remote control or a malicious one; thus, the binding mechanisms are vulnerable to impersonation attacks.
Specifically, we consider two types of vulnerable binding mechanisms.
Insufficient binding information. The binding information that a smart TV presents to the user should be sufficient to distinguish between two remote controls. To analyze this issue, we send binding requests with different identifiers and then perform UI differential analysis on the displayed binding information. The binding authentication is vulnerable if the displayed information is constant or only limited device information is provided. For example, some smart TVs adopt a binding mechanism that only displays the name of the remote control (i.e., the physical remote control or the companion app). This is insufficient for user authentication, because an attacker can easily set the device name to that of the legitimate remote control and trick the user into confirming a malicious binding.
Insecure binding tokens. Some smart TVs use binding tokens as authentication credentials for remote control binding. For example, the smart TV may display four digits on the screen and require the user to enter them in the companion app. In a special case, the attacker can directly obtain the binding token if he can watch the TV screen or if the relay device is a camera, but we do not make such assumptions in our attacker model. To make our attack more general, we consider three types of ill-implemented binding tokens:
Some vendors embed a fixed binding token in the companion app. As a result, it is easy for an attacker to extract the token by reverse engineering the app.
Some smart TVs, while displaying a binding token (e.g., 4 digits) on the screen, simultaneously broadcast the token in plaintext via an out-of-band channel such as BLE. The attacker can then monitor the IR and BLE channels and intercept the communication packets to obtain the token.
Some binding tokens are not implemented securely, i.e., they lack sufficient entropy to resist brute-force attacks. If a binding token is found to be weak in this way, the attacker can recover it by brute force.
Hence, we first investigate what binding information is displayed on the screen, and then analyze the transmission channels to determine whether the binding information is verified by the smart TV.
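To make "not enough entropy" concrete, the sketch below computes the size of a token space, its entropy in bits, and the brute-force effort. It assumes tokens are drawn uniformly at random and that the TV neither rate-limits nor rotates the token between attempts; the guessing rate used in the usage note is a hypothetical figure, not a measurement.

```python
import math


def token_space(alphabet_size: int, length: int) -> int:
    """Number of distinct tokens of the given length over the alphabet."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random token, in bits."""
    return length * math.log2(alphabet_size)


def expected_guesses(alphabet_size: int, length: int) -> float:
    """Average guesses when trying each candidate once in random order: (N + 1) / 2."""
    return (token_space(alphabet_size, length) + 1) / 2


def worst_case_seconds(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time to exhaust the whole token space at a fixed guessing rate."""
    return token_space(alphabet_size, length) / guesses_per_second
```

For the 4-digit example above, `token_space(10, 4)` is 10,000 (about 13.3 bits), the expected number of guesses is 5,000.5, and at a hypothetical 10 guesses per second the whole space is exhausted in 1,000 seconds, i.e., under 17 minutes.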
Analysis
To analyze the protection schemes of remote control binding, we perform UI differential analysis to investigate the binding information used by the smart TV and how the binding request is formed. First, we manually use different smart remote controls with unique identifiers (e.g., serial numbers and MAC addresses) to send binding requests to each smart TV. We then record the binding information displayed on the screen for each remote control. Figure 3 shows two typical UIs displayed during remote control binding. By comparing the information generated for different remote controls, we examine whether the displayed information is invariant. The binding authentication is regarded as vulnerable if the displayed information is constant or only limited device information is provided.
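The comparison step of the UI differential analysis can be sketched as below, assuming each binding prompt has already been transcribed into named fields (the field names are hypothetical). The sketch flags a prompt as vulnerable when the only fields that vary across remote controls are ones the requester chooses itself; which fields count as "spoofable" is an assumption encoded in `SPOOFABLE_FIELDS`, not a rule stated in this work.

```python
from typing import Dict, Set

# Hypothetical: prompt fields whose values the requesting accessory chooses
# itself, and which an attacker can therefore copy from a legitimate device.
SPOOFABLE_FIELDS: Set[str] = {"device_name"}


def differing_fields(observations: Dict[str, Dict[str, str]]) -> Set[str]:
    """Fields whose displayed value changes across the tested remote controls."""
    screens = list(observations.values())
    keys = set().union(*screens)  # union of all field names seen on any prompt
    return {k for k in keys if len({s.get(k) for s in screens}) > 1}


def binding_prompt_vulnerable(observations: Dict[str, Dict[str, str]]) -> bool:
    """observations maps a remote-control identifier (e.g., a MAC address)
    to the fields the TV displayed for that remote's binding request."""
    if len(observations) < 2:
        raise ValueError("UI differential analysis needs at least two remote controls")
    # A constant prompt yields an empty difference, which is also flagged.
    return differing_fields(observations) <= SPOOFABLE_FIELDS
```

A prompt that shows only "Allow connection from <name>?" is flagged, while one that also shows a value the requester cannot choose would pass this particular check.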
Figure 3. Two typical UI cases when binding
Attack
After obtaining the binding information, we modify the binding request to connect a malicious remote control to the smart TV, as shown in Figure 4. First, we check whether the binding request is properly validated by the smart TV. By monitoring network traffic, we intercept communication packets to study the packet format. If the binding request is transmitted over an insecure communication channel, we can directly analyze the packet format, replace the authentication-related fields containing binding information (e.g., “username”, “password”, “device name”) with the values of the legitimate remote control, and send the modified packets to the smart TV to request binding. For binding requests protected by SSL/TLS, we use Burp Suite [45] to retrieve the binding request format and then replace the authentication-related fields with the legitimate values; the forged request is then sent from a fake client. If the smart TV accepts the request and displays indistinguishable binding information on the screen, we consider the binding mechanism vulnerable, and a malicious remote control binding can be established.
Some smart TVs rely on binding tokens to protect against impersonation attacks. Nonetheless, such a binding scheme is still vulnerable because the token is not well protected. In particular, a binding token may be broadcast to the remote control or embedded in the companion app by default. If the token is broadcast, we intercept the communication packets to obtain it; otherwise, we reverse engineer the companion app to retrieve it. If the binding token lacks sufficient entropy, it can be obtained by brute force.
Figure 4. Malicious remote control binding procedure
Weakness
As a smart TV stores a variety of sensitive resources (e.g., system settings, media files, user configurations), it checks access permissions to prevent arbitrary resource access. However, unlike personally owned devices (e.g., smartphones, laptops), smart TVs commonly apply coarse-grained access control against unauthorized access, because they are commonly shared by a group of people, such as a family (private use) or customers (public use). Such an authorization scheme has permission-check weaknesses that an attacker can exploit to access resources without the corresponding permissions. Considering the usage purposes and scenarios of smart TVs, we believe that smart TVs should be developed with restrictive, fine-grained access control [46]. That is, users connecting to the smart TV should not be able to access all resources, but only be allowed to use basic functionalities; users should be able to perform sensitive operations only after the TV owner grants them the proper permissions.
To check whether permissions are properly granted, we investigate the protocols for remote interactions and forge commands to access resources that we are not authorized to access. Note that we focus on analyzing Android companion apps, because most users operate the smart features of their TVs (e.g., screencast and screenshot) through their smartphones.
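The restrictive, fine-grained policy advocated above can be sketched as a small permission model. Everything here is hypothetical: the permission names mirror the advanced operations discussed earlier, and the owner is identified by a plain string, whereas a real TV would need an authenticated owner channel. The key property is that binding alone grants only basic control, and only the owner can add sensitive permissions.

```python
from enum import Enum, auto
from typing import Dict, Set


class Permission(Enum):
    BASIC_CONTROL = auto()  # power, volume, channel switching
    SCREENCAST = auto()
    SCREENSHOT = auto()
    INSTALL_APP = auto()


DEFAULT_PERMISSIONS = frozenset({Permission.BASIC_CONTROL})


class AccessController:
    """Hypothetical per-accessory permission store with owner-only grants."""

    def __init__(self, owner_id: str) -> None:
        self.owner_id = owner_id
        # The owner starts with every permission.
        self.grants: Dict[str, Set[Permission]] = {owner_id: set(Permission)}

    def bind(self, accessory_id: str) -> None:
        # A newly bound accessory receives only the basic permissions.
        self.grants.setdefault(accessory_id, set(DEFAULT_PERMISSIONS))

    def grant(self, granter_id: str, accessory_id: str, perm: Permission) -> None:
        if granter_id != self.owner_id:
            raise PermissionError("only the TV owner may grant permissions")
        if accessory_id not in self.grants:
            raise KeyError(f"accessory {accessory_id!r} is not bound")
        self.grants[accessory_id].add(perm)

    def check(self, accessory_id: str, perm: Permission) -> bool:
        return perm in self.grants.get(accessory_id, set())
```

Under this model, a malicious remote control that succeeds in binding still fails the check for `SCREENSHOT` or `INSTALL_APP` until the owner explicitly grants it.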
Analysis
To study the protocols for remote interactions, we first capture the network traffic between the smart TV and its companion app using tcpdump [47]. By analyzing the network packets with Wireshark [48], we identify the communication protocol (e.g., MQTT, HTTP, or private application-layer protocols over TCP or UDP) used to transmit control commands. Based on the communication protocol, we determine which standard APIs [49], [50] are used for sending and receiving data. For example, the API <java.net.Socket: java.io.OutputStream getOutputStream()> is used for TCP communication, while <java.net.DatagramSocket: void send(java.net.DatagramPacket)> is used for UDP.
After retrieving the network configurations, we recover the remote interaction protocols and identify the authentication fields. Specifically, we use JEB [51] and IDA Pro [52] to reverse engineer the companion app. Starting from each argument of the identified network APIs, we perform backward program slicing to identify the variables on which the argument is directly or indirectly data-dependent, and thereby determine how the argument is constructed. Given the resulting Data Dependence Graph (DDG), we further identify the authentication variables related to access control. If a variable is assigned a constant, a value generated by a pseudo-random number generator, or a timestamp, we consider it irrelevant to access control, since such values carry no identity information. Otherwise, when a variable is assigned a value related to the smart TV (e.g., binding credentials) or the return value of a storage-read function (e.g., <android.content.Context: java.io.FileInputStream openFileInput(java.lang.String)>), we conclude that the variable is relevant to access control.
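Once the transport is known, the matching sink API has to be located in the app. The sketch below assumes the companion app has been converted to Jimple (e.g., with Soot), which is a swapped-in representation since this work uses JEB and IDA Pro, and simply searches the listing for calls to the two signatures quoted above, returning the receiver and argument list of each call site. For UDP, the `send` argument is where backward slicing would start; for TCP, the writes on the returned stream would have to be tracked instead, which this sketch does not do.

```python
import re
from typing import Dict, List, Tuple

# Soot-style sink signatures quoted in the text, keyed by transport.
SINKS: Dict[str, str] = {
    "tcp": "<java.net.Socket: java.io.OutputStream getOutputStream()>",
    "udp": "<java.net.DatagramSocket: void send(java.net.DatagramPacket)>",
}


def find_sink_calls(jimple: str, transport: str) -> List[Tuple[int, str, str]]:
    """Return (line number, receiver, argument list) for each call to the sink."""
    # Matches e.g. "$r3.<java.net.DatagramSocket: void send(...)>($r4)".
    pattern = re.compile(r"(\$?\w+)\." + re.escape(SINKS[transport]) + r"\(([^)]*)\)")
    hits = []
    for lineno, line in enumerate(jimple.splitlines(), start=1):
        match = pattern.search(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits
```
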
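The slicing-and-classification step can be illustrated on a toy data dependence graph. In the sketch below, each variable maps to a hypothetical `Definition` recording where its value comes from and which variables it is built from; a worklist traversal collects the backward slice of a sink argument, and the variables whose values originate from identity-bearing sources are reported. The source labels are assumptions standing in for the categories described above: constants, PRNG output, and timestamps are excluded simply by not being listed as relevant.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical labels for identity-bearing sources (binding credentials,
# values read back from storage); every other kind is treated as irrelevant.
RELEVANT_SOURCES = {"binding_credential", "storage_read"}


@dataclass
class Definition:
    kind: str  # "derived" for computed values, otherwise a source kind
    uses: List[str] = field(default_factory=list)  # variables this value is built from


def backward_slice(ddg: Dict[str, Definition], start: str) -> Set[str]:
    """All variables that `start` is directly or indirectly data-dependent on (plus itself)."""
    seen = {start}
    worklist = deque([start])
    while worklist:
        for dep in ddg[worklist.popleft()].uses:
            if dep not in seen:
                seen.add(dep)
                worklist.append(dep)
    return seen


def access_control_sources(ddg: Dict[str, Definition], sink_arg: str) -> Set[str]:
    """Variables in the slice whose values come from identity-bearing sources."""
    return {v for v in backward_slice(ddg, sink_arg) if ddg[v].kind in RELEVANT_SOURCES}
```

For a payload built from a token (itself read from storage), a random nonce, and a timestamp, only the storage-read variable is reported; a credential that never flows into the payload is not in the slice and is ignored.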
Attack
Given the variables related to access control, we launch remote interface abusing attacks to execute sensitive operations such as screencast and screenshot. First, we determine how the operation commands are formed, and then modify the values of these variables using the dynamic instrumentation framework Frida [53]. We then send the modified commands via a malicious device. If the smart TV executes them without any warning and we can access unauthorized resources arbitrarily, we regard the remote interaction access control of the smart TV as vulnerable. For example, the attacker can send a screenshot command to the smart TV while a user is using it, so that the content displayed on the TV is captured and leaked.
To investigate whether these protections are securely implemented in real-world smart TVs, we tested eight smart TVs from different well-known manufacturers [54], i.e., Samsung, TCL, Hisense, Xiaomi, Sony, Skyworth, LeTV, and Konka, from China, Japan, Korea, and the United States. Shipments of smart TVs from these manufacturers range from 7 million (Konka) to 48 million (Samsung).
All eight smart TVs run smart TV OSes and come with remote controls. Details about each smart TV are listed in Table 1. Most manufacturers customize their smart TV OSes on top of Android TV [9]. Since Android TV and Samsung TV are two widely used smart TV platforms that dominate the global market [55], we focus on them in our experiments.
We list the technical details of each smart TV and its remote control in Table 2.
Attacks on smart TVs can result in serious security and privacy issues. We discuss two of them below.
Illegal content display
Most smart TVs support screencast functions, which display content from a smartphone on the TV screen for multiple viewers. However, if a smart TV screen is hijacked, the attacker can cast any illegal content (e.g., bloody or violent videos) to the TV screens of chosen victims, which may harm viewers, especially in families with children. Moreover, if the attacker is a profiteer or a malicious merchant, he/she could cast offensive content on victims' screens to denigrate competing products, or display his/her own product advertisements to potential customers for profit.
Leakage of private data
The screenshot function is supported by many smart TVs. With this function, a user can monitor what the TV is displaying on the TV screen via the companion app even if the user is not in front of the TV. Unfortunately, this could lead to a serious threat to users’ privacy. If an attacker can hijack one’s TV screen and monitor the contents watched by the user, the attacker will be able to learn information about the user’s interests, political orientation and personality [33], [34]. What’s worse, since more and more people working at home frequently cast their computer screens on the smart TV during meetings to have an enhanced viewing experience, confidential company information or secret documents may be leaked via such a TV attack. Besides, some smart TVs also allow users to sign into the TV apps or pay for subscriptions, such as YouTube [35] and Disney Plus [36]. If the attacker takes a screenshot when the user is inputting her account information or payment information, the user will not only suffer from personal privacy breaches but also from economic losses.
Our attack has to ultimately utilize the companion app (or a simulated one) to perform sensitive operations and achieve its malicious goals (Section 4, Page 5). The reason is that the functionalities provided by a physical IR/BLE remote control are very basic and limited. Such a remote control can be only used to execute several simple operations like screen cursor moving, channel switching, power off/on and volume up/down. In comparison, a smart TV companion app can provide more complicated and advanced functionalities, like app installation, screencast and screenshot.
Although an attacker would be likely to perform an EvilScreen attack by controlling the screen cursor with physical remote controls to drive a sequence of actions step by step, such a process will take longer time and more operation steps, which would be easily noticed by the user. Therefore, the attack has to utilize the companion apps to remain stealthily.
For example, with companion app as a virtual remote control, when a user wants to install an app on her smart TV, she could select one TV app (e.g., options in the companion app or local apk files in the smartphone) and then the companion app would directly send a "installapp" command with the app information (e.g., package name, app download link, or the whole apk file) to the smart TV, which will parse this command and perform the app installation operation. In this case, if the attacker wants to install a malicious app on a smart TV, with companion apps, he could directly send a “installapp” command but replace the app information with the malicious one. If the request is not protected well or the smart TV does not securely check the user identity, the malicious app would be installed on the smart TV. However, without the companion app, an attacker would have to take more steps with physical IR/BLE remote controls to move the cursor to enter into the TV app store, search the app and install it. The attack procedure would take more time, thus greatly increasing the risk of being noticed by the user, or not even being completed (if the malicious app is not included in the TV app store).
It is noteworthy that a recent study [2] has demonstrated a less powerful attack model if the attacker only utilizes IR signals: instead of directly control the smart TV, the proposed IR based attack is mainly used to collect the leakage of the IR communication pattern and finally lead to privacy issues such as inferring users' interests, activities, and account information. Our EvilScreen attack advances previous works by combining less powerful and yet not-well-protected IR and BLE signals and the powerful Wi-Fi based remote control (including the companion app) operations to fulfill a more severe attack against many smart TVs, which would finally lead to privacy issues such as inferring viewers' interests, activities, and account information.
The smart TV access control discussed in this paper is meant in a narrow sense (Section 4.4, Page 7). Taking both usability and security into account, we believe that not every user who connects to the smart TV should be able to access all resources; such users should only be able to execute basic actions. In general, smart TV users should be able to perform sensitive operations only after receiving the proper permissions from an authenticated user. In this way, even if the attacker can bind with the smart TV and interact with it, he is still unable to perform sensitive operations without the authenticated user's permission.
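The narrow-sense policy above can be sketched as follows, assuming a hypothetical split of TV operations into basic and sensitive sets and an owner who is already authenticated. The operation names, `AccessController`, and its methods are illustrative placeholders, not the mechanism of any particular smart TV.

```python
from enum import Enum


class Op(Enum):
    # Hypothetical operations; the basic/sensitive split below is illustrative.
    VOLUME_UP = "volume_up"
    CHANGE_CHANNEL = "change_channel"
    INSTALL_APP = "install_app"
    SCREEN_MIRROR = "screen_mirror"


BASIC_OPS = frozenset({Op.VOLUME_UP, Op.CHANGE_CHANNEL})
SENSITIVE_OPS = frozenset({Op.INSTALL_APP, Op.SCREEN_MIRROR})


class AccessController:
    """Narrow-sense access control: binding alone grants only basic actions."""

    def __init__(self, owner: str) -> None:
        self.owner = owner                                 # assumed authenticated user
        self.bound: set[str] = {owner}                     # devices bound to the TV
        self.grants: dict[str, set[Op]] = {owner: set(SENSITIVE_OPS)}

    def bind(self, device: str) -> None:
        # Binding lets a device interact with the TV but grants nothing sensitive.
        self.bound.add(device)
        self.grants.setdefault(device, set())

    def grant(self, granter: str, device: str, op: Op) -> None:
        # Only the authenticated owner may hand out sensitive permissions.
        if granter != self.owner:
            raise PermissionError(f"{granter} cannot grant {op.value}")
        self.grants.setdefault(device, set()).add(op)

    def allowed(self, device: str, op: Op) -> bool:
        if device not in self.bound:
            return False                                   # unbound devices get nothing
        if op in BASIC_OPS:
            return True                                    # any bound device may do these
        return op in self.grants.get(device, set())        # sensitive ops need a grant


acl = AccessController(owner="owner-phone")
acl.bind("attacker-phone")
print(acl.allowed("attacker-phone", Op.VOLUME_UP))    # True
print(acl.allowed("attacker-phone", Op.INSTALL_APP))  # False
```

In this sketch, binding gives an attacker's phone basic control (e.g., volume) but not app installation, which stays locked until the authenticated owner explicitly grants it.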
This hypothesis is acceptable but not necessary (Section 3, Page 3), since our aim is to construct a more general smart TV attack. If an attacker can walk into the victim's house, launching our attack becomes much easier. However, the attack steps and results are similar regardless of whether or not the attacker can physically access the smart TV.
If the attacker can walk into the victim's house (for instance, as a technician), he would be able to physically access the smart TV and its remote control and to observe the content (e.g., a binding token) displayed on the TV screen. The attacker could then easily launch the EvilScreen attack by connecting to the same Wi-Fi network as the smart TV, binding the companion app (logged in with his account) to the smart TV, performing sensitive operations, and finally hijacking the TV screen. However, only the first two steps are simplified in this case; a relay device is still necessary if the attacker wants to launch attacks remotely after leaving the house.
In many situations, however, the attacker would not be able to enter the victim’s house. Hence, we consider a more general method to launch our attack, that is, using a relay device. In our attack model, we assume that the attacker can place a malicious device in the victim's home (e.g., by sending it as a gift), or compromise a vulnerable device already inside the home, and then use it as a relay device to launch the follow-up attacks (Section 3.1.2, Page 4). Such a relay device should satisfy two main requirements:
It is able to constantly sniff the victim's network and actively send various types of signals (i.e., IR, BLE, and Wi-Fi) to the smart TV.
It can be fully controlled by the attacker, and is able to send captured data to the remote server (i.e., the attacker) and receive data from the attacker.
With such a relay device, the attacker would be able to communicate and interact with the smart TV remotely, thus launching the attacks remotely.
Previous work [1, 2, 3, 4, 12, 13, 14] has shown that these requirements for a relay device can be satisfied, so the attacker would be able to perform the attacks remotely. For example, the attacker could gain full control of relay devices before they are placed in the victim's house through a supply chain attack [11], or compromise vulnerable IoT devices already in the home by exploiting unauthenticated services [15] or cracking weak passwords [16].
Another factor that makes a remote EvilScreen attack feasible is that most smart TVs do not actually power themselves off. Instead, when the user sends a “power off” command to a smart TV, the smart TV only turns off the screen and keeps its OS running. Therefore, an attacker could simply send a “power on” command to wake up the smart TV and launch the follow-up attacks.
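The soft power-off behavior can be illustrated with a small state model. `SmartTVPower` and its `soft_off` flag are hypothetical; the model only covers commands handled by the running TV OS (e.g., over Wi-Fi) and ignores any standby circuitry that may still receive IR signals when the OS is stopped.

```python
class SmartTVPower:
    """Toy model of a TV's response to power commands handled by its OS."""

    def __init__(self, soft_off: bool = True) -> None:
        self.soft_off = soft_off   # True: "power off" only blanks the screen
        self.screen_on = True
        self.os_running = True     # the OS that listens for remote commands

    def handle(self, command: str) -> bool:
        """Apply a command; return False if no running OS received it."""
        if not self.os_running:
            return False           # nothing is listening, so the command is lost
        if command == "power off":
            self.screen_on = False
            if not self.soft_off:
                self.os_running = False  # a true shutdown also stops the OS
        elif command == "power on":
            self.screen_on = True
        else:
            raise ValueError(f"unknown command: {command!r}")
        return True


soft = SmartTVPower()
soft.handle("power off")
print(soft.screen_on, soft.os_running)  # False True
print(soft.handle("power on"))          # True: the screen wakes up

hard = SmartTVPower(soft_off=False)
hard.handle("power off")
print(hard.handle("power on"))          # False: nothing receives the command
```

The contrast between the two instances shows why the soft-off default matters: only when the OS keeps running can a later “power on” command reach it and wake the screen.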
We focus on Android TV and Samsung TV in this paper because they are the two most popular smart TV platforms worldwide and dominate the global market [10]. However, our analysis methods and attack steps are generic. For smart TVs with other OSes, e.g., Apple OS, we can apply the same pre-analysis to detect whether they contain EvilScreen vulnerabilities. If all three protections of a smart TV are incorrectly implemented, we consider the TV vulnerable to the EvilScreen attack.
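The decision rule in the last sentence is a conjunction: a TV is flagged only when every protection fails, not when any single one does. A minimal sketch, assuming the pre-analysis yields one pass/fail result per protection; the function name and the placeholder protection names are hypothetical and do not correspond to the protections as named in the paper.

```python
from collections.abc import Mapping


def is_evilscreen_vulnerable(protections: Mapping[str, bool]) -> bool:
    """Flag a TV only if all three protections are incorrectly implemented.

    `protections` maps a placeholder protection name to whether the
    pre-analysis found that protection correctly implemented.
    """
    if len(protections) != 3:
        raise ValueError("expected results for exactly three protections")
    return not any(protections.values())


print(is_evilscreen_vulnerable(
    {"protection_1": False, "protection_2": False, "protection_3": False}))  # True
print(is_evilscreen_vulnerable(
    {"protection_1": False, "protection_2": True, "protection_3": False}))   # False
```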