
2014 CSE Abstracts

Troubleshooting Quality Of Experience For Cellular Networks

Authors: Faraz Ahmed; Alex X. Liu; He Yan; Ziuhi Ge; Jeffrey Erman; Jia Wang

Abstract: A user's quality of experience (QoE) is one of the main factors that determine the reputation of cellular network operators. In addition to the performance of cellular network nodes, the hardware and software performance of different device types and applications also largely defines a user's QoE over the cellular network. This project addresses the problem of maintaining the QoE of cellular network users through proactive detection of service degradation issues. Although cellular network providers use existing end-to-end service quality management systems to detect issues inside the network, under certain conditions issues affecting the QoE of a group of customers may go undetected. These conditions may arise from problems in different dimensions such as mobile devices, applications, websites, and network nodes. We analyze aggregated TCP flow data across different geographical regions over a period of six weeks to develop a regression-based model for estimating the network performance perceived by groups of users. Our analysis shows that specific user groups experience significantly different performance from that perceived by most users because of their association with a particular device type, application, website, network node, or a combination of these. We design a holistic performance monitoring system to detect and localize issues that cause service degradation for groups of customers sharing one or more of the above-mentioned dimensions. We show that a recursive rule mining approach not only reduces the overall training error but also significantly decreases false positives.


Effect Of Decoder On Bit Error Rate

Authors: Alireza Ameli Renani; Jun Huang; Guoliang Xing; Abdol-Hossein Esfahanian

Abstract: Today, wireless communications suffer from high transmission error rates, especially at high data rates. It was long believed that channel conditions are responsible for most of these errors, but recent research has shown that there are patterns in the bit error rate that are not caused by channel conditions. It turns out that the bit error pattern has a fluctuating nature that can potentially be used to improve the accuracy and throughput of the system in applications such as video streaming.

Our research has validated the existence of such a pattern in different environments and across different devices. Further, we have demonstrated that this behavior depends mainly on the transmission rate. It appears that the pattern is caused by the decoder in the receiver. Unfortunately, the decoder code is not publicly available, so we cannot determine which part of the algorithm causes this behavior.

One major advantage of the bit error pattern is that it is known prior to transmission, so no handshake or synchronization is required in order to use it. To utilize our findings, we formulated the pattern and used it in video streaming. Our preliminary simulation results show more than 15% improvement in throughput from utilizing the pattern. We expect further improvement in the next set of experiments.


3D Fingerprint Phantoms

Authors: Sunpreet S. Arora; Kai Cao; Anil K. Jain; Nicholas G. Paulter, Jr.

Abstract: One of the critical factors prior to deployment of any large-scale biometric system is to have a realistic estimate of its matching performance. In practice, evaluations are conducted on the operational data to set an appropriate threshold on match scores before the actual deployment. These performance estimates, though, are restricted by the amount of available test data. To overcome this limitation, the use of a large number of 2D synthetic fingerprints for evaluating fingerprint systems has been proposed. However, the utility of 2D synthetic fingerprints is limited in the context of testing end-to-end fingerprint systems, which involve the entire matching process from image acquisition to feature extraction and matching. For a comprehensive evaluation of fingerprint systems, we propose creating 3D fingerprint phantoms (phantoms or imaging phantoms are specially designed objects with known properties that are scanned or imaged to evaluate, analyze, and tune the performance of various imaging devices) with known characteristics (e.g., type, singular points, and minutiae) by (i) projecting 2D synthetic fingerprints with known characteristics onto a generic 3D finger surface and (ii) printing the 3D fingerprint phantoms using a commodity 3D printer. Experimental results show that images of the 3D fingerprint phantom captured using state-of-the-art fingerprint sensors can be successfully matched to the 2D synthetic fingerprint images (from which the phantom was generated) using a commercial fingerprint matcher. This demonstrates that our method preserves the ridges and valleys during the 3D fingerprint phantom creation process, ensuring that the synthesized 3D phantoms can be utilized for comprehensive evaluations of fingerprint systems.

This work was supported in part by the National Institute of Standards and Technology (NIST).


Face Recognition: Identifying A Person Of Interest

Authors: Lacey Best-Rowden; Hu Han; Charles Otto; Anil K. Jain

Abstract: As face recognition applications progress from constrained and controlled scenarios (e.g., driver's license photos) to unconstrained and uncontrolled scenarios (e.g., video surveillance), new challenges are encountered, ranging from illumination, image resolution, and background clutter to facial pose, expression, and occlusion. In forensic investigations where the goal is to identify a "person of interest" based on low-quality evidence, we need to utilize whatever information is available about the suspect. This could include one or more video sequences, multiple still images captured by bystanders, and descriptions of the suspect provided by witnesses. The description of the suspect could lead to the drawing of a facial sketch and provide some ancillary information about the suspect (age, gender, race, scars, marks, and tattoos). While traditional face matching methods take a single medium (a still face image, video track, or face sketch) as input, our research considers the entire media collection as a probe or query to generate a single candidate list for the person of interest. We show that our approach boosts the likelihood of forensic identification through the use of different fusion schemes, three-dimensional face models, and incorporation of quality measures for fusion and video frame selection.

This work was supported in part by a National Physical Science Consortium Graduate Fellowship.


Multi-Kernel Multi-Label Ranking

Authors: Serhat S. Bucak; Anil K. Jain

Abstract: Recent studies have shown that multiple kernel learning is very effective for image classification, leading to the popularity of kernel learning in computer vision problems. In this work, we formulate image classification as a multi-label learning problem and develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. In addition, we address multi-label learning with many classes via a ranking approach, termed multi-label ranking. Given a test image, the proposed scheme aims to order all the object classes such that the relevant classes are ranked higher than the irrelevant ones. We propose a wrapper approach that learns the ranking functions and optimal linear combination of base kernels simultaneously. Our experiments on ESP Game and MIRFlickr image datasets demonstrate the superior performance of the proposed multi-kernel multi-label ranking approach for image classification.


Automatic Facial Makeup Detection With Application In Face Recognition

Authors: Cunjian Chen; Antitza Dantcheva; Arun Ross

Abstract: Facial makeup has the ability to alter the appearance of a person. Such an alteration can degrade the accuracy of automated face recognition systems, as well as that of methods estimating age and beauty from faces. In this work, we design a method to automatically detect the presence of makeup in face images. The proposed algorithm extracts a feature vector that captures the shape, texture and color characteristics of the input face, and employs a classifier to determine the presence or absence of makeup. Besides extracting features from the entire face, the algorithm also considers portions of the face pertaining to the left eye, right eye, and mouth. Experiments on two datasets consisting of 151 subjects (600 images) and 125 subjects (154 images), respectively, suggest that makeup detection rates of up to 93.5% (at a false positive rate of 1%) can be obtained using the proposed approach. Further, an adaptive pre-processing scheme that exploits knowledge of the presence or absence of facial makeup to improve the matching accuracy of a face matcher is presented.

This work was supported in part by the NSF Center for Identification Technology Research (CITeR).


Identifying Transcription Start Sites And Transcription End Sites Of MiRNAs In C. elegans

Authors: Jiao Chen; Yanni Sun

Abstract: MiRNAs are crucial small non-coding RNAs that regulate gene expression in the growth period of C. elegans. Transcriptional regulation of miRNAs is critical because it directly affects miRNA-mediated gene regulatory networks. However, the transcription start sites (TSSs) and transcription termination sites (TTSs) of most miRNA genes have not been characterized because pri-miRNAs are quickly spliced in cells. Here, we performed a whole-genome analysis of DNA sequence, chromatin signatures, and Polymerase II occupancy surrounding intergenic miRNAs in the C. elegans genome to identify their TSSs and TTSs. Our results will improve the understanding of the regulation of miRNAs.


Visual Diagram Interpretation For Blind Programmers

Authors: Sarah Coburn; Charles Owen

Abstract: Computer Science education frequently demonstrates program structure through the use of visual diagrams (such as UML diagrams), which are largely inaccessible to blind programmers. These diagrams show information such as relationships between objects in the program, using lines to connect shapes, with the types of shapes indicating the types of objects and relationships. This heavy reliance on visual cues (such as peripheral information and complicated connections between objects) is a hurdle that blind programmers must overcome in order to succeed academically. Some programs exist to translate UML diagrams into a format readable by blind programmers, but they frequently do not communicate all of the essential information accurately and efficiently. We developed a program that automatically interprets UML diagrams into an auditory format. Information is conveyed using a combination of audio tones and text-to-speech audio presented in stereo to help convey location. We will also discuss the information that any diagram translator should include, and the future directions of this research.


A Difference Resolution Approach To Compressing Access Control Lists

Authors: James Daly; Alex X. Liu; Eric Torng

Abstract: Access Control Lists (ACLs) are the core of many networking and security devices. As new threats and vulnerabilities emerge, ACLs on routers and firewalls are getting larger. Therefore, compressing ACLs is an important problem. We present a new approach, called Diplomat, to ACL compression. The key idea is to transform higher-dimensional target patterns into lower-dimensional patterns by dividing the original pattern into a series of hyperplanes and then resolving differences between two adjacent hyperplanes by adding rules that specify the differences. This approach is fundamentally different from prior ACL compression algorithms and is shown to be very effective. We implemented Diplomat and conducted a side-by-side comparison with the prior Firewall Compressor, TCAM Razor, and ACL Compressor algorithms on real-life classifiers. Our experimental results show that Diplomat outperforms all of them on most of our real-life classifiers, often by a considerable margin, particularly as classifier size and complexity increase. In particular, on our largest ACLs, Diplomat has an average improvement ratio of 30.6% over Firewall Compressor on range-ACLs, 12.1% over TCAM Razor on prefix-ACLs, and 9.4% over ACL Compressor on mixed-ACLs.

This work was supported in part by National Science Foundation Grant No. CNS-0916044.


iSleep: Unobtrusive Sleep Quality Monitoring Using Smartphones

Authors: Tian Hao; Guoliang Xing; Gang Zhou

Abstract: The quality of sleep is an important factor in maintaining a healthy lifestyle. To date, technology has not enabled personalized, in-place sleep quality monitoring and analysis. Current sleep monitoring systems are often difficult to use and hence limited to sleep clinics, or invasive to users, e.g., requiring users to wear a device during sleep.

iSleep is a practical system to monitor an individual's sleep quality using an off-the-shelf smartphone. It uses the built-in microphone of the smartphone to detect events that are closely related to sleep quality, including body movement, coughing, and snoring, and infers quantitative measures of sleep quality. By providing a fine-grained sleep profile that depicts details of sleep-related events, iSleep allows the user to track sleep efficiency over time and relate irregular sleep patterns to possible causes.

This work was supported in part by the NSF under grants CNS-0954039 (CAREER), CNS-1250180, and ECCS-0901437.


WiFi-BA: Choosing Arbitration Over Backoff In High Speed Multicarrier Wireless Networks

Authors: Pei Huang; Xi Yang; Li Xiao

Abstract: Advancements in wireless communication techniques have increased the wireless physical layer (PHY) data rates by hundreds of times in a dozen years. The high PHY data rates, however, have not been translated into commensurate throughput gains due to overheads incurred by medium access control (MAC) and the PHY convergence procedure. At high PHY data rates, the time used for collision avoidance (CA) at the MAC layer and the time used for the PHY convergence procedure can easily exceed the time used for transmission of an actual data frame. As collision detection (CD) in wireless communication has recently become feasible, some protocols migrate random backoff from the time domain to the frequency domain, but they fail to address the high collision probability this introduces. We investigate the practical issues of CD in the frequency domain and introduce a binary mapping scheme to reduce the collision probability. Based on the binary mapping, a bitwise arbitration (BA) mechanism is devised to grant only one transmitter the permission to initiate data transmission in a contention. With the low collision probability achieved in a short, bounded arbitration phase, the throughput is significantly improved. Because collisions are unlikely to happen, unfairness caused by the capture effect of radios is also reduced. The bitwise arbitration mechanism can further be set to let high-priority messages get through unimpeded, making WiFi-BA suitable for real-time prioritized communication. We validate the effectiveness of WiFi-BA through an implementation on the FPGA of a USRP E110. Performance evaluation demonstrates that WiFi-BA is more efficient than current Wi-Fi solutions.
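
The general idea of bitwise arbitration can be illustrated with a minimal sketch. It assumes, as on wired buses such as CAN, that contenders announce a binary identifier one bit at a time and that one bit value is dominant. The function name `bitwise_arbitrate`, the choice of 1 as the dominant bit, and the fixed identifier width are illustrative assumptions, not details taken from WiFi-BA.

```python
def bitwise_arbitrate(ids, width):
    """Return the single winning ID among distinct contenders.

    Assumption (hypothetical, for illustration only): a 1 bit is dominant,
    so a contender that sends 0 in a round where another sends 1 withdraws.
    """
    if not ids:
        raise ValueError("at least one contender is required")
    if len(set(ids)) != len(ids):
        raise ValueError("contender IDs must be distinct")
    if any(i < 0 or i >= (1 << width) for i in ids):
        raise ValueError("every ID must fit in `width` bits")

    contenders = list(ids)
    # One round per bit, from the most significant bit down.
    for bit in range(width - 1, -1, -1):
        senders_of_one = [c for c in contenders if (c >> bit) & 1]
        # If anyone signalled the dominant bit, all others drop out.
        if senders_of_one:
            contenders = senders_of_one
    # Distinct IDs differ in at least one bit, so exactly one remains.
    return contenders[0]


winner = bitwise_arbitrate([0b0101, 0b0110, 0b0011], width=4)
```

Under these assumptions the contention always ends after `width` rounds with exactly one winner, and a numerically larger identifier always wins, which is one way a priority ordering could be encoded.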


ARC: Adaptive Reputation Based Clustering Against Spectrum Sensing Data Falsification Attacks

Authors: Chowdhury Hyder; Brendan Grebur; Li Xiao; Max Ellison

Abstract: IEEE 802.22 is the first standard based on the concept of cognitive radio. It recommends collaborative spectrum sensing to avoid the unreliability of individual spectrum sensing while detecting primary user signals. However, it opens an opportunity for attackers to exploit the decision-making process by sending false reports. In this paper, we address security issues regarding distributed node sensing in the 802.22 standard and discuss how attackers can modify or manipulate their sensing results independently or collaboratively. This problem is commonly known as the spectrum sensing data falsification (SSDF) attack or Byzantine attack. To counter the different attacking strategies, we propose a reputation-based clustering algorithm that does not require prior knowledge of attacker distribution or complete identification of malicious users. We provide an extensive probabilistic analysis of the performance of the algorithm. We compare the performance of our algorithm against existing approaches across a wide range of attacking scenarios. Our proposed algorithm displays a significantly reduced error rate in decision making in comparison to current methods. It also identifies a large portion of the attacking nodes and greatly reduces the false detection rate of honest nodes.


A Wireless Sensor Network Within An Aquatic Environment

Authors: Tam Le; Matt Mutka

Abstract: Wireless sensor networks have been widely used in many environmental monitoring applications. For aquatic environments, the deployment is quite expensive since the sensors need to be anchored to prevent them from floating away and losing communications. We propose an inexpensive and flexible approach to provide environmental monitoring in aquatic environments. We propose a special mobile sensor robot that acts as a mobile base station and travels the water area to collect data from sensors as well as locations that cannot be covered by sensors. The sensors in the water have a jumping capability that enables an extended communication range in comparison to sensors that merely float upon the water. By leveraging the jumping capability, the sensors can collaborate with others to exchange data and communicate with the robot, so that the robot can compute an efficient path to travel. The problems we study are: 1) given a set of visited points, how to find the robot’s optimal path with support of sensors to cover the remaining points; 2) to design an efficient jumping strategy and communication protocol between sensors.


miR-PREFeR: An Accurate, Fast, And Easy-To-Use Plant miRNA Prediction Tool Using Small RNASeq Data

Authors: Jikai Lei; Yanni Sun

Abstract: Plant microRNA prediction tools that utilize small RNA sequencing data are emerging with the advances of next-generation sequencing technology. These existing tools have at least one of the following problems: 1. a high false positive rate; 2. inaccurate positions of the predicted miRNAs; 3. long running time; 4. support only for genomes in their databases; 5. difficulty of installation or use. We develop miR-PREFeR, which utilizes expression patterns of miRNA and follows the criteria for plant microRNA annotation to accurately predict plant miRNAs from one or more small RNA-Seq data samples of the same species. We tested miR-PREFeR on several plant species. The results show that miR-PREFeR is sensitive, accurate, and fast, and has a low memory footprint.

This work was supported in part by the NSF.


Discrete Connection And Covariant Derivative For Vector Field Analysis And Design

Authors: Beibei Liu; Fernando de Goes; Yiying Tong; Mathieu Desbrun

Abstract: In this paper, we introduce a discrete definition of connection on simplicial manifolds, with closed-form continuous expressions within simplices and finite rotations across simplices. The finite-dimensional parameters of this connection are optimally generated by minimizing a quadratic measure of the deviation to the discontinuous connection induced by the embedding of the input mesh. We also construct from this discrete connection a covariant derivative through exact differentiation, leading to analytical expressions for local integrals of first-order derivatives (such as divergence, curl and the Cauchy-Riemann operator), and for L2-based energies (such as the Dirichlet energy). We finally demonstrate the utility, flexibility, and accuracy of our discrete formulations for the design and analysis of vector, n-vector, and n-direction fields. 

This work was supported in part by the NSF.


Learning To Mediate Perceptual Differences In Situated Human Robot Dialogue

Authors: Changsong Liu; Joyce Chai

Abstract: To support natural interaction between a human and a robot, technology enabling human-robot dialogue has become increasingly important. In human-robot dialogue, although a robot and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., recognizing objects in the surroundings). When a shared perceptual basis is missing, communication about the shared environment often becomes difficult, for example when identifying referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). To overcome this challenging problem, we have developed an optimization-based approach that allows the robot to quickly adapt to the perceptual differences. Given any new situation, through a couple of dialogues, the robot can quickly learn a set of weights indicating how reliably each dimension of its perception of the environment maps to the human's linguistic expressions. The robot then adapts to the situation by applying the learned weights for grounding linguistic expressions to physical entities. Our empirical results have shown that, when the perceptual difference is high (i.e., the robot can only correctly recognize 10-40% of objects in the environment), applying learned weights significantly improves referential grounding performance by an absolute gain of 10%.

This work was supported in part by N00014-11-1-0410 from the Office of Naval Research and IIS-1208390 from the National Science Foundation.


TAS-MAC: A Traffic-Adaptive Synchronous MAC Protocol For Wireless Sensor Networks

Authors: Pei Huang; Chin-Jung Liu; Li Xiao

Abstract: Duty cycling improves energy efficiency but limits throughput and introduces significant end-to-end delay in wireless sensor networks. In this paper, we present a traffic-adaptive synchronous MAC protocol (TAS-MAC), which is a high-throughput, low-delay MAC protocol tailored for low power consumption. It achieves high throughput by using Time Division Multiple Access (TDMA) with a novel traffic-adaptive allocation mechanism that assigns time slots only to nodes located on active routes. TAS-MAC reduces the end-to-end delay by notifying all nodes on active routes of incoming traffic in advance. These nodes will claim time slots for data transmission and forward a packet through multiple hops in a cycle. The desirable traffic-adaptive feature is achieved by decomposing traffic notification and data transmission scheduling into two phases, so that each phase can be specialized for its duty and made more efficient. Simulation results and tests on TelosB motes demonstrate that the two-phase design significantly improves on the throughput of current synchronous MAC protocols and achieves a delay as low as that of slot-stealing-assisted TDMA with much lower power consumption.


Hierarchical Classification Of Mobile Applications Using Semi-Supervised Non-Negative Matrix Tri-Factorization

Authors: Xi Liu; Pang-Ning Tan; Han Hee Song; Mario Baldi

Abstract: The proliferation of smartphones in recent years has led to phenomenal growth in the number and variety of mobile applications developed for personal use, businesses, education, and other purposes. The app markets, such as Google Play and Apple iTunes, provide a one-stop shop for users to download or purchase their apps and for software developers to market their inventions. As the number of mobile apps rapidly grows, searching for or recommending relevant apps for users becomes a challenging problem. The broad, coarse-grained categories currently provided by the marketplace may not fit the actual description and intended use of the apps. In this poster, we present a hierarchical classification approach based on non-negative matrix tri-factorization to classify mobile apps while simultaneously constructing a category tree that reveals a deeper relationship among the categories. We demonstrate the limitations of using existing concept hierarchies (such as Google Ad Trees) and present a semi-supervised learning approach that integrates existing hierarchies with the mobile app description data to significantly improve classification accuracy.


Toward Tractable Instantiation Of Conceptual Data Models

Authors: Matthew Nizol; Laura K. Dillon; R.E.K. Stirewalt

Abstract: Complex, data-intensive software systems play an increasingly crucial role in enterprise decision making. Developers of these systems must validate both the database design and the application programs that interact with the database. If a conceptual data model is developed during requirements analysis, instantiation of that model can facilitate both validation activities.

Domain experts can inspect test instances of the model to confirm that constraints have been properly expressed, and application programmers can use generated data to test their programs. Object Role Modeling (ORM) is a popular modeling language that maps to predicate logic. Due to ORM's expressive constraint language, instantiating an arbitrary ORM model is NP-hard, but a restricted subset of the language called ORM- can be solved in polynomial time. Some models that include "hard" constraints (i.e., constraints outside the ORM- subset) can nevertheless be transformed into ORM- models. Such transformations do not necessarily need to preserve the original model's semantics: the existence of some mapping from instances of the target model to instances of the original model is sufficient. This poster presents a research project to extend the set of ORM models that can be transformed to ORM- models through a class of non-semantics-preserving transformations called constraint strengthening. We illustrate an example constraint-strengthening transformation and note limitations of the approach.

Future research will investigate the composition of transformations, the use of genetic algorithms to search for instances of complex models, and the use of SAT-solvers to find partial instances of the "hard" portions of a model that may be combined to form an instance of the original model.


Regular Distance-Preserving Graphs

Authors: Ronald Nussbaum; Abdol-Hossein Esfahanian

Abstract: A graph is distance-hereditary if the distances in any connected induced subgraph are the same as those in the original graph. Relaxing the requirement that every connected induced subgraph be distance-preserving allows us to explore the idea of a distance-preserving graph. Formally, a graph of order n is distance-preserving if for each integer k in the interval [1, n] there exists at least one isometric subgraph of order k. Previously we worked to characterize and find applications for distance-preserving graphs. Here we give methods for constructing r-regular distance-preserving graphs on n vertices for various values of r and n. We also consider constructing r-regular non-distance-preserving graphs on n vertices for various values of r and n, and related conjectures.
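
The formal definition above can be checked directly on small graphs. The following brute-force sketch is only an illustration of the definition, not one of the construction methods from the poster. It assumes a connected, undirected graph given as an adjacency dictionary, and the function names are hypothetical. Its running time is exponential in n, so it is suitable only for small examples.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, allowed):
    """Shortest-path lengths from `source` using only vertices in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_isometric(adj, subset):
    """True if the subgraph induced by `subset` keeps every pairwise distance."""
    subset = set(subset)
    everything = set(adj)
    for u in subset:
        in_graph = bfs_distances(adj, u, everything)
        in_sub = bfs_distances(adj, u, subset)
        if any(in_sub.get(v) != in_graph.get(v) for v in subset):
            return False
    return True


def is_distance_preserving(adj):
    """True if some isometric induced subgraph exists for every order 1..n."""
    n = len(adj)
    return all(
        any(is_isometric(adj, s) for s in combinations(adj, k))
        for k in range(1, n + 1)
    )


def cycle(n):
    """Adjacency dictionary of the cycle graph on vertices 0..n-1."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

For example, the 4-cycle passes this check, while the 5-cycle does not: every induced subgraph of the 5-cycle on four vertices is a path whose endpoints are at distance 3 in the path but at distance 2 in the cycle.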


De-Identifying Biometric Images For Enhancing Privacy And Security

Authors: Asem Othman; Arun Ross

Abstract: The goal of this poster is to discuss methods that have been developed in our lab (i-probe) to extend privacy to biometric data in the context of an operational system. Biometric data can be viewed as personal data, since it pertains to the biological and behavioral attributes of an individual. Therefore, it is necessary to ensure that the biometric data stored in a system is used only for its intended purpose by de-identifying it prior to storage. In this poster, we will briefly discuss two approaches to de-identifying biometric images. The first approach is based on Visual Cryptography, which de-identifies a face image prior to storing it by decomposing the original image into two images in such a way that the original image can be revealed only when both images are simultaneously available; further, the individual component images do not reveal any information about the original face image. The second approach is based on the concept of mixing to extend privacy to fingerprint images. The proposed scheme mixes a fingerprint with another fingerprint (referred to as the "key") in order to generate a new mixed fingerprint image that can be directly used by a fingerprint matcher. The mixed image obscures the identity of the original fingerprint; further, different applications can employ different "keys", thereby ensuring that the identities enrolled in one application cannot be matched against the identities in another application.
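
The two-image decomposition behind visual cryptography can be illustrated with the textbook 2-out-of-2 scheme for binary images. This is a minimal sketch under stated assumptions, not the lab's method for face images: it handles only black-and-white pixels, expands each pixel into two subpixels, and all function names are hypothetical.

```python
import random

# Each secret pixel becomes a pair of subpixels; 1 = black, 0 = transparent.
PATTERNS = [(0, 1), (1, 0)]


def make_shares(secret, rng=random):
    """Split a binary image (rows of 0/1, 1 = black) into two shares."""
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)
            # White pixel: both shares get the same pattern.
            # Black pixel: the second share gets the complementary pattern.
            q = p if pixel == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(share1, share2):
    """Overlay two shares: a subpixel is black if it is black in either one."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]


def decode(stacked):
    """A secret pixel is black exactly when both of its subpixels are black."""
    return [
        [row[i] & row[i + 1] for i in range(0, len(row), 2)]
        for row in stacked
    ]


secret = [[0, 1, 1], [1, 0, 0]]
s1, s2 = make_shares(secret, random.Random(0))
recovered = decode(stack(s1, s2))
```

Each share on its own contains exactly one black subpixel per secret pixel, chosen at random, so a single share carries no information about the secret; only the overlay of both shares reveals it.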


Demographic Estimation From Face Images: Human Vs. Machine Performance

Authors: Charles Otto; Hu Han; Anil K. Jain

Abstract: We present a generic framework for automatic age, gender, and race estimation from face images, including a quality assessment measure used to identify low-quality images for which it will be difficult to obtain reliable estimates. Experimental results on a diverse set of face image databases show that the proposed approach performs better than other state-of-the-art methods. Finally, we use crowdsourcing to study humans’ ability to estimate demographics from face images, and compare the crowdsourced estimates to our automatic demographic estimates.


RAIL: Robot-Assisted Indoor Localization

Authors: Chen Qiu; Matt Mutka

Abstract: Location Based Services (LBS) are expanding rapidly for mobile devices. The Global Positioning System (GPS) has been commonly adopted for outdoor localization. However, since the accuracy of GPS is very low (or nonexistent) indoors, it cannot support LBS in indoor environments. The indoor location information available for most current mobile devices is not accurate. We introduce an approach that improves a smartphone's localization accuracy with the help of a moving robot. Equipped with a tablet personal computer, the proposed application program, and a known map, a moving robot can improve a smartphone's localization accuracy. The robot can use Bluetooth to send its accurate location information to the customers' smartphones. Customers who carry smartphones do not need any special-purpose device to obtain location information. We need to design a path for the robot so that, through interaction with the robot, all the smartphones in the environment have smaller deviations from the ground truth. The robot collects Bluetooth RSSI values from smartphones in different rooms. We classify the rooms into crowd density levels by their RSSI values. Rooms with higher crowd density should be served more often. Based on the crowd density levels, we use dynamic programming to design algorithms that generate the robot's moving route. We evaluate our approach in different environments and find that the location errors from a localization application on a smartphone are reduced effectively. After each serving round, the robot can choose an appropriate algorithm from the proposed algorithms according to crowd density.

This work was supported in part by the National Science Foundation grant no. CNS-1320561.


Efficient Kernel-Based Data Stream Clustering

Authors: Radha Chitta; Anil K. Jain

Abstract: Recent advances in sensor technologies have facilitated “continuous” data collection. Unbounded sequences of data called data streams are generated in many applications such as IP networks, stock markets, and social networks. There are two major challenges in data stream analysis: (i) Due to the unbounded nature of the data, it is not possible to store all the data in memory, so the data can be accessed at most once, and (ii) the data evolves over time, i.e. the recent data in the stream may be unrelated to the older data in the stream.

Stream clustering is the task of finding groups in the data stream, based on a pre-defined similarity measure. Most of the current stream clustering algorithms are “linear” clustering algorithms, and use Euclidean similarity. Kernel-based clustering algorithms use non-linear similarity measures, thereby achieving higher clustering accuracy than linear clustering algorithms. However, kernel-based clustering algorithms are ill-suited to streams because of their high computational complexity. In this poster, we present an approximate kernel-based stream clustering technique which identifies the most influential points in the stream, and retains only these points in memory. The final clusters are then obtained using only the stored data points. Only a small subset of the data (less than 1%) needs to be stored in memory, thereby enhancing the efficiency of kernel clustering for data streams. We demonstrate the accuracy and efficiency of our approximate stream clustering algorithm on several public domain data sets like the Network Intrusion and Tiny image data sets.

This work was supported in part by the Office of Naval Research (ONR Grant N00014-11-1-0100).

Detecting Fake Fingerprints

Authors: Ajita Rattani; Arun Ross

Abstract: Recent research has highlighted the vulnerability of fingerprint recognition systems to spoof attacks. A spoof attack occurs when an adversary mimics the fingerprint of another individual in order to circumvent the system. Fingerprint liveness detection algorithms have been used to distinguish live fingerprint samples from spoof (fake) fingerprints fabricated using materials such as latex, gelatine, etc. Most liveness detection algorithms are learning based and depend on (a) the fabrication material used to generate and (b) the sensor used to acquire the fake fingerprints during the training stage. Consequently, the performance of a liveness detector degrades significantly in multi-sensor environments and when novel fabrication materials are encountered during the testing stage. The aim of this work is to improve the interoperability of fingerprint liveness detectors across different sensors and fabrication materials. To this end, the contributions of this work are (i) a graphical model that accounts for the impact of the sensor on fingerprint match scores, quality, and liveness measures and (ii) a pre-processing scheme to reduce the impact of fabrication material on fingerprint liveness detectors.


Local Predictions In Social Network Graphs

Authors: Dennis Ross; Guoliang Xing; Abdol-Hossein Esfahanian

Abstract: Using data from social networks, predictions have been made in several domains, including disease proliferation modeling, criminal activity detection, and recommender system design. With data from established social networks, like Twitter and Facebook, we try to make accurate predictions of several national trends on a local level. To do this with a social network G, an influential subgraph, called Gamma of v, is created for each vertex v. Each Gamma of v is chosen using a variety of graph properties such as degree, modularity, and the clustering coefficient. Efficient algorithms to determine Gamma of v are discussed. By extracting the influential subgraph for each v in G, we attempt to make relevant predictions for any individual user. Some results will be presented along with potential real-world system deployments.


On Hair Recognition In The Wild By Machine

Authors: Joseph Roth; Xiaoming Liu

Abstract: We present an algorithm for identity verification using only the information from the hair. Face recognition in the wild (i.e., unconstrained settings) is highly useful in a variety of applications, but performance suffers due to many factors, e.g., obscured face, lighting variation, extreme pose angle, and expression. It is well known that humans utilize hair for identification under many of these scenarios due to either the consistent hair appearance of the same subject or obvious hair discrepancy of different subjects, but little work exists to replicate this intelligence artificially. We propose a learned hair matcher using shape, color, and texture features derived from localized patches through an AdaBoost technique with abstaining weak classifiers when features are not present in the given location. The proposed hair matcher achieves 71.53% accuracy on the LFW View 2 dataset. Hair also reduces the error of a COTS face matcher through simple score-level fusion by 5.7%.


NSGA-III Performance In Bi-Objective Optimization

Authors: Haitham Seada; Kalyanmoy Deb

Abstract: NSGA-III is a recently suggested evolutionary many-objective optimization algorithm that is designed to solve problems with three or more objectives. Although NSGA-III was found to be superior to other state-of-the-art algorithms in handling three or more objectives (up to 20), no formal assessment of its performance has been conducted on problems with only two objectives. This study aims at directing subsequent lines of research, either towards enhancing NSGA-III for two objectives without sacrificing its superiority on higher numbers of objectives, or towards a more unified version of NSGA-III that can handle any number of objectives with the same efficiency. In this paper, we assess the performance of NSGA-III on a number of test problems as well as real-life engineering problems. We also empirically investigate the effect of some critical parameters on the overall performance. Based on the obtained results, we introduce some interesting directions that researchers in the field can pursue in the future.


Secure Unlocking Of Mobile Touch Screen Devices By Simple Gestures: You Can See It But You Can Not Do It

Authors: Muhammad Shahzad; Alex X. Liu; Arjmand Samuel

Abstract: With the rich functionalities and enhanced computing capabilities available on mobile computing devices with touch screens, users not only store sensitive information (such as credit card numbers) but also use privacy-sensitive applications (such as online banking) on these devices, which makes them hot targets for hackers and thieves. To protect private information, such devices typically lock themselves after a few minutes of inactivity and prompt a password/PIN/pattern screen when reactivated. Password/PIN/pattern-based schemes are inherently vulnerable to shoulder surfing attacks and smudge attacks. Furthermore, passwords/PINs/patterns are inconvenient for users to enter frequently. We propose GEAT, a gesture-based user authentication scheme for the secure unlocking of touch screen devices. Unlike existing authentication schemes for touch screen devices, which use what the user inputs as the authentication secret, GEAT authenticates users mainly based on how they input, using distinguishing features such as finger velocity, device acceleration, and stroke time. Even if attackers see what gesture a user performs, whether through shoulder surfing or smudge attacks, they cannot reproduce the way the user performs it. We implemented GEAT on Samsung Focus running Windows, collected 15009 gesture samples from 50 volunteers, and conducted real-world experiments to evaluate GEAT's performance. Experimental results show that our scheme achieves an average equal error rate of 0.5% with 3 gestures using only 25 training samples.
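
The abstract above reports its result as an equal error rate (EER), the operating point at which the false accept rate and the false reject rate coincide. As a rough illustration of that metric only, and not of GEAT's classifier, the following Python sketch estimates an EER from two lists of match scores. The function names, the convention that higher scores mean "more likely genuine", and the threshold sweep over observed scores are all assumptions made for this example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when every score >= threshold is accepted.

    Assumes higher scores indicate a more likely genuine user.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the observed scores as thresholds.

    Picks the threshold where FAR and FRR are closest and returns their mean.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Hypothetical scores, made up for illustration.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With finely sampled scores the sweep approaches the true crossing point; with few samples the returned value is only an approximation.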


Assembly In The Cloud: Benchmarking

Authors: Leigh Sheneman; C. Titus Brown

Abstract: The project focuses on quantifying the computational effectiveness of mRNAseq protocols on various cloud computing platforms. The Illumina HiSeq 2500 (currently used at MSU's RTFS) can produce 600 GB of data per run. While the price of extracting a single dataset at this sensitivity level is extremely high in its own right, adding steps to assemble and analyze it can easily cost $10K. As a result, small biology labs are effectively excluded from conducting this research.

Through high-level analysis of patterns during all stages of the protocol, bottlenecks in algorithms can be identified and addressed. Since each cloud-computing cluster has a different hardware implementation, the bottleneck is not the same on every platform. By leveraging the strengths of each platform, the GED lab aims to reduce overall cost.

The Eel Pond mRNAseq Tutorial by C. Titus Brown, et al., has been the basis of initial testing. The results from this testing show Amazon's vCPU system out-performs traditional CPU structures.


Who Will Go Viral? A Distribution-Preserving Approach For Node Degree Prediction In Social Networks

Authors: Courtland VanDam; Ding Wang; Pang-Ning Tan; Shuai Yuan; Xi Liu

Abstract: Predicting the future degree of a new node in an evolving network is an important problem with many potential applications. For example, advertisers may want to know who may become the next most popular or influential user in a social network. Similarly, predicting highly retweeted tweets or highly liked social media posts could help detect breaking news or postings that may go viral. In this study, we consider two approaches for node degree prediction. Node degrees can be predicted directly, e.g., using a regression-based approach, or indirectly, e.g., using link prediction to infer the presence or absence of the links associated with a given node. Though regression methods are more accurate than link prediction, their predicted degree distribution may not fully satisfy the power law distribution typically observed in many real-world networks. In this poster, we present a distribution-regularized regression framework to predict the future degree of the nodes in a network using both node features and link information. Experimental results on real-world networks demonstrate both the accuracy of the prediction and the fidelity of the predicted distribution compared to other baseline methods.


Stain Simulation On Curved Surfaces Through Homogenization

Authors: Shiguang Liu; Xiaojun Wang; Yiying Tong

Abstract: This poster provides methods for physically-based simulation of stain formation in computer graphics. We propose to use proper averaging of the textile diffusion property to create realistic stains. The simulation is performed directly on the surface. For different types of knitting, we apply the homogenization technique in 2D to extract the bulk diffusion tensor, which is anisotropic in general. We then map the diffusion tensor onto curved surfaces by specifying the alignment of the textile to a direction field on the surface. The influence on the shape of the stain is determined by using the inertial force experienced in a comoving framework attached to the deforming surface. Our results demonstrate that the process is physically plausible.

This work was supported in part by NSF.


ORION: Online Regularized Multi-Task Regression And Its Application To Ensemble Forecasting

Authors: Jianpeng Xu; Pang-Ning Tan; Lifeng Luo

Abstract: Ensemble forecasting is a well-known numerical prediction technique for modeling nonlinear dynamic systems. The ensemble member forecasts are generated from computer-simulated models, where each forecast is obtained by perturbing the initial conditions or using a different model representation of the dynamic system. The ensemble mean or median is typically chosen as a point estimate of the final forecast for decision making purposes. However, this approach is limited in that it assumes each ensemble member is equally skillful and does not consider the inherent correlations that may exist among the ensemble members. In this poster, we cast the ensemble forecasting task as an online, multi-task regression problem with partially observed data and present a novel framework called ORION to estimate the optimal weighted combination of the ensemble members. The weights are updated using an online learning with restart algorithm to deal with the partially observed data.

The framework can accommodate different types of loss functions including epsilon-insensitive and quantile loss. Experimental results on seasonal soil moisture predictions from 12 major river basins in North America demonstrate the superiority of the proposed approach compared to the ensemble median and other baseline methods.

This work was supported in part by NOAA Climate Program office through grant NA12OAR4310081 and partially supported by NASA Terrestrial Hydrology Program through grant NNX13AI44G.
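
The ORION abstract above names two loss functions, epsilon-insensitive and quantile loss, and a weighted combination of ensemble members. The Python sketch below writes out the standard textbook definitions of those two losses together with a plain weighted sum, to make the terms concrete. It is not the ORION update rule: the function names, default parameter values, and sample numbers are assumptions for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside a tube of half-width epsilon around the target, linear outside."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def quantile_loss(y_true, y_pred, tau=0.5):
    """Pinball loss: under-prediction costs tau per unit, over-prediction 1 - tau."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def weighted_forecast(weights, members):
    """Combine ensemble member forecasts as a weighted sum."""
    return sum(w * m for w, m in zip(weights, members))


if __name__ == "__main__":
    # Hypothetical ensemble of three member forecasts and one observation.
    members = [0.30, 0.36, 0.42]
    weights = [0.2, 0.5, 0.3]
    observed = 0.35
    forecast = weighted_forecast(weights, members)
    print(epsilon_insensitive_loss(observed, forecast, epsilon=0.05))
    print(quantile_loss(observed, forecast, tau=0.9))
```

With tau above 0.5 the quantile loss penalizes forecasts that fall below the observation more heavily than those above it, which is why it suits estimating upper quantiles.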


Data Cleaning In Long Time-Series Plant Photosynthesis Phenotyping: Detecting And Diagnosing Data Abnormalities

Authors: Lei Xu; David M Kramer; Jin Chen

Abstract: The scale of plant phenotyping data is growing exponentially, and these data have become a first-class asset for understanding the mechanisms affecting energy intake and storage in plants, which are essential for improving crop productivity and biomass. However, the quality of the data is compromised by systematic errors, unbiased noise, and abnormal patterns, which are difficult to remove in the data collection step. Given the value of clean data for any operation, the ability to improve data quality is a key requirement.

Data cleaning is the process of identifying incorrect or corrupt records in a dataset, integrating ad-hoc tools, manually tuned algorithms designed for specific tasks, and ideal statistical methods. However, removing impurities from long time-series plant phenotyping data requires the handling of high temporal dimension, which has not been extensively discussed in literature.

In this work, we develop a novel computational framework to effectively identify abnormalities in plant phenotyping data using Michaelis-Menten kinetics, one of the best-known models of enzyme kinetics in biochemistry. Specifically, our model employs an EM process to repeatedly classify the temporal data into two classes: abnormalities and non-abnormalities. In each iteration, it uses the values in the non-abnormality class to generate photosynthesis-irradiance curves at different granularities using Michaelis-Menten kinetics, and then reassigns the class membership of every value based on its fit to the curves. The iteration stops when no class membership changes. The results show that our algorithm can identify most of the abnormalities in both real and synthetic datasets. Note that our algorithm is independent of actual biological constraints. With a simple extension, it can automate the cleansing process on long time-series data for a variety of domains.
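
The fit-and-reassign loop described above can be sketched in a much simplified form. The Python example below is a hard-assignment analogue under stated assumptions, not the authors' EM procedure: it fits a single Michaelis-Menten curve v = vmax * s / (km + s) using the Hanes-Woolf linearization, flags a point as abnormal when its residual exceeds a fixed tolerance `tol`, and repeats until the labels stop changing. The function names, the tolerance, the single granularity, and the synthetic data are all hypothetical.

```python
def fit_michaelis_menten(s, v):
    """Fit v = vmax * s / (km + s) via the Hanes-Woolf form s/v = s/vmax + km/vmax.

    Assumes all s and v are positive. Returns (vmax, km).
    """
    n = len(s)
    if n < 2:
        raise ValueError("need at least two points to fit")
    y = [si / vi for si, vi in zip(s, v)]
    mean_x = sum(s) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in s)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(s, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("degenerate data: cannot estimate the slope")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, intercept / slope


def flag_abnormal(s, v, tol, max_iter=50):
    """Alternate between fitting on non-abnormal points and relabeling all points.

    Stops when no label changes or after max_iter rounds.
    Returns (labels, (vmax, km)); labels[i] is True for abnormal points.
    """
    abnormal = [False] * len(s)
    for _ in range(max_iter):
        kept = [(si, vi) for si, vi, bad in zip(s, v, abnormal) if not bad]
        vmax, km = fit_michaelis_menten([p[0] for p in kept], [p[1] for p in kept])
        updated = [abs(vi - vmax * si / (km + si)) > tol for si, vi in zip(s, v)]
        if updated == abnormal:
            break
        abnormal = updated
    return abnormal, (vmax, km)


if __name__ == "__main__":
    # Synthetic curve with vmax = 10 and km = 2, plus one corrupted reading.
    light = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0]
    rate = [10.0 * x / (2.0 + x) for x in light]
    rate[4] = 3.0  # injected abnormality
    labels, params = flag_abnormal(light, rate, tol=2.0)
    print(labels, params)
```

A fixed tolerance is the crudest possible membership rule; a probabilistic assignment, as an EM formulation implies, would replace the threshold test with class likelihoods.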


Fingerprint Recognition: A Longitudinal Study Of Intra-Person Similarity

Authors: Soweon Yoon; Anil K. Jain

Abstract: Fingerprint recognition is widely used to identify a person in applications ranging from law enforcement and international border control to mobile phone access. Friction ridge patterns, including fingerprints and palmprints, have been one of the major sources of evidence in crime scene investigation. Two properties of friction ridge patterns are invoked in promoting their use: (i) persistence (the ridge pattern does not change over time), and (ii) uniqueness (the ridge pattern of a finger is different from that of any other finger). The admissibility of fingerprints as evidence was accepted in Frye v. United States in 1923. However, the general acceptance test of the Frye ruling was superseded by the Federal Rules of Evidence in Daubert v. Merrell Dow Pharm. in 1993. Since then, friction ridge analysis has been challenged on the basis of its fundamental premises, persistence and uniqueness. Although a number of statistical models have been proposed to demonstrate fingerprint uniqueness, the persistence of fingerprints has generally been accepted based on anecdotal evidence. In this study, our objectives are to (i) formally study the impact of elapsed time between two fingerprint impressions on genuine match scores (obtained by comparing multiple impressions of the same finger), (ii) model the fingerprint longitudinal data with multilevel statistical models, (iii) identify additional predictive variables of genuine match scores (e.g., subject’s age, gender, race, etc.), and (iv) quantify the impact of these factors on genuine match scores. Our preliminary study shows that the genuine match scores decrease with elapsed time, but at a very small rate of change.

This work was supported in part by NSF's Center for Identification Technology Research (CITeR).


WWN: Integration With Coarse-To-Fine, Supervised And Reinforcement Learning

Authors: Zejia Zheng; Juyang Weng; Zhengyou Zhang

Abstract: Are your visual capabilities largely learned or largely innate? How does a human child learn directly from his cluttered environments? How do his actions play a central role not only in associating sensation with the required action, but also in attention to and perception of objects of interest in a cluttered scene? How does the child develop concepts and invariant properties when he is not even aware of such concepts? Our research group has been addressing these scientific questions, which are fundamental not only to Artificial Intelligence (AI) and its practical applications but also to our understanding of human intelligence. This poster explains the work after Where-What Network 8 (WWN-8), where we intend to show how the above questions are addressed not only in a brain-inspired way, but also in terms of the efficiency of autonomous learning: how a single general-purpose architecture must incorporate various modes of learning, namely supervised learning, coarse-to-fine learning, and reinforcement learning (i.e., learning through punishments and rewards).

This work was supported in part by Microsoft Research.