Ongoing Projects

  • Structural Entropy of Complex Networks

    The “structural diversity” of a network refers to the level of dissimilarity between the various agents acting in the system. This key property of networks has been studied in multiple settings, including diffusion of ideas in social networks and complex brain networks. In this project, we propose a novel approach to quantify and monitor the structural diversity of complex networks over time. We apply our method in different settings, such as evolving social networks and the emergent organization of financial markets. This gives us a way to explore their underlying structural changes.

  • Can AI Make Better Hiring Decisions than Human Recruiters?

    The goal of this study is to develop a comprehensive analytics framework that HR professionals can use as a decision support tool to improve accuracy while also accounting for fairness (there is an inherent tradeoff between the two). The proposed framework has two main phases: a local prediction scheme for recruitment success, and a robust mathematical model that provides a global optimization to enhance the multisided fairness and diversity of the recruitment plan. Our results show that the proposed method improves fairness while maintaining a high level of accuracy. In comparison to actual recruitment decisions, the devised framework can improve both fairness and accuracy, despite the inherent tradeoff between the two. The results of this research are expected to provide recruiters and organizations with useful models and insights for designing a recruitment plan that is highly efficient, balanced and fair.

  • Ensembles for Label Ranking based on Voting Rules

    We propose a novel meta-ensemble framework for label ranking. Label ranking is a sub-field of machine learning that deals with learning a mapping from instances to rankings over a finite number of labels. Label ranking ensembles typically combine the results of a few label ranking classifiers, using some simple aggregation rule such as majority voting. We create a meta-ensemble of label rankings. The meta-ensemble consists of a pool of ensembles, where each ensemble uses a different voting scheme to aggregate the label ranking classifiers. The label ranking ensemble is then chosen according to the task at hand. Extensive experiments on 15 semi-synthetic and five real-world label ranking tasks show that our approach significantly increases accuracy in comparison with a simple ensemble.
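
    To make the idea of aggregating base rankers with different voting schemes concrete, here is a minimal sketch. It assumes each base classifier outputs a complete ranking of the same labels (best first) and uses two textbook voting rules, Borda count and plurality. The function names and the toy rankings are hypothetical and chosen only for illustration; the voting rules actually used in the project are not listed here.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate rankings (lists of labels, best first) with the Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # The top label earns n-1 points, the last label earns 0.
            scores[label] += n - 1 - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda label: (-scores[label], label))


def plurality_aggregate(rankings):
    """Order labels by how many base rankers placed them first."""
    first_place = defaultdict(int)
    for ranking in rankings:
        for label in ranking:
            first_place[label] += 0  # make sure every label is present
        first_place[ranking[0]] += 1
    return sorted(first_place, key=lambda label: (-first_place[label], label))


# Hypothetical outputs of five base label rankers for one instance.
rankings = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["b", "c", "a"],
    ["b", "c", "a"],
    ["c", "b", "a"],
]

borda_result = borda_aggregate(rankings)
plurality_result = plurality_aggregate(rankings)
```

    On this toy input the two rules disagree about the top label, which illustrates why the choice of voting scheme can matter for a given task.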

  • Screenless Text Entry & Signature Verification

    In recent years we have become accustomed to interacting with systems through screens, both for input and for output. In the emerging world of virtual and augmented reality, the system's output is typically projected directly into our eyes using dedicated glasses. However, the AR/VR industry has not yet converged on a single standard for input and control. We believe that smart-watches and smart-bands are in an excellent position to be part of this input/control standard, given their expected broad distribution and their built-in sensors. Therefore, in this project, we explore two use cases that demonstrate how smart-watches can serve as AR/VR controllers: (1) text entry and (2) signature verification.

  • A Network Theory Perspective of Blockchain Dynamics

    Issuance of cryptocurrencies on top of the Blockchain system by startups and private sector companies is becoming a ubiquitous phenomenon. This new rising economy presents great difficulties in modeling its dynamics using semantic, conventional parameters. We present a novel approach for modeling the dynamical properties of trading data for ERC20-compliant crypto-coins, using a network theory prism.

  • Itemization of Invoices using Machine Learning

    This research is in collaboration with PayPal. PayPal uses the item that was sold in a transaction as a very important feature, both for its machine learning models and for its rule-based transaction approval system. In person-to-person transactions, the sold item is described only by a short free-text sentence, which does not provide enough information for PayPal’s approval decision process. In this study we will use the invoice item description and external data (e.g., eBay) to classify the sold item into predefined categories.

Past Projects

  • Continuous Monitoring of Parkinson's Disease Patients Using Wearables

    This research is in collaboration with Intel and the Michael J. Fox Foundation for Parkinson’s Research. The study includes about one thousand patients who wear smartwatches in their day-to-day environment to monitor their activity and their symptoms. In addition, the patients receive a smartphone application that allows them to follow their measurements and their medication schedule. We use the collected data to inspect the effect of medications and activity on motor symptoms and to predict that effect, in the hope of providing actionable insights to the PD community of patients and doctors.

  • Interactions and Location Privacy

    When we walk around the city, we cross paths with many other people every day. Such interactions might cause privacy leaks. In fact, if some of the people we meet are ‘corrupted’, they could disclose our location to a malicious entity. By analyzing a unique mobility dataset collected in a close community and containing absolute (GPS, Wi-Fi) and relative (Bluetooth) positioning, we found that the chances that a person is detected by another person are extremely high. This suggests that if a number of devices are tracked, we can get a good sense of the mobility of the whole community.

  • Quantitative Land Use Planning

    Land use planning is one of the core processes leading city development: by determining the quantities, spatial allocation, and mix of amenities, it plays a key role in shaping the character of urban areas and cities as a whole. The practice of land use planning has yet to capitalize on the predictive power of universal and quantifiable patterns emerging from the last 50 years of studies of cities as complex adaptive systems. We study a quantitative framework that incorporates data-driven methods into the urban development process.

  • Privacy-Aware Cyber Security

    Cyber attacks and security breaches are quickly becoming a serious threat to organizations, governments and individuals, and this trend is expected to grow. End-users are among the favorite targets of cyber attacks since they are considered the weakest link in the security loop. To counter this threat, cyber-security mechanisms increasingly track users’ devices, either indirectly through network monitoring or directly with specialized software. As a result, users’ activities and data can be exposed and users’ privacy can be compromised. This situation can lead users to evade cyber-security mechanisms altogether and lead regulators to limit the abilities of cyber-security technologies. We aim to evaluate methods for understanding the trade-off between privacy and cyber-security, and to propose solutions for balancing them. Specifically, we are studying how personal data stores can be used to provide cyber-security protection without exposing private data to a centralized server.

  • Scheduled Seeding for Viral Marketing

    One highly studied aspect of social networks is the identification of influential nodes that can spread ideas in a highly efficient way. The vast majority of works in this field seek a set of nodes that, if ‘seeded’ simultaneously, would maximize the information spread in the network through a viral infection process. However, only a few recent works have started to investigate the timing aspect, namely, finding not only which nodes should be seeded but also when to seed them. Moreover, recent works have shown that some of the underlying assumptions behind existing information spread models do not fit real-world scenarios. For example, most of these models rely on a large-scale viral infection process, while such processes have been shown to be quite rare in reality. In this work, we suggest a new model for information spread that better reflects real-world marketing scenarios, and a corresponding seeding heuristic that takes the timing aspect into account. By conducting a large set of empirical simulations, we show that under broad realistic assumptions, our suggested heuristic is able to improve the information spread by 50%-80% in comparison to state-of-the-art seeding heuristics.

  • Incentivizing Safer Driving Behavior

    Car crashes take a tremendous toll on human life and the economy. In order to decrease risky driving, two complementary efforts are needed: (1) an effective measure for evaluating driving behavior and (2) an effective incentive scheme to encourage changes in driving behavior. Previous studies have mainly focused on machine-automated feedback. To test several schemes for incentivizing safer driving behavior, we conducted a two-month field study in cooperation with a large public transportation company in Israel. The drivers were divided into three experimental groups: a control group, an individual incentive group (in which drivers were paid based on their own improvement) and a social incentive group (in which drivers were paid based on their peers’ improvement). Analyzing the experiment results, we find that the two incentive groups presented an overall improvement of 25% in driving behavior, whereas the control group presented no difference. Moreover, our analysis reveals several surprising insights regarding the effectiveness of the two incentive schemes under different circumstances.

  • Online Signature Verification Using Wrist-Worn Devices

    Many systems nowadays, such as those used by banks and government offices, rely heavily on signature verification. With recent advancements in technology, many of these systems make use of dedicated ad-hoc digital devices such as tablets and smart-pens to capture, analyze and ultimately verify the signature. This work suggests a novel verification system which makes use of wrist-worn devices, such as smartwatches and fitness trackers, instead of ad-hoc digital devices. The suggested method uses a set of known genuine and forged signatures, captured by the motion sensors available in a wrist-worn device, to train a machine learning classifier. Given an unknown signature, the resulting classifier is able to determine whether the signature is genuine or forged. In order to validate our method, we collected 1980 genuine and forged signatures from 66 different subjects, recorded simultaneously on both a tablet and a smartwatch. Applying our method to the collected dataset, we showed that it significantly outperforms two state-of-the-art tablet-based signature verification systems, obtaining 2.36% EER and 98.52% AUC.
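
    For readers unfamiliar with the EER (equal error rate) figure quoted above, the following minimal sketch shows one simple way to estimate it from classifier scores. It assumes that a higher score means "more likely genuine" and sweeps every observed score as a decision threshold; the function name and the toy scores are hypothetical, and this is not the evaluation code used in the study.

```python
def equal_error_rate(genuine_scores, forged_scores):
    """Estimate the EER: the point where the false accept rate (forgeries
    accepted) and the false reject rate (genuine signatures rejected) meet."""
    thresholds = sorted(set(genuine_scores) | set(forged_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # A signature is accepted as genuine when its score is >= t.
        far = sum(s >= t for s in forged_scores) / len(forged_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical classifier scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
forged = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine, forged)
```

    With a finite set of scores the two error rates rarely coincide exactly, so the sketch reports their average at the closest threshold; other estimators interpolate along the ROC curve instead.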

  • Population Dynamics in Israel

    Population dynamics can serve as an independent input to advanced models in epidemiology, human mobility, network analysis and more. Using call detail records (CDR) and data from Israel’s Central Bureau of Statistics, we aim to characterize the population dynamics in Israel in order to better understand, and later predict, population growth, clusters, relations and more.

  • Optimizing Vaccination Allocation for Pertussis in Israel

    Pertussis, also known as whooping cough, is a highly contagious bacterial disease that primarily affects infants. Globally, the disease is responsible for over 200,000 deaths annually in children under five. Despite vaccination against the disease, reported pertussis incidence has risen in developed countries over the past decade. Furthermore, despite a similar vaccination policy, the per capita incidence observed in Israel is 2-4 times higher than the incidence observed in the U.S. Therefore, revisiting existing vaccination policies on a country-specific basis is essential. The first part of this study aims at evaluating the actual extent of pertussis in Israel. To achieve this aim, we analyze reported cases of pertussis accumulated over nearly two decades in the surveillance systems of the Israeli Ministry of Health (IMoH) throughout the entire country. Using Markov chain Monte Carlo, we find that pertussis incidence quadrupled and follows a four-year periodic pattern. Moreover, our findings could not be better explained by human factors such as misclassification or underreporting. The second part of this study aims to offer a total vaccination policy to reduce morbidity and mortality. We develop an age-structured continuous-time Markov process model of pertussis transmission in Israel. Our model integrates the primary IMoH data alongside a large dataset (over 2 TB) of private cellphone-based GPS traces to accurately capture mobility as well as the contact mixing patterns of the Israeli population. In our future work, we will finalize the construction of the transmission model and run simulation studies to optimize vaccine effectiveness in Israel. In light of our preliminary findings, and supported by our collaborators from the IMoH, our model is expected to shape pertussis immunization policy in Israel.

  • PDS-Based Recommender Systems

    Recommender systems have become extremely common in recent years, and are used in a variety of domains such as movies and e-commerce. Existing recommender systems exhibit two major limitations: (1) Privacy: each service (‘application’) which implements a recommender system requires a database that contains information about all the users of the service. (2) Partial view: when recommending to users, each such service can rely only on the data that was collected by the service itself, and it does not have access to other data collected about the user. The Open Personal Data Store (OpenPDS) architecture was recently suggested for storing personal data in a privacy-preserving way. Inspired by the OpenPDS architecture, we suggest an architecture for content-based recommender systems that overcomes the two limitations mentioned above. The suggested architecture allows users to manage and gain control over their own data, and at the same time allows the recommender system to utilize the rich data collected about the user (potentially through other services) to produce more accurate recommendations in a privacy-preserving manner. We implement a prototype of the system and evaluate it in multiple recommender system settings, including web browsing data and public datasets. The evaluation focuses on how the use of multiple data sources about the user enhances the recommendation process, and tests whether multi-source-based recommendations perform better than single-source-based recommendations.

  • Ride Sharing

    Ride sharing’s potential to reduce traffic congestion, CO2 emissions and fuel consumption was recently demonstrated via analysis of available mobility datasets. By analyzing a dataset of over 14 million taxi trips taken in New York City during January 2013, we find that if people are willing to experience a delay of up to five minutes, almost 70% of the rides could be shared (fig. A). Using the source-destination network of rides (fig. B), we identified seven network topological features that, combined, can effectively predict the benefit of ride sharing. We also observed that the number of rides varies widely with the time of day and the day of the week (fig. D). Therefore, in future work we will investigate the time-related benefits of ride sharing, and also exploit different available datasets to suggest specific strategies for promoting ride sharing. [for further details see E. Shmueli et al., Ride Sharing: A Network Perspective. SBP 2015]