2017 Capstone Projects

Performance Analysis and Tracking for NYC’s Transit System

CUSP Students: Hongting Chen, Francis Ko, Shay Lehmann, Nurvita Monarizqa, and Ian Wright

CUSP Mentors: Kaan Ozbay and Huy Vo

USI Sponsor: TransitCenter

New York City buses have been gradually losing ridership to an increasingly overcrowded subway system for several years now. One reason for this shift may be declining service reliability throughout parts of the MTA bus system. This capstone team partnered with the public transit reform advocacy group TransitCenter to examine this issue from a data-driven perspective. The project is an interactive dashboard that uses open data to build and display useful reliability metrics, right down to the bus stop level, for about 200 of the city’s most popular bus lines. The idea was to generate public interest in bus service reliability, in addition to building a robust tool that may be useful for government advocacy campaigns. The team believes that analytical tools like this one should help surface particular parts of the bus system that require deeper investment from the MTA, and ultimately begin to rebuild the public trust that is necessary for an effective transit system.

Predicting Unmet Trip Demand

CUSP Students: Anita Ahmed, Alexey Kalinin, Pooneh Famili, Xin Tang, and Ziman Zhou

CUSP Mentors: Kaan Ozbay and Huy Vo

USI Sponsor: NYC Taxi and Limousine Commission

The purpose of this project is to predict the number of taxi trips that go unmet in NYC. The TLC defines “Unmet Trip Demand” as a situation in which New Yorkers or tourists would like to take a taxi but must spend more than 5 minutes finding one. According to community surveys conducted by the TLC, residents of many neighborhoods report that they cannot get a taxi when hailing on the street. The TLC wanted to verify this claim using the trip data generated by taxi cabs. The TLC’s goal for this project is to identify the contributing factors and develop sound metrics that reveal unmet demand locations across New York City.

The project was broken down into two phases. In the first phase, we identified areas with potentially unmet demand where historical trip records and breadcrumb data were available. Three approaches were developed in this phase. The first approach compared the monthly total pickup and drop-off counts: if the number of pickups in a census tract is less than 25% of the number of drop-offs, and taxi activity has reached a certain level, the tract is identified as having potential unmet demand. The second approach identified underserved census tracts by finding those with the fewest vacant taxis within a given time period, computed as the ratio of the total evaluation duration to the total “free” minutes of vacant taxis. The third approach compared the rate of change of Uber pickups with the rate of change of combined pickups for all taxi services over a six-month period; census tracts where Uber pickups grew but taxi services declined overall were identified as having unmet demand. The census tracts identified as having unmet demand by all three approaches were selected for study in Phase 2. In the second phase, we examined socioeconomic features in the top-matched regions discovered in Phase 1 to determine the key factors that contribute to taxi trip demand. The features we included are median income, median rent, population density, car ownership, crime rate, access to public transit, and commercial land use. Based on the findings, we hope to derive a measurable estimate of unmet demand even for areas without sufficient historical trip data. The TLC could then make new rules and regulations to encourage drivers to serve those underserved neighborhoods.
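
The first approach can be illustrated with a short sketch. This is not the team’s code: the function name, the sample counts, and the minimum-activity cutoff of 500 trips are hypothetical, since the abstract only says that taxi activity must have “reached a certain level”. Only the 25% pickup-to-drop-off threshold comes from the description above.

```python
def flag_unmet_demand(tract_counts, ratio_threshold=0.25, min_activity=500):
    """Return census tract IDs that look underserved by the pickup/drop-off test.

    tract_counts maps a tract ID to (monthly_pickups, monthly_dropoffs).
    min_activity is a hypothetical stand-in for the unspecified activity level.
    """
    flagged = []
    for tract_id, (pickups, dropoffs) in tract_counts.items():
        # Skip tracts with too little taxi activity to judge.
        if pickups + dropoffs < min_activity:
            continue
        # Flag tracts where pickups are below 25% of drop-offs.
        if pickups < ratio_threshold * dropoffs:
            flagged.append(tract_id)
    return sorted(flagged)


# Made-up monthly counts, for illustration only.
monthly_counts = {
    "tract_A": (100, 900),  # many drop-offs, few pickups
    "tract_B": (400, 600),  # roughly balanced
    "tract_C": (5, 40),     # too little activity to evaluate
}
flagged = flag_unmet_demand(monthly_counts)
print(flagged)
```

With these made-up numbers only tract_A is flagged; tract_C is skipped because it falls below the assumed activity cutoff.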

Optimizing the Location and Use of Taxi/FHV Relief Stands

CUSP Students: Cheng Hou, Le Xu, Vishwajeet Shelar, and Yao Wan

CUSP Mentors: Kaan Ozbay and Huy Vo

USI Sponsor: NYC Taxi and Limousine Commission

There are over 100,000 licensed For-Hire Vehicles (FHVs) in NYC. The for-hire industry is constantly growing, with more vehicles going out on the road every day, and this increase has led to more congestion. Some drivers take short breaks from work and inadvertently add to this congestion. In an effort to better manage curb space, Taxi/FHV relief stands have been implemented to give drivers a place to pull over and take a break. However, there are only 69 Taxi/FHV Relief Stands (TRSs) spread across NYC. This project aims to analyze the effectiveness of existing relief stands and suggest locations for installing new ones.

A model is developed to suggest new locations using two major datasets: the breadcrumb data and the taxi trip data. The total data size is about 30 GB, and the NYU High-Performance Hadoop Cluster is used to process it. The breadcrumb data records the coordinates of all yellow and green taxis every two minutes. This data is not openly available; access is authorized for CUSP students working on certain taxi-data projects. The taxi trip data contains both taxi and FHV trip records, which are used to understand taxi demand within a given geospatial unit (in our case, a hexagonal zone). Additional datasets, such as public park toilets, food stands, food trucks, and restaurants, are also used.

The final deliverable of the project is twofold. First is a descriptive analysis of the effectiveness of existing Taxi/FHV Relief Stands. The other is to provide a list of possible hexagonal zones where new Taxi/FHV Relief stands can be located.

Food Distribution Network Vulnerability

CUSP Students: Chenxi Cui, Patrick Gitundu, Kaylyn Levine, William Xia, and Xinshi Zheng

CUSP Mentor: Stan Sobolevsky

USI Sponsor: NYC Mayor’s Office of Recovery and Resiliency

This capstone project was conducted on behalf of the New York City Office of Recovery and Resiliency (ORR) to study the potential impacts of a disruption to the City’s food distribution network. The ORR was created in response to Superstorm Sandy, which crippled much of the City, costing billions of dollars in damages and exposing many infrastructure vulnerabilities. It is the ORR’s goal to mitigate a future catastrophic event and to ensure that there is no need for emergency feeding on any level within the City.

Our CUSP team was tasked with describing the impact of food supply disruptions on the New York City population, with a specific focus on vulnerable communities. We aim to identify these communities, as well as the strategic point-of-sale (POS) locations within them whose closure would affect residents the most. Our results will provide specific recommendations to the ORR for mitigating the impact on the food distribution system. More specifically, our results will identify strategic POS locations that the ORR will target in its business assistance program to help maintain food accessibility for vulnerable communities in emergency scenarios.

Networks of Urban Vulnerability

CUSP Students: Sunny Kulkarni, Ekaterina Levitskaya, Lani M’cleod, Yuan Shi, Richard Vecsler

CUSP Mentor: Dr. Stanislav Sobolevsky

USI Sponsor: Lockheed Martin Advanced Technology Laboratories (ATL)

Urban vulnerability research is critical in understanding and mitigating exposure to disruptive events within the urban environment. While the literature is replete with examinations of individual disruptive events, few studies have considered scenarios that include multiple simultaneous disruptions. This paper seeks to identify and measure the effect of multiple simultaneous disruptions on the NYC Subway system as an example of urban vulnerability. We construct a network model based on train schedule information for a typical weekday rush hour and estimate the demand distribution from origin-destination data derived from census data, in order to assess the cumulative delay caused first by a single disruption and then by a pair of simultaneous disruptions to the network. In particular, we define ‘synergy’ as the scenario in which a pair of simultaneous disruptions causes an effect greater than the combined effect of the two single disruptions. We find the top station pairs that have a strong synergistic effect. Further, we identify a marked difference in network characteristics between positive and negative synergy. We discuss network characteristics, such as distance, degree, similarity, and community, that may underlie this synergy, and close with next steps on how our model can be used to analyze network vulnerability at different times of day and for different categories of the population, and how it can be expanded to include other modes of transportation. To help communicate the findings, we provide a visualization tool that can be used to assess the synergy throughout the network.
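
One way to make the synergy definition concrete is the following sketch. It assumes synergy is measured as the pair’s cumulative delay minus the sum of the two single-disruption delays, so that positive values mean the pair is worse than its parts and negative values mean it is milder. That formula, the function names, and the delay figures are illustrative assumptions, not the team’s model or results.

```python
def synergy(delay_pair, delay_a, delay_b):
    """Assumed measure: pair delay minus the sum of the two single delays."""
    return delay_pair - (delay_a + delay_b)


def rank_pairs(single_delays, pair_delays):
    """Rank station pairs from most positive to most negative synergy."""
    scores = {
        (a, b): synergy(delay, single_delays[a], single_delays[b])
        for (a, b), delay in pair_delays.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cumulative delays (arbitrary units) for three stations.
single_delays = {"S1": 100, "S2": 150, "S3": 80}
pair_delays = {
    ("S1", "S2"): 400,  # worse than the two disruptions taken separately
    ("S1", "S3"): 150,  # milder than the two disruptions taken separately
    ("S2", "S3"): 230,  # exactly additive
}
ranked = rank_pairs(single_delays, pair_delays)
for pair, score in ranked:
    print(pair, score)
```

In this toy example the S1 and S2 pair has positive synergy (150), S2 and S3 are exactly additive (0), and S1 and S3 show negative synergy (-30).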

Large-Scale Analysis of Water Efficiency in California

CUSP Students: Yue Cai, Kevin Han, Fernando Melchor, and Ian Stuart

CUSP Mentor: Brendan Reilly

USI Sponsor: California Data Collaborative (CaDC), ARGO Labs, and Moulton Niguel Water District (MNWD)

Our project intends to help CaDC and its partners accurately analyze and predict customer demand so that more informed and targeted conservation policies can be implemented. Automation and scalability are key, as CaDC is also looking to extend its data management and analysis tools to water districts throughout the region.

Our data mainly comes from two sources: CaDC (for water use information) and public APIs (such as Google, Yelp, Mapzen, and Foursquare, for business establishment information). In terms of our analytical method, we are proceeding through 4 phases: Data Acquisition and Cleaning, Data Integration, NLP Classification, and Iterative Approach to Classification and Benchmarking.

Our project has two primary objectives: 1. Develop a reliable, scalable software tool that automates the classification of water customers according to North American Industry Classification System (NAICS) standards. These classifications for customers in the commercial, industrial, and institutional (CII) spaces will be more granular than the data currently available from the water district. 2. Use the customer type classifications described above to model water usage for the district(s) being studied, and develop a benchmarking system whereby a CII customer’s usage patterns can be accurately evaluated against its peers. These tools can be used to predict water usage, detect anomalous users, and guide targeted policy interventions. Ideally, this model development and benchmarking process would be exportable to any water district.

The “Energy Snapshot”: Driving Behavior Change Through Energy Data Analytics

CUSP Students: Xianbo Gao, Enrique Sanz Gonzalez, Victor Sette Gripp, and Peng Jia

CUSP Mentors: Constantine Kontokosta, Bartosz Bonczak and Sokratis Papadapoulos

USI Sponsor: NYC Mayor’s Office of Sustainability and NYC Department of Buildings

In the “80 x 50” plan, New York City has taken on the challenge of reducing its carbon emissions by 80%, relative to 2005 levels, by 2050. Since more than 70% of the city’s emissions come from energy used in buildings, meeting that target requires understanding energy consumption patterns in the city’s buildings and identifying opportunities to improve energy efficiency. As initial steps, the city passed two laws aimed at collecting granular and detailed data about energy usage in some of its most significant emission sources: large buildings. These two laws are Local Law 84 (LL84) and Local Law 87 (LL87). In this context, this project is sponsored by the NYC Mayor’s Office of Sustainability (MOS) and has two main intended outcomes. The first is to design a performance metric for energy efficiency, along the same lines as the Environmental Protection Agency’s Energy Star Score but specific to the New York City building market. This metric will take into account the specific features of different building typologies and offer insights into the reasons behind each building’s performance, in addition to possible actions to improve that performance. The second is to define a methodology for identifying peer groups among the NYC buildings covered by LL84. The performance metric will then be evaluated only within peer groups. Both the performance metric and the peer group will be part of the Energy Snapshot, a building-specific scorecard that will be sent to each building owner (or manager) as a way for the city to give back to them, in a more interpretable and actionable form, the information that they provide when complying with LL84.

The underlying motivation behind this project is that simply by better informing building owners, it is possible (and expected) that there will be some behavioral changes towards a more efficient use of energy. That alone, i.e., sending a clear and informative Energy Snapshot to each building owner, may prove in the future to be an effective policy that contributes significantly to a more energy-efficient and less carbon-intensive NYC.

City of New Orleans Emergency Medical Services Resource Optimization

CUSP Students: Alexis Soto-Colorado, Connor Chen, Adriano Yoshino, Matt Sloane

CUSP Mentors: Martin Traunmueller, Boyeong Hong, and Constantine E. Kontokosta

USI Sponsor: City of New Orleans, Office of Performance and Accountability

Since 2010, requests for emergency medical services (EMS) within the City of New Orleans have steadily increased, while the resource capacity (i.e., ambulances and associated staff) of the New Orleans Emergency Medical Services (NOEMS) has not increased to meet this rising demand. This dynamic has resulted in a declining quality of service on the part of NOEMS, including increased wall times, greater use of mutual aid, and the failure of NOEMS to respond to high-priority calls within response times that are consistent with national goals and averages.

With this resource inadequacy and the associated service shortcomings in mind, the City of New Orleans Office of Performance and Accountability (NOOPA) has requested a data-driven analysis of how to optimize the scheduling of current NOEMS ambulance resources in order to maximize their effectiveness in responding to EMS requests throughout the City of New Orleans. NOOPA has also requested that the optimization allow for the consideration of hypothetical additional NOEMS ambulance resources, in order to measure how additional ambulances would affect NOEMS’s ability to respond to EMS requests.

The basic framework of the optimization analysis is twofold. First, a prediction model based on various data will be developed to predict the location and time of future EMS incidents and requests. Second, an EMS resource optimization model will be developed that incorporates this prediction model to best identify the likelihood that NOEMS will be unable to respond to an incident.

Municipal Performance Management for Small Cities

CUSP Students: Adrian Dahlin, Danny Fay, Maisha Lopa, Jonathan Pichot, Chris Streich

CUSP Mentor: Neil Kleiman

USI Sponsor: National Resource Network

There are 825 US cities with populations between 40,000 and 400,000. They are not the cities with the biggest economies, latest technology, most available data, or strongest tourism, and they’re mostly not getting large smart cities investments. They are former industrial towns, suburbs of larger cities, population centers in otherwise rural regions, and capitals of smaller states. Most of them have tight budgets, limited capacity, and few if any dedicated data analysts, but they still have many of the same needs and challenges as bigger cities. They have school systems to run, infrastructure to maintain, fires to put out, crime to fight, workforces to manage, IT systems to update, and economies to reform. All of these efforts can be improved with good data, smart analytics, and effective performance reporting. A big city like New York has data scientists in every major department. But smaller city governments consist of department heads and the people on the ground doing the daily work; they don’t have mid-level professionals who can do the analytical work needed to help an organization run better.

This project has centered around consulting for two small cities: New Bedford, MA, and Cleveland Heights, OH. In New Bedford we were asked to analyze code violations, crime, and fire department dispatches to see if relationships between these data might help the city reduce all three of these phenomena. In Cleveland Heights we built two tools that the city will use moving forward. First, we built a “building intelligence” database that gathers all available data about properties in the city and prints out reports that can be used by building inspectors and other City personnel who visit homes. Second, we built a “neighborhood score” tool that aggregates buildings data and other information and visualizes it with an online map that will help the city identify struggling neighborhoods and allocate services accordingly.

This work matters, because local government affects people’s lives. This is the level of government that determines the safety of neighborhoods, the condition of streets, the quality of schools, and development rights of properties. It’s also the scale at which individuals can have the greatest impact, which makes performance management an issue that any civically engaged person should care about.

The Quantified Community: Pulse of the City

CUSP Students: Trang Dam, Aaron D’Souza, Benjamin Miller, Zhaohong Niu, and Chunqing Xu

CUSP Mentors: Constantine Kontokosta, Martin Traunmueller, Nick Johnson, and Yuan Lai

USI Sponsor: New York Downtown Alliance

The Internet has become an indispensable component of people’s lives. Access to Wi-Fi, especially public Wi-Fi networks, not only fulfills an individual’s need to “connect” but also sheds light on urban dynamics. The rapid growth of public Wi-Fi networks has expanded public space to meet urban residents’ needs for both leisure and work. High concentrations of people are often seen in areas where public Wi-Fi is provided. Therefore, understanding urban dynamics through the analysis of Wi-Fi connections could be a novel and promising approach for urban planners.

Understanding urban dynamics is essential because urban decision makers, including business improvement districts (BIDs) like the New York Downtown Alliance, transportation policy makers, and law enforcement officials, need to know how many people are in a city at a given time. Transportation policy makers need to know how many people there are and how best to move them throughout the city. Law enforcement officials need to know how many people are in an area to prevent crime and prepare emergency response plans. Our client, the Downtown Alliance, wants to know how many people are present at a given time in order to improve its public Wi-Fi service and events in Lower Manhattan. Following the client’s direction and thinking as urban planners, we posed our question: to what extent does Wi-Fi connection data in Lower Manhattan correlate with population and other city services (that is, can it detect the pulse of the city)? Furthermore, can this relationship be used to inform business and policy decision making?

This project aims to understand the “pulse of the city” through space and time by using Wi-Fi count data as a proxy. The pulse of the city, by our definition, is the regular fluctuation in population and demand for services. Our work started from the Wi-Fi count data provided by the client, with the aim of better supporting city development and urbanization. To address the problem, we defined the term “pulse of the city” and built our analytical model based on that definition.

SQUID-Bike, Street Quality Identification for Citywide Bike Lane Infrastructure

CUSP Students: Sichen Tang, Geoff Perrin, Nicola Macchitella, and Felipe Diego Gonzalez

CUSP Mentor: Varun Adibhatla

USI Sponsor: ARGO Labs

SQUID-Bike is a project to measure, in a standardized manner, the general condition of citywide bike lane infrastructure by integrating digital street imagery and ride quality data using open source technology.

SQUID-Bike enables cities to answer a simple question in a cost-effective manner: “Which bike lanes are worse than others in a city?” SQUID-Bike empowers cities to be proactive about bike lane maintenance by adopting digital surveys of all bike lanes in a city.

With frequent use, longitudinal data from SQUID-Bike will allow city agencies to observe bike lane degradation over time, and could be used to power an anticipatory maintenance program and avoid the huge financial and political costs of deferred maintenance.

WiFind: Analyzing Wifi Density Around NYCHA Housing Projects

CUSP Students: Christian Rosado, Dongjie Fan, Jie Zhou, Kai Qu, Xiaomeng Dong

CUSP Mentors: Charlie Mydlarz and Justin Salamon



Last year’s team developed an Android application called WiFind as an exercise in urban sensing, and visualized Wi-Fi signals both in the mobile app and on the website. This year we collected new data and analyzed it together with NYC open data.

Our project is motivated by the question of how fairly Wi-Fi access is distributed across New York City, focusing especially on comparisons of open Wi-Fi density around public housing projects and their adjacent housing. Our team would like to find out whether Wi-Fi access density actually differs between neighborhoods with different income levels.

There are several limitations to our project. We have limited access to the buildings, the population with access to Wi-Fi is uncertain, and the sample size in the analysis is small.

The Urban Observatory: First Empirical Quantification of the Rebound Effect

CUSP Students: Daynan Crull, Akshay Penmatcha, Anastasia Shegay, Priyanshi Singh

CUSP Mentor: Gregory Dobler


The project aims to carry out the first empirical quantification of the rebound effect–a reduction in expected gains from energy efficient technologies because of increased consumption. The approach is to collect raw data on the use of lighting technologies through remote sensing, leveraging CUSP UO instrumentation for hyperspectral and broadband imaging, apply image processing techniques to identify light sources, classify technology types, and measure durations of use by extracting on/off transitions. These light sources will be integrated with available records data (3-dimensional models of the observed urban landscape, land use data, and socioeconomic and demographic survey data) to characterize the observed population. Efficient and conventional lighting technologies will be compared in terms of durations of use in order to interpret the presence of a rebound effect.
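
The “durations of use” step can be sketched as follows. This is only an illustration of extracting on/off transitions from a single light source’s brightness time series: the fixed threshold, the 10-second sampling interval, and the sample values are hypothetical, and the actual image-processing pipeline is not described in this abstract.

```python
def on_durations(brightness, threshold, sample_seconds):
    """Return the length, in seconds, of each continuous 'on' period.

    brightness is a sequence of evenly spaced samples for one light source.
    A sample counts as 'on' when it is at or above the threshold; an
    on-to-off transition closes the current period.
    """
    durations = []
    run_length = 0
    for value in brightness:
        if value >= threshold:
            run_length += 1
        elif run_length:
            # Off transition: record the period that just ended.
            durations.append(run_length * sample_seconds)
            run_length = 0
    if run_length:
        # The light was still on when the observation ended.
        durations.append(run_length * sample_seconds)
    return durations


# Made-up brightness samples taken every 10 seconds.
series = [0, 0, 9, 8, 9, 0, 0, 7, 7]
durations = on_durations(series, threshold=5, sample_seconds=10)
print(durations)
```

Here the source is on for three samples and then for two, giving on-periods of 30 and 20 seconds. Comparing such durations between efficient and conventional lighting is the kind of analysis the abstract describes.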

Prosecutorial Data Justice

CUSP Students: Hrafnkell Hjörleifsson, Michelle M. Ho, Christopher Prince, Achilles Saxby

CUSP Mentor: Federica B. Bianco

USI Sponsor: BetaGov and The District Attorney’s Office of Santa Clara County (SCC)

The District Attorney’s Office of Santa Clara County (SCC), California, has observed long durations in its prosecution processes. It is interested in assessing the drivers of prosecutorial delays and determining whether there is evidence of disparate treatment of accused individuals in pre-trial detention and criminal charging practices.

This Capstone has two goals: to create a visualization tool that facilitates the DA’s office’s exploration of its own data, and to build analytical models that identify drivers of prosecutorial delays.

The visualization tool allows a comparison of four phases of the prosecutorial process for SCC criminal cases: case issuing to arraignment, arraignment to preliminary hearing, preliminary hearing to plea, plea to disposition, and also post-disposition court events. It enables aggregation, filtering, and extraction of statistical quantities of prosecutorial features (e.g., the crime type, the number of defendants on the case, and the number of charges) and demographic features (e.g., race/ethnicity, gender, age). It is designed to run in the CUSP computational environment, assuring protection of identifiable data.

The prosecutorial process duration and outcome are modeled with decision tree algorithms (random forest and gradient boosted trees). The models enable the identification of the most significant features associated with prosecutorial delays, and an assessment of the importance of demographic features in determining prosecution process duration and outcome.

Predicting “Failure to Appear” for Misdemeanor Offenses

CUSP Students: Tashay Green, John Hall, Henry Lin, and Jordan Vani

CUSP Mentor: Ravi Shroff

USI Sponsor: New York County District Attorney’s Office

Each year, the New York County District Attorney’s Office (DANY) prosecutes approximately 100,000 violation, misdemeanor, and felony arrests. Of these 100,000 arrests, approximately 25,000 cases involve defendants who have received a Desk Appearance Ticket (DAT) for misdemeanor offenses. Unlike a typical arrest, where defendants are detained from arrest to arraignment (within 24 hours), DAT recipients are instructed to return to court for arraignment at a future date, normally 3-7 weeks after arrest.

Nearly one-in-four DAT defendants fail to appear (FTA) for their initial arraignment date and subsequently receive a warrant for arrest. The New York County District Attorney’s Office would like to know what risk factors are associated with a defendant neglecting to attend their arraignment date after receiving a DAT. Ultimately, by using historical DAT data and machine learning methods, with a focus on interpretability, this work aims to support prosecutorial interventions, such as strategic scheduling, to reduce FTA rates.

Piercing the Landlord Corporate Veil

CUSP Students: Sebastian Bana, Nathan Weber, Shalmali Kulkarni, Xinge Zhong

CUSP Mentors: Debra Laefer and Huy Vo

USI Sponsor: New York State Office of the Attorney General

Landlords often obfuscate their identities by purchasing individual buildings with individual corporations and Limited Liability Companies (LLCs). This practice makes locating a landlord’s entire portfolio or even the true owner of a single property difficult. However, the research department in the Office of the New York State Attorney General (“OAG”) has found that landlords frequently use the same address for many of their building transactions, including: purchases, mortgages, and registration.

Given this insight, the veil piercing team, made up of master’s students from New York University’s Center for Urban Science and Progress (NYU CUSP), has built a web-based data exploration tool that can be queried by non-technical end users. This tool will help OAG improve its understanding of ownership networks, which consist of multiple buildings, multiple LLCs, and the potential true owners behind them. This heightened understanding of owners’ portfolios will enable OAG to strengthen its aggressive efforts to combat harmful landlord practices, such as tenant harassment, deed theft, bank fraud, and other harmful violations affecting tenants and homeowners.

SmartShelters: Using Technology and Data Science to Improve Outcomes for Homeless Families in NYC

CUSP Students: Xueqi (Claire) Huang, Kristi Korsberg, Dara Perl, and Avikal Somvanshi

CUSP Mentors: Constantine Kontokosta, Boyeong Hong, and Awais Malik

USI Sponsor: Women In Need NYC

New York City (NYC) faces the challenge of an ever-increasing homeless population, with more than 70,000 people living in city shelters in 2016. In 2015, 17% of families with children that exited a homeless shelter returned to the shelter system within a year of leaving. On average, a family stays in the shelter for 11 months. This suggests that “long term stayers” and “repeat entrants” contribute significantly to the homeless population in NYC.

This capstone focuses on understanding the factors that affect the readmission and length of stay of homeless families at NYC-based Women In Need shelters, in order to predict the likely length of stay and the likelihood of re-entry after exit. It also explores the deployment of technologies that can help improve the service delivery and performance of the homeless shelters.