2022 Capstone Projects

CUSP received its highest number of proposals ever for the 2022 cycle of the Capstone Program. Thanks to our sponsors, our graduate students have the opportunity to have an impact on critical urban issues while continuing to develop their data science skills and honing their public policy expertise. Projects have been grouped into four categories: disaster resilience and climate change, fairness and inclusivity, health and wellbeing, and modern civil and communications infrastructure. To view the full list of 2022 projects, see below. Stay tuned throughout the spring and summer semesters to hear more about how our 2022 Capstone projects are progressing!

Disaster Resilience and Climate Change

View Projects

Fairness and Inclusivity

View Projects

Health and Wellbeing

View Projects

Modern Civil and Communications Infrastructure

View Projects

Disaster Resilience and Climate Change

A New Dataset to Develop Smart Assistants for Specialized Training with Augmented Reality

Project Sponsors

Iran R. Roman, Postdoctoral Scholar, NYU CUSP
Juan Pablo Bello, Professor and Director, NYU CUSP

Project Abstract

Emergency response personnel (i.e. firefighters, medical personnel, and utility workers) require specialized training to act in time- and precision-sensitive tasks. Comprehensive training requires time, practice, and continuous guidance from a professional and experienced trainer able to predict and correct the trainee’s actions. The trainer-to-trainee ratio currently limits the amount of individuals who are trained at a time. Ideally, such training could be carried out by an automatic and smart agent using augmented reality devices like the Hololens. In this project, we aim to develop a system for guided monitoring of a person’s actions as they learn a specialized task.

Project Details

Project Description & Overview

Smart assistants can guide a trainee’s actions as they learn a specialized task. Such a system can: 1) identify the task being performed, and 2) predict the trainee’s actions. The assistant must process the trainee’s field-of-view (i.e. egocentric video) and surrounding sounds, carry out object recognition (including the trainee’s body parts, like arms), attend to relevant objects, and predict future actions.

Existing approaches rely on multimodal datasets with egocentric video and audio where an individual is seen carrying out a task. These datasets must be annotated so that actions in the egocentric video are associated with clear human-language descriptors. Annotation can be carried out completely by a person, but this is time-consuming and prone to error. Alternatives include automatic annotation via speech recognition, if the video data features an individual narrating their own activities. But narration leads to pauses between actions as the individual speaks, or errors when the individual thinks about what to say, or talks and acts at the same time. As a result, currently existing smart assistants are limited by data used for their development.

This project aims to 1) Improve data quality with a new dataset of egocentric video and audio, where an individual receives verbal instructions from a third party. 2) Benchmarking of pre-trained machine learning models that carry out video summarization and audio-visual correspondence. 3) Evaluation of action prediction models. Hence, the project’s question is: do multimodal egocentric recordings of instructed actions result in better annotations and predictions of human performance by an artificial agent?

Datasets

Collecting data from emergency response workers would be logistically challenging and not really necessary to first address our research question (whether multimodal egocentric video of instructed actions result in better annotations to predict human behavior). Instead, to ask our question we will use videos recorded by a real-life worker at the Subway restaurant chain. He uploads to Youtube everyday and his videos are openly-available. The videos feature him making specific menu items as he follows the verbal instructions of customers. He started his Youtube channel in June and has already uploaded 7 hours of egocentric multi-modal video (and his list of videos continues growing every day). Moreover, we have established direct contact with him, shared our research ideas, and if this project is approved he will support us by uploading at least 10 minutes of his real-life footage at work, per day.

Competencies

The students should be comfortable with Python and familiar with data analysis tools such as numpy and pandas. Having a machine learning background is also desirable (basic classification models such as random forests and test/train splits for evaluation).

Learning Outcomes & Deliverables

To conduct such a project we need audio-visual annotations. First, students will learn how to use existing models for automatic speech recognition and visual object detection. This will result in a real-world audio-visual dataset of egocentric perspective in an instruction-following task with annotated actions. Secondly, students will learn to evaluate performance of existing video summarization and audio-visual correspondence models against their newly-curated dataset. This will result in a study of performance of different off-the-shelf models on egocentric multimodal data in an instruction-following task. Finally, students will learn how to use and benchmark state-of-the-art multimodal action prediction models. The third deliverable would be a report with a summary of the work carried out and main conclusions along with the associated code used.

A Tale of Two Cities: Assessing the state of the thermal environment for New York and Athens

Project Sponsors

Constantinos Cartalis, Professor, National and Kapodistrian University of Athens (NKUA)
Anastasios Polydoros, Ph.D. Candidate, National and Kapodistrian University of Athens (NKUA)

Project Abstract

Mitigation plans to counteract overheating in urban areas need to be based on a thorough knowledge of the state of the thermal environment, most importantly on the presence of areas which consistently demonstrate higher or lower urban land surface temperatures (hereinafter referred to as “hot spots” or “cold spots”, respectively).

This is because Land Surface Temperature (LST) is a controlling factor of energy exchange between the surface and the atmosphere, and thus a cause of meteorological and climatic variation. Such exchange is through latent and sensible heat as well as the emission of radiation at the thermal infrared part of the spectrum.

As a matter of fact, as urban areas are covered with buildings and pavements; as a result moist soil and vegetation are being replaced with cement and asphalt. These materials have high thermal mass and tend to absorb more solar radiation than the surfaces found in rural areas, with the result being higher land surface temperatures. Additionally these surfaces are impermeable and tend to dry more quickly after precipitation, reducing evaporation, which has a cooling effect in green areas.

Project Details

Project Description & Overview

The main objective of the project is to develop and apply a methodological approach for the recognition of thermal “hot spots” and “cold spots” in New York City and Athens, during the warm months of the year. Results will be analyzed separately for each city as well as in a combined manner in view of recognizing potential similarities which may be rolled out as urban typologies.

Specific objectives are:

(a) the classification of land cover in each city (current state and retrospectively for a period of 20 years),
(b) the estimation of LST from Landsat-8 at 30 m x 30 m and Sentinel-3 and MODIS at 1 km x 1 km spatial resolution,
(c) the estimation of the downscaled LST at 30 m x 30 m on a, roughly, daily basis – and diurnally to the extent possible – with the use of the results of b),
(d) the analysis of the extracted land surface temperatures so as to recognize and cluster “hot spots” and “cold spots”,
(e) the correlation of the “hot spots” and “cold spots” to such independent variables as: (e1) the urban form—the two and three-dimensional urban structure (e2) the urban fabric – the surface materials and (e3) the urban green – presence, extent and distribution,
(f) the estimation of the Surface Park Cooling Intensity, both in areas with extended parks as well as in areas with pocket parks.

Datasets

Satellite data (free) from: Landsat 8 (visible and TIR), Sentinel-2 (visible) and Sentinel-3 (TIR) and MODIS from Aqua and Terra (vIS and TIR).

Competencies

Provisionally image processing and/or GIS.

Learning Outcomes & Deliverables

Students will understand the thermal/climate dynamics of urban areas
Students will exploit the potential of earth observation satellites and remote sensing for urban applications
Deliverables
- City maps depicting thermal hot spots, the cooling intensity of parks, land use/land cover as well as changes over space and time

Creating a High Performance Construction Project Database To Accelerate Building Decarbonization and Resilience in NYC

Project Sponsor

Marianna Koval, Director, Invest NYC SDG, NYU Stern Center for Sustainable Business

Project Abstract

The power of experience curves in technology (known as Wright’s Law, Swanson’s Law, or “learning by doing”) has made clean energy technologies less expensive than fossil fuel-generated energy, driving exponential growth in clean energy deployment globally. Can this power of learning also be harnessed for the technology of low carbon building to make Passive House construction less costly than traditional methods? This project, a partnership between Invest NYC SDG, Passive House Accelerator, and Source 2050 will (1) study how “experience curves” apply to Passive House design and construction, and (2) create a global project database to accelerate those experience curves.

Project Details

Project Description & Overview

The Invest NYC SDG initiative is committed to creating a sustainable, inclusive, and resilient built environment in NYC. Passive House is a proven technology for dramatically reducing the greenhouse gas emissions of buildings and providing climate resilience to building occupants, making Passive House (1) a key tool for achieving the goals of Local Law 97, (2) a centerpiece of NYSERDA’s building decarbonization work, and (3) a rapidly growing building-based climate solution in NYC, NYS, and nationally.

At Passive House Accelerator events and industry conferences, it is common to hear project teams report rapid, project-based learning such that each Passive House project they complete becomes more efficient and less costly than the last. Do these anecdotes translate to quantifiable and significant experience curves that can be harnessed to drive costs down and accelerate market uptake? Invest NYC SDG will partner with Passive House Accelerator and Source 2050 to empower Capstone students to:

Answer the research question, “do technology experience curves apply to Passive House, and if so, at what learning rate?”
Increase this learning rate by building a visualization-rich High Performance Construction Project Database that shares replicable details from hundreds of projects, documents cost and performance, and shares lessons-learned with thousands of practitioners, owners, manufacturers, and policymakers in NYC, NYS, and nationally.
Design an optimal “case study” design for each project listing, determining which data points transmit “best practice” most effectively, and how best to share that information in visual and written formats.

Datasets

PHI Passive House Database: This online database lists 5,000 PHI Passive House buildings internationally, with basic project data; it is a good foundation for more robust case studies with data visualizations, higher quality project images, and information. Cost data is not published, so will need to be gathered from project teams.
Phius Certified Projects Database: This online database lists 350 Phius Passive House buildings in North America, with basic project data; it is a good foundation for more robust case studies with data visualizations, higher quality project images, and information. Cost data is not published, so will need to be gathered from project teams.
NYSERDA Buildings of Excellence Datasets: NYSERDA publishes cost, performance, and project data for its 42 Buildings of Excellence projects.
Pennsylvania Housing Finance Agency LIHTC applicant data: Three years of cost, square footage, and certification type data for 268 proposed affordable housing projects (74 of which were Passive House).
Passive House Accelerator Project Gallery: The Accelerator’s project gallery features 200 project entries, some with very data-rich descriptions and others with considerably less data published; it is a good foundation for more robust case studies.
Massachusetts Clean Energy Center Passive House project data: MassCEC is tracking performance, cost, and project data for the growing number of Passive House multifamily projects that are underway thanks to state policies that incentivize Passive House development.
Source 2050 Vendor Project Profiles: Source 2050 is asking all vendors to provide case studies, testimonials, project profiles, and similar materials as they are onboarded; these will provide unique perspectives from the trades on how these products and details performed in the context of specific project types.

Competencies

Data analytics, visualization, data mining and processing, outreach/interview skills, database management, and web integration

Learning Outcomes & Deliverables

Data analytics report on experience curves in Passive House and the relevant learning rate.
High Performance Construction Project Database to document data findings and share Passive House project lessons learned, published to the Passive House Accelerator website, with data visualization.
Development of project detail packages from the Database to make available to trades through Source 2050 as complete project solutions.

Data-Life: Exploring Post-Covid Scenarios Through Data Science

Project Sponsors

Marco Brambilla, Professor, Politecnico di Milano
Stefano Ceri, Professor, Politecnico di Milano

Project Abstract

Project Details

Project Description & Overview

COVID-19 has affected our lives in unprecedented ways, in many personal, professional, and educational aspects. This impacted both our daily plans and logistics, as well as our perception of the world and of the future. A lot of data is available about COVID-19 impacts and many data-driven studies have been conducted for understanding the associated dynamics at the medical, logistic, and organizational levels. However, fewer studies concentrated on the perceptions, feelings, and emotions involved in these changes.

The project will combine quantitative data coming from open data, social media, and other sources, together with ad-hoc analysis, crowdsourcing, and gamification practices, to collect data and understand the perception of people about the present and future of the pandemic. It will look into the post-pandemic future, trying to understand how things are going to change and how people may react to different alternative policies and decisions at different levels. Possible perspectives to be explored include: educational opportunities, university life, city life and dynamics, professional life, family life, government decisions (mandates, regulations, lockdowns).

The study will be conducted in the cities of New York and Milano, both severely affected by COVID-19, both having lived long and painful lockdown periods, but both nowadays showing important signs of reactions, with a strong revival in social life, cultural events, and desire to return to normality. The project will highlight many aspects of similarity but also many aspects of diversity of the reaction to COVID-19 of the two cities.

Datasets

Dataset exploration and definition will be part of the project activities.

Competencies

Students should have basic data management skills and competence in social media data and crowdsourcing practices.

Learning Outcomes & Deliverables

The project will study methods, tools, and reports/analysis on the expectations, reactions and feelings of people with respect to the return to normal life after the pandemic.

Developing an AI-based Image Classifier for School Infrastructure Baseline Data Collection in Large Scale Disaster Risk Analysis

Project Sponsors

Luis Ceferino, Assistant Professor, NYU Disaster Risk Analysis Lab (DRAL) and NYU AI4CE Lab
Jingzhe Wu and Angie Garcia, Consultants, World Bank’s Global Program for Safer Schools (GPSS)

Project Abstract

This project will develop a risk-informed classification system to support AI computer vision algorithms for assessing seismic vulnerabilities in schools. The project will be led by the World Bank’s Global Program for Safer Schools (GPSS), the NYU Disaster Risk Analysis Lab, and the NYU AI4CE. The project’s main goal is to develop a simplified vulnerability classification system based on existing detailed taxonomy from the Global Library of School Infrastructure (GLOSI: https://gpss.worldbank.org/en/glosi/overview), to support AI-based computer vision tools to reduce structural vulnerability data collection time and costs in large building portfolios. We envision that this simplified classification will enable more reliable AI computer vision tools to empower communities to be engaged in governments’ disaster risk management efforts more easily, make risk analysis more accessible and informed by up-to-date baseline information worldwide and guide large-scale school safety and resilience investments more efficiently.

Project Details

Project Description & Overview

This project will focus on a simplified classification system to support easier and more reliable extraction of structural classifications (features) from pictures of schools using computer vision algorithms. The structural classifications will be set according to the Global Library of School Infrastructure (GLOSI) structural taxonomies. GLOSI taxonomies are key to defining structural vulnerabilities in school buildings that are considered in large-scale earthquake risk analysis. The project will use the vulnerability data of GLOSI typologies, and school inventory data from ~2000 schools collected in World Bank projects. The project will have four main parts:

Methodology Development: Redefining and simplifying GLOSI categories through potential clustering based on seismic vulnerability similarities (e.g., brittle materials); and redefining intermediate labels (e.g., proxy variables like material, building component sizes) to support a hierarchical classification logic to better predict GLOSI categories.
Data Pre-processing: Curating the school dataset using the simplified GLOSI categories and the developed intermediate labels.
Computer Vision Analysis: Adjust and retrain the existing computer vision models following the developed intermediate labels.
Measuring risk errors: Assessing the accuracy/uncertainty with the inventory with the developed simplified GLOSI categories compared to the original detailed inventory, in terms of risk metrics using existing risk software.

Datasets

Vulnerability curves of relevant school building typologies.
School inventory dataset with ~2000 schools and their current GLOSI categories. Note: The dataset is proprietary and will be provided under NDA. This dataset contains multiple photos per building for all the buildings in each school, their labels in the GLOSI categories, and other relevant exposure information such as coordinates, occupancy, and building footprint dimensions. The total number of images is over 16,000 images. This dataset was collected by teams deployed in the field to survey the schools.
Existing AI computer vision model prototype for reference.

Competencies

The team needs at least one person with a background in disaster risk analysis, and preferably computer vision and machine learning. For computer vision, this requirement means a student that took or is taking a class equivalent to CSCI-GA.2271-001 or ROB-GY 6203. A team member with a strong programming background (especially with hands-on deep learning experience) will increase the success rate of the project. For disaster risk analysis, this requirement means a student that took or is taking a course equivalent to CUSP-CX 8006. Please get in touch with Professor Ceferino for further inquiries about competencies.

Learning Outcomes & Deliverables

Learning Objectives
- Developing hands-on experience with AI and computer vision tools for seismic resilience
- Understanding the data requirements for conducting regional disaster risk analysis and defining seismic vulnerabilities
Deliverables
- Simplified classification system on school building vulnerabilities for computer vision uses
- Progress and final reports
- A final presentation with wider GPSS team

Emergency Response after Earthquakes: Assessing Risk and Guiding Coordination in Hospital Systems

Project Sponsors

Luis Ceferino, Assistant Professor, NYU Disaster Risk Analysis Lab (DRAL)
World Bank’s Global Facility for Disaster Reduction and Recovery (GFDRR)
Earthquake Engineering Research Institute’s Public Health Working Group

Project Abstract

This project will assess the earthquake risk of hospitals and their ability to sustain operations after future large earthquakes. The project will be led by the NYU Disaster Risk Analysis Lab, the World Bank’s Disaster Risk Management Division, and the Earthquake Engineering Research Institute (EERI)’s Public Health Working Group. The project’s main goal is to apply robust disaster risk analysis techniques on hospital datasets to better understand post-disaster hospital capacity. The project will investigate new risk metrics relevant to inform practical risk mitigation policy implementation and emergency planning, e.g., mobilizing patients from neighborhoods with little hospital capacity to high hospital capacity. The goal is to inform communities on how to mitigate not only potential economic losses, as currently done in practice, but also potential functional and societal impacts.

Project Details

Project Description & Overview

This project will focus on conducting earthquake risk analysis in two cities’ hospital systems. One case study will be located in the Bay Area, California, and the other one will be in a developing country defined according to data availability. The project will have four parts:

(A) Curating and completing datasets for assessing earthquake risk: The students will be provided with initial datasets that map hospital infrastructure and its seismic vulnerabilities. Students will extend these datasets, with guidance from their advisors, to be integrated into disaster risk analyses.
(B) Conducting earthquake risk analysis: Students will use the datasets to conduct earthquake risk analysis using open-source software like OpenQuake or SimCenter Tools. The analysis will incorporate seismic hazards and vulnerability models to quantify risk comprehensively, e.g., probability of hospital disruptions in a given year, return period of hospital collapse.
(C) Conceptualizing and assessing hospital system risk metrics: Students will review scientific reports and emergency response articles and define metrics to assess hospital system risk with the project mentors. The students will carefully examine what metrics can be estimated with risk analysis and use their results to quantify them. For example, students can assess the spatial distribution of post-earthquake hospital capacity to investigate potential disparities in hospital accessibility after earthquakes.
(D) Visualizing hospital system risk: Students will generate visualizations to communicate their findings to stakeholders. An integral part of risk analysis is using results to inform policy, e.g., retrofitting hospitals. Thus, the students will carefully prepare visualizations to showcase their findings. Also, the students will draft recommendations for preparedness and risk mitigation in hospital systems, with guidance from their advisors.

Datasets

Disaster Risk Analysis requires hazard, vulnerability, and exposure data. Data for hazard analysis will be provided. Vulnerability and exposure data will be provided partially. Students will work on completing and curating datasets for vulnerability and exposure. Specifically, the following will be provided:

(A) Bay Area
- (A.1) Dataset with hospitals and their locations
- (A.2) Seismic hazard data
(B) Another city in a developing country
- (B.1) Dataset with hospitals, their locations, and their vulnerabilities. This dataset may contain proprietary information and be provided under an NDA.
- (B.2) Seismic hazard data

Competencies

Students will require a background in statistics and risk analysis. Otherwise, they are highly encouraged to enroll in the Disaster Risk Analysis and Urban Systems Resilience Class (CUSP-CX 8006). Please get in touch with Professor Ceferino for further inquiries about competencies.

Learning Outcomes & Deliverables

Learning Objectives
- Understand the data requirements for conducting regional disaster risk analysis
- Build skillsets for hands-on disaster risk quantification
- Develop risk communication and visualization skills relevant for policymaking in disaster risk management
Deliverables
- Documented datasets with hospital information to conduct disaster risk analysis
- Maps visualizing the risk study
- Presentations, including one with a wider audience from the World Bank
- Progress and final reports

FloodNet - Computer Vision for Urban Street Flood Detection

Project Sponsors

Maddalena Romano, Director, Data/Asset Management, NYC Department of Transportation
Charlie Mydlarz, Research Assistant Professor, NYU CUSP

Project Abstract

In NYC, sea level rise has led to a dramatic increase in flood risk, particularly in low-lying and coastal neighborhoods. Urban flood water can impede pedestrian and vehicle mobility, and also can contain a diverse array of contaminants, including fuels, raw sewage, and industrial/household chemicals. For this capstone project, the team will train, test and deploy computer vision (CV) and deep learning (DL) models for the detection of street flood events. Existing labelled datasets will be used for training. In addition, an unlabelled NYC street image dataset will be provided for labelling and training of a NYC centric model.

Project Details

Project Description & Overview

The FloodNet project is interested in evaluating whether a longitudinally deployed fleet of CV flood sensors can monitor urban flooding events in real-time. This data can improve resiliency by (1) allowing residents to identify navigable transportation routes and make informed decisions to avoid exposure to flood water contaminants, and (2) informing city agencies in targeting flood control improvements through data-driven decision making.

The Capstone team will train, test and deploy CV/DL models for the detection of street flood events. Existing labelled datasets will be used for training. In addition, an unlabelled NYC street image dataset will be provided. The labelling strategy of this dataset will be determined by the team. Unsupervised or weakly supervised DL approaches could also be explored.

The team will work through three stages:

General flood detection model: The team will train and test a model built using existing labelled datasets. (40%)
Literature review on privacy and ethical concerns: The team will complete a review on CV ethics/privacy concerns in urban sensing. (10%)
Data cleaning/labelling and training of NYC centric flood model: Data collected from NYC streets including flood and non-flood imagery will be cleaned and labelled, then used to build a NYC centric flood detection model. (50%)

Datasets

Existing labelled flood imagery datasets will be used in Stage 1 of the Capstone project. Stage 3 will involve the generation of a NYC centric dataset using existing unlabelled images collected from NYC streets in both flood and non-flood conditions.

Competencies

Machine learning
- Dimensionality reduction
- Supervised learning
- Semi-/Weak-supervised learning
- Computer vision experience
Good code and data management skills
Python SciPy stack and PyTorch DL library
Team technical experience
- Python programming (required for >=2 team members)
- Data processing pipelines
- Documentation
Data management experience
- Privacy and data
- Ethics

Learning Outcomes & Deliverables

The team will be using a broad range of urban analytics approaches that will result in proven abilities in: computer vision, remote sensing, data science, and machine learning.

The expected deliverables for each project stage are:

A model that operates with a given minimum performance level on the provided test data.
A literature review on the ethical and privacy concerns surrounding urban sensing and CV solutions.
The NYC centric model with performance levels exceeding a given threshold under varying real world conditions, including a new labelled urban flood dataset and associated tools for open sourcing on the data platform Zenodo.

All deliverables will be based around Jupyter notebooks and committed to a well documented public GitHub repository.

Hardening New York City’s Interdependent Water and Energy Infrastructures Against Climate Change and Cyberattacks

Project Sponsors

Charalampos Avraam, Smart Cities Postdoctoral Associate, NYU CUSP
Yury Dvorkin, Assistant Professor, NYU CUSP

Project Abstract

Extreme events stress New York City’s (NYC’s) interdependent water and energy infrastructures; impact human livelihood; and can disrupt local ecosystems. The dependence of water and wastewater operations on power implies that a blackout, coupled with backup system components’ failures, can force the discharge of untreated wastewater into NYC’s waterways, and result in a public health emergency. Data-driven and optimization techniques can leverage publicly available data to reveal vulnerabilities in electricity, water, and wastewater infrastructures. Our analysis can aid policy design against natural hazards and cyberattacks, and thus inform the modernization of interdependent urban water and electricity infrastructures.

Project Details

Project Description & Overview

This project aims to identify supply chain vulnerabilities of New York City’s physical water and wastewater infrastructure; understand water-energy interdependencies; and inform resilience policies against natural disasters. The project will:

Provide a water and wastewater management framework which integrates publicly available databases.
Assess vulnerabilities and inform water and energy infrastructure policy design in New York City against extreme events and potential cyberattacks.

Students will collect and process data either on water supply chains (i.e., from reservoirs to NYC water consumption), or on wastewater supply chains (i.e., from water consumption and drainage to wastewater treatment). Between February and May, students will integrate databases that include water consumption and technical features of reservoirs, tunnels, the drainage system, and wastewater treatment facilities. Between May and August, the students will use optimization, statistical methods, machine learning, or a combination of techniques to identify critical assets, interconnections, or spatial and temporal interdependencies with the electricity sector within the water and wastewater supply chains that are vulnerable to disruptions. This project will provide comprehensive databases of the different components of New York City’s water and wastewater infrastructures.

Datasets

NYC Open Data:
- New York City (NYC) aggregate water consumption
- NYC Aggregate Population
- NYC water consumption per capita
US Census Bureau:
- NYC population by zip code
NYC Open Data:
- Energy and Water Data on privately owned buildings over 25,000 ft2 and in City-owned buildings over 10,000 ft2
Open Sewer Atlas NYC:
- Wastewater treatment facilities & sewersheds
NYC Environmental Protection Agency:
- Map of reservoir capacities
United States Geological Survey (USGS):
- US county-level water use
- US county-level water and energy use data
- Reservoirs capacity
EIS document (New York City Department of Environmental Protection):
- Capacity of Tunnels 1, 2, & 3 (billion gallons per day)

Competencies

Knowledge of Python, R, Julia, Matlab, other data-processing languages, or Excel.
Familiarity with statistical, optimization, or machine-learning inference methodologies.
Familiarity with data-visualization tools in R, Matlab, Python, Julia, or other language.

Learning Outcomes & Deliverables

Through this project, the candidate(s) will:

Integrate and analyze large datasets to provide a comprehensive dataset which include interconnected operations within water supply or wastewater infrastructures.
Broaden their understanding of the economic, environmental, and health impacts of failures in modern urban water and wastewater infrastructures.
Identify spatial and temporal vulnerabilities of urban water infrastructures against potential natural hazards and cyber threats and provide policy recommendations.

Modernizing Organics “Collection” for Managing the City’s Municipal Solid Waste and Achieving Zero Waste Goals

Project Sponsor

Terri Matthews, Director, Town+Gown: NYC Urban Resource Recovery Working Group (NYC Department of Design and Construction)

Project Abstract

This capstone will develop a data visualization tool to illustrate the lifecycle costs and benefits of leveraging late 20th century technology to solve a 21st century problem—the need to achieve zero waste and reduce CO2 emissions– as compared to the current use of 19th century (and earlier) technology.

Project Details

Project Description & Overview

The City’s current policy for residential organic food waste diversion has been a voluntary composting initiative with Department of Sanitation (DSNY) pickup in certain neighborhoods. This policy relies on 19th century (and earlier) technology whereby people collect refuse for pick up and transportation by truck. The future policy debate is likely to be about whether to mandate residential organics diversion on a citywide basis with citywide DSNY pickup and transportation. The multi-year voluntary organics curbside collection program has not been cost effective, with approximately half of the City’s municipal solid waste (MSW) going to landfills. To reduce total program costs below current costs would require either a diversion rate of about 30% or a lower diversion rate with reducing processing costs, with associated increased operation costs, before seeing reductions in the distant future. All this would require a long-term concerted effort prioritizing organics diversion, possibly through fines, and changing residents’ behavior.

At no time has there been consideration of leveraging late 20th century technology, in the form of food waste disposers (FWDs), commonly known as “in sink” garbage disposals, to process organics as a means to reduce CO2 emission and achieve “zero waste” goals. Since October 1997, the City has permitted residential households to install FWDs. A study of the impact for FWDs in the City’s combined sewer areas, assuming FWDs would be installed at a rate of 1%/year, found de minimis increases in City sewer maintenance costs, water consumption, wastewater treatment and biosolids handling costs, water rates, and negative impact on surrounding waters, and costs savings from solid waste export reduction. As of 2008, it was estimated that less than 1 percent of NYC households had installed FWDs.

The benefits and costs of expanding FDWs at food service establishments, studied in 2008 , would apply to mandated expansion of residential FDWs as an alternative to mandated citywide organics diversion. Benefits would include efficiency with related cost savings at the residential sites; associated reductions in municipal truck trips, with labor cost savings, and localized reductions in truck traffic; and beneficial end re-use of food waste, at the city’s water pollution control plants, now rebranded as wastewater resource recovery facilities (WRRFs), with some increases in digester gas, which, with capital investment to modify the WRRFs, could be reused in WRRF boilers to provide heat for the treatment process (cogeneration), with elements of resulting biosolids available for other beneficial end uses, some with commercial applications, and associated revenues. Costs to be balanced against benefits include incremental water use increases; increased sewer maintenance costs and the potential for sewer backups, which until the City resolves its combined sewer overflow problem could result in increased discharges during heavy stormwater events into surrounding waterbodies; and increased capital investment at the WWRFs.

Datasets

This project will use publicly available Department of Buildings plumbing permit data for Downtown Brooklyn and Long Island City and associated publicly available DSNY and Department of Environmental Protection operations and capital cost data. Additional city-produced reports and studies from other cities that have mandated FDWs will be sources of additional data.

Competencies

All students should have proficient data analytic skills. An interest in zero waste and large system operations would be helpful.

Learning Outcomes & Deliverables

The deliverables will be an interactive visualization of a comparative lifecycle cost benefit analysis of the two types of organics diversion technology and a final report that provides the methodology and analyses used and findings.

As part of the research component, the students will gain a deep understanding of the City’s municipal solid waste problem and its operations under aspirations to reach zero waste goals.
This data manipulation and visualization will enable the students to use all data analytic skills learned to date and possibly require them to pick up other techniques required by the project.
If time permits and the students develop theories, students will also develop performance metrics and predictive models.

Virtual and Augmented Reality for Community Preparedness to Disasters

Project Sponsors

Qi Sun, Assistant Professor, NYU Immersive Computing Lab
Luis Ceferino, Assistant Professor, NYU Disaster Risk Analysis Lab (DRAL)
World Bank’s Global Program for Safer Schools (GPSS)

Project Abstract

This project will create physically realistic virtual and augmented reality (VR and AR) environments that represent extreme events such as wildfires, floods, landslides, winds, and earthquakes affecting our communities. The project will use these environments to show how resilient infrastructure and response preparedness in a disaster can significantly reduce the probability of physical and human losses. The virtual environment will then be deployed to VR/AR devices for egocentric and immersive viewing. The project will be led by the NYU Immersive Computing Lab, NYU Disaster Risk Analysis Lab, and the World Bank’s Global Program for Safer Schools (GPSS). The project’s main goal is to raise awareness and prepare our communities to respond to extreme events using immersive realities to enhance the effectiveness of drills, such as evacuating during floods.

Project Details

Project Description & Overview

The project will focus on building AR/VR environments for floods and earthquakes and assessing/surveying people’s responses to them. Students will select the infrastructure of interest (e.g., schools, hospitals) according to their interests and data availability. The project will have three parts:

Building an augmented reality environment for floods*: Students will create an augmented reality environment to represent floods with data from New York City. Using 3D building data, flood records, and projections for sea-level rise, the students will create urban flooding environments useful for testing community response and decision-making, e.g., during evacuations.
Building virtual reality environment for earthquakes*: Students will use existing data on building response to earthquakes to create a virtual reality environment with buildings shaking during an earthquake. Students will develop these environments for different earthquake magnitudes and shaking levels to assess vibration and overturning of non-structural elements in the building and damage in the structural elements.
Surveying human response in virtual reality environments: Students will test the effectiveness of these immersive environments by designing a survey to assess their experience. Results will be compared with interviews for people who witnessed the floods during Hurricane Ida in 2021.

*VR/AR environment interactivity will be explored.

Datasets

1) Floods
- 1.1) 3D building data for NYC
- 1.2) Flood records, maps, and projections
2) Earthquakes
- 2.1) Building shaking records during earthquakes

Competencies

Students will require a background in immersive reality, with experience in Unity. Otherwise, they are highly encouraged to enroll in the Urban Data Visualization Class. Please contact Professors Ceferino or Sun for further inquiries about competencies.

Learning Outcomes & Deliverables

Learning Objectives
- Build skillsets for virtual and augmented reality system engineering with a focus on floods, winds, or earthquakes
- Gain experience in performing subjective studies with broader applications such as UI/UX.
- Gain an understanding of quantitative and data-driven methods to analyze disaster consequences, such as deep neural network-based statistical models.
Deliverables
- Augmented and virtual reality environments for floods, winds, or earthquakes
- Summary of survey findings
- Final presentation including one with the World Bank’s larger audience
- Progress and final reports

Fairness and Inclusivity

City of Bogotá: Data Driven Door-to-Door Care

Project Sponsors

Diana Rodríguez Franco, Secretaria de la Mujer, Office of Women’s Affairs, City of Bogotá
Stefaan G. Verhulst, Co-Founder and Chief Research and Development Officer, The GovLab

Project Abstract

The Office of Women’s Affairs is looking to make its Care System innovative in its objectives and how it uses data. The Care System is an initiative to reach female caregivers living in dire conditions. It brings services directly to those who often cannot leave their homes because of their domestic workload. Primary caregivers receive certified skillset training, well-being activities, and become part of community-building networks with professional facilitation. Others receive care and services to develop their autonomy. Importantly, the initiative is delivered to those in need and provides evidence on the value of redistributing care for closing gender gaps and economic recovery. That’s where data comes in.

Project Details

Project Description & Overview

Women’s “time poverty” is a structural cause of gender inequality. In Bogota, the unpaid care burden falls disproportionately on women, reaching alarming proportions: 30% of Bogota’s female population are full-time unpaid caregivers. 90% of them are low-income and 33% lack time for self-care. In 2020, we launched Bogota´s Care System to address these challenges. Bogata would now like to expand its efforts by providing primary caregivers with certified skillset training, well-being activities, community, and other services to develop their autonomy.

For six months, Capstone students will use data-driven methods to understand program impact and identify new ways to increase traffic to facilities used for this work. Students will support the Office as it:

Hosts a consultation with beneficiaries in a mini-public format to understand which care issues and metrics of success it should prioritize. This consultation will be inspired by The GovLab’s Data Assembly and 100 Questions;
Launches a data collaborative with non-traditional data holders across the city (e.g. telecom operators) to reuse the data needed to subsequently measure these prioritized issues (such as traffic to care centers);
Applies the analysis in the form of real-world data-driven experiments to increase the uptake of services – leveraging a baseline and compare it post interventions. Experiments could include communications campaigns or changing how and where services are deployed.

This work will support Bogota’s Women and Gender Equality Policy. Students will be overseen by the Secretary of Women’s Affairs and a coordinating team.

Datasets

Students will have access to the survey and interview data that the Office collects to measure the average hours per month that women dedicate to unpaid care work, data on the gender gap in unpaid work between men and women, and the number of services the caregiver has accessed before entering into the program as well as the number of services the caregivers and people they care for have access to before the implementation of the program.

Following a mini-public with beneficiaries, students will also work with the Office as it collaborates with a private-sector data holder to access its data for assessment purposes. This data might include telecom data which could be used to map and understand how caregivers move through the city or some other proxy.

Competencies

Spanish language proficiency is a plus
Familiarity with issues facing women and caregivers
Community engagement skills
Data analytical skills
Experimental design competencies

Learning Outcomes & Deliverables

Students will learn how to conduct monitoring and evaluation work to ensure the success of the Care System;
Students will learn to apply alternative datasets for the purposes of public policy and discover the role that data collaboratives can play in matching the supply and demand of data;
Students will learn how to engage with citizens around the “questions that matter” in ways that support Bogota’s efforts to restructure itself around care economies and better reach caregivers in dire conditions.

Community Economic Recovery Tool

Project Sponsors

Palak Agarwal, Data Scientist, US Ignite
Praveen Ashok Kumar, Technical Program Manager, US Ignite

Project Abstract

The COVID-19 pandemic has only exposed the existing health, economic, and social challenges within the county, and have highlighted the need to be prepared for such events in the future. Governments of all levels are tasked with easing the burden that citizens and local businesses face. To address this challenge, we propose to create an “Economic Recovery” tool that offers real-time strategies to community leaders recovering from COVID-19 shocks and can be used post the recovery to identify other underrepresented communities.

Project Details

Project Description & Overview

Using the data we aim at producing four major tools:

Quarterly Unemployment Forecast would be created using a thorough exploration of time series, cross sectional, and deep learning models.
Economic Vulnerability Index would be used to highlight the significant economic inequality facing the city and constructed with an Principal Component Analysis.
Industry Health Index would allow leaders to understand the extent to which different industries have been impacted by the pandemic and previous economic shocks, using a Factor Analysis.
Economic Opportunity Zones would allow leaders to identify zones within the city where new businesses can be introduced to help with the economic value of the area.

Datasets

Federal: Data sources look into federal sources across cities/county/congressional district levels from all agencies to see how it trickles down into the society. Some of the sources include ACS Community Surveys, Bureau of Labor Statistics, LODES, OnTheMap, Small Area Income and Poverty Estimates (SAIPE) Program, SVI, Federal funding, etc.
State: Data sources look into the state’s open data portal to look for data with finer spatial granularity. Some of the sources include certificate of use, business licenses, zoning districts, public transportation systems, Redlining zones, etc.
Private: Data sources look into the private sources of data to add more detail into the model. The data includes real estate data, business data and user traffic patterns.

Competencies

Insightful thinking, geospatial knowledge and an interest to understand urban economic data sources.

Learning Outcomes & Deliverables

After the analysis is complete and available on the dashboard, city departments will be able to use it to identify any funding gaps. The dashboard allows the officials to overlay and understand multiple variables at once, while the data will be automatically updated monthly, giving local officials a chance to understand effects of policy changes within a neighborhoods or congressional. Apart from the city officials, we want the residents of the city to have access to the data as well. This will help them understand the employment patterns within different industries which can be very beneficial. The economic opportunity zone can be used to set up new small businesses and mom and pop shops to have maximum footfall and chance of succeeding.

Democratizing New York City’s Urban Development Processes

Project Sponsor

Dana Chermesh-Reshef, CEO and Founder, inCitu

Project Abstract

New York City’s current planning process is a jumble of information on the websites of various community boards in different boroughs. There is no unified source of truth that various stakeholders like developers, city planners or concerned citizens can access this information through. Through this project we’d like to create a unified dynamic map for New York City highlighting city planning projects in flight along with citizen comments and concerns on them.

Project Details

Project Description & Overview

The purpose of this project is to take the otherwise opaque city planning process during the public review phase and make it transparent and easily accessible to all. Through this project, we aim to build a publicly available, dynamic map that will showcase various planning projects in flight throughout New York City, as well as citizens’ concerns and comments on said projects. This will help to democratize the process of urban development and will be of use to various stakeholders like developers, city planners, citizens, and elected representatives. The current information on various planning projects is available via the websites of various community boards throughout New York and requires hours of searching to find relevant information. Through this process, citizens could access city planning information in other community boards and see how projects were greenlighted or citizen participation brought them to a halt. Developers could also look at our map and decide on how to alter their projects by searching for similar projects that might have faced obstacles and how to rectify them in their proposal.

Datasets

ZAP
Different Websites of NYC Community Boards Example: a project under review inside of the Land-use page at MN CB5’s website
TBD

Competencies

Project management, data visualization, data engineering, modeling

Learning Outcomes & Deliverables

A unified, analyzable dataset of ongoing planning projects across NYC.
A publicly available map listing out planning projects under review New York City (deliverable that can also be used in personal portfolios).
Better understanding of New York City’s complex city planning process.

Informing Policymakers On State Level Supplemental Security Income Support

Project Sponsors

Dani Hochfellner, Adjunct Assistant Professor, NYU CUSP
Mary Hamman, Professor of Economics, University of Wisconsin La Crosse

Project Abstract

Supplemental Security Income (SSI) is a federal social safety net program that provides cash payments to disabled persons and adults over age 65, who are very low income. Many states offer SSI payments, but the information about these supplements is contained in text-heavy historical reports. It is difficult to show how support varies from state to state and over time. The proposed project will involve text analysis of historical reports, construction of performance metrics, and designing a public facing visualization to convey clear and objective insights about the SSI program on a state level to inform policymakers and the public.

Project Details

Project Description & Overview

Policymakers, researchers, and the public lack information about state support for the population that receives Supplemental Security Income (SII). This lack of information hampers policy making, research on program efficacy, and public understanding and awareness of the program. The information needed to address this gap is provided in text-heavy historical reports with too much programmatic jargon and little time-series or state-to-state comparisons. This proposed project aims to gather the hidden information with the goal to provide an overview for policymakers to better understand the impact of the SSI program. Information will be extracted using natural language processing algorithms. The final deliverable of the project is a public facing dashboard/visualization that will help guide policymakers in future program decision making. This project is a collaboration across multiple academic institutions, thus the project will be fully remotely.

Datasets

Publicly available administrative data.

Competencies

Students should have the following competencies:

Python or R, some experiences with extracting text from documents
XML/html
Interest in public policy and social support systems, creating dashboards

Learning Outcomes & Deliverables

Students will learn how to perform text analysis on documents, extract content from document by using natural language processing algorithms. Students will also learn how to design intuitive visualizations. Student will learn how to present results for different audiences.

Measuring Geographic Distribution and Predictors of Variation of Over-Policing Across NYC

Project Sponsors

John Pamplin, Smart Cities Postdoctoral Fellow, NYU CUSP
Spruha Joshi, Postdoctoral Fellow, Center for Opioid Epidemiology and Policy (Dept. of Population Health, NYU Grossman School of Medicine)

Project Abstract

Mass incarceration is a well-recognized public health issue and driver of racial inequities. However, focusing on arrests and subsequent incarceration underestimates the totality of police-community member interactions, and risks obfuscating the full magnitude of disproportionate policing within a city. This project will use publicly available NYPD policing data on multiple endpoints of policing (e.g., arrests, desk appearance tickets, criminal summons) to construct a geographic visualization tool to assess the burden of policing across neighborhoods and time in NYC. Through data linkages with the American Community Survey, this tool will allow for the investigation of predictors of over policing within NYC.

Project Details

Project Description & Overview

The relationship between communities and police has received a lot of attention following multiple high-profile fatal police shootings of unarmed people of color. The disproportionate burden of police violence and the over-representation of Black and Brown people in correctional facilities has raised large questions regarding the unequal policing of communities. Though some investigators have begun to interrogate the impacts of over-policing, few include the broader set of police interactions not captured by arrests. Furthermore, current data structures severely hinder progress towards understanding variations in exposure to policing across time and place. Understanding how policing varies across communities is a critical first step in reducing the burden of over-policing among those disproportionately impacted. The goals of this project are to a) assess geographic and temporal variation of policing in NYC by constructing a geocoded mapping tool of policing interactions, and b) identify predictors of variation in policing burden by constructing predictive models using data from the American Community Survey.

Participating students will create a single, column-oriented database of publicly available NYPD policing data (from 2013 – 2021) on a range of policing end points (e.g., arrests, court summons, desk appearance tickets). Students will geocode the data and create an online mapping tool that will illustrate variations in policing burden across NYC. Finally, students will be asked to link the database to publicly available data from the American Community Survey and build predictive models to identify geographic and sociodemographic predictors of increased policing burden in New York City.

Datasets

The primary data for this project will be a composite of multiple publicly available NYPD policing datasets. NYPD historic arrests data contains over 5 million individual arrests with date and location data ranging from 2006 to 2020. Additional datasets provide analogous information on cannabis court summons, desk appearance tickets, criminal summons, and stop-question-frisk incidents. Metadata is available for each indicator at the precinct and quarter level for the first three quarters of 2021. Policing data will be linked to census data from the American Communities Survey, including but not limited to racial/ethnic composition, poverty, etc.

Competencies

The ideal student for this project would have strong data science and programming skills, specifically as it relates to geocoding data and building a dashboard and/or web tools (e.g., R shiny apps). Additionally, students should have basic analytic skills and understanding of predictive model building. Lastly, experience with data visualization and spatial analysis will be especially useful for building the map-based web tool.

Learning Outcomes & Deliverables

There are three expected deliverables that will result from this project.

Creation of mapping/visualization tools showing the geographic distribution of manifestations of NYPD police interactions (e.g., arrests, cannabis court summons, criminal summons, desk appearance tickets, stop-question-frisks) at varying levels of granularity in NYC (e.g., precinct, census block).
Creation of a column-oriented database to store and access complied NYPD policing data linked to data from the American Community Survey.
An analytic model identifying geographic and sociodemographic predictors of increased exposure to policing in NYC.

Repairing Dallas: Leveraging data to improve housing quality

Project Sponsors

Ashley Flores, Senior Director (Housing), Child Poverty Action Lab
Owen Wilson-Chavez, Senior Director (Analytics), Child Poverty Action Lab

Project Abstract

Substandard homes severely impact resident wellbeing: deficient housing quality is associated with asthma and respiratory illness, lead poisoning, accidental injury, anxiety and depression, and poor academic outcomes. Data on housing quality is limited to MSA-level estimates of housing adequacy and subjective assessments by the local appraisal district, so it’s difficult for housing advocates to understand where housing quality issues are most acute and how to direct resources for repair. The project purpose is twofold: (1) identify neighborhoods in Dallas where there is poor housing quality and (2) develop a sampling and surveying approach to collect granular data within high-repair neighborhoods.

Project Details

Project Description & Overview

Housing quality matters for the mental, emotional, and physical health of residents, but the 2019 American Housing Survey reports that 27,600 housing units in the Dallas-Fort Worth Arlington MSA are severely inadequate. Research indicates that housing quality issues are more severe for people of color, people living in poverty, single parents, and renters. Although there is great need within Dallas’ housing stock, we lack actionable data to elevate the issue of housing quality, better direct limited resources, and advocate for more resources to ensure Dallas residents have a healthy home. Through this Capstone project, we hope to leverage existing datasets to design a methodology for calculating housing quality at a smaller geographic unit in order to identify neighborhoods in Dallas where there is disproportionately poor housing quality that needs to be remedied. This could take the form of a housing quality index that contemplates various data sources and takes into account both renter- and owner-occupied units. In addition, the CUSP team will estimate the cost of housing repair needs in Dallas (see 2019 report from the Philadelphia Fed entitled, Measuring and Understanding Home Repair Costs, as an example using 2017 AHS data). Finally, the CUSP team will develop a sampling and surveying approach to collect more granular data within neighborhoods indicating a high need for repairs. This framework can then be deployed on-the-ground in Dallas in target neighborhoods to better understand specific needs and direct resources, like home repair programs, to units where they’re most needed.

Datasets

Datasets available for this project include Dallas Central Appraisal District property-level data, the Census Bureau’s bi-annual American Housing Survey, the American Community Survey (for relevant household data), CoStar multifamily data (e.g., property class, unit features — like A/C, and property age), and City of Dallas code violations. Other potential datasets include multifamily and single family rental inspections by the City of Dallas, units with failed inspections from the Dallas Housing Authority, and MLS property listing data. Potential datasets require additional steps for CPAL.

Competencies

Specific skills that would be useful for students to have include spatial analysis and regression, econometric modeling, hedonic and/or multilinear regression, and sampling design. Nice-to-have is some understanding of housing quality/adequacy and its impact on residents.

Learning Outcomes & Deliverables

Expected deliverables include:

A descriptive analysis of housing quality in the City of Dallas, e.g., through the creation of a housing quality index applied to the smallest geographic unit possible;
An estimate of what existing home repairs in the City of Dallas cost (see 2019 report from the Philadelphia Fed entitled, Measuring and Understanding Home Repair Costs, as an example using 2017 American Housing Survey data);
A sampling and surveying framework that can be used on-the-ground in neighborhoods to collect unit-level data on housing quality and repair needs. If travel to Dallas is permissible for fieldwork, students could visit to test the sampling and surveying framework in a neighborhood that indicates high need based on the housing quality index. If travel is not permissible, CPAL staff or volunteers will use the framework for data collection efforts.

Simulating Interactions with Visually Impaired

Project Sponsor

Maurizio Porfiri, Institute Professor, NYU Dynamical Systems Laboratory (DSL)

Project Abstract

Urban environments represent particularly dire challenges for the mobility of the visually impaired, who must travel complex routes in often crowded and noisy conditions with limited to no assistance. To help visually impaired regain their independence, they are offered orientation and mobility training (O&M). However, O&M training represent a risk to the visually impaired, as it exposes them to dangerous situations and falls. We seek to overcome this issue by simulating O&M training in virtual and augmented reality (VR/AR), in which trainers and trainees interact within a safe and controlled environment that simulates part of a city.

Project Details

Project Description & Overview

Visual impairments will become a preeminent public health issue, as more baby boomers turn 65 and older. To reduce the impact of these disabilities on mobility, visually impaired attend orientation and mobility training (O&M) sessions, in which they learn techniques to travel safely within their community. These techniques include how to use a white cane, walk in a straight line, or cross an intersection in urban environments. Clearly, O&M training exposes visually impaired to potentially serious harm, including accidental falls, and undesired contact with people and objects.

Our previous work demonstrated that a virtual/augmented reality (VR/AR) platform can help overcome these dangers. Trainees can learn and practice new O&M techniques in a completely safe environment, before translating them in the real world. However, our previous study focused on a single player platform, which did not allow virtual interactions between trainers and trainees.

In this Capstone Project, students will extend our previous work by implementing a VR/AR multiplayer platform in which two users (trainer and trainee) interact in a virtual environment. VR/AR will be exploited to simulate visual impairments in the trainee. Students will design an O&M training in a highly dangerous realistic environment, such as a busy intersection in NYC, and implement it in VR/AR. They will formulate and perform hypothesis-driven experiments with human subjects, toward investigating technology-mediated interaction in training sessions. Ultimately, we aim at demonstrating the potential of a multiplayer VR/AR platform to train visually impaired persons in O&M techniques in a controlled, safe environment.

Datasets

No datasets required.

Competencies

Programming (preferably Unity, C#, and Python)
Data analysis and visualization (using R, Python, or MATLAB)
Statistics

We are looking for highly motivated students with a passion to explore and learn new concepts and ideas that range between engineering and medical science. Students should also show a keen and strong interest in rehabilitation and human-computer interaction.

Learning Outcomes & Deliverables

Students will learn how to develop advanced VR/AR platforms for future experiments;
Students will learn to design experiments involving humans and their technology-mediated interaction;
Students will learn to formulate and test research hypotheses in a statistical framework.

Trustworthy AI for Human Machine Interface

Project Sponsors

S. Farokh Atashzar, Assistant Professor, Medical Robotics and Intelligent Interactive Technologies Lab (MERIIT @ NYU)
Jackie Libby, Smart Cities Postdoctoral Associate, NYU CUSP

Project Abstract

We exist in a world with advanced AI, yet there is a lack of AI for assisting disabled populations who cannot achieve basic manipulability functions. This project will be to develop trustworthy Machine Learning models to to address existing problems of human-machine interfaces.

Project Details

Project Description & Overview

This research project will try to answer whether: 1) AI models can be trained faster to exclude the need for high-performance computers; and 2) if calibration can be minimized for new users so that neuro-robots are easy to use and ubiquitous in less-developed regions.

For this, we will develop a new biosignal processing pipeline using artificial intelligence, specifically a shallow-hybrid neural network, which includes an engine for modeling long term and short term dynamical dependencies in the signal space. The model will be tested in comparison with exiting state-of-the-art algorithms that we have developed in the last years. For this, data will be used from large datasets available to us. There is a potential possibility of involvement in data collection, depending on the academic and research background.

Students are more than welcome to contact f.atashzar@nyu.edu with questions. See one of our recent efforts with applications in neurorobotics for more context.

Datasets

We will use available large data sets on high-density electromyography and will try to predict the intended gestures. The data set includes high volume of signals collected from the upper limb of ~50 human subjects.

Competencies

Academic/research background in machine learning and/or signal/data processing is encouraged.

Learning Outcomes & Deliverables

Signal Processing, Deep Learning, Human-Machine Interface

Understanding Public Opinion About the Police in New York City

Project Sponsor

Maurizio Porfiri, Institute Professor, NYU Dynamical Systems Laboratory (DSL)

Project Abstract

Defunding the police is a polarizing topic that is on the rise in the United States. Public opinion is generally divided due to controversial recent events that have involved law enforcement officers (LEOs). However, how we perceive or not violence around us likely contributes to our own assessment of LEOs’ necessity. In this project, we seek to carry out an analysis about the interplay between these two factors to study if violent incidents, whether from LEOs or criminals, shape the opinion of New York City (NYC) inhabitants.

Project Details

Project Description & Overview

Recently, public opinion has been inflamed by controversial police actions, in particular, by the use of excessive force to maintain public order. However, other factors are at play in defining public opinion to police. On one hand, violent episodes might have generated opposition to police. On the other hand, an increase in local crime could have fueled the demand for stricter law enforcement.

In this project, we seek to understand how local crime and incidents of police brutality contribute to shape public opinion. NYC is a great framework to investigate this relationship, due to its abundant data. To this end, we will undertake a massive data collection effort, by cataloguing tweets, which will allow us to track public opinion citywide using machine learning techniques for sentiment analysis. To obtain an accurate description of citywide violence, geolocated data on crimes will be collected from NYC Open Portal and NYPD databases. For the brutality episodes involving LEOs, we will build a dataset relying on Washington Post fatal police shootings and crowdsource databases.

We will test hypotheses that entail the driving forces behind public opinion: i) “Does the increase in crime rate lead to an increase in police supporters?”; and ii) “Does an increase in police brutality lead to higher support for defunding of police?”. We will apply parametric and non-parametric statistical tools to test our hypotheses and elucidate the emergence of spatio-temporal patterns. The results of the study will shed light on drivers of public-police relations and provide evidence to reform policy-making.

Datasets

Local Crime
- Geo-located local crimes from 2006 to 2020 will be obtained from the NYC Open Data Portal
- NYPD keeps records, on an hourly basis and on a street level of all the various crimes committed in NYC from January 2020 up to date
Police Brutality
- Washington Post records police killings from 2015 up to date with geolocation
- Official CAPstat registers payroll info, disciplinary summaries, and federal lawsuits from 2015 to 2018
- Mapping Police Violence Initiative from 2013 up to date
- Fatal Encounters Initiative from 2000 up to date
Tweets
- Tweets will be collected using the official Twitter API with the help of Python package Tweepy and through The Ohio State University software Hydrator
- Older tweets will also be collected through web scraping through Twint GitHub initiative.

The Sentiment Lexicon dictionary from the University of Pittsburgh and the dictionary of sentiment words from Bing Liu and collaborators (University of Illinois Chicago) will be used for the sentiment analysis.

Competencies

Statistics
Data extraction and web scraping (preferably Python or R)
Data analysis and visualization (using Python, R or Matlab)
Programming (preferably Python, R, or Matlab)

We are looking for highly motivated students with a passion to explore and learn new concepts and ideas, and with an interest in social media, sentiment analysis, and data science.

Learning Outcomes & Deliverables

Students will learn data collection and pre-processing methods and their importance.
Students will be trained in the scientific method approach and hypothesis testing, and they will learn data modeling tools for analysis purposes.
Students will learn to apply traditional tools of temporal analysis and information theory.

Health and Wellbeing

Challenges and Solutions for Walking with Assistive Wearable Robots in Urban Environments

Project Sponsor

Joo H. Kim, Associate Professor, NYU Applied Dynamics and Optimization Laboratory

Project Abstract

The goal of this Capstone project is to identify challenges for the locomotion of persons with lower-limb disabilities who use wearable robots (powered exoskeletons, prosthetics, etc.) for gait assistance. In particular, this capstone will focus on those related to urban environments, and possible solutions to overcome those obstacles. One of the reasons that wearable robots are not commonly used is the discrepancy between the challenging environment in real world as compared to well-controlled laboratory settings. In this capstone project, these under-explored aspects will be investigated.

Project Details

Project Description & Overview

Despite their recent advancement and growing demand, wearable robots (such as robotic exoskeletons and prostheses) developed for gait assistance for persons with lower-limb disabilities or older adults are still confined to laboratory settings. One of the main reasons for this is the challenging outdoor conditions and environmental hazards linked to outdoor falls, particularly in complex urban environments. Examples are poor snow/ice clearance and poorly maintained streets and sidewalks.

In this Capstone project, the factors that affect the usability and safety of gait-assistive wearable robots will be identified, and their possible solutions will be investigated. Broad range of aspects can be considered to identify the challenging factors, including but not limited to the perspectives of infrastructure, engineering, urban planning, and human factors. Any possible solutions can be suggested to enhance the reliability of assistive robots with respect to the urban hazards as related to stumbles, slips, and falls outdoors.

Datasets

The students may use any publicly available datasets as related to this problem.

Competencies

Graduate standing.

Learning Outcomes & Deliverables

Identify factors that adversely affect the safe use of gait-assistive wearable robots.
Possible solutions to those challenging factors.

Mapping Agricultural Production in NYC (M.A.P. NYC)

Project Sponsors

Wythe Marschall, Senior Research Project Manager, Food and Health, Invest NYC SDG (an initiative of the NYU Stern Center for Sustainable Business)
Alice Reznickova,
Ph.D., Industry Assistant Professor, NYU Tandon School of Engineering

Project Abstract

To support an expanded, more just, and self-sustaining urban agriculture sector, Mapping Agricultural Production in NYC (M.A.P. NYC) is using data science to conduct research into NYC’s current food production and distribution. Having created the M.A.P. NYC platform in 2021, the project team is seeking CUSP students to employ this tool to research gaps and opportunities in the local food landscape, analyzing links between urban agriculture and food security status, health outcomes, and land use. Specifically, we seek to set a baseline for agricultural production as a policy recommendation to the new Office of Urban Agriculture.

Project Details

Project Description & Overview

Mapping Agricultural Production in NYC—M.A.P. NYC—is a tool for farmers, gardeners, researchers, politicians, and food activists that shows all existing food production in the city, whether commercial, non-profit, community, or school-based. M.A.P. NYC displays key food production and distribution data, and each entry is editable by verified users approved by the corresponding farm or garden.

In 2022, we are seeking a CUSP team to use the M.A.P. NYC tool to conduct research into the urban agriculture sector and its links to food security and land use. This work entails a strong research component as well as a light platform-maintenance one, as we seek to extend the utility of the tool and continue to manage the influx of data from growers.

First, we are seeking a team of data science students to analyze the available data and establish the baseline of what foods are produced in NYC, and to inform reasonable estimates about the future growth of the urban agriculture sector over time. Second, we would ask the data team to identify specific urban farming opportunities and distribution gaps. That is, we seek to improve food security in NYC by laying a foundation for an expanded urban agriculture sector, and specifically by establishing how to support urban farming in low-income neighborhoods.

Successful CUSP work will result in a data-driven report on the present and future of the urban agriculture sector in NYC with special attention to policy recommendations and opportunities to meaningfully improve food security.

Datasets

The M.A.P. NYC tool may be extended by the new CUSP team. The current datasets that power the tool include:

Extant urban agriculture data
- GreenThumb gardens (NYC OpenData)
- GrowNYC website list
- NYCHA garden dataset (provided by NYC Parks)
- School gardens (provided by GrowNYC)
- Potential garden locations (Local Law 46, 2018, via NYC OpenData)
New urban agriculture data
- Survey of ~100 large (commercial) farms and gardens
- Community garden list curated by 2021 CUSP team
Food distribution data
- All retail food stores (NY State Open Data)
- SNAP retail stores (USDA Food and Nutrition Service)
- Food pantry locations (provided by City Harvest)
- DOHMH Farmers Markets (NYC OpenData)
Other data sources
- 2018 NYC PLUTO tax lot data (from DCP)
- Health and demographic data (from American Community Survey via Data2Go)
- NYC Subway stations (NYC OpenData)
- CDC Physical and Mental Health Data (CDC Open Data)

Competencies

The CUSP team should possess a command of relevant data science concepts and skills, including database management, statistics (including linear and nonlinear regression), and data analysis (Python).

Regarding the map tool, ideally, the 2022 team would also include a software engineer/developer with some full stack experience (Node back-end, Mongo database, React-hooks or Vue front-end) and some visualization experience (Mapbox GL, minimal D3). Spatial statistics (Turf) would be a plus. Basic web design skills, for maintenance of the tool, are also required, although the focus of this project is research.

Comfort with and interest in physical and human geography as well as visual design are important, although domain expertise in these areas, along with agriculture and the social sciences, are not required. The project managers will provide an introduction to research in food systems studies as well as facilitate contacts among farmers, gardeners, and other stakeholders as required by the CUSP team’s research plan.

The team should expect to attend weekly meetings by Zoom and collaborate on research documents using the Google Workspace suite of tools.

Learning Outcomes & Deliverables

The CUSP team will learn to collaborate across disciplines, bridging data science into food studies, business anthropology and sociology, critical geography, urban design, and user experience design. Large and dynamic datasets will inform the construction of a research methodology into the urban agriculture sector’s present character and possible future paths. The tool—a public, visual map—will serve as both a resource for research as well as an object to be improved by research.

In terms of social scientific research outcomes, the CUSP team will learn to develop and refine a clear research question regarding food and society, focusing on the role of urban agriculture. The CUSP team will develop a methodology for correlating urban agriculture data with data regarding other social phenomena (e.g., neighborhood-specific food security, neighborhood-specific health outcomes), as well as identifying and categorizing opportunities for novel urban farms and their likely impacts on health.

This research process will culminate in two deliverables: one, a report that summarizes research findings with an eye toward policy recommendation and also points to future research needs, including methods for addressing any gaps identified in the data. Two, the CUSP team will update the M.A.P. NYC tool as possible in response to their research findings (e.g., improving the visualization of gaps and opportunities and/or adding new layers of data on socioeconomic indicators).

Shape Estimation and Data-driven Intelligent Control of Soft Robotic Upper-limb Exoskeletons for In-home Telerehabilitation

Project Sponsors

Jackie Libby, Smart Cities Postdoctoral Associate, NYU CUSP
S. Farokh Atashzar, Assistant Professor, Medical Robotics and Intelligent Interactive Technologies Lab (MERIIT @ NYU)

Project Abstract

In our aging society, neuromuscular disorders like stroke are becoming more prevalent. With that comes an increasing need for labor-intensive physical therapy, which is prohibitively expensive for many patients in need, resulting in long-term paralysis from lack of appropriate care. Soft robotic exoskeletons can deliver safe, in-home, and quantifiable teletherapy for these patients. We are building a soft exoskeleton to control the hand, wrist, and elbow. We are fabricating, sensorizing, and controlling soft modular actuators. Shape estimation and control of soft robots is nontrivial. In this project, CUSP students will work with us to fabricate soft robots and train machine learning models for shape estimation and data-driven control.

Project Details

Project Description & Overview

In the MERIIT lab, we are building a soft robotic exoskeleton for telerehabilitation. It has over 15 degrees of freedom (DOF). Each DOF is controlled by a soft module, custom-made in our lab with 3D printing and casting, and pneumatically controlled.

Soft robotic actuators are continuum robots. The kinematics of rigid robots can be captured by simple encoders. The kinematics of soft robots must be modeled with many more parameters. These models are unknown, subject to uncertainties and unmodeled dynamics. Thus, the control of these complex systems is a challenging problem. For a rigid robot, the reference commands can be analytically calculated. With a soft robot, machine learning and data-driven modeling combined with analytical computation can be used to map the reference commands to the resulting shape.

Students will be given experimental data from our soft robotic modules. The data includes pressure inputs from the pump station and optical data from cameras. They will clean and process this data and then build machine learning algorithms. They will use computer vision to perform shape estimation. With self-supervision, the shape labels will be used to train a model. The model will take pneumatic pressure commands as input, and it will output the resulting shape of the soft module. Students will also get involved with soft robotic fabrication, controls, sensorization, processing of other biosignals in our lab, and the corresponding learning techniques.

Datasets

We have collected data on the kinematic behavior of a large amount of soft robotic actuators being custom-made in our lab. The data includes optical/vision sensing, force sensing, and pressure/voltage readings from pneumatic pumps. We have been collecting this data for many of our custom-fabricated actuators. These actuators differ in size, form, function, and material properties, resulting in very rich data. We also have simulated data from Finite Element Analysis models for our actuators, which can be fused with our real data for hybrid learning approaches. We continue to collect new data from other biosignals in our lab, including Electromyography (EMG), Mechanomyography (MMG), microphones, and more.

Competencies

Learning Outcomes & Deliverables

A learned model to estimate the kinematics for soft robotic continuum actuators.
Experience with the modeling, control and design of soft robots.
Experience with computer vision, deep learning, and self-supervised learning applied to robotic systems.
Experience with the fabrication of soft robots.

Modern Civil and Communications Infrastructures

Addressing Complexity of Urban Networks with Deep Learning

Project Sponsor

Stanislav Sobolevsky, Associate Professor of Practice, NYU CUSP, Urban Complexity Lab

Project Abstract

Over the recent years, Graph Neural Networks (GNNs) have become increasingly popular in supplementing traditional network analytic techniques. The capstone project will seek proof-of-concept applications on the GNNs and the Hierarchical GNNs in particular to diverse cases of urban network analytics ranging from urban mobility and transportation networks, social media analytics, social networks, urban infrastructure, environmental sensing and beyond.

Project Details

Project Description & Overview

A city is an interconnected complex system and requires network analysis to be understood. Over the recent years, Graph Neural Networks (GNNs) have become increasingly popular in supplementing traditional network analytic techniques. At the same time, many conventional approaches in network science efficiently utilize the hierarchical approaches to account for the hierarchical organization of the networks, and recent works emphasize their critical importance. Our lab is working on a novel model of the Hierarchical GNN, accounting for the hierarchical organization of the urban network and connecting the dots between the traditional network science approaches, vanilla Neural Network, and the GNN architectures. This Capstone project will seek proof-of-concept applications on the GNNs and the Hierarchical GNNs to diverse cases of urban network analytics ranging from urban mobility and transportation networks, social media analytics, social networks, urban infrastructure, environmental sensing and beyond. The practical applications may range from predictive modeling and detection of patterns, impacts, and emergent phenomena in urban mobility and social interactions, urban zoning and regional delineation, classification of urban actors and locations, detecting critical bottlenecks in urban infrastructure, data verification and extrapolation in sensing urban environment and/or quantifying population exposure to urban stressors.

Datasets

LEHD, NYC TLC and other taxies/FHV, CitiBike, public transit, Twitter, migration data, financial

Competencies

Network analysis, neural networks, pytorch or tensorflow, natural language processing and/or social media analytics experience is a plus (optional)
Background in urban transportation, planning, environmental sensing is a plus (optional)

Learning Outcomes & Deliverables

Learn how to train supervised and unsupervised graph neural network models;
Explore applicability of graph neural networks for urban network analysis;
Publication in multidisciplinary, urban or computer science venues.

Airport Departing Passenger Profile Curve at EWR Terminal B: Understanding passengers’ journey through PANYNJ airports

Project Sponsors

Rohun Iyer, Senior Data Scientist, Port Authority of New York & New Jersey
Ai Yamanaka, Data Analytics Program Manager, Port Authority of New York & New Jersey

Project Abstract

This project aims to create a prototype of a departing passenger profile curve to help EWR Terminal B proactively manage its terminal frontage, baggage/check-in, and TSA queues. Using data from various stages in a passenger’s journey at our airport terminal, we hope to estimate when and where passengers will be throughout their airport journey. The model should consider industry knowledge of passenger dwell times and other passenger preferences. Understanding how passengers interact with our terminal will allow EWR Terminal B management to highlight pain points in their journey and plan for future improvements in design or technology.

Project Details

Project Description & Overview

Air passenger travel behavior has become harder to predict following COVID-19. Having a clearer picture of when passengers arrive for departing flights, how long they wait for security inspections, and how they travel through the terminal will help airport operations team enact solutions (I.e. wayfinding, queue management, staff deployment, capital construction) to better improve customer experience at our airports. Prior to 2019, passenger show-up profiles, as well as general trends observed by industry experts, provided a reasonable model of how passengers flowed through the terminal. We hope this new prototype can help update these assumptions to reflect post-pandemic travel.

Using collected data from numerous points in the passenger journey through the airport (“from curb to gate”), we would like to build departing passenger profiles to create near-term and long-term passenger flow predictions and profiles. Our focus will be on EWR Terminal B (the only terminal that the Port Authority both manages and operates). We have insight into the use of terminal frontage, check-in counters, security checkpoints, and gates, but we have not been able to connect these disparate data sources to create an estimated passenger profile.

This prototype model and accompanying paper should answer the following questions:

When do passengers arrive at the airport? Why?
- How does seasonality or weather affect these times?
- Does this change depend on the type of passenger or where they are flying?
Where do passengers dwell in the airport? Why?
Can we predict pain points in the airport based on who is departing/arriving?

Datasets

We will be using various datasets available and regularly used by the data analytics team. These include, official detailed airline flight schedules, PA operational historical flight data, TSA throughput and wait times, CBP throughput and wait times, FHV (for-hire vehicles) frontage data, NYC TLC taxi data, Newark Terminal B baggage scanner data, PA Air Train hourly data, historical passenger survey data, MTA subway, and bus data, official publicly available government data from the DOT and FAA, official reported PA passenger counts, and an internal departing passenger TSA throughput predictive model output.

All the following datasets are available in our centralized data warehouse or accessible through various teams across the department:

Official airline flight schedules
- Arrival/Departure times
- Seat counts
PA operations historical data
- Delays, cancelations, taxi times
PA TSA Throughput and Wait Times
- Broken out by checkpoints
- Official throughput values also included
PA CBP Throughput and Wait Times
FHV Frontage Data
NYC TLC Taxi Data
EWR TB Baggage Scanners
PA AirTrain hourly usage
ACI Surveys
MTA Subway and Bus data
DOT O-D data
PA Official PAX counts
PA Departing Passenger Predictive Model

Competencies

GIS experience
- We would like a prototype map/simulation of what this could look like. We have shapefiles to provide.
Basic data analytics capabilities (in Python or R)

Learning Outcomes & Deliverables

Week/Month long simulation of passenger throughput at the terminal. Highlight hotspots of passengers at the airport through a day.
Analysis/paper into the different types of passengers and their interactions with our airport.
- This analysis should help us determine where and how we can improve operations at the airport.
Recommendations to applying similar methods to other terminals
Learning outcome: Machine learning experience in a business setting, creating assumptions and backing them up with evidence and building a platform to carry to the rest of the department

Audio-Visual Vehicle Localization for Urban Traffic Monitoring

Project Sponsor

Magdalena Fuentes, Assistant Professor, NYU Music and Audio Research Laboratory (MARL)

Project Abstract

Monitoring road traffic is key to ensuring user safety and smooth operation. Increasing traffic volumes impact the stress level of commuters and increase noise levels in communities, leading to health problems. Local authorities need reliable monitoring systems to create policies to help mitigate this. Ideally, automatic monitoring systems should be able not only to count vehicles but also to detect the type of vehicle (e.g. car, truck). In this project we aim to develop a system for classification of vehicles that delivers audio-visual data for the robust localization of vehicles in the wild.

Project Details

Project Description & Overview

This project investigates the use of audio-visual self-supervised deep learning models for the location of vehicles in urban settings, as a step towards building efficient urban mobility systems. Instead of using labelled data, self-supervised models learn by identifying intrinsic characteristics of the data, which they use to accomplish a given task. These models can be trained on unlabelled recordings and images of natural scenes, for which there is an abundance (e.g. YouTube videos), and they tend to outperform supervised models in practice.

This project consists of three stages:

A stage of data analysis from a well-curated dataset of audio-visual urban data to get familiar with the data and the problem;
Adaptation of a state-of-the-art self-supervised audio-visual model to work with this data;
Analysis, evaluation and visualization of results and document writing.

We will use data and code resources from previous work within our team. The goal of the project is to answer the questions: How well can we localize vehicles in urban settings with self-supervised models? Which conditions (e.g. poor lighting or noisy environments) affect the performance of these systems the most?

This project is a continuation of a previous Capstone project, which students can review for examples of the type of things this team will be doing.

Datasets

We will use a dataset of audiovisual road traffic monitoring from the MARL team.

Competencies

The students should be comfortable with Python and familiar with data analysis tools such as pandas or seaborn packages. Having a machine learning background is also desirable (basic classification models such as random forests and test/train splits for evaluation).

Learning Outcomes & Deliverables

To conduct such a project we need audio-visual annotated data to train and assess the performance of the system. We will use our team’s data for that, and a first deliverable would be an analysis of the dataset, its challenges and a definition of subsets of the data to address the problem at different levels of difficulty. Secondly, the students will get familiar and adapt a state-of-the-art self-supervised model (which code is publicly available) to work on our data. Such model was trained with large amounts of data and proved to work successfully in many cases, but has not been tested in urban data yet. So the second deliverable would be the code adaptation of this model to work with new data and a small technical report of the changes needed. Finally, the students will use the model to localize vehicles in urban settings and perform an ablation study on the impact of different conditions on the performance of the system. The final deliverable would be a report with a summary of the work carried out, and visualizations of the predictions that the model produced after its adaptation, and main conclusions along with the associated code used.

Behavior Modeling Using Multi-Modal Mobility Data

Project Sponsors

David K A Mordecai, Co-Advisor, RiskEcon® Lab for Decision Metrics; Visiting Scholar, NYU Courant
Samantha Kappagoda, Co-Advisor, RiskEcon® Lab for Decision Metrics; Visiting Scholar, NYU Courant

Project Abstract

Develop and demonstrate methods for analyzing patterns found in mobility data, such as the aggregate tracks of vehicle populations. An expanding collection of research indicates that information about the movements of populations (i.e., syntactic trajectories) provide informative patterns. This project will explore how multiple sets of data can be jointly analyzed.

Project Details

Project Description & Overview

1. The primary data corpus will be the traces for taxicabs in New York City.

2. Secondary sources of data include the following: (a) dates of movable holidays, e.g., Easter; (b) daily weather data e.g., temperature highs and lows, precipitation, and wind storm conditions (see examples [1] [2]).

3. Prospective ancillary (tertiary) data sources could further include arrivals and departures at one (or more) major transportation hub(s):

Cruise line and ferry terminals
Airport(s) (e.g., La Guardia Airport, JFK Airport, Newark Airport)
One or (more) train station(s) (e.g., Penn Station or Grand Central Station)
Sports/performance venues (e.g., Meadowlands, Yankee and Shea stadiums, Barclay Center, Prudential Center) and convention centers (e.g. Madison Square Garden, Javitz Center);
- Corresponding schedules for major sporting events, performances, and conventions or trade shows.

These data will be employed to investigate patterns across syntactic traces/trajectories in order to perform exploratory data analysis and unsupervised learning related to the following questions:

What is the correspondence between the volume of taxi departures from a particular transit hub and the arrival times of trains or planes?
What diurnal patterns are evident?
How does the level of activity vary over the week (e.g., weekdays, weekends, holidays)?
What are differences exhibited due to weather conditions?
What patterns are exhibited when comparing the trace data to the schedules of events? For example, is the destination of someone arriving by train more likely to be a sports/performance venue or a convention?
- Do differences correspond to train arrival terminals, holiday date(s), time(s) of arrival (diurnal/nocturnal), or weather conditions (e.g. temperature, precipitation)?
Are transit disruptions (e.g., flight cancellations, rail delays) detectable by analyzing the taxi activity?

Another adviser to this project is John Irvine, Department Manager for Civil Defense at MITRE Corporation, and affiliated with RiskEcon® Lab as a Senior Science Advisor-in-Residence with a PhD (Yale Mathematical Statistics), as well as Adjunct Professor appointments on the Health Faculty at Queensland University of Technology and the Institute for Glycomics at Griffith University. Previous to MITRE, he was the Chief Scientist for Data Analytics at The Charles Stark Draper Laboratory, Inc. With 40 years of professional experience, he has led numerous projects in remote sensing, and served on multiple boards and advisory panels, and is active in the research community with over 200 journal and conference publications.

*References:

Datasets

The primary data corpus will be the traces for taxicabs in New York City. Secondary sources of data include the following: (a) dates of movable holidays, e.g., Easter; (b) daily weather data e.g., temperature highs and lows, precipitation, and wind storm conditions (see examples [1] [2]).

Competencies

Reasonable proficiency with statistical applications in NumPy, SciPy and/or R
Basic familiarity with exploratory data analysis, clustering, anomaly detection, outlier analysis is helpful

Learning Outcomes & Deliverables

Problem-solving and experimental design skills with real-world application.
Domain-specific application of statistical reasoning and hypothesis testing.

Building Accessible City by Self-Supervised Visual Place Recognition

Project Sponsors

Chen Feng, Assistant Professor, NYU AI4CE Lab
Chao Chen, Ph.D. Candidate, NYU AI4CE Lab

Project Abstract

Visual Place Recognition (VPR), which aims to identify previously visited places during navigation. In the context of SLAM, VPR is a key component in relocalization when the tracking is lost. Existing learning-based visual place recognition methods are generally supervised and require extra sensors (GPS or wheel encoder) to provide ground truth location labels. Differently, we want to design a self-supervised method for visual place recognition which can smoothly recognize the visited locations in a single scene environment without any ground truth labels. The method should be able to handle a variety of input modalities, including point clouds and RGB pictures.

Project Details

Project Description & Overview

Visual place recognition (VPR), aiming to identify the re-visited places based on visual information, is a well-known challenging problem in computer vision and robotics communities because of visual appearance variation. VPR is crucial in autonomous navigation systems, and is closely related to the concepts of re-localization, loop closure detection, and image retrieval.

Most of the state-of-art methods are supervised, which requires geographical location information in the training dataset. For most indoor scenarios, geospatial information, like GPS, is not obtainable for supervised training. And if GPS can be retrieved, the need for visual place recognition will be less essential. This would reach a paradoxical situation.

The team will work through three stages:

General Self-Supervised model and Supervised Model as Baseline (40%)
- Supervised Model: The team will implement a Supervised method
- Self-Supervised model: The team will modify the current model to an Self-supervised model version purely dependent on the temporal information
Literature review on visual place recognition (10%)
- The team will complete a review on visual place recognition
Design and construct a suitable model for image and point cloud dataset (50%)
- Create a large scene Point Cloud dataset based on template given
- Create a habitatsim environment for Image dataset using a 360 degree RGBD camera
- Design a new suitable framework to fill the performance gap between supervised and self-supervised methods

Datasets

Image dataset from previous year: https://ai4ce.github.io/NYU-VPR/
Existing point cloud dataset can be used for initial testing. After the new model is developed and deployed, we should deploy the model on 2D real 360 degree point cloud dataset and 2D simulated habitatsim environment. We could also collect image dataset ourselves.

Competencies

Machine learning
- Contrastive learning
- Supervised learning
- Self/Weak-supervised learning
- Computer vision experience
- Feature learning
Python SciPy stack and PyTorch DL library
- SLAM
- Topology Mapping
- Self-supervised localization
Technical experience
- Python programming (required for >=2 team members)
- Machine learning
- Data processing pipelines
- Documentation
Dataset experience
- Create simulated dataset
- Play with real dataset and check how real dataset differs from simulated dataset

Learning Outcomes & Deliverables

The expected deliverables for each project stage are:

An self-supervised model that operates with a given minimum performance level on the provided test data.
A literature review on visual place recognition, contrastive learning, and feature learning representations.

All deliverables will be based around Jupyter notebooks and committed to a well documented public GitHub repository.

Data-Driven Agent-Based Modeling of Fake News Dynamics Over Online Social Networks

Project Sponsor

Quanyan Zhu, Associate Professor, NYU CUSP

Project Abstract

The spread of misinformation through social media has led to significant issues in sectors like public health and political discourse. We leverage Twitter data to create misinformation models using agent-based models. We aim to understand the spreading pattern of misinformation and the human response to it. This research will also create intervention mechanisms that will combat the spreading of fake news and its impact on the population.

Project Details

Project Description & Overview

The pervasiveness and accessibility of social media across vast networks of people have rendered it a prime target for spreading malicious misinformation. Historically, social bots have often been deployed targeting certain groups of people in a social network, often with a political agenda. The recent spread of the coronavirus pandemic has also suffered from the spread of harmful and misinformed health-related news. This project aims to use agent-based modeling to model the spread of misinformation in a social network, and couple the spread of misinformation with the spread of a real-world disease. The project aims to use Twitter data to create a heterogeneous network to replicate a real social network by accounting for varying node centrality and creating connections based on shared geographical, political, or general interest attributes. This research will deploy social bots that constantly spread misinformation in the network. We will study the vulnerabilities of the agents based on the agent’s political biases, trust with the agent sharing the misinformation, as well as previous experiences with getting misinformed. We aim to understand how the misinformed agents respond to health-related misinformation and predict the impact of fake news in the real world.

Datasets

There are many online Twitter datasets that can be used for this research. The students can also collect their own datasets from Twitter.

Competencies

The students should have fundamental programming skills and interest in system modeling and research.

Learning Outcomes & Deliverables

Create agent-based models based on datasets.
Analyze the pattern of spreading.
Create methods to combat fake news and its spreading.

Learning Efficient Multi-Agent Robotic Navigation for Exploration and Mapping

Project Sponsors

David K A Mordecai, Co-Advisor, RiskEcon® Lab for Decision Metrics; Visiting Scholar, NYU Courant
Samantha Kappagoda, Co-Advisor, RiskEcon® Lab for Decision Metrics; Visiting Scholar, NYU Courant
Giuseppe Loianno, Professor, NYU Agile Robotics and Perception Lab (ARPL@Tandon)

Project Abstract

This project involves formalizing, both theoretically and experimentally, distributed multi-robot (i.e. swarm) navigation and exploration problems leveraging Graph Neural Networks (GNNs) architectures within the context of reinforcement learning policies. This approach directly employs the graph structure to enable resolution of the multi-robot navigation problem, by leveraging a distributed representation functionality of modern GNN methods, and thereby potentially enabling prospective scalability for large swarms comprised of hundreds of agents. The approach is initially envisioned as a model-free implementation with the option to extend to a model-based (or hybrid) implementation for comparison, as well as scaling to a large number of agents (𝑛 >> 3).

Project Details

Project Description & Overview

Design and Simulation Setting:

Each drone has its own policy and decides its action independently.
The state-history (i.e., mapping coverage of the environment) and relative position of each drone comprising the swarm is signaled pairwise.
State and action spaces are discrete.
Initially a small number of drones employed (𝑛 ≤ 3) and then scaled (𝑛 > 3).
Initially a two-dimensional spatial environment (which could be expanded to three-dimensions).

This represents a further advancement by the extension and expansion of progress for a previous CUSP Capstone project, towards a publication which would incorporate results of both the proposed CUSP Capstone project and the previous CUSP Capstone project. The previous Capstone project in process only entails convergence of strategies for discrete stepwise navigation actions and coverage for adaptive motion planning by a pair of drones (𝑛 = 2), in order to optimize a pairwise objective function via reinforcement learning policies acting on tables.

One limitation to scale for this setting is that of redundant action increasing in proportion to the number of drones acting independently. This limitation to scale might be addressed by applying the aforementioned GNN policy network. In the GNN policy network, each node represents a drone, and each edge represents communication between drone pairs regarding state information (e.g., Kalman state transition matrix representation and/or syntactic traces as a transition vector representation summarizing path history) and corresponding relative position of each respective drone. Using such state information, the drone will be able to take optimized action to map the environment collaboratively. We can evaluate its performance by how efficiently the drones mapped the environment (e.g., time steps to reach 95% coverage). By employing a GNN, the aforementioned corresponding state-action spaces for each drone prospectively might be further augmented with structured and unstructured data streams, e.g., optical flow data (images collected by each drone) as supplemental state data and inertial measurement unit (IMU) data (drone acceleration) as supplemental action, in order to govern autonomous both agent-specific and aggregate drone swarm behavior in real-world environments.

Datasets

Synthetically generated data from simulation.

Competencies

Proficiency with coding (e.g. SciPy, NumPy, Julia).
Basic background or familiarity with robotic SLAM, adaptive state estimation/control, or state modeling (e.g., Kalman Filter) is beneficial.

Learning Outcomes & Deliverables

We expect to produce publications and source code related to this project

Low-Power Computer-Vision Counting Research

Project Sponsor

Paul Rothman, Director, Smart Cities + IoT Lab, NYC Mayor’s Office of the CTO

Project Abstract

Many City agencies are involved in the use, planning, and design of public space but good data on pedestrian flows can be hard to come by. Manual counts take coordination and planning as well as staff costs. Computer-vision (CV) counting technologies are being tested in the city now but it is already clear that the infrastructure requirements (tapping into electricity and mounting to light poles) will be a limiting factor in using this technology more broadly and particularly for shorter-term studies where the infrastructure investment is not worth the time and effort. A low-cost, battery-powered CV sensor can help fill the data gap and allow agencies to utilize privacy-protected automated counts in short-term deployments with minimal infrastructure requirements.

Project Details

Project Description & Overview

In recent years, many hardware manufacturers have created development boards that support low-power computer vision (LPCV) applications. In addition, there has also been a fair amount of research done within academia to create low-power models for LPCV. This proposal aims to take advantage of recent technology advances to develop a hardware device that can be battery operated and utilized by New York City agencies to count pedestrians as they move through public space in the city. As an added resource to the proposed R&D, partnering with a technology developer as a development partner is a possibility.

In terms of requirements, the device should aim to work in outdoor environments, run off a battery for 2-4 weeks (either standalone or with PV), connect to the cloud via LoRaWAN or cellular, and be able to detect at least one object type at a time, e.g. pedestrian or cyclist).

Datasets

This project is design to create new datasets.

Competencies

Hardware engineering, AI/ML/CV development, data visualization

Learning Outcomes & Deliverables

An understanding of how to achieve useful computer vision applications using low-power electronics and visualizing count data contextually in ways that make it relevant for the agency use case.

Rapid Detection of Power Outages with Time-Dependent Proximal Remote Sensing of City Lights

Project Sponsors

Gregory Dobler, Assistant Professor, Multi-city Urban Observatory Network, University of Delaware
Luis Ceferino, Assistant Professor, NYU CUSP

Project Abstract

In this capstone project we propose to use imaging data collected from CUSP’s “Urban Observatory” (UO) facility to build a data processing pipeline and application that can detect and geolocate power outages and restoration in near real-time. The UO’s historical imaging data set includes visible wavelength images of Manhattan at a cadence of roughly 10 seconds per image. An analysis of the lighting variability patterns in these images will be used to identify clusters of lights synchronously turning “off” in the images and with photogrammetric techniques, the geospatial location of those lights will be determined.

Project Details

Project Description & Overview

The “Urban Observatory” (UO) was first created at CUSP as a facility for studying cities as complex systems using proximal remote sensing (Dobler et al., 2021, Remote Sensing, 13, 8, p.1426). Operationally, the UO consists of imaging devices atop tall buildings located at a distance of 1-5 miles from a city that operate in “survey mode”, continuously acquiring images of the city skyline and transferring those images back to a central server for analysis. Typically, the cadence for image acquisition at visible wavelengths is one image per 10 seconds. Previously, we showed that an analysis of these images at night using signal processing, computer vision, and machine learning techniques yields diurnal patterns of lighting variability for city lights (Dobler et al., 2015, Information Systems, 54, pp.115-126). In this proposal, we seek to leverage this capability to develop a method for automatically detecting power outages in near real time by searching the historical UO imaging data set for collections of “off” transitions of individual light sources that are spatially clustered in the scene and that occur simultaneously, indicating a likely power outage. We will then monitor those sources for the return (via “on” transitions which may or may not be simultaneous) to power restoration. Further, we will use the analysis coupled with simulated outages due to extreme conditions, e.g., hurricanes, to build outage and restoration models that take environmental and infrastructure conditions into account as key input variables for a probabilistic classification of likely outages (Ceferino et al. 2021: https://engrxiv.org/pu5da/).

Datasets

The primary data sets that will be used are the historical CUSP visible wavelength imaging data set (consisting of millions of images at 10s cadence over months and years) that is available on the CUSP servers, publicly available topographic LiDAR for photogrammetric geolocation of outages, and publicly available weather information (winds and precipitation for rain and snow).

Competencies

Students should be familiar with Python and statistical analysis on numerical datasets. Expertise with NumPy and array-based operations is a plus, particularly as it might relate to signal and image processing. Familiarity with geospatial data and operations using GeoPandas is also a plus, as is experience with interactive visualizations (Plotly, Bokeh, etc.) or dashboard design (e.g., JupyterDash).

Learning Outcomes & Deliverables

Technical learning outcomes:
- Image processing and computer vision
- Dashboard design and construction
- Large scale data fusion methodologies
- Probabilistic modeling and risk analysis
Qualitative learning outcomes:
- Estimates of power distribution continuity
- Rapid alert systems for outages, even in the absence of monitoring
- Situational awareness and emergency response assessment

RealCity3D: A Large-Scale Georeferenced 3D Shape Dataset of Real-world Cities

Project Sponsors

Chen Feng, Assistant Professor, NYU AI4CE Lab
Wenyu Han, Ph.D. Candidate, NYU AI4CE Lab

Project Abstract

Existing 3D shape datasets in the research community are generally limited to objects or scenes at the home level. City-level shape datasets are rare due to the difficulty in data collection and processing. However, such datasets uniquely present a new type of 3D data with a high variance in geometric complexity and spatial layout styles, such as residential/historical/commercial buildings and skyscrapers. This work focuses on collecting such data, and proposes city generation as new tasks for data-driven content generation. In addition, a proposing new city-level generation models is also included in this project.

Project Details

Project Description & Overview

As an important arena for human activities, cities have been a focal point of research. Alongside the rapid advancement of image/video generation, data-driven 3D city generation has become more feasible and appealing because of 1) the increasing availability of city-level remote sensing, and 2) the intensification of data-driven methods in architecture and urban planning.

While deep generative models are successful for various data modalities, including language, audio, image, video, and even point clouds, the limited publicly available 3D real-world city datasets makes it difficult to apply deep generation methods towards city-level geometric generation.

The team will work through 3 stages:

General city-level generation model as baseline (60%)
- The team should try to retrain our baseline models on Realcity3D datasets and extend our AETree baseline to more blocks.
- The team should propose some city-level generation models (should include 2D and 3D data) and train on Realcity3D datasets.
Literature review (10%)
- The team will complete a review on data-driven generation methods and datasets for geometric generation.
Data collection and cleaning on other cities (30%)
- The team should follow the same way of collecting data as the author did to collect more data on other cities in order to increase the diversity of the Realcity3D dataset.

Datasets

Existing Realcity3D data of NYC and Zurich will be used for stage 1. We need to collect more data of other cities in stage 3.

Competencies

Machine learning
- Supervised learning
- Self-supervised learning
- Computer vision experience
- 2D and 3D data processing
- Good code and data management skills
- Python basic and PyTorch DL library
Technical experience
- Python programming (required for >=2 team members)
- Data processing pipelines
- Documentation
Data management experience
Privacy and data
Data collection and analysis
Ethics

Learning Outcomes & Deliverables

The team will be using a broad range of deep learning models that will result in proven abilities in: computer vision, content generation, urban planning, and machine learning.

The expected deliverables for each project stage are:

Retrain and extend our AEtree baselines and get reasonable results.
Propose some generation models for city-level geometric generation and get some baseline results.
Extend Realcity3D data to more city data.

All deliverables will be based around Jupyter notebooks and committed to a well documented public GitHub repository.

Study of Indoor Spaces Occupancy and Its Correlation with the Performance of HVAC System

Project Sponsors

Yurii Piadyk, Ph.D. Candidate, NY CUSP
Juan Pablo Bello, Professor and Director, NYU CUSP

Project Abstract

Buildings consume around 40% of total US energy use, while heating, ventilation, and air conditioning (HVAC) systems account for 74% of building energy consumption. Current HVAC systems often rely on a fixed schedule, which typically results in conditioning of indoor spaces unnecessarily, without knowing the actual flow of the users. In this project, we directly measure the occupancy of university indoor spaces using a distributed sensor network. We then investigate correlation between the performance of the HVAC system and actual occupancy of these spaces to provide insights into building use patterns for adaptive control strategies of the HVAC system.

Project Details

Project Description & Overview

Without knowing the exact occupancy, building HVAC control systems may set air flow rates for ventilation at a high percentage of maximum air flow rate unnecessarily. Overventilation results in significant energy use and discomfort for occupants.

We plan to use a Reconfigurable Environmental Intelligence Platform (REIP) and a set of 4-6 existing sensors with video and edge computing capabilities to directly measure the user occupancy in public areas of NYU CUSP facilities. For privacy reasons, no video data will be stored but only the outputs of live object (e.g person/user) detection. Students will need to extend the sensor capabilities to environmental (i.e. temperature & humidity) sensing by designing a simple hardware module based on Arduino microcontroller and implementing a corresponding REIP software block.

A total of a couple weeks of data collection (mid-project milestone) will then be used to correlate the occupancy of the spaces (strategically chosen using the floor plans) with the performance of the HVAC system (i.e. temperature & humidity at that time). Our findings about the building usage patterns could help reduce energy waste, carbon and environmental footprint of the building via suggested adjustments to the air conditioning regime. The project is also aiming to demonstrate feasibility of live detection of indoor spaces occupancy using REIP platform, that could be used in dynamically controllable HVAC systems for even better performance and energy efficiency.

Datasets

The data will be acquired by the students with an option of cross-checking with the data from the building HVAC system (upon availability).

Competencies

An ideal size of the team working on this project would be three students with a collective prior experience in at least one of:

Work with physical sensors (Arduino / C);
Python programming language;
Data analysis and visualization (e.g. Pandas / Matplotlib, R or MatLab).

Learning Outcomes & Deliverables

After completing the project, students will learn how to design physical sensors for measuring air temperature & humidity, and extend REIP by providing corresponding hardware and software components. Students will acquire weeks worth of user occupancy data and analyze it using Python programming language and common libraries, such as numpy, pandas, matplotlib, etc.

The Electric Commute: Envisioning 100% Electrified Mobility in New York City (TEC-NYC)

Project Sponsors

Robert Mieth, Postdoctoral Researcher, SEARCH Group, NYU Tandon
Yury Dvorkin, Assistant Professor, NYU CUSP

Project Abstract

Every day, almost two million persons enter and leave the central business district of Manhattan using light-duty vehicles such as cars, taxis, vans, or trucks. Currently, around 1% of these vehicles are electric. This project aims to quantify the ramifications of a 100% electric commute in New York City. We will create a model that translates NYC’s transportation needs into electric charging demand, including emerging mobility trends (e.g., electric scooters) and remote work patterns. Interactive visualizations produced by the model will allow citizens, urban planners, and politicians to analyze the impact of mobility electrification and their policy decisions.

Project Details

Project Description & Overview

TEC-NYC will offer unique insights on the practical challenges of a 100% electrified transportation sector in dense urban areas. This project focuses on New York City, but the methodology will be transferable. The project comprises three central milestones, each focusing on different skills of data processing.

First, we will create a comprehensive data set on the transportation needs between central Manhattan and the greater New York Metropolitan area. This data set will differentiate between the various modes of transportation and the commuted distance. Complementary, we gather data on NYC’s power system infrastructure and the technical specification individual electric transportation (e.g., e-vehicles, e-bikes, and e-scooters).

Second, we will combine the gathered data such that we can answer the following questions: If all transportation switches to e-mobility, what will be the charging demands in the city? How many commuters can switch between different modes of transportation? What would be the best combination of commuting modes for the needs of the commuters and the available infrastructure? This milestone requires data-analytics skills, including identifying cross-correlations and extrapolating data trends, e.g., to quantify the impact of remote working arrangements.

Finally, TEC-NYC will visualize the data in an interactive and engaging manner. The user will be able to change model parameters (e.g., how many commuters exchange their car for an e-bike) and observe the impact on the city’s power system. We will ask: Can you tune The Electric Commute to suit the city and its citizens?

Datasets

The data required for TEC-NYC is readily available through online resources and from our previous projects.

The transportation model will mainly be derived from the “Hub Bound Travel Data” published by the New York Metropolitan Transportation Council. Additional required data is publicly available from the NYC Open Data Platform, the NYC Department of Transportation and the Port Authority of New York and New Jersey. We will provide a detailed list of the relevant data sets and their sources.

Information on e-mobility technology and NYC’s power system infrastructure is available from our previous research. Data on distribution and transmission infrastructure will be “artificial but realistic” to accommodate security concerns related to the publication of real data on critical infrastructure. This data has been tested and used and will provide a realistic foundation for the planned analyses.

Competencies

Students should have fundamental knowledge on data analysis and processing, i.e., be at least comfortable with Excel and have some experience in a high-level programming language (Python or Julia are preferred). Further, students should be familiar with fundamental methods of data analyses such as regression models, correlation analyses and histograms.

Ideally, the students have advanced knowledge in Python, Julia, Matlab or similar data processing language, have experience in collaboration and version control tools such as Git, and are familiar with data-visualization packages such as Plotly or Bokeh.

Learning Outcomes & Deliverables

Each of the three milestones of TEC-NYC will address a central data analyses skill and will provide important insights. During the data collection, project members will learn how to access and handle large public data sets. They will learn the fundamentals of good data hygiene and database control. Students interested in power systems will gain additional insights through infrastructure data that is not publicly available. The initial data set will be the first deliverable.

For the second milestone, students will be required perform data anlysis tasks, e.g., suitable aggregation and de-aggregation, identifying trends and correlations. Depending on progress, more advanced data analytics methods, e.g., modelling traffic patterns using Markov decision processes are possible. The resulting second deliverable will be a numerical model that, at minimum, maps commuter numbers and modes of transportation, to electricity demands in the city.

Finally, the third milestone will strengthen the project member’s skills to design and implement an interactive data visualization tool. Depending on the student’s interests and previous experiences, such a tool can be created online of offline, with or without real-time calculation abilities. For this deliverable, we aim to focus on creating a visualization that not only makes a large data set accessible but is engaging to the user. We plan to achieve this by pursuing a carefully “gamified” approach, e.g., by asking provocative questions to the user or including exaggerated visuals if the user chooses infeasible input parameters.

V2X-Sim - Collaborative Perception for Self-Driving in Urban Scenes

Project Sponsors

Chen Feng, Assistant Professor, NYU AI4CE Lab
Yiming Li, Ph.D. Candidate, NYU AI4CE Lab

Project Abstract

Vehicle-to-everything (V2X), which refers to collaboration between a vehicle and any entity in its vicinity via communication, might significantly increase perception in self-driving systems. Due to a lack of publicly available V2X datasets, collaborative perception has not progressed as quickly as single-agent perception. For this capstone project, we present V2X-Sim, the first public synthetic collaborative perception dataset in urban driving scenarios. The team will train, test and deploy computer vision (CV) and deep learning (DL) models for collaborative perception on V2X-Sim dataset.

Project Details

Project Description & Overview

Vehicle-to-everything (V2X), which denotes the collaboration between a vehicle and other entities such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I), seeks to help self-driving vehicles see further, better and even see through occlusion, thereby fundamentally improving safety. According to the estimation of U.S. NHTSA, there would be a minimum of 13% reduction in traffic accidents if a V2V system were implemented, which means 439,000 fewer crashes every year.

The V2X-Sim project aims to provide lightweight collaborative perception technologies for use in urban driving scenarios. The main tasks include: (1) design high-performance and low-bandwidth multi-agent collaboration strategy in the high-dimensional feature space, (2) develop effective and efficient multimodal learning framework based on RGB image and LiDAR point cloud, and 3) improve the system robustness against communication latency and sensor noise.

The Capstone team will train, test and deploy CV/DL models for the collaborative perception tasks including multi-agent collaborative detection, tracking and segmentation. Existing V2X-Sim dataset will be used for training and evaluation. Data-driven multi-agent 3D scene understanding methods could also be explored.

The team will work through 3 stages:

High-dimensional feature-based collaboration model (30%)
- The team will train and test a model built using V2X-Sim dataset.
Multimodal learning framework (40%)
- The team will design a multimodal learning framework based on different sensory input.
Robustness investigation (30%)
- The team will test the robustness of the model against realistic noise, and make improvements.

Alternatively, the team may choose to build a real-world V2X dataset, which is to be discussed with the PI.

Datasets

Given that building a collaborative perception dataset in the real world can be costly and laborious, we had built a virtual dataset to advance collaborative perception research. Specifically, we employ SUMO, a micro-traffic simulation, to produce numerically-realistic traffic flow, and CARLA, a widely-used open-source simulator for autonomous driving research, to retrieve the sensor streams from multiple vehicles located at the same intersection. Besides, we mount sensors on the traffic lights to empower the roadside to perceive the environment, and the sensor streams of both the vehicles and the roadside infrastructure are synchronized to ensure smooth collaboration. In addition, multi-modality sensor streams of different entities are recorded to enable cross-modality perception. Meanwhile, diverse annotations including bounding boxes, vehicle trajectories, and pixel-wise as well as point-wise semantics labels are provided to facilitate various downstream tasks.

Alternatively, if the team chooses to build a real-world V2X dataset, then we will collect real-world visual data (mainly images).

Competencies

Technical experience
- Python programming (required for >=2 team members)
  - Good code and data management skills
- Machine learning
  - Python SciPy stack and PyTorch DL library
  - Dimensionality reduction
  - Federated learning
  - Multimodal learning
- Computer vision experience
- Data processing pipelines
Documentation
- Data management experience
- Privacy and data
- Ethics

Learning Outcomes & Deliverables

The team will be using a broad range of urban analytics approaches that will result in proven abilities in: computer vision, data science, and machine learning.

The expected deliverables for each project stage are:

A high-dimensional feature-based collaborative perception model trained on the provided V2X-Sim data.
A multimodal learning framework which supports both RGB image and LiDAR point cloud.
A report of the robustness investigation under different levels of realistic noise.

All deliverables will be committed to a well documented public GitHub repository.