Research Information Scientist
The effective use of data has become essential for city management and policy development, for citizen engagement, and for academic research. Yet the sheer volume of urban data is overwhelming. It ranges from administrative records in city agencies and financial records in businesses to real-time data on citizen calls to 311, energy use, and particulate pollution. Data collection is outpacing the capacity of the urban policy and research community to make use of the data.
The CUSP NYU data facility has been established to support the empirical study of cities in conjunction with New York-based researchers, agencies, and citizens. It uses modern approaches to reduce the multiple technical, legal, bureaucratic, capacity, and cost barriers to access so that the full research and policy benefits can be realized. The facility has two goals: (i) to ensure that new and existing urban data are made available to, and used by, current and future members of the research community in a state-of-the-art facility, and (ii) to engage staff in government agencies and local citizens by enabling them to use the facility to address important urban problems.
The data facility is being built out in close collaboration with the NYU Division of Libraries. More information about the vision and services currently provided can be found at https://datahub.cusp.nyu.edu/.
The Research Information Scientist at CUSP
CUSP datasets include administrative data from city agencies, researcher analytical datasets, population datasets from the US Census Bureau, and large, streaming sensor, image, and other spatiotemporal datasets. Data management at this scope and scale is an exciting yet challenging area of research. CUSP seeks a research information scientist who has a strong background in the information sciences with sufficient technical expertise to harness new advances in the research field. A successful candidate will be capable of learning new skills and methods to readily adapt to changes in the way that research data centers manage large and complex datasets.
This is a full-time, annually renewable position. The research information scientist will be embedded at the CUSP data facility and will report to both the Associate Director of Data Resources and Data Strategy at CUSP and the NYU Division of Libraries.
The research information scientist will serve as an information specialist, programmer, and ETL engineer to support the full CUSP data life cycle, including data curation, data ingestion, data discovery, and researcher access. The research information scientist will be responsible for collecting, developing, collating, archiving, and communicating information about research datasets in the CUSP data facility. In that role, s/he will oversee the metadata management system and design and implement new features or services as needed, which requires strong programming and database skills. S/he will provide basic programming support to software engineers in order to adapt in-house data profiling and data discovery software. A successful candidate will also be able to develop basic ETL scripts for data ingest and researcher database development. This person will lead CUSP’s metadata knowledge management, which covers structural and domain information about data assets. In this role, s/he will communicate with domain experts on NYC and related open data, urban policy research data, and physical measurement data, creating a database to facilitate data discovery through new visualization tools that move the field beyond the standard laundry-list approach.
Responsibilities
- Create and update metadata standards for the data facility, covering tabular and non-tabular datasets (such as images, sound, and text), including geospatial data.
- Provide development support for and maintain an internal metadata management tool (currently CKAN); provide functional specifications and development support for internal data discovery tools.
- Work directly with dataset domain experts (generally, the data providers and CUSP researchers) to create a domain knowledge base about dataset quality and content; this includes how data was collected or derived, and known issues.
- Communicate with data facility users about all datasets housed in the facility, providing guidance for users to identify the appropriate data for research questions; this will include documenting user activity to feed into the metadata database.
- Serve as the primary point of contact for data facility users (students, faculty, agency staff, etc.) with data access and workspace requests; this includes communicating with users before they submit data access/workspace requests and routing those requests internally using an in-house workflow management system.
- Develop and run ETL scripts for tabular data.
- Work with the software developer and systems engineer to support development of complex ETL scripts for difficult and non-tabular datasets.
- Develop technical specs and provision existing ETL scripts for data of all types (tabular, time series, images, GIS, and streaming data) in order to create data marts for facility users.
- Manage and track data facility information security training sessions for all users and data stewards; this includes tracking data stewards' compliance with data facility best practices in data management, confidentiality, privacy, and governance.
Required qualifications
- M.S. in Library and Information Sciences or related field
- Bachelor’s degree in programming, information technology or a related field OR an equivalent combination of education/experience in technology and operations
- 2+ years of practical experience in research dataset curation
- 2+ years of programming experience with Python, Perl, Ruby or similar language
- 2+ years of experience managing data in XML and JSON
- 2+ years of experience with at least basic database development using Oracle, MySQL, MSSQL, or PostgreSQL
- Experience managing large datasets and creating databases (ETL) for social science research
- Working knowledge of metadata standards: technical metadata, descriptive metadata (Dublin Core, MODS, DDI, CSDGM), process metadata, and preservation metadata (PREMIS); this will require an ability to learn, implement, and crosswalk metadata standards
- 1-2 years of experience working with and communicating with domain scientists
- Expert written and verbal communication skills with both technical and nontechnical audiences
- Expertise in best practices in use, reuse, reproducibility, curation, and preservation of scientific data
- Excellent time management and project management skills
- Passionate about the value of responsible data management and reproducible data analysis for evidence-based policy; thrives in a fast-paced, entrepreneurial work environment
Preferred qualifications
- Experience using APIs to access and query complex datasets
- Experience developing APIs for dataset dissemination
- 3+ years of practical experience in research dataset curation
- 3+ years of programming experience with Python, Perl, Ruby or similar language
- 3+ years of experience managing data in XML and JSON
Interested applicants should email a cover letter, curriculum vitae, and the names and contact information of at least three references to firstname.lastname@example.org, listing the job title in the subject line.
The Center for Urban Science and Progress (CUSP) is an Equal Opportunity/Affirmative Action Employer, committed to building a culturally diverse educational environment. In keeping with this commitment, CUSP strongly encourages applications from women, people with disabilities, and members of minority groups. Individuals with disabilities seeking accommodations in the application process should contact the Office of Equal Opportunity at email@example.com.