The task is extremely challenging. Traditional approaches based on constraint solving have only been able to scale to small programs. In this project, we will explore new directions using deep learning and deep reinforcement learning combined with other techniques, such as formal reasoning and probabilistic programming, for broader and more scalable program synthesis.
We will consider program synthesis from a broad spectrum of specification methods, including natural language descriptions, input-output examples, programming by demonstration, formal goal descriptions, novelty-based metrics, and other reward functions.
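To make the input-output-example setting concrete, the following is a minimal sketch of programming by examples via enumerative search. The three-operation DSL and the search depth are illustrative assumptions, not part of any specific synthesis system; real synthesizers prune this search with types, deduction, or learned guidance.

```python
# Enumerative programming-by-example over a tiny illustrative DSL.
from itertools import product

DSL = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg": lambda x: -x,
}

def synthesize(examples, max_len=3):
    """Return the shortest op sequence consistent with all (input, output) pairs."""
    for length in range(1, max_len + 1):
        for prog in product(DSL, repeat=length):
            def run(x, prog=prog):
                for op in prog:
                    x = DSL[op](x)
                return x
            if all(run(i) == o for i, o in examples):
                return prog
    return None

# Example: find a program mapping 1 -> 4 and 3 -> 8.
print(synthesize([(1, 4), (3, 8)]))  # -> ('inc', 'double')
```

Even this toy search is exponential in program length, which illustrates why constraint-based methods stall on small programs and why learned search guidance is attractive.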
We will consider a broad spectrum of application domains, including end-user programming, automatic synthesis of security protocols and secure systems including smart contracts, automatic synthesis of data analytics queries and pipelines, automated assistance for program analysis and verification, and automatic synthesis of distributed systems code.
In this process, besides enabling real-world applications of program synthesis, we hope to contribute to addressing core challenges in deep learning, including generalization, search, abstraction, and representation. We believe that fully solving the program synthesis problem is equivalent to achieving AGI (artificial general intelligence).
Generalization is a key challenge for deep learning systems. How do we know that a deep learning system, such as a neural program, a robot, or a self-driving car, will behave safely in a new environment and remain secure against attacks such as adversarial perturbation? How do we specify security properties for deep learning systems? How do we test and verify desired security properties for deep learning systems? Is it possible to provide provable guarantees?
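To illustrate the attack surface, here is a minimal sketch of a fast-gradient-sign-style adversarial perturbation against a hand-set logistic classifier. The weights, input, and epsilon are illustrative assumptions, not taken from any real model; the point is only that a small, structured perturbation can flip a confident prediction.

```python
# FGSM-style perturbation on a toy logistic model (pure Python, no frameworks).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(w, x, y, eps):
    """Perturb x by eps in the sign of the loss gradient (label y in {-1, +1})."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # d(loss)/dx_i = -y * (1 - sigmoid(margin)) * w_i for logistic loss
    coef = -y * (1.0 - sigmoid(margin))
    grad = [coef * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w = [2.0, -1.0]           # fixed toy classifier (assumed for illustration)
x = [0.5, 0.2]            # clean input, classified positive
x_adv = fgsm(w, x, y=+1, eps=0.6)
print(predict(w, x), predict(w, x_adv))  # confidence collapses below 0.5
```

Verifying robustness amounts to proving that no such perturbation within an epsilon-ball changes the prediction, which is the kind of property the verification questions above ask about.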
Security will be one of the biggest challenges in deploying artificial intelligence. Traditional program verification techniques such as symbolic reasoning are mainly effective for logical programs. To reason about the safety and security of artificial intelligence systems, we need to design and develop new techniques and approaches. We plan to take a multi-pronged approach, developing a deeper understanding of attacks, defenses, and methods for reasoning about the security of artificial intelligence systems.
We will explore new techniques and approaches, including differential privacy, to enable privacy-preserving machine learning and data analytics in the real world. We aim to design and develop a general framework that automatically analyzes and rewrites data analytics queries to ensure their results are differentially private. We plan to explore different approaches for differentially-private deep learning. Our goal is both to provide practical real-world solutions for privacy-preserving machine learning and data analytics and to deepen the theoretical understanding of this area.
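As a concrete instance of such a rewrite, the standard Laplace mechanism can make a count query differentially private. The sketch below is a minimal illustration; the dataset, predicate, and epsilon are illustrative assumptions, and a real framework would also track sensitivity and privacy budget across queries.

```python
# Laplace mechanism for an epsilon-DP count query (sensitivity 1).
import random

def dp_count(data, predicate, epsilon):
    """Count matching records plus Laplace(1/epsilon) noise."""
    true_count = sum(1 for row in data if predicate(row))
    # Difference of two Exponential(epsilon) draws is Laplace with scale 1/epsilon.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 38]  # toy dataset, assumed for illustration
print(dp_count(ages, lambda a: a >= 30, epsilon=0.5))
```

A query rewriter in the framework we envision would transform `COUNT(*) WHERE age >= 30` into a call of this shape, choosing the noise scale from the query's sensitivity and the remaining privacy budget.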
How can we create a truly trustworthy secure enclave? It will require an open-source design and implementation and decentralized trust in its lifecycle management. Although many TEEs have been proposed by both industry (e.g., Intel SGX) and academia (e.g., Sanctum), no full-stack implementation has been open-sourced for use.
Keystone is an open-source project for building trusted execution environments (TEEs) with secure hardware enclaves, based on the RISC-V architecture. Our goal is to build a secure and trustworthy open-source hardware enclave, accessible to everyone in industry and academia. Keystone introduces the customizable TEE, a new paradigm for building TEEs in which both platform providers and enclave developers customize their TEE to have a minimal trusted computing base (TCB) and to be highly optimized for each application's resource usage. This enables a wide range of use cases for Keystone enclaves, from embedded IoT applications to machine learning.
We aim to explore new approaches for automated data exploration and insight extraction, while leveraging limited guidance and feedback from human analysts. To achieve this, we will explore and combine techniques including deep learning, reinforcement learning, program synthesis, meta learning, probabilistic programming, and interpretable machine learning. Given a dataset, we will explore how to automate the different stages of the data science pipeline, including data wrangling, data cleaning, feature engineering and extraction, model building and architecture search, model criticism and revision, and results presentation and interpretation.
We will explore diverse application domains including computer security such as attack and anomaly detection and diagnosis, system monitoring and diagnosis, and trend analysis. Our long-term vision is to build real-world systems that automatically explore, analyze and learn from data in order to glean insights from data and facilitate decision making, while leveraging limited human guidance and feedback.
We plan to explore new techniques for reasoning about smart contracts and decentralized applications. We aim to design new techniques for automatic exploration of the design space of distributed consensus and decentralized systems. We also plan to explore new approaches for automatic synthesis of smart contracts. Our exploration will in particular leverage methods and new developments in machine learning and deep learning.