Developers write software by combining independently written libraries and functions. Even though the individual functions in these libraries are optimized, the lack of end-to-end optimization can cause order-of-magnitude slowdowns in the whole workflow compared to a tuned implementation written in C. For example, even though TensorFlow uses highly tuned linear algebra functions for each of its operators, workflows that combine these operators can be 16× slower than hand-tuned code. Similarly, workflows that perform relational processing in Spark SQL or Pandas, numerical processing in NumPy, or a combination of these tasks spend much of their time in data movement across processing functions and could run up to 100× faster if optimized end to end.

Weld is an ongoing open source project from Stanford that accelerates data-intensive applications by as much as 100×. It does so by JIT-compiling parallel code and optimizing across functions, both within a single library and across different libraries, so developers can write modular code and still get close to bare-metal performance without incurring expensive data movement costs. Weld's compiler uses a new, explicitly parallel functional intermediate representation to capture the structure of data-parallel workloads such as SQL, machine learning, and graph analytics, and then optimizes across them using an adaptive optimizer that takes hardware characteristics into account.

We demonstrate how Weld can be integrated incrementally into these libraries by porting only the most impactful operators first, without breaking compatibility with the other operators in a library and without changing its API (so users do not need to change their application code). We also show how Weld speeds up existing workloads in these frameworks and enables speedups of two orders of magnitude in applications that combine them. The Weld library and Weld-enabled versions of the Pandas and NumPy libraries are available for download on PyPI.
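To give a flavor of the explicitly parallel intermediate representation, the fragment below is a sketch in the style of published Weld examples (details of the syntax may differ between Weld versions). It sums the squares of a vector of 32-bit integers in a single parallel pass by folding the vector into a `merger` builder that combines values with `+`:

```
|v: vec[i32]|
  result(for(v, merger[i32,+],
    |b, i, x| merge(b, x * x)))
```

Because whole pipelines are expressed as loops over builders like this, the compiler can fuse adjacent loops coming from different library functions into one traversal, eliminating the intermediate data structures that dominate the run time of unoptimized workflows.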
Weld is open source at https://www.weld.rs.
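The cross-function optimization described above rests on lazy evaluation: Weld-enabled library calls record the computation instead of executing it immediately, and the accumulated program is compiled and run only when a result is needed. The following pure-Python sketch (a hypothetical toy, not the real Weld API) illustrates the idea with two elementwise operations that are fused into a single pass over the data:

```python
# Toy illustration of lazy evaluation and loop fusion, the idea behind
# Weld's cross-function optimization. This is NOT the Weld API; it only
# mimics the concept in plain Python.

class Lazy:
    """A deferred elementwise computation over a list of numbers."""

    def __init__(self, fn):
        self.fn = fn  # element -> element

    def map(self, g):
        # Compose functions instead of materializing an intermediate
        # list, analogous to Weld fusing adjacent loops.
        f = self.fn
        return Lazy(lambda x: g(f(x)))

    def evaluate(self, data):
        # One fused pass over the input; Weld would JIT-compile this
        # loop to parallel native code instead of interpreting it.
        return [self.fn(x) for x in data]

# Two "independently written" operations, executed as a single traversal:
pipeline = Lazy(lambda x: x + 1).map(lambda x: x * x)
print(pipeline.evaluate([1, 2, 3]))  # [4, 9, 16]
```

In the eager style that ordinary libraries use, the `x + 1` step would allocate and write out a full intermediate list before the squaring step reads it back; the lazy, fused version touches each element once, which is where the data-movement savings come from.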