![title slide](img/title_slide.png)
## 🚀 Performance 🚀 Two primary performance targets: * Task Completion (User-experience) * Throughput (Scale)
## 🏗 Task Completion 🏗 "Work is done in a timely manner." Often includes (and dominated by) to Asynchronous IO in Node.js `task_end - task_start`
## 🏭 Throughput 🏭 "Work isn't blocking other work." Blocking is usually Javascript execution or Garbage Collection in Node.js `requests-per-second`
## 🏭 Throughput 🏭 **IMPORTANT**: Blocking operations can cause cascading blockage resulting in significant task completion delays. Note: A,B,C scheduled at the same time, each will take 10ms A takes 10ms B takes 10ms plus A's 10ms so 20ms C takes 10ms plus A and B, so 30ms
## 🖼 Glitching .gif Images 🖼 ### Case Study * [readimage](http://npm.im/readimage) reads gif/png/jpg and converts to a common format * [glitcher](http://npm.im/glitcher) various image glitches and manpiulations * [writegif](http://npm.im/writegif) write that common format to an animated gif

📝 Workflow 📝

  1. ❔ Is it fast enough?
  2. 🔍 Identify the nature of the problem. (🏗 vs 🏭)
  3. 🔬 Select tools based on the problem.
  4. 📐 Measure.
  5. 📍 Identify the location of the problem.
  6. 👓 Make the slower parts faster.
  7. 📐 Measure again.
  8. 🔁 Go back to step 0.
## 🚨 WARNING 🚨 Do **NOT** skip straight to step 5. Randomly applied V8 compiler tricks are unlikely to have a significant impact. Note: Saving 20ms on a task that includes a 5 second database query won't provide a significant improvement.
## ❔ 0. Is it fast enough? ❔ The logs show some images take well over a second to manipulate with the manipulation `replaceBackground`, and this is a blocking operation.
## 🔍 1. Identify the nature of the problem 🔍 Based on logs and the code, I have identified the bulk of the time is the image manipulation. It's not doing any network calls or other asynchronous work.
## 🙋 What if it was async? 🙋 * 🛠 Tools * Logging * APM (NewRelic, Dynatrace, etc.) * 💪 Solutions * Caching * Query optimization * etc. (out of scope for this presentation) Note: These are usually infrequent and slow enough you can always log. This is essentially what APM vendors do for you automatically.
## 💻 Our test harness 💻 ```js var THRESHOLD = 40 var image = /* read file, call readimage */ function fillBlack(frame) { return frame.fill(0) } replaceBackground(image.frames, fillBlack, THRESHOLD) /* call writegif on image, write to file */ ``` Note: replaceBackground calculates the median frame by comparing every pixel in every frame, then for each frame calculates the difference from that background frame
## ⏳ Before ⏳ ![Hotline Bling](img/bling2.gif)
## ⌛ After ⌛ ![Hotline Bling Manipulated](img/median_bling.gif)
## 🔬 2. Select tools 🔬 Most tools fall into one of three categories: * How slow is it? * Where is the slowness at? * Why is it slow?
## 🛠 How slow is it? 🛠 * `/usr/bin/time` * Benchmark tools (ab, siege, etc.)
## 🛠 Where is the slowness at? 🛠 * Kernel tools (perf, dtrace, etc.) * V8 Instrumentation (v8-profiler, [NSolid](http://downloads.nodesource.com))
## 📐 3. Measure 📐 With our selected poor-performing image, we'll use `/usr/bin/time` to measure a baseline: $ \time node harness.js ~/Downloads/bling2.gif 8.67user 0.06system 0:08.71elapsed 100%CPU (0avgtext+0avgdata 181988maxresident)k 0inputs+752outputs (0major+41833minor)pagefaults 0swaps
## 📍 4. Identify the location 📍
## 🛠 Using `perf` 🛠 * The `perf` tool on Linux is a kernel-level CPU profiling tool * Captures full stack, C++ & JS execution * Follow [these instructions](https://gist.github.com/trevnorris/9616784) (They are not up-to-date but the steps are all there)
## 🔥 How to read a flamegraph 🔥 * X axis is % of total time * Y axis is stack depth * Look for: * plateaus * fat pyramids
## 🔥 Perf Flamegraph 🔥 [![flamegraph from perf tool](img/median-flame.svg)](img/median-flame.svg)
## 🛠 v8-profiler 🛠 * See [v8-profiler README](http://npm.im/v8-profiler) * Uses V8-provided instrumentation hooks * Profiles JS only * View by loading into Chrome Dev Tools
## 📈 v8-profiler results 📈 ![v8-profiler results](img/median-cdt.png)
## 💢 Focus on avg() 💢 The `avg()` function averages every frame to make a simulated background frame. Both `perf` and `v8-profiler` indicate we're spending the bulk of the time there.
## 🛠 Why is it slow? 🛠 If the problem is slow JavaScript execution: * Compiler tracing (IRHydra, --prof, --trace-deopt) * Code review * Static analysis?
## 🙋 What if it was garbage collection? 🙋 * 🛠 Tools * heap snapshots * post-mortem analysis (mdb, lldb, etc.) * 💪 Solutions * fix errant closures * hold less data/parts of objects vs whole objects * code/static analysis * etc. (out of scope for this presentation)
## 👓 5. Make the slower parts faster 👓
## 💩 Reasons for Poor Performance 💩 * Wrong tool for the job * Doing unnecessary things * Poor algorithm choice * Not cooperating with the runtime/compiler Note: * Node isn't great at everything: ssl termination, use nginx * don't create functions in loops, avoid Promises * decrease instruction count * optimizing compiler makes assumptions, work with it, avoid invalidating assumptions
## ⚙ Optimizing Compilers (Simplified) ⚙ * JavaScript is extremely flexible. Most code doesn't use that flexibility. * Optimizing compilers make assumptions on how you use JS and cut corners with optimized assembly code. * Deoptimization is when you invalidate the assumptions it made, forcing it to discard the assembly version. Note: * Parse and analyze your code as it is executed making optimized assembly based on what it encounters with guards against cases it skipped. * If you deoptimize a function too many times it will give up on optimization.
## 🚂 V8 Compiler Output 🚂 * Lots of tips out there on how to optimize your code for V8. * Nothing beats V8 telling you what it didn't like. * My favorite tool: [IRHydra2](http://mrale.ph/irhydra/2/) * Follow the instructions on that page * Load them into IRHydra2 (it's a web app)
## 🌡 IRHydra Results 🌡 (Load the two files in assets/irhydra into IRHydra to explore) ![IRHydra Result preview](img/median-hydra.png)
## 🔎 Analysis 🔎 * the `avg` function had an eager deoptimization * ... but it was the inlined Buffer constructor in node core ... * (So I filed a [PR against node](https://github.com/nodejs/node/pull/4158), released today in 5.2.0) * otherwise nothing too interesting ¯\\\_(ツ)_/¯
## 📝 Code Analysis 📝 `avg()` calls `medianPixel()` The algorithm to calculate `median` requires a sort Note: We don't see medianPixel in the flamegraph or v8-profiler output because it was inlined.
## ⚖ Some Math ⚖ Our image is 800 x 450 pixels with 51 frames 800 * 450 = 360000 pixels per frame So 360000 51 pixel `sortPixels` calls. 😨
## 📊 Median Frame 📊 ![Median Frame](img/median_frame.gif)
## 🔔 Mean 🔔 Calculating `mean` doesn't require a sort. Maybe it will work?
## 🔔 Mean Frame 🔔 ![Mean Frame](img/mean_frame.gif)
## ⌛ After 2 ⌛ ![Mean Frame](img/mean_bling.gif)
## 📐 6. Measure again 📐 $ \time node harness.js ~/Downloads/bling2.gif 3.40user 0.04system 0:03.43elapsed 100%CPU (0avgtext+0avgdata 182048maxresident)k 0inputs+704outputs (0major+44011minor)pagefaults 0swaps 8.67 seconds to 3.40 seconds!
## 🔥 Perf Flamegraph (Mean) 🔥 [![flamegraph from perf tool using mean](img/mean-flame.svg)](img/mean-flame.svg)
## 📉 v8-profiler results 📉 ![v8-profiler results using mean](img/mean-cdt.png)
## 🔁 7. Go back to step 0 🔁 * Fix `copy`: remove call to it? * Reduce instruction count * Even better algorithms? * Optimize `inxsearch` function?
## 📠 Let's remove the call to Buffer::Copy 📠 $ \time node harness.js ~/Downloads/bling2.gif 2.21user 0.06system 0:02.28elapsed 99%CPU (0avgtext+0avgdata 183944maxresident)k 10584inputs+704outputs (0major+43860minor)pagefaults 0swaps Another second saved!
## 💖 Check out NSolid! 💖 Capture flamegraphs of a production process with the click of a button! Bryce B. Baril - [http://brycebaril.com](http://brycebaril.com) Twitter: [@brycebaril](http://twitter.com/brycebaril) NSolid: [Try NSolid!](http://downloads.nodesource.com)