# Node.js Performance Optimization
❔🔍🔬📐📍👓📐🔁
Bryce B. Baril -- [@brycebaril](https://twitter.com/brycebaril)
Note:
Just tweeted the slide deck location
## 🚀 Performance 🚀
Two primary performance concepts:
* Task Completion (User-experience)
* Throughput (Scale)
## 🏗 Task Completion 🏗
"Work is done in a timely manner."
Often includes (and is dominated by) asynchronous I/O in Node.js
`task_end - task_start`
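Here is a minimal sketch of measuring `task_end - task_start` around an async task. It assumes a Node version that ships `process.hrtime.bigint()` and the `timers/promises` module. `timeTask` is a hypothetical helper, and the 50ms `sleep` only stands in for real asynchronous I/O.

```js
// Sketch: measure task completion time (task_end - task_start) for an
// async task using Node's high-resolution clock.
const { setTimeout: sleep } = require('timers/promises')

async function timeTask(label, task) {
  const start = process.hrtime.bigint() // task_start, in nanoseconds
  try {
    return await task()
  } finally {
    const end = process.hrtime.bigint() // task_end
    const ms = Number(end - start) / 1e6
    console.log(`${label} took ${ms.toFixed(1)}ms`)
  }
}

// Usage: sleep(50, 'done') stands in for a ~50ms asynchronous I/O call
timeTask('fake-io', () => sleep(50, 'done')).then((result) => {
  console.log(result) // done
})
```

The `await` inside `try` matters: without it, `finally` would run as soon as the promise is created rather than when the task completes.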
## 🏭 Throughput 🏭
"Work isn't blocking other work."
Blocking work is usually JavaScript execution or Garbage Collection in Node.js
`requests-per-second`
## 🏭 Throughput 🏭
JavaScript execution is single-threaded and on-CPU tasks will be serialized.
**IMPORTANT**:
This can cascade into significant task completion delays.
Note:
A, B, and C are scheduled at the same time, and each takes 10ms:
A finishes after 10ms
B finishes after 20ms (its own 10ms plus A's 10ms)
C finishes after 30ms (its own 10ms plus A's and B's)
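The cascade above can be modeled and reproduced with a short sketch. `completionTimes` and `busyWait` are hypothetical helpers. The first sums durations in queue order, and the second spins the CPU to imitate 10ms of JavaScript work.

```js
// Model: CPU-bound tasks queued at the same moment run one after another on
// the single JavaScript thread, so each finishes only after every task
// ahead of it has finished.
function completionTimes(durationsMs) {
  const times = []
  let elapsed = 0
  for (const duration of durationsMs) {
    elapsed += duration
    times.push(elapsed)
  }
  return times
}

console.log(completionTimes([10, 10, 10])) // [ 10, 20, 30 ]

// Live version: spin the CPU for `ms` milliseconds, blocking the event loop
function busyWait(ms) {
  const end = Date.now() + ms
  while (Date.now() < end) {
    // spinning on purpose: nothing else can run on this thread meanwhile
  }
}

// Schedule A, B and C at the same moment; each does 10ms of CPU work
const start = Date.now()
for (const name of ['A', 'B', 'C']) {
  setTimeout(() => {
    busyWait(10)
    console.log(`${name} finished after ~${Date.now() - start}ms`)
  }, 0)
}
```

The live timings should land roughly at 10, 20, and 30ms plus timer overhead. Exact numbers vary by machine.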
## 🖼 Glitching .gif Images 🖼
### Case Study
Collaborative art project that did user-driven image glitching and manipulation. Core manipulation logic is in [glitcher](http://npm.im/glitcher).
## 📝 Workflow 📝
0. ❔ Is it fast enough?
1. 🔍 Identify the nature of the problem. (🏗 vs 🏭)
2. 🔬 Select tools based on the problem.
3. 📐 Measure.
4. 📍 Identify the location of the problem.
5. 👓 Make the slower parts faster.
6. 📐 Measure again.
7. 🔁 Go back to step 0.
## 🚨 WARNING 🚨
Do **NOT** skip straight to step 5. Randomly applied V8 compiler tricks are unlikely to have any significant impact.
Note:
Saving 20ms on a task that includes a 5 second database query won't provide a significant improvement.
## ❔ 0. Is it fast enough? ❔
The logs show that some images take well over a second to process with the `replaceBackground` manipulation.
## 🔍 1. Identify the nature of the problem 🔍
Based on the logs and the code, the bulk of the time is spent in the image manipulation itself. It doesn't make any network calls or do any other asynchronous work.
Note:
Feels like a cop-out... how to do this with tooling?
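One way to answer this with tooling is to compare CPU time with wall-clock time. A ratio near 1 suggests CPU-bound work (🏭), and a ratio near 0 suggests the task is mostly waiting (🏗). GNU `/usr/bin/time` reports this same ratio as `%CPU`, computed as (user + system) / elapsed. The sketch assumes Node's `process.cpuUsage()` and `process.hrtime.bigint()`. The names `measure`, `cpuRatio`, and `crunch` are hypothetical.

```js
const { setTimeout: sleep } = require('timers/promises')

// CPU time divided by wall-clock time over the same interval
function cpuRatio(cpuMicros, wallMicros) {
  return wallMicros === 0 ? 0 : cpuMicros / wallMicros
}

// Run `work` (sync or async) and return its CPU/wall ratio.
// Note: process.cpuUsage() counts the whole process, not only `work`.
async function measure(work) {
  const wallStart = process.hrtime.bigint()
  const cpuStart = process.cpuUsage()
  await work()
  const cpu = process.cpuUsage(cpuStart) // { user, system } in microseconds
  const wallMicros = Number(process.hrtime.bigint() - wallStart) / 1e3
  return cpuRatio(cpu.user + cpu.system, wallMicros)
}

// Hypothetical CPU-heavy task
function crunch() {
  let total = 0
  for (let i = 0; i < 5e7; i++) total += i % 7
  return total
}

async function main() {
  const cpuBound = await measure(crunch)
  const ioBound = await measure(() => sleep(200))
  console.log(`crunch: ~${Math.round(cpuBound * 100)}% CPU`) // expect near 100%
  console.log(`sleep:  ~${Math.round(ioBound * 100)}% CPU`) // expect near 0%
}

main()
```

Because `process.cpuUsage()` covers the whole process, anything else running concurrently (including GC threads) is counted too. Treat the ratio as a rough signal, not an exact attribution.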
## 🙋 What if it was async? 🙋
* 🛠 Tools
  * Logging
  * APM (NewRelic, Dynatrace, etc.)
* 💪 Solutions
  * Caching
  * Query optimization
  * etc.
(out of scope for this presentation)
Note:
These are usually infrequent and slow enough that you can always log them. This is essentially what APM vendors do for you automatically.
Caching: a recurring theme. Don't do things if you don't have to.
## 💻 Our test harness 💻
```js
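// Tolerance handed to replaceBackground
var THRESHOLD = 40
var image = /* read file, call readimage */
// Replacement function handed to replaceBackground:
// zeroes every byte of the frame buffer (black)
function fillBlack(frame) {
  return frame.fill(0)
}
replaceBackground(image.frames, fillBlack, THRESHOLD)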
/* call writegif on image, write to file */
```
Note:
`replaceBackground` calculates the median frame by comparing every pixel in every frame. Then, for each frame, it calculates the difference from that background frame.
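Here is a hypothetical sketch of the "difference from the background frame" step. It assumes each frame is a `Uint8Array` of RGBA bytes. It also assumes a pixel counts as background when every channel is within `threshold` of the background frame. This is an illustration only and does not reproduce glitcher's actual `replaceBackground` code.

```js
// Hypothetical sketch (not glitcher's code): flag each pixel of `frame` that
// is within `threshold` of `background` on every RGBA channel.
// Both arguments are Uint8Arrays of RGBA bytes with the same length.
function backgroundMask(frame, background, threshold) {
  const mask = []
  for (let p = 0; p < frame.length; p += 4) {
    let isBackground = true
    for (let c = 0; c < 4; c++) {
      if (Math.abs(frame[p + c] - background[p + c]) > threshold) {
        isBackground = false
        break
      }
    }
    mask.push(isBackground)
  }
  return mask
}

// Usage: two pixels; the first barely differs, the second is far off
const frame = Uint8Array.from([10, 10, 10, 255, 200, 0, 0, 255])
const background = Uint8Array.from([12, 9, 10, 255, 10, 10, 10, 255])
console.log(backgroundMask(frame, background, 40)) // [ true, false ]
```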
## ⏳ Before ⏳
![Hotline Bling](img/bling2.gif)
## ⌛ After ⌛
![Hotline Bling Manipulated](img/median_bling.gif)
## 🔬 2. Select tools 🔬
Most tools fall into one of three categories:
* How slow is it?
* Where is it slow?
* Why is it slow?
## 🛠 How slow is it? 🛠
* `/usr/bin/time`
* Benchmark tools (ab, siege, etc.)
## 🛠 Where is it slow? 🛠
* Kernel tools (perf, dtrace, etc.)
* V8 Instrumentation (v8-profiler, [NSolid](http://downloads.nodesource.com))
## 🛠 Why is it slow? 🛠
If the problem is slow JavaScript execution:
* Compiler tracing (IRHydra, `--prof`, `--trace-deopt`)
* Code review
## 📐 3. Measure 📐
With our selected poor-performing image, we'll use `/usr/bin/time` to measure a baseline:
```
$ \time node harness.js ~/Downloads/bling2.gif
8.67user 0.06system 0:08.71elapsed 100%CPU (0avgtext+0avgdata 181988maxresident)k
0inputs+752outputs (0major+41833minor)pagefaults 0swaps
```
## 🛠 Using `perf` 🛠
* The `perf` tool on Linux is a kernel-level CPU profiling tool
* Captures full stack, C++ & JS execution
* Followed [these instructions](https://gist.github.com/trevnorris/9616784) (not up to date!)
* Or check out the new tool [0x](http://npm.im/0x)
Note:
0x was not an option when this slide deck was created; I haven't tried it yet.
## 🔥 How to read a flamegraph 🔥
* X axis is % of total time
* Y axis is stack depth
* Look for:
* plateaus
* fat pyramids
## 🔥 Perf Flamegraph 🔥
![flamegraph from perf tool](img/median-flame.svg)
## 🛠 v8-profiler 🛠
* See [v8-profiler README](http://npm.im/v8-profiler)
* Uses V8-provided instrumentation hooks
* Profiles JS only
* View by loading into Chrome Dev Tools
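v8-profiler may not build on current Node.js releases. As a swapped-in alternative (not v8-profiler), Node's built-in `inspector` module can drive the same V8 CPU profiler. It writes a `.cpuprofile` file that Chrome DevTools can load. This is a minimal sketch. `profile` and `busyLoop` are hypothetical names.

```js
const inspector = require('inspector')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Run `work` under the V8 CPU profiler, then write the result as a
// .cpuprofile file that Chrome DevTools can load.
function profile(work, outPath, done) {
  const session = new inspector.Session()
  session.connect()
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      work()
      session.post('Profiler.stop', (err, result) => {
        if (!err) fs.writeFileSync(outPath, JSON.stringify(result.profile))
        session.disconnect()
        done(err, outPath)
      })
    })
  })
}

// Hypothetical CPU-heavy function to profile
function busyLoop() {
  let total = 0
  for (let i = 0; i < 1e7; i++) total += Math.sqrt(i)
  return total
}

const out = path.join(os.tmpdir(), 'busy.cpuprofile')
profile(busyLoop, out, (err, file) => {
  if (err) console.error(err)
  else console.log(`wrote ${file}; load it in Chrome DevTools`)
})
```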
## 📈 v8-profiler results 📈
(open assets/median.cpuprofile in Chrome DevTools)
![v8-profiler results](img/median-cdt.png)
Note:
We can see garbage collection in the v8-profiler output, which lets us eliminate it as a concern.
## 🙋 What if it was garbage collection? 🙋
* 🛠 Tools
  * heap snapshots
  * post-mortem analysis (mdb, lldb, etc.)
* 💪 Solutions
  * fix errant closures
  * hold less data/parts of objects vs whole objects
  * code/static analysis
  * etc.
(out of scope for this presentation)
## 📍 4. Identify the location 📍
## 💢 Focus on avg() 💢
The `avg()` function combines every frame to make a simulated background frame.
Both `perf` and `v8-profiler` indicate we're spending the bulk of the time there.
## 👓 5. Make the slower parts faster 👓
## 💩 Reasons for Poor Performance 💩
* Wrong tool for the job
* Doing unnecessary things
* Poor algorithm choice
* Not cooperating with the runtime/compiler
Note:
* Node isn't great at everything: for SSL termination, use nginx
* don't create functions in loops, avoid Promises
* decrease instruction count
* optimizing compiler makes assumptions, work with it, avoid invalidating assumptions
## 🆕 COOL NEW THINGS! 🆕
* 🆕 Transpile to ES7!
* 🆕 ES6!
* 🆕 ES5! (wait, what?)
These things can absolutely make your code easier
_**for you**_
to work with.
## 😭 However... 😭
Generally these:
* Add extra code that increases execution time
* Are not yet optimizable by V8
## ⏪ Welcome to ES3! ⏪
![Romancing The Jit](img/romancing_shoe.gif)
Note:
Even nice ES5 features such as `Array.forEach`, `Array.map`, etc. are slower
## 😅 Oh right, focus on `avg()` 😅
Focus your effort on high return-on-investment work. Don't sacrifice dev convenience by refactoring everything to speed up _fast enough_ code.
Note:
Some things can cause systemic slowdowns, which are hard to see when every function gets slower.
## ⚙ Optimizing Compilers (Simplified) ⚙
* JavaScript is extremely flexible.
* Most code doesn't use that flexibility.
* Observations -> Assumptions -> Optimize to pure assembly
* Assumption invalid -> Deoptimize -> Discard assembly
Note:
* It parses and analyzes your code as it executes, generating optimized assembly based on what it encounters, with guards against the cases it skipped.
* "trust but verify"
* If you deoptimize a function too many times it will give up on optimization.
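Here is a small sketch of the observe, assume, deoptimize cycle. Running it with `node --trace-opt --trace-deopt` should print V8's decisions about the hypothetical `add` function. The exact output varies between V8 versions, and so does whether optimization happens at all.

```js
// Run with: node --trace-opt --trace-deopt deopt.js
// `add` is a hypothetical example; trace output differs across V8 versions.
function add(a, b) {
  return a + b
}

// Warm-up: V8 only ever sees numbers here, so it may optimize `add`
// under the assumption that both arguments are numbers.
let sum = 0
for (let i = 0; i < 1e6; i++) {
  sum = add(sum, i) % 1000
}

// Invalidate the assumption: strings take a different `+` path, so a guard
// in the optimized code fails and V8 may deoptimize `add`.
const joined = add('glitch', 'er')
console.log(sum, joined)
```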
## ⚙ Optimizing Compilers ⚙
The optimizing compiler doesn't care if your code is *GOOD*
* Code that does things the wrong/suboptimal way can be "optimized"
* Code that does the wrong thing can be "optimized"
Note:
Even if it's highly optimized assembly code, the optimizing compiler can't save you from doing stupid things.
Unnecessary work always takes time
## 🚂 V8 Compiler Output 🚂
* Lots of tips out there on how to optimize your code for V8.
* Nothing beats V8 telling you what it didn't like.
* My favorite tool: [IRHydra2](http://mrale.ph/irhydra/2/)
* Follow the instructions on that page
* Load results into IRHydra2 (it's a web app)
## 🌡 IRHydra Results 🌡
(Load the two files in assets/irhydra into IRHydra to explore)
![IRHydra Result preview](img/median-hydra.png)
## 🔎 Analysis 🔎
* the `avg` function had an eager deoptimization
* ... but it was the inlined Buffer constructor in node core ...
* (So I fixed it in Node core...)
* otherwise nothing too interesting ¯\\\_(ツ)_/¯
## 📝 Code Analysis 📝
`avg()` calls `medianPixel()`
The algorithm to calculate `median` requires a sort
Note:
We don't see medianPixel in the flamegraph or v8-profiler output because it was inlined.
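To see why a median is expensive, here is a hypothetical sketch of a median background frame. It assumes frames are equal-length `Uint8Array`s. It takes the median per byte (channel) rather than per pixel. It is not glitcher's `medianPixel` code.

```js
// Hypothetical sketch (not glitcher's code): median background frame,
// computed per byte across equal-length Uint8Array frames.
function medianFrame(frames) {
  const length = frames[0].length
  const out = new Uint8Array(length)
  const values = new Array(frames.length)
  for (let i = 0; i < length; i++) {
    // Gather this byte from every frame...
    for (let f = 0; f < frames.length; f++) {
      values[f] = frames[f][i]
    }
    // ...then sort numerically to find the middle value (upper middle for
    // an even count). This sort runs once per byte, so the cost adds up.
    values.sort((a, b) => a - b)
    out[i] = values[values.length >> 1]
  }
  return out
}

// Usage: three 2-byte "frames"
const frames = [Uint8Array.from([1, 9]), Uint8Array.from([5, 2]), Uint8Array.from([3, 7])]
console.log(Array.from(medianFrame(frames))) // [ 3, 7 ]
```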
## ⚖ Some Math ⚖
Our image is 800 x 450 pixels with 51 frames
800 * 450 = 360,000 pixels per frame
So 360,000 `sortPixels` calls, each sorting 51 pixels. 😨
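This is a rough order-of-magnitude estimate. It assumes a comparison sort costs about n × log2(n) comparisons for n values, and a mean costs about n additions. It counts only the background calculation and ignores everything else the harness does, so it does not predict wall-clock time.

```js
// Back-of-the-envelope cost model (an estimate, not a measurement)
const width = 800
const height = 450
const frames = 51

const pixelsPerFrame = width * height
// Median: one sort of `frames` values per pixel, ~n * log2(n) comparisons each
const sortComparisons = pixelsPerFrame * frames * Math.log2(frames)
// Mean: one addition per value per pixel
const meanAdditions = pixelsPerFrame * frames

console.log(pixelsPerFrame) // 360000
console.log(Math.round(sortComparisons / 1e6) + 'M comparisons (median)') // 104M comparisons (median)
console.log(meanAdditions / 1e6 + 'M additions (mean)') // 18.36M additions (mean)
```

The ratio between the two is just log2(51), about 5.7, before counting the extra cost of each sort call itself.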
## 📊 Median Frame 📊
![Median Frame](img/median_frame.gif)
## 🔔 Mean 🔔
Calculating `mean` doesn't require a sort. Maybe it will work?
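Here is a hypothetical mean-based sketch. It assumes frames are equal-length `Uint8Array`s and works per byte. A running sum per byte replaces any sorting. This is an illustration, not glitcher's code.

```js
// Hypothetical sketch (not glitcher's code): mean background frame,
// computed per byte across equal-length Uint8Array frames. No sort needed.
function meanFrame(frames) {
  const length = frames[0].length
  const sums = new Float64Array(length)
  // Walk each frame front to back, accumulating a running sum per byte
  for (const frame of frames) {
    for (let i = 0; i < length; i++) {
      sums[i] += frame[i]
    }
  }
  const out = new Uint8Array(length)
  for (let i = 0; i < length; i++) {
    out[i] = Math.round(sums[i] / frames.length)
  }
  return out
}

// Usage: three 2-byte "frames"
const frames = [Uint8Array.from([1, 9]), Uint8Array.from([5, 2]), Uint8Array.from([3, 7])]
console.log(Array.from(meanFrame(frames))) // [ 3, 6 ]
```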
## 🔔 Mean Frame 🔔
![Mean Frame](img/mean_frame.gif)
## ⌛ After 2 ⌛
![Hotline Bling Manipulated (mean)](img/mean_bling.gif)
## 📐 6. Measure again 📐
```
$ \time node harness.js ~/Downloads/bling2.gif
3.40user 0.04system 0:03.43elapsed 100%CPU (0avgtext+0avgdata 182048maxresident)k
0inputs+704outputs (0major+44011minor)pagefaults 0swaps
```
8.67 seconds to 3.40 seconds!
## 🔥 Perf Flamegraph (Mean) 🔥
![flamegraph from perf tool using mean](img/mean-flame.svg)
## 📉 v8-profiler results 📉
![v8-profiler results using mean](img/mean-cdt.png)
## 🔁 7. Go back to step 0 🔁
* Fix `copy`: remove call to it?
* Reduce instruction count
* Even better algorithms?
* Optimize `inxsearch` function?
## 📠 Let's remove the call to Buffer::Copy 📠
```
$ \time node harness.js ~/Downloads/bling2.gif
2.21user 0.06system 0:02.28elapsed 99%CPU (0avgtext+0avgdata 183944maxresident)k
10584inputs+704outputs (0major+43860minor)pagefaults 0swaps
```
Another second saved!
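The slides don't show what replaced the `Buffer::Copy` call. This is only a general illustration of why avoiding a copy helps, not the author's change. `buf.copy()` moves bytes into separate memory. `buf.subarray()` returns a view that shares the original memory.

```js
// Copying bytes costs time proportional to their size; a view costs almost
// nothing because it shares the original memory.
const source = Buffer.from([1, 2, 3, 4])

// Copy: allocate a new buffer and copy bytes 0..1 into it
const copied = Buffer.alloc(2)
source.copy(copied, 0, 0, 2)

// View: no bytes are copied; `view` points into `source`'s memory
const view = source.subarray(0, 2)

source[0] = 99
console.log(copied[0]) // 1  (independent copy)
console.log(view[0]) // 99 (shares memory with source)
```

The trade-off is that a view sees every later change to the source and also makes changes to it. It is only safe when nothing mutates the shared bytes unexpectedly.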
## 💖 Check out NSolid! 💖
Capture flamegraphs of a production process with the click of a button!
Bryce B. Baril - [http://brycebaril.com](http://brycebaril.com)
Twitter: [@brycebaril](http://twitter.com/brycebaril)
NSolid: [Try NSolid!](http://downloads.nodesource.com)