# Node.js Performance Optimization
❔🔍🔬📐📍👓📐🔁
Bryce B. Baril -- [@brycebaril](https://twitter.com/brycebaril)
Note:
Just tweeted the slide deck location
        
        
          ## 🚀 Performance 🚀
          Two primary performance concepts:
          * Task Completion (User-experience)
          * Throughput (Scale)
        
        
## 🏗 Task Completion 🏗
"Work is done in a timely manner."
Often includes (and is dominated by) Asynchronous IO in Node.js
`task_end - task_start`
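A minimal sketch of measuring `task_end - task_start` around an asynchronous task. `msBetween` and `timeTask` are hypothetical helper names, not part of any library; `process.hrtime.bigint()` is Node's built-in nanosecond clock, and the 50ms timer only stands in for real asynchronous IO.

```javascript
// Hypothetical helpers (not from the talk): wall-clock timing of one task.

// Convert two nanosecond BigInt timestamps into elapsed milliseconds.
function msBetween(taskStart, taskEnd) {
  return Number(taskEnd - taskStart) / 1e6
}

// Run a (possibly asynchronous) task and report how long it took to complete.
async function timeTask(task) {
  const taskStart = process.hrtime.bigint()
  const result = await task()
  const taskEnd = process.hrtime.bigint()
  return { result, elapsedMs: msBetween(taskStart, taskEnd) }
}

// Usage: a 50ms timer stands in for asynchronous IO.
timeTask(() => new Promise((resolve) => setTimeout(resolve, 50, 'done')))
  .then(({ result, elapsedMs }) => {
    console.log(result, 'took', elapsedMs.toFixed(1), 'ms')
  })
```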
        
        
## 🏭 Throughput 🏭
"Work isn't blocking other work."
Blocking work is usually JavaScript execution or Garbage Collection in Node.js
`requests-per-second`
        
        
          ## 🏭 Throughput 🏭
          JavaScript execution is single-threaded and on-CPU tasks will be serialized.
          **IMPORTANT**:
          This can cascade into significant task completion delays.
          Note:
          A,B,C scheduled at the same time, each will take 10ms
          A takes 10ms
          B takes 10ms plus A's 10ms so 20ms
          C takes 10ms plus A and B, so 30ms
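The A, B, C arithmetic above can be sketched as a running total. `completionTimes` is a hypothetical helper, and it assumes the tasks run back to back on one thread in the order they were queued.

```javascript
// Completion time of each task when all are scheduled at once on a single thread.
// durationsMs[i] is the on-CPU time of task i; tasks run in queue order.
function completionTimes(durationsMs) {
  let clock = 0
  return durationsMs.map((duration) => {
    clock += duration // each task waits for everything queued before it
    return clock
  })
}

console.log(completionTimes([10, 10, 10])) // [ 10, 20, 30 ]
```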
        
        
## 🖼 Glitching .gif Images 🖼
### Case Study
Collaborative art project that did user-driven image glitching and manipulation. Core manipulation logic is in [glitcher](http://npm.im/glitcher).
        
        
          ## 📝 Workflow 📝
          0. ❔ Is it fast enough?
          1. 🔍 Identify the nature of the problem. (🏗 vs 🏭)
          2. 🔬 Select tools based on the problem.
          3. 📐 Measure.
          4. 📍 Identify the location of the problem.
          5. 👓 Make the slower parts faster.
          6. 📐 Measure again.
          7. 🔁 Go back to step 0.
          
        
        
          ## 🚨 WARNING 🚨
          Do **NOT** skip straight to step 5. Randomly applied V8 compiler tricks are unlikely to have any significant impact.
          Note:
          Saving 20ms on a task that includes a 5 second database query won't provide a significant improvement.
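To put a number on the note above, here is the arithmetic as a small sketch. It assumes the whole task is the 5 second query plus the 20ms of JavaScript being eliminated; `percentSaved` is a hypothetical helper.

```javascript
// Share of total task time removed by an optimization, as a percentage.
function percentSaved(totalMs, savedMs) {
  return (savedMs / totalMs) * 100
}

// Assumed numbers: a 5000ms database query plus 20ms of JavaScript that we remove entirely.
const totalMs = 5000 + 20
console.log(percentSaved(totalMs, 20).toFixed(2) + '%') // 0.40%
```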
        
        
          ## ❔ 0. Is it fast enough? ❔
          The logs show some images take well over a second with the manipulation `replaceBackground`.
        
        
          ## 🔍 1. Identify the nature of the problem 🔍
          Based on logs and the code, the bulk of the time is the image manipulation. It's not doing any network calls or other asynchronous work.
          Note:
          Feels like a cop-out... how to do this with tooling?
        
        
          ## 🙋 What if it was async? 🙋
          * 🛠 Tools
            * Logging
            * APM (NewRelic, Dynatrace, etc.)
          * 💪 Solutions
            * Caching
            * Query optimization
            * etc.
          (out of scope for this presentation)
          Note:
            These are usually infrequent and slow enough that you can always log them. This is essentially what APM vendors do for you automatically.
            Caching: recurring theme--don't do things if you don't have to
        
        
          ## 💻 Our test harness 💻
          ```js
          var THRESHOLD = 40
          var image = /* read file, call readimage */
          function fillBlack(frame) {
            return frame.fill(0)
          }
          replaceBackground(image.frames, fillBlack, THRESHOLD)
          /* call writegif on image, write to file */
          ```
          Note:
          `replaceBackground` calculates the median frame by comparing every pixel in every frame. Then, for each frame, it calculates the difference from that background frame.
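A simplified sketch of the second phase described in the note, taking the background frame as given. This is not glitcher's implementation: frames here are plain arrays of grayscale values, the replacement is a single value rather than a function like `fillBlack`, and `replaceNearBackground` is a hypothetical name.

```javascript
// Simplified stand-in for the "difference from the background" phase.
// NOT glitcher's implementation: frames are plain arrays of grayscale values (0-255).
function replaceNearBackground(frames, background, replacement, threshold) {
  return frames.map((frame) =>
    frame.map((value, i) =>
      // Pixels within `threshold` of the background count as background and are replaced.
      Math.abs(value - background[i]) <= threshold ? replacement : value
    )
  )
}

const demoBackground = [11, 15, 13]
const demoFrames = [
  [10, 200, 12],
  [11, 15, 13],
  [12, 14, 250]
]
console.log(replaceNearBackground(demoFrames, demoBackground, 0, 40))
// [ [ 0, 200, 0 ], [ 0, 0, 0 ], [ 0, 0, 250 ] ]
```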
        
        
          ## ⏳ Before ⏳
          
        
        
          ## ⌛ After ⌛
          
        
        
          ## 🔬 2. Select tools 🔬
          Most tools fall into one of three categories:
          * How slow is it?
          * Where is the slowness at?
          * Why is it slow?
        
        
          ## 🛠 How slow is it? 🛠
          * `/usr/bin/time`
          * Benchmark tools (ab, siege, etc.)
        
        
          ## 🛠 Where is the slowness at? 🛠
          * Kernel tools (perf, dtrace, etc.)
          * V8 Instrumentation (v8-profiler, [NSolid](http://downloads.nodesource.com))
        
        
          ## 🛠 Why is it slow? 🛠
          If the problem is slow JavaScript execution:
          * Compiler tracing (IRHydra, `--prof`, `--trace-deopt`)
          * Code review
        
        
          ## 📐 3. Measure 📐
          With our selected poor-performing image, we'll use `/usr/bin/time` to measure a baseline:
    $ \time node harness.js ~/Downloads/bling2.gif
    8.67user 0.06system 0:08.71elapsed 100%CPU (0avgtext+0avgdata 181988maxresident)k
    0inputs+752outputs (0major+41833minor)pagefaults 0swaps
        
        
          ## 🛠 Using `perf` 🛠
          * The `perf` tool on Linux is a kernel-level CPU profiling tool
          * Captures full stack, C++ & JS execution
          * Followed [these instructions](https://gist.github.com/trevnorris/9616784) (not up to date!)
          * Or check out the new tool [0x](http://npm.im/0x)
          Note:
          0x was not an option when this slide deck was created; I haven't tried it yet.
        
        
## 🔥 How to read a flamegraph 🔥
* X axis is % of total time
* Y axis is stack depth
* Look for:
  * plateaus
  * fat pyramids
        
        
## 🔥 Perf Flamegraph 🔥
        
        
## 🛠 v8-profiler 🛠
* See [v8-profiler README](http://npm.im/v8-profiler)
* Uses V8-provided instrumentation hooks
* Profiles JS only
* View by loading into Chrome Dev Tools
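As an alternative sketch, a JS-only CPU profile can also be captured with Node's built-in `inspector` module. This is a swapped-in technique, not the v8-profiler package used in the talk; `captureCpuProfile` and `busyWork` are hypothetical names, and the workload only stands in for the image manipulation.

```javascript
// Assumption: uses Node's built-in `inspector` module, not the v8-profiler package.
const inspector = require('inspector')

// Profile a synchronous function and hand the resulting CPU profile to `callback`.
function captureCpuProfile(work, callback) {
  const session = new inspector.Session()
  session.connect()
  session.post('Profiler.enable', (enableErr) => {
    if (enableErr) return callback(enableErr)
    session.post('Profiler.start', (startErr) => {
      if (startErr) return callback(startErr)
      work() // the code under measurement
      session.post('Profiler.stop', (stopErr, result) => {
        session.disconnect()
        callback(stopErr, stopErr ? null : result.profile)
      })
    })
  })
}

// Hypothetical CPU-bound workload standing in for the image manipulation.
function busyWork() {
  let total = 0
  for (let i = 0; i < 1e6; i++) total += Math.sqrt(i)
  return total
}

captureCpuProfile(busyWork, (err, profile) => {
  if (err) throw err
  console.log('profile has', profile.nodes.length, 'call tree nodes')
})
```

To inspect the result, write `JSON.stringify(profile)` to a file with a `.cpuprofile` extension and load it into Chrome DevTools.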
        
        
## 📈 v8-profiler results 📈
(open assets/median.cpuprofile in Chrome DevTools)

Note:
Garbage collection shows up in v8-profiler output, and it isn't significant here, so we can eliminate that concern.
        
        
        ## 🙋 What if it was garbage collection? 🙋
        * 🛠 Tools
          * heap snapshots
          * post-mortem analysis (mdb, lldb, etc.)
        * 💪 Solutions
          * fix errant closures
          * hold less data/parts of objects vs whole objects
          * code/static analysis
          * etc.
        (out of scope for this presentation)
        
        
          ## 📍 4. Identify the location 📍
        
        
          ## 💢 Focus on avg() 💢
          The `avg()` function averages every frame to make a simulated background frame.
          Both `perf` and `v8-profiler` indicate we're spending the bulk of the time there.
        
        
          ## 👓 5. Make the slower parts faster 👓
        
        
## 💩 Reasons for Poor Performance 💩
* Wrong tool for the job
* Doing unnecessary things
* Poor algorithm choice
* Not cooperating with the runtime/compiler
Note:
* Node isn't great at everything: for SSL termination, use nginx
* Don't create functions in loops; avoid Promises
* Decrease instruction count
* The optimizing compiler makes assumptions; work with it and avoid invalidating them
        
        
## 🆕 COOL NEW THINGS! 🆕
* 🆕 Transpile to ES7!
* 🆕 ES6!
* 🆕 ES5! (wait, what?)
These things can absolutely make your code easier
_**for you**_
to work with.
        
        
## 😭 However... 😭
Generally these:
* Add additional code that will add to execution time
* Are not yet optimizable by V8
        
        
## ⏪ Welcome to ES3! ⏪

Note:
Even nice ES5 features such as `Array.forEach`, `Array.map`, etc. are slower
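To make the comparison concrete, here are two equivalent ways to sum an array. The function names are hypothetical, and the speed difference described in the note reflects the V8 of the time; whether it still holds depends on the engine version, so measure before rewriting anything.

```javascript
// ES5 style: one callback invocation per element.
function sumForEach(values) {
  var total = 0
  values.forEach(function (value) {
    total += value
  })
  return total
}

// ES3 style: a plain indexed loop, no per-element function call.
function sumLoop(values) {
  var total = 0
  for (var i = 0; i < values.length; i++) {
    total += values[i]
  }
  return total
}

var sample = [1, 2, 3, 4]
console.log(sumForEach(sample), sumLoop(sample)) // 10 10
```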
        
        
## 😅 Oh right, focus on `avg()` 😅
Focus your effort on high return-on-investment. Don't sacrifice dev convenience by refactoring everything to speed up _fast enough_ code.
Note:
Some things can cause systemic slowdowns, which are hard to see when every function gets slower.
        
        
## ⚙ Optimizing Compilers (Simplified) ⚙
* JavaScript is extremely flexible.
* Most code doesn't use that flexibility.
* Observations -> Assumptions -> Optimize to pure assembly
* Assumption invalid -> Deoptimize -> Discard assembly
Note:
* The compiler parses and analyzes your code as it executes, generating optimized assembly based on what it encounters, with guards against the cases it skipped.
* "trust but verify"
* If you deoptimize a function too many times it will give up on optimization.
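A minimal sketch of the observe, assume, deoptimize cycle. `--trace-opt` and `--trace-deopt` are real V8 flags, but the exact log output (and whether `add` gets inlined into the loop first) varies by V8 version, so treat the comments as what typically happens rather than a guarantee.

```javascript
// Run with: node --trace-opt --trace-deopt deopt-demo.js  (the file name is arbitrary)
function add(a, b) {
  return a + b
}

// Observation phase: V8 only ever sees small integers here,
// so it can assume integer addition when it optimizes `add`.
let total = 0
for (let i = 0; i < 1e6; i++) {
  total = add(total, 1)
}

// Invalidate the assumption: strings were never seen before.
// With the flags above, V8 typically logs a deoptimization around this point.
const joined = add('fast', 'er')

console.log(total, joined) // 1000000 faster
```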
        
        
## ⚙ Optimizing Compilers ⚙
The optimizing compiler doesn't care if your code is *GOOD*
* Code that does things the wrong/suboptimal way can be "optimized"
* Code that does the wrong thing can be "optimized"
Note:
Even if it's highly optimized assembly code, the optimizing compiler can't save you from doing stupid things.
Unnecessary work always takes time.
        
        
          ## 🚂 V8 Compiler Output 🚂
          * Lots of tips out there on how to optimize your code for V8.
          * Nothing beats V8 telling you what it didn't like.
          * My favorite tool: [IRHydra2](http://mrale.ph/irhydra/2/)
            * Follow the instructions on that page
            * Load results into IRHydra2 (it's a web app)
        
        
## 🌡 IRHydra Results 🌡
(Load the two files in assets/irhydra into IRHydra to explore)

        
        
## 🔎 Analysis 🔎
* the `avg` function had an eager deoptimization
* ... but it was the inlined Buffer constructor in node core ...
* (So I fixed it in Node core...)
* otherwise nothing too interesting ¯\\\_(ツ)_/¯
        
        
## 📝 Code Analysis 📝
`avg()` calls `medianPixel()`
The algorithm to calculate `median` requires a sort
Note:
We don't see medianPixel in the flamegraph or v8-profiler output because it was inlined.
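A sketch of why the median approach is expensive: every pixel position needs its own sort. These are hypothetical stand-ins operating on plain arrays of grayscale values; glitcher's real `avg()` and `medianPixel()` differ.

```javascript
// Hypothetical stand-ins for the median background calculation; glitcher's real code differs.

// Median of one pixel position across frames: needs a sort of all its values.
function medianPixel(values) {
  const sorted = values.slice().sort((a, b) => a - b)
  return sorted[Math.floor(sorted.length / 2)]
}

// Build the background frame: one medianPixel() call (and one sort) per pixel position.
function medianFrame(frames) {
  const background = []
  for (let i = 0; i < frames[0].length; i++) {
    background.push(medianPixel(frames.map((frame) => frame[i])))
  }
  return background
}

console.log(medianFrame([
  [10, 200, 12],
  [11, 15, 13],
  [12, 14, 250]
])) // [ 11, 15, 13 ]
```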
        
        
## ⚖ Some Math ⚖
Our image is 800 x 450 pixels with 51 frames
    800 * 450 = 360000 pixels per frame
So 360,000 `sortPixels` calls, each sorting 51 pixels. 😨
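The same arithmetic, extended with a rough comparison count. The `n * log2(n)` figure is an assumption about a typical comparison sort, not a measurement of the actual code; it lands on the order of 100 million comparisons for this image.

```javascript
const width = 800
const height = 450
const frameCount = 51

// 360000 sorts, one per pixel position
const pixelsPerFrame = width * height
// 18360000 values pass through a sort in total
const valuesSorted = pixelsPerFrame * frameCount

// Assumption: a comparison sort needs on the order of n * log2(n) comparisons for n values.
const comparisonsPerSort = frameCount * Math.log2(frameCount)
const totalComparisons = Math.round(pixelsPerFrame * comparisonsPerSort)

console.log({ pixelsPerFrame, valuesSorted, totalComparisons })
```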
        
        
          ## 📊 Median Frame 📊
          
        
        
## 🔔 Mean 🔔
Calculating `mean` doesn't require a sort. Maybe it will work?
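A sketch of a mean-based background frame: one pass over every value, with no per-pixel array and no sort. `meanFrame` is a hypothetical stand-in operating on plain arrays of grayscale values; glitcher's real code differs.

```javascript
// Hypothetical stand-in for a mean-based background frame; glitcher's real code differs.
function meanFrame(frames) {
  const pixelCount = frames[0].length
  const sums = new Array(pixelCount).fill(0)
  // One pass over every value: no per-pixel array, no sort.
  for (const frame of frames) {
    for (let i = 0; i < pixelCount; i++) {
      sums[i] += frame[i]
    }
  }
  return sums.map((sum) => Math.round(sum / frames.length))
}

console.log(meanFrame([
  [10, 200, 12],
  [11, 15, 13],
  [12, 14, 250]
])) // [ 11, 76, 92 ]
```

Unlike the median, the mean is pulled toward outliers: the second pixel averages to 76, while the median of 200, 15 and 14 would be 15. That is why the resulting image still has to be checked by eye.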
        
        
          ## 🔔 Mean Frame 🔔
          
        
        
          ## ⌛ After 2 ⌛
          
        
        
## 📐 6. Measure again 📐
    $ \time node harness.js ~/Downloads/bling2.gif
    3.40user 0.04system 0:03.43elapsed 100%CPU (0avgtext+0avgdata 182048maxresident)k
    0inputs+704outputs (0major+44011minor)pagefaults 0swaps
8.67 seconds to 3.40 seconds!
        
        
## 🔥 Perf Flamegraph (Mean) 🔥
        
        
## 📉 v8-profiler results 📉

        
        
## 🔁 7. Go back to step 0 🔁
* Fix `copy`: remove call to it?
* Reduce instruction count
* Even better algorithms?
* Optimize `inxsearch` function?
        
        
## 📠 Let's remove the call to Buffer::Copy 📠
    $ \time node harness.js ~/Downloads/bling2.gif
    2.21user 0.06system 0:02.28elapsed 99%CPU (0avgtext+0avgdata 183944maxresident)k
    10584inputs+704outputs (0major+43860minor)pagefaults 0swaps
Another second saved!
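The three measurements can be turned into speedup factors. The numbers are the user CPU seconds from the `/usr/bin/time` runs in this deck; `speedup` is a hypothetical helper.

```javascript
// User CPU seconds reported by /usr/bin/time in the three runs
const medianSeconds = 8.67
const meanSeconds = 3.40
const noCopySeconds = 2.21

function speedup(before, after) {
  return before / after
}

console.log(speedup(medianSeconds, meanSeconds).toFixed(2) + 'x')   // 2.55x
console.log(speedup(medianSeconds, noCopySeconds).toFixed(2) + 'x') // 3.92x
```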
        
        
## 💖 Check out NSolid! 💖
Capture flamegraphs of a production process with the click of a button!
Bryce B. Baril - [http://brycebaril.com](http://brycebaril.com)
Twitter: [@brycebaril](http://twitter.com/brycebaril)
NSolid: [Try NSolid!](http://downloads.nodesource.com)