This test shows the performance I measured on my machine (a normal dev machine) using the Jedis Java client.
1. CPU info
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
2. redis bench
redis-benchmark -t set,lpush -n 1000000 -q
SET: 141582.89 requests per second
LPUSH: 137722.08 requests per second
3. redis bench pipeline
redis-benchmark -n 1000000 -t set,get -P 16 -q
SET: 665778.94 requests per second
GET: 813669.62 requests per second
4. Jedis pool, not pipelined (x = operation performed, o = not performed)
| Write / Read | Number Threads | Total Ops | Ops per thread | Time ms | Op/sec |
|---|---|---|---|---|---|
| x / o | 10 | 1000 | 1000 | 102 | 9803.921569 |
| o / x | 10 | 1000 | 1000 | 71 | 14084.50704 |
| x / x | 10 | 1000 | 1000 | 114 | 8771.929825 |
| x / o | 10 | 500000 | 1000 | 7363 | 67907.10308 |
| o / x | 10 | 500000 | 1000 | 6744 | 74139.97628 |
| x / x | 10 | 500000 | 1000 | 13764 | 36326.64923 |
| x / o | 10 | 1000000 | 1000 | 14579 | 68591.81014 |
| o / x | 10 | 1000000 | 1000 | 13405 | 74599.03021 |
| x / x | 10 | 1000000 | 1000 | 28031 | 35674.78863 |
| x / o | 20 | 1000 | 1000 | 26 | 38461.53846 |
| o / x | 20 | 1000 | 1000 | 27 | 37037.03704 |
| x / x | 20 | 1000 | 1000 | 49 | 20408.16327 |
| x / o | 20 | 500000 | 1000 | 7328 | 68231.44105 |
| o / x | 20 | 500000 | 1000 | 6937 | 72077.26683 |
| x / x | 20 | 500000 | 1000 | 13937 | 35875.72648 |
| x / o | 20 | 1000000 | 1000 | 14405 | 69420.34016 |
| o / x | 20 | 1000000 | 1000 | 13647 | 73276.17791 |
| x / x | 20 | 1000000 | 1000 | 28639 | 34917.4203 |
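
For reference, the columns above (Number Threads, Total Ops, Time ms, Op/sec) can be produced by a harness along the following lines. This is only a minimal sketch, not the code used for the numbers in this table: the class and method names are hypothetical, and the Redis call is replaced by an in-memory `ConcurrentHashMap` stand-in so the example runs without a server or the Jedis library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical harness: names are illustrative, not taken from the original benchmark.
class PoolBenchSketch {

    /** Runs totalOps calls of op, split evenly across the threads, and returns the elapsed time in ms. */
    static long runTimed(int threads, int totalOps, IntConsumer op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        int perThread = totalOps / threads; // assumes totalOps is divisible by threads
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            pool.execute(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        op.accept(offset + i); // a real run would issue a Redis command here
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every worker has finished its share
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Op/sec as used in the tables: total operations divided by elapsed seconds. */
    static double opsPerSecond(long totalOps, long elapsedMs) {
        return totalOps * 1000.0 / Math.max(1, elapsedMs); // avoid dividing by zero on very short runs
    }

    public static void main(String[] args) {
        // In-memory stand-in for the Redis server (an assumption made for this sketch).
        ConcurrentHashMap<String, String> fakeStore = new ConcurrentHashMap<>();
        int threads = 10;
        int totalOps = 1_000_000;
        long ms = runTimed(threads, totalOps, i -> fakeStore.put("key:" + i, "value"));
        System.out.printf("threads=%d totalOps=%d timeMs=%d opsPerSec=%.2f%n",
                threads, totalOps, ms, opsPerSecond(totalOps, ms));
    }
}
```

The numbers printed by this sketch say nothing about Redis itself; it only illustrates how the time and Op/sec columns relate to each other.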
5. Jedis pool, pipelined (x = operation performed, o = not performed)
| Write / Read | Number Threads | Total Ops | Ops per thread | Time ms | Op/sec |
|---|---|---|---|---|---|
| x / o | 10 | 1000 | 1000 | 77 | 12987.01299 |
| o / x | 10 | 1000 | 1000 | 20 | 50000 |
| x / x | 10 | 1000 | 1000 | 52 | 19230.76923 |
| x / o | 10 | 500000 | 1000 | 1314 | 380517.5038 |
| o / x | 10 | 500000 | 1000 | 778 | 642673.5219 |
| x / x | 10 | 500000 | 1000 | 1718 | 291036.0885 |
| x / o | 10 | 1000000 | 1000 | 2121 | 471475.719 |
| o / x | 10 | 1000000 | 1000 | 1597 | 626174.0764 |
| x / x | 10 | 1000000 | 1000 | 3448 | 290023.2019 |
| x / o | 20 | 1000 | 1000 | 2 | 500000 |
| o / x | 20 | 1000 | 1000 | 2 | 500000 |
| x / x | 20 | 1000 | 1000 | 4 | 250000 |
| x / o | 20 | 500000 | 1000 | 1017 | 491642.0846 |
| o / x | 20 | 500000 | 1000 | 819 | 610500.6105 |
| x / x | 20 | 500000 | 1000 | 1705 | 293255.132 |
| x / o | 20 | 1000000 | 1000 | 1936 | 516528.9256 |
| o / x | 20 | 1000000 | 1000 | 1663 | 601322.9104 |
| x / x | 20 | 1000000 | 1000 | 3838 | 260552.371 |
Conclusions
a) Pipelining makes a great difference, although the read logic becomes more complex.
b) With 4 logical cores I was not able to reach 100% CPU (yet).
c) On reads I reached about 626k ops/sec (pipelined, 10 threads, 1,000,000 ops).
d) On writes I reached about 471k ops/sec (pipelined, 10 threads, 1,000,000 ops).
e) The number of pool threads and the ops per thread influence the benchmark time.
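
To put a number on how much pipelining helps, the Op/sec values for the 10-thread, 1,000,000-op rows of the two Jedis tables can be divided by each other. The snippet below is just that arithmetic as a small sketch. The class and method names are hypothetical, and the input values are copied from the tables.

```java
// Hypothetical helper: computes how many times faster the pipelined run was.
class PipelineSpeedup {

    /** Ratio of pipelined throughput to non-pipelined throughput. */
    static double speedup(double pipelinedOpsPerSec, double plainOpsPerSec) {
        return pipelinedOpsPerSec / plainOpsPerSec;
    }

    public static void main(String[] args) {
        // Values taken from the 10-thread, 1,000,000-op rows of the two tables.
        double readSpeedup = speedup(626174.0764, 74599.03021);
        double writeSpeedup = speedup(471475.719, 68591.81014);
        System.out.printf("reads:  %.2fx%n", readSpeedup);
        System.out.printf("writes: %.2fx%n", writeSpeedup);
    }
}
```

For these rows the ratio comes out at roughly 8.4x for reads and 6.9x for writes.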