Combating the Memory Bottleneck for Efficient LLM Inference