Large Language Models (LLMs) have been developed for a wide range of language-based tasks, while Reinforcement Learning (RL) has primarily been applied to decision-making problems such as robotics, game playing, and control systems. Increasingly, these two paradigms are being combined. In this literature review, we focus on RL4LLM fine-tuning, in which RL techniques are systematically leveraged to fine-tune LLMs and align them with various preferences. Our review provides a comprehensive analysis of 230 recent publications, organized by a methodological taxonomy with three primary method domains: Optimization Algorithm, covering innovations in the core RL update rules; Training Framework, covering innovations in the orchestration of the training process; and Reward Modeling, addressing how LLMs learn and represent preferences and feedback. Within each primary domain, we further analyze methods and innovations through more granular categories to provide an in-depth summary of RL4LLM fine-tuning research. We address three research questions: (1) what are the recent RL4LLM fine-tuning methods, (2) what methodological innovations do they introduce, and (3) what limitations remain and which directions merit future work. Our analysis demonstrates the breadth and impact of recent RL4LLM fine-tuning research while highlighting valuable directions for future investigation.