Abstract

Domain model generation is a foundational challenge in automated planning, requiring models that are both syntactically correct and semantically robust for effective plan synthesis and execution. Creating planning domain models in languages such as the Planning Domain Definition Language (PDDL) is time-consuming, error-prone, and demands substantial expertise. This limitation has become a principal bottleneck in scaling AI planning to complex, real-world domains.

Recent advances in large language models (LLMs) have showcased their ability to generate structured, code-like representations from natural language descriptions, fueling optimism that these models might automate the transformation of human intent into formal planning models. However, fully automating the generation of domain models from natural language remains a challenge. LLM outputs often contain semantic errors and logical faults, and lack the reliability needed for dependable planning. Moreover, current workflows frequently require manual adjustments or corrections from experts, which undermines the goal of full automation.

To address these limitations, this thesis proposes a novel feedback-driven framework that enables LLMs to generate PDDL domain models through a fully automated pipeline with no human intervention. First, the LLM generates intermediate representations in the form of Answer Set Programming (ASP) rules. These rules are then translated, again by the LLM, into corresponding PDDL domain models via structured prompts and iterative feedback. Semantic coherence and contextual relevance are ensured through retrieval-augmented generation (RAG), which supplies PDDL samples to guide generation and correction. Once the domain models are generated, plans are computed with the Fast Downward planner and validated with VAL to ensure logical soundness and successful goal achievement.
The framework is evaluated with five advanced LLMs (Gemini 1.5 Flash, Gemini 2.5 Flash, DeepSeek V3, DeepSeek R1, and NVIDIA Llama 3.3 Nemotron Super 49B v1) on benchmark domains from the International Planning Competition (IPC). Performance is assessed in terms of accuracy, generation time, makespan, and plan cost, revealing differences in how effectively each LLM adapts to structured feedback. The results demonstrate that, with targeted prompting and automated refinement, LLMs can reliably generate planner-compatible domain models with few errors.
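The feedback loop at the core of the framework can be sketched abstractly as follows. This is a minimal, hypothetical illustration only: the function names (`refine_domain`, `llm_generate`, `plan_and_validate`) and the toy stand-ins for the LLM and the planner/validator are the author's own placeholders, not the thesis's actual API; in the real pipeline the generation step calls an LLM with RAG-retrieved PDDL examples, and validation runs Fast Downward followed by VAL.

```python
# Hedged sketch of the iterative generate-validate-refine loop.
# All names and the stubbed components below are illustrative
# stand-ins, not the actual implementation described in the thesis.

def refine_domain(nl_description, llm_generate, plan_and_validate, max_iters=5):
    """Regenerate a PDDL domain, feeding validator errors back as prompts,
    until a plan validates or the iteration budget is exhausted."""
    feedback = None
    domain = None
    for _ in range(max_iters):
        # Prompt the LLM (in the real pipeline: with RAG examples
        # and any error feedback from the previous iteration).
        domain = llm_generate(nl_description, feedback)
        # In the real pipeline: run Fast Downward, then check with VAL.
        ok, feedback = plan_and_validate(domain)
        if ok:
            return domain, True
    return domain, False

# Toy stand-ins: the "LLM" fixes its typo once it sees feedback.
def toy_llm(desc, feedback):
    return "(:action move ...)" if feedback else "(:action mve ...)"

def toy_validator(domain):
    if "mve" in domain:
        return False, "unknown action name 'mve'"
    return True, None

domain, solved = refine_domain("move blocks between tables", toy_llm, toy_validator)
```

With the stubs above, the first iteration fails validation, the error message is fed back, and the second iteration produces an accepted domain, which mirrors the correction behavior the framework automates.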