

LLM Agents: A Detailed Guide to the Introduction and Usage of Personal_LLM_Agents_Survey

處女座的程序猿 · Published in Shanghai on 2024-01-18


Overview: This project collects a list of papers on Personal LLM Agents. Browsing these papers gives a picture of the latest research progress in this emerging direction, such as how conversational ability, knowledge representation, and privacy protection are being optimized to improve the user experience. The papers also illustrate application cases, open challenges, and proposed solutions, for example how to apply LLM agents as education or healthcare assistants, how to make their conversations more natural, and how to keep user privacy from being abused.
In short, the project offers a systematically organized list of papers on Personal LLM Agents, surveying the current state and future directions of this area from multiple angles, which helps researchers and developers grasp the trends and plan their own work.


Introduction to Personal_LLM_Agents_Survey

Personal LLM Agents are defined as a special type of LLM-based agent that is deeply integrated with personal data, personal devices, and personal services. They are preferably deployed on resource-constrained mobile/edge devices and/or powered by lightweight AI models. The main purpose of Personal LLM Agents is to assist end users and augment their abilities, helping them focus on, and do better at, the things they find interesting and important.

The paper list covers several major aspects of Personal LLM Agents, including capability, efficiency, and security.

GitHub repository: https://github.com/MobileLLM/Personal_LLM_Agents_Survey

How to Use Personal_LLM_Agents_Survey

1. Key Capabilities of Personal LLM Agents

(1) Task Automation

Task automation is a core capability of Personal LLM Agents; it determines how well the agent can respond to user commands and/or automate user tasks. Since UI-based task automation agents are popular in this list and closely tied to personal devices, we focus on that line of work here.
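To make the shared pattern behind these UI agents concrete, below is a minimal sketch of an observe-think-act automation loop. It assumes hypothetical callables for the LLM call (query_llm), the UI dump (get_ui_state), and the device controller (execute_action); it illustrates the general idea only and is not the implementation of any paper listed here.

```python
# Minimal, hypothetical sketch of an LLM-driven UI automation loop:
# serialize the current screen to text, ask the LLM for the next action,
# execute it, and repeat. The three callables passed in are placeholders
# (e.g. an LLM client, an accessibility-service UI dump, an ADB controller),
# not APIs from any cited project.

import json

def automate_task(task, query_llm, get_ui_state, execute_action, max_steps=10):
    """Run a simple observe-think-act loop until the LLM says the task is done."""
    history = []
    for _ in range(max_steps):
        elements = get_ui_state()   # e.g. [{"id": 3, "text": "Send", "type": "button"}, ...]
        prompt = (
            f"Task: {task}\n"
            f"Actions so far: {json.dumps(history)}\n"
            f"Current UI elements: {json.dumps(elements)}\n"
            'Respond with JSON, e.g. {"action": "click", "element_id": 3} '
            'or {"action": "finish"} when the task is complete.'
        )
        action = json.loads(query_llm(prompt))  # assumes the model returns valid JSON
        if action.get("action") == "finish":
            break
        execute_action(action)
        history.append(action)
    return history
```

Real systems in the list below differ mainly in how they represent the UI (text, view hierarchy, or screenshots), how they constrain the action space, and how they recover from invalid model outputs.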

UI-based Task Automation Agents
LLM-based Approaches
  • WebGPT: Browser-assisted question-answering with human feedback. [paper]
  • Enabling Conversational Interaction with Mobile UI Using Large Language Models. [CHI 2023] [paper]
  • Language Models can Solve Computer Tasks. [NeurIPS 2023] [paper]
  • DroidBot-GPT: GPT-powered UI Automation for Android. [arxiv] [code]
  • Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators. [paper]
  • Mind2Web: Towards a Generalist Agent for the Web. arxiv 2023 [paper][code][code]
  • (AutoDroid) Empowering LLM to use Smartphone for Intelligent Task Automation. [paper] [code]
  • You Only Look at Screens: Multimodal Chain-of-Action Agents. ArXiv Preprint [paper] [code]
  • AXNav: Replaying Accessibility Tests from Natural Language. [paper]
  • Automatic Macro Mining from Interaction Traces at Scale. [paper]
  • A Zero-Shot Language Agent for Computer Control with Structured Reflection. [paper]
  • Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API. [paper]
  • GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation. [paper][code]
  • UGIF: UI Grounded Instruction Following. [paper]
  • Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation. [paper][code]
  • CogAgent: A Visual Language Model for GUI Agents. [paper][code]
  • AppAgent: Multimodal Agents as Smartphone Users. [paper][code]
Traditional Approaches
  • uLink: Enabling User-Defined Deep Linking to App Content. [MobiSys 2016]
  • SUGILITE: Creating Multimodal Smartphone Automation by Demonstration. [CHI 2017] [paper][code]
  • Programming IoT devices by demonstration using mobile apps. [IS-EUD 2017]
  • Kite: Building Conversational Bots from Mobile Apps. [MobiSys 2018]. [paper]
  • Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration. [ICLR 2018]. [paper][code]
  • Mapping Natural Language Instructions to Mobile UI Action Sequences. [ACL 2020] [paper][code]
  • Glider: A Reinforcement Learning Approach to Extract UI Scripts from Websites. [SIGIR 2021] [paper]
  • UIBert: Learning Generic Multimodal Representations for UI Understanding. [IJCAI-21] [paper]
  • META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. [EMNLP 2022][paper][code]
  • UINav: A maker of UI automation agents. [paper]
Benchmarks for UI Automation
  • Mapping natural language commands to web elements. [EMNLP 2018] [paper][code]
  • UIBert: Learning Generic Multimodal Representations for UI Understanding. [IJCAI-21] [paper]
  • Mapping Natural Language Instructions to Mobile UI Action Sequences. [ACL 2020] [paper][code]
  • A Dataset for Interactive Vision Language Navigation with Unknown Command Feasibility. [ECCV 2022][paper] [code]
  • META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. [EMNLP 2022][paper][code]
  • UGIF: UI Grounded Instruction Following. [paper]
  • ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation. [paper][code]
  • Mind2Web: Towards a Generalist Agent for the Web. arxiv 2023 [paper][code][code]
  • Android in the Wild: A Large-Scale Dataset for Android Device Control. [paper][code]
  • Empowering LLM to use Smartphone for Intelligent Task Automation. [paper] [code]
  • World of Bits: An Open-Domain Platform for Web-Based Agents. [ICML 2017] [paper][code]
  • Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration. [ICLR 2018]. [paper][code]
  • WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents. [NeurIPS 2022] [paper]
  • AndroidEnv: A Reinforcement Learning Platform for Android [paper][code]
  • Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era. [paper][code]
  • WebArena: A Realistic Web Environment for Building Autonomous Agents. [paper][code]

(2) Sensing

The ability to understand the current context is crucial for Personal LLM Agents to provide personalized, context-aware services. This includes techniques for sensing user activities, mental states, environment dynamics, and more.
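As a rough illustration of LLM-based sensing (in the spirit of the Penetrative AI line of work cited below), the sketch here condenses raw accelerometer readings into a short textual summary and asks a model to guess the user's activity. The prompt wording and the query_llm callable are assumptions made for illustration only.

```python
# Illustrative only: turn raw smartphone accelerometer samples into a short
# textual summary, then ask an LLM to infer the user's current activity.
# query_llm is a placeholder for any LLM client; nothing here is an API
# from the cited papers.

from statistics import mean, pstdev

def summarize_accelerometer(samples):
    """samples: list of (x, y, z) readings in m/s^2 -> one-line summary string."""
    mags = [(x * x + y * y + z * z) ** 0.5 for x, y, z in samples]
    return (f"mean acceleration magnitude {mean(mags):.2f} m/s^2, "
            f"std {pstdev(mags):.2f}, {len(samples)} samples")

def infer_activity(samples, query_llm):
    prompt = (
        "Based on the smartphone accelerometer summary below, guess the user's "
        "current activity (still / walking / running / in vehicle). "
        "Answer with a single word.\n"
        f"Summary: {summarize_accelerometer(samples)}"
    )
    return query_llm(prompt).strip().lower()
```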

LLM-based Approaches
  • “Automated Mobile Sensing Strategies Generation for Human Behaviour Understanding” (Gao et al., 2023, p. 521) [arxiv]
  • “Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs” (Wang et al., 2023, p. 1) [EMNLP 2023]
  • “Exploring Large Language Models for Human Mobility Prediction under Public Events” (Liang et al., 2023, p. 1) [arxiv]
  • “Penetrative AI: Making LLMs Comprehend the Physical World” (Xu et al., 2023, p. 1) [arxiv]
  • “Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models” (Zhan et al., 2023, p. 1) [arxiv]
  • “PALR: Personalization Aware LLMs for Recommendation” (Yang et al., 2023, p. 1) [arxiv]
  • “Sentiment Analysis through LLM Negotiations” (Sun et al., 2023, p. 1) [arxiv]
  • “Bridging the Information Gap Between Domain-Specific Model and General LLM for Personalized Recommendation” (Zhang et al., 2023, p. 1) [arxiv]
  • “Conversational Health Agents: A Personalized LLM-Powered Agent Framework” (Abbasian et al., 2023, p. 1) [arxiv]
Traditional Approaches
  • “Affective State Prediction from Smartphone Touch and Sensor Data in the Wild” (Wampfler et al., 2022, p. 1) [CHI'22]
  • “Mobile Localization Techniques for Wireless Sensor Networks: Survey and Recommendations” (Oliveira et al., 2023, p. 361) [ACM Transactions on Sensor Networks]
  • “Are You Killing Time? Predicting Smartphone Users’ Time-killing Moments via Fusion of Smartphone Sensor Data and Screenshots” (Chen et al., 2023, p. 1) [CHI'23]
  • “Remote Breathing Rate Tracking in Stationary Position Using the Motion and Acoustic Sensors of Earables” (Ahmed et al., 2023, p. 1) [CHI'23]
  • “SAMoSA: Sensing Activities with Motion and Subsampled Audio” (Mollyn et al., 2022, p. 1321) [IMWUT]
  • “A Systematic Survey on Android API Usage for Data-Driven Analytics with Smartphones” (Lee et al., 2023, p. 1) [ACM Computing Surveys]
  • “A Multi-Sensor Approach to Automatically Recognize Breaks and Work Activities of Knowledge Workers in Academia” (Di Lascio et al., 2020, p. 781) [IMWUT]
  • “Robust Inertial Motion Tracking through Deep Sensor Fusion across Smart Earbuds and Smartphone” (Gong et al., 2021, p. 621) [IMWUT]
  • “DancingAnt: Body-empowered Wireless Sensing Utilizing Pervasive Radiations from Powerline” (Cui et al., 2023, p. 873) [ACM MobiCom'23]
  • “DeXAR: Deep Explainable Sensor-Based Activity Recognition in Smart-Home Environments” (Arrotta et al., 2022, p. 11) [IMWUT]
  • “MUSE-Fi: Contactless MUti-person SEnsing Exploiting Near-field Wi-Fi Channel Variation” (Hu et al., 2023, p. 1135) [IMWUT]
  • “SenCom: Integrated Sensing and Communication with Practical WiFi” (He et al., 2023, p. 903) [ACM MobiCom'23]
  • “SleepMore: Inferring Sleep Duration at Scale via Multi-Device WiFi Sensing” (Zakaria et al., 2022, p. 1931) [IMWUT]
  • “COCOA: Cross Modality Contrastive Learning for Sensor Data” (Deldari et al., 2022, p. 1081) [ACM MobiCom'23]
  • “M3Sense: Affect-Agnostic Multitask Representation Learning Using Multimodal Wearable Sensors” (Samyoun et al., 2022, p. 731) [IMWUT]
  • “Predicting Subjective Measures of Social Anxiety from Sparsely Collected Mobile Sensor Data” (Rashid et al., 2020, p. 1091) [IMWUT]
  • “Attend and Discriminate: Beyond the State-of-the-Art for Human Activity Recognition Using Wearable Sensors” (Abedin et al., 2021, p. 11) [IMWUT]
  • “Fall Detection based on Interpretation of Important Features with Wrist-Wearable Sensors” (Kim et al., 2022, p. 1) [IMWUT]
  • “PowerPhone: Unleashing the Acoustic Sensing Capability of Smartphones” (Cao et al., 2023, p. 842) [ACM MobiCom'23]
  • “I Spy You: Eavesdropping Continuous Speech on Smartphones via Motion Sensors” (Zhang et al., 2022, p. 1971) [IMWUT]
  • “Watching Your Phone’s Back: Gesture Recognition by Sensing Acoustical Structure-borne Propagation” (Wang et al., 2021, p. 821) [IMWUT]
  • “Gesture Recognition Method Using Acoustic Sensing on Usual Garment” (Amesaka et al., 2022, p. 411) [IMWUT]
  • “Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK” (Assi et al., 2023, p. 1) [CHI'23]
  • “Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries” (Meegahapola et al., 2022, p. 1761) [IMWUT]
  • “Detecting Social Contexts from Mobile Sensing Indicators in Virtual Interactions with Socially Anxious Individuals” (Wang et al., 2023, p. 1341) [IMWUT]
  • “Examining the Social Context of Alcohol Drinking in Young Adults with Smartphone Sensing” (Meegahapola et al., 2021, p. 1211) [IMWUT]
  • “Towards Open-Domain Twitter User Profile Inference” (Wen et al., 2023, p. 3172) [ACL 2023]
  • “One More Bite? Inferring Food Consumption Level of College Students Using Smartphone Sensing and Self-Reports” (Meegahapola et al., 2021, p. 261) [IMWUT]
  • “FlowSense: Monitoring Airflow in Building Ventilation Systems Using Audio Sensing” (Chhaglani et al., 2022, p. 51) [IMWUT]
  • “MicroCam: Leveraging Smartphone Microscope Camera for Context-Aware Contact Surface Sensing” (Hu et al., 2023, p. 981) [IMWUT]
  • “Mobile and Wearable Sensing Frameworks for mHealth Studies and Applications: A Systematic Review” (Kumar et al., 2021, p. 81) [ACM Transactions on Computing for Healthcare]
  • “FeverPhone: Accessible Core-Body Temperature Sensing for Fever Monitoring Using Commodity Smartphones” (Breda et al., 2022, p. 31) [IMWUT]
  • “Guard Your Heart Silently: Continuous Electrocardiogram Waveform Monitoring with Wrist-Worn Motion Sensor” (Cao et al., 2022, p. 1031) [IMWUT]
  • “Listen2Cough: Leveraging End-to-End Deep Learning Cough Detection Model to Enhance Lung Health Assessment Using Passively Sensed Audio” (Xu et al., 2021, p. 431) [IMWUT]
  • “HealthWalks: Sensing Fine-grained Individual Health Condition via Mobility Data” (Lin et al., 2020, p. 1381) [IMWUT]
  • “Identifying Mobile Sensing Indicators of Stress-Resilience” (Adler et al., 2021, p. 511) [IMWUT]
  • “MoodExplorer: Towards Compound Emotion Detection via Smartphone Sensing” (Zhang et al., 2018, p. 1761) [IMWUT]
  • “mTeeth: Identifying Brushing Teeth Surfaces Using Wrist-Worn Inertial Sensors” (Akther et al., 2021, p. 531) [IMWUT]
  • “Detecting Job Promotion in Information Workers Using Mobile Sensing” (Nepal et al., 2020, p. 1131) [IMWUT]
  • “First-Gen Lens: Assessing Mental Health of First-Generation Students across Their First Year at College Using Mobile Sensing” (Wang et al., 2022, p. 951) [IMWUT]
  • “Predicting Personality Traits from Physical Activity Intensity” (Gao et al., 2019, p. 1) [IEEE Computer]
  • “Predicting Symptom Trajectories of Schizophrenia using Mobile Sensing” (Wang et al., 2017, p. 1101) [IMWUT]
  • “Predictors of Life Satisfaction based on Daily Activities from Mobile Sensor Data” (Yürüten et al., 2014, p. 1) [CHI'14]
  • “SmartGPA: How Smartphones Can Assess and Predict Academic Performance of College Students” (Wang et al., 2015, p. 1) [UbiComp'15]
  • “Social Sensing: Assessing Social Functioning of Patients Living with Schizophrenia using Mobile Phone Sensing” (Wang et al., 2020, p. 1) [CHI'20]
  • “SmokingOpp: Detecting the Smoking ‘Opportunity’ Context Using Mobile Sensors” (Chatterjee et al., 2020, p. 41) [IMWUT]

(3) Memory

Memory is the ability of a Personal LLM Agent to maintain information about the user, which allows the agent to provide more customized services and to evolve itself according to user preferences.
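A minimal sketch of such a per-user memory layer is shown below, assuming a generic embed() function that maps text to a vector. It only illustrates the store-and-recall idea and is not the design proposed by any specific paper in this section.

```python
# Illustrative sketch of a per-user memory layer: facts about the user are
# stored with an embedding, and the most similar memories are recalled to
# personalize the agent's next response. embed() is a placeholder for any
# sentence-embedding model.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class UserMemory:
    def __init__(self, embed):
        self.embed = embed   # callable: str -> list[float]
        self.items = []      # list of (text, embedding) pairs

    def remember(self, fact: str) -> None:
        """Store one observed fact or preference about the user."""
        self.items.append((fact, self.embed(fact)))

    def recall(self, query: str, k: int = 3) -> list:
        """Return the k stored facts most similar to the current query."""
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In use, an agent would call recall() with the current user request and prepend the returned facts to its prompt, so that responses reflect previously observed preferences.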

Memory Acquisition
  • “LifeLogging: Personal Big Data” [Foundations and Trends in Information Retrieval]
  • “Vision-based human activity recognition: a survey” [Multimedia Tools and Applications]
  • “Predicting personality from patterns of behavior collected with smartphones” [Proceedings of the National Academy of Sciences]
  • “Facial Emotion Detection Using Deep Learning” [2020 International Conference for Emerging Technology (INCET)]
  • “Emotion detection of textual data: An interdisciplinary survey” [2021 IEEE World AI IoT Congress]
Memory Management
  • “Privacystreams: Enabling transparency in personal data processing for mobile apps” [IMWUT]
  • “Tree of Thoughts: Deliberate Problem Solving with Large Language Models” [arxiv]
  • “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” [Advances in Neural Information Processing Systems]
  • “ReAct: Synergizing Reasoning and Acting in Language Models” [arxiv]
  • “Generative Agents: Interactive Simulacra of Human Behavior” [Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology]
  • “Show Your Work: Scratchpads for Intermediate Computation with Language Models” [arxiv]
  • “Cognitive Architectures for Language Agents” [arxiv]
Agent Self-Evolution
  • “DreamCoder: growing generalizable, interpretable knowledge with wake–sleep Bayesian program learning” [Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation]
  • “Voyager: An Open-Ended Embodied Agent with Large Language Models” [arxiv]
  • “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents” [International Conference on Machine Learning]
  • “Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance” [arxiv]
  • “FireAct: Toward Language Agent Fine-tuning” [arxiv]

2. Efficiency of LLM Agents

The efficiency of LLM agents is closely tied to the efficiency of LLM inference, LLM training/customization, and memory management.

(1) Efficient LLM Inference and Training

The efficiency of LLM inference/training has already been comprehensively summarized in existing surveys (see the survey linked from the original repository), so this part is omitted from the list.

(2) Efficient Memory Retrieval and Management

Here we mainly list papers related to efficient memory management, which is an important component of LLM agents.
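As a simplified example of this kind of retrieval, the sketch below indexes memory embeddings with Faiss (listed under Searching Execution further down) and looks up the nearest memories for the current context. The random vectors are stand-ins for embeddings produced by a real model, and the exact IndexFlatL2 index is only an illustrative default.

```python
# Illustrative sketch: index memory embeddings with Faiss and retrieve the
# nearest neighbors of the current context embedding. Assumes the faiss-cpu
# and numpy packages are installed; the random vectors below stand in for
# real memory embeddings.

import numpy as np
import faiss

dim = 384
memory_vecs = np.random.rand(10_000, dim).astype("float32")  # stored memory embeddings
query_vec = np.random.rand(1, dim).astype("float32")         # embedding of the current context

index = faiss.IndexFlatL2(dim)   # exact L2 search over all stored vectors
index.add(memory_vecs)

distances, ids = index.search(query_vec, 5)  # indices of the 5 closest memories
print(ids[0], distances[0])
```

Swapping the flat index for an HNSW or IVF variant trades a little recall for much lower latency at large memory sizes, which is the focus of the indexing papers listed below.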

Memory Organization

(with vector library, vector DB, and others)

Vector Library

  • RETRO: Improving language models by retrieving from trillions of tokens. [ICML, 2021] [paper]
  • RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit. [arXiv, 2023] [paper] [code]
  • TRIME: Training Language Models with Memory Augmentation. [EMNLP, 2022] [paper] [code]
  • Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation. [arXiv, 2023] [paper] [code]

Vector Database

  • Survey of Vector Database Management Systems. [arXiv, 2023] [paper]
  • Vector database management systems: Fundamental concepts, use-cases, and current challenges. [arXiv, 2023] [paper]
  • A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge. [arXiv, 2023] [paper]

Other Forms of Memory

  • Memorizing Transformers. [ICLR, 2022] [paper] [code]
  • RET-LLM: Towards a General Read-Write Memory for Large Language Models. [arXiv, 2023] [paper]
Optimizing Memory Efficiency
Searching Design
  • Milvus: A purpose-built vector data management system. [SIGMOD, 2021] [paper] [code]
  • Analyticdb-v: A hybrid analytical engine towards query fusion for structured and unstructured data. [Proceedings of the VLDB Endowment, Volume 13, Issue 12, pp 3152–3165] [paper]
  • Hqann: Efficient and robust similarity search for hybrid queries with structured and unstructured constraints. [CIKM, 2022] [paper]
  • Qdrant [github]
Searching Execution
  • Faiss: Facebook AI Similarity Search. [wiki] [code]
  • Milvus: A purpose-built vector data management system. [SIGMOD, 2021] [paper] [code]
  • Quicker ADC : Unlocking the Hidden Potential of Product Quantization With SIMD. [IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019] [paper] [code]
Efficient Indexing
  • LSH: Locality-sensitive hashing scheme based on p-stable distributions. [SCG, 2004] [paper]
  • Random projection trees and low dimensional manifolds. [STOC, 2008] [paper]
  • SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. [NeurIPS, 2021] [paper] [code]
  • Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. [IEEE Transactions on Pattern Analysis and Machine Intelligence, VOL. 42, NO. 4, 2020] [paper]
  • DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. [NeurIPS, 2019] [paper] [code]
  • DiskANN++: Efficient Page-based Search over Isomorphic Mapped Graph Index using Query-sensitivity Entry Vertex. [arXiv, 2023] [paper]
  • CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. [USENIX ATC, 2023] [paper]
  • Co-design Hardware and Algorithm for Vector Search. [SC, 2023] [paper] [code]

3. Security and Privacy of Personal LLM Agents

Security and privacy in AI/ML is a vast field with a huge body of related work. Here we only focus on papers related to LLMs and LLM agents.

(1) Confidentiality (of User Data)

  • THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption. [ACL, 2022][paper]
  • TextFusion: Privacy-Preserving Pre-trained Model Inference via Token Fusion [EMNLP, 2022] [paper][code]
  • TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations. [ACL, 2023] [paper][code]
  • Adversarial Training for Large Neural Language Models. [arXiv, 2020] [paper][code]

(2) Integrity (of Agent Behavior)

Adversarial Attacks
  • Certifying LLM Safety against Adversarial Prompting. [arXiv, 2023] [paper][code]
  • On evaluating adversarial robustness of large vision-language models. [arXiv, 2023] [paper][code]
  • Jailbroken: How does llm safety training fail? [arXiv, 2023] [paper]
  • On the adversarial robustness of multi-modal foundation models. [arXiv, 2023] [paper]
  • Misusing Tools in Large Language Models With Visual Adversarial Examples. [arXiv, 2023] [paper]
  • Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models. [arXiv, 2023] [paper]
Backdoor Attacks
  • Backdoor attacks for in-context learning with language models. [arXiv, 2023] [paper]
  • Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models. [arXiv, 2023] [paper]
  • PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models. [arXiv, 2023] [paper][code]
  • Defending against backdoor attacks in natural language generation. [arXiv, 2021] [paper][code]
Prompt Injection Attacks
  • Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. [arXiv, 2023] [paper]
  • Ignore Previous Prompt: Attack Techniques For Language Models. [arXiv, 2022] [paper][code]
  • Prompt Injection attack against LLM-integrated Applications. [arXiv, 2023] [paper][code]
  • Jailbreaking Black Box Large Language Models in Twenty Queries. [arXiv, 2023] [paper][code]
  • Extracting Training Data from Large Language Models. [arXiv, 2020] [paper]
  • SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. [arXiv, 2023] [paper][code]

(3) Reliability (of Agent Decisions)

Problems
  • Survey of Hallucination in Natural Language Generation. [ACM Computing Surveys 2023] [paper]
  • A Survey of Hallucination in Large Foundation Models. [arXiv, 2023] [paper]
  • DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents. [arXiv, 2023] [paper]
  • Cumulative Reasoning with Large Language Models. [arXiv, 2023] [paper]
  • Learning From Mistakes Makes LLM Better Reasoner. [arXiv, 2023] [paper]
  • Large Language Models can Learn Rules. [arXiv, 2023] [paper]
Improvement
  • PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. [ACL 2022] [paper]
  • Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. [EMNLP 2022] [paper]
  • Finetuned Language Models are Zero-Shot Learners. [ICLR 2022] [paper]
  • SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. [EMNLP 2023] [paper]
  • Large Language Models Can Self-Improve. [arXiv, 2022] [paper]
  • Self-Refine: Iterative Refinement with Self-Feedback. [arXiv, 2023] [paper]
  • Teaching Large Language Models to Self-Debug. [arXiv, 2023] [paper]
  • Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks. [ACL 2023] [paper]
  • Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. [arXiv, 2023] [paper]
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. [arXiv, 2023] [paper]
  • Self-Knowledge Guided Retrieval Augmentation for Large Language Models. [Findings of EMNLP, 2023] [paper]
Inspection
  • CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling. [AAAI 2019] [paper]
  • Gradient-Based Constrained Sampling from Language Models. [EMNLP 2022] [paper]
  • Large Language Models are Better Reasoners with Self-Verification. [Findings of EMNLP 2023] [paper]
  • Explainability for Large Language Models: A Survey. [arXiv, 2023] [paper]
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models. [ICLR, 2023] [paper]
  • Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models. [arXiv, 2023] [paper]
  • Mutual Information Alleviates Hallucinations in Abstractive Summarization. [EMNLP, 2023] [paper]
  • Overthinking the Truth: Understanding how Language Models Process False Demonstrations. [arXiv, 2023] [paper]
  • Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. [NeurIPS, 2023] [paper]
