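LLMs / Agent: Personal_LLM_Agents_Survey, an Introduction and Detailed Usage Guide
Overview: This project maintains a list of papers on Personal LLM Agents. Reading through it shows the latest research progress in this emerging direction, for example how dialogue ability, knowledge representation, and privacy protection are being improved to deliver a better user experience. The papers also cover application cases, open difficulties, and proposed solutions. Examples include applying LLM agents as educational or medical assistants, making their conversations more natural and lifelike, and keeping user data from being misused. Overall, the project offers a systematically organized list of personal LLM agent papers. It describes the current state and future direction of this technology from several angles, helping researchers and developers follow the trends and plan their work.
Introduction to Personal_LLM_Agents_Survey
Personal LLM agents are defined as a special type of LLM-based agent that is deeply integrated with personal data, personal devices, and personal services. They are preferably deployed on resource-constrained mobile/edge devices and/or powered by lightweight AI models. The main purpose of personal LLM agents is to assist end users and augment their abilities, helping them focus on and do better at interesting and important tasks.
This paper list covers several main aspects of personal LLM agents, including capability, efficiency, and security.
GitHub: https://github.com/MobileLLM/Personal_LLM_Agents_Survey
How to Use Personal_LLM_Agents_Survey
1. Key Capabilities of Personal LLM Agents
(1) Task Automation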
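Task automation is a core capability of personal LLM agents. It determines how well an agent can respond to user commands and/or carry out user tasks automatically. UI-based task automation agents have become popular recently and are closely tied to personal devices, so this list focuses on them.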
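Many of the UI-based agents listed below share an observe-decide-act loop. The agent serializes the current screen into text, asks the LLM for the next action, executes that action, and repeats until the task is done. The Python sketch below only illustrates that loop. `Widget`, `Screen`, `run_agent`, `scripted_llm`, and the `tap <id>` / `type <id> <text>` / `done` action grammar are hypothetical names chosen for this example. They are not the interface of AutoDroid, AppAgent, or any other system in the list. A scripted function stands in for the real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Widget:
    wid: int
    text: str
    editable: bool = False


@dataclass
class Screen:
    widgets: List[Widget]
    typed: Dict[int, str] = field(default_factory=dict)  # text entered per widget id

    def describe(self) -> str:
        # Serialize the UI as simple HTML-like lines the LLM can read.
        return "\n".join(
            f"<{'input' if w.editable else 'button'} id={w.wid}>{w.text}</>"
            for w in self.widgets
        )


def run_agent(task: str, screen: Screen, llm: Callable[[str], str],
              max_steps: int = 10) -> List[str]:
    """Observe-decide-act loop; returns the actions chosen by the LLM."""
    history: List[str] = []
    for _ in range(max_steps):
        prompt = (f"Task: {task}\nScreen:\n{screen.describe()}\n"
                  f"Previous actions: {history}\nNext action?")
        action = llm(prompt).strip()
        history.append(action)
        if action == "done":
            break
        parts = action.split(" ", 2)
        if len(parts) < 2:
            continue  # malformed action; a real agent would re-prompt
        verb, wid = parts[0], int(parts[1])
        if verb == "type" and len(parts) == 3:
            screen.typed[wid] = parts[2]
        # A real "tap" would change the screen; this toy screen stays static.
    return history


def scripted_llm(actions: List[str]) -> Callable[[str], str]:
    # Stand-in for an LLM: replays fixed actions, then answers "done".
    it = iter(actions)
    return lambda prompt: next(it, "done")


if __name__ == "__main__":
    screen = Screen([Widget(1, "Recipient", editable=True), Widget(2, "Send")])
    print(run_agent("Send a message to Alice", screen,
                    scripted_llm(["type 1 Alice", "tap 2"])))
    print(screen.typed)
```

The `max_steps` cap guards against an LLM that never answers `done`. Real systems also feed back the new screen after every action, which this static toy screen does not model.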
UI-based Task Automation Agents
LLM-based Approaches
- WebGPT: Browser-assisted question-answering with human feedback. [paper]
- Enabling Conversational Interaction with Mobile UI Using Large Language Models. [CHI 2023] [paper]
- Language Models can Solve Computer Tasks. [NeurIPS 2023] [paper]
- DroidBot-GPT: GPT-powered UI Automation for Android. [arxiv] [code]
- Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators. [paper]
- Mind2Web: Towards a Generalist Agent for the Web. arxiv 2023 [paper][code][code]
- (AutoDroid) Empowering LLM to use Smartphone for Intelligent Task Automation. [paper] [code]
- You Only Look at Screens: Multimodal Chain-of-Action Agents. ArXiv Preprint [paper] [code]
- AXNav: Replaying Accessibility Tests from Natural Language. [paper]
- Automatic Macro Mining from Interaction Traces at Scale. [paper]
- A Zero-Shot Language Agent for Computer Control with Structured Reflection. [paper]
- Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API. [paper]
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation. [paper][code]
- UGIF: UI Grounded Instruction Following. [paper]
- Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation. [paper][code]
- CogAgent: A Visual Language Model for GUI Agents. [paper][code]
- AppAgent: Multimodal Agents as Smartphone Users. [paper][code]
Traditional Approaches
- uLink: Enabling User-Defined Deep Linking to App Content. [MobiSys 2016]
- SUGILITE: Creating Multimodal Smartphone Automation by Demonstration. [CHI 2017] [paper][code]
- Programming IoT devices by demonstration using mobile apps. [IS-EUD 2017]
- Kite: Building Conversational Bots from Mobile Apps. [MobiSys 2018]. [paper]
- Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration. [ICLR 2018]. [paper][code]
- Mapping Natural Language Instructions to Mobile UI Action Sequences. [ACL 2020] [paper][code]
- Glider: A Reinforcement Learning Approach to Extract UI Scripts from Websites. [SIGIR 2021] [paper]
- UIBert: Learning Generic Multimodal Representations for UI Understanding. [IJCAI-21] [paper]
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. [EMNLP 2022][paper][code]
- UINav: A maker of UI automation agents. [paper]
Benchmarks for UI Automation
- Mapping natural language commands to web elements. [EMNLP 2018] [paper][code]
- UIBert: Learning Generic Multimodal Representations for UI Understanding. [IJCAI-21] [paper]
- Mapping Natural Language Instructions to Mobile UI Action Sequences. [ACL 2020] [paper][code]
- A Dataset for Interactive Vision Language Navigation with Unknown Command Feasibility. [ECCV 2022][paper] [code]
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. [EMNLP 2022][paper][code]
- UGIF: UI Grounded Instruction Following. [paper]
- ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation. [paper][code]
- Mind2Web: Towards a Generalist Agent for the Web. arxiv 2023 [paper][code][code]
- Android in the Wild: A Large-Scale Dataset for Android Device Control. [paper][code]
- Empowering LLM to use Smartphone for Intelligent Task Automation. [paper] [code]
- World of Bits: An Open-Domain Platform for Web-Based Agents. [ICML 2017] [paper][code]
- Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration. [ICLR 2018]. [paper][code]
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents. [NeurIPS 2022] [paper]
- AndroidEnv: A Reinforcement Learning Platform for Android [paper][code]
- Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era. [paper][code]
- WebArena: A Realistic Web Environment for Building Autonomous Agents. [paper][code]
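(2) Sensing
The ability to understand the current context is crucial for personal LLM agents to provide personalized, context-aware services. This includes techniques for sensing user activity, mental states, environment dynamics, and so on.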
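As a toy illustration of turning raw sensing into context an LLM can use, the sketch below derives a coarse activity label from accelerometer samples. It then combines that label with time of day and ambient loudness into a one-line summary that could be prepended to a prompt. The thresholds (`STILL_MAX`, `WALKING_MAX`, the 60 dB cut-off) and all function names are hypothetical. The papers below use far richer learned models.

```python
import math
import statistics
from datetime import datetime
from typing import List, Tuple

# Hypothetical thresholds on the spread of acceleration magnitude (m/s^2);
# real systems learn such boundaries from labeled data.
STILL_MAX = 0.5
WALKING_MAX = 3.0


def activity_from_accel(samples: List[Tuple[float, float, float]]) -> str:
    """Coarse activity label from a window of 3-axis accelerometer samples."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    # A constant gravity reading adds no spread; body movement does.
    spread = statistics.pstdev(mags)
    if spread < STILL_MAX:
        return "still"
    if spread < WALKING_MAX:
        return "walking"
    return "running"


def part_of_day(now: datetime) -> str:
    if now.hour < 12:
        return "morning"
    if now.hour < 18:
        return "afternoon"
    return "evening"


def build_context(samples: List[Tuple[float, float, float]],
                  now: datetime, noise_db: float) -> str:
    """Turn raw readings into a short text summary for an LLM prompt."""
    noise = "noisy" if noise_db >= 60 else "quiet"  # 60 dB cut-off is an assumption
    return (f"Time: {now:%H:%M} ({part_of_day(now)}). "
            f"Activity: {activity_from_accel(samples)}. "
            f"Ambient sound: {noise} ({noise_db:.0f} dB).")


if __name__ == "__main__":
    window = [(0.0, 0.0, 8.0), (0.0, 0.0, 11.0)] * 5
    print(build_context(window, datetime(2024, 1, 1, 8, 15), 65.0))
```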
LLM-based Approaches
- “Automated Mobile Sensing Strategies Generation for Human Behaviour Understanding” (Gao et al., 2023) [arXiv]
- “Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs” (Wang et al., 2023) [EMNLP 2023]
- “Exploring Large Language Models for Human Mobility Prediction under Public Events” (Liang et al., 2023) [arXiv]
- “Penetrative AI: Making LLMs Comprehend the Physical World” (Xu et al., 2023) [arXiv]
- “Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models” (Zhan et al., 2023) [arXiv]
- “PALR: Personalization Aware LLMs for Recommendation” (Yang et al., 2023) [arXiv]
- “Sentiment Analysis through LLM Negotiations” (Sun et al., 2023) [arXiv]
- “Bridging the Information Gap Between Domain-Specific Model and General LLM for Personalized Recommendation” (Zhang et al., 2023) [arXiv]
- “Conversational Health Agents: A Personalized LLM-Powered Agent Framework” (Abbasian et al., 2023) [arXiv]
Traditional Approaches
- “Affective State Prediction from Smartphone Touch and Sensor Data in the Wild” (Wampfler et al., 2022) [CHI'22]
- “Mobile Localization Techniques for Wireless Sensor Networks: Survey and Recommendations” (Oliveira et al., 2023) [ACM Transactions on Sensor Networks]
- “Are You Killing Time? Predicting Smartphone Users’ Time-killing Moments via Fusion of Smartphone Sensor Data and Screenshots” (Chen et al., 2023) [CHI'23]
- “Remote Breathing Rate Tracking in Stationary Position Using the Motion and Acoustic Sensors of Earables” (Ahmed et al., 2023) [CHI'23]
- “SAMoSA: Sensing Activities with Motion and Subsampled Audio” (Mollyn et al., 2022) [IMWUT]
- “A Systematic Survey on Android API Usage for Data-Driven Analytics with Smartphones” (Lee et al., 2023) [ACM Computing Surveys]
- “A Multi-Sensor Approach to Automatically Recognize Breaks and Work Activities of Knowledge Workers in Academia” (Di Lascio et al., 2020) [IMWUT]
- “Robust Inertial Motion Tracking through Deep Sensor Fusion across Smart Earbuds and Smartphone” (Gong et al., 2021) [IMWUT]
- “DancingAnt: Body-empowered Wireless Sensing Utilizing Pervasive Radiations from Powerline” (Cui et al., 2023) [ACM MobiCom'23]
- “DeXAR: Deep Explainable Sensor-Based Activity Recognition in Smart-Home Environments” (Arrotta et al., 2022) [IMWUT]
- “MUSE-Fi: Contactless MUti-person SEnsing Exploiting Near-field Wi-Fi Channel Variation” (Hu et al., 2023) [IMWUT]
- “SenCom: Integrated Sensing and Communication with Practical WiFi” (He et al., 2023) [ACM MobiCom'23]
- “SleepMore: Inferring Sleep Duration at Scale via Multi-Device WiFi Sensing” (Zakaria et al., 2022) [IMWUT]
- “COCOA: Cross Modality Contrastive Learning for Sensor Data” (Deldari et al., 2022) [ACM MobiCom'23]
- “M3Sense: Affect-Agnostic Multitask Representation Learning Using Multimodal Wearable Sensors” (Samyoun et al., 2022) [IMWUT]
- “Predicting Subjective Measures of Social Anxiety from Sparsely Collected Mobile Sensor Data” (Rashid et al., 2020) [IMWUT]
- “Attend and Discriminate: Beyond the State-of-the-Art for Human Activity Recognition Using Wearable Sensors” (Abedin et al., 2021) [IMWUT]
- “Fall Detection based on Interpretation of Important Features with Wrist-Wearable Sensors” (Kim et al., 2022) [IMWUT]
- “PowerPhone: Unleashing the Acoustic Sensing Capability of Smartphones” (Cao et al., 2023) [ACM MobiCom'23]
- “I Spy You: Eavesdropping Continuous Speech on Smartphones via Motion Sensors” (Zhang et al., 2022) [IMWUT]
- “Watching Your Phone’s Back: Gesture Recognition by Sensing Acoustical Structure-borne Propagation” (Wang et al., 2021) [IMWUT]
- “Gesture Recognition Method Using Acoustic Sensing on Usual Garment” (Amesaka et al., 2022) [IMWUT]
- “Complex Daily Activities, Country-Level Diversity, and Smartphone Sensing: A Study in Denmark, Italy, Mongolia, Paraguay, and UK” (Assi et al., 2023) [CHI'23]
- “Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries” (Meegahapola et al., 2022) [IMWUT]
- “Detecting Social Contexts from Mobile Sensing Indicators in Virtual Interactions with Socially Anxious Individuals” (Wang et al., 2023) [IMWUT]
- “Examining the Social Context of Alcohol Drinking in Young Adults with Smartphone Sensing” (Meegahapola et al., 2021) [IMWUT]
- “Towards Open-Domain Twitter User Profile Inference” (Wen et al., 2023) [ACL 2023]
- “One More Bite? Inferring Food Consumption Level of College Students Using Smartphone Sensing and Self-Reports” (Meegahapola et al., 2021) [IMWUT]
- “FlowSense: Monitoring Airflow in Building Ventilation Systems Using Audio Sensing” (Chhaglani et al., 2022) [IMWUT]
- “MicroCam: Leveraging Smartphone Microscope Camera for Context-Aware Contact Surface Sensing” (Hu et al., 2023) [IMWUT]
- “Mobile and Wearable Sensing Frameworks for mHealth Studies and Applications: A Systematic Review” (Kumar et al., 2021) [ACM Transactions on Computing for Healthcare]
- “FeverPhone: Accessible Core-Body Temperature Sensing for Fever Monitoring Using Commodity Smartphones” (Breda et al., 2022) [IMWUT]
- “Guard Your Heart Silently: Continuous Electrocardiogram Waveform Monitoring with Wrist-Worn Motion Sensor” (Cao et al., 2022) [IMWUT]
- “Listen2Cough: Leveraging End-to-End Deep Learning Cough Detection Model to Enhance Lung Health Assessment Using Passively Sensed Audio” (Xu et al., 2021) [IMWUT]
- “HealthWalks: Sensing Fine-grained Individual Health Condition via Mobility Data” (Lin et al., 2020) [IMWUT]
- “Identifying Mobile Sensing Indicators of Stress-Resilience” (Adler et al., 2021) [IMWUT]
- “MoodExplorer: Towards Compound Emotion Detection via Smartphone Sensing” (Zhang et al., 2018) [IMWUT]
- “mTeeth: Identifying Brushing Teeth Surfaces Using Wrist-Worn Inertial Sensors” (Akther et al., 2021) [IMWUT]
- “Detecting Job Promotion in Information Workers Using Mobile Sensing” (Nepal et al., 2020) [IMWUT]
- “First-Gen Lens: Assessing Mental Health of First-Generation Students across Their First Year at College Using Mobile Sensing” (Wang et al., 2022) [IMWUT]
- “Predicting Personality Traits from Physical Activity Intensity” (Gao et al., 2019) [IEEE Computer]
- “Predicting Symptom Trajectories of Schizophrenia using Mobile Sensing” (Wang et al., 2017) [IMWUT]
- “Predictors of Life Satisfaction based on Daily Activities from Mobile Sensor Data” (Yürüten et al., 2014) [CHI'14]
- “SmartGPA: How Smartphones Can Assess and Predict Academic Performance of College Students” (Wang et al., 2015) [UbiComp'15]
- “Social Sensing: Assessing Social Functioning of Patients Living with Schizophrenia using Mobile Phone Sensing” (Wang et al., 2020) [CHI'20]
- “SmokingOpp: Detecting the Smoking ‘Opportunity’ Context Using Mobile Sensors” (Chatterjee et al., 2020) [IMWUT]
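(3) Memory
Memory is the ability of a personal LLM agent to retain information about the user. It enables the agent to provide more customized services and to evolve itself according to user preferences.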
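One simple way to picture this capability is as a store of user facts. The agent writes short facts about the user into the store and, before answering, retrieves the facts most relevant to the request, preferring recent ones. The sketch below is loosely modeled on the recency-plus-relevance retrieval described in Generative Agents (listed under Memory Management). It replaces embedding similarity with plain word overlap and omits importance scoring. `UserMemory` and its `decay` parameter are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class MemoryItem:
    text: str
    step: int  # logical time at which the memory was written


class UserMemory:
    """Toy long-term memory: write user facts, retrieve by relevance + recency."""

    def __init__(self, decay: float = 0.9) -> None:
        self.items: List[MemoryItem] = []
        self.step = 0
        self.decay = decay  # hypothetical per-step recency decay factor

    def write(self, text: str) -> None:
        self.step += 1
        self.items.append(MemoryItem(text, self.step))

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = self._tokens(query)
        scored: List[Tuple[float, str]] = []
        for item in self.items:
            relevance = len(q & self._tokens(item.text))  # number of shared words
            if relevance == 0:
                continue  # skip memories unrelated to the request
            recency = self.decay ** (self.step - item.step)  # 1.0 for the newest item
            scored.append((relevance + recency, item.text))
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]


if __name__ == "__main__":
    memory = UserMemory()
    memory.write("User prefers vegetarian food")
    memory.write("User has a meeting at 3pm")
    memory.write("User is allergic to peanuts, so avoid peanut food")
    facts = memory.retrieve("recommend food for dinner")
    print("Known about the user:", "; ".join(facts))
```

The retrieved facts would typically be inserted into the prompt before the user's request, so the LLM can personalize its answer without retraining.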
Memory Acquisition
- “LifeLogging: Personal Big Data” [Foundations and Trends in Information Retrieval]
- “Vision-based human activity recognition: a survey” [Multimedia Tools and Applications]
- “Predicting personality from patterns of behavior collected with smartphones” [Proceedings of the National Academy of Sciences]
- “Facial Emotion Detection Using Deep Learning” [2020 International Conference for Emerging Technology (INCET)]
- “Emotion detection of textual data: An interdisciplinary survey” [2021 IEEE World AI IoT Congress]
Memory Management
- “PrivacyStreams: Enabling transparency in personal data processing for mobile apps” [IMWUT]
- “Tree of Thoughts: Deliberate Problem Solving with Large Language Models” [arXiv]
- “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” [Advances in Neural Information Processing Systems]
- “ReAct: Synergizing Reasoning and Acting in Language Models” [arXiv]
- “Generative Agents: Interactive Simulacra of Human Behavior” [Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology]
- “Show Your Work: Scratchpads for Intermediate Computation with Language Models” [arXiv]
- “Cognitive Architectures for Language Agents” [arXiv]
Agent Self-evolution
- “DreamCoder: growing generalizable, interpretable knowledge with wake–sleep Bayesian program learning” [Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation]
- “Voyager: An Open-Ended Embodied Agent with Large Language Models” [arXiv]
- “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents” [International Conference on Machine Learning]
- “Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance” [arXiv]
- “FireAct: Toward Language Agent Fine-tuning” [arXiv]
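2. Efficiency of LLM Agents
The efficiency of LLM agents is closely tied to the efficiency of LLM inference, LLM training/customization, and memory management.
(1) Efficient LLM Inference and Training
The efficiency of LLM inference and training has already been summarized comprehensively in existing surveys, so this list omits it.
(2) Efficient Memory Retrieval and Management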
This part mainly lists papers on efficient memory management, an important component of LLM agents.
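Memory retrieval in these systems usually comes down to nearest-neighbor search over embedding vectors. The exact, exhaustive baseline is sketched below in pure Python. Vector libraries such as Faiss also provide this baseline as "flat" indexes. The class name `FlatVectorMemory` and the toy 3-dimensional vectors, which stand in for real text embeddings, are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class FlatVectorMemory:
    """Exact (brute-force) cosine-similarity search over stored embeddings."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, key: str, vector: Sequence[float]) -> None:
        # Normalize once at insert time so search only needs dot products.
        self.keys.append(key)
        self.vectors.append(normalize(vector))

    def search(self, query: Sequence[float], k: int = 1) -> List[Tuple[float, str]]:
        # O(N * d) per query: every stored vector is scored. The ANN indexes
        # listed under "Efficient Indexing" exist to avoid this full scan.
        q = normalize(query)
        scored = ((sum(a * b for a, b in zip(q, v)), key)
                  for key, v in zip(self.keys, self.vectors))
        return heapq.nlargest(k, scored)


if __name__ == "__main__":
    mem = FlatVectorMemory()
    mem.add("coffee order", [1.0, 0.0, 0.0])
    mem.add("gym schedule", [0.0, 1.0, 0.0])
    mem.add("coffee shop nearby", [0.9, 0.1, 0.0])
    print(mem.search([1.0, 0.0, 0.0], k=2))
```

Exact search is simple and always correct. Its cost, however, grows linearly with the number of memories, which becomes a problem on resource-constrained personal devices.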
Organizing Memory (with vector libraries, vector databases, and others)
Vector Library
- RETRO: Improving language models by retrieving from trillions of tokens. [ICML, 2021] [paper]
- RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit. [arXiv, 2023] [paper] [code]
- TRIME: Training Language Models with Memory Augmentation. [EMNLP, 2022] [paper] [code]
- Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation. [arXiv, 2023] [paper] [code]
Vector Database
- Survey of Vector Database Management Systems. [arXiv, 2023] [paper]
- Vector database management systems: Fundamental concepts, use-cases, and current challenges. [arXiv, 2023] [paper]
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge. [arXiv, 2023] [paper]
Other Forms of Memory
- Memorizing Transformers. [ICLR, 2022] [paper] [code]
- RET-LLM: Towards a General Read-Write Memory for Large Language Models. [arXiv, 2023] [paper]
Optimizing Memory Efficiency
Searching Design
- Milvus: A purpose-built vector data management system. [SIGMOD, 2021] [paper] [code]
- Analyticdb-v: A hybrid analytical engine towards query fusion for structured and unstructured data. [Proceedings of the VLDB Endowment, Volume 13, Issue 12, pp 3152–3165] [paper]
- Hqann: Efficient and robust similarity search for hybrid queries with structured and unstructured constraints. [CIKM, 2022] [paper]
- Qdrant [github]
Searching Execution
- Faiss: Facebook AI Similarity Search. [wiki] [code]
- Milvus: A purpose-built vector data management system. [SIGMOD, 2021] [paper] [code]
- Quicker ADC: Unlocking the Hidden Potential of Product Quantization With SIMD. [IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019] [paper] [code]
Efficient Indexing
- LSH: Locality-sensitive hashing scheme based on p-stable distributions. [SCG, 2004] [paper]
- Random projection trees and low dimensional manifolds. [STOC, 2008] [paper]
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. [NeurIPS, 2021] [paper] [code]
- Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. [IEEE Transactions on Pattern Analysis and Machine Intelligence, VOL. 42, NO. 4, 2020] [paper]
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. [NeurIPS, 2019] [paper] [code]
- DiskANN++: Efficient Page-based Search over Isomorphic Mapped Graph Index using Query-sensitivity Entry Vertex. [arXiv, 2023] [paper]
- CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. [USENIX ATC, 2023] [paper]
- Co-design Hardware and Algorithm for Vector Search. [SC, 2023] [paper] [code]
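The indexing papers above trade a little accuracy for speed by avoiding a full scan. As a minimal illustration of that idea, the sketch below uses random-hyperplane (SimHash-style) LSH for cosine similarity. Note that this is a different hash family from the p-stable scheme in the LSH paper listed above. It is also far simpler than graph-based indexes such as HNSW or DiskANN. `HyperplaneLSH` and its parameters are hypothetical. A single table is used for brevity, whereas practical LSH deployments use several tables to reduce missed neighbors.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Single-table random-hyperplane LSH for approximate cosine search."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each random Gaussian hyperplane contributes one bit of the hash.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[Tuple[str, Sequence[float]]]] = {}

    def _hash(self, v: Sequence[float]) -> Tuple[int, ...]:
        # Bit i records which side of hyperplane i the vector lies on,
        # so vectors with a small angle between them tend to share a hash.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in self.planes)

    def add(self, key: str, v: Sequence[float]) -> None:
        self.buckets.setdefault(self._hash(v), []).append((key, v))

    def query(self, q: Sequence[float], k: int = 1) -> List[str]:
        # Only vectors in the query's bucket are scored exactly; all others are skipped.
        candidates = self.buckets.get(self._hash(q), [])
        ranked = sorted(candidates, key=lambda kv: cosine(q, kv[1]), reverse=True)
        return [key for key, _ in ranked[:k]]


if __name__ == "__main__":
    index = HyperplaneLSH(dim=3, n_bits=8, seed=42)
    index.add("a", [1.0, 2.0, 3.0])
    index.add("b", [-1.0, -2.0, -3.0])
    print(index.query([2.0, 4.0, 6.0]))
```

More bits per hash produce smaller buckets and faster queries. They also raise the chance that a true neighbor lands in a different bucket and is missed.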
3. Security and Privacy of Personal LLM Agents
Security and privacy in AI/ML is a vast field with a large body of papers. Here we focus only on papers related to LLMs and LLM agents.
(1) Confidentiality (of User Data)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption. [ACL, 2022][paper]
- TextFusion: Privacy-Preserving Pre-trained Model Inference via Token Fusion [EMNLP, 2022] [paper][code]
- TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations. [ACL, 2023] [paper][code]
- Adversarial Training for Large Neural Language Models. [arXiv, 2020] [paper][code]
(2) Integrity (of Agent Behavior)
Adversarial Attacks
- Certifying LLM Safety against Adversarial Prompting. [arXiv, 2023] [paper][code]
- On evaluating adversarial robustness of large vision-language models. [arXiv, 2023] [paper][code]
- Jailbroken: How does llm safety training fail? [arXiv, 2023] [paper]
- On the adversarial robustness of multi-modal foundation models. [arXiv, 2023] [paper]
- Misusing Tools in Large Language Models With Visual Adversarial Examples. [arXiv, 2023] [paper]
- Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models. [arXiv, 2023] [paper]
Backdoor Attacks
- Backdoor attacks for in-context learning with language models. [arXiv, 2023] [paper]
- Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models. [arXiv, 2023] [paper]
- PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models. [arXiv, 2023] [paper][code]
- Defending against backdoor attacks in natural language generation. [arXiv, 2021] [paper][code]
Prompt Injection Attacks
- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. [arXiv, 2023] [paper]
- Ignore Previous Prompt: Attack Techniques For Language Models. [arXiv, 2022] [paper][code]
- Prompt Injection attack against LLM-integrated Applications. [arXiv, 2023] [paper][code]
- Jailbreaking Black Box Large Language Models in Twenty Queries. [arXiv, 2023] [paper][code]
- Extracting Training Data from Large Language Models. [arXiv, 2020] [paper]
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. [arXiv, 2023] [paper][code]
(3) Reliability (of Agent Decisions)
Problems
- Survey of Hallucination in Natural Language Generation. [ACM Computing Surveys 2023] [paper]
- A Survey of Hallucination in Large Foundation Models. [arXiv, 2023] [paper]
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents. [arXiv, 2023] [paper]
- Cumulative Reasoning with Large Language Models. [arXiv, 2023] [paper]
- Learning From Mistakes Makes LLM Better Reasoner. [arXiv, 2023] [paper]
- Large Language Models can Learn Rules. [arXiv, 2023] [paper]
Improvement
- PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. [ACL 2022] [paper]
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. [EMNLP 2022] [paper]
- Finetuned Language Models are Zero-Shot Learners. [ICLR 2022] [paper]
- SELFCHECKGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. [EMNLP 2023] [paper]
- Large Language Models Can Self-Improve. [arXiv, 2022] [paper]
- Self-Refine: Iterative Refinement with Self-Feedback. [arXiv, 2023] [paper]
- Teaching Large Language Models to Self-Debug. [arXiv, 2023] [paper]
- Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks. [ACL 2023] [paper]
- Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. [arXiv, 2023] [paper]
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. [arXiv, 2023] [paper]
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models. [Findings of EMNLP, 2023] [paper]
Inspection
- CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling. [AAAI 2019] [paper]
- Gradient-Based Constrained Sampling from Language Models. [EMNLP 2022] [paper]
- Large Language Models are Better Reasoners with Self-Verification. [Findings of EMNLP 2023] [paper]
- Explainability for Large Language Models: A Survey. [arXiv, 2023] [paper]
- Self-Consistency Improves Chain of Thought Reasoning in Language Models. [ICLR, 2023] [paper]
- Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models. [arXiv, 2023] [paper]
- Mutual Information Alleviates Hallucinations in Abstractive Summarization. [EMNLP, 2023] [paper]
- Overthinking the Truth: Understanding how Language Models Process False Demonstrations. [arXiv, 2023] [paper]
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. [NeurIPS, 2023] [paper]