Thoughts
Notes Before Multimodality Conquers Everything (What Should Future Data and Models Look Like?)
On Compression for AGI
Paper Reading
[Paper Reading] Genie: Generative Interactive Environments
[Paper Reading] Gemma: Open Models Based on Gemini Research and Technology
[Paper Reading] LLM360: Towards Fully Transparent Open-Source LLMs
[Article Reading] Video generation models as world simulators
[Paper Reading] A Unified Toolkit for Processing, Representing, and Manipulating Visually Rich Scientific Documents: PaperMage
[Paper Reading] Unifying and Handling Multiple Structured Knowledge Grounding (SKG) Tasks, UnifiedSKG
[Paper Reading] Transformer: Attention Is All You Need
[Paper Reading] Gemini: A Family of Highly Capable Multimodal Models
Paper Reading: An Open-Source Multimodal Document Dataset, OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Paper Reading: Common Mistakes Summarized from More Than 200 English Papers Written by Chinese Authors
A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
GPT-4V System Card Explained: Beyond Intelligence, How Can It Benefit Society?
Paper Reading: Scaling Laws for Neural Language Models
A Multimodal Literate Model That Generates Spatially Aware Text Blocks and Markdown-Formatted Text, Kosmos-2.5: A Multimodal Literate Model
PDF Recognition for Scientific Documents, Nougat: Neural Optical Understanding for Academic Documents
Tools
Research Tools

Thoughts

Notes Before Multimodality Conquers Everything (What Should Future Data and Models Look Like?)

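Since the start of this year (2023), multimodal papers have been appearing so fast that I can hardly keep up. But this also means the field is about to enter a major new phase of growth. The big one is finally coming! This article was originally meant to be a survey continuing the one I wrote last year. 王junjie: From VQA to multi…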

https://zhuanlan.zhihu.com/p/667942680

On Compression for AGI

Compressing Human Wisdom on the Path to AGI: Compression for AGI

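Some thoughts on the Compression for AGI talk. References: https://www.youtube.com/watch?v=dO4TPJkeaaU https://zhuanlan.zhihu.com/p/621201155 Here I briefly summarize the views presented in the video. View 1: the relationship between task understanding and description length …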

https://zhuanlan.zhihu.com/p/661691459

Paper Reading

[Paper Reading] Genie: Generative Interactive Environments

https://zhuanlan.zhihu.com/p/686215795

[Paper Reading] Gemma: Open Models Based on Gemini Research and Technology

https://zhuanlan.zhihu.com/p/684424814

[Paper Reading] LLM360: Towards Fully Transparent Open-Source LLMs

https://zhuanlan.zhihu.com/p/683666736

[Article Reading] Video generation models as world simulators

https://zhuanlan.zhihu.com/p/682425217

[Paper Reading] A Unified Toolkit for Processing, Representing, and Manipulating Visually Rich Scientific Documents: PaperMage

https://zhuanlan.zhihu.com/p/680142822

[Paper Reading] Unifying and Handling Multiple Structured Knowledge Grounding (SKG) Tasks, UnifiedSKG

https://zhuanlan.zhihu.com/p/677789726

[Paper Reading] Transformer: Attention Is All You Need

https://zhuanlan.zhihu.com/p/673226731

[Paper Reading] Gemini: A Family of Highly Capable Multimodal Models

https://zhuanlan.zhihu.com/p/670821058

Paper Reading: An Open-Source Multimodal Document Dataset, OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

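I recently came across an open-source multimodal web-document dataset and found that it even documents its construction steps in detail. Great, I had to read it, and this article is the result. 1 Idea. Abstract: This paper introduces the OBELICS dataset, an open, large-scale web dataset that specifically contains image…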

https://zhuanlan.zhihu.com/p/670149958

Paper Reading: Common Mistakes Summarized from More Than 200 English Papers Written by Chinese Authors

Paper title: The Most Common Habits from more than 200 English Papers written by Graduate Chinese Engineering Students. Paper link: https://www.chrisyttang.org/assets/misc/The%20Most%20Common%20Habits%20…

https://zhuanlan.zhihu.com/p/665892027

A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Part 1

A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Part 1]

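The paper is so long that I split the reading into two parts and am publishing the first half first. Part 2 is now out: 王junjie: A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Part 2]. 0. Before reading…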

https://zhuanlan.zhihu.com/p/663655741

Part 2

A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Part 2]

This is Part 2, which continues from Part 1: 王junjie: A 166-Page Paper Reading, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Part 1]. If you want to jump straight to the final summary, …

https://zhuanlan.zhihu.com/p/663807184

GPT-4V System Card Explained: Beyond Intelligence, How Can It Benefit Society?

References: https://zhuanlan.zhihu.com/p/658587700 Paper link: https://cdn.openai.com/papers/GPTV_System_Card.pdf Readpaper link: https://readpaper.com/paper/1977100439624877824 1. Before reading: strictly speaking, this is not really a…

https://zhuanlan.zhihu.com/p/662927889

Paper Reading: Scaling Laws for Neural Language Models

Paper link: Scaling Laws for Neural Language Models. Readpaper link: Scaling Laws for Neural Language Models. References: https://zhuanlan.zhihu.com/p/506487373 https://www.cnblogs.com/gaowenxingxing/p/15230…

https://zhuanlan.zhihu.com/p/663296750

A Multimodal Literate Model That Generates Spatially Aware Text Blocks and Markdown-Formatted Text, Kosmos-2.5: A Multimodal Literate Model

Paper Reading: A Multimodal Literate Model That Generates Spatially Aware Text Blocks and Markdown-Formatted Text, Kosmos-2.5: A Multimodal Literate Model

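1. Idea: This paper introduces KOSMOS-2.5, a multimodal literate model designed for machine reading of text-intensive images. The model performs well on two transcription tasks: generating spatially aware text blocks and generating structured text output in Markdown format. Through a shared Tra…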

https://zhuanlan.zhihu.com/p/659137599

PDF Recognition for Scientific Documents, Nougat: Neural Optical Understanding for Academic Documents

Paper Reading: PDF Recognition for Scientific Documents, Nougat: Neural Optical Understanding for Academic Documents

I haven't posted in a while, so here are some paper reading notes. Paper on arXiv: Nougat: Neural Optical Understanding for Academic Documents. Paper on Readpaper: Nougat: Neural Optical Understanding for Academic Documents. GitHub: GitH…

https://zhuanlan.zhihu.com/p/659064019

Tools

Research Tools

To do good research, first sharpen your tools

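Some research tools I use regularly. quillbot (https://quillbot.com/): rewriting, grammar checking, a recently added plagiarism check (paid), and summary generation. Table generator: LaTeX table generation, https://www.tablesgenerator.com/latex_tables …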

https://zhuanlan.zhihu.com/p/661767969