📊 Image-editing judges, now in 4B and 9B. SkyJM-Edit is live on ModelScope for rubric-based visual preference evaluation. 🚀 License: Apache-2.0 🔗 https://lnkd.in/eMKARfB4 🏆 Image-editing results: SkyJM-Edit-9B reaches 75.4 on MMRB2, 46.4 on EditReward-ERB Avg, and 85.6 on EditScore-ERB Avg; SkyJM-Edit-4B follows closely at 73.2 / 45.5 / 85.5. 🧾 RubricRM judging flow: • Dynamically builds a prompt-specific rubric. • Scores both images by dimension. • Aggregates rubric-weighted scores into the final preference. 🛠️ Built on Qwen3.5, with inference support through SKYLENAGE-JUDGER for vLLM and Transformers. #AIGC #OpenSourceAI #ModelScope #Qwen #ImageEditing #VLM
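The three-step judging flow above (build a rubric, score each image per dimension, aggregate by weight) can be sketched roughly as follows. This is a minimal illustration of weighted aggregation only, not the SkyJM-Edit implementation: the dimension names, weights, 0-10 score scale, and the `RubricItem`, `aggregate`, and `prefer` names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    name: str    # hypothetical rubric dimension, e.g. "instruction_following"
    weight: int  # relative importance assigned by the (assumed) rubric builder


def aggregate(rubric, scores):
    """Weighted mean of per-dimension scores for one image."""
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    weighted = sum(item.weight * scores[item.name] for item in rubric)
    return weighted / total_weight


def prefer(rubric, scores_a, scores_b):
    """Return which image wins under the rubric: 'A', 'B', or 'tie'."""
    a, b = aggregate(rubric, scores_a), aggregate(rubric, scores_b)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


# Assumed rubric and scores, for illustration only.
rubric = [
    RubricItem("instruction_following", 5),
    RubricItem("identity_preservation", 3),
    RubricItem("visual_quality", 2),
]
scores_a = {"instruction_following": 8, "identity_preservation": 6, "visual_quality": 7}
scores_b = {"instruction_following": 6, "identity_preservation": 9, "visual_quality": 8}

print(aggregate(rubric, scores_a), aggregate(rubric, scores_b))
print(prefer(rubric, scores_a, scores_b))
```

In a real judge, both the rubric and the per-dimension scores would come from the model; here they are hard-coded so that only the aggregation step is shown.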
About us
ModelScope is currently China's largest open, neutral, and non-profit open-source AI model community. Our mission is simple: make AI accessible to everyone by connecting creators, researchers, and developers around the world.
- Website
- https://modelscope.ai/home
- Industry
- Technology, Information and Internet
- Company size
- 51-200 employees
- Type
- Public Company
Updates
- 🎨 Boogu-Image-0.1-Edit-Turbo, a 4-step distilled image-to-image model for fast visual editing, is now on ModelScope. ⚡️ License: Apache-2.0 🔗 https://lnkd.in/evH6RmtM ✏️ Edit tasks: object changes, style transfer, scene/background edits, and text-aware visual transformation. 🧩 Boogu-Image-0.1 family: Base, Turbo, Edit, and Edit-Turbo cover text-to-image, fast generation, and image editing workflows.
- Here is AMD Micro-World, an open-source action-controlled interactive world model for generating controllable video environments. 🔗 T2W Model: https://lnkd.in/e5vxUDvz 🔗 I2W Model: https://lnkd.in/eR3kT3JB 🔗 Dataset: https://lnkd.in/ejZjuDEb 🎮 Action-controlled world generation: Micro-World generates videos that follow user actions, including keyboard and mouse controls, making it a step toward interactive video environments. 🌍 Text-to-World and Image-to-World: Use T2W to generate a world from text, or I2W to continue a world from an image and text prompt. 📚 Minecraft-based controllable dataset: The released dataset includes 6,000+ gameplay clips, each with 81 frames, annotated with text captions and action labels. 🛠️ Open research stack: Built on Wan2.1, with model weights, training and inference code, and curated dataset released for reproducible world model research.
- Meet SenseNova-U1-8B-MoT-Infographic-V2! This model is designed for dense content, complex layouts, and text-heavy visuals. 🤖 https://lnkd.in/ePGZH7wY 🏆 Open-source SOTA performance: Strong results on infographic benchmarks, including BizGenEval and IGenBench. 📊 Improves chart accuracy, text rendering, size appropriateness, layout understanding, and arXiv-style page quality, while reducing unintended black backgrounds. 🧠 Native unified multimodal architecture: Built on SenseNova-U1, unifying visual understanding, reasoning, and generation in one model.
- Introducing Agents-A1, a 35B MoE agentic model built for long-horizon tasks across search, engineering, scientific research, instruction following, and tool calling. 🤖 https://lnkd.in/eZxqVQ6p 📚 256K context length + 🧠 Agentic reasoning 🏆 Reaches SOTA results on long-horizon search, scientific research, and instruction-following benchmarks, with competitive results among 35B-class models. 🛠️ Supports function calling and tool integration, enabling interaction with APIs, code interpreters, search engines, and other external tools.
- Real-time stereo depth, with FoundationStereo-style zero-shot generalization. ⚡ C-Fast-FoundationStereo is now on ModelScope! 🔗 https://lnkd.in/ec8A4cPe Key points: 🚀 Over 10× faster than FoundationStereo 🎯 Closely matches its zero-shot accuracy 👁️ Input: rectified left-right RGB stereo images 🗺️ Output: dense disparity map for depth estimation 🧠 14.6M parameters 🛠️ Built with distillation, architecture search, and structured pruning. Supports PyTorch, NVIDIA TAO, TensorRT, and ONNXRuntime export. License: NVIDIA Open Model Agreement
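For readers wondering how a dense disparity map turns into depth: for rectified stereo pairs, the standard pinhole relation is Z = f · B / d, where f is the focal length in pixels, B the camera baseline, and d the disparity in pixels. The sketch below applies that relation to a small hand-written disparity map. It does not call the model; the function name and the camera parameters (800 px focal length, 0.5 m baseline) are assumed values for illustration.

```python
import math


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a 2D disparity map (pixels) to depth (metres) via Z = f * B / d.

    Non-positive disparities carry no depth information and map to infinity.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else math.inf for d in row]
        for row in disparity
    ]


# Assumed toy disparity map and camera parameters, for illustration only.
disparity_map = [[40.0, 80.0], [0.0, 20.0]]
depth_map = disparity_to_depth(disparity_map, focal_px=800.0, baseline_m=0.5)
print(depth_map)
```

Larger disparities correspond to closer surfaces, so small disparity errors matter most for distant objects.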
- DeepSeek-V4-Pro-DSpark lands on ModelScope. Same DeepSeek-V4-Pro checkpoint, now with an added speculative decoding module for inference experiments. 🚀 License: MIT 🤖 https://lnkd.in/e9yC4V_4 📄 https://lnkd.in/eAsXps6A 🏆 Pro-Max results: 93.5 on LiveCodeBench, 3206 Codeforces rating, 80.6 on SWE Verified, and 83.5 on MRCR 1M 📏 Long-context efficiency: at 1M context, DeepSeek-V4-Pro uses only 27% single-token inference FLOPs and 10% KV cache vs DeepSeek-V3.2 🧠 Architecture upgrades: hybrid CSA + HCA attention for 1M-token efficiency, mHC for stronger signal propagation, and Muon optimizer for faster, more stable training
- Here is MOSS-Transcribe-preview-2B! 🔗 https://lnkd.in/eQ6iKzHF A 2.4B English ASR model, packed into one 4.84GB shard. 🎧 Worth noting: 📊 4.87 average WER on Open ASR Leaderboard eval 🎙️ 1.21 WER on LibriSpeech test.clean, 2.84 on test.other 🧠 Qwen3-1.7B language backbone + Qwen3-Omni-MoE audio encoder 🔧 Gated-MLP adapter maps audio features into the LM embedding space 📄 Apache-2.0 license
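As a reminder of what the WER figures measure: word error rate is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words, and is usually reported as a percentage. The sketch below is a plain-Python illustration with an assumed function name and toy sentences. Real leaderboard evaluations apply text normalization that is not reproduced here; this version only lowercases and splits on whitespace.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.2f}% WER")
```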
- Check out RPC-Bench on ModelScope! Built for long-context models, paper RAG systems, and multimodal document understanding. 🚀 🔗 https://lnkd.in/eVURugvj 📄 https://lnkd.in/eRXy3a3Q 🖼️ Supports both text and visual inputs, with Markdown, original PDFs, parsing outputs, and page images for VLM evaluation ✅ Scale: 61.3K QA pairs from 4,150 papers, with about 15K human-verified QA pairs for evaluation 📚 Built from real review-rebuttal exchanges, so the questions focus on methods, evidence, claims, and reviewer-style paper understanding 📊 Even GPT-5 only reaches 68.2% on correctness-completeness, dropping to 37.46% after conciseness adjustment
- New open-source SOTA on agentic coding! 🚀 Ornith-1.0-397B achieves 82.4 on SWE-bench Verified and 77.5 on Terminal-Bench 2.1, topping every open model in its class and beating Claude Opus 4.7 on both. 🤖 https://lnkd.in/efAdmNyT 📦 Four sizes (9B to 397B-MoE), post-trained on Gemma 4 / Qwen 3.5, MIT licensed and globally accessible. ✨ Notably, Ornith uses RL to generate not just solution rollouts but also the scaffold that drives them. By jointly optimizing both, the model discovers better search trajectories and produces higher-quality solutions. ⚙️ Deployable on a single 8×80GB node, with vLLM and SGLang recipes in the model card.