
Add library name, pipeline tag, link to paper

#1
by nielsr (HF Staff) - opened

Files changed (1)
README.md (+11 -5)
README.md CHANGED

@@ -1,13 +1,15 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen2-VL-7B
 datasets:
 - chenjoya/Live-CC-5M
 - chenjoya/Live-WhisperX-526K
 - lmms-lab/LLaVA-Video-178K
 language:
 - en
-base_model:
-- Qwen/Qwen2-VL-7B
+license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 tags:
 - qwen_vl
 - video
@@ -15,6 +17,7 @@ tags:
 - multimodal
 - LLM
 ---
+
 # LiveCC-7B-Instruct
 
 ## Introduction
@@ -22,6 +25,7 @@ tags:
 We introduce LiveCC, the first video LLM capable of real-time commentary, trained with a novel video-ASR streaming method, SOTA on both streaming and offline benchmarks.
 
 - Project Page: https://showlab.github.io/livecc
+- Paper: https://arxiv.org/abs/2504.16030
 
 > [!Important]
 > This is the SFT model. The base model is at [LiveCC-7B-Base](https://huggingface.co/chenjoya/LiveCC-7B-Base).
@@ -154,7 +158,8 @@ class LiveCCDemoInfer:
         texts = self.processor.apply_chat_template([message], tokenize=False, add_generation_prompt=True, return_tensors='pt')
         past_ids = state.get('past_ids', None)
         if past_ids is not None:
-            texts = '<|im_end|>\n' + texts[self.system_prompt_offset:]
+            texts = '<|im_end|>
+' + texts[self.system_prompt_offset:]
         inputs = self.processor(
             text=texts,
             images=None,
@@ -276,7 +281,8 @@ class LiveCCDemoInfer:
         image_inputs, video_inputs = process_vision_info(conversation)
         texts = self.processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True, return_tensors='pt')
         if past_ids is not None:
-            texts = '<|im_end|>\n' + texts[self.system_prompt_offset:]
+            texts = '<|im_end|>
+' + texts[self.system_prompt_offset:]
         inputs = self.processor(
             text=texts,
             images=image_inputs,
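
For context on the metadata being added: `library_name: transformers` is what makes the Hub render a transformers "Use this model" snippet, and `pipeline_tag: video-text-to-text` files the model under that task. A minimal loading sketch consistent with those tags, assuming the standard Qwen2-VL classes implied by the repo's `qwen2_vl` tag (dtype and device settings are illustrative, not from the model card):

```python
# Minimal sketch, assuming the qwen2_vl architecture tagged on this repo;
# dtype/device choices are illustrative defaults, not from the model card.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "chenjoya/LiveCC-7B-Instruct",
    torch_dtype="auto",   # let transformers pick the checkpoint's dtype
    device_map="auto",    # needs accelerate; places weights automatically
)
processor = AutoProcessor.from_pretrained("chenjoya/LiveCC-7B-Instruct")
```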
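The two code hunks touch the same streaming trick in the demo class: once `past_ids` caches the running sequence, each newly templated prompt is sliced past the system prompt and spliced onto `'<|im_end|>\n'`, closing the previous assistant turn instead of re-sending the whole prefix. A toy sketch of that string surgery (the template strings and the offset computation are illustrative assumptions, not the model card's actual values):

```python
# Toy sketch of the incremental-prompt splice used above; the chat strings
# and the way system_prompt_offset is derived are illustrative assumptions.
first = ("<|im_start|>system\nYou are LiveCC.<|im_end|>\n"
         "<|im_start|>user\nframe 1<|im_end|>\n<|im_start|>assistant\n")
# Offset of the first non-system token in the templated text.
system_prompt_offset = first.index("<|im_start|>user")

nxt = ("<|im_start|>system\nYou are LiveCC.<|im_end|>\n"
       "<|im_start|>user\nframe 2<|im_end|>\n<|im_start|>assistant\n")
# Close the cached assistant turn, then continue without the system prompt.
continuation = "<|im_end|>\n" + nxt[system_prompt_offset:]
print(continuation)
```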