Model Card for TaskDialogViz

Model Details

Model Description

This model is used to generate data visualizations (in Vega-Lite specifications) from multi-turn conversational natural language queries. Unlike end-to-end models that directly translate text to visualizations, TaskDialogViz introduces an Analytic Task Reasoning layer. It decomposes the generation process into a structured, step-wise reasoning chain. This significantly enhances its contextual understanding of complex dialogues and effectively mitigates issues like context forgetting and parameter hallucination in multi-turn interactions.

Model Input Format

Click to expand

The input for TaskDialogViz is context-aware, and the model generates the final visualization specification step-by-step. When inferring step x, the complete context, including the reasoning results from the previous x-1 steps, must be provided.

<...> serves as a separator.

<head> <field> {column names} </field> <type> {column types} </type> </head>
<data> <line 1> {data row 1} </line 1> <line 2> {data row 2} </line 2> ... </data>
<previous utterance> {Previous Utterance} </previous utterance>
<previous chart> {Previous Vega-Lite Chart} </previous chart>
<utterance> {Current Utterance} </utterance>
<thinking> {Step 1 Answer} </thinking> <answer> {Step 1 Answer} </step 1>
...
<thinking> {Step x-1 Answer} </thinking> <answer> {Step x-1 Answer} </step x-1>

The model will output the reasoning result for step x based on the input above.

The 7 steps of the reasoning chain are as follows:

Step 1. Analytic Task Identification: (Identifies the user's analytic task)
Step 2. Data Field Identification: (Identifies the data fields involved in the utterance)
Step 3. Modification Operation Identification: (Identifies the chart modification type, e.g., 'mark', 'encoding', 'filter', 'sort')
Step 4. Chart Type Generation: (Generates the chart type)
Step 5. Encoding Channel Generation: (Generates the encoding channels)
Step 6. Filter Condition Generation: (Generates the filter conditions)
Step 7. Sort Logic Generation: (Generates the sorting logic)

How to Get Started with the Model

Running the Model on a GPU

Here is a simple example demonstrating how to use the model for a user utterance in a multi-turn dialogue.

Click to expand
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

# It is recommended to use the model version specified in the paper.
tokenizer = AutoTokenizer.from_pretrained("GZUzxc/TaskDialogVis_Model")
model = AutoModelForSeq2SeqLM.from_pretrained("GZUzxc/TaskDialogVis_Model", device_map="auto", trust_remote_code=True)

# Example input: Simulating the second turn of a dialogue.
# The first turn has already generated a bar chart comparing the average sales of different stores.
input_text = 
    """<head> <field> Borough_Location, Park_Location, Sports_Played, Week_Start_Date,
Week_End_Date, Sunday_Attendance, Monday_Attendance, Tuesday_Attendance,
Wednesday_Attendance, Thursday_Attendance, Friday_Attendance, Saturday_Attendance,
Attendance_Sum </field>
<type> nominal, nominal,nominal,temporal, temporal, quantitative, quantitative, quantitative,
quantitative, quantitative, quantitative, quantitative,quantitative </type>
<data> <line 1> Bronx, Midland Beach, Basketball, Soccer, Flag Football, Kickball, 07/01/2018,
07/31/2017, 850, 20, 9, 42, 15, 150, 93, 755 </line 1>
<line 2> Manhattan, Williamsbridge Oval, Basketball, Soccer, Dodgeball, ultimate frisbee, 06/25/2017,
04/28/2018, 250, 210, 650, 26, 480, 246, 155, 141 </line 2> </data>
<previous utterance> Break it down by park location with different colors</previous utterance>
<previous chart> {'analyzing task': 'Modify Chart', 'field': {'encoding': ['Park_Location',
'Friday_Attendance', 'Borough_Location'], 'filter': ['Borough_Location']}, 'operations': ['encoding'],
'mark': 'bar', 'encoding': {'x': {'field': 'Borough_Location'}, 'y': {'field': 'Friday_Attendance',
'aggregate': 'sum'}, 'color': {'field': 'Park_Location'}}, 'filter': {'eq': ['Borough_Location', 'Manhattan']},
'sort': {}} </previous chart>
<utterance> how does Sunday attendance relate to Friday attendance in Manhattan parks? </utterance>"""

inputs = tokenizer(input_text, return_tensors="pt", padding=True).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)

# The expected output should be 'Modify Chart', as this is a refinement of the previous chart.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected output might look like:
# The user wants to modify the existing chart to highlight the stores with higher sales. This is a refinement of the previous comparison task. </thinking> <answer> Modify Chart </answer>

Training Details

Training Data

This model is trained on the TaskDialogData dataset. This dataset was specifically constructed for the Analytic Task Reasoning for Conversational Visualization (ATRCovis) task. It contains 748 multi-turn dialogues, covering 109 data tables and 3,490 charts. Each dialogue turn is annotated with the user's natural language utterance, the corresponding analytic task, and the final Vega-Lite visualization specification.

The dataset and related code can be found in the project's GitHub repository: https://github.com/ACMISLab/TaskDialogViz.git

Training Procedure

The model's training process utilizes the innovative Stepwise Preference Optimization (Step-DPO) mechanism. Unlike traditional end-to-end supervised fine-tuning, Step-DPO applies preference learning at each intermediate step of the reasoning chain, training the model to prefer the correct reasoning path at each step, rather than merely imitating the final output. This approach effectively mitigates error accumulation along the reasoning chain, significantly improving the accuracy and consistency of the final generated charts.

Downloads last month
7
Safetensors
Model size
15B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for GZUzxc/TaskDialogVis_Model

Finetuned
(67)
this model

Dataset used to train GZUzxc/TaskDialogVis_Model