Creating test database for alias 'default' ('test_')...
Got an error creating the test database: database "test_" already exists

Found 1 test(s).
Type 'yes' if you would like to try deleting the test database 'test_', or 'no' to cancel:
Destroying old test database for alias 'default' ('test_')...
Operations to perform:
  Synchronize unmigrated apps: bootstrap3, django_cotton, django_prometheus, humanize, log_viewer, messages, rest_framework, staticfiles
  Apply all migrations: admin, auth, contenttypes, dashboard, eval, feedback, groups_manager, sessions, sites
Synchronizing apps without migrations:
  Creating tables...
    Running deferred SQL...
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying dashboard.0001_initial... OK
  Applying dashboard.0002_delete_companymap_and_more... OK
  Applying dashboard.0003_rename_widget_description_widgets_description_and_more... OK
  Applying eval.0001_initial... OK
  Applying eval.0002_rename_tables_eval_prefix... OK
  Applying eval.0003_remove_goldset_categories_regressionrun_pass_count_and_more... OK
  Applying eval.0004_regressionrun_error_count... OK
  Applying feedback.0001_initial... OK
  Applying feedback.0002_alter_feedback_description_alter_feedback_page_and_more... OK
  Applying groups_manager.0001_initial... OK
  Applying groups_manager.0002_0_4_3_remove_m2m_null... OK
  Applying groups_manager.0003_0_5_0_rename_reverse_relations_with_vars... OK
  Applying groups_manager.0004_0_6_0_groupmember_expiration_date... OK
  Applying groups_manager.0005_0_6_2_verbose_name_expiration_date... OK
  Applying groups_manager.0006_1_0_0_default... OK
  Applying groups_manager.0007_1_2_0_alter_group_group_entities_alter_group_group_members_and_more... OK
  Applying groups_manager.0008_1_3_0_jsonfield_from_django... OK
  Applying groups_manager.0009_alter_group_id_alter_groupentity_id_and_more... OK
  Applying sessions.0001_initial... OK
  Applying sites.0001_initial... OK
  Applying sites.0002_alter_domain_unique... OK
test_execute_regression_run (eval.tests.test_regression_e2e.RunOrchestratorE2ETest.test_execute_regression_run)
Full pipeline: each gold set run independently, one at a time. ...
System check identified no issues (0 silenced).
[E2E] 
[E2E] ══════════════════════════════════════════════════════════════════════
[E2E] 🚀 START  | 6 gold set(s) | 40 cases total (cases run in parallel)
[E2E] ══════════════════════════════════════════════════════════════════════
[E2E] 
[E2E] ▶️  GOLD SET 1/6: CWM — SLA & Tickets (11 cases) — running...
Using selector: KqueueSelector
[Orchestrator] 🚀 Run started  run_id=1  name=E2E — CWM — SLA & Tickets  gold_set=1  (11 cases)
RegressionRun 1 started — gold_set=1
[Scorer] ▶️  case=sla_overall_summary  trace_id=114e87e2d63c
AgentClient POST http://localhost:8000/api/v1/chat/eval/ trace_id=114e87e2d63c9d48d4c7c6ef64a664b6
[Scorer] ▶️  case=sla_datapath_response  trace_id=59ae4327a788
AgentClient POST http://localhost:8000/api/v1/chat/eval/ trace_id=59ae4327a7887caf03e3c1ccefed109f
[Scorer] ▶️  case=sla_lowest_companies  trace_id=67cd4da55795
AgentClient POST http://localhost:8000/api/v1/chat/eval/ trace_id=67cd4da557953e1afa04b4c4c568b223
AgentClient response OK trace_id=59ae4327a7887caf03e3c1ccefed109f
Trace 59ae4327a7887caf03e3c1ccefed109f not found (attempt 1/10), retrying in 7s...
AgentClient response OK trace_id=67cd4da557953e1afa04b4c4c568b223
Trace 67cd4da557953e1afa04b4c4c568b223 not found (attempt 1/10), retrying in 7s...
AgentClient response OK trace_id=114e87e2d63c9d48d4c7c6ef64a664b6
Trace 114e87e2d63c9d48d4c7c6ef64a664b6 not found (attempt 1/10), retrying in 7s...
Trace 59ae4327a7887caf03e3c1ccefed109f not found (attempt 2/10), retrying in 7s...
LangfuseClient got trace 67cd4da557953e1afa04b4c4c568b223 (attempt 2)
LangfuseClient got trace 114e87e2d63c9d48d4c7c6ef64a664b6 (attempt 2)
Trace 59ae4327a7887caf03e3c1ccefed109f not found (attempt 3/10), retrying in 7s...
LangfuseClient got trace 59ae4327a7887caf03e3c1ccefed109f (attempt 4)
Trace 67cd4da557953e1afa04b4c4c568b223 observation count changed: 22 → 31
