
fix: handle malformed api_constraints without crashing endpoint (#273)#285

Open
Gokul-social wants to merge 2 commits into openml:main from Gokul-social:fix/273-graceful-api-constraints

Conversation

@Gokul-social

Summary

Hardens the /tasktype/{task_type_id} endpoint against inconsistent api_constraints data. Fixes #273.

The Problem

The existing implementation was fragile, relying on strict json.loads() and direct key access. This led to unnecessary 500 errors when encountering:

  • Malformed JSON strings or unexpected dict types in the persistence layer.
  • Missing data_type keys, triggering unhandled KeyErrors.
  • Type mismatches that caused the parser to bail instead of degrading gracefully.

The Solution

I've introduced a defensive parse_api_constraints helper to act as a buffer between the DB and the response.

  • Type Agnostic Parsing: Gracefully handles both JSON strings and pre-parsed dictionaries.
  • Non-Breaking Validation: Switched to .get() with strict type-checking for data_type. If a value is invalid, it is omitted from the output rather than crashing the request.
  • Observability: Added structured logging (WARNING for structural failures, DEBUG for missing data) prefixed with api_constraints: to help us audit bad data in production.

Testing

Comprehensive unit tests were added to cover:

  • Valid/Malformed JSON and raw dict inputs.
  • Boundary cases: Empty values, nulls, and unsupported schema types.
  • Verification that response structure remains backward compatible.

Copilot AI review requested due to automatic review settings March 21, 2026 15:11

coderabbitai bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0e9e6ecc-f175-4e86-9593-d74da9333067

📥 Commits

Reviewing files that changed from the base of the PR and between f08c487 and ea00485.

📒 Files selected for processing (1)
  • src/routers/openml/tasktype.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/routers/openml/tasktype.py

Walkthrough

Adds module-level logging and a new helper parse_api_constraints(api_constraints: Any, *, task_type_id: int, input_name: str) -> str | None in src/routers/openml/tasktype.py that defensively handles None, dict, and string (JSON) inputs and returns the data_type string or None. get_task_type now includes input[].data_type only when the helper returns a non-None value and its docstring documents this optional field. New tests tests/routers/openml/test_parse_api_constraints.py cover parsing edge cases and expected logging.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title accurately and concisely describes the main change: adding defensive handling for malformed api_constraints in the tasktype endpoint to prevent crashes.
  • Description check: ✅ Passed. The description is well-related to the changeset, clearly explaining the problem, solution, and testing approach for handling malformed api_constraints.
  • Linked Issues check: ✅ Passed. The PR fully addresses issue #273 requirements: it implements a defensive parse_api_constraints helper handling JSON strings/dicts, uses safe lookups for data_type, catches malformed JSON gracefully, adds observability logging, and includes comprehensive unit tests.
  • Out of Scope Changes check: ✅ Passed. All changes are directly in scope: the parse_api_constraints helper function, modifications to get_task_type endpoint behavior, and comprehensive test coverage for the new functionality align with issue #273 objectives.


@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In parse_api_constraints, you currently only accept plain dict instances; if upstream ever passes a Mapping (e.g. dict-like) object it will be treated as unsupported, so consider loosening the type check to collections.abc.Mapping to make the helper more robust to different ORM or serialization layers.
  • The test_valid_constraint_no_warning assertion len(caplog.records) == 0 can be brittle if any other logger emits during the test run; it would be safer to assert that there are no records from the specific logger or containing the api_constraints: prefix instead of enforcing an absolutely empty log.
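For the second point, the brittle empty-log assertion can be replaced by filtering on the helper's own message prefix, sketched here under the assumption that every message the helper emits starts with api_constraints:

```python
import logging


def api_constraint_records(records, prefix="api_constraints:"):
    """Keep only log records whose message carries the helper's prefix,
    so unrelated loggers firing during the test run cannot fail the test."""
    return [r for r in records if r.getMessage().startswith(prefix)]
```

In the test, `assert api_constraint_records(caplog.records) == []` then stands in for the stricter `len(caplog.records) == 0`.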


Copilot AI left a comment

Pull request overview

Hardens the /tasktype/{task_type_id} endpoint so malformed or inconsistent api_constraints data no longer causes 500s, aligning behavior with issue #273.

Changes:

  • Added parse_api_constraints helper to defensively parse api_constraints and safely extract data_type.
  • Updated task type response construction to include input[].data_type only when valid.
  • Added unit tests covering valid/malformed inputs and logging behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • src/routers/openml/tasktype.py: Introduces defensive parsing + logging and uses it when building the /tasktype/{id} response.
  • tests/routers/openml/test_parse_api_constraints.py: Adds unit tests for parsing outcomes and emitted log levels/messages.


Comment on lines +50 to +52

```python
if isinstance(api_constraints, dict):
    constraint = api_constraints
elif isinstance(api_constraints, str):
```
Copilot AI Mar 21, 2026
parse_api_constraints only treats pre-parsed constraints as valid when they are a concrete dict. If the persistence layer returns a dict-like object (e.g., collections.abc.Mapping / SQLAlchemy row mapping), this will be logged as unsupported_type and data_type will be dropped even though it could be read safely. Consider accepting Mapping (and casting to dict if needed) instead of only dict to match the “dict-like” requirement and existing patterns (e.g., core/conversions.py uses Mapping).

Comment on lines +89 to +96

```python
        logger.debug(
            "api_constraints: missing_data_type for task_type_id=%d, input=%s",
            task_type_id,
            input_name,
        )
        return None

    return data_type
```
Copilot AI Mar 21, 2026
The log label missing_data_type is emitted for all invalid data_type values, including cases where the key exists but is empty or not a string. That makes the log message misleading and reduces the usefulness of the new observability. Consider distinguishing between “missing” (key absent/None) vs “invalid” (wrong type/empty) and logging the latter with a different label (and possibly WARNING) including the received type/value shape.

Comment on lines +8 to +22

```python
@pytest.mark.parametrize(
    ("api_constraints", "expected_data_type"),
    [
        # 1. Valid JSON string with data_type
        ('{"data_type": "matrix"}', "matrix"),
        # 2. Valid dict with data_type
        ({"data_type": "matrix"}, "matrix"),
        # 3. Malformed JSON string → None
        ("{bad json", None),
        # 4. Empty string → None
        ("", None),
        # 5. Valid JSON/dict without data_type → None
        ('{"other_key": "val"}', None),
        ({"other_key": "val"}, None),
        # 6. data_type present but empty string → None
```
Copilot AI Mar 21, 2026
If parse_api_constraints is updated to accept dict-like inputs (e.g., collections.abc.Mapping), add a parametrized test case for a non-dict mapping instance to ensure this scenario remains supported (this is one of the failure modes described in #273 / PR description).
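If the helper is loosened to accept Mapping, the extra parametrize entries could look like this sketch, where types.MappingProxyType stands in for any dict-like object that is not a plain dict:

```python
from collections.abc import Mapping
from types import MappingProxyType

# Hypothetical extra cases for the parametrized test above; a read-only
# mapping proxy stands in for ORM row mappings that are dict-like but not dict.
EXTRA_CASES = [
    (MappingProxyType({"data_type": "matrix"}), "matrix"),
    (MappingProxyType({"other_key": "val"}), None),
]

# Sanity check: the stand-ins really are Mappings without being plain dicts.
assert all(
    isinstance(value, Mapping) and not isinstance(value, dict)
    for value, _expected in EXTRA_CASES
)
```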


Successfully merging this pull request may close these issues.

Tasktype endpoint should gracefully handle malformed api_constraints values
