Jentic API AI-Readiness Framework (JAIRF) Specification¶
Version 0.2.0¶
This document uses the key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY as defined in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document is licensed under The Apache License, Version 2.0.
Introduction¶
The Jentic API AI-Readiness Framework (JAIRF) defines a standardized methodology for evaluating the suitability of an API for consumption by intelligent agents, LLM-driven orchestration systems, and automated integrations. This specification defines six scored dimensions, grouped into three pillars, and a unified scoring and weighting model that produces an overall AI-Readiness Index between 0 and 100.
JAIRF is designed to evaluate an API’s ability to be:
- Interpretable — easily understood by reasoning systems
- Operable — safe and predictable to execute
- Discoverable — findable, indexable, and contextually exposed
- Governable — aligned with secure and trustworthy practices
The framework may be applied to:
- Single APIs
- Entire API portfolios
- API gateways or registries
- Design-time and runtime validation tools
JAIRF MAY be implemented as a feedback layer during API delivery, a CI/CD quality gate, a governance control, an automated readiness classifier, or as part of an AI-agent execution platform.
Scope¶
This specification defines:
- The conceptual model for API AI-readiness
- The scoring dimensions and required signals
- The weighting and aggregation model
- The readiness levels and classification thresholds
- The normative behaviours required for compliant scoring engines
This specification explicitly does not define:
- How an API provider MUST design APIs
- A proprietary scoring algorithm inaccessible to auditors
- Enforcement or certification process
JAIRF is intended to be implementation-agnostic, vendor-neutral, and API-format-agnostic, supporting (for now):
- OpenAPI 3.x
- OpenAPI 2.x
Terminology¶
The following terms are used in this specification:
API¶
A machine-accessible interface defined by an operational contract (e.g., OpenAPI).
Signal¶
A measurable property extracted from an API (e.g., summary coverage, auth strength). Signals are usually normalised to a value between 0 and 1. In other cases, a signal MAY be binary.
Dimension¶
A thematic grouping of signals that together represent a pillar of API AI-readiness. JAIRF defines six dimensions.
Group¶
A higher-level category that bundles dimensions.
JAIRF defines three groups:
- FDX — Foundational & Developer Experience
- AIRU — AI-Readiness & Usability
- TSD — Trust / Safety / Discoverability
Aggregation¶
The mathematical process of combining normalised signals into dimension scores and dimension scores into the final readiness score.
Harmonic Mean¶
The weighted harmonic mean is used at the final aggregation stage to penalise imbalanced APIs where one weak dimension would otherwise be masked by strong dimensions.
Normalisation¶
The method by which raw measurements (counts, ratios, textual assessments) are converted to a consistent scale. All global normalisation rules are defined in Normalisation Rules.
Readiness Level¶
A human-interpretable categorization mapped from the final readiness score:
| Level | Range | Interpretation |
|---|---|---|
| Level 0 | < 40 | Not Ready |
| Level 1 | 40 - 60 | Foundational |
| Level 2 | 60 - 75 | AI-Aware |
| Level 3 | 75 - 90 | AI-Ready |
| Level 4 | > 90 | Agent-Optimized |
Grading¶
A letter grade applied to each dimension and to the overall score.
Framework Architecture¶
JAIRF defines a three-layer scoring architecture:
- Signals Layer (raw metrics → 0–1 scale)
- Dimension Layer (normalised signals → 0–100)
- Aggregation Layer (weighted harmonic mean → final score)
This architecture MUST be preserved by any conformant implementation.
A conformant engine MUST:
- Compute each dimension exactly as defined
- Apply weights as defined (or disclose weights)
- Use weighted harmonic aggregation
- Apply gating rules prior to final scoring
- Apply readiness classification and grading
The scoring mechanism MUST be deterministic and reproducible for any API input.
Dimensional Model Overview¶
The Jentic API AI-Readiness Framework (JAIRF) evaluates an API’s maturity across six dimensions, grouped into three pillar categories. These pillars represent a staged view of maturity: beginning with structural soundness (can this be used at all), then semantic readiness (can AI understand it), and finally trust & exposure (can it be used safely and found by the right agents).
The purpose of this section is conceptual orientation. Normative definitions of each dimension appear in the Dimension Definitions section.
Pillar Categories¶
Foundational and Developer Experience (FDX)¶
Assesses whether an API is structurally valid, standards-compliant, and usable by humans and tooling. This establishes the prerequisite conditions for any form of automated interpretation.
Dimensions included:
- Foundational Compliance
- Developer Experience & Tooling Compatibility
AI-Readiness & Usability (AIRU)¶
Evaluates the semantic clarity, intent expression, operational structure, and agent-operability of an API. These dimensions determine how effectively intelligent agents can interpret, plan, and execute API operations.
Dimensions included:
- AI-Readiness & Agent Experience
- Agent Usability
Trust, Safety, & Discoverability (TSD)¶
Ensures that the API can be safely exposed to AI systems and that it can be effectively located, classified, and reasoned about within automated discovery environments.
Dimensions include:
- Security
- AI Discoverability
Dimensional Structure¶
Each dimension represents a distinct aspect of AI readiness and MUST be evaluated independently. The framework assumes that:
- No dimension compensates for another: a weakness in any dimension constrains overall readiness.
- Dimensions progress in logical maturity order (Foundational correctness → semantic clarity → safe exposure and discoverability).
- All dimensions score in the range [0–100], derived from normalised signals.
- The final AI-Readiness score is computed later using a weighted harmonic mean of the six dimensions.
Dimension Definitions¶
This section defines normative requirements for each of the six dimensions of the Jentic API AI-Readiness Framework (JAIRF).
Foundational Compliance (FC)¶
The Foundational Compliance dimension evaluates whether an API is structurally valid, standards-conformant, and parsable by modern API tooling. To do this, it measures specification-level correctness, resolution integrity, structural soundness, and linting quality.
An API MUST NOT be considered AI-ready if it fails foundational parsing or contains severe structural defects.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
spec_validity | MUST indicate whether the API parses as valid OpenAPI. | binary |
resolution_completeness | MUST represent the proportion of $ref references that successfully resolve | coverage |
lint_results | Aggregated quality score from linter diagnostics, weighted by severity. | inverse weighted categorical ratio |
structural_integrity | MUST score schema correctness and coherence (e.g., oneOf misuse, contradictory typing, impossible constraints). | logarithmic dampening |
Spec Validity (spec_validity)¶
spec_validity = 1 # if specification parses successfully
spec_validity = 0 # otherwise
Resolution Completeness (resolution_completeness)¶
resolution_completeness = resolved_refs / total_refs
If total_refs = 0, the value MUST be 1.0.
Example¶
resolution_completeness = 0.92
# (23 of 25 $ref links resolved successfully)
Lint Results (lint_results)¶
This signal leverages the core rulesets of Spectral and Redocly, with Jentic opinions applied, as well as a Jentic custom ruleset.
weighted_cost = SQRT((1.0 * critical) + (0.6 * errors) + (0.0025 * warnings) + (0.001 * info) + (0.0 * hint))
lint_results = max(0, 1 - (weighted_cost / 25))
Structural Integrity (structural_integrity)¶
The structural integrity signal measures whether the API's underlying data model is coherent enough for automated reasoning. Unlike linting errors, which often relate to style, documentation, or optional best practices, structural issues reflect semantic or logical defects that prevent reliable interpretation by developers, tools, and AI agents.
Formula:
Structural integrity MUST be calculated using a logarithmic dampening curve:
structural_integrity = max(0, 1 - log10(1 + structural_issues) / log10(1 + structural_issue_threshold))
Where:
- `structural_issues` is the total count of structural defects detected.
- `structural_issue_threshold` represents the point where structural reliability collapses. Once an API has more than ~15 schema-breaking or integrity flaws, automated interpretation is no longer trustworthy.
The formula yields a smooth decay curve, prevents early collapse, but penalises structural issues more heavily than cosmetic issues.
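The dampening curve is straightforward to implement. The sketch below is a minimal, non-normative Python illustration assuming the default threshold of 15 noted above; the function and parameter names are illustrative.

```python
import math

def structural_integrity(structural_issues: int, threshold: int = 15) -> float:
    """Logarithmic dampening: early issues cost little; reliability collapses as the threshold nears."""
    damped = 1 - math.log10(1 + structural_issues) / math.log10(1 + threshold)
    return max(0.0, damped)

# Illustrative decay: 0 issues -> 1.0, 3 issues -> 0.5, 15+ issues -> 0.0
print(structural_integrity(0), structural_integrity(3), structural_integrity(15))
```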
A structural issue MUST be recorded when any of the following occur:
| Category | Examples |
|---|---|
| Invalid model shape | type: object but no properties defined; objects with additionalProperties: false but empty |
| Contradictory typing | type: string + format: int32; arrays using incompatible items definitions |
| Impossible constraints | minimum > maximum; exclusiveMinimum > exclusiveMaximum; enum values violating type, and contradictory schema constructs |
| Broken polymorphism | oneOf/anyOf/allOf inconsistent; missing or invalid discriminator; unreachable or contradictory sub-schemas |
| Response/request undefined | requestBody: {}; missing schemas; empty content definitions |
| Non-evaluable example | Examples that are invalid JSON, violate the declared schema, or contradict field constraints |
| Unresolvable or circular schema structures | Schemas that reference non-existent fields; recursive references without a valid base schema |
Dimension Score¶
FC = 100 × (spec_validity + resolution_completeness + lint_results + structural_integrity) / 4
Example¶
spec_validity: 1.0 # spec parsed successfully
resolution_completeness: 0.92 # 92% of refs resolved
lint_results: 0.85 # post-weighting
structural_integrity: 0.88 # schema irregularities
FC = 100 * (1.0 + 0.92 + 0.85 + 0.88) / 4
≈ 91.25
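For illustration, the worked example above can be reproduced with a short non-normative sketch; the equal-weight mean over the four normalised signals is as defined in this section, and the function name is illustrative.

```python
def foundational_compliance(spec_validity, resolution_completeness, lint_results, structural_integrity):
    """FC is the unweighted mean of its four normalised signals, scaled to 0-100."""
    signals = [spec_validity, resolution_completeness, lint_results, structural_integrity]
    return 100 * sum(signals) / len(signals)

print(foundational_compliance(1.0, 0.92, 0.85, 0.88))  # ≈ 91.25
```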
Developer Experience & Tooling Compatibility (DXJ)¶
DXJ evaluates clarity, documentation quality, example coverage, and compatibility with Jentic’s ingestion pipeline. An API with strong DXJ SHOULD be predictable and pleasant for both human developers and automated tooling.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
example_density | MUST indicate presence of examples across eligible locations. | coverage |
example_validity | MUST show schema-conformance of examples. | coverage |
doc_clarity | MUST quantify linguistic clarity of summaries and descriptions. | min-max inverted |
response_coverage | MUST indicate presence of meaningful success and error responses. | coverage |
tooling_readiness | MUST measure ingestion/bundling health within Jentic pipelines. | inverse error |
Example Density (example_density)¶
example_density MUST measure coverage of examples across all eligible specification locations. It represents whether each location includes at least one example, and NOT how many examples are provided.
example_density = present_examples / expected_examples
Where:
- each eligible location contributes one (`1`) to `expected_examples`, regardless of whether the location supports both `example` and `examples` fields from an OpenAPI perspective.
- multiple examples defined inside an `examples` array MUST NOT increase `present_examples` beyond `1`.
If expected_examples = 0, the value MUST be 1.0.
Example Validity (example_validity)¶
example_validity = valid_examples / total_examples
Doc Clarity (doc_clarity)¶
doc_clarity = 1 - ((readability_score - 8) / (16 - 8))
Where readability_score ∈ [8, 16] (8 is easy to read, 16 would be legalese / hard to parse). See the readability_score definition in Appendix A: General Definitions.
Response Coverage (response_coverage)¶
Measures the percentage of operations that define complete response outcomes. Uses graded scoring per operation based on response categories present.
Formula:
response_coverage = sum(operation_response_coverage) / total_operations
Where:
Each operation receives an operation_response_coverage from 0.0 to 1.0:
- +0.25 if at least one 2XX response is defined
- +0.25 if at least one 4XX response is defined
- +0.25 if at least one 5XX response is defined
- +0.25 if default response is defined
If total_operations = 0, the value MUST be 1.0.
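A minimal, non-normative sketch of this graded scoring, assuming each operation's responses are available as a collection of status-code keys (e.g., "200", "404", "default"); the helper names are illustrative.

```python
def operation_response_coverage(response_codes) -> float:
    """Grade one operation: +0.25 each for a 2XX, a 4XX, a 5XX, and a default response."""
    codes = {str(code).lower() for code in response_codes}
    score = 0.0
    score += 0.25 if any(c.startswith("2") for c in codes) else 0.0
    score += 0.25 if any(c.startswith("4") for c in codes) else 0.0
    score += 0.25 if any(c.startswith("5") for c in codes) else 0.0
    score += 0.25 if "default" in codes else 0.0
    return score

def response_coverage(operations) -> float:
    """Average the per-operation grades; an API with no operations scores 1.0."""
    if not operations:
        return 1.0
    return sum(operation_response_coverage(op) for op in operations) / len(operations)

# One fully graded operation and one defining only a success response -> 0.625
print(response_coverage([["200", "404", "500", "default"], ["201"]]))
```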
Tooling Readiness (tooling_readiness)¶
tooling_readiness = max(0, 1 - (ingestion_errors / 15))
Tooling Readiness Threshold: Unlike structural integrity, tooling_readiness should not be treated as a correctness gate. It reflects how much effort is required for Jentic (or a developer) to prepare the API for downstream use in our tools. For independent implementers, this threshold should reflect the gates of the relevant target tooling. 15 is chosen as an initial threshold.
| Tooling Ingestion Issues | Score | Interpretation |
|---|---|---|
| 0 - 3 | 0.85 - 1.0 | Easily ingested |
| 4 - 8 | 0.6 - 0.8 | Cleanup recommended |
| 9 - 14 | 0.3 - 0.5 | High friction |
| 15+ | 0.0 | Cannot reliably ingest |
Dimension Score¶
DXJ = 100 × (example_density + example_validity + doc_clarity + response_coverage + tooling_readiness) / 5
Example¶
example_density: 0.66 # 2/3 expected example points populated
example_validity: 0.80 # valid & typed
doc_clarity: 0.75 # readable, non-legalistic
response_coverage: 0.70 # most methods cover non-2xx cases
tooling_readiness: 0.90 # imports cleanly
DXJ = 100 * (0.66 + 0.80 + 0.75 + 0.70 + 0.90) / 5
≈ 76.2
AI-Readiness & Agent Experience (ARAX)¶
ARAX evaluates whether an API is semantically interpretable by AI systems—specifically whether it provides enough contextual meaning for intelligent agents to infer intent, constraints, and expected behaviors. ARAX measures semantic clarity and expressiveness, including descriptive coverage, datatype specificity, and error semantics.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
summary_coverage | MUST represent presence of concise summaries across specification objects with a summary field (e.g., operations/tags/info etc.). | coverage
description_coverage | MUST represent descriptive completeness across applicable API specification objects with a description field. | coverage |
type_specificity | MUST quantify richness of datatype modelling. | weighted categorical |
policy_presence | SHOULD represent inclusion of SLA/rate-limit/policy metadata. | coverage |
error_standardization | SHOULD favour structured error formats (RFC 9457/7807). | coverage |
opid_quality | MUST evaluate operationId coverage, uniqueness (with collision penalty), and casing consistency. | composite |
ai_semantic_surface | MAY provide bonus uplift for AI-oriented metadata. | bonus multiplier |
Summary Coverage (summary_coverage)¶
summary_coverage = summaries_present / summaries_expected
Where:
- `summaries_expected` MUST take into account every specification object with a `summary` fixed field.
Example¶
summary_coverage = 0.78
# 78% of operations/tags/info objects etc. include a summary field
Description Coverage (description_coverage)¶
description_coverage = described_elements / describable_elements
Where:
- `describable_elements` MUST take into account every specification object with a `description` fixed field.
Example¶
description_coverage = 0.82
# 82% of info objects, operations, schemas, parameters etc. include a description
Type Specificity (type_specificity)¶
type_specificity =
(
(1.0 × strong_types)
+ (0.75 × formatted_strings)
+ (0.50 × enum_fields)
+ (0.25 × weak_strings)
) / total_fields
This rewards APIs that model values semantically, not just as loosely typed strings.
Policy Presence (policy_presence)¶
policy_presence = operations_with_policy_metadata / total_operations
This helps AI reason about risk, performance, and operational constraints.
Error Standardisation (error_standardisation)¶
error_standardization = operations_using_RFC9457 / total_operations
This should also cater for RFC 7807, which was obsoleted by RFC 9457.
This is important for helping AI reason about failure modes, not just success paths.
OperationId Quality (opid_quality)¶
coverage = ops_with_operation_id / total_operations
uniqueness = unambiguous_operation_ids / ops_with_operation_id / total_collision_issues
casing_consistency = dominant_casing_count / ops_with_operation_id
opid_quality = coverage × uniqueness × casing_consistency
Where:
- `unambiguous_operation_ids` is the count of operationIds whose lowercase form appears exactly once (i.e., no case-insensitive duplicates)
- `total_collision_issues` is the total number of case-sensitive operationId collisions (e.g., getUser vs GetUser vs getUser would count as 2 collision issues)
- `dominant_casing_count` is the count of operationIds using the most common casing style
If total_operations = 0, then opid_quality MUST be 1.0. If total_collision_issues = 0, the collision divisor has no effect on uniqueness (i.e., it is treated as 1.0). If ops_with_operation_id = 0, then uniqueness = 1.0 and casing_consistency = 1.0.
Casing styles detected:
- camelCase, PascalCase, snake_case, kebab-case
- SCREAMING_SNAKE_CASE, lowercase, UPPERCASE
If multiple operations appear to offer a “getUser” capability, uniqueness drops and AI inference is harmed.
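A non-normative sketch of this composite follows. It assumes simple regex heuristics for casing detection (the specification names the styles but not their detection rules); all names and the example inputs are illustrative.

```python
import re
from collections import Counter

CASING_PATTERNS = {  # heuristic detectors; the spec names the styles but not exact rules
    "camelCase":            r"^[a-z]+(?:[A-Z][a-z0-9]*)+$",
    "PascalCase":           r"^(?:[A-Z][a-z0-9]*)+$",
    "snake_case":           r"^[a-z0-9]+(?:_[a-z0-9]+)+$",
    "kebab-case":           r"^[a-z0-9]+(?:-[a-z0-9]+)+$",
    "SCREAMING_SNAKE_CASE": r"^[A-Z0-9]+(?:_[A-Z0-9]+)+$",
    "lowercase":            r"^[a-z0-9]+$",
    "UPPERCASE":            r"^[A-Z0-9]+$",
}

def detect_casing(op_id: str) -> str:
    for style, pattern in CASING_PATTERNS.items():
        if re.match(pattern, op_id):
            return style
    return "other"

def opid_quality(operation_ids, total_operations: int) -> float:
    """coverage x uniqueness x casing_consistency, with the zero-count defaults defined above."""
    if total_operations == 0:
        return 1.0
    ops_with_id = len(operation_ids)
    if ops_with_id == 0:
        return 0.0  # coverage is 0; uniqueness and casing consistency default to 1.0
    coverage = ops_with_id / total_operations
    lowered = Counter(op_id.lower() for op_id in operation_ids)
    unambiguous = sum(1 for count in lowered.values() if count == 1)
    collision_issues = sum(count - 1 for count in lowered.values() if count > 1)
    uniqueness = 1.0 if collision_issues == 0 else unambiguous / ops_with_id / collision_issues
    dominant = Counter(detect_casing(op_id) for op_id in operation_ids).most_common(1)[0][1]
    casing_consistency = dominant / ops_with_id
    return coverage * uniqueness * casing_consistency

# 3 of 4 operations have ids; "getUser" collides case-insensitively with "GetUser"
print(round(opid_quality(["getUser", "listUsers", "GetUser"], 4), 3))  # ≈ 0.167
```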
AI Semantic Surface (ai_semantic_surface)¶
ai_semantic_surface_bonus = 1 + (0.05 * ai_hint_coverage)
Example¶
bonus = 1.015
# 30% of operations include hints like x-intent, workflows, Arazzo links
Dimension Score¶
core_arax = (summary_coverage + description_coverage + type_specificity + policy_presence + error_standardization + opid_quality) / 6
ARAX = 100 × core_arax × (1 + 0.05 × ai_semantic_surface)
Example¶
summary_coverage: 0.80 # 80% described
description_coverage: 0.82 # 82% described
type_specificity: 0.76 # formats & enums used meaningfully
policy_presence: 0.40 # limited SLA/policy metadata
error_standardization: 0.50 # only half define RFC9457 (or RFC7807)
opid_quality: 0.90 # coverage & uniqueness & consistency
ai_semantic_surface: 0.30 # minimal Arazzo/AI hints
core = (0.8 + 0.82 + 0.76 + 0.40 + 0.50 + 0.90) / 6
= 0.6966666667
bonus = 1 + (0.05 * 0.30) = 1.015
ARAX ≈ 100 * 0.6966666667 * 1.015
≈ 70.71
Agent Usability (AU)¶
Agent Usability evaluates whether autonomous agents can operate the API reliably, safely, and efficiently. The dimension measures operational composability: navigation, complexity, redundancy, safety, and tool-calling alignment.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
complexity_comfort | Measures document size, endpoint density, and schema complexity, penalised using a logistic curve. | logistic shaping |
distinctiveness | MUST quantify semantic separation between operations. | inverse semantic similarity |
pagination | MUST represent the ratio of paginated GET resources. | coverage
hypermedia_support | HATEOAS/JSON:API/HAL affordances. | coverage |
intent_legibility | MUST represent verb-object semantic clarity. | similarity (LLM assisted) |
safety | MUST evaluate idempotency & sensitive operation protection. | heuristic penalty |
tool_calling_alignment | SHOULD represent alignment with LLM tool-calling expectations. | coverage |
navigation | MUST represent pagination & hypermedia affordances. | composite |
Complexity Comfort (complexity_comfort)¶
raw_complexity = 0.5 × normalised_endpoint_count
+ 0.5 × normalised_schema_depth
complexity_comfort = 1 / (1 + exp(6 × (raw_complexity - 0.45)))
Normalised Endpoint Count (normalised_endpoint_count)¶
A normalised measure of how large an API is relative to a typical operational complexity threshold.
API usability for agents degrades as endpoint count grows — but only after a certain point. A 3-endpoint API and a 12-endpoint API should not have radically different usability penalties. But a 200-endpoint platform should carry a greater complexity signal.
Formula:
normalised_endpoint_count = min(1, total_operations / endpoint_baseline)
Where:
- `total_operations` is the count of unique forward-callable operations (method + path pairs). Callbacks and webhooks MUST NOT be counted as endpoints.
- `endpoint_baseline` is set at `50`.
Normalised Schema Depth (normalised_schema_depth)¶
A normalised indicator of schema nesting and structure depth across all schemas.
Agent reasoning difficulty can be correlated to object depth, degree of polymorphism, and highly nested schemas.
Formula:
normalised_schema_depth = min(1, max_schema_depth / depth_baseline)
Where:
- `max_schema_depth` is the deepest nesting found across all schemas
- `depth_baseline` is set at `8`.
Schemas referenced by callbacks/webhooks MUST be included in normalised_schema_depth, because they contribute to the overall semantic model complexity.
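A minimal, non-normative sketch combining the two normalised inputs with the logistic shaping defined above; the baselines shown are the defaults from this section and the names are illustrative.

```python
import math

def complexity_comfort(total_operations: int, max_schema_depth: int,
                       endpoint_baseline: int = 50, depth_baseline: int = 8) -> float:
    """Logistic shaping: small, shallow APIs stay comfortable; penalties grow smoothly with size and nesting."""
    normalised_endpoint_count = min(1.0, total_operations / endpoint_baseline)
    normalised_schema_depth = min(1.0, max_schema_depth / depth_baseline)
    raw_complexity = 0.5 * normalised_endpoint_count + 0.5 * normalised_schema_depth
    return 1 / (1 + math.exp(6 * (raw_complexity - 0.45)))

# 20 operations with schemas nested 4 levels deep sits at the neutral midpoint (~0.5)
print(round(complexity_comfort(20, 4), 3))
```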
Distinctiveness (distinctiveness)¶
distinctiveness = 1 - avg_semantic_similarity
Pagination (pagination)¶
pagination = paginated_collection_GETs / collection_GETs
Hypermedia Support (hypermedia_support)¶
hypermedia_support = hypermedia_responses / total_navigable_responses
Navigation (navigation)¶
navigation_readiness = (0.6 × pagination) + (0.4 × hypermedia_support)
navigation = navigation_readiness × (1 + 0.03 × links_coverage)
Example¶
navigation = 0.443 # usable navigation with modest hint enrichment
pagination = 0.60 # 60% of eligible list endpoints are paginated
hypermedia_support = 0.20 # HAL/JSON:API/HATEOAS limited
links_bonus = 1.0075 # 25% of responses include link relations
Intent Legibility (intent_legibility)¶
intent_legibility = mean_semantic_alignment
Distinctiveness is simply the opposite of redundancy: if multiple endpoints do the same thing, the API becomes harder for AI to choose between correctly.
For distinctiveness, operation-to-operation comparison uses embedding-based semantic similarity (cosine similarity), which captures meaning rather than string matching. Levenshtein distance MAY be used as a fallback for very small APIs (but likely not too valuable).
For intent_legibility, each operation is compared to a set of canonical verb-object pairs (e.g., create-resource, list-resources), ensuring that agent routing matches human-intended meaning.
Safety (safety)¶
safety = ((idempotent_correctness + sensitive_ops_protected) / (2 * total_operations))
Tool Calling Alignment (tool_calling_alignment)¶
tool_calling_alignment = operations_mappable_to_ai_tool_calls / total_operations
Dimension Score¶
# Pagination and hypermedia combine into `navigation` (with links bound [0,1]).
navigation_readiness = 0.6 * pagination + 0.4 * hypermedia_support
navigation = navigation_readiness * (1 + 0.03 * links_coverage)
AU = 100 × (complexity_comfort + distinctiveness + navigation + intent_legibility + safety + tool_calling_alignment) / 6
Example¶
complexity_comfort: 0.74 # logistic-shaped
distinctiveness: 0.71 # low semantic collision
pagination: 0.60 # most list GETs paginated
hypermedia_support: 0.20 # few HAL/JSON:API affordances
links_coverage: 0.25 # some link relations present
intent_legibility: 0.80
safety: 0.68
tool_calling_alignment: 0.72
navigation_readiness = (0.6 * 0.60) + (0.4 * 0.20)
= 0.44
links_bonus = 1 + (0.03 * 0.25) = 1.0075
navigation = 0.44 * 1.0075 ≈ 0.443
AU = 100 * (0.74 + 0.71 + 0.443 + 0.80 + 0.68 + 0.72) / 6
≈ 68.2
Security (SEC)¶
Security evaluates trustworthiness, authentication strength, and operational risk posture in the context of agent-based automation. This dimension measures authentication coverage, secret hygiene, transport security, sensitive data handling, and OWASP risk.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
auth_coverage | Evaluates whether authentication is correctly applied to sensitive or modifying operations, using intent-aware heuristics. | heuristic penalty |
auth_strength | MUST evaluate strength of security schemes. | weighted categorical |
transport_security | MUST require HTTPS for externally exposed hosts. | binary |
secret_hygiene | MUST detect and penalise hardcoded credentials. | binary |
sensitive_handling | MUST score protection of PII/sensitive fields. | coverage ratio |
owasp_posture | SHOULD reflect severity-weighted risk findings. | severity weighted inverse with dampening |
Auth Coverage (auth_coverage)¶
auth_coverage = protected_sensitive_ops / sensitive_ops_expected
If sensitive_ops_expected = 0, then auth_coverage MUST be 1.0.
Sensitive Operation Determination (sensitive_ops_expected)¶
sensitive_ops_expected represents the count of operations that ought to require authentication. This value is NOT the same as “operations that declare security”; it reflects intent-aware inference of security requirements.
As a guiding principle, an operation SHOULD be classified as a sensitive operation if any of the following are true:
- it performs a state changing action
- it uses HTTP methods such as `POST`, `PUT`, `PATCH`, `DELETE`
- it has summaries/descriptions which suggest state change (e.g., "approve", "update", "assign", "create", "cancel"), even if the HTTP verb is misused
- it accesses or returns sensitive or personal data (customer records, user profiles, payment data, or any OpenAPI Schema Object containing detected PII fields)
- it performs privileged or administrative actions
- it exposes operational or system-level behaviours (configuration management details, system logs, workflow executions)
LLM reasoning MAY be used to help perform classification.
Auth Strength (auth_strength)¶
The auth_strength signal measures the robustness and correctness of authentication mechanisms declared by the API. It evaluates the average strength of all security schemes using normative scores based on IANA auth-scheme definitions, OAuth2 best practices, OIDC, API Key placement, and mutual TLS.
Formula:
auth_strength = sum(strength_scores) / schemes_count
Where:
- `strength_scores` is the list of calculated scheme strengths
- `schemes_count` is the number of defined schemes
The following table outlines the auth_strength scoring weights:
| Scheme Type | Description | Example | Strength | Rationale |
|---|---|---|---|---|
none | No authentication mechanism | no security: block | 0.00 | Unsafe for sensitive APIs; permitted only when sensitive_ops_expected = 0. |
http / basic | Base64 user:pass | scheme: basic | 0.10 | Plaintext credentials; easily leaked (RFC7617). |
http / oauth | OAuth 1.0 | scheme: oauth | 0.20 | Deprecated; insecure signature model (RFC5849). |
http / digest | Digest Access Auth | scheme: digest | 0.20 | Outdated; limited protection (RFC7616). |
apiKey (query) | API key in query string | in: query | 0.15 | Very high leakage risk (logs, proxies, URLs). |
apiKey (header/cookie) | API key in header or cookie | in: header | 0.50 | Moderate security; lacks identity, scoping, or rotation controls. |
http / scram-sha-1 | SCRAM with SHA-1 | scheme: scram-sha-1 | 0.25 | Uses deprecated SHA-1 hashing (RFC7804). |
http / negotiate | Kerberos/NTLM | scheme: negotiate | 0.35 | Legacy; violates HTTP semantics (RFC4559). |
http / bearer (opaque) | Opaque bearer token | scheme: bearer | 0.60 | Security depends entirely on token distribution (RFC6750). |
http / vapid | WebPush VAPID | scheme: vapid | 0.60 | Token model similar to bearer; moderate trust (RFC8292). |
http / scram-sha-256 | SCRAM with SHA-256 | scheme: scram-sha-256 | 0.65 | Modern and stronger, still password-based (RFC7804). |
http / bearer (JWT) | Signed JWT bearer token | bearerFormat: JWT | 0.75 | Cryptographically verifiable claims; supports scopes. |
http / privatetoken | Privacy Pass | scheme: privatetoken | 0.75 | Strong privacy-preserving cryptographic identity (RFC9577). |
http / hoba | HTTP Origin-Bound Authentication | scheme: hoba | 0.80 | Asymmetric client-bound authentication (RFC7486). |
http / concealed | Concealed HTTP authentication | scheme: concealed | 0.85 | Modern, high-assurance privacy-preserving authentication (RFC9729). |
http / dpop | Demonstration of Proof-of-Possession | scheme: dpop | 0.90 | Prevents replay; binds token to client (RFC9449). |
http / gnap | GNAP framework | scheme: gnap | 0.90 | Modern alternative to OAuth 2.0 (RFC9635). |
http / mutual | HTTP Mutual Authentication | scheme: mutual | 0.95 | Cryptographically binding client/server identities (RFC8120). |
oauth2 / password | Resource Owner Password Credentials | flow: password | 0.30 | Deprecated; violates least-privilege; insecure. |
oauth2 / implicit | Browser implicit flow | flow: implicit | 0.35 | Deprecated; exposes tokens via redirects. |
oauth2 / clientCredentials | Server-to-server | flow: clientCredentials | 0.85 | Strong, scoped, recommended for machine-to-machine. |
oauth2 / authorizationCode (PKCE) | Best practice auth flow | flow: authorizationCode | 0.90 | Most secure OAuth2 flow; protects public clients. |
openIdConnect | OIDC Discovery + JWKs | type: openIdConnect | 1.00 | Gold-standard identity-bound access. |
mutualTLS | Client TLS certificates | type: mutualTLS | 1.00 | Hardware-backed identity; strongest available. |
If no security schemes are defined, auth_strength MUST return 1.0 (not applicable—no schemes to evaluate). This does not imply the API is secure; gating rules handle misconfigurations involving sensitive operations.
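A non-normative sketch of the averaging rule, using a partial weight map drawn from the table above (a complete implementation would cover every listed scheme and flow); the tuple keys are an illustrative representation of scheme type plus flow or placement.

```python
# Partial weight map copied from the strength table above; keys are illustrative.
AUTH_STRENGTH_WEIGHTS = {
    ("http", "basic"): 0.10,
    ("apiKey", "query"): 0.15,
    ("apiKey", "header"): 0.50,
    ("http", "bearer"): 0.60,
    ("http", "bearer-jwt"): 0.75,
    ("oauth2", "clientCredentials"): 0.85,
    ("oauth2", "authorizationCode"): 0.90,
    ("openIdConnect", None): 1.00,
    ("mutualTLS", None): 1.00,
}

def auth_strength(schemes) -> float:
    """Average the normative strength of each declared security scheme; 1.0 when none are declared."""
    if not schemes:
        return 1.0  # not applicable; gating handles unsecured sensitive operations
    scores = [AUTH_STRENGTH_WEIGHTS.get(scheme, 0.0) for scheme in schemes]
    return sum(scores) / len(scores)

# An API declaring a header API key and an authorization-code OAuth2 flow
print(auth_strength([("apiKey", "header"), ("oauth2", "authorizationCode")]))  # 0.7
```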
Transport Security (transport_security)¶
transport_security = secure_public_endpoints / public_endpoints
Transport security is evaluated only for endpoints intended to be externally reachable. Internal, localhost, cluster, or sandbox endpoints are excluded from the penalty. The score is based on whether externally exposed endpoints use HTTPS.
Public endpoints include:
- FQDN-based hosts (e.g., api.example.com)
- Partner-facing or customer-facing URLs
- Any endpoint not explicitly marked internal
Internal endpoints include:
- localhost / 127.0.0.*
- service mesh / `.internal` / `.cluster` hosts
- explicitly flagged `x-internal`, `dev`, `sandbox`, `mock`
Secret Hygiene (secret_hygiene)¶
secret_hygiene = 1 if no secrets embedded else 0
Sensitive Handling (sensitive_handling)¶
sensitive_handling = protected_pii_fields / detected_pii_fields
If no PII is detected, the value MUST be 1.0.
OWASP Posture (owasp_posture)¶
weighted_cost = (1.0 × critical) + (0.6 × errors) + (0.025 × warnings) + (0.005 × info)
owasp_posture = max(0, 1 - (sqrt(weighted_cost) / 5))
Dimension Score¶
Base Security¶
base_security = (auth_coverage + auth_strength + transport_security +
secret_hygiene + sensitive_handling + owasp_posture) / 6
Sensitivity Factor¶
Based on the intent of an operation / endpoint, we determine sensitivity_factor as:
- None/Low: 1.0
- Moderate: 0.9
- High: 0.75
Exposure Factor¶
Based on the intended audience or exposure of an endpoint, we determine exposure_factor as:
- internal: 1.0
- partner: 0.9
- public: 0.8
Scaled Security Score¶
security_scaled = base_security × sensitivity_factor × exposure_factor
Where:
- sensitivity_factor ∈ {1.00, 0.90, 0.75}
- exposure_factor ∈ {1.00, 0.90, 0.80}
Gating¶
Gating caps apply after scaling:
| Condition | Cap |
|---|---|
| Hardcoded credentials | ≤ 20 |
| Sensitive op w/o auth | ≤ 40 |
| PII unprotected & non-internal | ≤ 50 |
| Public HTTP, not HTTPS | ×0.8 multiplier |
Security Formula¶
SEC = 100 × security_final
# where security_final is security_scaled after capping rules
Example¶
auth_coverage: 0.85
auth_strength: 0.80
transport_security: 1.00 # all https
secret_hygiene: 1.00
sensitive_handling: 0.70
owasp_posture: 0.65
base_security = (0.85+0.80+1.00+1.00+0.70+0.65)/6 = 0.833
sensitivity_factor: 0.90 # moderate (profile data)
exposure_factor: 0.80 # public API
SEC = 100 * 0.833 * 0.90 * 0.80
≈ 60.0
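The scaling and capping behaviour can be sketched as below. This is non-normative: the boolean flag names are illustrative, and the caps and ×0.8 HTTP multiplier follow the table in this section.

```python
def security_score(base_security: float, sensitivity_factor: float, exposure_factor: float,
                   hardcoded_credentials: bool = False,
                   sensitive_op_without_auth: bool = False,
                   unprotected_pii_external: bool = False,
                   public_http: bool = False) -> float:
    """Scale base security by context, then apply the gating caps above (most restrictive wins)."""
    scaled = 100 * base_security * sensitivity_factor * exposure_factor
    caps = []
    if hardcoded_credentials:
        caps.append(20)
    if sensitive_op_without_auth:
        caps.append(40)
    if unprotected_pii_external:
        caps.append(50)
    if caps:
        scaled = min(scaled, min(caps))
    if public_http:
        scaled *= 0.8
    return scaled

# Reproduces the worked example: base 0.833, moderate sensitivity, public exposure, no gating hits
print(round(security_score(0.833, 0.90, 0.80), 1))  # ≈ 60.0
```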
AI Discoverability (AID)¶
AID evaluates how easily AI systems can locate, classify, and route to the API across registries, workflows, and knowledge bases.
The scoring framework does NOT hide unsafe APIs, but we apply a risk-aware discount so that high risk reduces discoverability rather than erasing it.
Required Signals¶
| Signal | Description | Normalisation Rule |
|---|---|---|
descriptive_richness | MUST assess depth and clarity of descriptions. | coverage with semantic weights |
intent_phrasing | MUST evaluate verb-object clarity of summaries and descriptions. | semantic similarity (LLM assisted) |
workflow_context | MAY include Arazzo/MCP/workflow references. | coverage |
registry_signals | SHOULD detect llms.txt, APIs.json, MCP, externalDocs. | coverage (multi-indicator) |
domain_tagging | SHOULD detect domain/taxonomy classification. | coverage |
Descriptive Richness (descriptive_richness)¶
The descriptive_richness signal evaluates the semantic value of textual descriptions within an API description. It measures whether descriptions are sufficiently clear and detailed for AI systems to infer purpose, behaviour, and domain context.
Applies to all describable elements, including but not limited to:
info.descriptioninfo.summary- operation-level
summaryanddescription - parameter, header, and response descriptions
- schema descriptions
Elements MAY be excluded if they cannot reasonably carry semantic meaning (e.g., empty marker schemas).
descriptive_richness = Σ(element_descriptive_score) / (2 × number_of_describable_elements)
# Each describable element MAY earn up to 2 points (1 for clarity + 1 for depth)
Where: element_descriptive_score = clarity_score + depth_score
The final value MUST be normalised to the range [0, 1].
Clarity Score¶
The clarity_score evaluates how understandable and readable a description is.
| Level | Description | Score |
|---|---|---|
| High clarity | Clear, direct, specific wording; purpose-first phrasing; low cognitive load | 1.0 |
| Moderate clarity | Understandable but verbose, generic, or weakly phrased | 0.5 |
| Low clarity | Boilerplate text, legalese, placeholders, or content likely to confuse AI systems | 0.0 |
Clarity MAY be evaluated using an LLM-based classifier. Flesch–Kincaid or equivalent readability indicators SHOULD be considered as supporting signals but MUST NOT be used alone.
Descriptions containing placeholder text (e.g., “foo”, “Lorem ipsum”, or obviously templated strings) MUST receive a clarity score of 0.0.
Depth Score¶
The depth_score evaluates the degree of semantic specificity and operational detail. Descriptions providing actionable content SHALL receive higher depth scores.
| Level | Description | Score |
|---|---|---|
| High depth | Contains domain cues AND behavioural or structural detail | 1.0 |
| Medium depth | Contains domain cues but minimal behavioural detail | 0.5 |
| Low depth | Generic text without meaningful semantics or context | 0.0 |
Depth SHOULD consider:
- domain-specific terminology (e.g., “booking segment”, “AML profile”)
- entity relationships
- state constraints and lifecycle behaviour
- when/why an operation SHOULD be called
- key field semantics or constraints
Descriptions that merely restate a field name or summary MUST be scored at 0.0.
Intent Phrasing (intent_phrasing)¶
...To be defined...
Workflow Context (workflow_context)¶
workflow_context = operations_with_workflow_refs / total_operations
Registry Signals (registry_signals)¶
Presence of machine-readable artifacts such as:
- llms.txt
- APIs.json
- API Gateway registry metadata
- MCP registry metadata
- externalDocs link to developer portals etc.
registry_signals = present_indicators / total_indicators
Domain Tagging (domain_tagging)¶
domain_tagging = ops_with_domain_tags / total_operations
Dimension Score¶
AID_raw = 100 × (descriptive_richness + intent_phrasing + workflow_context + registry_signals + domain_tagging) / 5
Soft Risk Discount¶
risk_index = exposure_weight × sensitivity_weight × (1 − base_security)
risk_discount = 1 − 0.5 × risk_index, clamped to [0.6, 1.0]
AID = AID_raw × risk_discount
Example¶
descriptive_richness: 0.80
intent_phrasing: 0.75
workflow_context: 0.20
registry_signals: 0.40
domain_tagging: 0.60
AID_raw = 100 * (0.80+0.75+0.20+0.40+0.60) / 5 ≈ 55.0
Security from above gave base_security ≈ 0.833;
exposure_weight: 1.0 # public
sensitivity_weight: 0.9
risk_index = 1.0 * 0.9 * (1 - 0.833) = 0.1503
risk_discount = 1 - (0.5 * 0.1503) ≈ 0.9248
AID ≈ 55.0 * 0.9248 ≈ 50.9
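A minimal, non-normative sketch of the soft discount, reproducing the example above; the function and parameter names are illustrative.

```python
def ai_discoverability(aid_raw: float, base_security: float,
                       exposure_weight: float, sensitivity_weight: float) -> float:
    """Soft risk discount: weak security reduces discoverability but never erases it (floor of 0.6)."""
    risk_index = exposure_weight * sensitivity_weight * (1 - base_security)
    risk_discount = max(0.6, min(1.0, 1 - 0.5 * risk_index))
    return aid_raw * risk_discount

print(round(ai_discoverability(55.0, 0.833, 1.0, 0.9), 1))  # ≈ 50.9
```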
Normalisation Rules¶
This section defines the global normalisation functions used throughout the JAIRF framework. All signals MUST be normalised into the range [0, 1] before dimensional aggregation. Unless otherwise noted, higher values represent better API quality or AI-readiness.
Normalisation ensures that heterogeneous signals (e.g., counts, proportions, categorical values, text-derived scores) can be consistently aggregated within and across dimensions.
Coverage Normalisation¶
Coverage is used when measuring presence versus absence of some feature (e.g., summaries, examples, pagination).
Formula:
coverage = present / expected
If there are zero expected occurrences, then:
coverage = 1.0
This prevents false penalties where the concept is not applicable.
Inverse Error Normalisation¶
Used when the presence of errors decreases quality (linting findings, structural issues, ingestion errors).
Formula:
inverse = max(0, 1 - (issue_count / threshold))
Notes:
- `threshold` represents the point where the score SHOULD drop to zero. Thresholds are dimension-specific.
- Once `issue_count ≥ threshold`, the signal is floored at 0.
Min–Max Inverted Normalisation¶
Applied when lower input values are better (e.g., readability burden).
Formula:
inverted = 1 - (x - min) / (max - min)
- If `x ≤ min`, then `inverted = 1.0`.
- If `x ≥ max`, then `inverted = 0.0`.
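These three rules reduce to small helpers. The sketch below is non-normative and the parameter names are illustrative; the example values echo earlier sections (23/25 resolved refs, 4 ingestion issues against a threshold of 15, a readability grade of 10 on the [8, 16] scale).

```python
def coverage(present: int, expected: int) -> float:
    """Coverage normalisation: ratio of present to expected, 1.0 when nothing is expected."""
    return 1.0 if expected == 0 else present / expected

def inverse_error(issue_count: int, threshold: int) -> float:
    """Inverse error normalisation: linear penalty, floored at 0 once the threshold is reached."""
    return max(0.0, 1 - issue_count / threshold)

def minmax_inverted(x: float, lo: float, hi: float) -> float:
    """Min-max inverted normalisation: lower raw values are better; clamped outside [lo, hi]."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return 1 - (x - lo) / (hi - lo)

print(coverage(23, 25), inverse_error(4, 15), minmax_inverted(10, 8, 16))
```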
Weighted Categorical Normalisation¶
Used for discrete categories that map to quality levels (e.g., security scheme strength).
Each category is assigned a weight in [0, 1]. This allows qualitative or enumerated factors to be converted into normalised numeric values.
Formula:
weighted = category_weight / max_weight
Where max_weight is the maximum weight in the category set.
Composite Signal Normalisation¶
Used when a signal is derived from multiple measurable sub-signals (e.g., operationId quality).
Formula:
composite = Σ(sub_score[i] × weight[i]) / Σ(weight[i])
# Example for illustration:
opid_quality = coverage × uniqueness × casing_consistency
All sub-scores MUST first be individually normalised into [0, 1].
Severity-Weighted Inverse¶
Applies weighted penalties for different severity levels.
Formula:
weighted_cost =
(1.0 × critical)
+ (0.6 × errors)
+ (0.025 × warnings)
+ (0.005 × info)
signal = max(0, 1 - (weighted_cost / max_cost))
Where max_cost is an upper bound chosen per dimension (e.g., 25 for foundational lint).
Logarithmic Dampening¶
Used where a smooth decline is preferred rather than linear penalty (e.g., structural complexity).
Formula:
log_dampened = 1 - ( logBaseN(1 + issues) / logBaseN(1 + threshold) )
Where:
- Base MAY be 10 or e, depending on implementation preference. Base 10 is RECOMMENDED for easier interpretation.
- Provides early penalty, slower collapse near threshold.
Semantic Similarity¶
similarity(i, j) ∈ [0, 1]
distinctiveness = 1 - similarity
Similarity is computed from a combination of:
- operationId
- summary
- description
- path + HTTP method
- optional LLM semantic embedding comparison
Weighted approach:
similarity(i, j) =
0.35 * embedding_similarity
+ 0.25 * opId_similarity
+ 0.20 * summary_similarity
+ 0.20 * path_similarity
Where similarity = cosine similarity or Levenshtein-based fallback for small APIs.
The evaluator MUST ensure:
- identical or near-identical meanings → similarity ≥ 0.85
- opposites or different purposes → similarity ≤ 0.20
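A non-normative sketch of the weighted blend, leaving the embedding comparison as an input and using a standard-library string matcher as a stand-in for the surface-level similarities; the field names and example operations are illustrative.

```python
from difflib import SequenceMatcher

def _string_similarity(a: str, b: str) -> float:
    """Lightweight string similarity stand-in; an embedding model is preferred in practice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def operation_similarity(op_a: dict, op_b: dict, embedding_similarity: float) -> float:
    """Weighted blend of embedding and surface-level similarities, per the weights above."""
    return (0.35 * embedding_similarity
            + 0.25 * _string_similarity(op_a["operationId"], op_b["operationId"])
            + 0.20 * _string_similarity(op_a["summary"], op_b["summary"])
            + 0.20 * _string_similarity(op_a["path"], op_b["path"]))

a = {"operationId": "getUser", "summary": "Retrieve a user", "path": "GET /users/{id}"}
b = {"operationId": "listUsers", "summary": "List all users", "path": "GET /users"}
print(round(operation_similarity(a, b, embedding_similarity=0.55), 2))
```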
Logistic Shaping¶
Used to avoid over-penalising large APIs for complexity if they are well-structured.
Formula:
logistic = 1 / (1 + exp(k × (value - midpoint)))
- `k` controls steepness (recommended: 5–7).
- `midpoint` defines the neutrality point (recommended: ~0.4–0.5).
Binary Checks¶
Used when a single condition SHOULD set a score to exactly one or zero.
Examples:
- hardcoded credentials
- presence/absence of HTTPS
- presence of required auth
Formula:
binary = 1 if condition_passes else 0
Bonus Multipliers¶
Used when optional metadata adds value especially in terms of agent usability and AI-discoverability (e.g., hypermedia links, AI intent hints, registry presence).
Formula:
score = base_score * (1 + bonus_factor * coverage)
Where:
- `bonus_factor` typically ∈ [0.01, 0.10]
- `coverage` is a normalised [0, 1] value
Bonus MUST NOT push the signal above 1.0 after dimension aggregation.
Context Scaling¶
Used when sensitivity or exposure magnifies or attenuates risk.
Formula:
scaled = base_score × sensitivity_factor × exposure_factor
Where:
- sensitivity_factor ∈ {1.0, 0.9, 0.75}
- exposure_factor ∈ {1.0, 0.9, 0.8}
Heuristic Penalty Normalisation¶
Used when qualitative rule-based deductions apply (e.g., unsafe idempotency patterns).
Formula:
score = 1.0 - Σ(penalty[i] × severity_weight[i])
Clamped to [0, 1].
Example:
# Agent Usability- safety:
# An API begins with a score of 1.0, then receives deductions for risk findings:
# Missing auth on sensitive write: −0.15
# Non-idempotent PUT/DELETE: −0.10
# Overlapping unsafe verbs: −0.05
safety = 1 − (0.15 + 0.10 + 0.05) = 0.70
Soft Risk Discounts¶
Applied in Discoverability scoring when security posture SHOULD diminish visibility, not erase it.
Formula:
risk_discount = 1 - (0.5 × risk_index)
Clamped to [0.6, 1.0], to avoid total suppression.
Where:
risk_index = exposure_weight × sensitivity_weight × (1 - base_security)
Scoring Model & Formulae¶
This section NORMATIVELY defines how signals, dimensions, and the final AI-Readiness score MUST be computed.
Scoring Pipeline¶
The scoring process MUST proceed in the following order:
- Raw Measurements (unbounded counts, coverage, structural errors, semantic density, etc.)
- Normalised Signals (each MUST be converted to a value ∈ [0,1])
- Dimension Scores (each MUST be computed on a 0–100 scale)
- Weighted Aggregation (weighted harmonic mean MUST be used)
- Gating Rules (MUST be applied BEFORE readiness level classification)
- Final Readiness Score (0–100)
- Readiness Level Classification
Normalised Signals¶
JAIRF requires each signal to be normalised to a number on the interval [0,1]. Signal normalisation rules are defined in the Normalisation Rules section and MUST be applied consistently across all implementations.
A signal value of:
- `1.0` MUST represent optimal quality
- `0.0` MUST represent unusable, absent, contradictory, or failing input
Signals MUST NOT exceed the range [0,1] after normalisation.
Dimension Scoring¶
Each dimension score MUST be the arithmetic mean of its normalised signals:
DimensionScore = 100 × (Σ normalised_signals) / (count_of_signals)
Where:
- All signals MUST be normalised first
- All signals MUST contribute equally (v1.0 does NOT use per-signal weighting)
- Result MUST be clamped into the range [0,100]
Implementations MAY emit warnings if insufficient data is present (e.g., zero expected counts across multiple signals) but MUST NOT penalise a dimension for irrelevance.
Dimension Weights¶
The JAIRF final score MUST use the following weight distribution across the six dimensions:
| Dimension | Weight |
|---|---|
| Foundational Compliance (FC) | 0.16 |
| Developer Experience & Tooling Compatibility (DXJ) | 0.18 |
| AI-Readiness & Agent Experience (ARAX) | 0.24 |
| Agent Usability (AU) | 0.20 |
| Security (SEC) | 0.12 |
| AI Discoverability (AID) | 0.10 |
These weights MUST sum to 1.0.
Weighted Harmonic Aggregation¶
JAIRF uses a weighted harmonic mean, not an arithmetic mean. The harmonic mean MUST be used to enforce that weaknesses in one dimension CANNOT be offset by strengths in others.
Implementations MUST compute:
FinalScore = (Σ weights) / (Σ (weight / (dimensionScore + epsilon)))
Where:
- `epsilon` MUST be a small positive constant to avoid division-by-zero. Recommended: `epsilon = 0.000001`
- `dimensionScore` MUST be in the range [0,100] before inclusion in the harmonic calculation
- The final score MUST also be clamped to [0,100]
The harmonic mean MUST be considered core to the JAIRF model.
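A minimal, non-normative sketch of the aggregation, using the normative weight table and the recommended epsilon; the dimension scores shown are approximately the worked examples from earlier sections.

```python
JAIRF_WEIGHTS = {"FC": 0.16, "DXJ": 0.18, "ARAX": 0.24, "AU": 0.20, "SEC": 0.12, "AID": 0.10}
EPSILON = 0.000001

def final_score(dimension_scores: dict) -> float:
    """Weighted harmonic mean of the six dimension scores (each 0-100), clamped to [0, 100]."""
    numerator = sum(JAIRF_WEIGHTS.values())  # weights sum to 1.0 by definition
    denominator = sum(w / (dimension_scores[d] + EPSILON) for d, w in JAIRF_WEIGHTS.items())
    return max(0.0, min(100.0, numerator / denominator))

scores = {"FC": 91.25, "DXJ": 76.2, "ARAX": 70.71, "AU": 68.2, "SEC": 60.0, "AID": 50.9}
print(round(final_score(scores), 1))  # ≈ 69.4
```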
Gating Rules¶
Gating rules MUST override or constrain dimension scores to ensure safety and correctness. They MUST be applied immediately before readiness-level classification.
| Condition | Effect | Rationale |
|---|---|---|
| Foundational Compliance score < 40 | API MUST be classified as Level 0 ("Not Ready") | If the API cannot be structurally validated, no higher-order AI reasoning is safe or possible. |
| Hardcoded credentials detected | Security score MUST be capped at 20 | Hardcoded secrets represent an immediate, systemic security failure and cannot be compensated for by other strengths. |
| Sensitive operations lacking auth (internal) | Security score MUST be capped at 40 | Internal APIs may permit limited trust boundaries, but unauthenticated sensitive operations remain high-risk. |
| Sensitive operations lacking auth (partner) | Security score MUST be capped at 30 | Partner-facing APIs must enforce authentication on sensitive operations; failure is a severe but not catastrophic risk. |
| Sensitive operations lacking auth (public) | Security score MUST be capped at 20 | Public unauthenticated sensitive operations are critical vulnerabilities and must be treated as near-fail conditions. |
| Unprotected PII on partner/public APIs | Security score MUST be capped at 50 | Exposure of identifiable data without proper controls violates trust and regulatory expectations. |
| Non-TLS public endpoints (http://) | Security score MUST be multiplied by 0.5 | Plaintext transport exposes tokens, credentials, and PII; catastrophic for external integrations. |
Gating MUST NOT alter the raw signals or other dimension scores directly; gating applies only to the affected dimension score.
If multiple caps apply, the most restrictive MUST be applied.
Final Score & Readiness Levels¶
The final readiness score MUST be mapped to one of the following readiness levels.
| Final Score | Level | Name | Meaning |
|---|---|---|---|
| < 40 | Level 0 | Not Ready | Fundamentally unsuitable for AI or agents |
| 40–60 | Level 1 | Foundational | Developer-ready, partially AI-usable |
| 60–75 | Level 2 | AI-Aware | Semantically interpretable, safe for guided use |
| 75–90 | Level 3 | AI-Ready | Structurally rich, semantically clear, agent-friendly |
| 90+ | Level 4 | Agent-Optimized | Highly composable, predictable, automation-ready |
This table MUST be treated as normative. Scoring libraries MUST return both the numeric score and readiness level.
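A non-normative sketch of the classification. The table leaves boundary values ambiguous, so this sketch assumes half-open intervals (a score of exactly 60 maps to Level 2, and 90 to Level 4).

```python
def readiness_level(final_score: float) -> str:
    """Map a final readiness score (0-100) to its readiness level name."""
    if final_score < 40:
        return "Level 0 - Not Ready"
    if final_score < 60:
        return "Level 1 - Foundational"
    if final_score < 75:
        return "Level 2 - AI-Aware"
    if final_score < 90:
        return "Level 3 - AI-Ready"
    return "Level 4 - Agent-Optimized"

print(readiness_level(69.4))  # Level 2 - AI-Aware
```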
Grading Bands¶
Implementations MAY additionally compute a letter grade for UX/visualisation.
| Letter Grade | Score Range |
|---|---|
| A+ | 90–100 |
| A | 80–89 |
| A− | 70–79 |
| B+ | 67–69 |
| B | 63–66 |
| B− | 60–62 |
| C+ | 57–59 |
| C | 53–56 |
| C− | 50–52 |
| D+ | 47–49 |
| D | 43–46 |
| D− | 40–42 |
| F | < 40 |
Grades SHOULD NOT be used as substitutes for readiness levels.
Appendix A: General Definitions¶
| Term | Description |
|---|---|
| Normalisation | The process of converting raw measurements into a standard 0–1 scale so that different signals can be compared fairly. A normalised value of 1 represents “ideal” and 0 represents “unusable or absent.” |
| Signal | A measurable property inside a dimension (e.g., auth_coverage or summary_coverage). Dimensions are made of multiple signals, and each signal is normalised before aggregation. |
| Dimension | A thematic scoring category of API readiness (e.g., Foundational Compliance, AI-Readiness). Each dimension reflects a different aspect of what makes an API usable by AI systems. |
| Aggregation | The step where multiple normalised signals are mathematically combined into a single score. Aggregation happens first at the dimension level, and then across all dimensions for the final score. |
| Harmonic Mean | A type of average that penalises imbalanced scores. If one dimension is very weak, the harmonic mean reduces the overall score more than a standard average would. Used to ensure that no single “strong” category can mask a critical weakness in another. |
| Clamped or Bounded | When a value is restricted to remain within a minimum and maximum range. For example, risk_discount ∈ [0.6, 1.0] means it cannot go below 0.6 or above 1.0. |
| Weighted | A calculation where some values count more than others. Weights may allow emphasis (e.g., security > discovery). |
| Logistic Shaping | A smoothing technique that stops large APIs from being unfairly penalised simply because they have many endpoints. It scales penalties gradually rather than linearly. |
| Coverage | The percentage of “where this should exist” versus “where it actually exists.” For example: auth_coverage = operations_with_auth / ops_that_require_auth. |
| Penalty | A downward adjustment applied when a risk or deficiency is identified (e.g., missing pagination, weak auth, or unsafe error handling). |
| Bonus (or Uplift) | A small upward modifier applied when extra AI-friendly metadata is present (e.g., workflows, AI intent hints). |
| Readability Score | A measure of how easy text is to understand, based on approximate grade-level complexity (e.g., Flesch–Kincaid). Lower scores indicate simpler, more direct language. APIs targeting AI consumption benefit from the 8–12 range; >16 introduces interpretation risk for models. ≤ 8th grade: universally readable; 9–12: general technical audience (expected for API docs); >16: post-graduate, cognitively expensive, increases model misinterpretation risk. This score is determined by an LLM evaluating readability. |