AI-as-a-Service: Architecting GenAI Application Governance on Azure with Azure API Management and Microsoft Fabric
This repo serves as a reference architecture for tracking usage of large and small language models on Azure. Many organizations want to understand AI metrics, including what models are being used, by whom, and how often. They also want to track tokens being consumed, and the prompts being passed in. This leads to the ability to create chargeback models for consuming applications and users and enables analysis to be done on prompt usage and best practices. Azure API Management recently announced a new policy to send token information to App Insights. This is a great feature, but doesn't enable long term usage analysis or handle other llm/slm deployments. This architecture provides a way to track all of the data needed to understand AI usage in a scalable and cost-effective manner.
Azure API Management serves as the cornerstone for this architecture as it enables different consumer access to the same api endpoint through the use of subscriptions or jwt tokens. APIM policy also allow the logging of request/response data to Event Hubs so that it can be processed outside of the request/response path. The data generated is suitable for analytics queries, so rather than land it in a traditional database, Microsoft Fabric becomes a cost-effective and scalable solution for storing the data. Power BI can then be used to create reports on the data in the Lakehouse.
The reference implementation consists of the following components:
- Azure OpenAI: These are the models that are exposed as APIs using Azure API Management. You could deploy any combination of models through Azure AI Studio
- Azure API Management: This is used to expose the OpenAI models as APIs and track usage data.
- Event Hubs: This is used to ingest usage data from Azure API Management.
- Microsoft Fabric: This is used to process and store the usage data in a scalable and cost-effective manner.
Flow:
- A client makes a request to the model through Azure API Management using a subscription key.
- Azure API Management forwards the request to the OpenAI model deployment.
- Azure API Management logs the subscription id and request/response data to Event Hubs using a log-to-eventhub policy.
- An Eventstream processor in Microsoft Fabric reads the data from Event Hubs.
- The output of the stream is writen to a delta table in a Lakehouse.
- The data in the delta table is then queried via a Power BI report or a Notebook.
Note, you can easily swap out the subscription key for tracking an individual user by using JWT tokens and associating the user with the token. This would allow you to track usage at the user level.
This tutorial assumes you have familiarity with the technologies used in this architecture and have deployed instances of each. If you are new to any of the technologies, please refer to the documentation provided by Microsoft.
- Create a model deployment in Azure OpenAI and note the endpoint.
- Enable the System Assigned Identity on your Azure API Management instance and grant it 'Cognitive Services OpenAI User' role on your OpenAI instance. This will allow APIM to call the OpenAI endpoint without needing to rely on the subscription key for OpenAI.
- Create an Event Hub called 'ai-usage' within your Event Hub instance.
- Grant the APIM managed identity the 'Azure Event Hubs Data Sender' role on the 'ai-usage' Event Hub so that it can write to the Event Hub without needing a connection string.
- To create the EventLogger for APIM that is using the managed identity, we have to use the Rest API to create it. The easiest way to do this is the Try It feature in the docs. You will need to provide the following information:
- loggerId: ai-usage
- resourceGroupName: your_resource_group
- serviceName: your_apim_instance_name
- subscriptionId: your_subscription_id
- body:
{ "properties": { "loggerType": "azureEventHub", "description": "eventhub logger for ai usage", "credentials": { "endpointAddress":"your_eventhub_namespace.servicebus.windows.net", "identityClientId":"SystemAssigned", "name":"ai-usage" } } }
- Import the OpenAI inference specification into Azure API Management.
- Update the API settings to rename the Subscription Header Name from 'Ocp-Apim-Subscription-Key' to 'api-key'. The OpenAI API expects the subscription key to be passed in as 'api-key'. Developers will set this value to their APIM subscription key.
- Add the following policy to the 'All Operations' section of your OpenAI API in APIM. This will allow APIM to call Open AI using its managed identity, and then log the request/response data to Event Hubs.
<policies>
<inbound>
<base />
<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
</set-header>
<set-variable name="requestBody" value="@(context.Request.Body.As<string>(preserveContent: true))" />
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
<choose>
<when condition="@(context.Response.StatusCode == 200)">
<log-to-eventhub logger-id="ai-usage">@{
var responseBody = context.Response.Body?.As<string>(true);
var requestBody = (string)context.Variables["requestBody"];
return new JObject(
new JProperty("EventTime", DateTime.UtcNow),
new JProperty("AppSubscriptionKey", context.Request.Headers.GetValueOrDefault("api-key",string.Empty)),
new JProperty("Request", requestBody),
new JProperty("Response",responseBody )
).ToString();
}</log-to-eventhub>
</when>
</choose>
</outbound>
<on-error>
<base />
</on-error>
</policies>
- Test that your APIM instance can call the OpenAI endpoint by using the 'Test' tab in the APIM portal.
At the time of this writing, connecting to an Event Hub from Fabric must be done using a Shared Access Key, so we cannot use a managed identity to connect to it. This means the connection string will be stored in the Event Hub stream configuration.
- Create a new Workspace.
- Create a Lakehouse to store the data.
- In your Event Hub instance, create a create a Shared access policy for the ai-usage Event Hub that has 'Listen' permissions. Copy the Primary Key.
- Create a new Event stream in the workspace
- Invoke your OpenAI APIM endpoint several times to send some test data in. You should see the Delta table created and data in it.
- Switch to the SQL Analytics endpoint.
- Run the following query to create a view that makes it easier to see the token usage by subscription key.
CREATE OR ALTER VIEW [dbo].[AIUsageView] AS
SELECT CAST(EventTime AS DateTime2) AS [EventTime],
[AppSubscriptionKey],
JSON_VALUE([Response], '$.object') AS [Operation],
JSON_VALUE([Response], '$.model') AS [Model],
[Request],
[Response],
CAST(JSON_VALUE([Response], '$.usage.completion_tokens') AS INT) AS [CompletionTokens],
CAST(JSON_VALUE([Response], '$.usage.prompt_tokens') AS INT) AS [PromptTokens],
CAST(JSON_VALUE([Response], '$.usage.total_tokens') AS INT) AS [TotalTokens]
FROM
[YOUR_LAKEHOUSE_NAME].[dbo].[AIData]
- Refresh the Views to see the new one created.
- Click on the Reporting tab and select 'Automatically update semantic model'
- Create a new report using the AIUsageView as the data source.
- Create a new Notebook.
- Load the managed delta table into a dataframe.
df = spark.sql("SELECT * FROM YOUR_LAKEHOUSE_NAME.AIData LIMIT 1000")
display(df)