How to create a Xinference API key to use in Ozeki AI Gateway

This guide demonstrates how to set up a local Xinference server and configure it in Ozeki AI Gateway. You'll learn how to configure Xinference with authentication, launch the server with your auth config, deploy AI models, and set up Xinference as a provider in your gateway. By following these steps, you can connect your Ozeki AI Gateway to self-hosted AI models running on your own infrastructure with full access control.

Web UI: http://localhost:9997/ui
API URL: http://localhost:9997/v1
Example API Key: sk-72tkvudyGLPMi (from auth_config.json)

Start command (example):
xinference-local --host 127.0.0.1 --port 9997 --auth-config C:\Xinference\auth_config.json

Auth config file (auth_config.json):
{
    "auth_config": {
        "algorithm": "HS256",
        "secret_key": "09d25e094faa6ca2556c818166b7a9563b93f7099f6f0f4caa6cf63b88e8d3e7",
        "token_expire_in_minutes": 30
    },
    "user_config": [
        {
            "username": "admin",
            "password": "qwe123",
            "permissions": [
                "admin"
            ],
            "api_keys": [
                "sk-72tkvudyGLPMi",
                "sk-ZOTLIY4gt9w11"
            ]
        }
    ]
}
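The secret_key above is a 32-byte hex string and the API keys are random strings with an sk- prefix. Rather than inventing these values by hand, you can generate a config in the same shape with a small Python sketch (a convenience helper, not part of Xinference; the key lengths mirror the example above):

```python
import json
import secrets

def make_auth_config(username, password, n_keys=2):
    """Build an auth_config.json structure with freshly generated random credentials."""
    return {
        "auth_config": {
            "algorithm": "HS256",
            # 32 random bytes, hex-encoded, like the example secret_key above
            "secret_key": secrets.token_hex(32),
            "token_expire_in_minutes": 30,
        },
        "user_config": [
            {
                "username": username,
                "password": password,
                "permissions": ["admin"],
                # Xinference API keys conventionally start with "sk-"
                "api_keys": ["sk-" + secrets.token_urlsafe(10) for _ in range(n_keys)],
            }
        ],
    }

config = make_auth_config("admin", secrets.token_urlsafe(12))
print(json.dumps(config, indent=4))
```

Save the printed JSON as your auth_config.json, and keep both the secret key and the API keys out of version control.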

What is Xinference?

Xinference is a powerful open-source model inference framework that allows you to run AI models on your own hardware. It provides a unified interface for deploying and serving various types of models including LLMs, embedding models, and multimodal models. Xinference offers an OpenAI-compatible API interface and includes built-in authentication support, making it easy to secure your AI infrastructure.

Overview

Steps to follow

  1. Create auth config file
  2. Start Xinference from terminal
  3. Open and log in to Xinference web UI
  4. Configure a model of choice
  5. Launch the model
  6. Open Providers page in Ozeki AI Gateway
  7. Create provider
  8. Test the Xinference provider

How to create Xinference API key video

The following video shows how to set up a local Xinference server with API key authentication and configure it in Ozeki AI Gateway step-by-step. The video covers creating the auth config, starting the server, deploying a model, and setting up the provider.

Step 0 - Install Ozeki AI Gateway

Before configuring Xinference as a provider, you need to have Ozeki AI Gateway installed and running on your system. If you haven't installed Ozeki AI Gateway yet, follow our Installation on Linux guide or Installation on Windows guide to complete the initial setup.

Step 1 - Create auth config file

Create a new JSON file for your Xinference authentication configuration. Choose a location on your system where you'll store this file, such as C:\Xinference\auth_config.json on Windows. This file will contain your authentication settings including API keys and user credentials (Figure 1).

Create auth config file
Figure 1 - Open or create auth config file

Edit the auth config file and add your authentication configuration. The file includes sections for auth settings and user configuration. You can specify one or more API keys in the api_keys array. These keys will be used to authenticate requests to your Xinference server. Make sure to use secure, randomly generated API keys and passwords (Figure 2).

Set API keys in config file
Figure 2 - Set API keys in auth config file

Step 2 - Start Xinference from terminal

Open a terminal or command prompt and start the Xinference server with your auth config file. Use the xinference-local command with the --auth-config parameter pointing to your configuration file. Specify the host and port where the server should listen for connections (Figure 3).

xinference-local --host 127.0.0.1 --port 9997 --auth-config C:\Xinference\auth_config.json

Start Xinference from terminal
Figure 3 - Start Xinference server from terminal

Wait for Xinference to complete its startup process. The terminal will display initialization messages and confirm when the server is ready to accept connections. You should see messages indicating that the web UI and API endpoints are available (Figure 4).

Xinference started
Figure 4 - Xinference server started successfully
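If you start the server from a script rather than watching the terminal, you can poll the OpenAI-compatible endpoint until it answers. A minimal sketch using only the Python standard library; the URL, API key, and timeout values are assumptions based on the example configuration in this guide:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url="http://localhost:9997/v1/models",
                     api_key="sk-72tkvudyGLPMi", timeout=120):
    """Poll the Xinference API until it responds with 200 or the timeout elapses."""
    deadline = time.time() + timeout
    req = urllib.request.Request(url, headers={"Authorization": "Bearer " + api_key})
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(2)
    return False
```

Call wait_until_ready() right after launching the xinference-local process; it returns True once the API is accepting authenticated requests.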

Step 3 - Open and log in to Xinference web UI

Open your web browser and navigate to the Xinference web UI. This opens the Xinference management interface where you can deploy models, manage configurations, and monitor your AI infrastructure (Figure 5).

Open Xinference web UI
Figure 5 - Open Xinference web UI

Enter the admin credentials you configured in your auth config file. Use the username and password you specified in the user_config section. Click the sign in button to access the Xinference dashboard (Figure 6).

Login to admin account
Figure 6 - Login to admin account

Step 4 - Configure a model of choice

In the Xinference dashboard, browse the available models and select one to deploy. Choose a model that fits your requirements and hardware capabilities (Figure 7).

Select a model of choice
Figure 7 - Select a model to deploy

Configure the model deployment settings. You can specify parameters such as the model format, quantization level, context length, and other performance settings. Adjust these settings based on your hardware capabilities and use case requirements (Figure 8).

Enter model settings
Figure 8 - Configure model settings

Step 5 - Launch the model

Click the launch button to deploy the model. Xinference will begin loading the model into memory and preparing it for inference. This process may take several minutes depending on the model size and your system's capabilities (Figure 9).

Launch model
Figure 9 - Launch the selected model

Wait for the model to finish loading and verify that it's running successfully. The dashboard will show the model status as Running when deployment is complete. You can now use this model through the Xinference API (Figure 10).

Model launched successfully
Figure 10 - Model launched and ready for use
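You can also confirm the deployment from code by querying the OpenAI-compatible /v1/models endpoint with one of your API keys. A sketch with an optional fetch parameter so the request-building logic can be exercised without a live server; the base URL and key are the examples from this guide:

```python
import json
import urllib.request

def list_models(base_url="http://localhost:9997/v1",
                api_key="sk-72tkvudyGLPMi", fetch=None):
    """Return the IDs of models currently served by the Xinference server."""
    req = urllib.request.Request(
        base_url + "/models",
        headers={"Authorization": "Bearer " + api_key},
    )
    if fetch is None:
        # Default transport: a real HTTP call to the server
        def fetch(r):
            with urllib.request.urlopen(r, timeout=10) as resp:
                return resp.read()
    body = json.loads(fetch(req))
    return [m["id"] for m in body.get("data", [])]
```

If your launched model appears in the returned list, it is ready to serve inference requests through the API.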

Step 6 - Open Providers page in Ozeki AI Gateway

Open the Ozeki AI Gateway web interface and navigate to the Providers page. This is where you'll configure Xinference as a new provider using the API key you specified in the auth config file (Figure 11).

Open Providers page
Figure 11 - Open Providers page in Ozeki AI Gateway

Step 7 - Create provider

Click the "New" button to begin the provider creation process. This opens a form where you'll enter the connection details for your Xinference server (Figure 12).

Add new provider
Figure 12 - Click add new provider button

Fill in the provider configuration form with your Xinference server's details. Enter a descriptive provider name, select "OpenAI compatible" as the provider type, specify the API endpoint URL (http://localhost:9997/v1), and paste one of the API keys from your auth config file. Select the model you deployed from the dropdown list, then click "Create" to save the configuration (Figure 13).

http://localhost:9997/v1

Enter provider details and create
Figure 13 - Enter Xinference provider details and create

After creation, the Xinference provider is added to Ozeki AI Gateway. The provider now appears in the list, confirming that your gateway can route requests to your local Xinference server. You can now create routes that allow users to access your self-hosted AI models through the gateway (Figure 14).

Provider created successfully
Figure 14 - Xinference provider created successfully

Step 8 - Test the Xinference provider

The following video demonstrates how to test your Xinference provider by sending a test prompt and verifying the response.

To verify that your Xinference provider is working correctly, navigate to the provider test page in Ozeki AI Gateway. This interface allows you to send test prompts directly to your configured Xinference server (Figure 15).

Open provider test page
Figure 15 - Open provider test page

Select your Xinference provider and the deployed model, then enter a simple test prompt. Click the send button to submit the request to your Xinference server (Figure 16).

Send prompt to provider
Figure 16 - Send test prompt to Xinference provider

After sending the test prompt, verify that you receive a response from the Xinference server. A successful response confirms that your provider is configured correctly, the API key authentication is working, and the Xinference server is processing requests as expected (Figure 17).

Check provider response
Figure 17 - Verify Xinference provider response
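The same check can be done from code by sending a chat completion request to the endpoint the gateway uses. A sketch assuming the example API key from auth_config.json; my-model is a placeholder for whichever model you launched in Step 5:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9997/v1"
API_KEY = "sk-72tkvudyGLPMi"  # one of the keys from auth_config.json

def build_chat_request(prompt, model="my-model"):
    """Build an OpenAI-style chat completion request for the Xinference server."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=payload,
        headers={
            "Authorization": "Bearer " + API_KEY,
            "Content-Type": "application/json",
        },
    )

def chat(prompt, model="my-model"):
    """Send one prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=60) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

A reply from chat("Hello") confirms the same path the gateway uses: the API key authenticates, and the deployed model answers.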

Conclusion

You have successfully set up a local Xinference server with API key authentication and configured it as a provider in Ozeki AI Gateway. Your gateway can now route requests to self-hosted AI models running on your own hardware, while access is managed centrally through the gateway. This setup lets you deploy and serve multiple AI models from your own infrastructure, keeping your data private, reducing cloud costs, and giving you full control over your AI deployment.

More information