Quick Example

curl -X POST https://fastapi.mymagic.ai/v1/completions \
  -H 'Authorization: Bearer <your personal access token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<model>",
    "question": "<your question>",
    "storage_provider": "<your storage provider>",
    "bucket_name": "<your bucket name>",
    "session": "<your session name>",
    "max_tokens": "<max number of tokens to generate>",
    "system_prompt": "<your system prompt>",
    "role_arn": "arn:aws:iam::<your aws account ID>:role/<your s3 access role>",
    "region": "<the region your bucket is in>",
    "return_output": "<boolean: whether to return the output in the response>",
    "input_json_file": "<name of the input json file in your bucket>",
    "structured_output": "<json schema for the response output>"
  }'
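
As a concrete sketch, the request below targets Llama3-70b with files in an S3 bucket. The bucket name, session name, account ID, role, question, and file name are all hypothetical placeholders; the "s3" value for storage_provider, the unquoted integer for max_tokens, and the JSON boolean for return_output are assumptions about the accepted types, so check them against the parameter list above:

curl -X POST https://fastapi.mymagic.ai/v1/completions \
  -H 'Authorization: Bearer <your personal access token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3_70b",
    "question": "Summarize each document in two sentences.",
    "storage_provider": "s3",
    "bucket_name": "my-bucket",
    "session": "my_session",
    "max_tokens": 600,
    "system_prompt": "You are a helpful assistant.",
    "role_arn": "arn:aws:iam::123456789012:role/my-s3-access-role",
    "region": "us-east-1",
    "return_output": true,
    "input_json_file": "input.json"
  }'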

Currently, the API supports the following LLMs:

  • Llama3-70b (replace <model> with llama3_70b)
  • Llama2-70b (replace <model> with llama2_70b)
  • Llama2-7b (replace <model> with llama2_7b)
  • CodeLlama-70b (replace <model> with codellama_70b)
  • Mixtral-8x7B (replace <model> with mixtral_8x7)
  • Mistral-7b (replace <model> with mistral_7b)

All our models are quantized and optimized for inference.

AWS S3

Please use the S3 access role you created in the previous step, and put your files for batch inference in a folder named <personal_access_token>/<session_name> in your S3 bucket. For example, if you name your session my_session, put your files in <personal_access_token>/my_session.
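
If you use the AWS CLI, uploading an input file for a session named my_session might look like the following sketch (the bucket name my-bucket and the file name input.json are hypothetical):

aws s3 cp input.json s3://my-bucket/<personal_access_token>/my_session/input.json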

GCS

To use GCS, set up a service account with the necessary permissions to access your bucket. Place your files for batch inference in the bucket under a folder named <personal_access_token>/<session_name>. For example, if you name your session my_session, put your files in <personal_access_token>/my_session.
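
With the gsutil tool from the Google Cloud SDK, the equivalent upload might look like this (again, the bucket name my-bucket and the file name input.json are hypothetical):

gsutil cp input.json gs://my-bucket/<personal_access_token>/my_session/input.json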