
Leveraging LLMs for modernization through intelligent chunking, iterative prompting and reflection, and retrieval augmented generation (RAG).

spear-ai/janus-llm

Overview

Janus (janus-llm) uses LLMs to aid in the modernization of legacy IT systems. The repository currently supports the following:

  1. Chunking code in over 100 programming languages to fit within different model context windows, and adding the chunks to a Chroma vector database.
  2. Translating from one programming language to another on a file-by-file basis using an LLM, with varying results (via the translate.py script).
  3. Translating from a binary file to a programming language using Ghidra decompilation.
  4. Doing 1-3 with the janus CLI tool.
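The chunking step (capability 1) must split source files into pieces that fit a model's context window. janus's real chunker is syntax-aware and language-specific; the sketch below only illustrates the idea, with a character budget and greedy whole-line packing as stand-in assumptions:

```python
def chunk_code(source: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack whole lines into chunks no longer than max_chars.

    A single line longer than max_chars still becomes its own chunk;
    a real chunker would split such lines further.
    """
    chunks: list[str] = []
    current = ""
    for line in source.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

A real implementation would split on syntactic boundaries (functions, classes) rather than raw lines, and would measure size in model tokens rather than characters.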

Roadmap

Priorities

  1. Scripts interacting with Chroma Vector DB for RAG translation and understanding.
  2. Evaluation of outputs in CLI using LLM self-evaluation or static analysis.

Installation

pip install janus-llm

Installing from Source

Clone the repository:

git clone git@github.com:janus-llm/janus-llm.git

NOTE: Make sure you're using Python 3.10 or 3.11.

Then, install the requirements:

curl -sSkL https://install.python-poetry.org | python -
export PATH=$PATH:$HOME/.local/bin
poetry install

PlantUML example

Files of a single type can be "translated" to PlantUML diagrams. See the following example:

janus translate --input-dir INPUT_DIR --output-dir OUTPUT_DIR --source-lang SOURCE_LANG --target-lang plantuml --prompt-template janus-llm/janus/prompts/templates/uml-diagram/

Please note that using plantuml as a source language in regular translation is very likely broken.

RAG

Retrieval-augmented generation (RAG) is a framework that combines chunks of "context" retrieved from a database into a prompting scheme, giving the generative model more information at inference time. This fork supports RAG by adding the in-collection and n-db-results parameters at translation time.
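Conceptually, the retrieval step scores every stored chunk against the code being translated and keeps the best few (which is what n-db-results controls). Chroma does this with embedding similarity; in the sketch below, word overlap stands in for embedding distance, so everything except the general idea is an illustrative assumption:

```python
from collections import Counter


def top_n_chunks(query: str, chunks: list[str], n: int = 2) -> list[str]:
    """Rank stored chunks by word overlap with the query and keep the top n.

    Chroma ranks by embedding similarity; word overlap is a toy stand-in.
    """
    query_words = Counter(query.lower().split())

    def score(chunk: str) -> int:
        # Count words shared between the query and this chunk.
        return sum((query_words & Counter(chunk.lower().split())).values())

    return sorted(chunks, key=score, reverse=True)[:n]
```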

After initializing the database, create and embed a collection with the appropriate janus db commands. For example, to add a collection of cpp files from the start/ directory, a user might run:

janus db add collection-cpp --input-lang cpp --input-dir start/

Then, that user might craft a prompting scheme that includes the {CONTEXT} placeholder. At inference time, it is replaced by the top n-db-results chunks of context retrieved from that collection. An example can be found at ./prompts/templates/simple-rag/
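The substitution itself amounts to simple template filling. A sketch, assuming the retrieved chunks are joined with blank lines (the joining scheme is a guess, not janus's actual formatting):

```python
def fill_context(template: str, retrieved_chunks: list[str]) -> str:
    """Join the top retrieved chunks and splice them into the prompt template."""
    context = "\n\n".join(retrieved_chunks)
    return template.replace("{CONTEXT}", context)
```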

That user might finally pass the collection and their prompting scheme into the model during translation:

janus translate --input-dir start --source-lang cpp --output-dir finish-with-retrieval --in-collection collection-cpp --target-lang python --llm-name LLM_NAME --prompt-template ~/forked/janus-llm/janus/prompts/templates/simple-rag/ -n 2

Users can also perform a translation, build a collection out of the result, and ask the model to iterate on its first attempt. The creative application of context can have a big impact on the end result.

A collection's input type does not have to match the translation's source language, so a collection built from one language can be used when translating another.

Pipelines

Sometimes a user may want to run several janus commands in sequence. These can be handled with pipelines.

Our example script, header_machine, lives in /scripts/pipelines/. This pipeline copies, translates, and uses header files as context during cpp-to-python translation. When it succeeds, it can selectively integrate header content into translations that would otherwise be missing it.

The script requires that the template directory contain properly formatted simple/ and simple-rag-headers/ prompt directories. Like any other script, it may need to be made executable (e.g., with chmod) before it can run.

Here is some example usage:

./header_machine.sh -i INPUT_DIR -t PATH_TO/janus-llm/janus/prompts/templates -o OUTPUT_DIR

A user can use the -h flag to get help, and the -k flag to keep intermediate products (including translated header files).

Intermediate products are stored in a directory named with a UUID, so each run's outputs are unique and are not overwritten by later runs.
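The run-unique directory pattern can be sketched with the standard library (this illustrates the idea, not the script's actual code):

```python
import uuid
from pathlib import Path


def make_run_dir(base: Path) -> Path:
    """Create a uniquely named directory for one run's intermediate products."""
    # uuid4 is random, so concurrent or repeated runs get distinct names.
    run_dir = base / uuid.uuid4().hex
    run_dir.mkdir(parents=True)
    return run_dir
```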

Contributing

See our contributing pages.

Copyright

Copyright ©2024 The MITRE Corporation. ALL RIGHTS RESERVED. Approved for Public Release; Distribution Unlimited. Public Release Case Number 23-4084.
