	# Plugin development

	Plugins can provide additional operations for an existing LynxKite environment,
	and they can also provide new environments.

	## Creating a new plugin

	`.py` files inside the LynxKite data directory are automatically imported each time a
	workspace is executed. You can create a new plugin by creating a new `.py` file in the
	data directory. LynxKite even includes an integrated editor for this purpose.
	Click "New code file" in the directory where you want to create the file.

	Plugins in subdirectories of the data directory are imported when executing workspaces
	within those directories. This allows you to create plugins that are only available
	in specific workspaces.

	You can also create and distribute plugins as Python packages. In this case the
	module name must start with `lynxkite_` for it to be automatically imported on startup.
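How LynxKite finds these packages is not described here. As a rough illustration of prefix-based discovery in general, the sketch below lists the installed top-level modules whose names start with `lynxkite_`. The helper name `find_plugin_modules` is hypothetical and not part of LynxKite.

```python
import pkgutil


def find_plugin_modules(prefix: str = "lynxkite_") -> list[str]:
    """Return the names of importable top-level modules that start with `prefix`."""
    return sorted(m.name for m in pkgutil.iter_modules() if m.name.startswith(prefix))


# Each discovered name could then be imported with importlib.import_module(name).
print(find_plugin_modules())
```

If your package does not show up in such a listing, check that it is installed in the same Python environment as LynxKite and that the module name (not only the distribution name) carries the prefix.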

	### Plugin dependencies

	When creating a plugin as a "code file", you can create a `requirements.txt` file in the same
	directory. This file will be used to install the dependencies of the plugin.

	## Adding new operations

	Any piece of Python code can easily be wrapped into a LynxKite operation.
	Let's say we have some code that calculates the length of a string column in a Pandas DataFrame:

	```python
	df["length"] = df["my_column"].str.len()
	```

	We can turn it into a LynxKite operation using the
	[`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator:

	```python
	import pandas as pd
	from lynxkite.core.ops import op

	@op("LynxKite Graph Analytics", "Get column length")
	def get_length(df: pd.DataFrame, *, column_name: str):
	    """
	    Gets the length of a string column.

	    Args:
	        column_name: The name of the column to get the length of.
	    """
	    df = df.copy()
	    df["length"] = df[column_name].str.len()
	    return df
	```

	Let's review the changes we made.

	### The `@op` decorator

	The [`@op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op) decorator registers a
	function as a LynxKite operation. The first argument is the name of the environment,
	the last argument is the name of the operation. Between the two, you can list the hierarchy of
	categories the operation belongs to. For example:

	```python
	@op("LynxKite Graph Analytics", "Machine learning", "Preprocessing", "Split train/test set")
	```

	When defining multiple operations, you can use
	[`ops.op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.op_registration)
	for convenience:

	```python
	from lynxkite.core import ops

	op = ops.op_registration("LynxKite Graph Analytics")

	@op("An operation")
	def my_op():
	    ...
	```
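To make the relationship between the two forms concrete, here is a toy model of a registry, not LynxKite's actual implementation. It assumes that a registration helper is simply the decorator with the environment already filled in. The `CATALOG` dictionary, the `" > "` separator, and the `toy_op` name are made up for this sketch.

```python
import functools

# Toy stand-in for an operation catalog: {environment: {operation id: function}}.
CATALOG: dict[str, dict[str, object]] = {}


def op(env: str, *names: str):
    """Toy decorator: the first argument is the environment, the rest is categories + name."""
    *categories, name = names

    def decorator(func):
        op_id = " > ".join([*categories, name])
        CATALOG.setdefault(env, {})[op_id] = func
        return func

    return decorator


def op_registration(env: str):
    """Return a decorator factory with the environment already filled in."""
    return functools.partial(op, env)


@op("Toy Environment", "Machine learning", "Preprocessing", "Split train/test set")
def split():
    ...


toy_op = op_registration("Toy Environment")


@toy_op("An operation")
def my_op():
    ...


print(sorted(CATALOG["Toy Environment"]))
```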

	### The function signature

	`*` in the list of function arguments marks the start of keyword-only arguments.
	The arguments before `*` will become _inputs_ of the operation. The arguments after `*` will
	be its _parameters_.

	```python
	#              /--- inputs ---\     /- parameters -\
	def get_length(df: pd.DataFrame, *, column_name: str):
	```

	LynxKite uses the type annotations of the function arguments to provide input validation,
	conversion, and the right UI on the frontend.
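The split between inputs and parameters can be observed with the standard `inspect` module. The following is a minimal sketch of how a framework could read a signature. It is not taken from LynxKite's source, `split_signature` is a hypothetical helper, and `dict` stands in for `pd.DataFrame` to keep the example dependency-free.

```python
import inspect


def split_signature(func):
    """Return (inputs, parameters) as two {name: annotation} dicts."""
    inputs, params = {}, {}
    for name, p in inspect.signature(func).parameters.items():
        if p.kind is inspect.Parameter.KEYWORD_ONLY:
            params[name] = p.annotation  # declared after `*`
        else:
            inputs[name] = p.annotation  # declared before `*`
    return inputs, params


def get_length(df: dict, *, column_name: str):  # `dict` stands in for pd.DataFrame
    ...


inputs, params = split_signature(get_length)
print(inputs)
print(params)
```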

	The types supported for inputs are determined by the environment. For graph analytics,
	the possibilities are:

	- `pandas.DataFrame`
	- `networkx.Graph`
	- [`lynxkite_graph_analytics.Bundle`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle)

	The inputs of an operation are automatically converted to the right type, when possible.

	To make an input optional, use an optional type, like `pd.DataFrame | None`.

	The position of the input and output connectors can be controlled using the
	[`@ops.input_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.input_position) and
	[`@ops.output_position`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.output_position)
	decorators. By default, inputs are on the left and outputs on the right.

	All parameters are stored in LynxKite workspaces as strings. If a type annotation is provided,
	LynxKite will convert the string to the right type and provide the right UI.

	- `str`, `int`, `float` are presented as a text box and converted to the given type.
	- `bool` is presented as a checkbox.
	- [`lynxkite.core.ops.LongStr`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.LongStr)
	is presented as a text area.
	- Enums are presented as a dropdown list.
	- Pydantic models are presented as their JSON string representations. (Unless you add custom UI
	for them.) They are converted to the model object when your function is called.
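Because parameters are stored as strings, the conversion step matters most for `bool` and enum types. The sketch below shows one way such a conversion could look. It is an illustration only: `convert_param` is a hypothetical helper, and both the accepted spellings of `True` and the lookup of enum members by value are assumptions, not LynxKite's documented behavior.

```python
import enum


class Method(enum.Enum):
    LLM = "LLM"
    GRAPH = "graph"


def convert_param(raw: str, annotation):
    """Convert a stored string to the annotated type (toy version)."""
    if annotation is bool:
        # bool("False") would be True, so compare the text explicitly.
        return raw.strip().lower() in ("true", "1", "yes")
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        return annotation(raw)  # look up the enum member by its value
    if annotation in (int, float, str):
        return annotation(raw)
    return raw  # unknown annotation: leave the string untouched


print(convert_param("42", int), convert_param("false", bool), convert_param("graph", Method))
```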

	### Slow operations

	If the function takes a significant amount of time to run, we must either:

	- Write an asynchronous function.
	- Pass `slow=True` to the `@op` decorator. LynxKite will run the function in a separate thread.

	`slow=True` also causes the results of the operation to be cached on disk. As long as
	its inputs don't change, the operation will not be run again. This is useful for both
	synchronous and asynchronous operations.
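The reason a blocking function needs its own thread is that it would otherwise stall the event loop. The standalone sketch below uses only `asyncio` from the standard library and does not involve LynxKite. `blocking_work` and `heartbeat` are made-up names. It shows a coroutine making progress while a blocking call runs in a worker thread.

```python
import asyncio
import time


def blocking_work(x: int) -> int:
    time.sleep(0.1)  # stands in for a slow, synchronous computation
    return x * 2


async def heartbeat(ticks: list[int]) -> None:
    for i in range(3):
        await asyncio.sleep(0.02)
        ticks.append(i)


async def main() -> tuple[int, list[int]]:
    ticks: list[int] = []
    # The blocking call runs in a worker thread, so the event loop keeps
    # serving the heartbeat coroutine in the meantime.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_work, 21),
        heartbeat(ticks),
    )
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```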

	### Documentation

	The docstring of the function is used as the operation's description. You can use
	Google-style or Numpy-style docstrings.
	(See [Griffe's documentation](https://mkdocstrings.github.io/griffe/reference/docstrings/).)

	The docstring can be omitted for simple operations like the one above.

	### Outputting results

	The return value of the function is the output of the operation. It will be passed to the
	next operation in the pipeline.

	An operation can have multiple outputs. In this case, the return value must be a dictionary,
	and the list of outputs must be declared in the `@op` decorator.

	```python
	@op("LynxKite Graph Analytics", "Train/test split", outputs=["train", "test"])
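	def test_split(df: pd.DataFrame, *, test_ratio: float = 0.1):
	    # Sample the test rows first and keep their original index,
	    # so that exactly those rows can be dropped from the training set.
	    test = df.sample(frac=test_ratio)
	    train = df.drop(test.index)
	    return {"train": train.reset_index(), "test": test.reset_index()}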
	```

	### Displaying results

	The outputs of the operation can be used by other operations. But we can also generate results
	that are meant to be viewed by the user. The different options for this are controlled by the `view`
	argument of the `@op` decorator.

	The `view` argument can be one of the following:

	- `matplotlib`: Just plot something with Matplotlib and it will be displayed in the UI.

	```python
	@op("LynxKite Graph Analytics", "Plot column histogram", view="matplotlib")
	def plot(df: pd.DataFrame, *, column_name: str):
	    df[column_name].value_counts().sort_index().plot.bar()
	```

	- `visualization`: Draws a chart using [ECharts](https://echarts.apache.org/examples/en/index.html).
	You need to return a dictionary with the chart configuration, which ECharts calls `option`.

	```python
	# Assumes `op` was created with `ops.op_registration(...)` and that
	# `core` is the `lynxkite_graph_analytics.core` module.
	@op("View loss", view="visualization")
	def view_loss(bundle: core.Bundle):
	    loss = bundle.dfs["training"].training_loss.tolist()
	    v = {
	        "title": {"text": "Training loss"},
	        "xAxis": {"type": "category"},
	        "yAxis": {"type": "value"},
	        "series": [{"data": loss, "type": "line"}],
	    }
	    return v
	```

	- `image`: Return an image as a
	[data URL](https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data)
	and it will be displayed.
	- `molecule`: Return a molecule as a PDB or SDF string, or an `rdkit.Chem.Mol` object.
	It will be displayed using [3Dmol.js](https://3Dmol.org/).
	- `table_view`: Return
	[`Bundle.to_dict()`](../reference/lynxkite-graph-analytics/core.md#lynxkite_graph_analytics.core.Bundle.to_dict).
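For the `image` view, the return value has to be a data URL. A minimal sketch of building one from raw image bytes with the standard library is shown below. `to_data_url` is a hypothetical helper, and the SVG payload is only a small self-contained example; use the MIME type that matches your image, such as `image/png`.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


svg = (
    b'<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">'
    b'<circle cx="10" cy="10" r="8"/></svg>'
)
url = to_data_url(svg, "image/svg+xml")
print(url[:40])
```

An operation declared with `view="image"` would then return such a string.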

	## Adding new environments

	A new environment means a completely new set of operations, and (optionally) a new
	executor. There's nothing to be done for setting up a new environment. Just start
	registering operations into it.

	### No executor

	By default, the new environment will have no executor. This can be useful!

	LynxKite workspaces are stored as straightforward JSON files and updated on every modification.
	You can use LynxKite for configuring workflows and have a separate system
	read the JSON files.

	Since the code of the operations is not executed in this case, you can create functions that do nothing.
	Alternatively, you can use the
	[`register_passive_op`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_passive_op)
	and
	[`passive_op_registration`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.passive_op_registration)
	functions to easily whip up a set of operations:

	```python
	from lynxkite.core.ops import passive_op_registration, Parameter as P

	op = passive_op_registration("My Environment")
	op('Scrape documents', params=[P('url', '')])
	op('Conversation logs')
	op('Extract graph')
	op('Compute embeddings', params=[P.options('method', ['LLM', 'graph', 'random']), P('dimensions', 1234)])
	op('Vector DB', params=[P.options('backend', ['ANN', 'HNSW'])])
	op('Chat UI', outputs=[])
	op('Chat backend')
	```

	### Built-in executors

	LynxKite comes with two built-in executors. You can register these for your environment
	and you're good to go.

	```python
	from lynxkite.core.executors import simple
	simple.register("My Environment")
	```

	The [`simple` executor](../reference/lynxkite-core/executors/simple.md)
	runs each operation once, passing the output of the preceding operation
	as the input to the next one. No tricks. You can use any types as inputs and outputs.
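The idea of running each operation once in dependency order can be modeled in a few lines. This is a toy model built on the standard `graphlib` module, not the code of the `simple` executor. The `nodes` dictionary and `run_once` are invented for the illustration.

```python
from graphlib import TopologicalSorter

# Toy workspace: each node has a function and the list of nodes feeding into it.
nodes = {
    "load": (lambda: [3, 1, 2], []),
    "sort": (lambda xs: sorted(xs), ["load"]),
    "total": (lambda xs: sum(xs), ["sort"]),
}


def run_once(nodes: dict) -> dict:
    """Run every node exactly once, after all of the nodes it depends on."""
    graph = {name: deps for name, (_, deps) in nodes.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = nodes[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_once(nodes)
print(results)
```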

	```python
	from lynxkite.core.executors import one_by_one
	one_by_one.register("My Environment")
	```

	The [`one_by_one` executor](../reference/lynxkite-core/executors/one_by_one.md)
	expects that the code for operations is the code for transforming
	a single element. If an operation returns an iterable, it will be split up
	into its elements, and the next operation is called for each element.

	Sometimes you need the full contents of an input. The `one_by_one` executor
	lets you choose between the two modes by the orientation of the input connector.
	If the input connector is horizontal (left or right), it takes single elements.
	If the input connector is vertical (top or bottom), it takes an iterable of all the incoming data.

	A unique advantage of this setup is that a workspace can contain loops, as long as they
	go through horizontal inputs. Just make sure that loops eventually discard all elements,
	so you don't end up with an infinite loop.
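The element-wise behavior can be pictured with a small toy model. The sketch below is not the `one_by_one` executor itself and ignores connectors and loops. It only shows iterable results being split into elements for the next operation, and a final step that receives all elements at once, as a vertical input would. All function names are invented for the example.

```python
from collections.abc import Iterable


def run_one_by_one(source, ops):
    """Feed elements through `ops`; iterable results are split into elements."""
    items = list(source)
    for op in ops:
        next_items = []
        for item in items:
            out = op(item)
            if isinstance(out, Iterable) and not isinstance(out, (str, bytes)):
                next_items.extend(out)  # split the result into its elements
            else:
                next_items.append(out)
        items = next_items
    return items


def split_words(line: str):
    return line.split()  # one line in, several words out


def shout(word: str):
    return word.upper()  # called once per word


def collect(all_words):
    return sorted(all_words)  # sees the full collection at once


words = run_one_by_one(["b a", "c"], [split_words, shout])
print(collect(words))
```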

	### Custom executors

	A custom executor can be registered using
	[`@ops.register_executor`](../reference/lynxkite-core/ops.md#lynxkite.core.ops.register_executor).

	```python
	from lynxkite.core import ops, workspace

	ENV = "My Environment"

	@ops.register_executor(ENV)
	async def execute(ws: workspace.Workspace):
	    catalog = ops.CATALOGS[ws.env]
	    ...
	```

	The executor must be an asynchronous function that takes a
	[`workspace.Workspace`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.Workspace)
	as an argument. The return value is ignored and it's up to you how you process the workspace.

	To update the frontend as the executor processes the workspace, call
	[`WorkspaceNode.publish_started`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_started)
	when starting to execute a node, and
	[`WorkspaceNode.publish_result`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_result)
	to publish the results. Use
	[`WorkspaceNode.publish_error`](../reference/lynxkite-core/workspace.md#lynxkite.core.workspace.WorkspaceNode.publish_error)
	if the node failed.
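The publishing calls typically follow a started, then result-or-error pattern for each node. The sketch below shows that control flow with a stand-in class instead of the real `WorkspaceNode`. `FakeNode`, its `func` attribute, and the argument lists of its methods are assumptions made for this example; consult the reference documentation for the real signatures.

```python
import asyncio


class FakeNode:
    """Stand-in for a workspace node; it only records what was published."""

    def __init__(self, name, func):
        self.name, self.func, self.events = name, func, []

    def publish_started(self):
        self.events.append("started")

    def publish_result(self, result):
        self.events.append(("result", result))

    def publish_error(self, error):
        self.events.append(("error", str(error)))


async def execute(nodes):
    for node in nodes:
        node.publish_started()
        try:
            node.publish_result(node.func())
        except Exception as e:
            # Report the failure on this node and continue with the others.
            node.publish_error(e)


ok = FakeNode("ok", lambda: 1)
bad = FakeNode("bad", lambda: 1 / 0)
asyncio.run(execute([ok, bad]))
print(ok.events, bad.events)
```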