mirror of
https://github.com/lucaspalomodevelop/indico-plugins.git
synced 2026-03-13 15:34:37 +00:00
215 lines
8.4 KiB
Python
# This file is part of the Indico plugins.
# Copyright (C) 2002 - 2021 CERN
#
# The Indico plugins are free software; you can redistribute
# them and/or modify them under the terms of the MIT License;
# see the LICENSE file for more details.

from flask_pluginengine import depends, trim_docstring
from sqlalchemy.orm import subqueryload

from indico.core.plugins import IndicoPlugin, PluginCategory
from indico.modules.attachments.models.attachments import Attachment
from indico.modules.categories import Category
from indico.modules.categories.models.principals import CategoryPrincipal
from indico.modules.events.contributions.models.contributions import Contribution
from indico.modules.events.contributions.models.subcontributions import SubContribution
from indico.modules.events.models.events import Event
from indico.modules.events.notes.models.notes import EventNote
from indico.util.date_time import now_utc
from indico.util.decorators import classproperty

from indico_livesync.forms import AgentForm
from indico_livesync.initial import (apply_acl_entry_strategy, query_attachments, query_contributions, query_events,
                                     query_notes, query_subcontributions)
from indico_livesync.models.queue import LiveSyncQueueEntry
from indico_livesync.plugin import LiveSyncPlugin

@depends('livesync')
class LiveSyncPluginBase(IndicoPlugin):  # pragma: no cover
    """Base class for livesync plugins"""

    #: dict containing the backend(s) provided by the plugin; the keys are unique identifiers
    backend_classes = None
    category = PluginCategory.synchronization

    def init(self):
        super().init()
        for name, backend_class in self.backend_classes.items():
            assert backend_class.plugin is None
            backend_class.plugin = type(self)
            LiveSyncPlugin.instance.register_backend_class(name, backend_class)

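The `init()` hook above ties each backend class to its plugin and registers it with the core livesync plugin under a unique name. The registration pattern can be sketched standalone; `Registry`, `FakeBackend` and `FakePlugin` below are hypothetical stand-ins, not part of the real Indico API:

```python
class Registry:
    """Stand-in for LiveSyncPlugin.instance's backend registry."""

    def __init__(self):
        self.backends = {}

    def register_backend_class(self, name, backend_class):
        assert name not in self.backends, 'duplicate backend name'
        self.backends[name] = backend_class


class FakeBackend:
    plugin = None  # set automatically during registration


class FakePlugin:
    backend_classes = {'fake': FakeBackend}

    def init(self, registry):
        # Same wiring as LiveSyncPluginBase.init() above
        for name, backend_class in self.backend_classes.items():
            assert backend_class.plugin is None
            backend_class.plugin = type(self)
            registry.register_backend_class(name, backend_class)


registry = Registry()
FakePlugin().init(registry)
print(registry.backends['fake'].plugin is FakePlugin)  # True
```

The `assert backend_class.plugin is None` guard ensures a backend class is never claimed by two plugins.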
class LiveSyncBackendBase:
    """Base class for livesync backends"""

    #: the plugin containing the agent
    plugin = None  # set automatically when the agent is registered
    #: the `Uploader` to use; only needed if `run` and `run_initial_export` are not overridden
    uploader = None
    #: the form used when creating/editing the agent
    form = AgentForm
    #: whether only one agent with this backend is allowed
    unique = False
    #: whether a reset also deletes the indexed data on the backend, or the user
    #: needs to delete it themselves after performing a reset
    reset_deletes_indexed_data = False

    @classproperty
    @classmethod
    def title(cls):
        parts = trim_docstring(cls.__doc__).split('\n', 1)
        return parts[0].strip()

    @classproperty
    @classmethod
    def description(cls):
        parts = trim_docstring(cls.__doc__).split('\n', 1)
        try:
            return parts[1].strip()
        except IndexError:
            return 'no description available'

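Because `title` and `description` are derived from the backend's docstring, a backend author only has to write the docstring: the first line becomes the title and the remainder becomes the description. A standalone sketch of the same split logic, using `inspect.cleandoc` in place of `trim_docstring` and a hypothetical `DemoBackend`:

```python
from inspect import cleandoc  # stand-in for flask_pluginengine's trim_docstring


class DemoBackend:
    """Demo search backend

    Synchronizes Indico data with a demo service.
    """


def docstring_parts(cls):
    # First line -> title, remainder -> description (same split as above)
    parts = cleandoc(cls.__doc__).split('\n', 1)
    title = parts[0].strip()
    description = parts[1].strip() if len(parts) > 1 else 'no description available'
    return title, description


print(docstring_parts(DemoBackend))
# ('Demo search backend', 'Synchronizes Indico data with a demo service.')
```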
    def __init__(self, agent):
        """
        :param agent: a `LiveSyncAgent` instance
        """
        self.agent = agent

    def is_configured(self):
        """Check whether the backend is properly configured.

        If this returns ``False``, running the initial export or
        processing the queue will not be possible.
        """
        return True

    def check_queue_status(self):
        """Return whether queue runs are allowed (or why not).

        :return: ``allowed, reason`` tuple; the reason is ``None`` if runs are allowed.
        """
        if not self.is_configured():
            return False, 'not configured'
        if self.agent.initial_data_exported:
            return True, None
        return False, 'initial export not performed'

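The `allowed, reason` contract of `check_queue_status` can be exercised in isolation. The sketch below replicates the same decision logic with hypothetical `FakeAgent`/`FakeBackend` stand-ins (the real agent is a `LiveSyncAgent` database model):

```python
class FakeAgent:
    def __init__(self, initial_data_exported):
        self.initial_data_exported = initial_data_exported


class FakeBackend:
    def __init__(self, agent, configured=True):
        self.agent = agent
        self.configured = configured

    def is_configured(self):
        return self.configured

    def check_queue_status(self):
        # Same decision order as above: configuration first, then export state
        if not self.is_configured():
            return False, 'not configured'
        if self.agent.initial_data_exported:
            return True, None
        return False, 'initial export not performed'


print(FakeBackend(FakeAgent(True)).check_queue_status())   # (True, None)
print(FakeBackend(FakeAgent(False)).check_queue_status())  # (False, 'initial export not performed')
```

Note that the configuration check takes precedence: an unconfigured backend reports `'not configured'` even if the initial export already ran.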
    def fetch_records(self, count=None):
        query = (self.agent.queue
                 .filter_by(processed=False)
                 .order_by(LiveSyncQueueEntry.timestamp)
                 .limit(count))
        return query.all()

    def update_last_run(self):
        """Update the last run timestamp.

        Don't forget to call this if you implement your own `run` method!
        """
        self.agent.last_run = now_utc()

    def process_queue(self, uploader):
        """Process queued entries during an export run."""
        records = self.fetch_records()
        LiveSyncPlugin.logger.info('Uploading %d records via %s', len(records), self.uploader.__name__)
        uploader.run(records)

    def run(self, verbose=False, from_cli=False):
        """Run the livesync export."""
        if self.uploader is None:  # pragma: no cover
            raise NotImplementedError

        uploader = self.uploader(self, verbose=verbose, from_cli=from_cli)
        self.process_queue(uploader)
        self.update_last_run()

    def get_initial_query(self, model_cls, force):
        """Get the initial export query for a given model.

        Supported models are `Event`, `Contribution`, `SubContribution`,
        `Attachment` and `EventNote`.

        :param model_cls: The model class to query
        :param force: Whether the initial export was started with ``--force``
        """
        fn = {
            Event: query_events,
            Contribution: query_contributions,
            SubContribution: query_subcontributions,
            Attachment: query_attachments,
            EventNote: query_notes,
        }[model_cls]
        return fn()

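`get_initial_query` is a plain dict dispatch on the model class: each supported model maps to its query helper, and an unsupported model raises `KeyError`. A minimal standalone sketch (the stub lambdas and the unsupported `Abstract` class are hypothetical, standing in for the real models and `query_*` helpers):

```python
class Event: ...
class Contribution: ...
class Abstract: ...  # not in the dispatch table; shows the failure mode

QUERIES = {
    Event: lambda: 'events query',
    Contribution: lambda: 'contributions query',
}


def get_initial_query(model_cls):
    # Same dict dispatch as above; a KeyError signals an unsupported model
    return QUERIES[model_cls]()


print(get_initial_query(Event))  # events query
```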
    def run_initial_export(self, batch_size, force=False, verbose=False):
        """Run the initial export.

        This process is expected to take a very long time.

        :return: ``True`` if everything was successful, ``False`` if not
        """
        if self.uploader is None:  # pragma: no cover
            raise NotImplementedError

        uploader = self.uploader(self, verbose=verbose, from_cli=True)

        Category.allow_relationship_preloading = True
        Category.preload_relationships(Category.query, 'acl_entries',
                                       strategy=lambda rel: apply_acl_entry_strategy(subqueryload(rel),
                                                                                     CategoryPrincipal))
        _category_cache = Category.query.all()  # noqa: F841

        events = self.get_initial_query(Event, force)
        contributions = self.get_initial_query(Contribution, force)
        subcontributions = self.get_initial_query(SubContribution, force)
        attachments = self.get_initial_query(Attachment, force)
        notes = self.get_initial_query(EventNote, force)

        print('Exporting events')
        if not uploader.run_initial(events.yield_per(batch_size), events.count()):
            print('Initial export of events failed')
            return False
        print('Exporting contributions')
        if not uploader.run_initial(contributions.yield_per(batch_size), contributions.count()):
            print('Initial export of contributions failed')
            return False
        print('Exporting subcontributions')
        if not uploader.run_initial(subcontributions.yield_per(batch_size), subcontributions.count()):
            print('Initial export of subcontributions failed')
            return False
        print('Exporting attachments')
        if not uploader.run_initial(attachments.yield_per(batch_size), attachments.count()):
            print('Initial export of attachments failed')
            return False
        print('Exporting notes')
        if not uploader.run_initial(notes.yield_per(batch_size), notes.count()):
            print('Initial export of notes failed')
            return False
        return True

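The export above streams each query with `yield_per(batch_size)`, so rows are fetched from the database in fixed-size chunks instead of being loaded into memory all at once. A pure-Python sketch of that batching behavior (an illustration of the semantics, not SQLAlchemy's actual implementation):

```python
def in_batches(rows, batch_size):
    """Yield items from `rows` in lists of at most `batch_size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


print(list(in_batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Keeping at most one batch in memory is what makes the initial export feasible on large instances, where a single query can return hundreds of thousands of rows.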
    def check_reset_status(self):
        """Return whether a reset is allowed (or why not).

        When resetting is not allowed, the reason indicates why this is the case.

        :return: ``allowed, reason`` tuple; the reason is ``None`` if resetting is allowed.
        """
        if not self.agent.queue.has_rows() and not self.agent.initial_data_exported:
            return False, 'There is nothing to reset'
        return True, None

    def reset(self):
        """Perform a full reset of all data related to the backend.

        This deletes all queued changes, resets the initial export state
        back to pending, and performs any other backend-specific tasks
        that may be required.

        It is not necessary to delete the actual search indexes (which may
        live on a remote service), but if your backend is able to do so,
        you may want to delete them and display a message to the user
        indicating this.
        """
        self.agent.initial_data_exported = False
        self.agent.queue.delete()