Dr Jarosław Duda
(Jarek Duda)
Assistant professor at Institute of Computer Science (adiunkt),
Jagiellonian University
email:
jaroslaw.duda[at]uj.edu.pl
Short CV:
2015 Jagiellonian
University, Institute of Computer Science, assistant professor,
20132014 Purdue
University, NSF Center for Science of Information, Postdoctoral researcher (webpage),
20062012 Jagiellonian University, Cracow, PhD in
Theoretical Physics (thesis)
20042010 Jagiellonian University, Cracow, PhD in
Theoretical Computer Science (thesis)
20012006 Jagiellonian University, Cracow, MSc in
Theoretical Physics (thesis)
20002005 Jagiellonian University, Cracow, MSc in
Theoretical Mathematics (thesis)
19992004 Jagiellonian University, Cracow, MSc in
Computer Science (thesis)
Main research areas:
Information
theory/statistical physics  for my last MSc ([1] is its
translation) I have worked on optimal encoding with constraints on a lattice
(multidimensional generalization of Fibonacci coding), for example to improve
storage capacity by more precise head positioning. The maximizing capacity way
to choose statistical model (Maximal
Entropy Random Walk – Wikipedia,
2018 article) was further
developed for applications in physics as my second PhD. This 2006 MSc thesis
has also started ANS coding and has lead me to a few new coding approaches (slides):

Asymmetric Numeral Systems
(ANS, Wikipedia,
slides, PCS article) family of entropy coders (heart of data compressors).
Previously a compromise was needed: Huffman coding allowed for fast but
suboptimal compression, arithmetic coding for nearly optimal but slow (costly).
ANS offers compression ratio as arithmetic coding, at similar speed/cost as
Huffman coding. Here is a list of implementations and compressors using ANS.
For example Facebook ZSTD (also used e.g. in
Linux kernel, Android
operating system, was standardized
for email/html) and Apple LZFSE
(default in macOS and iOS) use Finite State Entropy
implementation of tANS variant, CRAM DNA compressor of European
Bioinformatics Institute, Google Draco and PIK, Dropbox
DivANS use rANS
variant. Additionally, chaotic behavior of tANS
makes it also perfect for simultaneous
encryption,

Constrained Coding:
generalization of the KuznetsovTsybakov problem: allowing to encode a message
under some constraints, which are known only to the sender. This generalization
allows to use statistical constraints, for example enforcing resemblance with a
given picture (grayness of a pixel becomes probability of using 1 in its
position). Natural applications are various watermarking/steganography purposes, for example to generate QRlike codes resembling a chosen image
(implementation
, ICIP paper, IEEE
Forensics & Security paper),

Joint Reconstruction Codes (JRC, implementation):
enhancement of the Fountain Codes
concept, which allows to reconstruct a message from any large enough subset of
packets. JRC additionally doesn’t need the sender to know the final individual
damage levels of packets – this knowledge is required in standard approach to
choose redundancy levels, but is often inaccurate or unavailable in reallife
scenarios. For example, while writing a storage medium we usually don’t know
how badly it will be damaged while reading. JRC allows the receivers to adapt
to the actual noise levels, treated as independent trust levels for each packet
while their joint reconstruction/error correction. Introduced continuous family
of rates based on Renyi entropy allow to estimate
statistical behavior of decoding (Pareto coefficient),

Correction trees
philosophy as improvement of sequential decoding for convolutional codes: using
larger state and bidirectional decoding, making it complementary alternative
for stateofart method (implementation).
It also allows to handle synchronization errors like deletion channel.
Machine
learning – searching for mathematically more sophisticated, but still
practical methods. For example molecular
shape descriptors (slides) for
virtual screening – parametrization of shape by fitting general bending of
molecule, then modelling crosssection as evolving ellipse. More general rapid
parametric density estimation models density as just linear combination
of chosen functions e.g. polynomials, what allows for very inexpensive estimation
and approaching real distribution. For example for multiple variables –
allowing for hierarchical correlation reconstruction:
decomposing into correlations between various subsets of variables, perfect for
example for modelling and imputation in missing data case, or for
biologyinspired artificial neurons (slides). It is
also perfect for systematic enhancement of financial ARMA/ARCHlike models:
with proper tail handling, approaching any real joint distribution, allowing to
model its time evolution for nonstationary time series – for example for predicting probability distribution of
values in time series, also multivariate
and credibility evaluation by
modelling conditional distribution.
Maximal
Entropy Random Walk (Wikipedia,
last
PhD, 2018 paper, slides):
standard stochastic models are based on philosophy that the object performs
successive random decisions using probabilities chosen arbitrarily by us. In
contrast, in statistical physics this randomness only represents our lack of
knowledge. Such models should be based on the maximal entropy principle (Jaynes), or equivalently: choosing e.g.
canonical ensemble, getting recent Maximal Entropy Random Walk (MERW) and its
extensions. Thanks of constructing models finally fulfilling this fundamental
mathematical requirement, in contrast to standard approach (which can be seen
as approximation), we finally get agreement with thermodynamical
expectations of quantum mechanics, like thermalization to the quantum
mechanical ground state probability density and Born rule: ‘squares’ relating
amplitudes and probabilities. My work on this subject has started with my
physics MSc thesis ([1] is its translation), where the equations were found for
information theory applications. The topic is continued in [5], [7], [8] and
[9]. Here is conductance
simulator to compare both philosophies.
Soliton
particle models (slides): Skyrme has
made popular the search for alternative approach to particle models – starting not as usually with leading to
many mathematical problems QFT perturbative approximation, but with trying to
understand the configuration of fields building the particle (e.g.
electromagnetic), which generally should maintain its structure (be a soliton),
for example because of topological constraints for spin and charge. Standard skyrmion approach introduces separate fields to model
single mesons or baryons – the perfect situation would be having just a single
field, which soliton family corresponds to our whole particle menagerie and
their dynamics with topological charges as quantum numbers. Working on MERW has
lead me to simple model which surprisingly well fulfills these requirements –
ellipsoid field ([7]). Here is short essay about it and
presentation.
Complex Base Numeral Systems
(first two MScs, slides) :
probably complete family of positional numeral systems with complex base, which
are ‘proper’ – representation function from digit sequences into a complex plane
is surjective and injective everywhere but a zero measure set (it’s
unavoidable, like 0.999(9)=1.000(0) ). Fractional part occurs to be simple
Iterated Function System (fractal). I have also introduced practical methods
for arithmetic in this representation, analytical tool to work with convex hull
of such simple fractals, to get analytical formulas for Hausdorff
dimension of boundary of such sets
and briefly generalization into higher dimensions.
It is described in [2] and [3], here is presentation about it.
Other interests and hobbies:

P vs NP, graph
isomorphism problem, (also for
quantum computing), Markov
fields, DNA reconstruction.

Biology,
e.g. evolutionism, neurobiology, biochemistry. For example chiral life concept (Wikipedia) –
as a computer scientist, while starting studying genetics I thought about
modifying the rules how triples of nucleotides are translated into aminoacids,
to get immunity by incompatibility with our viruses. This approach has a lot of
issues, but later in 2007 it has lead me to the possibility of synthesizing mirror
version of standard cells (original forum post).
It turns out that the race has recently started, e.g. in 2016 reaching
synthesis of mirror polymerase (enantiomer). While mirror life carries enormous
new possibilities including pathogenimmune humans, the dangers of such
synthetic life may include eradication of our life – mirror photosynthesizing
cyanobacteria could dominate our ecosystem. Hence, I believe there is now
required a wide discussion about the ongoing race to this synthesis.

Others:
dancing, climbing, biking, fencing, photography
Articles:
[1] J. Duda, Optimal encoding on
discrete lattice with translational invariant constrains using statistical
algorithms, arXiv:0710.3861
(2007),
[2] J. Duda, Analysis of the convex hull
of the attractor of an IFS, arXiv:0710.3863
(2007),
[3] J. Duda, Complex base numeral
systems, arXiv:0712.1309
(2007),
[4] J. Duda, Combinatorial invariants
for graph isomorphism problem, arXiv:0804.3615
(2008),
[5] Z. Burda, J. Duda,
J. M. Luck, B. Wacław, Localization of the Maximal
Entropy Random Walk, Phys.
Rev. Lett. 102, 160602 (2009),
[6] J. Duda, Asymmetric numeral systems,
arXiv:0902.0271 (2009),
[7] J. Duda, Fourdimensional
understanding of quantum mechanics, arXiv:0910.2724
(2009),
[8] Z. Burda, J. Duda,
J. M. Luck, B. Wacław, The various facets of random
walk entropy, Acta Phys. Polon.
B. 41/5 (2010),
[9] J. Duda, From Maximal Entropy Random
Walk to quantum thermodynamics, arXiv:1111.2253 (2011) (slides),
[10] J. Duda, P. Korus, Correction Trees
as an Alternative to Turbo Codes and Low Density Parity Check Codes, arXiv: 1204.5317
(2012),
[11] J. Duda, Optimal compression of
hashorigin prefix trees, arXiv:1206.4555 (2012) (slides),
[12] J. Duda, Embedding grayscale
halftone pictures in QR Codes using Correction Trees, arXiv:1211.1572
(2012) (slides),
[13] J. Duda, Asymmetric numeral
systems: entropy coding combining speed of Huffman coding with compression rate
of arithmetic coding, arXiv:1311.2540 (2013) (slides),
[14] J. Duda, Joint error
correction enhancement of the Fountain Codes concept, arXiv:1505.07056 (2015),
[15] J. Duda, Normalized
rotation shape descriptors and lossy compression of molecular shape, arXiv:1505:09211
(2015) (slides),
[16] J. Duda, G. Korcyl,
Designing dedicated data compression for physics experiments within FPGA
already used for data acquisition, arXiv:1511.00856 (2015),
[17] J. Duda, P. Korus, N. J. Gadgil, K. Tahboub, E. J. Delp, ImageLike 2D Barcodes Using Generalizations Of The KuznetsovTsybakov Problem, IEEE
Transactions on Information Forensics & Security volume 11, issue 4
(2016),
[18] J. Duda,
W. Szpankowski, A. Grama,
Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore
Sequencers, arXiv:1601.02420 (2016),
[19] J. Duda, DistortionResistant
Hashing for rapid search of similar DNA subsequence, arXiv:1602.05889
(2016),
[20] Y. Baryshnikov, J. Duda, W. Szpankowski,
Types of Markov Fields and Tilings, IEEE
Transactions of Information Theory volume 62, issue 8 (PDF) (2016),
[21] J. Duda, Nonuniform probability modulation for reducing energy consumption
of remote sensors, arXiv:1608.04271 (2016),
[22] J. Duda, Practical
estimation of rotation distance and induced partial order for binary trees, arXiv:1610.06023
(2016),
[23] A. Magner, J. Duda,
W. Szpankowski, A. Grama, Fundamental
Bounds for Sequence Reconstruction from Nanopore Sequencers, IEEE Transactions on
Molecular, Biological, and MultiScale Communications (2016),
[27] J. Duda, Improving Pyramid Vector
Quantizer with power projection, arXiv:1705.05285 (2017),
[28] J. Duda, Fourdimensional
understanding of quantum mechanics and computation, arXiv:0910.2724v2
(2017),
[29] J. Duda, Polynomialbased rotation
invariant features, arXiv:1801.01058 (2018),
[30] J. Duda, Hierarchical correlation
reconstruction with missing data, for example for biologyinspired neuron, arXiv:1804.06218
(2018) (slides),
[32] J. Duda, M. Snarska,
Modeling joint probability distribution of yield curve parameters, arXiv:1807.11743
(2018),
[33] J. Duda, Gaussian AutoEncoder, arXiv:1811.04751
(2018) (slides),
[34] J. Duda, A. Szulc,
Credibility evaluation of income data with hierarchical correlation
reconstruction, arXiv:1812.08040 (2018) (slides),
Conference papers:
[1] J. Duda, From Maximal Entropy Random
Walk to quantum thermodynamics, J. Phys.: Conf. Ser. 361 012039 (2012),
[2] Y. Baryshnikov, J. Duda, W. Szpankowski, Markov Fields Types and Tilings,
ISIT
2014 (2014),
[3] J. Duda, N. Gadgil,
K. Tahboud, E. J. Delp,
Generalizations of the KuznetsovTsybakov problem for
generating imagelike 2D barcodes, ICIP
2014 (2014),
[4] J. Duda, N. Gadgil,
K. Tahboud, E. J. Delp, The
use of Asymmetric Numeral Systems as an accurate replacement for Huffman
coding, PCS 2015, (PDF),
Here are 10 simulators presenting subjects I worked on in
intuitive, interactive way:
http://demonstrations.wolfram.com/author.html?author=Jarek+Duda
Some my implementations: https://github.com/JarekDuda
Quantum foundations seminar: http://th.if.uj.edu.pl/~dudaj/OPMK.pdf