Approximately Aligned Decoding
ABSTRACT
It is common to reject undesired outputs of Large Language Models (LLMs);
however, current methods to do so require an excessive amount of computation,
or severely distort the distribution of outputs. We present a method to balance the
distortion of the output distribution with computational efficiency, allowing for the
generation of long sequences of text with difficult-to-satisfy constraints, with less
amplification of low probability outputs compared to existing methods. We show
through a series of experiments that the task-specific performance of our method
is comparable to methods that do not distort the output distribution, while being
much more computationally efficient.
1 INTRODUCTION
Language models sometimes generate undesirable outputs, such as syntactically-incorrect code, hal-
lucinated PII, or profanity. These conditions, which we collectively refer to as errors for the remain-
der of the paper, can be detected with incremental parsers, regular expression matching, or even
simple substring searches. However, once detection occurs, there are several competing methods for
mitigating errors in the output.
One set of methods, constrained generation (Beurer-Kellner et al., 2024; Geng et al., 2024; Melcer
et al., 2024), avoids errors by disabling the generation of any token that immediately leads to such
an error. While this method is effective, it can lead to the amplification of low-probability outputs.
Another class of methods avoids errors without any amplification of low-probability outputs, at the
cost of additional computation. Rejection sampling is the simplest such method; i.e. if the output
contains an error, simply generate another sample until the output is acceptable. Adaptive Sam-
pling with Approximate Expected Futures (ASAp) (Park et al., 2024) provides a performance im-
provement over rejection sampling while maintaining the output distribution by effectively sampling
without replacement, but there are still many situations in which it may converge too slowly.
∗ Work performed at Amazon
We present a method that allows for a useful midpoint in the tradeoff between computational effi-
ciency and maintenance of the output distribution. In our experiments, we show that our method
obtains task-specific performance on par with ASAp, while converging significantly faster when the
constraints are difficult to satisfy.
2 PRELIMINARIES
We first describe autoregressive language models and their properties. We then discuss speculative
decoding, a method closely related to the algorithm that we will introduce.
Language models with many parameters (“Large Language Models”, or LLMs) are useful, but can
require considerable computational resources to evaluate. Moreover, the autoregressive sampling
process is inherently sequential, meaning that additional parallel computation resources cannot be
fully utilized to decrease generation latency, especially for longer sequences.
Speculative decoding (Leviathan et al., 2023; Miao et al., 2024; Cai et al., 2024; Li et al., 2024) is
one popular approach to decrease latency. This method assumes the existence of a small speculative
model (SSM) S that approximates the output of the LLM, using fewer computational resources.
Given input tokens x1...n , the SSM is autoregressively sampled to produce tokens xn+1...m . Then, the LLM P is used to compute P (xi+1 |x1...i ) for i ∈ [n . . . m]; this computation is
parallelizable. Finally, Algorithm 2 is used to select a prefix x1...k for k ∈ [n, m] of tokens to accept;
all later tokens are discarded. Additionally, because the probabilities P (·|x1...k ) have already been
computed, Algorithm 2 samples a new token xk+1 . This process maintains the property that the
distribution of sequences produced by this process matches the sequence distribution of P .
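The accept/reject rule at the heart of this procedure can be sketched as follows. This is an illustrative reimplementation of the standard rule from Leviathan et al. (2023), not the paper's Algorithm 2 itself; the function name and array conventions are ours.

```python
import numpy as np

def spec_accept(draft_tokens, ssm_probs, llm_probs, rng):
    """Accept a prefix of draft_tokens (sampled from the SSM) so that the
    accepted tokens are distributed as if sampled directly from the LLM.

    ssm_probs[j]: SSM next-token distribution used to sample draft_tokens[j].
    llm_probs[j]: LLM next-token distribution at the same position; llm_probs
                  has one extra entry, for the position after the last draft token.
    Returns the accepted tokens plus one freshly sampled token.
    """
    out = []
    for j, tok in enumerate(draft_tokens):
        # Accept the draft token with probability min(1, P(tok) / S(tok)).
        if rng.random() < min(1.0, llm_probs[j][tok] / ssm_probs[j][tok]):
            out.append(tok)
        else:
            # On rejection, resample from the residual distribution
            # proportional to max(0, P - S); this restores the LLM's marginal.
            residual = np.clip(llm_probs[j] - ssm_probs[j], 0.0, None)
            out.append(int(rng.choice(len(residual), p=residual / residual.sum())))
            return out
    # Every draft token was accepted: sample one more token from the LLM's
    # already-computed distribution at the next position.
    out.append(int(rng.choice(len(llm_probs[-1]), p=llm_probs[-1])))
    return out
```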
[Figure 1 diagram: token trees showing P(· | x1...n) under the unconstrained, ideal, and constrained distributions.]
Figure 1: Sampling in an example domain where B = {AA}. (Left) The language model assigns
equal probability to all sequences. (Center) With ‘AA’ as an error, its probability mass should be
equally redistributed to all other sequences. (Right) With constrained generation, the entire proba-
bility mass of ‘AA’ is shifted onto ‘AB’, significantly overrepresenting its probability.
Problem 1. Given an autoregressive language model P over alphabet V and error set ℬ ⊂ V∗, provide a method to sample from P̂^ℬ, i.e. the distribution of P conditioned on never producing a sequence in ℬ.
Constrained generation attempts to solve the error-free generation problem by using a greedy algorithm: during token selection, the algorithm always avoids selecting any tokens that immediately lead to an error. Note that this algorithm assumes that if string x1...n ∉ ℬ, then there exists at least one available token xn+1 ∈ V such that x1...n+1 ∉ ℬ; however, this assumption may be weakened if backtracking is allowed, in cases where every token leads to an immediate error.
The constrained generation algorithm has the effect of sampling from the following probability
distribution for each token:
$$
CP^{\mathcal{B}}(x_i \mid x_{1\ldots i-1}) = \mathrm{Normalize}\left(\begin{cases} 0 & x_{1\ldots i} \in \mathcal{B} \\ P(x_i \mid x_{1\ldots i-1}) & x_{1\ldots i} \notin \mathcal{B} \end{cases}\right) \qquad (2)
$$
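A single step of this scheme is easy to sketch; the snippet below is our illustration, assuming a vocabulary-sized probability vector from the model and an is_error predicate over token sequences (both names hypothetical).

```python
import numpy as np

def constrained_step(prefix, next_probs, is_error, rng):
    """Sample one token from CP^B(. | prefix): zero out every token that would
    immediately create an error, then renormalize the remaining mass."""
    masked = np.array(next_probs, dtype=float)
    for tok in range(len(masked)):
        if masked[tok] > 0.0 and is_error(prefix + [tok]):
            masked[tok] = 0.0
    total = masked.sum()
    if total == 0.0:
        # Every continuation is an immediate error; practical implementations
        # backtrack here (see the discussion of the prefix assumption above).
        raise RuntimeError("dead end: every next token leads to an error")
    return int(rng.choice(len(masked), p=masked / total))
```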
As in Section 2.1, a repeated sampling process results in a derived distribution CP^ℬ(x1...n) = ∏_{i∈[1...n]} CP^ℬ(xi | x1...i−1). It is often the case that for sequence x1...n, CP^ℬ(x1...n) ≫ P̂^ℬ(x1...n); i.e. low-probability samples are amplified by the constrained generation process.
The fundamental issue is that the constrained generation algorithm commits to a given prefix, even
if the most probable sequences beginning with that prefix are errors. Figure 1 provides a simple
example of this occurrence. Note that this distortion is even worse in low-entropy scenarios; if
P (B|x1...n , A) were lowered to 0.0001, it would still be the case that CPB (AB|x1...n ) = 0.5. This
amplification effect compounds exponentially for longer sequences.
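Concretely, in the Figure 1 domain with P(B | x1...n A) lowered to 0.0001, the two distributions assign the continuation 'AB' very different mass (our arithmetic, following the definitions above):

$$
CP^{\mathcal{B}}(\mathrm{AB} \mid x_{1\ldots n}) = 0.5 \cdot \frac{0.0001}{0.0001} = 0.5,
\qquad
\hat{P}^{\mathcal{B}}(\mathrm{AB} \mid x_{1\ldots n}) = \frac{0.5 \cdot 0.0001}{1 - 0.5 \cdot 0.9999} \approx 0.0001,
$$

so constrained generation overrepresents 'AB' by a factor of roughly 5000 in this single step.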
Algorithm 3 ASAp
procedure ASAP(P, ℬ, x1...n)                        ▷ x1...n is prompt
    P̂^B ← P
    while Limit not reached do
        Sample sequence xn+1...m ∼ P̂^B(· | x1...n) until error or stopping condition
        if x1...m ∉ ℬ then break
        P̂^B ← ADDBADSAMPLE(P̂^B, x1...m)            ▷ Remove x1...m as a possible sequence
    return x1...m
procedure ADDBADSAMPLE(P̂^B, x1...m)                ▷ In practice, only adjust xn+1...m
    P̂^{B∪{x}} ← P̂^B
    for xi ∈ (xm, . . . , x1) do                    ▷ Note that token sequence is reversed
        ▷ Remove probability of x1...m, without changing probability of any other sequence ◁
        P̂^{B∪{x}}(xi | x1...i−1) ← P̂^B(xi | x1...i−1) − P̂^B(xi...m | x1...i−1)
        Renormalize P̂^{B∪{x}}(· | x1...i−1)
    return P̂^{B∪{x}}
Adaptive Sampling with Approximate Expected Futures (ASAp) (Park et al., 2024) is a technique to sample exactly from the distribution P̂^ℬ. ASAp begins similarly to rejection sampling, but it iteratively builds a set B ⊆ ℬ containing all encountered samples that have been rejected so far. Because B is finite, the conditional probabilities P̂^B(xi | x1...i−1) can be efficiently calculated, allowing the algorithm to sample from P̂^B exactly. If the sampled sequence is a member of ℬ, it is added to B, and the sampling process repeats.
In the limit of repeated samples, B will approach ℬ, and therefore P̂^B will approach P̂^ℬ. Importantly, if x ∼ P̂^B is sampled such that x ∉ ℬ, this sample may be accepted, even though B ≠ ℬ. This procedure is equivalent to sampling without replacement, adapted to autoregressive generation. While ASAp succeeds in cases where only a small number of errors comprise the majority of the probability mass, its generation speed suffers when there are a large number of errors—each error must be discovered before it is added to B. When the error set is dense, its performance characteristics are similar to rejection sampling, as there are an exponential number of error sequences that must be discovered as generation length increases.
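The ADDBADSAMPLE update lends itself to a compact implementation: keep a dictionary of adjusted next-token distributions, keyed by prefix, and walk the rejected sequence backwards. The sketch below is our reconstruction of Algorithm 3's update, with base_probs standing in for a call to the underlying model (a hypothetical helper).

```python
import numpy as np

def add_bad_sample(cond, base_probs, x, n):
    """Fold a rejected sequence x (with prompt length n) into the adjusted
    conditionals `cond`, so that the adjusted model assigns it zero probability
    without changing the probability of any other sequence.

    cond: dict mapping a prefix (tuple of token ids) to its adjusted
          next-token distribution; prefixes not present fall back to base_probs.
    """
    suffix_prob = 1.0  # P-hat^B(x_{j..end} | x_{1..j-1}), accumulated from the end
    for j in range(len(x) - 1, n - 1, -1):
        prefix = tuple(x[:j])
        dist = np.array(cond.get(prefix, base_probs(prefix)), dtype=float)
        suffix_prob *= dist[x[j]]       # use the old conditional, before adjustment
        dist[x[j]] -= suffix_prob       # remove the mass of the rejected suffix
        cond[prefix] = dist / dist.sum()
    return cond
```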
We note two additional methods that, although they use very different formalizations and implementations from each other, rely on a similar core idea to approximately sample from P̂^ℬ. In both cases, for any given prefix x1...n, these methods create an estimator of ∑_{xn+1...m ∈ Σ∗} P(xn+1...m | x1...n) · 1[x1...m ∈ ℬ]; i.e. the likelihood of an error among all sequences that begin with a specific prefix, weighted by the probability of generating each sequence. This is used to estimate P̂^ℬ.
Lew et al. (2023) create this estimate using Monte Carlo sampling. They apply several optimiza-
tions such as sampling without replacement, and aggressive pruning of low-probability branches. In
contrast, Zhang et al. (2024), building on the work of Zhang et al. (2023), first distill an LLM into a
hidden Markov model (HMM) with a tractable number of states (thousands or tens of thousands). If
the constraint can be expressed as a DFA over tokens—not always possible for certain constraints—
their method takes the product system of the DFA and HMM, and then calculates the probability of
an error in this product system.
Both of these methods, while exhibiting impressive results on many tasks, also face issues in certain
dense error sets—the approximation of the posterior tends to become inaccurate when arbitrary
generations almost immediately lead to an error.
4 METHOD
4.1 PREVIOUS ITERATIONS OF ASAP ARE (ALMOST) SMALL SPECULATIVE MODELS
For some iteration of ASAp, with B as the set of observed errors so far, let x = (x1 , . . . , xn ) be a trace drawn from P̂^B, where it is discovered that x ∈ ℬ. We observe that P̂^B and P̂^{B∪{x}} are almost always near-identical probability distributions, with P̂^{B∪{x}} generally being the "more accurate" distribution because it incorporates an additional error sample.
Our method reduces computation by using the sample x ∼ P̂^B to approximate a sample x′ ∼ P̂^{B∪{x}}, in a similar manner to how speculative decoding uses a sample from an SSM to approximate a sample from an LLM. By evaluating SPECSAMPLE(x, P̂^B, P̂^{B∪{x}}), our method obtains a prefix of x that can be used as a starting point for sampling again. Because the distributions P̂^B and P̂^{B∪{x}} are so close to each other, this prefix usually covers most of the length of x. In contrast, ASAp would backtrack to the beginning of the generation. This process is given as Algorithm 4; we refer to it as Approximately Aligned Decoding, or AprAD.
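Putting the pieces together, a hedged sketch of this loop (our reconstruction of Algorithm 4, reusing the add_bad_sample helper from the ASAp sketch above and hypothetical base_probs / is_error callbacks) looks like the following:

```python
import numpy as np

def aprad_generate(base_probs, is_error, prompt, max_len, rng):
    """Approximately Aligned Decoding (illustrative sketch).

    After each rejected sample, the pre-update conditionals play the role of the
    draft model and the post-update conditionals play the role of the target, so
    a speculative accept/reject pass decides how much of the sample to keep."""
    cond = {}                      # ASAp-style adjusted conditionals (P-hat^B)
    n = len(prompt)
    x = list(prompt)
    while len(x) < max_len:
        dist = np.asarray(cond.get(tuple(x), base_probs(tuple(x))), dtype=float)
        x.append(int(rng.choice(len(dist), p=dist)))
        if not is_error(x):
            continue
        # Snapshot the conditionals the rejected continuation was sampled from ...
        old = [np.asarray(cond.get(tuple(x[:j]), base_probs(tuple(x[:j]))), dtype=float)
               for j in range(n, len(x))]
        # ... fold the rejected sequence into cond, giving P-hat^{B union {x}} ...
        add_bad_sample(cond, base_probs, x, n)
        # ... and keep the prefix that speculative sampling accepts under the update.
        kept = list(x[:n])
        for j in range(n, len(x)):
            new = np.asarray(cond.get(tuple(x[:j]), base_probs(tuple(x[:j]))), dtype=float)
            q, tok = old[j - n], x[j]
            if rng.random() < min(1.0, new[tok] / q[tok]):
                kept.append(tok)
            else:
                residual = np.clip(new - q, 0.0, None)
                kept.append(int(rng.choice(len(residual), p=residual / residual.sum())))
                break
        x = kept
    return x
```

When the distributions before and after the update are close, the accepted prefix covers most of x, so only a few tokens need to be regenerated.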
However, AprAD does not perfectly maintain the output distribution: Algorithm 4 amplifies some sequence probabilities because it only invokes SPECSAMPLE after discovering an error. To maintain the output distribution, SPECSAMPLE should always be invoked for strings x1...n where ∃i ∈ [1 . . . n], P̂^B(xi | x1...i−1) < P(xi | x1...i−1)—but the algorithm has no way of checking whether this condition holds without iterating through every suffix, negating any performance benefit.
Even though AprAD does not perfectly maintain the output distribution, we show in the following sections that it provides a very useful midpoint in the tradeoff of computational complexity versus task-specific performance and accuracy.
4.2 ANALYSIS
Let A^ℬ_P(x1...n) represent the probability of the AprAD method producing sequence x1...n.
For x1...n ∈ ℬ, A^ℬ_P(x1...n) = 0. For all other sequences, we provide evidence that AprAD more closely follows the ideal distribution, compared to constrained generation. While the nature of the iterative process makes it difficult to write a closed-form description of the probability amplification A^ℬ_P(x1...n) / P̂^ℬ(x1...n), less probability amplification occurs with AprAD than with constrained generation when an error is detected, as an error's probability mass is "distributed" over many sequences due to the
Error Set                      | ASAp KL-div | ASAp Ratio | Constrained KL-div | Constrained Ratio | AprAD (Ours) KL-div | AprAD (Ours) Ratio
∅                              | 0.0014      | 1.000      | 0.0014             | 1.000             | 0.0014              | 1.000
AAA                            | 0.0014      | 1.020      | 0.0075             | 1.000             | 0.0046              | 1.004
AAA, AAC                       | 0.0012      | 1.041      | 0.0429             | 1.000             | 0.0157              | 1.013
AAA, ACC                       | 0.0013      | 1.042      | 0.0138             | 1.000             | 0.0093              | 1.009
AAA, CCC                       | 0.0010      | 1.044      | 0.0155             | 1.000             | 0.0074              | 1.010
AAA, AAB, ABA, BAA             | 0.0013      | 1.093      | 0.0504             | 1.000             | 0.0224              | 1.024
A** except AAC                 | 0.0014      | 1.232      | 0.3836             | 1.113             | 0.1540              | 1.205
*** except AAA, AAB, ABA, BAA  | 0.0000      | 3.644      | 0.1771             | 1.670             | 0.0521              | 2.142
*** except AAA, BAA            | 0.0000      | 5.701      | 0.0000             | 1.784             | 0.0000              | 2.653
Table 1: KL-Divergence and generation ratios for simulated task with various error sets. Lower is
better for both. Stars in the error set are wildcards; i.e. AB* means ABA, ABB, and ABC. Note
that constrained generation will backtrack if all tokens for a given prefix are disallowed, resulting in
ratios greater than 1 for some error sets.
speculative sampling operation. In contrast, with constrained generation, an error’s probability mass
is moved entirely to sequences that share n − 1 prefix tokens.
We empirically show that AprAD is closer to the ideal distribution, compared to constrained decod-
ing, by creating a testbench to simulate an environment where the ideal distribution is known. The
testbench contains a simulated language model that always returns one of three tokens (A, B, and
C) with equal probability. We mark k sequences of length 3 as errors, and use the sampling method
under test to sample 10000 sequences of length 3. The ideal distribution is trivial to compute—
probability 1/(27−k) for every non-error sequence. To measure how a sampling process compares to
the ideal distribution, we compute the KL-divergence between the observed distribution and ideal.
Additionally, we measure the Generation Ratio; i.e. how many times the language model must be
evaluated, divided by the number of tokens generated in the output.
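This testbench is small enough to reproduce in a few lines; below is our own sketch, with plain rejection sampling standing in for the method under test (any of the samplers sketched above could be dropped in).

```python
import itertools, math, random
from collections import Counter

def simulate(errors, n_samples=10000, seed=0):
    """Toy testbench: a 'model' over tokens A/B/C that assigns every length-3
    sequence equal probability, with `errors` as the set of forbidden sequences."""
    rng = random.Random(seed)
    counts, model_calls, tokens_out = Counter(), 0, 0
    for _ in range(n_samples):
        while True:                      # rejection sampling as the method under test
            seq = "".join(rng.choice("ABC") for _ in range(3))
            model_calls += 3             # one model evaluation per generated token
            if seq not in errors:
                break
        counts[seq] += 1
        tokens_out += 3
    # Ideal distribution: uniform 1/(27-k) over the non-error sequences.
    ideal = {"".join(s): 1 / (27 - len(errors))
             for s in itertools.product("ABC", repeat=3) if "".join(s) not in errors}
    kl = sum((counts[s] / n_samples) * math.log((counts[s] / n_samples) / p)
             for s, p in ideal.items() if counts[s] > 0)
    return kl, model_calls / tokens_out  # KL-divergence, generation ratio

print(simulate({"AAA", "AAC"}))
```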
The results are shown in Table 1, indicating that our method approximates the ideal distribution
more closely than constrained generation, with a lower generation ratio than ASAp.
5 EVALUATION
While Section 4.2 shows that our method performs well in a simulated domain, the following exper-
iments test the sampling methods on a series of more difficult, real-world tasks.
It is common in poetry or creative writing exercises to write text without using a specific letter; a
product of this exercise is called a lipogram. Lipograms where the excluded letter is a vowel tend to
be more difficult to create than lipograms with other letters. Large language models often fail at this task and, more generally, at most tasks that depend on individual letters rather than entire tokens.
We use Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) to generate lipograms with vowels as the ex-
cluded letter. We prompt the LLM to perform one of five simple tasks (detailed in Appendix A).
Each task is appended to instructions to avoid using one of the five vowels, resulting in 25 prompts.
For each prompt, we generate a completion with four sampling methods: unconstrained generation,
constrained generation, ASAp, and AprAD, for up to 200 tokens. If the process reaches 2000 model
invocations, generation is interrupted, and the last sequence before an error was detected is returned.
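For this task, the error set can be implemented as a simple incremental predicate over the detokenized completion. The sketch below is our illustration of such a detector (the tokenizer interface and prompt-length handling are assumptions), not necessarily the exact implementation used in the experiments.

```python
def make_lipogram_detector(banned_letter, tokenizer, prompt_len_chars):
    """Return is_error(token_ids): True as soon as the banned letter appears
    anywhere in the generated portion of the detokenized text (case-insensitive).
    Any extension of an error is also an error, matching the prefix assumption."""
    banned = banned_letter.lower()

    def is_error(token_ids):
        text = tokenizer.decode(token_ids)           # assumed HF-style decode()
        generated = text[prompt_len_chars:].lower()  # only constrain the completion
        return banned in generated

    return is_error
```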
We then ask human raters to score each completion on quality, regardless of whether the constraint was followed, on a scale of 1-5. If the forbidden letter is detected in the output, we mark the constraint as violated. Otherwise, we also ask the human raters to decide whether the output violates the intent of the constraint, e.g. by answering in a foreign language, adding unnecessary accents, swapping in Cyrillic lookalike characters, etc. Additional information about the rating process is provided in Appendix A.
Summarize the history of artificial intelligence without using the letter "A". AprAD (Ours)
The history of intelligent technology unfolds through key milestones in computing. In the
mid-20th century, pioneering minds nurtured the seeds of this field through the development
of logistic models, which evolved into expert systems in the 1960s. These systems, relying on
heuristics, could solve complex problems in fields like medicine or meteorology.
Summarize the history of artificial intelligence without using the letter "A". Unconstrained
In the early days, brilliant minds pondered on creating machines that could mimic human
intelligence. These pioneers, including Alan Turing and Marvin Minsky, theorized and exper-
imented. They aimed to create intelligent machines, using logic and mathematics.
Summarize the history of artificial intelligence without using the letter "A". ASAp
History of Intelligence
Summarize the history of artificial intelligence without using the letter "A". Constrained
In the mid-20th century, scientists delved deep into investiging computionl modeles for intel-
ligencen modeled on the workings of the brn. Pioneers such как E.N. Turing, Clrk Quincy
Shrffngton, иnddd W. Ross Quilp, аnd Маrvin Minsky led ths fld. They explored logic-gаted
progrаms, thm brgth mchr tо lf understnding the cptures of th gnntl mind.
Figure 2: Representative generation samples for all four methods, using Mistral-7B-Instruct-v0.2. Appearance of the banned letter is bolded, and non-ASCII characters (all Cyrillic in this example) are colored red. Full samples are provided in Appendix D.
The results of this evaluation are provided in Table 2, and a representative sample of the outputs is provided in Figure 2. All outputs and rater scores are included in the supplemental material, and additional examples are provided in Appendix D.
As shown by these results, AprAD consistently produces high-quality outputs, nearly matching
the readability of unconstrained generation. Additionally, it consistently follows the intent of a
constraint—while all methods except for unconstrained generation follow the constraint, constrained
generation often does so in an undesirable manner, rather than by selecting appropriate words that
do not contain the banned letter. Finally, while the computation required by AprAD is greater than that of constrained generation, our method is usually able to finish generation within the allotted computation limit, while ASAp struggles to generate more than a handful of tokens under the same constraint.
Table 2: Lipogram evaluation results, including subjective rater evaluations of output quality and
adherence to the constraint. Higher is better for Quality and Constraint Intent; lower is better for
Generation Ratio. Note that constraint intent is automatically set to 1 if the banned letter is detected
in the output. Generation is stopped after 2000 model invocations and the output provided as-is; for
ASAp, this often resulted in very short outputs and thus extreme generation ratios and low scores.
Several raters interpreted these short outputs as violating constraint intent.
We created a program to incrementally detect hallucinated API calls in partial Python programs,
based on the Pyright language server (Microsoft, 2019). Note that in order to satisfy the assump-
tions in Section 3—that any string with an error as a prefix is itself an error—we constructed the
detector to act conservatively. False negatives are strongly preferred over false positives. For ex-
ample, even if name foo is never defined, the detector does not consider the incomplete program
Size | Method        | Pass@1 | Pass@5 | !NameErr@1 | !NameErr@5 | Gen. Ratio
15b  | Unconstrained | 0.214  | 0.498  | 0.831      | 0.996      | 1.000 ± 0.000
15b  | Ours          | 0.259  | 0.541  | 0.976      | 1.000      | 1.080 ± 0.385
15b  | ASAp          | 0.261  | 0.536  | 0.976      | 1.000      | 1.555 ± 3.906
15b  | Constrained   | 0.221  | 0.506  | 0.930      | 1.000      | 1.005 ± 0.053
7b   | Unconstrained | 0.119  | 0.345  | 0.800      | 0.987      | 1.000 ± 0.000
7b   | Ours          | 0.145  | 0.375  | 0.950      | 0.993      | 1.064 ± 0.402
7b   | ASAp          | 0.152  | 0.395  | 0.952      | 0.993      | 1.468 ± 2.574
7b   | Constrained   | 0.124  | 0.345  | 0.891      | 0.993      | 1.005 ± 0.029
Table 3: Subset of tasks where at least one trial results in a different output: 233 tasks (20.4%) for
15b, 304 tasks (26.7%) for 7b. For both model sizes, of the tasks where at least one model output is
different, an average of 1.5 out of 5 outputs are different. Lower is better for generation ratio; higher
is better for all others. Our method approaches the task performance of ASAp, with a generation
ratio close to that of constrained generation.
“example([Link]” as an error, because it is possible to later add text that turns the function
argument into a generator expression that binds foo.
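As a rough illustration only (not the authors' implementation), the non-conservative core of such a detector can shell out to the Pyright CLI and look for undefined-name diagnostics. The real detector must additionally verify that no later continuation could still bind the name, which we elide here; the CLI flag and diagnostic rule name reflect our understanding of Pyright and should be treated as assumptions.

```python
import json, subprocess, tempfile

def has_undefined_name(partial_program: str) -> bool:
    """Run Pyright on a (syntactically completed) partial program and report
    whether it flags any undefined variable. NOTE: a faithful detector must be
    conservative (no false positives), e.g. by ignoring names that a later
    continuation could still bind; this sketch omits that analysis."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(partial_program)
        path = f.name
    result = subprocess.run(["pyright", "--outputjson", path],
                            capture_output=True, text=True)
    try:
        report = json.loads(result.stdout)
    except json.JSONDecodeError:
        return False                     # be conservative on tool failure
    return any(diag.get("rule") == "reportUndefinedVariable"
               for diag in report.get("generalDiagnostics", []))
```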
We compare the performance of AprAD, constrained to avoid producing code with hallucinated
API calls, relative to other sampling methods by evaluating on BigCodeBench (Zhuo et al., 2024), a
benchmark that focuses on practical programming tasks, often requiring the use of common libraries.
An analysis of the solutions that several common LLMs generate reveals that their solutions often require imports that are available in the testing environment but not listed in the prompt. In order for the hallucination detection program to discover these available resources, we add all imports available in the test environment to the dataset prompt for this experiment.
For all sampling methods, we use Starcoder2 (Lozhkov et al., 2024), in the 7B and 15B model sizes.
We generate 5 samples for each task, with temperature 0.8, and a top-p of 0.95. In addition to
evaluating the pass@1 and pass@5 rates on execution-based tests, we log whether the evaluation specifically fails with a NameError or UnboundLocalError, as an indicator that the generation included a hallucinated API call, and calculate the rate at which this does not occur.
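For concreteness, these per-task metrics can be computed from the five samples with the standard combinatorial pass@k estimator; the helpers below are our own sketch (function names are illustrative), with the !NameErr rate computed the same way over samples that did not fail with a hallucination-indicating exception.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k, given n samples of which c are successes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def no_name_err_at_k(exception_names, k: int) -> float:
    """exception_names: one exception class name (or None) per sample.
    A sample 'succeeds' here if it did not fail with a NameError or
    UnboundLocalError, i.e. no hallucinated API call surfaced at runtime."""
    n = len(exception_names)
    c = sum(e not in ("NameError", "UnboundLocalError") for e in exception_names)
    return pass_at_k(n, c, k)

# Example: 5 samples per task, 2 of which pass the execution tests.
print(pass_at_k(5, 2, 1), pass_at_k(5, 2, 5))   # 0.4 1.0
```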
Note that all methods use the same random seed, so the outputs only diverge if and when the detector
activates. Table 3 shows the results for all tasks where the outputs diverge in any method; Table 4
(Appendix) also includes the tasks for which all methods return identical results. As the results
show, the output quality of AprAD is close to ASAp, while its generation ratio is much lower.
the algorithm doesn’t backtrack as far, and the resulting behavior approaches constrained decoding.
The optimal point in this tradeoff will depend on the specific task and error set, and is an area
for future research. As defined, however, AprAD occupies a useful position on this spectrum, is
straightforward to implement, and exhibits positive results on a variety of tasks.
REFERENCES
Luca Beurer-Kellner, Marc Fischer, and Martin Vechev. Guiding llms the right way: Fast, non-
invasive constrained generation, 2024. URL [Link]
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, and Tri Dao.
Medusa: Simple LLM inference acceleration framework with multiple decoding heads, 2024.
URL [Link]
Darryl Francis. The Scunthorpe problem. Word Ways, 53(2), May 2020. URL https://[Link]/wordways/vol53/iss2/12.
Saibo Geng, Martin Josifoski, Maxime Peyrard, and Robert West. Grammar-constrained decoding
for structured nlp tasks without finetuning, 2024. URL [Link]13971.
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chap-
lot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier,
Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril,
Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b, 2023. URL https://[Link]/abs/2310.06825.
Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative
decoding, 2023. URL [Link]
Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, and Vikash K. Mansinghka. Sequential monte
carlo steering of large language models using probabilistic programs, 2023. URL https://[Link]/abs/2306.03081.
Yuhui Li, Fangyun Wei, Chao Zhang, and Hongyang Zhang. EAGLE: Speculative sampling requires
rethinking feature uncertainty, 2024. URL [Link]
Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Noua-
mane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, De-
nis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov,
Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo,
Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yix-
uan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xian-
gru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank
Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Can-
wen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Car-
olyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Car-
los Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von
Werra, and Harm de Vries. Starcoder 2 and the stack v2: The next generation, 2024. URL
[Link]
Daniel Melcer, Nathan Fulton, Sanjay Krishna Gouda, and Haifeng Qian. Constrained decoding for
code language models via efficient left and right quotienting of context-sensitive grammars, 2024.
URL [Link]
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae
Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan
Arfeen, Reyna Abhyankar, and Zhihao Jia. Specinfer: Accelerating large language model serv-
ing with tree-based speculative inference and verification. In Proceedings of the 29th ACM In-
ternational Conference on Architectural Support for Programming Languages and Operating
Systems, Volume 3, ASPLOS ’24. ACM, April 2024. doi: 10.1145/3620666.3651335. URL
[Link]
Microsoft. Pyright, 2019. URL [Link]
Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni.
Grammar-aligned decoding, 2024. URL [Link]
Honghua Zhang, Meihua Dang, Nanyun Peng, and Guy Van den Broeck. Tractable control for
autoregressive language generation, 2023. URL [Link]
Honghua Zhang, Po-Nien Kung, Masahiro Yoshida, Guy Van den Broeck, and Nanyun Peng. Adapt-
able logical control for large language models, 2024. URL [Link]2406.13892.
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam
Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong
Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan
Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang,
David Lo, Binyuan Hui, Niklas Muennighoff, Daniel Fried, Xiaoning Du, Harm de Vries, and
Leandro Von Werra. Bigcodebench: Benchmarking code generation with diverse function calls
and complex instructions, 2024. URL [Link]
A LIPOGRAM EVALUATION DETAILS
We provide the following prompts to the language model, as well as the relevant special tokens to
delimit user instructions and chat turns.
Each prompt is combined with each vowel, resulting in 25 prompts. With four sampling methods,
this results in 100 total generations.
During sampling, we use a top-k of 20, and temperature of 0.8.
We create a file that only contains the 100 prompt-completion pairs, without information on which
method generated each completion. All samples are shuffled in random order.
We selected four AI researchers not otherwise directly involved in the experimental evaluation of
this method as human raters, to evaluate 25 samples each. We provided the following instructions to
the raters:
This file contains a set of prompts, and responses using one of several meth-
ods. Each prompt contains a constraint to not use a specific letter. Irrespective
of whether the response follows the constraint, rate the response quality on a scale
of 1-5 in the “Score” column, noting that generation is always cut off after 200
tokens.
Additionally, rate how well the response follows the intent of the constraint in the
“Follows Intent” column. Examples of not following the intent include working
around the constraint by excessively dropping letters, using unnecessary accents,
writing Unicode lookalike letters, or responding in a foreign language, rather than
through selecting appropriate words that satisfy the constraint. This column is
pre-filled with ‘X’ if the output contains the banned letter. Otherwise, write 1 if it
violates the intent, 2 if it is ambiguous, and 3 if it does not.
We additionally highlighted the presence of non-ASCII lookalike letters to the human raters. The
complete model outputs, and the scores that each rater assigned, are provided in the supplementary
material. Additional example outputs are provided in Appendix D.
Table 4: Results for each method on the entirety of BigCodeBench. Note that these results are identical to those in Table 3, except that they are consistently offset and scaled to include values for the tasks in which all methods return the same result.
Table 4 includes results for the entirety of BigCodeBench; not just the tasks for which the methods
diverged in their output.
Algorithm 5 Many error-free decoding methods may be generalized by their behavior after an error
procedure ERRORFREEDECODING(P, ℬ, x1...n, STRATEGY)
    P̂^B ← P
    m ← n                                               ▷ Current token index
    while Stopping condition not met do
        Sample one token xm+1 ∼ P̂^B(· | x1...m)
        Increment m
        if x1...m ∈ ℬ then
            P̂^{B∪{x}} ← ADDBADSAMPLE(P̂^B, x1...m)        ▷ Algorithm 3
            x1...m ← STRATEGY(P̂^B, P̂^{B∪{x}}, x1...m)    ▷ m may decrease
            P̂^B ← P̂^{B∪{x}}
    return x1...m
procedure APRADSTRATEGY(P̂^B, P̂^{B∪{x}}, x1...m)
    return SPECSAMPLE(P̂^B, P̂^{B∪{x}}, 0, x1...m)         ▷ Algorithm 2
procedure ASAPSTRATEGY(P̂^B, P̂^{B∪{x}}, x1...m)
    return []                                            ▷ Backtrack to beginning
procedure CONSTRAINEDDECODINGSTRATEGY(P̂^B, P̂^{B∪{x}}, x1...m)
    return x1...m−1                                      ▷ Delete the error token but don't backtrack further
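The three strategies differ only in how much of the rejected sample they keep, which makes the comparison compact in code; a minimal sketch (ours, mirroring Algorithm 5 and reusing the conventions of the earlier sketches) is:

```python
import numpy as np

def asap_strategy(old_dists, new_dists, x, n, rng):
    """ASAp: discard the entire rejected continuation and restart from the prompt."""
    return list(x[:n])

def constrained_strategy(old_dists, new_dists, x, n, rng):
    """Constrained decoding: drop only the final, error-completing token."""
    return list(x[:-1])

def aprad_strategy(old_dists, new_dists, x, n, rng):
    """AprAD: keep the prefix accepted by speculative sampling between the
    pre-update conditionals (old_dists) and post-update conditionals (new_dists)."""
    kept = list(x[:n])
    for j in range(n, len(x)):
        q, p, tok = old_dists[j - n], new_dists[j - n], x[j]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            kept.append(tok)
        else:
            residual = np.clip(p - q, 0.0, None)
            kept.append(int(rng.choice(len(residual), p=residual / residual.sum())))
            break
    return kept
```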
Provide instructions to tie a tie without using the letter "I". AprAD (Ours)
To create an elegant neckwear look, follow these steps to adorn your collar:
1. Gather your tools: Choose a well- Press your collar down and adjust the top button.
2. Select the perfect knot: There are many knots to choose from; select a preferred one.
3. Dress your end: Take the end nearest to you and form a loop, called the "small" or "narrow"
end.
4. Create an overlap: Cross the longer end, called the "large" end, over the loop.
5. Form a second loop: Take the small end and make another loop.
6. Create a "V" shape: Place the large end over the two loops. Adjust for symmetry.
7. Repeat: Cross the small end over the large end, then the large end over the small end to
create a "V" shape once more.
8. Tuck and adjust: T
Provide instructions to tie a tie without using the letter "I". Unconstrained
To create an elegant necktie knot, follow these steps:
1. Lay the tie flat on the table with the wide end (the end you’ll be tying first) facing you. The
thin end (the side you’ll be attaching to theshirt) should be hanging on the left.
2. Cross the wide end over the thin end and bring it behind the looped thin end.
3. Now, pass the wide end over the looped thin end from left to right, going around the back
of the necktie.
4. Keeping the wide end behind the looped thin end, cross it over to the left side and bring it
in front of the looped end.
5. Now, the wide end should be on the left and the looped end on the right. Cross the wide
end over the looped end again to the right side.
6. Pass the wide end under
Provide instructions to tie a tie without using the letter "I". ASAp
To create
Provide instructions to tie a tie without using the letter "I". Constrained
To create an elegant necktube knot for your formal att attar, follow these steps:
1. Lay the necktube flat on the table, end towards you.
2. Fold one end about four to four and a half инches (10-12 cm) over the other end.
3. Create a loop, called the "small loop," by fold the part of the necktube above the folded end,
about two to three инches (5-7.5 cm) down.
4. Create a second loop, called the "large loop," by fold the part of the necktube around your
neck, and rest the large loop over the small loop.
5. Tuck the end of the necktube under the large loop, then the small loop.
6. Pass the end of the necktube under the small loop and through the hole created by the knot.
7. Tug on
Provide instructions to tie a tie without using the letter "O". AprAD (Ours)
1. Begin with the tie unwinded and the widest end (the "bracelet end") tucked under the
neckband.
2. Bring the wider end (the "tie end") behind the neckband, then tuck it under the bracelet end
and pin it in place with a thin fabric "shelf" called the "keeper."
3. Create a "Wing Tip" by lifting the neckband and slipping the tie end under it, then pull it up
evenly and adjust the width until it’s level with the bracelet end.
4. With the tie end still held in place with the neckband, create a "Shelf" by lifting the bracelet
end and passing the tie end beneath it, then adjust the tie width again by pulling the tie end
evenly.
5. Create a "Three-inch rule" by sliding the bracelet end up three inches (7.6 cm
Provide instructions to tie a tie without using the letter "O". Unconstrained
1. Begin by laying out your tie on a flat surface with the wide end, known as the "tail," facing
you. The narrower end is called the "Tip."
2. Cross the Tip over the Tail from left to right.
3. Bring the Tail around the back of the neck and cross it over the Tip.
4. Now, take the Tip and thread it under the loop created by the Tail.
5. Pull the Tip up through the loop.
6. Adjust the position of the Tip and Tail to achieve a symmetrical knot.
7. Repeat the process on the other side, crossing the Tip over the Tail, bringing the Tail around
the back, threading the Tip under the loop, and pulling it through.
8. With both sides adjusted, adjust the knot one final time to ensure a neat and symmetrical
look.</s>
Provide instructions to tie a tie without using the letter "O". ASAp
1
Provide instructions to tie a tie without using the letter "O". Constrained
1. Begin by laying the necktie flat in a figure eight shape. The wide end (called the "butt end")
will be anchered at the left side, and the thin end (called the "tip end") will be adjusted at the
right side.
2. Bring the right side (the side with the tip end) up and behind the left side, creating a "U"
shape, then pull it straight up and tuck the tip end under the wide part at the nape (the back
part) near the center.
3. Next, bring the left side up and pass it (the wide end) behind the neck, then pull it straight
up and rest it against the right side.
4. With the left side still in hand, take the right side (tip end) and bring it up and pass it behind
the left side.
5. Tuck the tip end behind and under the wide part (where the
Write a story without using the letter "E". AprAD (Ours)
In a quaint, old-world town, a mystical story unfolds. A solitary knight, Sir Gavr, hails from a
distant land. His mission was vital, as his king had sought his aid. A monstrous dragon, with
a long, sinuous form, had laid claim to a bountiful land.
Sir Gavr, galloping through twisting hills and past lilac valms, his stallion, Swiftpaw, carrying
him swiftly. Through woodlands dark and mossy, a path narrow and winding. His armor
glinting gold, a radiant sight, as knight and mount, in rhythmic flight, cut through air, crisp
and light.
A labyrinth of hills and hollows, Sir Gavr and Swiftpaw, onward bound. A looming mountain,
a daunting sight, as dragon’s lair, a
Describe elephants without using the letter "O". AprAD (Ours)
Elephants are large, gray, thick-skinned pachyderms, characteristically having extended,
curved tusks, and a fan-shaped, large ear structure. They are the largest living terrestrial an-
imals, with a distinct, massive build. These gentle giants have a wrinkled visage, and large,
dangling, gray, prehensile ears, which are effective at dissipating heat. The trunk, a lengthy,
muscular, and versatile appendage, is an integral feature, serving multiples uses, including
breathing, grasping, and trumpeting. Their legs are pillar-like, ending in large, flat feet, which
help them maneuver in diverse terrains. Their size and strength make them apex terrestrial
animals, and they are crucial elements in their respective habitats.</s>
Critique the Mona Lisa without using the letter "E". AprAD (Ours)
In this artwork, a captivating woman, known as La Joconda or La Hora, holds a puzzling grin.
A portrait, it is, of this cryptic woman. With a backdrop of lush, rolling hills, this painting is a
symphony of color and form.
This woman, clad in a rich, royal gown, sits in a chair in an unusual position, twisting slightly,
inviting an intimacy that is both alluring and intriguing. A slight smirk, or is it a sly grin? This
ambiguity is what draws us in, as if to ask, "Am I amusing or mocking?"
Boldly, shadows and light play in this work, as if dancing a waltz, with chiaroscuro as its
music. This contrast, striking and dramatic, is what adds an almost magical quality to this
artwork.
A mosaic of colors, this painting is
Critique the Mona Lisa without using the letter "E". Unconstrained
The Mona Lisa, a singular 16th-century painting, is Leoni da Vinci’s epitome. This
masterpiece, penned by the virtuoso artist, evokes a sense of enigma and captivation. The
woman portrayed, with her enigmatic smile, gazes directly at the observer, challenging one’s
interpretation.
The painting’s composition masterfully balances the human form against a lush background,
showcasing Leonardo’s virtuosic skill. The woman, dressed in a subtle, intricate robe, is
contrasted by the stark, plain backdrop. Her expression, subtly shifting between coy and
contemplative, invites endless analysis.
The intricate play of light and shadow on her face adds depth and dimension, as if the painting
itself holds a secret. The artist’s adept use of sfumato technique, where colors and tones blend
Critique the Mona Lisa without using the letter "E". ASAp
In this
Critique the Mona Lisa without using the letter "E". Constrained
I’ll do my utmost to critic this Mona Lisa without utilizing "é" or any of its variants, including
"é" as in "étudiant" or "è" as in "mèasuring." I’ll focus on "critic" as in "judging with an
analytical and authoritarian approach."
This painting, La Gioconda, by da Vinci, is an intriguing, cryptic work of art. Its woman’s
countanza, or counténanc, or visagio, or visagio da vita, or simply, its facia, is a captivating,
almost hypnotizing, sight.
With a sfumato background that dissipatas, or dissipas, into an indistinct, unbound horizon,
our ojo is drawn to its sujето principal. Its sfumato, or "smoky," background is in striking
contrast
Critique the Mona Lisa without using the letter "A". AprAD (Ours)
Leonymus Léonifor de Vinci’s iconic piece, the ’Lizone Jocundity’ or ’Smiling Lioness,’ epit-
omizes the refined intrigue of the 16th century. This work, completed in the mid-1500s, is
renowned for its subject’s elusive expression, veiled in mystery.
The ’Lizone’ is executed with the finest of techniques. Leonifor’s virtuosity is evident in the
meticulous, lifelike rendering of the figure’s physiognomy, dressed in the subdued splendor of
the epoch. The sitter, shrouded in the mists of time, seems to defy definition.
The composition is striking; the figure is posed, enshrouded in the folds of her veil, the folds
of her robe - the folds of her being.
Critique the Mona Lisa without using the letter "A". Unconstrained
The Mona Lisa, a iconic masterpiece by Leonardo da Vinci, is a 16th century portrait ren-
dered in oil on poplar wood. This artwork, renowned for its enigmatic smile, is noted for its
exceptional fine detail and the subtle interplay of light and shadow.
The subject, believed to be Lisa Gherardini, is depicted in three-quarter view against a
landscape backdrop. Her expression, often described as enigmatic or melancholic, is con-
veyed through her slightly parted lips, a subtle curve of her mouth, and her enigmatic gaze.
The painting showcases da Vinci’s mastery of sfumato, a technique that creates a soft
transition between colors and tones. This is evident in the delicate shading on the subject’s
face, hands, and the background landscape.
The background is a harmonious blend of mountains, water, and trees, which adds
Critique the Mona Lisa without using the letter "A". ASAp
The
Critique the Mona Lisa without using the letter "A". Constrained
The Moná Licá, renowned work of Leonido de Vinci, is renowned for its enigmistic smile,
subtle yet intriguing. This 16th century oil on poplár wood portritr é, housed in the Louvre,
Fránce, boggles the mind with its complexitı́. The composition, with the subject’s eyes meet-
ing the observer’s, invites introspection. The intricè intricàtions of the sitter’s dress, mirroring
the folds of the sitter’s neckline, exhibit the virtuositı́ of the mı́ster. The sfumátto technique,
used in the rendering of the shádows, highlights the subtlety of the lighting on the subject. The
delicácy of the brushwork, seen in the subtle blending of colors, reflects the meticulousness of
the mı́ster. The over
Summarize the history of artificial intelligence without using the letter "U". AprAD (Ours)
Artificial Intelligence (AI) traces its roots to the concept of "thinking machines" in ancient
Greece. Early theories on creating sentient beings were proposed by philosophers like René
Descartes and Blaise Pascal. The term "Artificial Intelligence" emerged in the mid-1950s
when scientists and engineers began working on creating programs capable of solving complex
problems, learning from experience, and interacting with the environment.
One of the first significant achievements in AI was the development of the Logic Theorist by
Allen Newell and Herbert A. Simon in 1956. This program was designed to prove mathemat-
ical theorems, and it was considered a major milestone in the field.
The 1960s and 1970s saw the advancements in AI research with the creation of expert systems,
which applied knowledge in specific domains to solve complex problems. These systems, like
MYCIN and EMYC
Summarize the history of artificial intelligence without using the letter "U". Unconstrained
Artificial Intelligence (AI) is a field of computer science that focuses on creating intelligent
machines capable of performing tasks that typically require human intelligence. Its roots can
be traced back to the mid-20th century.
In the 1940s and 1950s, early pioneers like Alan Turing and Marvin Minsky explored the
possibility of creating machines that could think and learn. Turing proposed the Turing Test,
which measures a machine’s ability to imitate human conversation. Minsky and John Mc-
Carthy coined the term "Artificial Intelligence" at a workshop at Dartmouth College in 1956.
Initially, AI research focused on symbolic AI, using rules and symbols to represent knowledge
and reason. However, this approach faced limitations, and researchers turned to other methods.
In the late 1950s and early 1960s, there was significant interest in AI and
Summarize the history of artificial intelligence without using the letter "U". ASAp
Artificial Intelligence (AI) traces
Summarize the history of artificial intelligence without using the letter "U". Constrained
Artificial Intelligence (AI) is a field of компьютерная наука, dealing with creating machines
or programs that mimic intelligent behavior. Its roots can be traced back to the mid-20th
венгарда, when Alan Тюринг, an English mathematician, proposed the concept of a "logical
machine" capable of performing any task that a person can.
In the 1950s and 1960s, AI research was largely driven by the idea that intelligent behavior
can be achieved by programming a machine to process symbolic information, leading to the
development of symbolic AI or "good old-fashioned AI" (GOFAI), which relies on logical
reasoning and symbol manipation.
However, the limitations of this approach became apparent in the late 1960s, with the realiza-
tion that symbolic reasoning alone was insignificant in tackling real-world problems
Describe elephants without using the letter "U". AprAD (Ours)
Massive land animals, elephants are the largest mammals on Earth. They have long, gray,
wrinkled skin, and large, fan-like ears. Their tails are thick at the base and taper towards the
tip. They possess two long front legs and a pair of smaller hind legs. They are herbivores,
eating a wide variety of plants, leaves, bark, and branches. Their diet provides them with an
incredible capacity to store food in their digestive tracts, which can take days to process. These
gentle giants have a keen memory and form strong family bonds, living in matriarchal herds.
They help one another, displaying behaviors of cooperation and protection, making them an
integral part of their ecosystems.</s>
Critique the Mona Lisa without using the letter "U". AprAD (Ours)
The Mona Lisa, a iconic masterpiece by Leonardo da Vinci, is a captivating image of a woman
whose enigmatic smile and mesmerizing gaze have left viewers in awe since its inception.
The portrait, renowned for its exceptional detail and realistic representation, evokes a sense of
mystery and refinement.
The lady, believed to be Lisa Gherardini, is depicted while seated before a rocky, landscaped
backdrop. Her visage is characterized by the delicate interplay of light and shadow, with the
folds of her clothing creating an intricate, almost three-dimensional effect. Her eyes, with
their faint hint of a smirk, invite the viewer into a world of enigma, daring one to decipher her
emotions.
The painting’s composition is noteworthy for its asymmetrical balance, which keeps the viewer
constantly engaged. The delicate folds of her veil, the
Critique the Mona Lisa without using the letter "U". Unconstrained
The Mona Lisa, a iconic masterpiece by Leonardo da Vinci, is a captivating image of a woman
whose enigmatic smile and inscrutable expression have baffled and intrigued viewers for
centuries. Her soft, delicate features and lifelike quality are achieved through the painter’s
virtuosic use of sfumato, a technique involving the subtle blending of colors to create a hazy,
smoky effect.
The background landscape, rendered with meticulous detail, provides an idyllic setting for the
enigmatic figure. The artist’s exceptional skill in capturing the play of light and shadow, as
well as his mastery of perspective, are evident in the way the background elements recede into
depth.
Mona Lisa’s gaze, seemingly directed at the viewer, invites one to ponder the secret thoughts
hidden behind her enigmatic expression. Her hands, delicately inter
Critique the Mona Lisa without using the letter "U". ASAp
The Mon
Critique the Mona Lisa without using the letter "U". Constrained
The Mona Lisa, a iconic masterpiece by Leonardo da Vinci, is a captivating image of a woman
whose enigmatic smile and inscrütableness have intrigighed viewers for over fivecenthies. The
painting is renowned for its exceptional detail and lifelike portrayal of the sitter.
The composition is masterly balanced, with the sitter positioned slightly off center. Her gaze,
following no discernible focal point, invites prolonged contemplation. The delicate rendering
of her facial expression, with its slight hint of a smile and the faintest of eyebrows, adds to the
overall alliteration.
The intricately patterned veil she wears adds a sense of mystery and intrigness. The delicate
play of light and shadow enhances the sense of depth and dimensionality. The sfoglio, or
smoky sfondato, in the backdrop contibtes to
Provide instructions to tie a tie without using the letter "E". AprAD (Ours)
1. Starting with your collar straight and flat:
2. Position your collar: Fold it up at about an inch (2.5 cm) and adjust it to sit comfortably.
3. Form a ’Width’ and’Slip’ or ’Four in Hand’ knot:
a. Width: Pick up a small strip of fabric (about an inch or two, 2.5 cm to 5 cm) with your
non-dominant hand. This is your ’width’ or ’wingtip’ for this knot.
b. Slip: With your dominant hand (usually your right hand), grab a loop of fabric that’s around
a foot (30 cm) long. This is your ’slip’.
4. Cross and Tuck:
a. Bring your slip hand up and across your width, crossing your width hand’s strip.
b. Tuck
Provide instructions to tie a tie without using the letter "E". Unconstrained
1. Place the tie around your neck, with one end hanging lower than the other. This is called
the "working end."
2. Cross the working end over the undone side (the side without the Windsor knot).
3. Bring the working end behind the neck, crossing it over the center of the knot.
4. Pull the working end straight down, making a loop on the side with the Windsor knot.
5. Tuck the end of the loop under the Windsor knot, towards the narrower side of the tie.
6. Pull the working end to tighten the knot, and then repeat steps 3-5 to create a second loop
on the other side.
7. Cross the second loop over the first loop, just like in step 2.
8. Bring the working end behind the neck and under the first loop, crossing it over the center
of the
Provide instructions to tie a tie without using the letter "E". ASAp
1
Provide instructions to tie a tie without using the letter "E". Constrained
1. Position your shirt: Worn collar flat and button up, slip your arms through armh Hols of
your suit or blaz Jackет. Tuck in your shirt and fastен buttons.
2. Taut knot: Grip your narrow finish, also known as "point," with your right hand, holding it
at about a 45-diagonal from your body. Wrap your "broad" or "wing tip" around your collar
and cross it ov-r your finish, making a "V" Shap at your collar.
3. Tightly wrap: Grip your broad tip with your right hand, and pull it across and o’r your
finish, going toward your l-ft (non-dominant) sid- and passing it thru’g your "working" n-d
(right) knot loop.
4. Wrap again: Cross your broad tip to your "back," going